WO2016084190A1

WO2016084190A1 - Storage device

Info

Publication number: WO2016084190A1
Application number: PCT/JP2014/081364
Authority: WO
Inventors: 智也室谷; 広一岡田
Original assignee: 株式会社日立製作所
Priority date: 2014-11-27
Filing date: 2014-11-27
Publication date: 2016-06-02

Abstract

In a storage device according to one aspect of the present invention, a startup volume, which comprises a plurality of virtual pages to which storage regions for initial data and storage regions for update data are mapped, is provided to a host. Furthermore, the storage device is configured so as to cause data selected on the basis of the access frequency of each storage region to reside in a cache memory. When a read request with respect to the startup volume is received from the host, and the data to be read resides in the cache memory, the storage device reads the data from the cache memory and returns this data to the host.

Description

Storage device

The present invention relates to a storage apparatus.

Because computer hardware has become inexpensive, many companies have introduced and operate a large number of host computers. In a computer system using a large number of host computers, the data handled by the host computers is in many cases consolidated and managed in a single storage device in order to reduce operation management costs. In such a computer system, a storage area (volume) of the storage device is allocated to each host computer, and each host computer uses the storage area allocated to it. As the number of host computers increases, more storage areas are required.

On the other hand, data of the same content is often used in each host computer. As an example, when the hardware specifications of each host computer are the same, each host computer executes a program such as the same operating system (OS). For this reason, among the volumes allocated to each host computer, most of the contents of the volume (so-called boot disk) in which programs such as the OS are stored are often common.

Patent Document 1 discloses a technique in which, among the data used by each host computer, data whose content is common, for example OS data, is stored in a single storage area (hereinafter, data stored in this single storage area is referred to as initial data), while data whose content differs for each host computer (referred to as differential data; specifically, an OS patch or the like) is stored in a storage pool, and each host computer is provided with a virtual volume composed of the initial data and the differential data in the storage pool. As a result, it is not necessary to store data having the same content in duplicate, so storage area can be saved.

Patent Document 1: US Patent Application Publication No. 2012/0066680

In the system described in Patent Document 1, common data used by many host computers is stored in a single storage area. Since many host computers will access this single storage area, the access performance tends to deteriorate.

Also, as time passes, part of the volume data is updated. For example, when applying a patch to the OS, part of the volume data is updated. In the system described in Patent Document 1, update data is stored as difference data in a storage pool. Each storage area that is continuous on the virtual volume may be mapped to a physically discrete storage area. This can also be a factor that degrades the access performance.

A storage apparatus according to an aspect of the present invention provides a host with a startup volume composed of a plurality of virtual pages to which a storage area for initial data or a storage area for update data is mapped. The storage device is configured to store data selected based on the access frequency of each storage area in a cache memory. When the storage apparatus receives a read request for the startup volume from the host and the read target data is stored in the cache memory, the storage apparatus reads the data from the cache memory and returns it to the host.
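The read path summarized above can be illustrated with a minimal Python sketch. The names `read_page`, `cache`, and `drive` are hypothetical, and the miss path (falling back to the drive) is an assumption made for illustration. How data is selected for cache residency is not modeled here.

```python
def read_page(key, cache, drive):
    """Return (data, source) for a read request on a startup volume."""
    if key in cache:
        # The read target resides in the cache memory: no drive access.
        return cache[key], "cache"
    # Assumed miss path: read the data from the drive instead.
    return drive[key], "drive"


# Hypothetical contents, keyed by (LDEV ID, address).
drive = {(0, 0): b"boot-loader", (0, 1): b"kernel"}
cache = {(0, 0): b"boot-loader"}  # resident because it is accessed often

hit = read_page((0, 0), cache, drive)
miss = read_page((0, 1), cache, drive)
```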

According to the present invention, data that resides in the cache memory need not be read from the drives, so the access performance of the virtual volume can be improved.

FIG. 1 is a configuration diagram of a storage system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the structure of a virtual volume management table.
FIG. 3 is a diagram showing the structure of a GI mapping table.
FIG. 4 is a diagram showing the structure of a cache management table.
FIG. 5 is a diagram showing the structure of a pool management table.
FIG. 6 is a diagram showing the structure of an access frequency table.
FIG. 7 is a diagram showing the structure of an access pattern table.
FIG. 8 is a flowchart of a pre-startup process.
FIG. 9 is a flowchart of a read process.
FIG. 10 is a flowchart of a write process.
FIG. 11 is a flowchart of the update process of the access frequency table.
FIG. 12 is a flowchart of a purge process.
FIG. 13 is a flowchart of a deduplication process.
FIG. 14 is a flowchart of a page sort process.
FIG. 15 is a flowchart of a staging process.
FIG. 16 is a diagram showing the management information and programs held by the storage apparatus.
FIG. 17 is an explanatory diagram of the relationship between a pool, a virtual volume (startup volume), and a logical volume, and of the relationship (mapping) between virtual pages and real pages.
FIG. 18 is an explanatory diagram of a method for comparing access patterns.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The embodiments described below do not limit the invention according to the claims, and not all of the elements and combinations described in the embodiments are necessarily essential to the solution of the invention.

In the following description, the management information of the present invention may be described using expressions such as “aaa table”, but this information may also be expressed in a form other than a data structure such as a table.

In the following description, “program” may be used as the subject, but in practice, the program is executed by a processor (CPU (Central Processing Unit)) to perform a predetermined process. However, to prevent the explanation from becoming redundant, the program may be described as the subject. Further, part or all of the program may be realized by dedicated hardware. Various programs may be installed in each apparatus by a program distribution server or a computer-readable storage medium. As the storage medium, for example, an IC card, an SD card, a DVD, or the like may be used.

Hereinafter, the configuration of a computer system according to an embodiment of the present invention will be described. FIG. 1 is a diagram illustrating a hardware configuration of the computer system according to the embodiment. The computer system includes one or more hosts 2, a storage apparatus 1 that accepts I/O requests from the hosts 2, and a management terminal 13. The host 2 and the storage apparatus 1 are connected via a communication network 6. The storage device 1 and the management terminal 13 are connected via the management network 7.

The storage device 1 includes a storage controller (hereinafter also abbreviated as “controller”) 11 and a disk unit 12 including a plurality of drives 121. In the storage controller 11, an MPB 111, which is a processor board that performs control such as I/O processing in the storage apparatus 1; a channel adapter (CHA) 112 having a data transfer interface with the host 2; a disk adapter (DKA) 113 having a data transfer interface with the disk unit 12; and a memory package (CMPK) 114 having a memory for storing cache data and control information are interconnected by a switch (SW) 115. The number of each component (MPB 111, CHA 112, DKA 113, CMPK 114) is not limited to the number shown in FIG. 1, but a plurality of each is usually provided to ensure high availability. These components can also be added later.

Each MPB 111 is a processor package board having one or more processors (also called MPs) 141 and a local memory (LM) 142 for storing data used by the processors 141.

The CMPK 114 temporarily stores write data from the host 2 and data read from the drive 121. The CMPK 114 has a CM 144, which is an area used as a so-called disk cache, and an SM 143, which is an area for storing control information used by the MPB 111. Information stored in the SM 143 can be accessed from all the MPs 141 of all the MPBs 111. In order to prevent the data in the CMPK 114 from being lost when a failure such as a power failure occurs, the CMPK 114 may be provided with means such as a battery backup.

A plurality of drives 121 are mounted on the disk unit 12. Each drive 121 is a storage device for storing write data from the host 2 and is connected to a data transfer interface of the DKA 113. For example, a magnetic disk such as an HDD is used for the drive 121, but a storage medium other than the HDD such as an SSD (Solid State Drive) may be used.

As an example, the communication network 6 is a so-called storage area network (SAN) including one or more network devices such as switches (Fibre Channel switches) and one or more transmission lines such as Fibre Channel cables. However, the communication network 6 need not include a network device such as a switch. In that case, the host 2 and the storage apparatus 1 are connected only by the transmission line. Further, the physical standard of the network devices and transmission lines constituting the communication network is not limited to Fibre Channel. A communication network using Ethernet may be used.

As with the communication network 6, the management network 7 is also composed of network devices and transmission lines. As an example, an Ethernet switch or an Ethernet cable is used as a network device and a transmission line used in the management network 7. However, a network device or transmission line conforming to a physical standard other than Ethernet may be used.

The host 2 is a computer including at least a CPU 21, a memory 22, and an HBA (Host Bus Adapter) 23 that is an interface for connecting the host 2 to the communication network 6. The CPU 21 executes a program loaded on the memory 22. On the host 2, as an example, an application program such as a database management system (DBMS) is executed to access data stored in the storage apparatus 1. A plurality of virtual machines may operate on the host 2, and each virtual machine may access the storage apparatus 1. In FIG. 1, only two hosts 2 are shown, but the number of hosts 2 is not limited to the number shown in FIG. 1. More than two hosts 2 may access the storage apparatus 1.

Here, main terms used in this embodiment will be described.

"Logical volume":
A logical volume is a logical storage area formed by the storage apparatus 1 using storage areas of one or more drives 121 in the disk unit 12. The storage apparatus 1 according to the present embodiment can form a plurality of logical volumes. In this embodiment, the logical volume may be expressed as “LDEV”.

“Virtual Volume”, “Page”:
A virtual volume is a storage area provided to the host 2. The storage apparatus 1 according to this embodiment can define a plurality of virtual volumes and provide them to the host 2. The host 2 recognizes one virtual volume as one disk device. A virtual volume is formed using a storage area of a logical volume. The storage device 1 manages the virtual volume area by dividing it into partial areas of a predetermined size called “pages (or virtual pages)”. The storage device 1 also manages the mapping between virtual pages and logical volume storage areas (also called real pages). The virtual volume is a volume formed using a known Thin Provisioning technology, and the storage apparatus 1 dynamically maps a storage area of the logical volume to a virtual page when it receives an access request for that virtual page of the virtual volume.

“Normal Virtual Volume”:
In the storage apparatus according to the present embodiment, two types of virtual volumes can be provided to the host 2. The first type of virtual volume is the same as a volume formed by using a well-known Thin Provisioning technology. In the initial state, no real page is mapped to any page of the first type of virtual volume. Hereinafter, this virtual volume is referred to as a “normal virtual volume”. When the storage apparatus 1 receives a write request for a virtual page of a normal virtual volume from the host 2, the storage apparatus 1 selects an unused area (an area not yet mapped to any virtual page) from the storage areas (real pages) of the logical volumes, and maps the selected area to the virtual page to be accessed. Write data from the host 2 is stored in the area (real page) mapped to this access target virtual page.
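The allocate-on-write behavior of a normal virtual volume can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `ThinVolume`, its fields, and the tuple form of a real page are hypothetical names, and the `store` dictionary merely stands in for the data held by real pages.

```python
class ThinVolume:
    """Hypothetical model of a normal virtual volume."""

    def __init__(self, free_real_pages):
        # Real pages that are not yet mapped to any virtual page.
        self.free_real_pages = list(free_real_pages)
        # virtual page ID -> real page; empty in the initial state.
        self.mapping = {}
        # Stand-in for the data stored in each real page.
        self.store = {}

    def write(self, page_id, data):
        # Map an unused real page on the first write to this virtual page.
        if page_id not in self.mapping:
            if not self.free_real_pages:
                raise RuntimeError("no unused real page is available")
            self.mapping[page_id] = self.free_real_pages.pop(0)
        self.store[self.mapping[page_id]] = data
        return self.mapping[page_id]


vol = ThinVolume(free_real_pages=[("LDEV#1", 0), ("LDEV#1", 1)])
first = vol.write(5, b"abc")
second = vol.write(5, b"xyz")  # same virtual page: the mapping is reused
```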

"Startup volume":
The second type of virtual volume is a volume for which a real page to be mapped to each virtual page is determined at the time of the initial state. When a read request for a virtual page is received from the host 2, the storage apparatus 1 reads data stored in the real page mapped to the virtual page to be read and returns it to the host 2. When the storage apparatus 1 receives a write request and write data for a virtual page, the storage apparatus 1 maps another unused real page to the virtual page instead of the mapped real page. The write data is written to the real page newly mapped to the virtual page. The second type of virtual volume is mainly used as a disk (boot device) that is read at startup, such as an operating system (OS) used by the host 2. Therefore, hereinafter, the second type of virtual volume is referred to as a “startup volume”. However, the use of the startup volume is not necessarily limited to the use as a boot device, and may be used for another use.

"Pool":
A pool is a concept provided for managing the storage areas of logical volumes that are mapped to virtual pages. The storage apparatus 1 according to the present embodiment can have a plurality of pools. When the storage apparatus 1 has a plurality of pools, each virtual volume (startup volume or normal virtual volume) belongs to exactly one of the pools. Further, only storage areas of the pool to which a virtual volume belongs are mapped to the virtual pages of that virtual volume.

“Golden Image (GI)”, “Differential Image (DI)”:
A real page is already mapped to each virtual page of the startup volume at the time of the initial state. The real page (or data stored in the real page) mapped to the virtual page at the time of the initial state is referred to as Golden Image (GI). A real page (or data stored in the real page) that is newly mapped when an update (write) request for a virtual page of the startup volume is received is referred to as a differential image (DI).

Subsequently, management information and programs used by the storage apparatus 1 according to the present embodiment will be described. Although the storage apparatus 1 actually has management information and programs other than those described below, only the management information necessary for explaining the embodiment of the present invention will be described below.

As shown in FIG. 16, the storage apparatus 1 stores an I/O program 1001, a pre-startup program 1002, an access frequency management program 1003, a purge program 1004, a deduplication program 1005, a page sort program 1006, and a staging program 1007 in the LM 142. When the MP 141 executes these programs, the various processes described below are performed.

The storage apparatus 1 also has a virtual volume management table T200, a GI mapping table T250, a cache management table T300, a pool management table T400, an access frequency table T500, and an access pattern table T550 in the SM 143.

Note that the storage device 1 manages a plurality of volumes (logical volumes, virtual volumes). Therefore, unique identifiers are assigned to the logical volume and virtual volume (normal virtual volume and startup volume) within the storage apparatus 1. The identifier attached to the logical volume is called a logical volume ID (LDEV ID), and the identifier attached to the virtual volume is called a virtual volume ID (VVOL ID).

Similarly, the storage device 1 manages a plurality of pools and a plurality of pages. Each pool is given an identifier that is unique within the storage apparatus 1. An identifier attached to the pool is called a pool ID. Each page (virtual page) in the virtual volume is assigned an identification number that is unique in the virtual volume. This identification number is called a page ID, and an integer value of 0 or more is used for the page ID. An integer value starting from 0 is assigned as a page ID in order from the virtual page located in the head area of the virtual volume. In this embodiment, the size of each virtual page (and real page) is fixed (42 MB as an example).
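Because page IDs are assigned in order from the head of the virtual volume and the page size is fixed, the page containing a given position can be computed by integer division. The following Python sketch illustrates this. The function names are hypothetical, and treating 42 MB as 42 × 2^20 bytes and addressing by byte offset are assumptions not stated in the text.

```python
# Assumption: 42 MB is interpreted as 42 * 2**20 bytes.
PAGE_SIZE = 42 * 1024 * 1024


def page_id_of(byte_offset):
    """Return the ID of the virtual page that contains byte_offset."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return byte_offset // PAGE_SIZE


def page_start(page_id):
    """Return the byte offset at which the given virtual page begins."""
    return page_id * PAGE_SIZE
```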

In the storage apparatus 1 according to the present embodiment, integer values are used for the identifiers. In the following, LDEV #n (n is an integer value) denotes the logical volume whose LDEV ID is n. Similarly, VVOL #n denotes the virtual volume whose VVOL ID is n, and page #n (or Page #n) denotes the virtual page whose page ID is n. An integer used as an identifier may be written as a hexadecimal number of, for example, 4 or 8 digits. In particular, a 4-digit hexadecimal number is sometimes written in the form “AB:CD” (where A, B, C, and D are each a digit from 0 to 9 or a letter from a to f).
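As a small illustration of the “AB:CD” notation, the following Python sketch converts between an integer identifier and its 4-digit hexadecimal form. The helper names `format_id` and `parse_id` are hypothetical.

```python
def format_id(value):
    """Format an integer identifier as 4 hexadecimal digits, 'AB:CD'."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("identifier does not fit in 4 hexadecimal digits")
    digits = f"{value:04x}"
    return f"{digits[:2]}:{digits[2:]}"


def parse_id(text):
    """Parse an 'AB:CD' string back into an integer identifier."""
    return int(text.replace(":", ""), 16)
```

For instance, under this sketch LDEV #0 is written as `00:00`.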

FIG. 2 shows an example of the virtual volume management table T200. The virtual volume management table T200 is a table for managing the mapping between each virtual page of a virtual volume and an area (real page) of a logical volume. The VVOL ID (T201) and page ID (T202) in each row (record) of this table are information for specifying the virtual page to be managed. The LDEV ID (T204) and address (Addr.) (T205) in each row represent the start address of the logical volume area (real page) mapped to the virtual page to be managed. That is, the one-page area starting from the position on the logical volume specified by the LDEV ID (T204) and address (Addr.) (T205) is mapped to the virtual page specified by the VVOL ID (T201) and page ID (T202).

Attribute (T206) represents the attribute of the virtual page to be managed. There are three types of virtual page attributes. A value of 0 indicates that the virtual page to be managed is a virtual page of a normal virtual volume (and hence that the virtual volume to which the virtual page belongs is a normal virtual volume). A value of 1 indicates that the virtual volume to which the virtual page belongs is a startup volume and that the virtual page is in the initial state (GI is mapped to the virtual page). A value of 2 indicates that the virtual volume to which the virtual page belongs is a startup volume and that DI is mapped to the virtual page (the page has been updated from the initial state).

In the example of FIG. 2, every row whose VVOL ID (T201) is 0 has an attribute (T206) of 1 (GI) or 2 (DI). Therefore, the virtual volume whose VVOL ID (T201) is 0 is a startup volume. Also, since the attribute (T206) of the row whose VVOL ID (T201) is 0 and whose page ID (T202) is 00002 is 2, it can be seen that DI is mapped to the virtual page whose page ID (T202) is 00002.

When the startup volume is defined, the attribute (T206) of all virtual pages of the startup volume is 1 (GI) in the initial state. When the virtual page is updated, the attribute (T206) of the virtual page is updated to 2 (DI). Further, a virtual page with an attribute (T206) of 0 (normal) and a virtual page with an attribute (T206) of 1 (GI) or 2 (DI) are not mixed in one virtual volume. Also, the attribute (T206) of all pages of the normal virtual volume (the virtual volume with the VVOL ID 02 in the example of FIG. 2) is 0 (normal). The attribute (T206) of each page of the normal virtual volume does not change even when the virtual page is updated.
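The rule that attribute 0 is never mixed with attributes 1 and 2 within one virtual volume can be expressed as a simple check. The following Python sketch is illustrative only. The function `volume_kind` and the constant names are hypothetical, and the attribute values follow the description of T206.

```python
NORMAL, GI, DI = 0, 1, 2  # attribute (T206) values


def volume_kind(attributes):
    """Classify a virtual volume from the attributes of its virtual pages."""
    kinds = set(attributes)
    if not kinds:
        raise ValueError("the volume has no virtual pages")
    if kinds == {NORMAL}:
        return "normal"
    if kinds <= {GI, DI}:
        return "startup"
    # Attribute 0 mixed with 1 or 2 violates the rule described above.
    raise ValueError("normal and startup attributes are mixed")
```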

The pool ID (T203) is a column for storing the pool ID of the pool to which the virtual volume belongs.

FIG. 5 shows a configuration example of the pool management table T400. The pool management table T400 is a table for managing the LDEV belonging to the pool and the state of each area (real page) of the LDEV.

Each row (record) of the pool management table T400 manages information on one real page of an LDEV belonging to a pool. The pool ID (T401) stores the pool ID of the pool to which the LDEV containing the real page belongs. The LDEV ID (T402) stores the LDEV ID of the LDEV belonging to the pool. The Disk Type (T403) stores the type of the LDEV (the LDEV specified by the LDEV ID (T402)). As described above, the LDEV is formed using the storage area of the drive 121. The “type” here means the type of drive used for the LDEV.

In the present embodiment, three types, SSD, SAS, and SATA, can be stored in the Disk Type (T403). SAS or SATA indicates that the drive 121 is an HDD. SAS means that the interface of the drive 121 (the interface connected to the DKA 113) is SAS (Serial Attached SCSI), and SATA means that the interface of the drive 121 is SATA (Serial AT Attachment). When the Disk Type (T403) is SSD, the access performance of the real page (and of the LDEV in which it exists) is the highest. When the Disk Type (T403) is SATA, the access performance is the lowest.

Addr (T405) stores an address in the LDEV. Each row (record) of the pool management table T400 indicates that the area (real page) in the LDEV whose start address is Addr (T405) is mapped to the virtual page specified by the VVOL ID (T406) and the Page ID (T407). If the one-page area (real page) in the LDEV starting at Addr (T405) is not mapped to any virtual page (that is, if it is an unused real page), an invalid value (a value not used as a Page ID or VVOL ID, such as NULL or -1) is stored in the VVOL ID (T406) and the Page ID (T407).
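The use of an invalid value to mark unused real pages can be illustrated with a short Python sketch. The row layout, the sample rows, and the functions `find_unused_real_page` and `map_real_page` are hypothetical. `None` stands in for the invalid value (NULL or -1), and searching the table in row order is an assumption.

```python
UNUSED = None  # stand-in for the invalid value (NULL or -1)

# Hypothetical rows of a pool management table.
pool_table = [
    {"pool_id": 0, "ldev_id": 0, "addr": 0, "vvol_id": 0, "page_id": 0},
    {"pool_id": 0, "ldev_id": 1, "addr": 0, "vvol_id": UNUSED, "page_id": UNUSED},
    {"pool_id": 1, "ldev_id": 2, "addr": 0, "vvol_id": UNUSED, "page_id": UNUSED},
]


def find_unused_real_page(table, pool_id):
    """Return the first unused real page in the given pool, or None."""
    for row in table:
        if row["pool_id"] == pool_id and row["vvol_id"] is UNUSED:
            return row
    return None


def map_real_page(row, vvol_id, page_id):
    """Record that the real page is now mapped to the given virtual page."""
    row["vvol_id"], row["page_id"] = vvol_id, page_id


free_row = find_unused_real_page(pool_table, pool_id=0)
map_real_page(free_row, vvol_id=0, page_id=2)
```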

The attribute (T404) represents the attribute of the virtual page, as described in the explanation of the virtual volume management table T200 in FIG. 2. The values stored in the attribute are the same as those described for the virtual volume management table T200 (a value from 0 to 2 is stored).

The access pattern T408 stores information (pattern information) determined based on the access frequency of a real page (that is, an area on a logical volume for one page having Addr (T405) as a start address). The access pattern will be described later.

Next, the GI and the configuration of the GI mapping table T250, which is a type of management information for the GI, will be described. First, the relationship between the pool, virtual volume (startup volume), and logical volume, and the relationship (mapping) between virtual pages and real pages, as defined by the storage apparatus 1 according to the present embodiment, will be described with reference to FIG. 17.

As described above, the area (real page) on the logical volume that is mapped to the virtual page of the virtual volume belongs to a management unit called a pool. The logical volume belonging to the pool is selected by a computer system user (such as an administrator of the storage apparatus 1). The user uses the management terminal 13 to instruct the storage apparatus 1 to make the logical volume belong to the pool. Upon receipt of this instruction, the storage apparatus 1 makes the logical volume belong to the pool.

The user also defines the virtual volume using the management terminal 13. When defining a virtual volume, the user specifies the pool ID of the pool to which the virtual volume to be defined belongs. It is also possible to specify the type of virtual volume to be defined (whether it is a startup volume or a normal virtual volume). Hereinafter, a case where the type of virtual volume to be defined is a startup volume will be described.

In the initial state (immediately after it is defined), the startup volume has a real page (Golden Image) mapped to each of its virtual pages. Therefore, before a startup volume is defined, the real pages (Golden Image) to be mapped to the startup volume must be determined.

In the storage apparatus 1 according to the present embodiment, the user can set each area (real page) in one logical volume in the pool as a Golden Image. An example is shown in FIG. 17. In FIG. 17, each real page of LDEV #0 (the LDEV whose LDEV ID is 0) is defined as GI.

When the startup volume is defined by the user, the real pages of LDEV #0 are sequentially mapped to the virtual pages of the startup volume. In the example of FIG. 17, the first real page (real page #0) of LDEV #0 is mapped to the first virtual page (virtual page #0) of the startup volume, and likewise the second and subsequent real pages of LDEV #0 (real pages #1, #2, ...) are mapped in order to virtual pages #1, #2, ....

Once each real page of LDEV #0 is defined as GI, the user can store data in the GI. Various means can be adopted for storing data in the GI. For example, data may be stored in each GI (real page) from the management terminal 13 via the management network 7. Alternatively, the storage apparatus 1 may support a dedicated command for storing data in the GI from the host 2 via the CHA 112, and, in response to receiving this dedicated command from the host 2, may store the data received from the host 2 in the GI.

In addition, the process of setting a real page as a GI and the process of storing data in the GI may be performed simultaneously. For example, when the storage apparatus 1 receives a dedicated write command for storing data in the GI from the host 2 via the CHA 112, it may define the area (real page) specified by the write command as the GI and at the same time write the write data received from the host 2 to the GI.

A plurality of virtual volumes (startup volumes) can be defined. For example, FIG. 17 shows an example in which two startup volumes (virtual Vol. #0 and virtual Vol. #1) are defined. In every startup volume, the real pages (GI) of LDEV #0 are mapped to all virtual pages immediately after the definition. Therefore, when startup volumes are defined, the same data is stored in all of them (unless a startup volume has been written to). In other words, a startup volume is a kind of snapshot copy of the LDEV in which the GI is stored.

The storage apparatus 1 according to the present embodiment is provided with the function of providing the host 2 with the type of virtual volume called a startup volume in order to efficiently provide volumes (virtual volumes) to a plurality of hosts (or a plurality of virtual machines) that use the same software (an OS or the like). If an OS boot image or the like is stored in the GI in advance, the user can quickly replicate the boot image simply by defining a startup volume using the management terminal 13 or the like, without physically copying the boot image. Therefore, even when a host is added to the computer system (or when the number of virtual machines running on an existing host is increased), a startup volume for the added host can be defined at high speed. Further, since each virtual page of each startup volume is mapped to the GI (the GI can be said to be shared), consumption of the storage area can be suppressed.

Each startup volume is configured so that data can be written to it from the host 2. When the storage apparatus 1 receives a write request and write data for a virtual page of a startup volume from the host 2, it maps a real page different from the GI to the write target virtual page. The real page mapped here (a real page different from the GI) is the differential image (DI). The write data is stored in the DI.

FIG. 17 shows an example in which the storage apparatus 1 receives a write request from the host 2 for virtual page #2 of virtual Vol. #0. In this example, real page #2 (GI) of LDEV #0 is mapped to virtual page #2 before the write request is accepted. When the write request is received, a real page (real page #0) of an LDEV other than LDEV #0 (an LDEV belonging to the same pool as the LDEV storing the GI; LDEV #1 in the example of FIG. 17) is mapped to virtual page #2.
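The remapping in the FIG. 17 example can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the dictionary layout and the function `handle_write` are hypothetical, only the mapping change is modeled (not the data transfer), and only the first write to a page in the initial state is covered.

```python
GI, DI = 1, 2  # attribute values for startup-volume pages


def handle_write(vvol, page_id, unused_pages):
    """Remap a startup-volume virtual page from the GI to an unused real page."""
    new_page = unused_pages.pop(0)
    vvol[page_id] = {"real_page": new_page, "attr": DI}
    return new_page


# Virtual Vol. #0: every virtual page initially maps to the GI on LDEV #0.
vvol0 = {n: {"real_page": ("LDEV#0", n), "attr": GI} for n in range(4)}
# Unused real pages of LDEV #1, which belongs to the same pool.
unused = [("LDEV#1", 0), ("LDEV#1", 1)]

mapped = handle_write(vvol0, 2, unused)
```

After the call, virtual page #2 points at real page #0 of LDEV #1, while the other virtual pages still share the GI.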

In the example of FIG. 17, two startup volumes belong to the same pool. Therefore, the same data is stored in the two startup volumes. If you want to define a startup volume that stores different data, you can create multiple pools.

FIG. 3 shows a configuration example of the GI mapping table T250.

In the storage apparatus 1 according to the present embodiment, one logical volume in which a GI is stored can be provided in one pool. The GI mapping table T250 is a table for managing the real pages set as GI among the real pages of the logical volumes belonging to each pool. The GI mapping table T250 has columns for pool ID (T251), page ID (T252), LDEV ID (T253), and Addr. (T254).

Each row of the GI mapping table T250 represents information about a real page defined as GI. Specifically, each row indicates that the area (real page) on the logical volume specified by the LDEV ID (T253) and Addr. (T254) belongs to the pool specified by the pool ID (T251). The page ID (T252) stores the page ID of the startup-volume virtual page to which the real page is to be mapped when a startup volume is defined. In the example of FIG. 3, the area whose LDEV ID (T253) is “00:00” and whose start address Addr. (T254) is 00000000 (a one-page area of a logical volume, that is, a real page) is mapped to the virtual page whose page ID is 00000 among the virtual pages of a startup volume (a startup volume belonging to pool #0).

On receiving an instruction from the user to define a startup volume, the storage apparatus 1 creates the virtual volume information (T201, T202, etc.) in the virtual volume management table T200. Then, by referring to the GI mapping table T250, the storage apparatus 1 stores the values of the LDEV ID (T253) and Addr. (T254) of the GI mapping table T250 in the LDEV ID (T204) and Addr. (T205) of the virtual volume management table T200. As a result, the GI is mapped to each virtual page of the defined startup volume.
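Defining a startup volume therefore amounts to copying mapping entries rather than data. The following Python sketch illustrates this under stated assumptions: the row layouts, the address values, and the function `define_startup_volume` are hypothetical and only loosely mirror the T250 and T200 columns.

```python
GI = 1  # attribute value for a page in the initial state

# Hypothetical GI mapping table rows (address values are illustrative).
gi_mapping_table = [
    {"pool_id": 0, "page_id": 0, "ldev_id": 0, "addr": 0x00000000},
    {"pool_id": 0, "page_id": 1, "ldev_id": 0, "addr": 0x02A00000},
]


def define_startup_volume(vvol_id, pool_id, gi_table):
    """Build virtual volume management rows that map every page to the GI."""
    rows = []
    for gi in gi_table:
        if gi["pool_id"] != pool_id:
            continue
        rows.append({
            "vvol_id": vvol_id,
            "page_id": gi["page_id"],
            "pool_id": pool_id,
            "ldev_id": gi["ldev_id"],  # copied from the GI mapping row
            "addr": gi["addr"],        # copied from the GI mapping row
            "attr": GI,
        })
    return rows


vvol0 = define_startup_volume(0, 0, gi_mapping_table)
vvol1 = define_startup_volume(1, 0, gi_mapping_table)
```

Both volumes in this sketch reference the same real pages, which is why no physical copy of the boot image is needed.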

Subsequently, the contents of the access frequency table T500 will be described with reference to FIG. 6. The storage apparatus 1 according to the first embodiment manages the access frequency (read count and write count per unit time) from the host 2 for each real page.

T503-1 to T503-7 in the access frequency table T500 are columns for storing access frequency information for the one-page area (real page) whose start address is specified by the LDEV ID (T501) and Addr. (T502). T503-1 stores the count of accesses (I/O count) that occurred on Monday. Likewise, T503-2 to T503-7 store, in order, the counts of accesses (I/O count) that occurred on Tuesday through Sunday. Each of T503-1 to T503-7 has 24 columns, which store the number of accesses that occurred from 0:00 to 1:00, from 1:00 to 2:00, ..., and from 23:00 to 24:00, respectively.
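The layout of seven days by 24 hourly counters per real page can be sketched in Python as follows. The dictionary keyed by (LDEV ID, address) and the function names are hypothetical. The sketch relies on `datetime.weekday()`, which returns 0 for Monday, matching the Monday-first column order.

```python
from datetime import datetime


def new_counters():
    """Create 7 days x 24 hours of zeroed access counters."""
    return [[0] * 24 for _ in range(7)]


def record_access(table, ldev_id, addr, when):
    """Count one access to the real page at (ldev_id, addr)."""
    counters = table.setdefault((ldev_id, addr), new_counters())
    # weekday(): Monday is 0 and Sunday is 6.
    counters[when.weekday()][when.hour] += 1


access_frequency = {}
# 2014-11-27 was a Thursday, so this access lands in day index 3, hour 13.
record_access(access_frequency, 0, 0, datetime(2014, 11, 27, 13, 30))
record_access(access_frequency, 0, 0, datetime(2014, 11, 27, 13, 45))
# 2014-11-24 was a Monday.
record_access(access_frequency, 0, 0, datetime(2014, 11, 24, 0, 5))
```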

In this embodiment, the number of accesses that occurred in the past week is stored in each column of T503-1 to T503-7. However, the storage device 1 may instead observe the number of accesses over the past several weeks (or months), calculate the average number of accesses per unit time based on the observation results, and store the averages in the columns of T503-1 to T503-7.
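The averaging variant can be sketched in Python as below. The function `average_weeks` is hypothetical, and it assumes a simple arithmetic mean over the observed weeks. The text does not specify which kind of average is used.

```python
def average_weeks(weeks):
    """Average several 7 x 24 weekly count tables slot by slot."""
    if not weeks:
        raise ValueError("at least one week of observations is required")
    n = len(weeks)
    return [
        [sum(week[day][hour] for week in weeks) / n for hour in range(24)]
        for day in range(7)
    ]


# Two hypothetical weeks of counts for one real page.
week1 = [[0] * 24 for _ in range(7)]
week2 = [[0] * 24 for _ in range(7)]
week1[0][9] = 10  # Monday, 9:00 to 10:00
week2[0][9] = 20

averaged = average_weeks([week1, week2])
```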

Next, the concept of “access pattern” in the present embodiment will be described. As described above, the storage apparatus 1 counts the access frequency (access amount per hour) for each real page. That is, for each real page, it grasps the trend of fluctuation in access frequency in each time slot of each day of the week (0:00 to 1:00, 1:00 to 2:00, ..., 23:00 to 0:00). A set of real pages having similar access frequency fluctuation trends is managed as one group, and the access frequency fluctuation trend of the group is referred to as an “access pattern”.

Returning to the description of the pool management table T400 in FIG. The pool management table T400 has a column called an access pattern (T408). For real pages having similar access frequency fluctuation trends, the same identifier (A, B, C, etc.) is stored in the access pattern (T408) column.

The outline of the method for determining the access pattern (T408) is as follows. The storage apparatus 1 converts the counted access frequency information (T503-1 to T503-7 managed in the access frequency table T500) into an index called the “I/O level”.

An example of the conversion rule from the access frequency to the I/O level in this embodiment is shown below. This is only an example, and different conversion rules may be used.
(A) Access amount per hour is 0 or more and less than 10: I/O level 0
(B) Access amount per hour is 10 or more and less than 20: I/O level 1
(C) Access amount per hour is 20 or more and less than 100: I/O level 2
(D) Access amount per hour is 100 or more and less than 500: I/O level 3
(E) Access amount per hour is 500 or more and less than 1000: I/O level 4
(F) Access amount per hour is 1000 or more: I/O level 5
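The conversion rule (A) to (F) can be sketched in Python as follows. This is only an illustration of the example rule above, not the implementation of the storage apparatus 1; the names LEVEL_BOUNDS, io_level and to_io_levels are hypothetical, and the 7 x 24 nested-list layout is an assumed stand-in for T503-1 to T503-7.

```python
from bisect import bisect_right

# Lower bounds of I/O levels 1 to 5, taken from rules (B) to (F) above.
LEVEL_BOUNDS = [10, 20, 100, 500, 1000]


def io_level(accesses_per_hour: int) -> int:
    """Map an hourly access count to an I/O level between 0 and 5."""
    if accesses_per_hour < 0:
        raise ValueError("access count must be non-negative")
    # bisect_right counts how many lower bounds the value has reached.
    return bisect_right(LEVEL_BOUNDS, accesses_per_hour)


def to_io_levels(week):
    """Convert a 7 x 24 grid of hourly counts (Monday first) to I/O levels."""
    return [[io_level(count) for count in day] for day in week]
```

Because the rule is a set of half-open ranges, a single sorted list of lower bounds is enough; a different conversion rule would only change LEVEL_BOUNDS.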

The I/O level 500′ in FIG. 6 shows an example of the result of converting, according to the above conversion rule, the access frequency information of the real page whose LDEV ID (T501) is 00:00 and whose Addr. (T502) is 00000000 among the records of the access frequency table T500.

The storage apparatus 1 further manages a table called an access pattern table. FIG. 7 shows an example of the access pattern table T550. The access pattern table T550 stores a plurality of examples of fluctuation patterns of the I/O level in each time slot of each day of the week (0:00 to 1:00, 1:00 to 2:00, ..., 23:00 to 0:00). T553-1 stores the I/O level in each time slot on Monday. Likewise, T553-2 to T553-7 store the I/O levels in each time slot on Tuesday through Sunday.

The storage apparatus 1 manages access patterns for each pool. In FIG. 7, only three access patterns, A, B, and C, are shown as the access patterns (T552) of pool #0 (the pool whose pool ID (T551) is 0), but more than three access patterns may be registered. The specific processing for registering an access pattern in the access pattern table T550 will be described later.

The storage apparatus 1 compares the result of converting the contents of the access frequency table T500 in FIG. 6 into I/O levels (the I/O level 500′ in FIG. 6 is an example) with each pattern stored in the access pattern table T550. For example, suppose that the I/O level 500′ in FIG. 6 (the I/O levels of the real page whose LDEV ID is 00:00 and whose Addr. is 00000000) is compared with each pattern stored in the access pattern table T550 and is determined to be similar to one of them (for example, the pattern in the row whose access pattern (T552) is “A”). In that case, “A” is registered in the access pattern (T408) of that real page in the pool management table T400.

FIG. 4 shows a configuration example of the cache management table T300. In the storage apparatus 1 according to the first embodiment, frequently accessed data is stored in the CM 144 (sometimes referred to as the “cache”) to shorten the response time when the host 2 accesses data. The cache management table T300 is a table for managing information on the areas of the CM 144 in which data is stored (cached).

In the storage apparatus 1 according to the first embodiment, an area for caching data stored in the startup volume and an area for caching data stored in the normal virtual volume are managed separately. A table for managing an area for caching data stored in the startup volume is a cache management table T300-1. A table for managing an area for caching data stored in the normal virtual volume is a cache management table T300-2. However, the formats of T300-1 and T300-2 are the same. Hereinafter, the contents of the cache management table T300 will be described using the cache management table T300-1 as an example.

Each row (record) of the cache management table T300 indicates that the one-page area of the CM 144 whose start address is Cache Addr. (T301) caches the data of the one-page area (that is, the real page) whose start address is specified by the LDEV ID (T302) and Addr. (T303). If no data is cached in the one-page area starting at Cache Addr. (T301), an invalid value (NULL) is stored in the LDEV ID (T302) and Addr. (T303).

For example, when storing data read from the LDEV (drive 121) in the CM 144, the storage apparatus 1 selects, from the rows (records) of the cache management table T300, one row in which the invalid value (NULL) is stored in the LDEV ID (T302) and Addr. (T303). It then stores the read data in the one-page area whose start address is the Cache Addr. (T301) of that row, and stores the LDEV ID and address (LBA) of the LDEV from which the data was read in the LDEV ID (T302) and Addr. (T303).

In the example of FIG. 4, Cache Addr. (T301) in the first row of the cache management table T300-1 is 0, and Cache Addr. (T301) in its last row is xxxxxxxx. Cache Addr. (T301) in the first row of the cache management table T300-2 is yyyyyyyy. Therefore, the area of the CM 144 from address 0 to (xxxxxxxx + 1 page size) is used as the area for caching data stored in the startup volume, and the area of the CM 144 from address yyyyyyyy onward is used as the area for caching data stored in the normal virtual volume (here, (xxxxxxxx + 1 page size) < yyyyyyyy, that is, the area managed by the cache management table T300-1 and the area managed by the cache management table T300-2 do not overlap). The addresses xxxxxxxx and yyyyyyyy may be fixed, or may be changeable by the user. In the latter case, the user can increase (or decrease) the value of xxxxxxxx in consideration of the number of startup volumes and their access frequency.

Subsequently, the flow of processing performed by the storage apparatus 1 according to the present embodiment will be described. Hereinafter, processing related to the startup volume, such as processing when the storage apparatus 1 receives an access request from the host 2 to the startup volume, will be described. Since the processing for the normal virtual volume is the same as the processing performed in a known storage device, the processing for the normal virtual volume is not described in this embodiment.

First, the flow of processing performed when a read request (read command) for the startup volume is received from the host 2 will be described with reference to FIG. This process is executed by the I/O program 1001.

The read request issued by the host 2 to the storage apparatus 1 includes the VVOL ID of the read target virtual volume (or information from which the VVOL ID can be derived, such as the logical unit number [LUN]) and a logical block address (LBA), which is information for specifying the area within the read target virtual volume. When the I/O program 1001 receives a read request from the host 2, it converts these pieces of information into information for specifying the real page to be read (specifically, the LDEV ID (T204) and Addr. (T205) of the virtual volume management table T200) (S21).

The outline of the conversion performed in S21 is as follows. First, the I/O program 1001 specifies the VVOL ID of the virtual volume to be accessed and the page ID of the virtual page from the information included in the read request. The page ID can be calculated by dividing the LBA included in the read request by the page size expressed in blocks (for example, if the virtual page size is 42 MB and one block is 512 bytes, the page ID is obtained by calculating LBA ÷ (42 [MB] ÷ 512)). Further, the I/O program 1001 refers to the virtual volume management table T200 to identify the real page (LDEV ID (T204) and Addr. (T205)) on the logical volume to which the virtual page with the specified VVOL ID and page ID is mapped.
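The address conversion of S21 can be illustrated with a short Python sketch. It assumes that 1 MB means 2**20 bytes and that the virtual volume management table T200 is modeled as a dictionary keyed by (VVOL ID, page ID); the function names and the sample table contents are hypothetical.

```python
SECTOR_SIZE = 512                             # bytes per logical block
PAGE_SIZE = 42 * 1024 * 1024                  # 42 MB page (assumes 1 MB = 2**20 bytes)
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE   # 42 [MB] / 512 = 86016 blocks


def page_id_of(lba: int) -> int:
    """Page ID of the virtual page that contains the given LBA."""
    return lba // SECTORS_PER_PAGE


def resolve_real_page(vvol_table, vvol_id, lba):
    """S21: look up the (LDEV ID, Addr.) pair mapped to the virtual page."""
    return vvol_table[(vvol_id, page_id_of(lba))]


# Hypothetical contents of a T200-like table for one startup volume.
vvol_table = {
    (0, 0): ("00:00", 0x00000000),
    (0, 1): ("00:00", 0x02A00000),
}
real_page = resolve_real_page(vvol_table, vvol_id=0, lba=90000)
```

With these assumptions, LBA 90000 lies in the second virtual page, so the lookup returns the entry registered for page ID 1.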

Subsequently, the I/O program 1001 refers to the cache management table T300 to determine whether the data of the area specified by the LDEV ID (T204) and Addr. (T205) is cached (S22). If the data is cached (S22: Yes), the I/O program 1001 reads the read target data from the CM 144 and returns it to the host 2 (S24).

If the data is not cached (S22: No), the I/O program 1001 reads the read target data from the area on the logical volume whose start address is specified by the LDEV ID (T204) and Addr. (T205) and returns it to the host 2 (S23). When data is read from an area on the logical volume, the logical volume address is converted to an address of the drive 121, and the data is actually read from the drive 121. Since this is a known process, a detailed description is omitted.

In S23, the I/O program 1001 also stores the read data in the CM 144. To do so, the I/O program 1001 selects, from the rows of the cache management table T300, one row in which the invalid value (NULL) is stored in the LDEV ID (T302) and Addr. (T303). It then stores the read data in the one-page area whose start address is the Cache Addr. (T301) of that row, and stores the LDEV ID and address (LBA) of the LDEV from which the read target data was read in the LDEV ID (T302) and Addr. (T303).

If the cache management table T300 contains no row in which the invalid value (NULL) is stored in the LDEV ID (T302) and Addr. (T303), there is no unused area in the CM 144. In that case, for example, an area that caches the data of a real page with low access frequency is selected, and the read data is stored in that area. To select such an area, a method similar to the one used by the purge program 1004 described later may be used.

Finally, the I/O program 1001 performs an update process of the access frequency table T500 (S25) and ends the process. Details of S25 will be described later.
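The read flow S21 to S25 can be summarized in the following Python sketch. It is a simplified model under stated assumptions: the tables T200, T300 and T500 and the drive 121 are replaced by dictionaries, the access frequency is a single counter per real page rather than a 7 x 24 grid, and the eviction rule (drop the cached page with the lowest counter) is just one possible reading of "an area in which real page data with low access frequency is cached". The class name ReadPath and its members are hypothetical.

```python
class ReadPath:
    """Toy model of the read processing of the I/O program 1001."""

    def __init__(self, vvol_table, drive, cache_slots):
        self.vvol_table = vvol_table    # (vvol_id, page_id) -> real page, like T200
        self.drive = drive              # real page -> data, stands in for drive 121
        self.cache = {}                 # real page -> data, stands in for CM 144 / T300
        self.cache_slots = cache_slots  # number of one-page areas in the cache
        self.access_count = {}          # real page -> count, simplified T500

    def read(self, vvol_id, page_id):
        real_page = self.vvol_table[(vvol_id, page_id)]   # S21: address conversion
        if real_page in self.cache:                       # S22: cached?
            data = self.cache[real_page]                  # S24: read from cache
        else:
            data = self.drive[real_page]                  # S23: read from drive
            if len(self.cache) >= self.cache_slots:
                # No unused area: evict the least accessed cached page (assumption).
                victim = min(self.cache, key=lambda p: self.access_count.get(p, 0))
                del self.cache[victim]
            self.cache[real_page] = data                  # stage the data just read
        # S25: update the access frequency information.
        self.access_count[real_page] = self.access_count.get(real_page, 0) + 1
        return data
```

A second read of the same virtual page is served from the cache without touching the drive dictionary, which is the effect the cache management table T300 is meant to provide.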

Next, the flow of processing performed when a write request (write command) to the startup volume is received from the host 2 will be described with reference to FIG. This process is also executed by the I/O program 1001.

Like the read request, the write request issued by the host 2 to the storage apparatus 1 includes the VVOL ID of the write target virtual volume (or information from which the VVOL ID can be derived, such as the logical unit number [LUN]) and a logical block address (LBA), which is information for specifying the area within the write target virtual volume. When the I/O program 1001 receives a write request from the host 2, it converts these pieces of information into information for specifying the write target real page (that is, the LDEV ID (T204) and Addr. (T205) of the virtual volume management table T200) (S41). This process is the same as S21.

Subsequently, the I/O program 1001 determines whether the area (real page) specified by the LDEV ID (T204) and Addr. (T205) is DI or GI (S42). This can be determined by referring to the attribute (T206) of the virtual volume management table T200.

If the real page attribute (T206) is 2, that is, DI (S42: Yes), the I/O program 1001 writes the write data received together with the write request to the real page specified in S41 (S43). When write data is written to a real page, the logical volume address is converted to an address of the drive 121, as in a known storage device, and the data is actually written to the address of the drive 121 obtained by the conversion. Thereafter, the I/O program 1001 performs an update process of the access frequency table T500 (S44), notifies the host 2 that the write process is completed, and ends the process. The process of S44 is the same as S25.

If the real page attribute (T206) is not 2 (S42: No), that is, if it is GI, the I/O program 1001 newly secures an area (real page) for storing the write data (S46). Specifically, by referring to the pool management table T400, it specifies a real page that has not yet been assigned to any virtual page (that is, it specifies the LDEV ID (T402) and Addr. (T405) of a row in which the VVOL ID (T406) and Page ID (T407) are NULL).

If there are a plurality of real pages that are not yet assigned to any virtual page, any method may be used to determine the real page for storing the write data. For example, the I/O program 1001 may preferentially select a real page with high access performance. In this case, the I/O program 1001 preferentially selects a real page whose Disk Type (T403) is SSD. If there is no real page whose Disk Type (T403) is SSD, the I/O program 1001 may select a real page whose Disk Type (T403) is SAS.

After S46, the I/O program 1001 stores the write data in the real page secured in S46 (S47). Thereafter, the I/O program 1001 takes over the access frequency information (S48), notifies the host 2 that the write process is completed, and ends the process.

Note that when the write data is stored in the real page in S43 or S47, the write data may be temporarily stored in the CM 144 instead of being written directly to the real page (the drive 121 in which it exists). Thereby, when a read request for the real page (more precisely, for the virtual page to which the real page is mapped) is received later, the data can be read from the CM 144. In this case, processing similar to S22 and S23 is performed. That is, the CM 144 is searched for an area that caches the data of the write target real page. If the CM 144 has no such area, an unused area is selected and the write data is stored in the selected area. Then, the cache management table T300 is updated.

The above describes an example in which the host 2 is notified that the write process has been completed after the write data is stored in the real page (the drive 121 in which it exists) (S47), that is, an example of so-called write-through processing. However, a so-called write-back method may be used instead. In this case, the I/O program 1001 notifies the host 2 that the write process has been completed when the write data has been stored in the CM 144. The storage apparatus 1 may then store the data in the drive 121 where the real page exists at an arbitrary timing.
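The write flow S41 to S48 (write-through variant) can be sketched as follows. This is a minimal model under several assumptions: each write replaces a whole page, the tables and the drive 121 are dictionaries, the access frequency is a single counter per real page, free real pages are taken in list order rather than by Disk Type, and the virtual page is remapped to the newly secured DI. All names are hypothetical.

```python
GI, DI = "GI", "DI"   # symbolic stand-ins for the attribute (T206)


def write_page(vvol_table, free_pages, drive, freq, vvol_id, page_id, data):
    """Handle one whole-page write and return the real page that was written."""
    entry = vvol_table[(vvol_id, page_id)]               # S41: address conversion
    if entry["attr"] == DI:                               # S42: Yes (already DI)
        drive[entry["page"]] = data                       # S43: overwrite in place
        freq[entry["page"]] = freq.get(entry["page"], 0) + 1   # S44
        return entry["page"]
    # S42: No (the virtual page is still mapped to the shared GI).
    if not free_pages:
        raise RuntimeError("no unassigned real page in the pool")
    new_page = free_pages.pop(0)                          # S46: secure a real page
    drive[new_page] = data                                # S47: store the write data
    freq[new_page] = freq.get(entry["page"], 0)           # S48: take over the GI frequency
    # Remap the virtual page to the new DI; the GI itself is left untouched.
    vvol_table[(vvol_id, page_id)] = {"page": new_page, "attr": DI}
    return new_page
```

The point of the sketch is that the first write to a virtual page never modifies the GI, so other startup volumes that share the GI keep seeing the initial data.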

The takeover of access frequency information performed in S48 will now be described. S48 is executed when it is determined in S42 that the write target area is not DI (that is, it is GI). The write data is stored in the real page (DI) secured in S46, and the access frequency information of that real page is made the same as the access frequency information of the write target area (GI) determined in S42. When a write request for a virtual page is received, the write data is stored in an area (real page, DI) different from the GI, but the access tendency of the host 2 toward the write target virtual page is unlikely to change before and after the write request. That is, the access tendency toward the real page (DI) secured here is highly likely to be the same as the access tendency toward the real page (GI) that had been allocated to the virtual page. Therefore, in S48, the I/O program 1001 sets the access frequency information of the real page secured in S46 to the access frequency information of the write target area (GI) determined in S42.

For example, assume that the write target area determined in S42 is the real page (GI) whose LDEV ID is 0 and whose Addr. is 00000000, and that the real page secured in S46 has an LDEV ID of 1 and an Addr. of 2a000000. In this case, in S48, the I/O program 1001 sets the values of T503-1 to T503-7 in the row of the access frequency table T500 whose LDEV ID (T501) is 0 and whose Addr. (T502) is 00000000 as the access frequency of the real page secured in S46. That is, these values of T503-1 to T503-7 are copied to the row of the access frequency table T500 whose LDEV ID (T501) is 1 and whose Addr. (T502) is 2a000000.

However, the method of taking over access frequency information is not limited to the method described above. For example, when a plurality of startup volumes are defined in the pool, the access frequency information of the real page secured in S46 may be set to “access frequency information of the write target area (GI) ÷ number of startup volumes”.
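Both takeover variants of S48 can be expressed in a few lines of Python. The sketch assumes the access frequency table T500 is a dictionary from (LDEV ID, Addr.) to a 7 x 24 grid of counts, and it uses integer division for the "÷ number of startup volumes" variant; the rounding rule and all names are assumptions.

```python
def take_over_access_frequency(freq_table, gi_page, di_page, startup_volume_count=1):
    """S48: copy the GI's 7 x 24 access counts to the newly secured DI.

    With startup_volume_count > 1, each count is divided by the number of
    startup volumes (integer division is an assumption of this sketch).
    """
    if startup_volume_count < 1:
        raise ValueError("startup_volume_count must be at least 1")
    freq_table[di_page] = [
        [count // startup_volume_count for count in day]
        for day in freq_table[gi_page]
    ]


# Example from the text: GI at LDEV 0 / 00000000, DI at LDEV 1 / 2a000000.
freq_table = {(0, 0x00000000): [[4] * 24 for _ in range(7)]}
take_over_access_frequency(freq_table, (0, 0x00000000), (1, 0x2A000000))
```

The copy is made row by row so that later updates to the DI's counts do not alter the GI's row.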

Next, the update processing of the access frequency table T500 performed in S25 (or S44) will be described with reference to FIG. In S25 (or S44), the I/O program 1001 calls and executes the access frequency management program 1003. Thereby, update processing of the access frequency table T500 is performed. The access frequency management program 1003 is also executed when the load on the storage apparatus 1 is low (for example, when there is no I/O request from the host 2 for a certain period of time). Hereinafter, unless otherwise specified, the flow of processing when the access frequency management program 1003 is called in S25 (or S44) will be described.

The access frequency management program 1003 first updates the access frequency table T500 (S61). Specifically, among the information stored in the access frequency (T503-1 to T503-7) of the real page to be accessed, the access frequency management program 1003 adds 1 to the value corresponding to the current day of the week and time (the day and time at which S61 is executed).
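As a hedged illustration of S61, the sketch below keeps a 7 x 24 grid per real page and increments the cell for the current weekday and hour. The dictionary layout and the function name record_access are assumptions; only the "add 1 to the current day-of-week and hour slot" behavior comes from the text.

```python
from datetime import datetime


def record_access(freq_table, real_page, now=None):
    """S61: add 1 to the counter for the current weekday and hour."""
    now = now or datetime.now()
    # Create an all-zero week (7 days x 24 hours) the first time a page is seen.
    week = freq_table.setdefault(real_page, [[0] * 24 for _ in range(7)])
    # datetime.weekday() returns 0 for Monday, matching T503-1 = Monday.
    week[now.weekday()][now.hour] += 1


freq_table = {}
record_access(freq_table, ("00:00", 0), datetime(2014, 11, 27, 13, 5))  # a Thursday
```

Passing the timestamp explicitly, as in the last line, makes the update easy to check; in normal use the current time would be taken when S61 runs.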

Subsequently, the access frequency management program 1003 converts the access frequency information of the target row to I/O levels, and compares the converted information with the head group of the access pattern table T550 (S62). An example of the comparison method will be described with reference to FIG. 18. The access pattern table T550 stores the I/O level in each time slot of each day of the week (0:00 to 1:00, 1:00 to 2:00, ..., 23:00 to 0:00). Similarly, for the access frequency information of the target row, the I/O level in each time slot of each day of the week has been calculated by the conversion (FIG. 18, 500′).

FIG. 18 shows an example of a method for comparing the head group of the access pattern table T550 (that is, the row whose access pattern (T552) is A) with the access frequency information of the target row converted to I/O levels (500′). In the comparison of S62, for each time slot of each day of the week (0:00 to 1:00, 1:00 to 2:00, ..., 23:00 to 0:00), the absolute value of the difference between the I/O level of the target row and the I/O level stored in the head group of the access pattern table is calculated, and the sum of these absolute values is then calculated.

In the example of FIG. 18, 0, 0, 3, ..., 5, 3, 5 are registered as the Monday I/O level fluctuation (access pattern) of the head group of the access pattern table T550. On the other hand, the Monday access pattern of 500′ (the access frequency information of the target row converted to I/O levels) is 0, 0, 2, ..., 3, 5, 3. In this case, the absolute value of the difference between the I/O levels in each time slot is calculated, and the sum of the calculated absolute values is obtained. That is,
|0-0| + |0-0| + |2-3| + ... + |3-5| + |5-3| + |3-5|
is calculated. The same calculation is performed for the I/O levels from Tuesday to Sunday. Then, all the sums calculated for each day of the week are added together (the sum for Monday + the sum for Tuesday + ... + the sum for Sunday). Hereinafter, the value obtained by this calculation is referred to as the “similarity”; a smaller value means that the two patterns are closer. Note that the method of calculating the similarity is not limited to the method described above, and another method may be used.
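The similarity calculation described above is a sum of absolute differences over 7 days x 24 time slots, and can be sketched in Python as follows. The function name and the nested-list layout are assumptions. In the sample data, only the first three and last three Monday values come from the FIG. 18 example; the middle values and the other days are filled with placeholders purely for illustration.

```python
def similarity(levels_a, levels_b):
    """Sum of |a - b| over every time slot of every day; 0 means identical."""
    if len(levels_a) != len(levels_b):
        raise ValueError("both patterns must cover the same number of days")
    total = 0
    for day_a, day_b in zip(levels_a, levels_b):
        if len(day_a) != len(day_b):
            raise ValueError("both patterns must cover the same time slots")
        total += sum(abs(a - b) for a, b in zip(day_a, day_b))
    return total


# Monday of the head group (0, 0, 3, ..., 5, 3, 5) and of 500' (0, 0, 2, ..., 3, 5, 3);
# the 18 middle hours and Tuesday to Sunday are placeholder values.
pattern_monday = [0, 0, 3] + [0] * 18 + [5, 3, 5]
target_monday = [0, 0, 2] + [0] * 18 + [3, 5, 3]
other_days = [[1] * 24 for _ in range(6)]
score = similarity([target_monday] + other_days, [pattern_monday] + other_days)
```

Since the placeholder slots are identical in both patterns, only the six Monday values taken from the example contribute to the score.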

When the calculated similarity is relatively small (less than a certain threshold), it is determined that the I/O level fluctuation pattern of the target row is similar to the head group of the access pattern table T550. In that case (S63: Yes), the group identifier of the head group is stored in the access pattern (T408) of the real page of the target row (S65), and the process ends. For example, when the identifier of the head group is A, A is stored in the access pattern (T408).

If the calculated similarity is equal to or greater than the threshold, the I/O level fluctuation pattern of the target row is not similar to the head group of the access pattern table T550. In that case, it is determined whether the comparison with all access patterns registered in the access pattern table T550 has been completed (S67). If any registered access pattern has not yet been compared (S67: No), one such access pattern is selected and compared with the converted information in the same manner as in S62 (S66). When the comparison with all access patterns registered in the access pattern table T550 has been completed (S67: Yes), the access pattern of the target row is not similar to any registered access pattern. Therefore, the access pattern of the target row is registered in the access pattern table T550 as a new access pattern (S68).
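The loop S62 to S68 can be sketched as a function that walks the registered patterns in order and registers a new one when none is close enough. The access pattern table T550 is modeled as a dictionary from identifier to a 7 x 24 grid of I/O levels; the threshold value, the way a new identifier is supplied, and the decision to return the new identifier after S68 are assumptions of this sketch.

```python
def similarity(levels_a, levels_b):
    """Sum of absolute I/O-level differences over all days and time slots."""
    return sum(
        abs(a - b)
        for day_a, day_b in zip(levels_a, levels_b)
        for a, b in zip(day_a, day_b)
    )


def classify(page_levels, pattern_table, threshold, new_id):
    """Return the identifier of the first similar pattern, or register a new one."""
    for pattern_id, pattern_levels in pattern_table.items():    # S62, S66, S67
        if similarity(page_levels, pattern_levels) < threshold:  # S63
            return pattern_id                                    # S65
    # S68: no registered pattern is similar, so register this one as new.
    pattern_table[new_id] = [list(day) for day in page_levels]
    return new_id


pattern_table = {"A": [[5] * 24 for _ in range(7)]}
idle_page = [[0] * 24 for _ in range(7)]
assigned = classify(idle_page, pattern_table, threshold=10, new_id="B")
```

In this usage the idle page is far from pattern A, so it ends up registered under the new identifier; a second idle page would then match that new entry.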

This process can be modified in various ways. For example, when the I/O program 1001 calls the access frequency management program 1003 in S25 (or S44), only S61 may be executed, and the processing from S62 onward may be executed periodically at a predetermined time, for example at 0 minutes past every hour. Further, the processing from S62 onward may be performed only when the load on the storage apparatus 1 is low.

Next, the pre-startup process will be described. For a startup volume defined in the storage apparatus 1 according to the present embodiment, data that is stored in real pages (that is, in the storage area of the drive 121 constituting the logical volume) and is expected to be accessed frequently can be cached in advance in accordance with an instruction from the outside. As a result, frequently accessed data is already in the cache before the host 2 issues an access request to the storage apparatus 1, so access performance improves. For example, when the OS used by the host 2 is stored in the startup volume, the startup speed of the host 2 is expected to improve. However, the cache effect is difficult to obtain unless this process is performed before the host 2 is started. Therefore, the user of the computer system uses the management terminal 13 to instruct the storage apparatus 1 to cache the data of the startup volume used by the host 2 to be started. On accepting this instruction, the storage apparatus 1 starts the pre-startup process. Hereinafter, the process of storing data read from the drive 121 in the CM 144 is referred to as “staging”.

The pre-startup process is performed by the MP 141 executing the pre-startup program 1002. The pre-startup process is performed for each pool, but only pools in which a startup volume is defined are targets of the process. When the user uses the management terminal 13 to notify the pre-startup program 1002 of the pool ID of the target pool, the pre-startup program 1002 selects the data to be cached from the data stored in the designated pool on the basis of the access frequency.

As another embodiment, instead of the pool ID, the identifier (VVOL ID) of a startup volume may be specified so that the data of the real pages mapped to the virtual pages of that startup volume is cached. As yet another embodiment, when the user instructs the start of the pre-startup program 1002, the pre-startup process may be performed for all pools in which startup volumes are defined.

Hereinafter, a case where a pool ID is designated for the pre-startup program 1002 will be described as an example. The basic concept of the processing performed by the pre-startup program 1002 is as follows. Since the GI is normally mapped to virtual pages of a plurality of startup volumes (that is, shared by a plurality of startup volumes), its access frequency is expected to be higher than that of DI. Therefore, the pre-startup program 1002 stages the GI in preference to the DI: it first stages the GI and then stages the DI. When staging the DI, the pre-startup program 1002 stages DIs in descending order of access frequency.

FIG. 8 shows the flow of processing executed by the pre-startup program 1002. First, the pre-startup program 1002 refers to the pool management table T400 and the cache management table T300 to check whether, among the GIs included in the notified pool, there is a GI whose data is not cached (S1). If there is such a GI (S1: Yes), the data of the GI is read from the drive 121 into the CM 144 (S2).

Subsequently, the pre-startup program 1002 selects the DI with the highest access frequency among the DIs included in the notified pool (S3). To do so, the pre-startup program 1002 may refer to the access frequency table T500, acquire the access frequency of each real page at the current time (the time at which S3 is executed), and select the real page with the highest access frequency from the DIs (that is, real pages) included in the notified pool. Alternatively, instead of referring to the current access frequency, it may calculate an average access frequency from the information stored in the access frequency table T500 and select the real page (DI) with the highest average access frequency.

After S3, the pre-startup program 1002 determines whether the data stored in the DI selected in S3 is cached (S4). This determination may be made by referring to the cache management table T300. If the data is not cached (S4: No), the data is staged from the drive 121 (S5). If the data is cached (S4: Yes), the pre-startup program 1002 does not execute the process of S5.

Subsequently, the pre-startup program 1002 determines whether the CM 144 has free space (S6). It can do so by checking whether the cache management table T300 (T300-1) has a row in which the LDEV ID (T302) and Addr. (T303) are NULL. When there is no free space in the CM 144 (S6: No), the pre-startup program 1002 ends the process. If the CM 144 has free space (S6: Yes), the pre-startup program 1002 selects the DI with the next highest access frequency (S7) and repeats the processing of S4 to S6. As a result, when the CM 144 has sufficient free space, the data stored in all DIs is staged. When the CM 144 does not have sufficient free space, the data stored in the DIs is staged in descending order of access frequency.
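The staging order of S1 to S7 (all GIs first, then DIs in descending order of access frequency until the cache is full) can be sketched as below. The sketch models the CM 144 as a dictionary with a fixed number of one-page slots and uses one access count per real page. Unlike the flow of FIG. 8, it checks for free space before every staging so that the toy cache never overflows; that guard, and all names, are assumptions.

```python
def pre_startup(pool_pages, access_freq, drive, cache, capacity):
    """Stage GIs first, then DIs from the most to the least frequently accessed.

    pool_pages: list of (real_page, attr) with attr "GI" or "DI".
    """
    def stage(page):
        if page in cache:               # S4: already cached, nothing to do
            return True
        if len(cache) >= capacity:      # S6: no free space left
            return False
        cache[page] = drive[page]       # S2 / S5: staging from the drive
        return True

    for page, attr in pool_pages:       # S1, S2: every uncached GI
        if attr == "GI" and not stage(page):
            return cache
    di_pages = sorted(
        (page for page, attr in pool_pages if attr == "DI"),
        key=lambda page: access_freq.get(page, 0),
        reverse=True,
    )
    for page in di_pages:               # S3, S7: next most frequently accessed DI
        if not stage(page):
            break
    return cache


pool = [("g0", "GI"), ("d0", "DI"), ("d1", "DI"), ("d2", "DI")]
freq = {"d0": 5, "d1": 50, "d2": 1}
drive = {name: name.upper() for name, _ in pool}
cache = pre_startup(pool, freq, drive, cache={}, capacity=3)
```

With three slots, the GI and the two most frequently accessed DIs are staged, and the least accessed DI is left on the drive.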

The above description takes as an example the case where a pool ID is specified for the pre-startup program 1002, but the same processing may be performed when a startup volume (VVOL ID) is specified. Specifically, the pool (pool ID) to which the designated startup volume belongs is identified, and the processes S1 to S7 above are performed on the basis of the identified pool ID. As another embodiment, only the GI and DI mapped to the virtual pages of the designated startup volume may be selected, and the processes S1 to S7 above may be performed only for the selected GI and DI.

Next, processing performed by the purge program 1004 will be described. In the storage apparatus 1 according to the present embodiment, real pages that are not accessed are not cached. If there is a cached real page among the real pages that are not accessed, the cache data is deleted from the CM 144. This process is called “purge”, and the purge program 1004 executes this process.

Like the pre-startup process, the processing of the purge program 1004 is performed for each pool. The purge program 1004 is started when the user issues a purge instruction using the management terminal 13. Alternatively, the storage apparatus 1 may execute the purge program 1004 periodically. Further, when the I/O program 1001 reads data requested by the host 2 from the drive 121 (real page) into the CM 144, or temporarily stores write data received from the host 2 in the CM 144, and there is no unused area in the CM 144, the purge program 1004 may be executed to secure an unused area.

The flow of processing executed by the purge program 1004 will be described with reference to FIG. Hereinafter, the flow of processing for one specific pool will be described. When started, the purge program 1004 refers to the virtual volume management table T200 and the access frequency table T500 to identify real pages with an access frequency of 0 (S101). Here, a “real page with an access frequency of 0” means a real page for which all access frequency values stored in T503-1 to T503-7 of the access frequency table T500 are 0. If there is no real page with an access frequency of 0, the purge program 1004 ends the process.

If there is a real page with an access frequency of 0, it is determined whether the usage amount of the CM 144 is equal to or greater than a predetermined threshold (S103). The usage amount of the CM 144 is obtained by counting the rows of the cache management table T300 (T300-1) in which the LDEV ID (T302) and Addr. (T303) are not NULL. When the usage amount of the CM 144 is equal to or greater than the predetermined threshold (S103: Yes), the purge program 1004 checks, for every real page with an access frequency of 0, whether the data of that real page is cached in the CM 144. If it is, the information about that real page in the cache management table T300 (specifically, the LDEV ID (T302) and Addr. (T303) of the row for that real page) is set to NULL (S104). As a result, the cached data is effectively deleted from the CM 144. This process is called “purge”. If the data stored in the CM 144 has not yet been written to the drive 121 at the time of purging, the data is written to the drive 121.

If the usage amount of the CM 144 is not equal to or greater than the predetermined threshold (S103: No), the purge program 1004 ends the process without doing anything. However, as another embodiment, the determination of S103 may not be performed, and data with an access frequency of 0 may be uniformly deleted (purged) from the CM 144.
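The purge flow S101 to S104 can be sketched as follows. The sketch models the CM 144 as a dictionary, the access frequency table T500 as a dictionary of 7 x 24 grids, and unwritten cache data as a set of "dirty" pages; the usage threshold is counted in cached pages. These modeling choices and all names are assumptions.

```python
def purge(cache, freq_table, threshold_pages, dirty, drive):
    """Remove cached pages whose access counts are all 0; return the purged pages."""
    # S101: real pages whose counts in every time slot of every day are 0.
    zero_pages = [
        page for page, week in freq_table.items()
        if all(count == 0 for day in week for count in day)
    ]
    if not zero_pages:
        return []
    if len(cache) < threshold_pages:     # S103: No, usage is below the threshold
        return []
    purged = []
    for page in zero_pages:              # S104
        if page in cache:
            if page in dirty:            # not yet on the drive: write it first
                drive[page] = cache[page]
                dirty.discard(page)
            del cache[page]              # corresponds to setting T302/T303 to NULL
            purged.append(page)
    return purged


idle = [[0] * 24 for _ in range(7)]
busy = [[0] * 24 for _ in range(7)]
busy[2][9] = 3
freq_table = {"p0": idle, "p1": busy}
cache, drive, dirty = {"p0": b"a", "p1": b"b"}, {}, {"p0"}
purged = purge(cache, freq_table, threshold_pages=2, dirty=dirty, drive=drive)
```

In this usage only the idle page is purged, and because it was still unwritten its data is first written to the drive dictionary.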

In S101, methods other than the one described above can be used to search for real pages with an access frequency of 0. For example, the purge program 1004 may first refer to the access pattern table T550 to determine whether there is an access pattern in which the I/O levels stored in T553-1 to T553-7 are all 0. If there is such an access pattern (assume that its access pattern (T552) is X), the real pages whose access pattern (T408) is X may be extracted from the pool management table T400.

Also, instead of searching for a real page for which the access frequency values stored in T503-1 to T503-7 of the access frequency table T500 are all 0, a search may be made for a real page whose access frequency value at the current time (when the purge program 1004 is started) is 0, or for a real page whose access frequency value is 0 within a predetermined time (for example, within 5 hours) from the current time. Alternatively, instead of searching for a real page with an access frequency value of 0, a search may be made for a real page for which the total of the access frequency values stored in T503-1 to T503-7 is equal to or less than a certain threshold.

Next, processing performed by the deduplication program 1005 will be described. In the storage apparatus 1 according to the present embodiment, each virtual page of the plurality of startup volumes belonging to one pool is mapped to the GI in the initial state. When a virtual page of a startup volume is updated, the mapping of the virtual page is changed to the DI that stores the updated data.

If the same data is stored in the same virtual page (the virtual page with the same page ID) of all the startup volumes, the data is moved to the GI, and the mapping is changed so that each of these virtual pages is mapped to the GI (that is, returned to the initial state). This eliminates the state in which multiple copies of the same data exist in the storage area and, as a result, saves storage area. The deduplication program 1005 performs this process.

The deduplication program 1005 is started when the user issues a deduplication instruction using the management terminal 13. Alternatively, the storage apparatus 1 may periodically execute the deduplication program 1005.

The flow of processing executed by the deduplication program 1005 will be described with reference to FIG. This process is executed for each pool (pool to which the startup volume belongs). Therefore, when there are n pools, the process of FIG. 13 is performed n times. Hereinafter, a case where the deduplication program 1005 is executed for a pool with a pool ID of 0 will be described.

When the deduplication program 1005 is started, it checks the contents of each virtual page of the startup volumes belonging to pool #0. Specifically, when startup volumes with VVOL IDs of 0 to m (m > 0) exist in the pool, the first to last virtual pages of each startup volume are read and compared. Then, it is checked whether there is a virtual page that has the same content across all the startup volumes (S151).

For example, when the contents of page #k (k is an integer of 0 or more) of all the startup volumes are the same (more precisely, when the data stored in the real pages mapped to page #k of all the startup volumes are all the same), the deduplication program 1005 moves the data stored in page #k to the GI (S152). Specifically, the deduplication program 1005 refers to the GI mapping table T250 to identify the row where the pool ID (T251) is 0 and the page ID (T252) is k. The data stored in page #k of the current startup volumes is then stored in the real page (this real page is the GI) specified by the LDEV ID (T253) and Addr. (T254) of that row. However, if page #k of every startup volume is already mapped to the GI, the deduplication program 1005 does not perform the process of S152.

Subsequently, the deduplication program 1005 updates the virtual volume management table T200 and the pool management table T400. A specific example will be described for the case where the contents of page #k (k is an integer of 0 or more) of all the startup volumes are the same. For all the rows where the pool ID (T203) is 0 and the page ID (T202) is k, the deduplication program 1005 changes the values of the LDEV ID (T204) and Addr. (T205) to the LDEV ID and Addr. of the GI to which the data was moved in S152. The attribute T206 is changed to 1 (GI). As a result, page #k of all the startup volumes in pool #0 is mapped to the GI.

On the other hand, the real pages (DI) that were mapped to the virtual page (page #k) of each startup volume until now (until S152 was executed) are no longer necessary (their mapping may be released). For all the rows in the pool management table T400 where the pool ID (T401) is 0 and the Page ID (T407) is k (except for rows where the attribute T404 is 1), the deduplication program 1005 stores an invalid value (NULL) in the VVOL ID (T406) and the Page ID (T407) (S153). As a result, the mapping of the real pages that were mapped to page #k is released. Note that the LDEV ID (T402) and Addr. (T405) of the rows in which the invalid value (NULL) was stored in the VVOL ID (T406) and the Page ID (T407) are used in the next step, S154. Therefore, the deduplication program 1005 retains the LDEV ID (T402) and Addr. (T405) of those rows until S154 ends.
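Steps S151 to S153 can be illustrated with the following sketch. It is a simplified model under stated assumptions, not the deduplication program 1005 itself: `volumes` stands in for the virtual volume management table T200 (one list of real-page keys per startup volume), `store` stands in for the contents of the real pages, and `gi_map` stands in for the GI mapping table T250. All names are hypothetical.

```python
from typing import Dict, List, Tuple

RealPage = Tuple[str, int]     # hypothetical key: (LDEV ID, Addr.)


def find_common_pages(volumes: Dict[int, List[RealPage]],
                      store: Dict[RealPage, bytes]) -> List[int]:
    """S151: page IDs whose content is identical across all startup volumes."""
    n_pages = min(len(pages) for pages in volumes.values())
    common = []
    for k in range(n_pages):
        contents = {store[pages[k]] for pages in volumes.values()}
        if len(contents) == 1:
            common.append(k)
    return common


def deduplicate(volumes: Dict[int, List[RealPage]],
                store: Dict[RealPage, bytes],
                gi_map: Dict[int, RealPage]) -> List[RealPage]:
    """Remap identical pages to the GI; return the real pages (DI) released."""
    released: List[RealPage] = []
    for k in find_common_pages(volumes, store):
        gi = gi_map[k]
        if all(pages[k] == gi for pages in volumes.values()):
            continue                      # every page #k is already the GI
        first_volume = next(iter(volumes.values()))
        store[gi] = store[first_volume[k]]          # S152: move data to the GI
        for pages in volumes.values():
            if pages[k] != gi:
                released.append(pages[k])           # S153: release the DI
            pages[k] = gi                           # remap page #k to the GI
    return released
```

The returned list plays the role of the LDEV ID and Addr. information that the text says must be retained until S154 ends.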

Finally, the deduplication program 1005 updates the access frequency table T500 (S154) and ends the process. First, the deduplication program 1005 identifies all the rows whose LDEV ID (T501) and Addr. (T502) match the real pages whose mapping was released in S153, and sums the access frequency information (T503-1 to T503-7) of those rows for each hourly time slot of each day of the week (0:00 to 1:00, 1:00 to 2:00, ..., 23:00 to 0:00). Hereinafter, the result calculated in this way is referred to as the access frequency information before deduplication.

Subsequently, the deduplication program 1005 identifies the row whose LDEV ID (T501) and Addr. (T502) are equal to the LDEV ID and Addr. of the GI. Then, the access frequency information before deduplication is substituted into the access frequency information (T503-1 to T503-7) of that row. That is, the access frequency information of the GI is made equal to the sum of the access frequency information of each real page before deduplication. As another embodiment, the average of the access frequency information of each real page before deduplication may be set as the access frequency information of the GI.
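The calculation in S154 amounts to an element-wise sum (or, in the alternative embodiment, an element-wise average) over the weekly counters of the released real pages. The sketch below assumes, for illustration only, that each page's access frequency information is held as a 7 x 24 nested list; the function name and the `use_average` flag are hypothetical.

```python
from typing import List, Union

Number = Union[int, float]
Week = List[List[Number]]      # 7 days x 24 hourly slots


def merge_access_frequency(released: List[Week],
                           use_average: bool = False) -> Week:
    """Combine the counters of the released real pages, slot by slot.

    The result is the "access frequency information before deduplication"
    that is written into the GI row of the access frequency table.
    """
    merged: Week = [[sum(week[d][h] for week in released) for h in range(24)]
                    for d in range(7)]
    if use_average and released:
        n = len(released)
        merged = [[value / n for value in day] for day in merged]
    return merged
```

For example, if two released pages have counters of 1 and 3 in the same slot, the GI receives 4 for that slot with the sum and 2.0 with the average.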

Next, processing of the staging program 1007 will be described. The staging program 1007 stages real pages (DI) that belong to an access pattern with a high access frequency. The staging program 1007 is started at 0 minutes past each hour, or several minutes before the hour. Alternatively, the staging program 1007 is started when the user issues a staging instruction using the management terminal 13.

FIG. 15 shows the processing flow of the staging program 1007. The staging program 1007 first refers to the access pattern table T550 and identifies the access pattern whose access frequency is high at the current time (the time when the staging program 1007 is started), that is, whose I/O level is equal to or higher than a predetermined threshold (for example, 3) (S251). For example, assume that the staging program 1007 is started at 21:00 (or a few minutes before) on Monday. When an access pattern having a high access frequency (I/O level) at 21:00 on Monday is searched for in the access pattern table T550 of FIG. 7, access pattern A (T552) has an I/O level of 5 (the highest I/O level). Therefore, the access pattern with the highest access frequency is identified as A.

Subsequently, the staging program 1007 refers to the pool management table T400, extracts the real pages whose access pattern (T408) is the access pattern selected in S251, and stages those real pages (S252). For example, when access pattern A is specified in S251, the staging program 1007 extracts all real pages whose access pattern T408 is A. After S252, if there is an access pattern with the next highest access frequency (S253: Yes), the staging program 1007 selects that access pattern (S254) and performs the process of S252 again. This process is repeated until there are no more access patterns having an I/O level equal to or higher than the predetermined threshold (for example, 3). If there is no such access pattern (S253: No), the process ends. Likewise, when there is no access pattern having an I/O level equal to or higher than the predetermined threshold (for example, 3) at the time S251 is executed, the staging program 1007 ends the process without staging any real page.

In this way, the staging program 1007 preferentially stages pages that tend to be accessed frequently at the current time (the time when the staging program 1007 is started) based on the access pattern table T550. Therefore, when the host 2 issues an access request for the page, the probability that the page to be accessed is staged in the cache increases. As a result, access performance can be improved.
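The selection loop of S251 to S254 can be sketched as follows. This is an illustrative model only: `pattern_levels` is a hypothetical stand-in for the access pattern table T550 (a 7 x 24 grid of I/O levels per access pattern, with Monday assumed to be day 0), and `pool_pages` stands in for the access pattern column (T408) of the pool management table T400. The actual transfer of data into the CM 144 is not modeled; the function only returns the order in which pages would be staged.

```python
from typing import Dict, List, Tuple

PageKey = Tuple[str, int]      # hypothetical key: (LDEV ID, Addr.)
Levels = List[List[int]]       # 7 days x 24 hourly I/O levels


def select_patterns_to_stage(pattern_levels: Dict[str, Levels],
                             day: int, hour: int,
                             threshold: int = 3) -> List[str]:
    """S251/S253/S254: patterns at or above the threshold, highest level first."""
    hot = [(levels[day][hour], name)
           for name, levels in pattern_levels.items()
           if levels[day][hour] >= threshold]
    hot.sort(key=lambda item: -item[0])   # stable sort keeps ties in table order
    return [name for _, name in hot]


def staging_order(pool_pages: Dict[PageKey, str],
                  patterns: List[str]) -> List[PageKey]:
    """S252: real pages to stage, grouped by the selected access patterns."""
    order: List[PageKey] = []
    for pattern in patterns:
        order.extend(page for page, p in pool_pages.items() if p == pattern)
    return order
```

If no pattern reaches the threshold, `select_patterns_to_stage` returns an empty list and nothing is staged, which matches the behavior described for S251.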

Next, the page sort process will be described with reference to FIG. This process is executed by the page sort program 1006. The page sort program 1006 is started when the user issues a page sort instruction using the management terminal 13. Alternatively, the storage apparatus 1 may periodically execute the page sort program 1006. The page sort program 1006 is executed for each pool. Hereinafter, a case where the page sort program 1006 is executed for pool #0 will be described as an example.

The page sort program 1006 first selects one access pattern that has not been subjected to page sort (the processing of S202 to S204) (S201). Subsequently, all real pages having the same access pattern as the access pattern selected in S201 are specified (S202). This may be done by specifying a row in the pool management table T400 where the pool ID (T401) is 0 and the access pattern T408 is the same as the access pattern selected in S201.

After S202, the page sort program 1006 determines whether there are continuous free real pages in pool #0. Specifically, it determines whether there are at least as many continuous free real pages as the number of real pages specified in S202 (S203). If there are, all the data of the real pages specified in S202 is moved to these continuous free real pages (S204). In this process, data is read from each specified real page, and the read data is written to a free real page.

Also, at the time of this movement, it is necessary to change the position of the real page mapped to the virtual page. For this reason, the pool management table T400, the virtual volume management table T200, and the GI mapping table T250 are updated together with the data movement of the real page. As an example, the method for updating each management table when the data stored in the real page with LDEV ID 00:01 and Addr. 00000000 is moved to the real page with LDEV ID 00:02 and Addr. 00000000 will be described using the pool management table T400 (FIG. 5). In the pool management table T400, the row where the LDEV ID (T402) is 00:01 and Addr. (T405) is 00000000 shows that this real page is mapped to the virtual page whose VVOL ID (T406) is 00 and page ID (T407) is 01000. After the data movement of the real page, the mapping must be changed so that the destination real page (the real page with LDEV ID 00:02 and Addr. 00000000) is mapped to this virtual page. For the row where the LDEV ID (T402) is 00:02 and Addr. (T405) is 00000000, the page sort program 1006 sets the value of the VVOL ID (T406) to 00 and the value of the page ID (T407) to 01000. For the row where the LDEV ID (T402) is 00:01 and Addr. (T405) is 00000000, it sets the values of the VVOL ID (T406) and page ID (T407) to NULL. At this time, information other than the VVOL ID (T406) and page ID (T407), for example, the attribute (T404) and the access pattern (T408), is also moved.

The same processing is performed for the virtual volume management table T200 and the GI mapping table T250. As for the cache management table T300, when the data of the real page of the data transfer source is cached in the CM 144, the data of the real page is purged.

In S205, the page sort program 1006 determines whether page sort (the processing of S202 to S204) has been performed for all access patterns. If there is an access pattern that has not yet been subjected to page sort (S205: No), the processing is repeated from S201. If page sort has been performed for all access patterns (S205: Yes), the process ends.
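The page sort loop (S201 to S205) can be sketched as follows. This is a simplified model under stated assumptions: the pool is represented as a list of slots in address order, where `None` is a free real page and a dictionary holds the mapping information (VVOL ID, page ID, access pattern) of an allocated real page. Moving the dictionary from one slot to another therefore models both the data movement and the table update described above. The names `find_free_run` and `page_sort` are hypothetical.

```python
from typing import Dict, List, Optional

Slot = Optional[Dict[str, str]]   # None = free real page


def find_free_run(pool: List[Slot], length: int) -> Optional[int]:
    """S203: index of the first run of `length` consecutive free slots, if any."""
    run_start, run_len = 0, 0
    for i, slot in enumerate(pool):
        if slot is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == length:
                return run_start
        else:
            run_len = 0
    return None


def page_sort(pool: List[Slot]) -> None:
    """Gather the real pages of each access pattern into a contiguous area."""
    patterns = sorted({slot["pattern"] for slot in pool if slot is not None})
    for pattern in patterns:                               # S201 / S205
        sources = [i for i, slot in enumerate(pool)
                   if slot is not None and slot["pattern"] == pattern]  # S202
        start = find_free_run(pool, len(sources))          # S203
        if start is None:
            continue                                       # no room: skip pattern
        for offset, src in enumerate(sources):             # S204
            pool[start + offset] = pool[src]   # destination takes over mapping
            pool[src] = None                   # source row becomes NULL
```

The sketch skips a pattern when no sufficiently long free run exists; the text does not state what the page sort program 1006 does in that case, so this choice is an assumption.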

The page sort program 1006 arranges the data of real pages having the same access pattern (pages having the same access pattern (T408)) in a continuous area in the pool (that is, a continuous area on the LDEV or the drive 121). Real pages with the same access pattern are likely to be accessed in the same time zone. Further, when data is arranged in a continuous area on the LDEV or the drive 121, the access speed when the data is read from the drive 121 improves. For this reason, arranging the data of real pages having the same access pattern in a continuous area on the LDEV or the drive 121 improves access performance.

This completes the description of the storage apparatus according to the embodiment of the present invention. In the storage apparatus according to the embodiment, a plurality of virtual volumes (startup volumes) to which real pages (GI) storing original data are mapped are formed. When the virtual volume is updated from the host, the update data is stored in a real page (DI) for storing the update data.

In such a configuration, when a large number of virtual volumes are formed, many accesses to the real page (GI) storing the original data are generated, and the access performance is likely to deteriorate. Further, the update data is stored in a real page (DI) different from the GI, whereby the real page mapped to the virtual volume is mapped to a physically discrete storage area. The configuration in which the pages of the volume are arranged in physically discrete areas leads to an increase in access latency at the time of reading.

In the storage apparatus according to the embodiment of the present invention, the real page mapped to the virtual volume (startup volume) is made resident in the cache based on the access frequency of the real page. Since real pages with high access frequency are made resident in the cache as much as possible, it is possible to suppress a decrease in access performance.

Although an embodiment of the present invention has been described above, it is an illustration for describing the present invention and is not intended to limit the scope of the present invention to this embodiment alone. That is, the present invention can be implemented in various other forms.

For example, in the storage apparatus described above, access frequency information for each hour of each day from Monday to Sunday (0:00 to 1:00, 1:00 to 2:00, ..., 23:00 to 24:00) is managed for each real page. However, as another embodiment, instead of managing the access frequency information for each hour of each day from Monday to Sunday, the access frequency information for each hour of a single day may be managed. In this case, the access frequency table T500 does not need to manage access frequency information for Monday to Sunday, that is, seven sets of access frequency information (T503-1 to T503-7); only one set of access frequency information needs to be managed. In particular, when it is known in advance that the access frequency does not vary by day of the week, managing the access frequency information only for each hour of the day does not affect the access performance. Furthermore, when it is known that the access frequency varies not weekly but in another cycle (for example, a 10-day cycle or a one-month cycle), the access frequency information may be managed in units of that cycle (10-day units or monthly units).
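One way to picture these variations is a function that maps a timestamp to a (day-in-cycle, hour) slot with a configurable cycle length. The sketch below is an assumption made for illustration and does not appear in the embodiment: `slot_index`, `cycle_days`, and the reference date `epoch` (2014-01-06, a Monday, so that day 0 is Monday for a 7-day cycle) are all hypothetical. It covers fixed-length cycles only; a calendar-month cycle would need separate handling.

```python
from datetime import datetime
from typing import Tuple


def slot_index(ts: datetime,
               cycle_days: int = 7,
               epoch: datetime = datetime(2014, 1, 6)) -> Tuple[int, int]:
    """Return (day within the cycle, hour of day) for the given timestamp.

    cycle_days=7 models the weekly table (T503-1 to T503-7),
    cycle_days=1 models a single per-day table, and
    cycle_days=10 models a 10-day cycle.
    """
    day_in_cycle = (ts.date() - epoch.date()).days % cycle_days
    return day_in_cycle, ts.hour
```

With `cycle_days=1` every timestamp falls on day 0, so only one set of 24 hourly counters is needed, which corresponds to the single-table embodiment described above.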

Further, the storage device 20 for storing the write data of the local file server 30 is not limited to the configuration outside the local file server 30. A configuration in which the storage device 20 is built in the local file server 30 may be adopted.

1: storage device, 2: host, 6: communication network, 7: management network, 11: storage controller, 12: disk unit, 13: management terminal, 111: MPB, 112: CHA, 113: DKA, 114: CMPK, 115: Switch (SW), 121: Drive, 141: MP, 142: LM, 143: SM, 144: CM

Claims

A storage apparatus having a cache memory and a storage device and receiving an access request from a host,
The storage device has a storage area for initial data read by the host and a storage area for update data from the host,
The storage device provides the host with a startup volume composed of a plurality of virtual pages each mapped with the storage area for initial data or the storage area for update data,
The storage device is also configured to store, in the cache memory, data selected based on an access frequency to each storage area from the host from among data stored in the storage area,
When the storage device receives a read request for the startup volume from the host and the read target data is stored in the cache memory, the storage device reads the data from the cache memory and returns it to the host,
A storage apparatus characterized by the above.
When the storage device receives a write request for the virtual page to which the storage area for the initial data is mapped from the host,
The storage device maps, to the virtual page, a storage area that has not yet been mapped to any of the virtual pages among the storage areas for the update data, and stores the write data received together with the write request in the mapped storage area,
The storage apparatus according to claim 1, wherein:
The storage device manages access frequency information from the host to each storage area,
When the storage device maps the storage area that has not yet been mapped to any virtual page to the virtual page, the storage device makes the access frequency information of the mapped storage area the same as the access frequency information of the storage area for the initial data that was mapped to the virtual page,
The storage apparatus according to claim 2, wherein:
The storage device manages the fluctuation tendency of the access frequency from the host to each storage area,
Managing a plurality of storage areas having similar access frequency fluctuation trends as storage areas of the same access pattern;
The storage apparatus according to claim 2, wherein:
The storage apparatus rearranges data stored in the plurality of storage areas of the same access pattern in a continuous area on the storage device,
The storage apparatus according to claim 4, wherein:
The storage device identifies the access pattern having a high access frequency within a predetermined period from the present based on the access frequency fluctuation trend,
Data stored in the storage area of the identified access pattern is stored in the cache;
The storage apparatus according to claim 4, wherein:
The storage device is configured to provide a plurality of the startup volumes to one or more hosts.
When the same data is stored in the same virtual page of the plurality of startup volumes, the storage device,
Moving the stored data to the storage area for the initial data;
Mapping the storage area for the initial data to which the data has been moved to the same virtual page of the plurality of startup volumes;
The storage apparatus according to claim 1, wherein:
In response to an external instruction, the storage device
Storing the data stored in the storage area for the initial data mapped to the startup volume in the cache memory;
Among the storage areas for the update data mapped to the startup volume, the data is stored in the cache memory in order from the data stored in the storage area having a high access frequency.
The storage apparatus according to claim 1, wherein:
The storage device is configured with a plurality of virtual pages in addition to the startup volume, and provides a virtual volume with no storage area mapped to each of the virtual pages in the initial state to the host,
When the storage apparatus receives, from the host, a write request for a virtual page to which no storage area is mapped, the storage apparatus is configured to map, to that virtual page, a storage area that is not mapped to any virtual page among the storage areas of the storage device,
The storage device manages the area in the cache memory separately as an area for storing data stored in the startup volume and an area for storing data stored in the virtual volume,
The storage apparatus according to claim 1, wherein:
A control method of a storage apparatus having a cache memory and a storage device and receiving an access request from a host,
The storage device has a storage area for initial data read by the host and a storage area for update data from the host,
The storage device is configured to provide a startup volume composed of a plurality of virtual pages each mapped with the storage area for initial data or the storage area for update data to the host,
The storage device selects data to be stored in the cache memory from data stored in the storage area based on an access frequency to each storage area from the host, and stores the selected data in the cache memory,
When a read request for the startup volume is received from the host, if the read target data is stored in the cache memory, the data is read from the cache memory and returned to the host.
A method for controlling a storage apparatus.
When receiving a write request for the virtual page to which the storage area for the initial data is mapped from the host,
The storage device maps, to the virtual page, the storage area that is not yet mapped to any of the virtual pages among the storage areas for update data, and stores the write data received together with the write request in the mapped storage area,
The method for controlling a storage apparatus according to claim 10, wherein:
The storage device manages access frequency information from the host to each storage area,
When the storage device maps the storage area that has not yet been mapped to any virtual page to the virtual page, the storage device makes the access frequency information of the mapped storage area the same as the access frequency information of the storage area for the initial data that was mapped to the virtual page,
The storage apparatus control method according to claim 11, wherein:
The storage device manages the fluctuation tendency of the access frequency from the host to each storage area,
Managing a plurality of storage areas having similar access frequency fluctuation trends as storage areas of the same access pattern;
The storage apparatus control method according to claim 11, wherein:
The storage apparatus rearranges data stored in the plurality of storage areas of the same access pattern in a continuous area on the storage device,
The storage apparatus control method according to claim 13, wherein: