WO2017077583A1

WO2017077583A1 - Information system including storage system, and performance deterioration prediction method for storage system

Info

Publication number: WO2017077583A1
Application number: PCT/JP2015/080967
Authority: WO
Inventors: 拓海松浪; 淳平清時
Original assignee: 株式会社日立製作所
Priority date: 2015-11-02
Filing date: 2015-11-02
Publication date: 2017-05-11

Abstract

[Problem] To make it possible to accurately predict the point in time when a storage system will begin to fail to satisfy predetermined specifications, even if the storage system undergoes a sudden change in performance deterioration tendency in the future as a result of the value of a specific performance parameter exceeding a threshold value. [Solution] In the storage system according to the present invention, a prediction unit derives a future value of a performance parameter between a primary storage device and a secondary storage device, taking into account an acquired value of the performance parameter between the primary and secondary storage devices, and a tendency coefficient for determining the deterioration tendency of the performance parameter between the primary and secondary storage devices.

Description

Information system including storage system and performance degradation prediction method in storage system

The present invention relates to an information system including a storage system and to a performance deterioration prediction method in the storage system. It is particularly suitable for application to an information system including a storage system that uses a technique for predicting deterioration of the data transfer delay time between a primary storage device and a secondary storage device.

Conventionally, it has been known that performance degradation gradually proceeds in a computer system due to the influence of resource load and the like. As a technique for analyzing such a tendency of performance degradation in the future, there is a method for supporting analysis of a computer system and asynchronous remote replication (see Patent Document 1).

In this conventional analysis support method, future performance degradation is predicted on the expectation that the performance degradation trend from the past to the present will continue into the future. The purpose is to predict the time at which the computer system will no longer meet the specifications desired by its user.

Patent Document 1: PCT/JP2013/062573

However, for certain performance values, performance degradation may progress abruptly once a certain threshold is exceeded. The degradation trend from the past to the present therefore does not always continue into the future, and in such cases performance degradation is difficult to predict.

The present invention has been made in consideration of the above points. Its object is to propose an information system including a storage system, and a performance degradation prediction method in the storage system, that can more accurately predict the time at which a predetermined specification will no longer be satisfied, even when the tendency of performance deterioration changes suddenly because a specific performance value exceeds a threshold value in the future.

In order to solve this problem, the present invention provides an information system including a storage system that transfers data between a primary storage device and a secondary storage device. The information system comprises: a measurement unit that acquires a predetermined performance value between the primary storage device and the secondary storage device; an analysis unit that analyzes the measured performance value and acquires a performance deterioration tendency of the performance value between the primary storage device and the secondary storage device; an information storage unit that holds a past performance deterioration tendency related to the performance value between the primary storage device and the secondary storage device; and a prediction unit that derives a future performance value between the primary storage device and the secondary storage device, taking into account a performance value of the primary storage device, a performance value of the secondary storage device, the performance value between the primary storage device and the secondary storage device, and a tendency coefficient for obtaining the performance deterioration tendency with respect to the acquired performance value between the primary storage device and the secondary storage device.

The present invention also provides a performance degradation prediction method in a storage system that transfers data between a primary storage device and a secondary storage device. The method comprises: a measurement step in which a measurement unit acquires a predetermined performance value between the primary storage device and the secondary storage device; an analysis step in which an analysis unit analyzes the measured performance value and acquires a performance deterioration tendency of the performance value between the primary storage device and the secondary storage device; an information storage step in which an information storage unit holds a past performance deterioration tendency related to the performance value between the primary storage device and the secondary storage device; and a prediction step in which a prediction unit derives a future performance value between the primary storage device and the secondary storage device, taking into account a performance value of the primary storage device, a performance value of the secondary storage device, the performance value between the primary storage device and the secondary storage device, and a tendency coefficient for obtaining the performance deterioration tendency with respect to the acquired performance value between the primary storage device and the secondary storage device.

According to the present invention, it is possible to more accurately predict the time when the predetermined specification is not satisfied even when the performance deterioration tendency suddenly changes when a specific performance value exceeds a threshold value in the future.

The drawings are as follows.
A block diagram showing the schematic configuration of an information system including the storage system according to this embodiment.
A flowchart showing an example of the future value prediction process for the (data transfer) delay time in the storage system according to this embodiment.
A flowchart showing an example of the specific procedure of the performance value prediction process.
A flowchart showing an example of the specific procedure of the upward correction coefficient addition process.
A flowchart showing an example of the specific procedure of the delay time prediction process.
A diagram showing a first example of the influence (metric contribution) on the delay time.
A diagram showing a second example of the influence (metric contribution) on the delay time.
A diagram showing a third example of the influence (metric contribution) on the delay time.
A diagram showing a fourth example of the influence (metric contribution) on the delay time.
A diagram showing a fifth example of the influence (metric contribution) on the delay time.
A diagram showing an example of how to obtain the upward correction coefficient for each metric.
Twelve diagrams, each showing how the delay time is estimated by the delay time prediction process in this embodiment.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

(1) Configuration of Storage System According to this Embodiment
FIG. 1 shows a schematic configuration of an information system including a storage system 1 according to this embodiment. The storage system 1 is connected to the performance data management server 400 and the storage system management server 500 via the network 2. The storage system 1 includes a plurality of storage devices that provide storage areas for storing data. The storage system 1 includes a primary storage device 200 and a secondary storage device 300 as such storage devices, and stores data from the application host 100. The application host 100 is a computer such as a desktop or notebook computer, a workstation, or a mainframe.

The application host 100 includes a CPU (Central Processing Unit), a memory, a cache, a hard disk drive, an interface, and various interface cards (not shown). The application host 100 implements various functions by the built-in processor executing various programs on the memory. An example of these various programs is a host application. The application host 100 mainly accesses data stored in the primary storage device 200 and executes data processing. The data corresponds to the replication source data.

In this storage system 1, the application host 100 issues a write command to the primary storage device 200 to store data in the primary storage device 200, and the data is also copied to and stored in the secondary storage device 300, which is a separate storage device. When a failure occurs in the primary storage apparatus 200, the storage system can continue processing using the data copied to the secondary storage apparatus 300.

In this embodiment, to distinguish the storage device that stores the data to be replicated (hereinafter referred to as “replication source data”) from the storage device that stores the data obtained by replicating it (hereinafter referred to as “replication data”), the side storing the replication source data is called “primary” and the side storing the replication data is called “secondary”. Accordingly, the storage device that stores the replication source data is referred to as the “primary storage device”, and the storage device that stores the replication data is referred to as the “secondary storage device”. Likewise, a volume in the primary storage device is referred to as a “primary volume” (corresponding to “P-VOL” in the figure), and a volume in the secondary storage device is referred to as a “secondary volume” (corresponding to “S-VOL” in the figure).

The storage apparatuses 200 and 300 are connected by a network or a dedicated line, and communicate using a predetermined communication protocol. The secondary storage apparatus 300 stores the replicated data of the data already stored in the primary storage apparatus 200 by synchronous data replication processing described later. The secondary storage apparatus 300 stores differential data based on data already stored in the secondary storage apparatus 300 by asynchronous data replication processing described later.

The primary storage apparatus 200 has a function of creating a journal related to the update contents when the replication source data is updated, storing it in the primary journal volume, and replicating the data based on the journal data. The secondary storage device 300, much like the primary storage device 200, has a function of creating a journal related to the update contents and storing the journal data in the secondary journal volume. In the asynchronous data replication processing described later, the primary storage device 200 transfers the necessary journal data between the two storage devices and updates the replicated data stored in the secondary storage device 300 with that journal data, thereby bringing the replicated data into agreement with the replication source data.

There are two methods (modes) of data replication between the storage apparatuses 200 and 300: synchronous data replication and asynchronous data replication. During normal operation, both types of data replication are performed between the two storage apparatuses 200 and 300, whereby two sets of replication data are held for one set of replication source data. Synchronous data replication is a process of updating the replication data in synchronization with the update of the replication source data, so that the replication data matches the replication source data. Asynchronous data replication, on the other hand, is a process of updating the replication data based on the journal, asynchronously with the update of the replication source data, so that the replication data comes to match the replication source data some time after the latter is updated.

In the storage system 1 according to the present embodiment, during normal operation, data is replicated between the two storage devices so that the replicated data is kept up to date. When a failure occurs in one of the storage devices, data replication is continued by transferring the journal between the storage devices that have not failed and reflecting the updates contained in the journal data, and processing is resumed.

Here, the storage area of the storage apparatus is logically divided and managed, and each divided storage area is called a logical volume (hereinafter also simply referred to as “volume”). In order to manage the update order of data within and between the logical volumes, a management unit called a group is provided. This group corresponds to the copy groups A, B, C, and so on.

<Storage device configuration>
Next, a hardware configuration example of each storage apparatus will be described. In the present embodiment, the hardware configuration of the primary storage device 200 will be mainly described, but the secondary storage device 300 described above has a substantially similar configuration.

Although detailed illustration is omitted, the primary storage apparatus 200 includes a master disk controller (hereinafter also abbreviated as “M-DKC”), which has a CHA (host adapter), a DKA (disk adapter), a CACHE (cache memory), an SM (shared memory), and a SWITCH (switch), and an HDD (hard disk drive). The host adapter, disk adapter, cache memory, and shared memory are connected via the switch. A plurality of hard disk drives are provided, constituting a so-called RAID (Redundant Array of Inexpensive Disks).

The primary storage device 200 includes a host adapter and a disk adapter as control units for controlling various processes in addition to command reception processing and the like, and executes programs corresponding to the respective processes by the processors 200b and 200e incorporated therein. The host adapter 110 has a communication interface for communicating with the application host 100 and a function for exchanging input/output commands, and controls data transfer between the application host 100 and the cache memory. The host adapter is connected to the application host 100 and the secondary storage device 300. The disk adapter 120 controls reading and writing of data with respect to the hard disk drive, and controls data transfer between the cache memory and the hard disk drive.

The cache memory is a type of memory that temporarily stores mainly data received from the application host 100 or data read from the hard disk drive. The shared memory is a memory that is shared and used by all the host adapters and disk adapters in the primary storage apparatus 200, and mainly stores control information and the like.

The primary storage apparatus 200 includes a processor 200b as shown in the figure. While using a cache 200c by so-called CLPR (Cache Logical Partition), the processor 200b manages a primary volume 200a (corresponding to “P-VOL” in the figure) and copy groups. The primary storage apparatus 200 also includes a processor 200e as shown in the figure. While using a cache 200f by CLPR, the processor 200e manages master journal data 200d (corresponding to “M-JNL” in the figure).

The secondary storage apparatus 300, on the other hand, although not shown in detail, includes a restore disk controller (hereinafter also abbreviated as “R-DKC”), which has a CHA (host adapter), a DKA (disk adapter), a CACHE (cache memory), an SM (shared memory), and a SWITCH (switch), and an HDD (hard disk drive). As described above, the secondary storage apparatus 300 has basically the same configuration as the primary storage apparatus 200, and includes a processor 300b. While using the cache 300c by CLPR (Cache Logical Partition), the processor 300b manages a secondary volume 300a (corresponding to “S-VOL” in the figure) and copy groups. Further, the secondary storage apparatus 300 manages restore journal data 300d (corresponding to “R-JNL” in the figure), which will be described later, while using the cache 300f by CLPR.

<Journal structure>
The journal is data created as information related to a data update when the replication source data (primary volume) held by the primary storage device 200 is updated. The journal includes write data and update information. The write data is a copy of the data written from the application host 100 during update processing of the primary volume, that is, the data written at the update position.

The update information is information for managing the write data and the journal for each update. It includes the time at which the write command was received (the update time, used as a time stamp), a management number, the logical address of the write command, the data size of the write data, and so on. As a data update identifier, the update information includes at least one of the update time and the management number. This identifier specifies the order of the data updates.
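The journal layout described above can be pictured with the following minimal sketch. The class names, field names, and the helper in_update_order are hypothetical names introduced only for illustration; the description does not specify a concrete data format. Ordering by management number is likewise an assumption, since the description only says that the update time, the management number, or both serve as the data update identifier.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class UpdateInfo:
    """Update information kept for each journal entry (illustrative field names)."""
    update_time: datetime    # time stamp taken when the write command was received
    management_number: int   # assumed here to increase with each update
    logical_address: int     # logical address given in the write command
    data_size: int           # size of the write data in bytes


@dataclass(frozen=True)
class JournalEntry:
    """One journal entry: a copy of the write data plus its update information."""
    update_info: UpdateInfo
    write_data: bytes


def in_update_order(entries: List[JournalEntry]) -> List[JournalEntry]:
    """Return the entries sorted by management number (assumed ordering key)."""
    return sorted(entries, key=lambda e: e.update_info.management_number)


if __name__ == "__main__":
    later = JournalEntry(UpdateInfo(datetime(2015, 11, 2, 10, 0, 1), 2, 0x2000, 4), b"BBBB")
    earlier = JournalEntry(UpdateInfo(datetime(2015, 11, 2, 10, 0, 0), 1, 0x1000, 4), b"AAAA")
    for entry in in_update_order([later, earlier]):
        print(entry.update_info.management_number, entry.write_data)
```

Keeping the update information separate from the write data mirrors the split described above, so that the replication side can decide the order in which to apply entries without inspecting the payload.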

(2) Storage System Management Server
Next, the storage system management server 500 will be described. In addition to the storage & copy pair management program 500a, the storage system management server 500 includes a delay time analysis program 500b, which is an example of an analysis unit, and a collection database 500c.

The storage & copy pair management program 500a manages copy pairs between the primary storage apparatus 200 and the secondary storage apparatus 300.

The storage & copy pair management program 500a issues a command necessary for instructing the primary storage apparatus 200 to write or read data.

The storage & copy pair management program 500a is application software that acquires and manages each state of the primary storage apparatus 200 and the secondary storage apparatus 300. The storage & copy pair management program 500a manages each state according to the execution of the synchronous data replication process described above, and detects the pair state between the primary storage apparatus 200 and the secondary storage apparatus 300.

In the present embodiment, the combination of the states of the two storage devices set in synchronization in the synchronous data replication process is called a “copy pair”. In other words, in the present embodiment, the statuses of the primary storage apparatus 200 and the secondary storage apparatus 300 are changed synchronously, and hence are called “copy pairs”.

Examples of the pair state include a duplex state, a suspend state, a Copy state, a Pair state, and a writable suspend state (corresponding to the Susp (R) state described later). The duplex state indicates that the data stored in the volumes of both storage devices match. The suspend state indicates a state in which data can be neither written nor read. The Copy state indicates that data is to be copied, and the Pair state indicates that the same copy data is written to both disks. The writable suspend state is basically a suspend state, but indicates a state in which data can be written.

The delay time analysis program 500b is an example of a prediction unit. As described in detail later, it derives a future performance value between the primary storage device 200 and the secondary storage device 300, taking into account the performance value of the primary storage device 200, the performance value of the secondary storage device 300, the performance value between the primary storage device 200 and the secondary storage device 300, and the tendency coefficient for obtaining the performance deterioration tendency with respect to the performance value between the primary storage device 200 and the secondary storage device 300.

The delay time analysis program 500b, for example, analyzes the measured performance value data and acquires the performance deterioration tendency of the performance value between the primary and secondary storage apparatuses. The performance deterioration prediction method used by the delay time analysis program 500b will be described later with reference to the drawings. In this embodiment, the main example of what is predicted is the delay time of data transfer between the primary storage apparatus 200 and the secondary storage apparatus 300.

(3) Operation Example of Storage System
With the above configuration, during normal operation the storage system 1 operates so that the data stored in the primary volume 200a of the primary storage device 200 matches the data already stored in the secondary volume 300a of the secondary storage device 300 (pair state = duplex state).

In the application host 100, the IO control unit issues an IO command to the primary volume. The primary storage apparatus 200 synchronously transfers the write data stored in the primary volume 200a by the command to the secondary storage apparatus 300, and the secondary storage apparatus 300 copies the transferred write data to the secondary volume 300a.

When the replication source data stored in the primary volume 200a is updated, the primary storage device 200 creates journal data 200d (corresponding to M-JNL in the figure) related to the update contents, stores it in the journal volume, and replicates the data based on the journal data. The secondary storage apparatus 300, much like the primary storage apparatus 200, creates journal data 300d (corresponding to R-JNL in the figure) related to the update contents and stores it in the journal volume. In the asynchronous data replication processing described later, the primary storage apparatus 200 transfers the necessary journal data between these storage apparatuses and updates the replicated data stored in the secondary storage apparatus 300 with that journal data, thereby bringing the replicated data into agreement with the replication source data. Note that the delay time targeted later in this embodiment is, for example, the delay time when journal data is transferred between these storage apparatuses.

(4) Performance Data Management Server
Next, the performance data management server 400 will be described. The performance data management server 400 includes a performance monitoring program 400a and a performance database 400b. The performance monitoring program 400a corresponds to an example of a measurement unit. It sequentially collects performance values of a predetermined type between the primary storage device 200 and the secondary storage device 300, and stores the data (hereinafter also referred to as “storage performance metrics”) in the collection database 500c, which is an example of an information storage unit. In the present embodiment, an example of the predetermined type of performance value between the primary storage apparatus 200 and the secondary storage apparatus 300 is the data transfer delay time between the primary storage apparatus 200 and the secondary storage apparatus 300.

In this performance data management server 400, the performance monitoring program 400a collects performance values (storage performance metrics) having the following names, for example, and stores them in the collection database 500c of the storage system management server 500.

Performance value 1. P-VOL Processor Busy Rate
Performance value 2. P-VOL Cache Write Pending Rate
Performance value 3. P-VOL Cache Logical Partition Memory Usage Rate
Performance value 4. P-VOL Write Transfer Rate
Performance value 5. P-VOL Write I/O
Performance value 6. P-VOL Parity Group Write Transfer Rate
Performance value 7. P-VOL Parity Group Write I/O
Performance value 8. M-JNL Processor Busy Rate
Performance value 9. M-JNL Cache Write Pending Rate
Performance value 10. M-JNL Cache Logical Partition Memory Usage Rate
Performance value 11. M-JNL Write Transfer Rate
Performance value 12. M-JNL Write I/O
Performance value 13. M-JNL Parity Group Write Transfer Rate
Performance value 14. M-JNL Parity Group Write I/O
Performance value 15. M-JNL Usage Rate
Performance value 16. R-JNL Processor Busy Rate
Performance value 17. R-JNL Cache Write Pending Rate
Performance value 18. R-JNL Cache Logical Partition Memory Usage Rate
Performance value 19. R-JNL Write Transfer Rate
Performance value 20. R-JNL Write I/O
Performance value 21. R-JNL Parity Group Write Transfer Rate
Performance value 22. R-JNL Parity Group Write I/O
Performance value 23. R-JNL Usage Rate
Performance value 24. S-VOL Processor Busy Rate
Performance value 25. S-VOL Cache Write Pending Rate
Performance value 26. S-VOL Cache Logical Partition Memory Usage Rate
Performance value 27. S-VOL Write Transfer Rate
Performance value 28. S-VOL Write I/O
Performance value 29. S-VOL Parity Group Write Transfer Rate
Performance value 30. S-VOL Parity Group Write I/O
Performance value 31. Array Port Transfer Rate
Performance value 32. Array Port I/O, etc.

In the present embodiment, performance values that influence the delay time are selected in advance from among the many storage performance metrics (performance values) listed above. The underlined performance values correspond to these. That is, the following pre-selected storage performance metrics can be said to be likely to affect the delay time.

Performance value 25. S-VOL Cache Write Pending Rate
Performance value 24. S-VOL Processor Busy Rate
Performance value 16. R-JNL Processor Busy Rate
Performance value 17. R-JNL Cache Write Pending Rate
Performance value 8. M-JNL Processor Busy Rate
Performance value 31. Array Port Transfer Rate

First, the performance value “Cache Write Pending Rate” represents the proportion of cache writes that are waiting. The higher the value, the more likely it is that storing data in the volume is being delayed. “S-VOL Cache Write Pending Rate” represents the write wait rate of the cache 300c, while “R-JNL Cache Write Pending Rate” represents the write wait rate of the cache 300f. Attention is required when this performance value is between 40% and 80%. Hereinafter, in the present embodiment, such a range of values that require attention is also referred to as an “attention value range”.

A permissible threshold is set in advance for each of the performance values described above. The performance database 400b described above stores, for each performance value, information (hereinafter also referred to as “limit value information”) on this preset permissible threshold (hereinafter also referred to as “limit value”), as well as information on the preset attention value range (hereinafter also referred to as “attention value range information”).

In this embodiment, the degree of influence that each pre-selected performance value (storage performance metric) has on C/T Delta is referred to as the “metric contribution rate”. C/T Delta is an index related to the delay time in data transfer between the storage apparatuses. Among the six selected performance values described above, the S-VOL Cache Write Pending Rate (performance value 25), for example, has the largest influence on C/T Delta (a large metric contribution rate), and the Array Port Transfer Rate (performance value 31) has the smallest influence (a small metric contribution rate).

Next, the performance value “Processor Busy Rate” represents the usage rate of the processor. The higher the value, the more likely it is that processing such as I/O for the volume is being delayed. The performance value “S-VOL Processor Busy Rate” represents the usage rate of the processor 300b, while the performance value “R-JNL Processor Busy Rate” represents the usage rate of the processor 300e. For example, attention is required when this performance value is between 30% and 60% (this corresponds to the attention value range).
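The attention value ranges given in this description (40% to 80% for Cache Write Pending Rate and 30% to 60% for Processor Busy Rate) can be checked as in the following minimal sketch. The dictionary name, the function name, and the choice to treat both end points as inside the range are assumptions made for illustration. In the embodiment the ranges are held in the performance database 400b rather than hard-coded.

```python
# Attention value ranges in percent, taken from the values stated in the description.
# Treating both end points as inside the range is an assumption.
ATTENTION_VALUE_RANGES = {
    "CWPR": (40.0, 80.0),  # Cache Write Pending Rate
    "PBR": (30.0, 60.0),   # Processor Busy Rate
}


def in_attention_value_range(metric: str, value_percent: float) -> bool:
    """Return True if the metric value lies inside its attention value range."""
    lower, upper = ATTENTION_VALUE_RANGES[metric]
    return lower <= value_percent <= upper


if __name__ == "__main__":
    print(in_attention_value_range("CWPR", 45.0))  # inside 40% to 80%
    print(in_attention_value_range("PBR", 20.0))   # below 30%
```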


Next, the performance value 31 “Array Port Transfer Rate” represents the data transfer amount of the remote path between the primary storage device 200 and the secondary storage device 300. A value close to the line bandwidth set by the user indicates that the line is likely to be congested.

(5) Performance Deterioration Prediction Method in Storage System
FIGS. 2 to 5 show an example of a procedure for predicting the future value of the delay time. In the following description, “CWPR” means “Cache Write Pending Rate”, “PBR” means “Processor Busy Rate”, and “APTR” means “Array Port Transfer Rate”.

First, in step S1, the period of measured values to be used for the analysis, the allowable delay time, and the communication bandwidth are input and set in advance. Next, in step S2, the delay time analysis program 500b linearly approximates the delay time up to time t and calculates the slope a of the resulting straight line. Next, in step S3, the delay time analysis program 500b determines whether the slope a is 1 or less. If the slope a is 1 or less, the program outputs to the outside that there is no tendency toward performance degradation (step S4). If the slope a is not 1 or less, the program executes the performance value prediction process (step S5).
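Steps S2 and S3 can be sketched as follows. This is only an illustration under stated assumptions: the description does not say how the linear approximation is computed, so an ordinary least-squares fit is assumed here, and the function names fit_slope and has_degradation_tendency are hypothetical. The threshold of 1 on the slope is the value stated in step S3.

```python
from typing import Sequence


def fit_slope(times: Sequence[float], delays: Sequence[float]) -> float:
    """Slope a of the least-squares line through the (time, delay) points (step S2)."""
    if len(times) != len(delays) or len(times) < 2:
        raise ValueError("need at least two (time, delay) samples of equal length")
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(delays) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("all time stamps are identical")
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, delays))
    return s_td / s_tt


def has_degradation_tendency(times: Sequence[float], delays: Sequence[float],
                             slope_threshold: float = 1.0) -> bool:
    """Step S3: a slope of 1 or less is treated as no degradation tendency."""
    return fit_slope(times, delays) > slope_threshold


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0]
    print(fit_slope(t, [0.0, 2.0, 4.0, 6.0]))                 # steadily growing delay
    print(has_degradation_tendency(t, [5.0, 5.0, 5.0, 5.0]))  # flat delay
```

The slope depends on the units chosen for time and delay, so the fixed threshold is meaningful only for the units used by the embodiment, which this excerpt does not state.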

In this performance value prediction process, as shown in FIG. 3, for example, performance values from time t0 to time t are input to a statistical model such as the ARMA model (step S51). Here, time t0 represents a time after the start time of the analysis period designated in advance and at which the oldest performance value exists. Next, in step S52, the delay time analysis program 500b stores the predicted value of the performance value at time t + 1 in the collection database 500c.

Next, in step S6, the delay time analysis program 500b determines whether or not the CWPR of the secondary volume 300a (S-VOL) at time t is 40% or more. If the CWPR of the secondary volume 300a is 40% or more, the upward correction coefficient adding process described later is executed (step S7).

In this upward correction coefficient adding process, first, in step S71 shown in FIG. 4, the delay time analysis program 500b determines whether or not the variable C needs to be initialized. Initialization of the variable C is necessary when the metric being considered is the first of the metrics at time t for which the upward correction coefficient c must be taken into account.

When it is determined in step S71 that the variable C needs to be initialized, the delay time analysis program 500b initializes the variable C (step S72). After that, or when it is determined in step S71 that initialization of the variable C is not necessary, step S73 is executed.

In step S73, the delay time analysis program 500b obtains the upward correction coefficient c, an example of the tendency coefficient, by applying the function f1 for calculating the upward correction coefficient c (hereinafter also referred to as “correction coefficient conversion”) to the value of the storage performance metric (hereinafter also referred to as “metric”) within its attention value range. The function f1, which will be described in detail later, sets a lower and an upper threshold for each metric and maps the values between them to a coefficient.

Next, in step S74, the delay time analysis program 500b takes the metric contribution into account by applying the function f2 to the obtained upward correction coefficient c, thereby obtaining the upward correction coefficient c′. The function f2, which will be described in detail later, multiplies the coefficient by the contribution of each metric. For example, in the case of the S-VOL CWPR, the upward correction coefficient is obtained as c′ = f2 (c) = c × 0.25.

Next, in step S75, the delay time analysis program 500b calculates the variable C = C + c′, which remains in effect until it is reset when the time is advanced to t = t + 1.

On the other hand, when the delay time analysis program 500b determines in step S6 shown in FIG. 2 that the CWPR of the S-VOL 300a is not 40% or more, the following step S8 is executed.

Next, in step S8, the delay time analysis program 500b determines whether or not the PBR of the S-VOL 300a at time t is 30% or more. If the PBR of the S-VOL 300a is 30% or more, the above-described upward correction coefficient adding process is executed (step S7).

On the other hand, when the delay time analysis program 500b determines in step S8 that the PBR of the S-VOL 300a is not 30% or more, the following step S9 is executed.

Next, in step S9, the delay time analysis program 500b determines whether or not the CWPR of the R-JNL 300d at time t is 40% or more. If the CWPR is 40% or more, the above-described upward correction coefficient adding process is executed (step S7).

On the other hand, when the delay time analysis program 500b determines in step S9 that the CWPR of the R-JNL 300d is not 40% or more, the following step S10 is executed.

Next, in step S10, the delay time analysis program 500b determines whether or not the PBR of the R-JNL 300d at time t is 30% or more. If the PBR of the R-JNL 300d is 30% or more, the above-described upward correction coefficient adding process is executed (step S7). On the other hand, when the delay time analysis program 500b determines in step S10 that the PBR of the R-JNL 300d is not 30% or more, the following step S11 is executed.

Next, in step S11, the delay time analysis program 500b determines whether or not the PBR of the M-JNL 200d at time t is 40% or more. If the PBR is 40% or more, the above-described upward correction coefficient adding process is executed (step S7). On the other hand, when the delay time analysis program 500b determines in step S11 that the PBR is not 40% or more, the following step S12 is executed.

Next, in step S12, the delay time analysis program 500b determines whether or not the APTR at time t is 90% or more. If the APTR is 90% or more, the above-described upward correction coefficient adding process is executed (step S7). On the other hand, when the delay time analysis program 500b determines in step S12 that the APTR is not 90% or more, the following step S13 is executed.
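The chain of checks in steps S6 and S8 to S12 can be sketched as a table of thresholds. The thresholds are the ones quoted in those steps; the table layout, the function name, and the assumption that the flow continues to the next check after each execution of step S7 are illustrative only.

```python
# Thresholds in percent, as quoted in steps S6 and S8 to S12.
ATTENTION_THRESHOLDS = [
    ("S-VOL CWPR", 40.0),  # step S6
    ("S-VOL PBR", 30.0),   # step S8
    ("R-JNL CWPR", 40.0),  # step S9
    ("R-JNL PBR", 30.0),   # step S10
    ("M-JNL PBR", 40.0),   # step S11
    ("APTR", 90.0),        # step S12
]


def metrics_needing_correction(values):
    """Return, in check order, the metrics that would trigger step S7.

    values: hypothetical mapping of metric name to its value at time t (%).
    Metrics missing from the mapping are treated as 0%.
    """
    return [name for name, limit in ATTENTION_THRESHOLDS
            if values.get(name, 0.0) >= limit]
```

For example, an S-VOL CWPR of 45% and an S-VOL PBR of 55% would both trigger the adding process, while an APTR of 50% would not.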

In step S13, the delay time analysis program 500b executes a delay time prediction process. In this delay time prediction process, the upward correction coefficient is taken into account, and the delay time at time t + 1 is predicted from the delay time at time t.

Specifically, in this delay time prediction process, as shown in FIG. 5, the delay time analysis program 500b inputs a delay time from time t0 to time t to a statistical model such as the ARMA model (step S131). Next, in step S132, the delay time analysis program 500b stores the delay time at time t + 1 predicted by the statistical model in the variable d.

Next, in step S133, the delay time analysis program 500b determines whether or not the predicted value d is equal to or less than the delay time at time t. If so, the delay time analysis program 500b sets the predicted value of the delay time at time t + 1 to d (step S134). Otherwise, the program applies to the predicted value d the function f, which reflects the upward correction coefficient in the predicted delay time, to obtain the predicted value d′ (step S135).

The function f multiplies the difference between the predicted delay time d at time t + 1 and the delay time at time t by the accumulated upward correction coefficient C and adds the result to d. For example, the predicted value d′ is calculated as d′ = f (d) = d + (d − delay time at time t) × C. Next, in step S136, the delay time analysis program 500b sets the predicted value of the delay time at time t + 1 to d′.
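Steps S133 to S136 can be sketched as follows, using the formula and the application condition given in section (8-1). The function and parameter names are hypothetical.

```python
def corrected_delay(d, delay_t, c_total):
    """Apply the accumulated upward correction coefficient C to prediction d.

    d       : delay time at time t + 1 predicted by the statistical model
    delay_t : delay time at time t
    c_total : accumulated coefficient C (for example 0.133 for 13.3%)
    """
    if d <= delay_t:
        # No upward movement is predicted, so d is used unchanged (step S134).
        return d
    # Amplify the predicted increase by C (steps S135 and S136).
    return d + (d - delay_t) * c_total
```

For instance, with a delay time of 20 seconds at time t, a raw prediction of 30 seconds, and C = 13.3%, the corrected prediction is about 31.3 seconds.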

Next, in step S14 shown in FIG. 2, the delay time analysis program 500b determines whether or not the delay time at time t + 1 is smaller than a preset limit value. If it is determined that the delay time at time t + 1 is smaller than the preset limit value, the delay time analysis program 500b sets time t = t + 1 (step S18) and then returns to step S5 described above. On the other hand, if it is not determined that the delay time at time t + 1 is smaller than the preset limit value, the delay time analysis program 500b executes step S15.

In this step S15, the delay time analysis program 500b outputs a future prediction value of the delay time and the performance value.
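The overall loop of steps S5 to S15 can be sketched as below. This is only an illustration: in place of a statistical model such as the ARMA model, it extends the last observed slope, and the accumulated coefficient C for each step is supplied by a hypothetical callable `factor_at` rather than derived from predicted performance values. The `max_steps` guard is also an assumption.

```python
def predict_until_limit(delays, limit, factor_at, max_steps=365):
    """Predict delay times step by step until one reaches the limit.

    delays    : observed delay times, oldest first (at least two values)
    limit     : allowable delay time
    factor_at : hypothetical callable, step index -> accumulated coefficient C
    Returns the list of predicted delay times, ending with the first value
    that is no longer smaller than the limit (if reached within max_steps).
    """
    series = list(delays)
    for step in range(max_steps):
        # Stand-in for the statistical model: current value plus last slope.
        d = series[-1] + (series[-1] - series[-2])
        # Upward correction, applied only when an increase is predicted.
        if d > series[-1]:
            d += (d - series[-1]) * factor_at(step)
        series.append(d)
        if d >= limit:  # step S14 fails, so the result is output (step S15)
            break
    return series[len(delays):]
```

With observed delays of 10 and 20 seconds, a limit of 40 seconds, and a constant C of 50%, the sketch yields 35 seconds and then 57.5 seconds, at which point the loop stops.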

(6) Effect on Delay Time (Metric Contribution) The performance values that affect the delay time, among the selected performance values described above, are described below.

FIGS. 6 to 10 each show an example of the metric contribution as an influence on the delay time. In the example shown in FIG. 6, the metric contribution of the CWPR of the S-VOL is set to 25%, while the metric contribution of the PBR of the S-VOL is set to 20%.

The R-DKC (restore disk controller) issues a Pull Request as an example of a command for requesting transfer of the next journal from the R-DKC to the M-DKC. This Pull Request by R-DKC is not issued until the data reflection to the secondary volume 300a (hereinafter also referred to as “S-VOL”) is completed (however, the multiplicity is ensured). Therefore, if the data reflection to the S-VOL 300a is delayed, the issue of the next Pull Request is delayed, so the values of CWPR and PBR of the S-VOL 300a become important. Furthermore, the amount of data cached in the S-VOL CLPR has a large effect on C / T Delta. Accordingly, the influence of the PBR that processes the CLPR data of the S-VOL 300a is the second largest.

In the example shown in FIG. 7, the CWPR metric contribution of the S-VOL 300a is set to 25%, the PBR metric contribution of the S-VOL 300a is set to 20%, the CWPR metric contribution of the R-JNL 300b is set to 18%, and the PBR metric contribution of the R-JNL 300b is set to 15%.

When CWPR of R-JNL300b is large, generation of update data from journal data received from M-DKC (master disk controller) is delayed. Therefore, by default (cache use option = ON), journal data (in the journal volume) is not used until CWPR reaches 50%. As a result, although the upgrade delay from the journal data to the update data leads to the deterioration of C / T Delta, the impact level is small compared to the S-VOL upgrade, and the impact level of PBR is also small.

In the example shown in FIG. 8, the CWPR metric contribution of the S-VOL 300a is set to 25%, the PBR metric contribution of the S-VOL 300a is set to 20%, the CWPR metric contribution of the R-JNL 300b is set to 18%, the PBR metric contribution of the R-JNL 300b is set to 15%, and the PBR metric contribution of the M-JNL 200d is set to 12%.

If the PBR of the M-JNL 200d increases, the generation of journal data is delayed. From the viewpoint of P-VOL writes, this may appear to affect C / T Delta, but the time until journal data is created is not included in the calculation logic of C / T Delta. Accordingly, what is affected is the sending of journal data from the master disk controller (M-DKC) to the restore disk controller (R-DKC) and the time at which a Pull Request from the restore disk controller (R-DKC) is received, and the degree of influence on C / T Delta is small compared with the S-VOL 300a and the R-JNL 300b.

In the example shown in FIG. 9, the CWPR metric contribution of the S-VOL 300a is set to 25%, the PBR metric contribution of the S-VOL 300a is set to 20%, the CWPR metric contribution of the R-JNL 300b is set to 18%, the PBR metric contribution of the R-JNL 300b is set to 15%, the PBR metric contribution of the M-JNL 200d is set to 12%, and the APTR metric contribution is set to 10%.

The remote path between storage system enclosures is difficult to predict because the performance and characteristics vary greatly depending on the environment. The degree of influence of APTR on C / T Delta should be minimal (however, the degree of influence is considerable in a tight line situation).

In the example shown in FIG. 10, the metric contribution of every performance value has been determined: as described above, the CWPR metric contribution of the S-VOL 300a is set to 25%, the PBR metric contribution of the S-VOL 300a is set to 20%, the CWPR metric contribution of the R-JNL 300b is set to 18%, the PBR metric contribution of the R-JNL 300b is set to 15%, the PBR metric contribution of the M-JNL 200d is set to 12%, and the metric contribution of APTR is set to 10%.

Note that each numerical value of the metric contribution is set for convenience as an example for use in a conversion formula that reflects the contribution in the delay time, and does not necessarily have to be this numerical value. What matters in the present embodiment is the priority order: CWPR of S-VOL 300a, PBR of S-VOL 300a, CWPR of R-JNL 300b, PBR of R-JNL 300b, PBR of M-JNL 200d, and finally APTR.

(7) How to Obtain Upward Correction Coefficient c for Each Metric FIGS. 11A and 11B show an example of how to obtain the upward correction coefficient c for each metric. As already described, it is empirically known that each metric has a value range that requires attention (the “attention value range” in this description).

In FIGS. 11A and 11B, “Processor Busy Rate” is illustrated as an example of the type of performance value. As shown in FIG. 11A, the attention value range is, for example, 30% to 60%, which corresponds to time t1 to time t2. When this range is mapped (that is, normalized), the resulting coefficient has a slope of 0 from time t0 to time t1 and a nonzero slope from time t1 to time t2, as shown in FIG. 11B.

On the other hand, in this embodiment, for example, when “Cache Write Pending Rate” is given as the type of performance value, the attention value range is, for example, 40% to 80% associated with time t1 to time t2.
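The mapping performed by the function f1 can be sketched as a clamped linear normalization over the attention value range. Linear interpolation is an assumption, chosen because it reproduces the worked values in this description (a PBR of 50% gives 66.6%, and a CWPR of 45% gives 12.5%); the table and function layout are hypothetical.

```python
# Attention value ranges in percent, as given for the two metric types.
ATTENTION_RANGES = {
    "PBR": (30.0, 60.0),   # Processor Busy Rate
    "CWPR": (40.0, 80.0),  # Cache Write Pending Rate
}


def f1(kind, value):
    """Map a metric value (%) to an upward correction coefficient in 0..1."""
    lower, upper = ATTENTION_RANGES[kind]
    ratio = (value - lower) / (upper - lower)
    # Below the range the coefficient is 0; above it the coefficient is 1.
    return min(1.0, max(0.0, ratio))
```

Clamping values above the upper threshold to 1 is likewise an assumption of this sketch.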

(8) Application of upward correction coefficient (8-1) Application of upward correction coefficient to delay time When the delay time at time t + 1 is expressed as DelayTime (t + 1), this delay time is calculated using the following formula.
DelayTime (t + 1) = OriginalDelayTime (t + 1) + (OriginalDelayTime (t + 1) − DelayTime (t)) × CorrectionFactor (t + 1)

Note that OriginalDelayTime represents the future value of the delay time predicted by a statistical method. Examples of such statistical methods include applying an ARMA model and adding a slope. However, the correction is applied only when OriginalDelayTime (t + 1) > DelayTime (t).

(8-2) How to obtain the upward correction coefficient (8-2-1) When a metric is subject to the upward correction coefficient c In this case, for each metric subject to correction, the product of the upward correction coefficient c and the metric contribution is added to CorrectionFactor (t + 1), as shown in the following example.
(Example) When S-VOL Cache Write Pending Rate = 45% and M-JNL Processor Busy Rate = 35%, the result is as follows.
CorrectionFactor (t + 1) = 12.5% × 25% + 16.7% × 12% = 5.13%

(8-2-2) When no metric is subject to the upward correction coefficient c In this case, CorrectionFactor (t + 1) = 0.
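The calculation of CorrectionFactor in (8-2) can be sketched as a weighted sum. The metric contributions are the example values given in section (6); the dictionary layout and function name are hypothetical, and the coefficients passed in are assumed to have been obtained already by the correction coefficient conversion.

```python
# Example metric contributions from section (6), expressed as fractions.
METRIC_CONTRIBUTION = {
    "S-VOL CWPR": 0.25,
    "S-VOL PBR": 0.20,
    "R-JNL CWPR": 0.18,
    "R-JNL PBR": 0.15,
    "M-JNL PBR": 0.12,
    "APTR": 0.10,
}


def correction_factor(coefficients):
    """Sum c x metric contribution over the metrics subject to correction.

    coefficients: mapping of metric name to its upward correction
    coefficient c (0..1); an empty mapping yields 0, as in (8-2-2).
    """
    return sum(c * METRIC_CONTRIBUTION[name]
               for name, c in coefficients.items())
```

Passing 0.125 for the S-VOL CWPR and 0.167 for the M-JNL PBR gives approximately 0.0513, matching the 5.13% of the example above.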

(9) Example of Delay Time Prediction The configuration of the entire system including the storage system 1 according to this embodiment is as described above. Next, an example of a performance deterioration prediction method in the storage system 1 will be described.

FIGS. 12 to 23 each show how the delay time related to data transfer between the primary storage apparatus 200 and the secondary storage apparatus 300 is predicted. In FIGS. 12 to 23, the horizontal axis represents time (date), the vertical axis represents the delay time (corresponding to “Delay Time” in the figures), and the axis on the right side represents the ratio [%] of each performance value. In the present embodiment, the delay time analysis program 500b ultimately predicts the delay time as affected by each measured value in order to determine whether or not the delay time will remain within the allowable range in the future.

(9-1) First time (forecast as of January 5, 2015)
(9-1-1) Preconditions In this embodiment, the following three preconditions are assumed. First, one copy pair constitutes the asynchronous remote replication. Second, the delay time and the performance values at one-day intervals over the past four days have already been accumulated in the performance database 400b. Third, the statistical method used adds the slope of the past values to the current value.

(9-1-2) Step I (first time)
First, an analysis target period is determined, and it is determined that all data are to be processed. The allowable delay time is set to 40 seconds, for example, and the communication bandwidth between the primary storage device 200 and the secondary storage device 300 is set to 10 MB / second.

In this case, as APTR data on the performance database 400b, data in units of MB / second is held. The delay time analysis program 500b converts it into a ratio [%] to the communication bandwidth set at the time of analysis. The data from January 1, 2015 to January 4, 2015 is as shown in FIG.
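The conversion of the APTR from MB / second into a ratio of the communication bandwidth can be sketched as follows. The function and parameter names are hypothetical.

```python
def aptr_percent(transfer_mb_per_s, bandwidth_mb_per_s):
    """Convert a transfer rate in MB/s to a percentage of the bandwidth."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return 100.0 * transfer_mb_per_s / bandwidth_mb_per_s
```

With the 10 MB / second bandwidth set in Step I, a measured transfer rate of 9 MB / second corresponds to an APTR of 90%, which is the threshold checked in step S12.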

(9-1-3) Step II (first time)
The delay time analysis program 500b predicts the value of January 5, 2015 as shown by the broken line in the figure based on the values up to January 4, 2015 for each performance value shown in FIG. At this time, the delay time (Delay Time) is not predicted as illustrated.

(9-1-4) Step III (first time)
The delay time analysis program 500b determines, for each performance value, whether or not the upward correction coefficient c needs to be taken into account. In this example, the upward correction coefficient c needs to be taken into account for the Processor Busy Rate of the S-VOL 300a but not for the other performance values. As shown in FIG. 14, the delay time analysis program 500b predicts that the Processor Busy Rate of the S-VOL 300a as of January 5, 2015 will be 50% and judges that this value falls within the above-mentioned attention value range of 30% to 60%.

(9-1-5) Step IV (first time)
As shown in FIG. 15, the delay time analysis program 500b calculates the upward correction coefficient c as follows.
CorrectionFactor (2015/1/5) = 66.6% × 20% = 13.3%

Here, “66.6%” is the upward correction coefficient corresponding to a PBR of 50% for the S-VOL 300a, and “20%” is the metric contribution of the PBR of the S-VOL 300a. The delay time analysis program 500b applies the upward correction coefficient c to the predicted delay time.

(9-1-6) Step V (first time)
As a result of the above determination, the delay time (“Delay Time” in the figure) has not reached an unacceptable value, as shown in FIG. 16, so the delay time analysis program 500b predicts the future value as of January 6, 2015 (returning to Step II above).

(9-2) Second time (forecast as of January 6, 2015)
(9-2-1) Preconditions The preconditions in the second round are the same as the preconditions in the first round.

(9-2-2) Step II (second time)
For each performance value, the delay time analysis program 500b predicts the value of January 6, 2015 based on the values up to January 5, 2015, as indicated by the broken line in FIG. That is, the delay time analysis program 500b predicts future values including predicted values.

(9-2-3) Step III (second time)
The delay time analysis program 500b determines, for each performance value, whether or not the upward correction coefficient c needs to be taken into account. In this example, the upward correction coefficient c needs to be taken into account for the Cache Write Pending Rate of the S-VOL 300a and the Processor Busy Rate of the S-VOL 300a, but not for the other performance values.

Specifically, the delay time analysis program 500b predicts that the PBR of the S-VOL 300a will be 55% as of January 6, 2015, as shown in the figure, which is within the attention value range of 30% to 60%. Likewise, the delay time analysis program 500b predicts that the CWPR of the S-VOL 300a will be 45% as of January 6, 2015, which is within the attention value range of 40% to 80% already described.

(9-2-4) Step IV (second time)
Therefore, the delay time analysis program 500b calculates the upward correction coefficient c using the following calculation formula.
CorrectionFactor (2015/1/6) = 83.3% × 20% + 12.5% × 25% = 19.8%
The delay time analysis program 500b predicts the delay time at time t + 1 using this calculation result. Further, the delay time analysis program 500b applies the upward correction coefficient c for the predicted delay time.

(9-2-5) Step V (second time)
Since the delay time analysis program 500b determines that the delay time (corresponding to “Delay Time” in the drawing) has not reached an unacceptable value, the future value of January 7, 2015 is predicted as shown in FIG. (Return to Step II above).

(9-3) Third time (forecast as of January 7, 2015)
(9-3-1) Preconditions The preconditions for the third time are the same as the first and second preconditions.

(9-3-2) Step II (third time)
In the third round, using almost the same method as in the second round, the delay time analysis program 500b predicts, for each performance value, the value for January 7, 2015 based on the values up to January 6, 2015, as indicated by the broken line in FIG. 21.

(9-3-3) Step IV (third time)
Based on this, the delay time analysis program 500b calculates the upward correction coefficient c.
CorrectionFactor (2015/1/7) = 100% × 20% + 25% × 25% = 26.25%

The delay time analysis program 500b uses the calculated upward correction coefficient c to predict a delay time at time t + 1 (in the example shown, January 7, 2015) as shown by a broken line in FIG.

(9-3-4) Step V (third time)
Based on this, the delay time analysis program 500b ends the processing because the delay time (corresponding to “Delay Time” in the figure) has reached an unacceptable value (45 seconds in the example in the figure).

(10) Effects of this Embodiment As described above, according to this embodiment, in the storage system management server 500, the delay time analysis program 500b derives future performance values between the primary storage device 200 and the secondary storage device 300 in consideration of the performance value of the primary storage device 200, the performance value of the secondary storage device 300, the performance value between the primary storage device 200 and the secondary storage device 300, and the tendency coefficient for obtaining the performance deterioration tendency with respect to the performance value between the primary storage device 200 and the secondary storage device 300.

In this way, even if the tendency of performance deterioration in the storage system 1 changes suddenly when a specific performance value exceeds a threshold value in the future, it is possible to predict more accurately when the predetermined specification value will no longer be satisfied.

Further, in this embodiment, when a performance value selected in advance is within the predetermined attention value range, the delay time analysis program 500b regards the performance value as requiring attention and, taking the upward correction coefficient c into account, calculates a predicted value of how a specific performance value (in this embodiment, the above-described delay time of data transfer) will change in the future.

In this way, even if the performance deterioration tendency of the specific performance value changes suddenly in the future, the future predicted value can be obtained more accurately by correcting it with the upward correction coefficient c, and the time at which the performance will deteriorate can be provided accurately.

In the present embodiment, the performance monitoring program 400a of the performance data management server 400 acquires a plurality of types of predetermined performance values between the primary storage apparatus 200 and the secondary storage apparatus 300 and stores them in the collection database 500c. On the other hand, the delay time analysis program 500b of the storage system management server 500 refers to these stored performance values and predicts a future deterioration tendency of the specific performance value based on a plurality of types of performance values selected in advance, from among the plurality of types of performance values, as being likely to affect the specific performance value.

In this way, the delay time analysis program 500b can predict the data transfer delay time, an example of the specific performance value, more accurately than before based on the plurality of performance values selected in advance.

In the present embodiment, the delay time analysis program 500b sets the metric contribution as an index representing the ease of influence on the specific performance value for each of the plurality of performance values selected in advance. In calculating the future predicted value of the specific performance value, the future predicted value is obtained by multiplying the plurality of types of performance values by the corresponding metric contributions.

In this way, the calculated future predicted value is calculated after taking into account the characteristics for each of the plural types of performance values selected in advance, so that it can be obtained more accurately.

In the present embodiment, the performance monitoring program 400a is provided in the performance data management server 400 prepared separately from the storage system 1, while the delay time analysis program 500b is provided in the storage system management server 500 prepared separately from the storage system 1.

In this way, the performance monitoring program 400a and the delay time analysis program 500b can calculate the data transfer delay time as the specific performance value at the necessary timing without being affected by processing delays in the storage system 1.

Further, in the present embodiment, the performance data management server 400 includes, separately from the performance monitoring program 400a, a performance database 400b that stores information on the plurality of types of performance values selected in advance from among the above-described plurality of types of performance values.

In this way, the performance values that tend to affect the specific performance value described above are managed in advance, so that the future delay time of data transfer, which is the specific performance value in this embodiment, can be predicted accurately.

In this embodiment, the performance monitoring program 400a and the delay time analysis program 500b are provided in the performance data management server 400 and the storage system management server 500 that are prepared separately from the storage system 1, respectively. However, the present invention is not limited to this, and the same function may be mounted inside the storage system 1.

(11) Other Embodiments The above embodiment is an example for explaining the present invention, and is not intended to limit the present invention only to these embodiments. The present invention can be implemented in various forms without departing from the spirit of the present invention. For example, in the above-described embodiment, the processing of the various programs is described sequentially, but the present invention is not limited to this. Therefore, as long as there is no contradiction in the processing result, the processing order may be changed or the operations may be performed in parallel.

DESCRIPTION OF SYMBOLS 1 ... Storage system, 2 ... Network, 100 ... Application host, 200 ... Primary storage apparatus, 300 ... Secondary storage apparatus, 400 ... Performance data management server, 500 ... Storage system management server, 500b ... Delay time analysis program, 500c ... Collection database.

Claims

An information system including a storage system that transfers data between a primary storage device and a secondary storage device, the information system comprising:
a measurement unit that acquires a predetermined performance value between the primary storage device and the secondary storage device;
an analysis unit that analyzes the measured performance value and acquires a performance deterioration tendency of the performance value between the primary storage device and the secondary storage device;
an information storage unit that holds a past performance deterioration tendency related to the performance value between the primary storage device and the secondary storage device; and
a prediction unit that derives a future performance value between the primary storage device and the secondary storage device in consideration of the performance value of the primary storage device, the performance value of the secondary storage device, the acquired performance value between the primary storage device and the secondary storage device, and a tendency coefficient for obtaining a performance deterioration tendency with respect to the performance value between the primary storage device and the secondary storage device.
The information system including the storage system according to claim 1, wherein, when a performance value selected in advance falls within a predetermined attention value range, the prediction unit regards the performance value as requiring attention and calculates, in consideration of the tendency coefficient, a predicted value of how a specific performance value will change in the future.
The information system including the storage system according to claim 2, wherein the measurement unit acquires a plurality of types of predetermined performance values between the primary storage device and the secondary storage device and stores them in the information storage unit, and
the prediction unit refers to the stored performance values and predicts a future deterioration tendency of the specific performance value based on a plurality of types of performance values selected in advance, from among the plurality of types of performance values, as being likely to affect the specific performance value.
The information system including the storage system according to claim 1, wherein the prediction unit
sets, for each of the plurality of performance values selected in advance, a metric contribution as an index representing how readily that performance value affects the specific performance value, and, in calculating a future predicted value of the specific performance value, obtains the future predicted value by multiplying the plurality of types of performance values by the corresponding metric contributions.
The information system including the storage system according to claim 3, wherein the measurement unit is provided in a performance data management server prepared separately from the primary storage device and the secondary storage device, and
the analysis unit and the prediction unit are provided in a storage system management server prepared separately from the primary storage device and the secondary storage device.
The information system including the storage system according to claim 5, wherein, in addition to the measurement unit, the performance data management server includes the information storage unit, which stores information on the plurality of types of performance values selected in advance from among the plurality of types of performance values.
A performance deterioration prediction method in a storage system that performs data transfer between a primary storage device and a secondary storage device, the method comprising:
a measurement step in which a measurement unit acquires a predetermined performance value between the primary storage device and the secondary storage device;
an analysis step in which an analysis unit analyzes the measured performance value and acquires a performance deterioration tendency of the performance value between the primary storage device and the secondary storage device;
an information storage step in which an information storage unit holds a past performance deterioration tendency related to the performance value between the primary storage device and the secondary storage device; and
a prediction step in which a prediction unit derives a future performance value between the primary storage device and the secondary storage device in consideration of the performance value of the primary storage device, the performance value of the secondary storage device, the acquired performance value between the primary storage device and the secondary storage device, and a tendency coefficient for obtaining a performance deterioration tendency with respect to the performance value between the primary storage device and the secondary storage device.
The performance deterioration prediction method in a storage system according to claim 7, wherein, in the prediction step,
when a performance value selected in advance falls within a predetermined attention value range, the prediction unit regards the performance value as requiring attention and calculates, in consideration of the tendency coefficient, a predicted value of how a specific performance value will change in the future.
The performance deterioration prediction method in a storage system according to claim 8, wherein, in the measurement step,
the measurement unit acquires a plurality of types of predetermined performance values between the primary storage device and the secondary storage device and stores them in the information storage unit, and
in the prediction step,
the prediction unit refers to the stored performance values and predicts a future deterioration tendency of the specific performance value based on a plurality of types of performance values selected in advance, from among the plurality of types of performance values, as being likely to affect the specific performance value.
The performance deterioration prediction method in a storage system according to claim 7, wherein, in the prediction step,
the prediction unit sets, for each of the plurality of performance values selected in advance, a metric contribution as an index representing how readily that performance value affects the specific performance value, and, in calculating the future predicted value, obtains the future predicted value by multiplying the plurality of types of performance values by the corresponding metric contributions.
The performance deterioration prediction method in a storage system according to claim 9, wherein the measurement unit operates in a performance data management server prepared separately from the primary storage device and the secondary storage device, and
the analysis unit and the prediction unit operate in a storage system management server prepared separately from the primary storage device and the secondary storage device.
The performance deterioration prediction method in a storage system according to claim 11, wherein, in addition to the measurement unit, the performance data management server includes the information storage unit, which stores information on the plurality of types of performance values selected in advance from among the plurality of types of performance values.