WO2018042637A1

WO2018042637A1 - Learning data processing apparatus and method

Info

Publication number: WO2018042637A1
Application number: PCT/JP2016/075883
Authority: WO
Inventors: 中島　淳; 峰義増田; 裕教江丸
Original assignee: 株式会社日立製作所
Priority date: 2016-09-02
Filing date: 2016-09-02
Publication date: 2018-03-08
Also published as: JPWO2018042637A1; JP6775022B2

Abstract

This learning data processing apparatus has: an information collection unit that acquires learning data from components of a target system during operation of the target system; a prediction formula generation unit that generates, on the basis of the learning data, a prediction formula that represents, by using the relationship between purpose information and explanation information, the relationship among the components of the target system; a setting change content determination unit that determines a setting content for changing the configuration of the target system; and a configuration changing unit that changes the configuration of the target system, wherein the setting change content determination unit extracts a state where sufficient learning data has not been obtained as a data shortage state within the range of states of the components allowed to be taken when the configuration of the target system is changed, and determines a setting content for changing the configuration of the target system so as to set the components in the data shortage state, the configuration changing unit changes the configuration of the target system according to the determined setting content, and the information collection unit acquires the learning data in the case of the data shortage state from the components of the target system.

Description

Learning data processing apparatus and method

The present invention relates to a method for processing data acquired in IT system operation management.

With the spread of virtualization mechanisms and the emergence of new forms of system provision such as the cloud, IT system operation management is becoming more complex. In addition, as the amount of data handled by IT systems grows explosively, the scale of IT systems increases year by year, and the number of objects handled by the management software that manages an IT system (for example, the number of volumes provided by a storage apparatus) is also increasing. There is a need to manage complex IT systems that handle large amounts of data while keeping management costs low.

One conceivable way to reduce IT system management costs is to automate management. One of the technologies that can be used to automate IT system management is machine learning. By collecting various information on each object in the IT system and learning from it as learning data, it becomes possible to identify the function that best fits the learning data with respect to the relationship between any element in the IT system and other elements.

For example, Non-Patent Document 1 describes a technique for finding, by learning, a function that makes it possible to predict the response performance of processing executed in an IT system from parameters related to settings, such as the degree of parallelism, and parameters related to the processing target, such as the size of the data to be processed. By using this function, for example, the response time of a process can be estimated from its degree of parallelism and the data size, so that the execution schedule of the process can be determined. In addition, since the degree of parallelism necessary to obtain the required response performance can be estimated, the amount of resources required to obtain that response performance can also be estimated.

Also, cloud services that provide IT systems via the Internet and charge users according to their use are widespread. Cloud service forms include IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service).

Cloud services are not well suited to storing highly sensitive data or to applications that require real-time performance. Management operations, however, can be separated from the primary functions of the IT system, and they are also suited to cloud services because they can be paid for according to usage.

Against this background, there is a movement to provide management software that has so far been operated on-premises as SaaS, and to undertake part of the operation work as a service. Patent Document 1 discloses a method in which a provider monitors on-premises storage via a network and performs maintenance work such as configuration changes and disk replacement when an event occurs.

Patent Document 1: JP 2006-107080 A

The technologies disclosed in Patent Document 1 and Non-Patent Document 1 are assumed to be used in an environment with relatively little change, and do not assume that the system configuration is changed frequently. On the other hand, with the widespread use of virtualization mechanisms and the emergence of new forms of system provision such as the cloud, the configuration of IT systems can be changed relatively easily, and the frequency of system configuration changes is expected to increase.

A large amount of learning data is required to improve the accuracy of machine learning. In the management of the IT system, it is necessary to acquire various history information such as performance information and capacity information over a long period from each object of the IT system. However, when a configuration change occurs in the IT system, it becomes necessary to perform long-term learning again after the configuration change occurs. Then, it is conceivable that for a while after the configuration change, the accuracy of machine learning does not increase and efficient management work cannot be performed.

An object of the present invention is to provide a technique for improving the accuracy of machine learning for a system whose configuration is changed.

According to one aspect of the present invention, a learning data processing apparatus includes: an information collection unit that acquires learning data from components of a target system while the target system is operating; a prediction formula generation unit that generates, based on the learning data, a prediction formula that expresses the relationship among the components of the target system as a relationship between objective information and explanation information; a setting change content determination unit that determines setting contents for changing the configuration of the target system; and a configuration changing unit that changes the configuration of the target system. The setting change content determination unit extracts, as a data shortage state, a state for which learning data has not been sufficiently acquired within the range of states that the components can take when the configuration of the target system is changed, and determines setting contents that change the configuration of the target system so that the components enter the data shortage state. The configuration changing unit changes the configuration of the target system according to the determined setting contents, and the information collection unit acquires, from the components of the target system, the learning data for the data shortage state.

Learning data that would be insufficient when the configuration of the target system is changed in the future can be acquired in advance by temporarily changing the configuration of the target system. This reduces the shortage of learning data that occurs when the configuration is actually changed, so the accuracy of machine learning improves sooner. When machine learning is applied to a target system whose configuration is changed relatively frequently, it is possible to suppress the decrease in machine learning accuracy that occurs when the configuration is changed.

FIG. 1 is a diagram for explaining the outline of the computer system according to the first embodiment.
FIG. 2 is a configuration diagram of an example of the computer system according to the first embodiment.
FIG. 3 is a diagram illustrating an example of the related information table according to the first embodiment.
FIG. 4 is a diagram illustrating an example of the performance history information table 1120 according to the first embodiment.
FIG. 5A is a diagram illustrating an example of the configuration information table 1130 according to the first embodiment.
FIG. 5B is a diagram illustrating an example of the configuration information table 1130 according to the first embodiment.
FIG. 5C is a diagram illustrating an example of the configuration information table 1130 according to the first embodiment.
FIG. 6 is a diagram illustrating an example of the prediction formula source information table 1140 according to the first embodiment.
FIG. 7 is a diagram illustrating an example of the prediction formula table 1150 according to the first embodiment.
FIG. 8 is a flowchart of the process for generating a prediction formula according to the first embodiment.
FIG. 9 is a flowchart of the process for determining setting change contents for acquiring learning data according to the first embodiment.
FIG. 10 is a flowchart of the process for performing the setting change for learning data acquisition.
FIG. 11 is a diagram illustrating an example of the business property management table 1800 according to the second embodiment.
FIG. 12 is a flowchart of the process for executing learning data sharing according to the second embodiment.

Several embodiments will be described with reference to the drawings. The embodiments described below do not limit the invention according to the claims, and not all of the elements and combinations described in the embodiments are necessarily essential to the solution of the invention. In the drawings, the same reference numerals denote the same components throughout. In the following description, the information of the present invention will be described using expressions such as "aaa table", but the information may be expressed in a form other than a data structure such as a table. Therefore, an "aaa table" or the like may be referred to as "aaa information" to indicate that it does not depend on the data structure. Furthermore, in describing the contents of each piece of information, the expressions "identification information", "identifier", "name", and "ID" are used, but these are interchangeable.

In the following description, a "program" may be used as the subject of a sentence. However, since a program is executed by a processor and performs predetermined processing using a memory and a communication port (communication device, management I/F, data I/F), the description may equally be made with the processor as the subject. Processing disclosed with a program as the subject may be processing performed by a computer such as a management server (management computer) or an information processing apparatus. Further, part or all of a program may be realized by dedicated hardware. Various programs may be installed in each computer from a program distribution server or a computer-readable storage medium.

Hereinafter, a set of one or more computers that manage the computer system and display the display information of the present invention may be referred to as a management system. When the management server displays the display information, the management server is the management system. A combination of a management server and a display computer is also a management system. In addition, in order to increase the speed and reliability of management processing, processing equivalent to that of the management server may be realized with a plurality of computers. In this case, the plurality of computers (including the display computer, if a display computer performs the display) constitute the management system.

The computer system according to this embodiment will be described.

FIG. 1 is a diagram for explaining the outline of the computer system according to the first embodiment. The operation described here is mainly executed by the setting change content determination program 1180.

(1) First, the setting change content determination program 1180 refers to the prediction formula table 1150 and the configuration information table 1130, and extracts, from the parameters of the prediction formula, the parameters that correspond to configuration information indicating the configuration of the computer system.
(2) Next, the setting change content determination program 1180 refers to the prediction formula source information table 1140 and identifies, within the range of values that the extracted parameters can take given the configuration, the range for which learning data is insufficient.
(3) Next, the setting change content determination program 1180 determines parameter values that place the extracted parameters within the identified range, and changes the parameter settings accordingly.
(4) Operating the computer system with the changed settings makes it possible to acquire the missing learning data.
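The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the setting change content determination program 1180: the function names `find_shortage_values` and `choose_setting`, the discretization of the allowed range into candidate values, the `min_samples` threshold, and the sample cache sizes are all hypothetical and do not come from the text.

```python
from collections import Counter


def find_shortage_values(observed, allowed_values, min_samples=2):
    """Return the allowed parameter values with fewer than min_samples observations.

    observed       -- parameter values present in the learning data (hypothetical input)
    allowed_values -- values the parameter may take after a configuration change
    min_samples    -- assumed threshold below which data counts as insufficient
    """
    counts = Counter(observed)
    return [v for v in allowed_values if counts[v] < min_samples]


def choose_setting(current, shortage_values):
    """Pick the shortage value closest to the current setting, or None if none remain."""
    if not shortage_values:
        return None
    return min(shortage_values, key=lambda v: abs(v - current))


# (1) A prediction formula parameter that corresponds to configuration information,
#     here a cache size; the numbers are made up for illustration.
observed_cache_sizes = [100, 100, 100, 200]
allowed_cache_sizes = [100, 200, 300]

# (2) Values within the allowed range for which learning data is insufficient.
shortage = find_shortage_values(observed_cache_sizes, allowed_cache_sizes)

# (3) The setting to apply next; step (4) would then operate the system and collect data.
next_setting = choose_setting(current=100, shortage_values=shortage)
print(shortage, next_setting)
```

Choosing the shortage value nearest to the current setting is one possible policy; the text does not specify how a value within the identified range is selected.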

FIG. 2 is a configuration diagram of an example of a computer system according to the first embodiment. The computer system according to this embodiment includes one or more management servers 1000, one or more storage apparatuses 2000, and one or more servers 3000. The server 3000 and the storage apparatus 2000 are connected to each other via a SAN (Storage Area Network) 4000. A specific example of a SAN is Fibre Channel. The management server 1000, the storage apparatus 2000, and the server 3000 are connected to each other via the management network 5000.

The management server 1000 includes a memory 1100, a communication device 1200, a processor 1300, an output device 1400, an input device 1500, and a storage device 1600. These are connected to each other via an internal bus 1700 in the management server 1000.

The memory 1100 stores a related information table 1110, a performance history information table 1120, a configuration information table 1130, a prediction formula source information table 1140, a prediction formula table 1150, an information collection program 1160, a prediction formula generation program 1170, a setting change content determination program 1180, and a configuration change program 1190.

The communication device 1200 is a device for connecting the management server 1000 to the management network 5000. The management server 1000 can communicate with a program running on the server 3000 through the management network 5000. The processor 1300 executes various programs developed on the memory 1100. The output device 1400 is a device that outputs a processing result executed by the management server 1000, and is, for example, a display. The input device 1500 is a device for an administrator to input an instruction to the management server 1000, and is, for example, a keyboard. The storage device 1600 is an HDD (Hard Disk Drive), SSD (Solid State Drive) or the like for storing information.

In the example shown in FIG. 2, the various programs and tables are stored in the memory 1100, but may be stored in the storage device 1600 or another storage medium (not shown). In this case, the processor 1300 reads the target program on the memory 1100 when executing the program, and executes the read program.

Further, the above-described program and table may be stored in the memory 2100 of the storage apparatus 2000, and the storage apparatus 2000 or the physical server 3000 may execute the stored program. Further, another device such as another server 3000 or a switch (not shown) may store and execute the above-described program and table.

The storage apparatus 2000 includes a memory 2100, a logical volume providing unit 2200, a disk I/F controller 2300, a management I/F 2400, a processor 2500, and a data I/F 2600. These are connected via a communication path 2700 such as an internal bus in the storage apparatus 2000. The memory 2100 has a disk cache 2110 and stores a configuration performance information collection program 2120 and a configuration change program 2130. The disk cache 2110 is a storage area for temporarily storing information. The configuration performance information collection program 2120 is a program for exchanging management information, performance information, and the like of the storage apparatus 2000 with the management server 1000. The configuration change program 2130 is a program that is called from the configuration change program 1190 of the management server 1000 and changes the configuration of the storage apparatus 2000.

The logical volume providing unit 2200 includes a disk pool 2220 including a physical area 2230, logically divides the storage area of the disk pool 2220, and provides the logically divided storage area as a volume 2210. Here, the physical area 2230 is a parity group composed of a physical disk or a plurality of physical disks. A physical area can be accessed from a device outside the storage device 2000 via the volume 2210.

The physical area 2230 is assigned a physical area number, the disk pool 2220 is assigned a disk pool number, and the volume 2210 is assigned a volume number. As a result, the storage apparatus 2000 can uniquely identify the physical area 2230, the disk pool 2220, and the volume 2210.

In the example shown in FIG. 2, the disk pool 2220 (POOL1) composed of one physical area (parity group PG1) is logically divided, and one volume 2210 (Vol1) is provided to an apparatus outside the storage apparatus 2000, for example the server 3000.

The disk I/F controller 2300 is an interface device for connecting to the logical volume providing unit 2200. The management I/F 2400 is an interface device for connecting to the management network 5000. The processor 2500 executes programs loaded into the memory 2100.

The data I/F 2600 is an interface device for connecting to the SAN 4000. In the example shown in FIG. 2, the configuration performance information collection program 2120 and the configuration change program 2130 are stored in the memory 2100, but they may be stored in another storage device (not shown) or another storage medium (not shown). In this case, the processor 2500 loads the configuration performance information collection program 2120 and the configuration change program 2130 into the memory 2100 when executing processing, and executes the loaded programs.

Further, the logical volume providing unit 2200 may create the entire storage area of one disk pool 2220 as one volume 2210. Further, the logical volume providing unit 2200 may use, as the physical area 2230, a storage area other than a parity group, such as a physical disk itself or flash memory.

The server 3000 is a physical server including a memory 3100, a data I/F 3200, a processor 3300, and a management I/F 3400. These are connected to each other via a communication path 3500 such as an internal bus of the server 3000.

The memory 3100 stores a configuration information collection program 3110, a business program 3120, and a configuration change program 3130. The configuration information collection program 3110 is a program for exchanging management information, performance information, and the like of the server 3000 with the management server 1000. The business program 3120 is a program for realizing the business executed by the server 3000, and is, for example, a DBMS (Data Base Management System) or a file system. The configuration change program 3130 is a program that is called from the configuration change program 1190 of the management server 1000 and changes the configuration of the server 3000.

The server 3000 executes various tasks using the volume 2210 provided from the storage apparatus 2000. In the example shown in FIG. 2, various programs are stored in the memory 3100, but they may be stored in another storage device (not shown). In this case, the processor 3300 loads the target program into the memory 3100 when executing processing, and executes the loaded program. In the example shown in FIG. 2, the server A and the storage apparatus A are connected to each other via the SAN 4000. The connection between the storage apparatus 2000 and the physical server 3000 is not limited to a direct Fibre Channel connection; they may be connected via one or more network devices such as a Fibre Channel switch. The connection between the storage apparatus 2000 and the physical server 3000 may be any data communication network, for example an IP (Internet Protocol) network.

FIG. 3 is a diagram illustrating an example of a related information table according to the first embodiment. The related information table stores related information indicating a management target object whose performance is the objective information and the management target objects logically related to it. An object is a component of the computer system. Note that the components include both physically existing components and logically defined components. As an example, the related information table 1110 manages information indicating the physical and virtual apparatuses and devices that exist on the I/O (input/output) path from the business program 3120 operating on the server 3000 to the physical area constituting the volume used by the server 3000, that is, information indicating the logical relationships between apparatuses and devices based on the I/O path. Here, logical relationships such as "a volume and the pool constituting the volume", "a volume and the processor in charge of I/O processing for the volume", and "a volume and the cache that temporarily stores I/O to the volume" are stored based on the settings.

The related information table 1110 includes fields of a device ID 1111, a volume ID 1112, a processor ID 1113, a cache ID 1114, a pool ID 1115, and a physical area ID 1116.

The device ID 1111 stores an identifier for uniquely identifying the storage apparatus 2000. The volume ID 1112 stores an identifier for uniquely identifying the volume 2210. The processor ID 1113 stores an identifier of the processor 2500 in charge of processing for the volume indicated by the volume ID 1112. The cache ID 1114 stores an identifier that uniquely indicates the disk cache 2110 in which I/O for the volume indicated by the volume ID 1112 is cached. The pool ID 1115 stores an identifier for uniquely identifying the disk pool 2220 in which the volume 2210 is created. The physical area ID 1116 stores an identifier for uniquely identifying a physical area 2230 constituting a disk pool, such as a parity group or a disk. Information collected from the computer system is stored in each of these fields. The method for collecting and storing the information is not particularly limited.
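As a rough illustration of the fields just described, one row of the related information table 1110 could be modeled as below. This is a sketch under assumptions, not the table's actual storage format: the class name `RelatedInfoRow`, the helper `related_objects`, and the identifier "Processor1" are hypothetical, while "Storage A", "Vol1", "Cache1", "POOL1", and "PG1" are names that appear elsewhere in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedInfoRow:
    """One row of the related information table 1110 (field numbers from the text)."""
    device_id: str         # 1111: identifies the storage apparatus
    volume_id: str         # 1112: identifies the volume
    processor_id: str      # 1113: processor in charge of the volume
    cache_id: str          # 1114: disk cache used for the volume
    pool_id: str           # 1115: disk pool in which the volume is created
    physical_area_id: str  # 1116: physical area constituting the pool


def related_objects(rows, device_id, volume_id):
    """Return the objects on the I/O path of one volume, or None if the volume is unknown."""
    for row in rows:
        if row.device_id == device_id and row.volume_id == volume_id:
            return {
                "processor": row.processor_id,
                "cache": row.cache_id,
                "pool": row.pool_id,
                "physical_area": row.physical_area_id,
            }
    return None


# "Processor1" is a made-up identifier used only for this example.
rows = [RelatedInfoRow("Storage A", "Vol1", "Processor1", "Cache1", "POOL1", "PG1")]
print(related_objects(rows, "Storage A", "Vol1"))
```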

Here, the related information table 1110 according to the present embodiment includes information about the processor 2500, the disk cache 2110, the disk pool 2220, and the physical area 2230 as the management target objects related to the volume identified by the device ID 1111 and the volume ID 1112. However, the present invention is not limited to this. Any managed object in the IT system can be handled in the same way.

As another example, identifiers for other managed objects, whether physical or virtual, may also be stored. Examples include a drive identifier that uniquely identifies, within the server, a mount point of the server 3000 (a managed object used for business access), and a server data I/F identifier that uniquely identifies the data I/F 3200 that the server 3000 uses when accessing the volume 2210 indicated by the volume ID 1136.

Also, information such as a switch data I/F may be included, and information on a business program (such as a DBMS) on the server 3000, which is a business server, may be stored in association with it. Further, information on units of processing executed by the business program may be stored in association with it. For example, processing A in the business program may be stored in association with the server used to execute the processing, the CPU and memory of that server, and the like.

FIG. 4 is a diagram illustrating an example of the performance history information table 1120 according to the first embodiment. The performance history information table 1120 stores the performance history acquired from each managed object during operation of the computer system. The performance history information table 1120 manages performance information of managed objects, for example, performance information related to the volume 2210, the disk pool 2220, and the like in the storage apparatus 2000. Entries can be added to the performance history information table 1120.

The performance history information table 1120 includes fields of time 1121, device ID 1122, device ID 1123, metric 1124, and performance value 1125.

The time 1121 stores the date and time at which the information was collected from the managed object. The device ID 1122 stores an identifier that uniquely identifies the apparatus that contains the target device. The device ID 1123 stores an identifier that uniquely identifies the device, within that apparatus, for which performance information is to be acquired.

The metric 1124 stores information indicating the type of performance information, such as the CPU usage rate, the number of I/Os per unit time (for example, 1 second) to the storage device (IOPS), or the response time to a request. The performance value 1125 stores the value, for the type of performance information indicated by the metric 1124, of the device indicated by the device ID 1123; this value is acquired from the apparatus that contains the device.
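To make the field layout concrete, the following sketch models entries of the performance history information table 1120 and extracts the time series for one device and metric. It is an illustration only: the names `PerformanceRecord` and `series`, the field name `apparatus_id` (used here for the device ID 1122 to distinguish it from the device ID 1123), and all sample values are assumptions rather than anything stated in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PerformanceRecord:
    """One entry of the performance history information table 1120."""
    time: datetime     # 1121: when the value was collected
    apparatus_id: str  # 1122: apparatus that contains the device
    device_id: str     # 1123: device whose performance was measured
    metric: str        # 1124: type of performance information
    value: float       # 1125: measured value


def series(records, apparatus_id, device_id, metric):
    """Return (time, value) pairs for one device and metric, ordered by time."""
    picked = [
        (r.time, r.value)
        for r in records
        if (r.apparatus_id, r.device_id, r.metric) == (apparatus_id, device_id, metric)
    ]
    return sorted(picked)


# Sample entries; times, metric names, and values are made up for illustration.
records = [
    PerformanceRecord(datetime(2016, 9, 2, 10, 2), "Storage A", "Vol1", "Response time", 6.0),
    PerformanceRecord(datetime(2016, 9, 2, 10, 1), "Storage A", "Vol1", "Response time", 5.0),
    PerformanceRecord(datetime(2016, 9, 2, 10, 1), "Storage A", "Processor1", "Busy rate", 40.0),
]
for when, value in series(records, "Storage A", "Vol1", "Response time"):
    print(when.isoformat(), value)
```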

Here, in the performance history information table 1120 shown in FIG. 4, the storage volume 2210, the processor 2500, and the disk cache 2110 are listed as the performance information acquisition target devices indicated by the device ID 1122 and the device ID 1123, but the targets are not limited to these. A target may also be a VM (not shown), a storage data I/F 2600, a server data I/F 3200, a switch or a switch port (not shown), and the like.

Further, FIG. 4 shows the response performance for a request, the CPU usage rate, the cache usage rate, the IOPS, and the like as examples of metrics, but the metrics are not limited to these. Other metrics may be used, such as the I/O busy rate, transfer rate, throughput, database management software buffer hit rate, number of records inserted, updated, or deleted, Web server response time, file system or disk free space or utilization rate, input/output data volume, network interface error count, buffer overflows, and frame errors.

FIGS. 5A, 5B, and 5C are diagrams illustrating an example of the configuration information table 1130 according to the first embodiment. FIGS. 5A and 5B show the state before the operation is executed by the configuration change program 1190 in step 301 of FIG. 10 described later. FIG. 5C shows the state after the operation is executed by the configuration change program 1190 in step 301 of FIG. 10.

The configuration information table 1130 stores configuration information of managed objects. For example, the cache size of the disk cache 2110 that is configuration information about the storage apparatus 2000 that is the management target object is stored. In addition, the disk configuration of the physical area (parity group) 2230 is stored. An entry is added to the configuration information table 1130 by a general means.

The configuration information table 1130 includes fields of a device ID 1131, a device ID 1132, a metric 1133, and a value 1134. The device ID 1131 stores an identifier for uniquely identifying the apparatus that contains the target device. The device ID 1132 stores an identifier for uniquely identifying the device, within that apparatus, from which configuration information is to be acquired. The metric 1133 stores information indicating the type of configuration information, such as storage capacity or processing capability. The value 1134 stores the value, for the type of configuration information indicated by the metric 1133, of the device indicated by the device ID 1132. This value is obtained from the apparatus that contains the device.

Here, the devices indicated by the device ID 1131 and the device ID 1132 in the configuration information table 1130 shown in FIGS. 5A to 5C are the targets from which configuration information is acquired. The disk cache 2110 (Cache1) and the physical areas 2230 (PG1, PG5) of the storage apparatus 2000 are listed as such devices. However, the present invention is not limited to these, and configuration information of other managed objects may be held. Further, the cache size, the RAID level of the parity group, and the disk type are listed as examples of metrics, but the metrics are not limited to these.

FIG. 6 is a diagram illustrating an example of the prediction formula source information table 1140 according to the first embodiment. The prediction formula source information table 1140 is a table for managing information used as a basis for generating a prediction formula. The prediction formula source information table 1140 manages a management target object to be predicted and its parameters, and other management target objects and parameters related to the management target object to be predicted on the I / O path. The management target object to be predicted and its parameters are the target information of the prediction formula, and the related management target object and its parameters are the description information.

The prediction formula source information table 1140 includes fields of time information 1141, purpose information 11411, and related information 11412. The time information 1141 stores the date and time at which the information was collected from the managed object. The purpose information 11411 stores identification information of the management target object to be predicted and parameter values of that management target object. The related information 11412 stores parameter values of other managed objects that are related, on the I/O path, to the managed object to be predicted. In this embodiment, device ID 1142, volume ID 1143, and volume response performance 1144 are stored as the purpose information 11411. The related information 11412 includes fields of Processor Busy 1145, Cache Usage 1146, Cache Size 1147, Pool Busy 1148, and PG number 1149.

The device ID 1142 stores an identifier (device ID) that uniquely identifies the device. The volume ID 1143 stores an identifier for uniquely specifying the management target object. The volume response performance 1144 stores time information from the reception of the I / O request in the volume to the completion of processing. Here, the response performance of the volume is given as an example of the purpose information 11411, and the Processor Busy 1145, Cache Usage 1146, etc. are given as examples of the related information. However, the present invention is not limited to this.

Among the values and information stored in the table shown in FIG. 6, the entries whose time information 1141 is 10:01 and 10:02 indicate the state before the operation is executed by the configuration change program 1190 in step 301 of FIG. 10, and the entries whose time information 1141 is 15:10 and 15:11 indicate the state after that operation is executed.

FIG. 7 is a diagram illustrating an example of the prediction formula table 1150 according to the first embodiment. The prediction formula table 1150 is a table for managing information representing a prediction formula. The prediction formula table 1150 stores the metrics used in the prediction formula, the coefficient for each metric, and the like. Specifically, the prediction formula can be expressed in the form objective information = description information 1 + description information 2 + description information 3 + description information 4. More specifically, the table holds information on a function obtained by learning, namely: response performance of volume 1 of storage A = coefficient 1 × Processor Busy + coefficient 2 × Cache Size + coefficient 3 × Pool Busy + coefficient 4 × PG number.
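A linear prediction formula of the kind described above can be evaluated as in the following sketch. The metric names follow the text, but the function name `predict`, the coefficient values, and the metric values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def predict(coefficients, explanation_values):
    """Evaluate a linear prediction formula: the sum of coefficient * metric value.

    coefficients       -- mapping from metric name to its learned coefficient
    explanation_values -- mapping from metric name to its current value
    """
    missing = set(coefficients) - set(explanation_values)
    if missing:
        raise KeyError(f"missing explanation metrics: {sorted(missing)}")
    return sum(coefficients[m] * explanation_values[m] for m in coefficients)


# Made-up coefficients and values; real ones would come from learning and measurement.
coefficients = {
    "Processor Busy": 0.5,
    "Cache Size": -0.25,
    "Pool Busy": 0.5,
    "PG number": -1.0,
}
values = {
    "Processor Busy": 40.0,
    "Cache Size": 100.0,
    "Pool Busy": 20.0,
    "PG number": 2.0,
}
predicted_response = predict(coefficients, values)
print(predicted_response)
```

Keeping the coefficients keyed by metric name mirrors the way the table associates a coefficient with each metric, and makes it straightforward to detect a metric for which no value is available.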

The prediction formula table 1150 includes fields for purpose information 11511 and explanation information 11512. The purpose information 11511 stores identification information of the managed object to be predicted and the parameter of that object. The explanation information 11512 stores parameters of other managed objects that can explain the parameter value of the managed object to be predicted, together with information about their values. In the present embodiment, device ID 1151, device ID 1152, and metric 1153 are managed as the purpose information 11511, and the explanation information 11512 includes Processor Busy 1154, Cache Size 1155, Pool Busy 1156, and PG number 1157, each with a field indicating the coefficient for that metric. Here, the response performance of the volume is given as an example of the purpose information 11511, and Processor Busy 1154, Cache Size 1155, and the like are given as examples of the explanation information 11512. However, the present invention is not limited to this.

Here, it is assumed that the prediction formula represents a linear relationship and that the prediction formula table 1150 holds the regression formula that identifies the linear relationship best fitting the data, but the present invention is not limited to this. As another example, the prediction formula may be a polynomial, and the prediction formula table 1150 may manage information representing the polynomial.

Next, each process executed by the management server 1000 will be described.

FIG. 8 is a flowchart of a process for generating a prediction formula according to the first embodiment. Prediction formula generation collects and learns various information on each object as learning data, and identifies the function that best fits the learning data with respect to the relationship between the target element and other elements. This prediction formula generation process is performed by the processor 1300 of the management server 1000 executing the prediction formula generation program 1170 loaded into the memory 1100. A specific example of this flowchart is shown below.

First, the prediction formula generation program 1170 refers to the related information table 1110 illustrated in FIG. 3 and identifies the component that is the prediction formula generation target and the components related to it (step 101). Here, the component for which the prediction formula is to be generated may be specified by any method, for example by user selection or by automatic selection by the prediction formula generation program (for example, execution for all volume response performances). Moreover, the timing at which the prediction formula generation program 1170 starts is arbitrary, such as periodic execution or execution at any timing designated by the user.

Here, as a specific example, it is assumed that the volume represented by the volume ID “Vol1” is selected by the user as a target for generating a prediction formula. In this case, the prediction formula generation program 1170 identifies Processor1, Cache1, Pool1, and PG1 as components related to Vol1 (Volume1) from the information stored in the related information table 1110 of FIG. 3.

Returning to FIG. 8, next, the prediction formula generation program 1170 refers to the performance history information table 1120 illustrated in FIG. 4 and acquires the performance history information of the prediction formula generation target component and the related components identified in step 101 (step 102). For example, performance information is acquired indicating that, at time 10:01, the response time of Volume1 is 10.2 msec, the usage rate (Busy%) of Processor1 is 40%, the usage rate (Cache%) of Cache1 is 80%, the number of I/Os per unit time of Pool1 is 700 IOPS, and the usage rate (Busy%) of Pool1 is 35%.

Next, the prediction formula generation program 1170 refers to the configuration information table 1130 illustrated in FIG. 5A and FIG. 5B and acquires the configuration information of the prediction formula generation target component and the component specified in step 101 (step 103). For example, from FIG. 5A, the configuration information that the size of Cache 1 of the storage A is 8 GB is acquired. Further, from FIG. 5B, for example, configuration information such as the RAID level of the physical area PG1 of the storage A being RAID 5 (3D + 1P) is acquired.

Next, the prediction formula generation program 1170 stores the information for prediction formula generation acquired in step 102 and step 103 in the prediction formula source information table 1140 illustrated in FIG. 6 (step 104). Referring to FIG. 6, for example, the performance information acquired at time 10:01 is stored in the prediction formula source information table 1140 for Volume1.

Finally, the prediction formula generation program 1170 generates a prediction formula from the information in the prediction formula source information table 1140 populated in step 104 and stores it in the prediction formula table 1150 illustrated in FIG. 7 (step 105). For example, the prediction formula table 1150 in FIG. 7 stores the following prediction formula: (response performance of Volume 1 of storage A) = 33.76 (coefficient 1) × processor usage rate + 7.27 (coefficient 2) × cache size + 5.1 (coefficient 3) × Pool usage rate + 0.80 (coefficient 4) × number of physical areas PG.
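The prediction formula above can be evaluated as a simple weighted sum. The following is a minimal sketch using the four coefficients quoted from FIG. 7; the function name, the dictionary layout, and the units of the input metrics (usage rates as fractions, cache size in GB, one physical area) are assumptions, since the description does not state the units used for learning.

```python
# Coefficients quoted in the text for Volume 1 of storage A (FIG. 7).
COEFFICIENTS = {
    "Processor Busy": 33.76,
    "Cache Size": 7.27,
    "Pool Busy": 5.1,
    "PG number": 0.80,
}


def predict_response(coefficients, metrics):
    """Return the sum of coefficient x metric over the explanation information."""
    return sum(coef * metrics[name] for name, coef in coefficients.items())


# Assumed inputs: usage rates as fractions, cache size in GB, one parity group.
sample = {"Processor Busy": 0.40, "Cache Size": 8, "Pool Busy": 0.35, "PG number": 1}
estimate = predict_response(COEFFICIENTS, sample)
```

Storing the coefficients by metric name mirrors the way the prediction formula table 1150 keeps one coefficient field per metric.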

The method for generating the prediction formula in step 105 is not particularly limited, and any method, including a general method such as regression analysis, may be used. In the case of regression analysis, for example, the prediction formula may be generated by setting all of the related information 11412 shown in the prediction formula source information table 1140 as explanatory variables and then removing from the explanatory variables those having low relevance to the purpose information. In the present embodiment, Cache Usage 1146 is excluded from the explanatory variables among the related information stored in the prediction formula source information table 1140 shown in FIG. 6, and is therefore not included in the information stored in the prediction formula table 1150 shown in FIG. 7.
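One possible way to realize this kind of regression with variable removal is sketched below in plain Python. It is only an illustration under stated assumptions: relevance is measured by the absolute Pearson correlation with the purpose information, the threshold `min_corr` is arbitrary, the model has no intercept, and the rows are synthetic numbers chosen so that response = 2 × Processor Busy + 3 × Pool Busy while Cache Usage is uncorrelated. None of these choices comes from the description.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, target, names, min_corr=0.3):
    """Drop weakly related variables, then fit least-squares coefficients."""
    ys = [r[target] for r in rows]
    kept = [n for n in names
            if abs(correlation([r[n] for r in rows], ys)) >= min_corr]
    # Normal equations (X^T X) c = X^T y for the remaining variables.
    xtx = [[sum(r[p] * r[q] for r in rows) for q in kept] for p in kept]
    xty = [sum(r[p] * r[target] for r in rows) for p in kept]
    return dict(zip(kept, solve(xtx, xty)))


# Synthetic learning data (not taken from FIG. 6).
rows = [
    {"Processor Busy": 1, "Pool Busy": 2, "Cache Usage": 3, "response": 8},
    {"Processor Busy": 2, "Pool Busy": 1, "Cache Usage": 1, "response": 7},
    {"Processor Busy": 3, "Pool Busy": 4, "Cache Usage": 1, "response": 18},
    {"Processor Busy": 4, "Pool Busy": 3, "Cache Usage": 3, "response": 17},
]
model = fit(rows, "response", ["Processor Busy", "Pool Busy", "Cache Usage"])
```

In this synthetic case Cache Usage is removed before fitting, and the remaining coefficients recover the relationship used to build the data.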

FIG. 9 is a flowchart of processing for determining setting change contents for acquiring learning data according to the first embodiment. This setting change content determination process 200 is performed, for example, after the process of generating the prediction formula shown in FIG. 8. This processing is performed by the processor 1300 of the management server 1000 executing the setting change content determination program 1180 loaded into the memory 1100.

The following is a specific example of this flowchart.

First, the setting change content determination program 1180 extracts the metric of the description information 11512 in the prediction formula table 1150 illustrated in FIG. 7, and performs the following processing for each metric.

First, the setting change content determination program 1180 checks whether the metric is included in the configuration information table 1130 (step 201). If the metric is not included in the configuration information table 1130, the setting change content determination program 1180 proceeds to processing for the next metric in the prediction formula table 1150. If the metric is included in the configuration information table 1130, the setting change content determination program 1180 obtains information on the range of values that the metric can take (step 202). When the metric is the storage cache size, for example, information on the range of values that the cache size can take according to the hardware specification is acquired. For example, information that the cache size is in the range of 1 GB to 72 GB is acquired. When the metric is a parity group of the storage, information on the range of RAID levels is acquired. For example, information that the possible RAID levels are RAID0 (2D), RAID1 (1D + 1P), and RAID5 (3D + 1P) is acquired. The method for obtaining the possible range of these metrics is not particularly limited. For example, information on the range that each metric can take may be stored in advance in a table (not shown), and the setting change content determination program 1180 may acquire the necessary information from the table as appropriate. Alternatively, the setting change content determination program 1180 may acquire the information by making a request to hardware such as the storage.

Next, the setting change content determination program 1180 searches the range acquired in step 202 for a domain with insufficient data (step 203). It then determines whether or not there is a domain with insufficient data (step 204). If there is no such domain, the process proceeds to the processing for the next metric in the prediction formula table 1150. If there is a domain with insufficient data in step 204, the setting change content determination program 1180 generates a parameter for a setting change operation that enables acquisition of data for that domain (step 205).

For example, focusing on the cache size as a metric, assume that the Cache Size 1147 field of the prediction formula source information table 1140 shown in FIG. 6 contains no data other than for a setting of 8 GB. In that case, the setting change content determination program 1180 tries to acquire data for settings other than 8 GB. For example, the setting change content determination program 1180 generates a parameter for changing the cache size setting to 16 GB.

Here, the setting change content determination program 1180 may check whether or not the SLA (Service Level Agreement) would be satisfied if the setting were changed to the parameter generated in step 205, and may exclude the parameter from the setting range if the SLA would not be satisfied. For example, if changing the cache size from 8 GB to 4 GB might cause a predetermined requirement (such as a response time within 1 second) on the performance of the volume, or on the performance of the business application running on the server 3000 that uses the volume, to be violated, the parameter need not be changed to 4 GB.
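Steps 203 to 205 together with the optional SLA check can be sketched as follows. The candidate cache sizes, the sample-count threshold, and the function names are assumptions for illustration. In particular, `estimate_ms` is a hypothetical stand-in for whatever estimate of the response time is available for a setting that has not yet been tried; the description does not specify how that estimate is obtained.

```python
from collections import Counter


def find_shortage(observed, candidates, min_samples):
    """Return candidate settings with fewer than min_samples observations."""
    counts = Counter(observed)
    return [c for c in candidates if counts[c] < min_samples]


def exclude_sla_violations(settings, estimate_ms, sla_limit_ms):
    """Keep only settings whose estimated response time stays within the SLA."""
    return [s for s in settings if estimate_ms(s) <= sla_limit_ms]


# Only 8 GB has been observed so far, as in the example in the text.
observed_cache_gb = [8, 8, 8, 8]
candidates_gb = [4, 8, 16, 32]          # assumed subset of the 1 GB to 72 GB range
shortage = find_shortage(observed_cache_gb, candidates_gb, min_samples=3)

# Hypothetical estimate: smaller caches are assumed to respond more slowly.
safe = exclude_sla_violations(shortage, lambda gb: 6000 / gb, sla_limit_ms=1000)
```

With these assumed numbers, 4 GB is dropped because its estimated response time exceeds the 1 second requirement, while 16 GB and 32 GB remain as setting change parameters.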

Next, the setting change content determination program 1180 executes learning data acquisition setting change processing (step 206). Step 206 will be described in detail with reference to FIG. 10.

FIG. 10 is a flowchart of processing for executing a setting change for learning data acquisition. The learning data acquisition setting change process 300 (the learning data acquisition setting change process 206 in FIG. 9) is performed by the processor 1300 of the management server 1000 executing the setting change content determination program 1180 loaded into the memory 1100. A specific example of this flowchart is shown below.

First, the setting change content determination program 1180 requests the configuration change program 1190 to execute a setting change operation, and acquires an execution result (step 301). Next, the setting change content determination program 1180 checks whether or not an entry for the new time has been added to the prediction formula source information table 1140 (step 302).

When an entry for the new time has been added, the setting change content determination program 1180 acquires the number of data items acquired for the target domain in the prediction formula source information table 1140 (step 303) and checks whether sufficient data has been acquired (step 304).

Here, in order to determine whether or not the learning data has been sufficiently acquired, a threshold value for the number of data items may be set in advance, or the number of items of explanation information shown in the prediction formula table may be used as the threshold value. If the learning data has been sufficiently acquired, the setting change content determination program 1180 proceeds to the next step 305. If the learning data has not been sufficiently acquired, the setting change content determination program 1180 executes the process again from step 302.

In step 305, the setting change content determination program 1180 requests the configuration change program 1190 to execute a setting change operation that restores the settings to their state before the execution of step 301, and acquires the execution result. If the setting change operation requested in step 301 or step 305 is not successful, this processing is interrupted.
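The flow of steps 301 to 305 can be sketched as a small loop. The callables `apply_change` and `count_samples`, the `FakeStorage` class used to exercise the loop, and the `max_polls` safeguard are all assumptions for illustration and are not part of the described embodiment.

```python
def collect_learning_data(apply_change, count_samples, new_setting,
                          old_setting, threshold, max_polls=1000):
    """Temporarily apply a setting, wait for enough samples, then restore."""
    if not apply_change(new_setting):            # step 301
        return False                             # interrupted on failure
    enough = False
    for _ in range(max_polls):                   # steps 302 to 304
        if count_samples(new_setting) >= threshold:
            enough = True
            break
    restored = apply_change(old_setting)         # step 305
    return enough and restored


class FakeStorage:
    """Hypothetical stand-in for the storage and the information collection."""

    def __init__(self, cache_gb):
        self.cache_gb = cache_gb
        self.samples = []

    def apply(self, cache_gb):
        self.cache_gb = cache_gb
        return True

    def count(self, cache_gb):
        # Each poll simulates one new row collected under the current setting.
        self.samples.append(self.cache_gb)
        return self.samples.count(cache_gb)


storage = FakeStorage(cache_gb=8)
ok = collect_learning_data(storage.apply, storage.count,
                           new_setting=16, old_setting=8, threshold=3)
```

In a real system the polling loop would wait for the information collection interval between checks; the sketch omits that delay so that it runs instantly.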

After sufficient learning data has been obtained by executing the process of FIG. 10, executing the prediction formula generation process 100 shown in FIG. 8 makes it possible to generate, from the prediction formula source information table 1140 in which learning data for the new configuration is no longer lacking, a prediction formula table 1150 representing a highly accurate prediction formula.

In the present embodiment, in step 201 to step 204 of the setting change content determination process 200 shown in FIG. 9, all domains with insufficient data are extracted from the range that the configuration can take, the data of those domains is then acquired by the learning data acquisition setting change process shown in FIG. 10, and a prediction formula is then generated by the prediction formula generation process 100 shown in FIG. 8. However, the present invention is not limited to this. As another example, each time a domain with insufficient data is extracted, the data for that domain may be acquired and a prediction formula may be generated at that stage, and this may be repeated as many times as there are such domains.

A specific example is shown. Here, it is assumed that the cache size can range from 1 GB to 72 GB. For example, the domains with insufficient data may be extracted from the range that the cache size can take, for example in units of 1 GB, and the prediction formula may be generated after the data of all the extracted domains has been acquired. Alternatively, as another example, for each domain with insufficient data extracted in units of 1 GB, the process of acquiring data, generating a prediction formula, and proceeding to the next domain may be repeated.

Further, in this embodiment, in the setting change content determination process 200 shown in FIG. 9, the domains with insufficient data are extracted for all items that are included in both the prediction formula table 1150 and the configuration information table 1130, and then the learning data acquisition setting change process 206 is executed. Therefore, a prediction formula is generated after data has been collected for all items included in both the prediction formula table 1150 and the configuration information table 1130. However, the present invention is not limited to this. As another example, the process of executing the learning data acquisition setting change process and the prediction formula generation process for one item included in the prediction formula table 1150 and the configuration information table 1130, and then proceeding to the next item, may be repeated.

As described above, according to the present embodiment, learning data that is lacking within the range of states that the computer system can take is actively collected in advance, so that a highly accurate prediction formula can be obtained early when a configuration change is made. The learning time can be shortened, and efficient management based on machine learning technology can be implemented immediately after the configuration change.

For example, when the function is used for predictive monitoring, even immediately after a configuration change or in a newly constructed configuration, it is possible to determine that a problem has occurred in the IT system if the measured values obtained from the IT system deviate greatly from the relationship indicated by the function.

Also, when the function is used for fault cause isolation, even immediately after a configuration change or in a newly constructed configuration, if the measured values obtained from the IT system deviate greatly from the relationship indicated by the function, the root cause can be determined because a problem is highly likely to have occurred in the explanation information with the largest fluctuation range. This also makes it possible to identify the root cause automatically and immediately when a failure occurs.
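The two uses above, predictive monitoring and fault cause isolation, can be sketched together as follows. The coefficients are those quoted for FIG. 7; the tolerance, the metric values, the units, and the definition of "fluctuation range" as a relative change against the previous value are assumptions made only for this illustration.

```python
COEFFICIENTS = {"Processor Busy": 33.76, "Cache Size": 7.27,
                "Pool Busy": 5.1, "PG number": 0.80}


def is_anomalous(coefficients, metrics, measured, tolerance):
    """True when the measured value is far from the learned relationship."""
    predicted = sum(c * metrics[n] for n, c in coefficients.items())
    return abs(measured - predicted) > tolerance


def likely_root_cause(previous, current):
    """Return the explanation metric with the largest relative change."""
    def relative_change(name):
        base = abs(previous[name]) or 1.0   # avoid division by zero
        return abs(current[name] - previous[name]) / base
    return max(current, key=relative_change)


# Assumed metric values: usage rates as fractions, cache size in GB.
previous = {"Processor Busy": 0.40, "Cache Size": 8, "Pool Busy": 0.35, "PG number": 1}
current = {"Processor Busy": 0.95, "Cache Size": 8, "Pool Busy": 0.35, "PG number": 1}

alarm = is_anomalous(COEFFICIENTS, current, measured=150.0, tolerance=20.0)
suspect = likely_root_cause(previous, current)
```

Here the alarm is raised because the measured value differs from the prediction by more than the assumed tolerance, and the processor is reported as the suspect because only its usage rate changed.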

In addition, when the function is used for What-if analysis, even immediately after a configuration change or in a newly constructed configuration, substituting the value to be tried into the function makes it possible to simulate the value of the target metric of the function under the substituted conditions.
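A What-if analysis of this kind amounts to re-evaluating the function with some inputs replaced. The sketch below uses the coefficients quoted for FIG. 7; the function name, the baseline values, and their units are assumptions for illustration.

```python
COEFFICIENTS = {"Processor Busy": 33.76, "Cache Size": 7.27,
                "Pool Busy": 5.1, "PG number": 0.80}


def what_if(coefficients, baseline, changes):
    """Return (current estimate, estimate with the changed values substituted)."""
    def evaluate(metrics):
        return sum(c * metrics[n] for n, c in coefficients.items())
    trial = {**baseline, **changes}   # substituted values override the baseline
    return evaluate(baseline), evaluate(trial)


# Assumed baseline: usage rates as fractions, cache size in GB.
baseline = {"Processor Busy": 0.40, "Cache Size": 8, "Pool Busy": 0.35, "PG number": 1}
before, after = what_if(COEFFICIENTS, baseline, {"Cache Size": 16})
```

Because the formula is linear, the difference between the two estimates equals the coefficient of the changed metric multiplied by the size of the change.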

As described above, according to the present invention, even immediately after a configuration change or in a newly constructed configuration, it is possible to prevent failures and violations of management requirements, and to obtain effects such as quick recovery when a failure occurs. The present invention can also be applied to the various cloud forms described above, and to forms in which management software is provided as SaaS or operation management work is contracted as a service.

The computer system according to the second embodiment basically has the same configuration as that of the first embodiment and performs the same operation. However, the second embodiment differs from the first embodiment in that it uses, for generating the prediction formula, not only the related information related to the purpose information but also information acquired in the computer system of a business whose characteristics are similar to those of the business targeted by the purpose information.

FIG. 11 is a diagram illustrating an example of the business characteristic management table 1800 according to the second embodiment. The business characteristic management table 1800 manages business characteristic information for each business unit.

The business characteristic management table 1800 stores data on business units 18011 and business characteristics 18012. In this embodiment, an example is shown in which business units are associated with volumes, and I/O information such as the number of I/Os and the ratio of each I/O pattern is managed for each business as its business characteristics. Referring to FIG. 11, the business characteristic management table 1800 associates the fields of the business unit 18011 with those of the business characteristics 18012. The business unit 18011 includes a volume ID 1801. The business characteristics 18012 include fields for the I/O number 1802, the I/O increase/decrease rate 1803, the high-frequency access 1804, and the I/O pattern 1805.

The volume ID 1801 stores an identifier for uniquely identifying the volume 2210. The I/O number 1802 records the number of I/Os, for example the average or median IOPS of the previous month. The I/O increase/decrease rate 1803 records the rate at which the IOPS has changed over a past fixed period. For example, over half a year or one year, the one-month average of the IOPS is calculated for each month, and the rate of increase or decrease of each month's average relative to the previous month's average is calculated. The I/O pattern 1805 records the occurrence ratio of each of the I/O patterns Random Read, Random Write, Sequential Read, and Sequential Write. The high-frequency access 1804 records the I/O pattern with the highest ratio. Although an example in which a business unit corresponds to a volume is given here, the present invention is not limited to this. As another example, the business unit may be a VM, a business program on the server 3000, or a processing unit executed by the business program.
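The business characteristics described above can be derived from raw I/O statistics roughly as follows. This is a sketch only: the function names, the sample figures, and the pattern abbreviations other than "RW" are assumptions for illustration.

```python
from statistics import median


def io_number(previous_month_iops):
    """I/O number 1802: here the median IOPS of the previous month."""
    return median(previous_month_iops)


def monthly_change_rates(monthly_avg_iops):
    """I/O increase/decrease rate 1803: change of each month vs. the previous one."""
    return [(cur - prev) / prev
            for prev, cur in zip(monthly_avg_iops, monthly_avg_iops[1:])]


def high_frequency_access(pattern_ratio):
    """High-frequency access 1804: the I/O pattern with the highest ratio."""
    return max(pattern_ratio, key=pattern_ratio.get)


# Assumed sample figures (not taken from FIG. 11).
rates = monthly_change_rates([10000, 10500, 11025])
dominant = high_frequency_access({"RR": 0.2, "RW": 0.5, "SR": 0.2, "SW": 0.1})
typical = io_number([9000, 10000, 12000])
```

Each function maps directly onto one field of the business characteristics 18012, so a row for a volume can be assembled from their results.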

FIG. 12 is a flowchart of processing for executing learning data sharing according to the second embodiment. The learning required data sharing process 400 is a process in the second embodiment corresponding to step 105 of the prediction formula generation process 100 in FIG. 8 in the first embodiment. This process is performed by the processor 1300 of the management server 1000 executing the prediction formula generation program 1170 expanded on the memory 1100. A specific example of this flowchart is shown below.

The prediction formula generation program 1170 first acquires the information of the business characteristic management table 1800 (step 401). Next, the prediction formula generation program 1170 checks whether or not there is a prediction formula generation target component that is used in a business similar to the business for which the prediction formula is generated (step 402). Here, the information of the business characteristic management table 1800 acquired in step 401 is used to determine whether the component is used in a similar business. Similar businesses may be grouped in advance, and the presence or absence of components used in businesses belonging to the same group may be checked.

As an example of grouping, businesses with the same high-frequency access information may be placed in the same business similarity group. Alternatively, with respect to the I/O increase/decrease rate, a decrease rate of 5% or more, an increase/decrease rate within ±5%, an increase rate of 5% or more, and the like may each be set as a business similarity group. Alternatively, the businesses may be classified into any number of groups using the k-means method. Alternatively, the number of groups may be input in advance and the businesses may be divided into that number of groups. Alternatively, the businesses may be grouped by combining the above grouping methods. Thus, grouping may be performed by any method and is not particularly limited.

In the case of the business characteristic management table 1800 illustrated in FIG. 11, Volume 1 and Volume 3 have the same high-frequency access 1804 of “RW”, the same I/O increase/decrease rate 1803 of “5% or more”, and the same I/O number 1802 of “10000 or more”. The businesses may be grouped so that Volume 1 and Volume 3 are determined to belong to the same business similarity group.
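A grouping that combines the three criteria in this example can be sketched as follows. The bucket boundaries at exactly ±5%, the row layout, and all values for Volume 2 are assumptions for illustration; only the fact that Volume 1 and Volume 3 share "RW", an increase of 5% or more, and an I/O number of 10000 or more comes from the description.

```python
from collections import defaultdict


def rate_bucket(rate):
    """Classify the I/O increase/decrease rate into three assumed buckets."""
    if rate <= -0.05:
        return "decrease of 5% or more"
    if rate >= 0.05:
        return "increase of 5% or more"
    return "within 5%"


def group_key(characteristics):
    """Businesses with equal keys form one business similarity group."""
    return (characteristics["high_frequency_access"],
            rate_bucket(characteristics["io_change_rate"]),
            characteristics["io_number"] >= 10000)


def group_volumes(table):
    groups = defaultdict(list)
    for volume_id, characteristics in table.items():
        groups[group_key(characteristics)].append(volume_id)
    return list(groups.values())


# Assumed table contents; the exact figures are illustrative.
table = {
    "Volume 1": {"io_number": 12000, "io_change_rate": 0.06, "high_frequency_access": "RW"},
    "Volume 2": {"io_number": 3000, "io_change_rate": 0.00, "high_frequency_access": "SR"},
    "Volume 3": {"io_number": 15000, "io_change_rate": 0.08, "high_frequency_access": "RW"},
}
groups = group_volumes(table)
```

Using an exact-match key is the simplest of the grouping methods mentioned; a clustering method such as k-means would replace `group_key` with a distance-based assignment.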

In step 402, when there is a component used in a similar business, the prediction formula generation program 1170 generates a prediction formula using the information in the prediction formula source information table 1140 of each component used in the similar businesses, and stores it in the prediction formula table 1150 (step 403). In step 402, when there is no component used in a similar business, the prediction formula generation program 1170 generates a prediction formula from the information in the prediction formula source information table of each component individually and stores it in the prediction formula table (step 404).
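The branch between step 403 and step 404 can be sketched as a choice of which rows are passed to the prediction formula generation. The function name, the group representation, and the sample rows are assumptions for illustration.

```python
def rows_for_training(target, groups, source_rows):
    """Select learning data for `target`, pooling rows from similar businesses."""
    peers = next((g for g in groups if target in g), [target])
    if len(peers) > 1:
        # Step 403: use the source information of every similar component.
        return [row for volume in peers for row in source_rows.get(volume, [])]
    # Step 404: no similar business, so use only the target's own rows.
    return list(source_rows.get(target, []))


# Assumed groups and rows; each row stands for one entry of table 1140.
groups = [["Volume 1", "Volume 3"], ["Volume 2"]]
source_rows = {
    "Volume 1": [{"time": "10:01"}, {"time": "10:02"}],
    "Volume 2": [{"time": "10:01"}],
    "Volume 3": [{"time": "10:01"}, {"time": "10:02"}, {"time": "10:03"}],
}
shared = rows_for_training("Volume 1", groups, source_rows)
alone = rows_for_training("Volume 2", groups, source_rows)
```

Pooling rows this way gives a newly created volume more learning data than it has collected on its own, which is the effect the second embodiment aims for.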

As described above, by sharing learning data within groups of similar businesses, it is possible to reduce learning time and quickly implement efficient management based on machine learning technology. For example, when a new environment is created, it is usually necessary to acquire various history information such as performance information and capacity information over a long period, for example several months. However, if there is a business in an environment similar to that of the newly created business, the data of the similar business group can be used once the similar business group has been determined in a short period, for example three days, and efficient management can be implemented on that basis. Also, in step 203 of the setting change content determination process 200 shown in FIG. 9, the domain with insufficient data can be searched for based on the configurations that have already been taken within each similar business group, so the time required to collect the missing data can be reduced and efficient management based on machine learning technology can be performed.

The computer system according to each embodiment described above can be summarized in the following aspects.

(Aspect 1)
A learning data processing apparatus comprising: an information collection unit that acquires learning data from the components of a target system while the target system is in operation; a prediction formula generation unit that generates, based on the learning data, a prediction formula expressing the relationship among the components of the target system as a relationship between purpose information and explanation information; a setting change content determination unit that determines setting contents for changing the configuration of the target system; and a configuration change unit that changes the configuration of the target system, wherein the setting change content determination unit extracts, as a data shortage state, a state for which the learning data has not been sufficiently acquired within the range of states that the components can take when the configuration of the target system is changed, and determines setting contents for changing the configuration of the target system so that the components enter the data shortage state, the configuration change unit changes the configuration of the target system according to the determined setting contents, and the information collection unit acquires, from the components of the target system, the learning data in the data shortage state.

(Aspect 2)
The learning data processing apparatus according to aspect 1, wherein the setting change content determination unit extracts, from the range of states of the components that the target system can take, a configuration for which learning data of at least a predetermined amount has not been acquired, and determines a setting change corresponding to that configuration.

It is possible to temporarily set a configuration in which learning data is not sufficient among the configurations that the target system can take, collect learning data, and cover the learning data of the configuration that the target system can take.

(Aspect 3)
The learning data processing apparatus according to aspect 1, wherein the setting change content determination unit determines setting contents that temporarily change the configuration of the target system so that the component enters the data shortage state, the configuration change unit changes the configuration of the target system according to the determined setting contents, the information collection unit acquires and accumulates, from the components of the target system, the learning data in the data shortage state, and, when the configuration of the target system is changed, the prediction formula generation unit generates the prediction formula using the learning data accumulated by the information collection unit in the configuration in which the component was in the data shortage state.

Because the learning data for the changed configuration is acquired in advance and the prediction formula is generated from that learning data when the configuration of the target system is changed, a highly accurate prediction formula can be obtained within a short period after the configuration change.

(Aspect 4)
The learning data processing apparatus according to aspect 1, wherein the prediction formula uses, as the purpose information, the performance of a target component for which the prediction formula is generated, uses, as the explanation information, the performance of one or more related components logically associated with the target component, and expresses the purpose information as a function of the explanation information, the information collection unit accumulates the performance exhibited by the components of the target system as performance history information, and the prediction formula generation unit calculates the function using the performance history information as the learning data.

Since the prediction formula is generated based on the performance history information measured and accumulated, it is possible to generate a good prediction formula by accumulating sufficient performance history information.

(Aspect 5)
The learning data processing apparatus according to aspect 4, wherein the purpose information is the response time of a volume in the storage, and the explanation information includes the usage rate of a processor used for accessing the volume and the size of a cache.

(Aspect 6)
The learning data processing apparatus according to aspect 4, wherein the prediction formula expresses the purpose information as a sum of products of the explanation information and coefficients, and the prediction formula generation unit calculates the coefficient for each related component using the performance history information as the learning data.

Since the prediction formula is expressed as a sum of products of the explanation information and coefficients, and the coefficients are calculated, a function indicating the relationship between the purpose information and the explanation information can be easily calculated.

(Aspect 7)
The learning data processing device according to aspect 1, wherein the prediction formula generation unit generates the prediction formula using learning data acquired in a similar system that satisfies a predetermined similarity condition with respect to the target system.

When there is a system similar to the target system, the prediction formula is generated using the learning data of the similar system as well, so that the amount of usable learning data increases and highly accurate machine learning can be performed from an early stage after the configuration change.

(Aspect 8)
The learning data processing apparatus according to aspect 7, wherein the purpose information is the performance of a volume of the storage, and the similarity condition is determined by the degree of similarity of the I/O pattern for the volume, the I/O pattern including random read, random write, sequential read, and sequential write.

Since the similarity determination is performed based on the similarity of the I/O pattern, the learning data collected in a business with a similar I/O pattern can be used for a configuration change or new construction of another business.

(Aspect 9)
The learning data processing device according to aspect 5, wherein the setting change content determination unit determines to change the cache size to a size where sufficient learning data is not obtained.

When the explanation information includes the cache size, the learning data within the range that the cache size can take can be covered in advance, so that even if the cache size of the target system is changed, a shortage of learning data is avoided and the accuracy of the prediction formula can be kept high.

1000 ... Management server, 1100 ... Memory, 1110 ... Related information table, 1120 ... Performance history information table, 1121 ... Time, 1124 ... Metric, 1125 ... Performance value, 1130 ... Configuration information table, 1133 ... Metric, 1134 ... Value, 1140 ... prediction formula source information table, 1160 ... information collection program, 1170 ... prediction formula generation program, 1180 ... setting change content determination program, 1190 ... configuration change program, 1200 ... communication device, 1300 ... processor, 1400 ... output device, 1500 ... Input device, 1600 ... storage device, 1700 ... internal bus, 1800 ... business characteristic management table, 2000 ... storage device, 2100 ... memory, 2110 ... disk cache, 2120 ... configuration performance information collection program, 2 30 ... configuration change program, 2200 ... logical volume providing unit, 2210 ... volume, 2220 ... disk pool, 2230 ... physical area,
2300 ... Disk I/F controller, 2500 ... Processor, 2600 ... Data I/F, 2700 ... Communication path, 3000 ... Server, 3100 ... Memory, 3110 ... Configuration information collection program, 3120 ... Business program, 3130 ... Configuration change program, 3300 ... Processor, 3500 ... Communication path, 4000 ... SAN, 5000 ... Management network

Claims

An information collection unit that acquires learning data from the components of a target system while the target system is in operation;
A prediction formula generation unit that generates, based on the learning data, a prediction formula expressing the relationship among the components of the target system as a relationship between purpose information and explanation information;
A setting change content determination unit for determining a setting content for changing the configuration of the target system;
A configuration changing unit that changes the configuration of the target system,
The setting change content determination unit extracts, within the range of states that the components can take when the configuration of the target system is changed, a state for which the learning data has not been sufficiently acquired as a data shortage state, and determines a setting content for changing the configuration of the target system so that the components are placed in the data shortage state,
The configuration changing unit changes the configuration of the target system according to the determined setting content,
The information collection unit acquires the learning data for the data shortage state from the components of the target system,
Learning data processing device.
The setting change content determination unit extracts, from the range of states that the components of the target system can take, a configuration for which learning data of at least a predetermined amount has not been acquired, and determines a setting change corresponding to that configuration,
The learning data processing apparatus according to claim 1.
The setting change content determination unit determines a setting content that temporarily changes the configuration of the target system so that the components are placed in the data shortage state,
The configuration change unit changes the configuration of the target system according to the determined setting content,
The information collection unit acquires and accumulates the learning data for the data shortage state from the components of the target system,
When the configuration of the target system is changed,
The prediction formula generation unit generates the prediction formula using the learning data that the information collection unit accumulated while the components were in the data shortage state,
The learning data processing apparatus according to claim 1.
The prediction formula takes, as the purpose information, the performance of a target component that is the subject of the prediction formula, takes, as the explanation information, the performance of one or more related components logically associated with the target component, and expresses the purpose information as a function of the explanation information,
The information collection unit accumulates performance exhibited by the components of the target system as performance history information,
The prediction formula generation unit calculates the function using the performance history information as the learning data.
The learning data processing apparatus according to claim 1.
The learning data processing apparatus according to claim 4, wherein the purpose information is a response time of a volume in the storage, and the explanation information includes a usage rate of a processor used for accessing the volume and a size of a cache.
The prediction formula expresses the purpose information as a sum of products of the explanation information and coefficients,
The prediction formula generation unit calculates a coefficient for each related component using the performance history information as the learning data,
The learning data processing apparatus according to claim 4.
The learning data processing apparatus according to claim 1, wherein the prediction formula generation unit generates the prediction formula using learning data acquired by a similar system that satisfies a predetermined similarity condition with respect to the target system.
The purpose information is the performance of the storage volume;
The similarity condition is determined by the degree of similarity of an I/O pattern including random read, random write, sequential read, and sequential write for the volume.
The learning data processing apparatus according to claim 7.
The learning data processing device according to claim 5, wherein the setting change content determination unit determines to change the cache size to a size for which sufficient learning data has not been obtained.
A learning data processing method for acquiring learning data from the components of a target system while the target system is in operation, and generating, based on the learning data, a prediction formula expressing the relationship among the components of the target system as a relationship between purpose information and explanation information, wherein
The setting change content determination means
Extracts, within the range of states that the components can take when the configuration of the target system is changed, a state for which the learning data has not been sufficiently acquired as a data shortage state, and
Determines a setting content for changing the configuration of the target system so that the components are placed in the data shortage state,
The configuration changing means changes the configuration of the target system according to the determined setting content,
The information collecting means acquires the learning data for the data shortage state from the components of the target system,
Learning data processing method.