WO2016174739A1

WO2016174739A1 - Multicomputer system, management computer, and data linkage management method

Info

Publication number: WO2016174739A1
Application number: PCT/JP2015/062822
Authority: WO
Inventors: 充実寺山; 恵介畑崎; 睦細谷
Original assignee: 株式会社日立製作所
Priority date: 2015-04-28
Filing date: 2015-04-28
Publication date: 2016-11-03

Abstract

　This multicomputer system is provided with: a first computer system having one or more distributed processing computers that each have at least one distributed storage forming a resource pool and execute parallel distributed processing of a job; and a second computer system in which one or more computers share a shared storage and which connects to the first computer system to transmit and receive data. The first computer system divides the aforementioned job into one or more tasks, determines the method for loading, in the distributed storage, the data to be accessed in the aforementioned tasks, and requests the data from the second computer system. The second computer system transmits the data to the first computer system, and the first computer system loads the received data in the distributed storage using the aforementioned loading method.

Description

Compound computer system, management computer, and data linkage management method

The present invention relates to a technology for linking data between computer systems having different storage configurations.

Virtualization technologies such as server virtualization and storage virtualization are generally used for the purpose of reducing capital investment related to information systems and increasing the agility of system operation. According to these virtualization technologies, the utilization efficiency of physical resources can be increased by sharing physical resources among a plurality of systems. In addition, it is possible to generate a resource in software beyond the limitation due to the configuration of the physical resource. Therefore, a system can be constructed flexibly and quickly, and the system once constructed can be transferred to another physical resource. However, a system constructed by virtualization technology cannot have a processing performance that exceeds physical limits in terms of performance. Therefore, the characteristics and performance of the system are determined to some extent depending on what kind of physical resource constitutes the system.

In contrast, studies have been made to reduce physical resource constraints on systems built with virtualization technology, and in recent years systems called scale-out (aggregation) platforms have emerged. In this system, a parallel distributed system is configured by a plurality of physical servers and a plurality of storage devices built in the physical servers. The scale-out type infrastructure makes it possible to configure a resource pool that spans multiple nodes (physical servers) in the same cluster, and a virtual server or process can be executed on any physical node.

This makes it possible to add a virtual server without worrying about the free space remaining on the physical server, for example. Further, when a failure occurs in a physical server, the system can be transferred to another node and processing can be continued without worrying about differences in physical specifications. In addition, because the scale-out platform can easily change the capacity and configuration of resources depending on the cluster configuration, by using relatively inexpensive hardware that is inferior in reliability alone, it has sufficient redundancy, It is also possible to obtain sufficient reliability for the entire board.

On the other hand, a system in which a plurality of physical servers and shared storage devices used by them are interconnected via a network has been widely used. Here, this kind of system is called a scale-up type base for a scale-out type.

Since there is a fixed shared storage device in the scale-up type infrastructure, storage resources are added by adding devices mounted on each device in principle. Although the scale-up platform has one aspect that the hardware is likely to be complicated, the reliability can be improved by making each module such as a controller redundant. Therefore, the scale-up platform is suitable for applications such as databases that require high reliability and performance. And to date, many business systems have been operating on this scale-up platform. Therefore, it can be considered that a lot of data cultivated by companies, such as customer information and transaction information, is stored and updated in the existing scale-up base.

In the future, the scale-up base and the scale-out base will be used properly in the right place due to the fact that the scale-up base and the scale-out base are different. Is important. In other words, while maintaining the existing scale-up platform, a new scale-out platform is introduced, and a utilization method is required in which the data are linked by connecting them via a network. It is expected that various business applications will be provided on an optimal basis according to the data format and required performance. In that case, it is efficient to load the data accumulated in the existing scale-up base into the scale-out base, and conversely store the processing results in the scale-out base into the scale-up base. Data linkage technology is essential.

Patent Document 1 discloses a virtualization technology that uses a storage device in a data center as if it is part of a remote site. In this technology, a storage device provided virtually includes an interface for grasping IO (Input / Output) characteristics for each storage block, and a necessary storage block can be transferred when necessary based on the IO characteristics. Further, Patent Document 1 discloses a technique for transferring data between different sites with high efficiency by transferring storage blocks that are likely to be required next in advance.

Further, in Patent Document 2, storage in a distributed processing node is performed so as to increase the efficiency of IO for a storage device based on distributed processing execution information such as how to divide processing and in what order. A technique for controlling area allocation is disclosed.

US Patent Application Publication No. 2011/0238775 JP 2008-15888 A

When the technology of Patent Document 1 is applied to a scale-out type base, it becomes possible to access a specific area of the scale-up type base without changing the configuration of the scale-out type base. However, according to Patent Document 1, although the scale-up type base is logically recognized as a part of the scale-out type base, it depends only on the IO characteristics for each storage block. The efficiency of data transfer cannot be increased sufficiently in a suitable form. The scale-out type platform has a unique behavior such as copying the same storage block multiple times in order to make the data redundant and distributed, and there is a high possibility of performing redundant data transfer to the scale-up type platform.

If the technology of Patent Document 2 is used for data linkage between a scale-up type base and a scale-out type base, it is linked to the execution state of a business application that operates on the scale-out type base, to a storage device possessed by the scale-up type base May be more efficient However, Patent Document 2 does not disclose a method for transferring data from a scale-up base so that it can cope with dynamically distributed parallel distributed processing. On the scale-out type platform, processing processes are asynchronously and parallelly arranged, and the response performance between the processing processes and the distributed data dynamically changes. Therefore, in order to improve access efficiency, it is necessary to consider not only the execution order of processing processes but also data arrangement. However, when transferring data from a scale-up type base to a scale-out type base, the delay is likely to become a bottleneck. When the size of the delay becomes a bottleneck, not only the processing on the scale-out type platform is in a standby state, but also the period of time that must be set offline so that the data is not updated on the scale-up type platform becomes long. Further, the data arrangement is determined by different management methods for the scale-up type base and the scale-out type base, but a technique for solving the difference is not disclosed in Patent Document 2.

As described above, there is no established method for data linkage in a method suitable for each of the scale-up type and scale-out type architectures.

An object of the present invention is to provide a technology that realizes suitable data linkage between a scale-up type base and a scale-out type base.

A compound computer system according to an aspect of the present invention includes a first computer system having one or more distributed processing computers each having one or more distributed storages constituting a resource pool and executing parallel distributed processing of jobs, One or more computers share a shared storage, and have a second computer system that is connected to the first computer system and transmits / receives data, and the first computer system has one job. Dividing into the above tasks, determining a method of expanding the data accessed by the task to the distributed storage, requesting the data to the second computer system, and the second computer system The first computer system transmits the received data to the distributed storage by the expansion method. To.

According to the present invention, the data of the second computer system, which is a scale-up type computer system, is expanded to the distributed storage in the first computer system by a method suitable for the first computer system, which is a scale-out type computer system. Therefore, it is possible to realize suitable data linkage between the scale-up computer system and the scale-out computer system.

1 is a schematic block diagram of a computer system according to a first embodiment. It is a block diagram which shows the physical structure of a 1st computer system. It is a block diagram which shows the logic structure of a 1st computer system. It is a block diagram which shows the physical structure of the 2nd computer system 40b. It is a figure which shows the specific example of the storage structure management table 153c. It is a block diagram which shows the logic structure of the 2nd computer system 40b. 2 is a diagram showing a configuration of a management program that runs on a management computer 200. FIG. 6 is a diagram illustrating a specific example of a task management table 203. FIG. 5 is a diagram illustrating a specific example of a data arrangement management table 206. FIG. 3 is a diagram illustrating an example of a resource configuration managed by a resource configuration management unit 207. FIG. FIG. 6 is a sequence diagram showing a flow from when a user submits a job in the first computer system 40a to when necessary data is specified in the second computer system 40b. It is a sequence diagram which shows the process which expand | deploys the data of a 2nd computer system to a 1st computer system. FIG. 10 is a diagram illustrating a data transfer operation in the second embodiment.

Hereinafter, a computer system according to an embodiment of the present invention will be described with reference to the drawings.

According to the present embodiment, a composite computer system that efficiently loads data from a scale-up base to a scale-out base is provided.

FIG. 1 is a schematic block diagram of a computer system according to the first embodiment.

The computer system of this embodiment is mainly composed of a first computer system 40a and a second computer system 40b. Due to architectural differences, the first computer system 40a is also referred to as a scale-out type base 40a, and the second computer system 40b is also referred to as a scale-up type base 40b.

The first computer system 40 a is mainly composed of a cluster 11 of one or more physical servers 10. The physical server 10 is also referred to as a node, referring to the state of participating in the cluster 11. Each physical server 10 is connected to the network 50a and set so as to be able to communicate with each other. The configuration of the network 50a, each physical server 10, the cluster 11, and the application 30a operating on each node constituting the computer system 40a is controlled by the management computer 200a. The computer system 40a provides a service realized by the application 30a to one or more client computers 250 via the network 50a.

In the first computer system 40a, each physical server 10 participating in the cluster 11 constitutes a distributed system and a distributed storage, and the first computer system 40a is suitable for parallel processing.

Each node 10 can access a storage area physically provided in another node via the network 50a, and can distribute computation processing as necessary.

In addition, when the processing capacity that can be supplied in the cluster 11 is insufficient, the computing capacity and the storage area can be expanded by adding a new node 10. Due to this characteristic, the first computer system 40a is called a scale-out type base 40a.

The second computer system 40b takes a form in which one or more physical servers 20 share the storage apparatus 100 connected via the network 50b. The configuration of the network 50b, each physical server 20, the storage apparatus 100, and the application 30b running on each physical server 20 configuring the computer system 40b is controlled by the management computer 200b. The computer system 40b provides a service realized by the application 30b to one or more client computers 250 via the network 50b.

In the second computer system 40b, the main operation is to reserve a sufficient resource capacity after estimating the required resource amount required by the application in advance. The above-described virtualization technology makes it possible to define a larger resource reservation amount with respect to the total amount of resources that exist physically, thereby improving the resource utilization efficiency of the entire system. At this time, the real resource allocated to a certain application can be expanded in the single capacity up to the reserved amount. The first computer system 40b is called a scale-up type base 40b in the sense that it is expanded by a single resource.

Traditionally, mission critical missions such as customer management systems or production management systems have been built on scale-up platforms 40b, where the upper and lower limits of demand are easy to estimate and reliability and unit performance are required. .

On the other hand, the load on applications such as short-term campaign sites, applications where it is difficult to predict demand, such as changes in end-user activity trends, and applications such as messaging and analysis of employee virtual desktop infrastructure and sensor data occur simultaneously. In recent years, the number of uses that are difficult to accommodate or cost too much with existing scale-up bases, such as concentrated applications, has been increasing. The scale-out type base 40a is attracting attention as a base that can flexibly respond to such a large-scale demand that is drastically changing.

FIG. 2 is a block diagram showing the physical configuration of the first computer system. The main physical components in the first computer system 40 a are one or more physical servers 10. The physical server 10 is used for the purpose of operating the application 30a, the purpose of installing a control unit for realizing the cluster 11 function, or a management program for realizing the management computer 200a. An OS (Operating System) that controls the physical server 10 and a program that realizes the main functions of the application 12 are developed on the memory (main storage device) 22, and the CPU 21 performs calculation processing.

These main processing units of the physical server 10 are connected to processing units such as a network adapter such as the NIC 25 and an input / output device via the data bus 23. Non-volatile input / output devices (storage media) are responsible for device control, such as the disk controller 27, depending on the type of SSD (Solid State Drive) 108, HDD (Hard Disk Drive) 105, flash memory 29, etc. May be connected via a controller. For example, when using PCIe (Peripheral Component Interconnect Express) for the purpose of connecting a plurality of devices to the same data bus 23, the physical server 10 may have a mechanism for switching connection like the PCIe switch 24. Good. Devices that can be used as storage media are not limited to the SSD 108, HDD 105, and PCI connection flash memory 29 shown in the figure, and the configuration such as the type and quantity of media including other media may be changed according to performance and price.

In general, storage media that can be read and written at high speed, such as PCI-connected flash memory and SSD, are used in combination with storage media that have a high price per capacity, while HDDs that are low speed but have a lower price per capacity. There are many cases. Since such cost performance is directly linked to the type of hardware used, it is handled by a concept called a storage tier in which devices are grouped by type.

The physical server 10 is connected to the network 50a by the NIC 25 included in each physical server. The network 50a only needs to connect the physical server 10 and the management computer 200a to each other, and may be configured by one or a plurality of network switches. The network 50a may use any protocol as long as the network 50a does not depend on the physical configuration shown in FIG. For example, another protocol such as Infini-Band may be used. Generally, these devices are physically connected to a plurality of lines as shown in the figure in order to cope with an apparatus failure in the network 50a.

FIG. 3 is a block diagram showing a logical configuration of the first computer system.

As described above, the physical server 10 is mainly used as a distributed processing node including the hypervisor 31, but in addition, it can be used as a management computer 200a or a storage appliance 36c by installing necessary software. Here, a plurality of physical servers 10 in the cluster 11 have a function of forming a single file system in cooperation with each other, and it can be said that the first computer system 40a is a distributed storage type system.

A physical server 10 that is used as a distributed processing node is a so-called virtual machine host having a hypervisor 31. The hypervisor 31 is a general virtualization mechanism, also called a virtual machine monitor, and operates a plurality of guest OSs 32 on one physical server by simulating virtual hardware by software. . An area partitioned by the hypervisor 31 for each guest OS 32 used by the user is called a virtual machine, in particular, an instance 35 when used for a service for the user, and the application 30a uses a virtual resource controlled by the guest OS 32. And operate.

More specifically, the hypervisor 31 uses a higher-order interrupt instruction (hypervisor call) than a general OS (guest OS 32) controls the hardware, for example, the CPU 21 by time division scheduling or address conversion. Multiple devices such as the memory 22 are multiplexed. There are a variety of mounting forms, such as a hypervisor 31 that operates between physical hardware and the OS, and a built-in firmware that controls a physical device. Due to the function of the hypervisor 31, virtual resources are allocated to each instance 35, and it is apparently not limited by the quantity or capacity of physical devices included in the physical server 10.

The distributed storage controller 36a controls the internal storage 37 (for example, the SSD 108, the HDD 105, the flash memory 29, etc.) and cooperates with the distributed storage controller 36a operating on another distributed processing node via the network 50a to realize distributed storage. It is a control unit. All instances 35 on which the application 30 a operates access the storage area via the distributed storage controller 36 a arranged on the same physical server 10.

The distributed file system is configured by the cooperation of the distributed storage controllers 36a in the plurality of distributed processing nodes 10. With the distributed file system, the guest OS 32 in one instance 35 can access the internal storage 37 stored in another physical server 10 in the same cluster 11.

Generally, a file system has metadata that expresses a storage destination address and a file structure separately from a stored data body. In the distributed file system according to this embodiment, all distributed storage controllers 36a in the same cluster 11 synchronize metadata, and all nodes share storage resources. With this function, the instance 35 can logically use the total capacity of the built-in storage 37 in the cluster 11 regardless of the physical server 10 arranged.

Also, the data body is not one, but the replica is placed on another node. Such a redundancy mechanism improves fault tolerance and at the same time improves reading performance. For example, by having certain data in the internal storages 37 of the three distributed processing nodes 10, instances executed in parallel may refer to the internal storage 37 in the vicinity, and one of the internal storages 37 reads and writes due to a physical failure. Even if it becomes impossible, processing can be continued with reference to one of the other two remaining internal storages.

In order to make the data body redundant, the distributed storage controller 36a continuously updates the data body in the other node when data is written in a certain data area. At this time, the data area as the replication source is called a master. Such a replication process in the file system layer is a data management process unique to the scale-out type board 40a, and is a difference from the scale-up type board 40b that performs redundancy in the hardware device layer.

The replication process is performed via the network 50a, and is not synchronized with the master until the distributed storage controller 36a on the other node completes the replication process (writing as a file system has not been completed) ) State. Therefore, the writing process in this type of distributed file system requires a longer time than the reading process. When new data is transferred from the outside of the scale-out type base 40a, the storage area of the data is once secured on the file system, and then the replication process is performed inside the scale-out type base 40a. It is necessary to follow the procedure.

Basically, the data communication between the internal storage 37 and the instance 35 in the same physical server 10 is faster than the case where the physical server 10 is separated. For this reason, when the performance is insufficient, the data on the internal storage 37 of the storage destination is transferred to the internal storage 37 of another node, or is transferred to the node where the data using the instance 35 is stored. Can improve performance. One advantage of the parallel distributed system is that performance and storage capacity are balanced in the entire system by devising the arrangement of these data and processing mechanisms (instances in this embodiment).

As described above, the distributed storage controllers 36a communicate with each other via the network 50a. As a connection protocol at that time, for example, general iSCSI (Internet Small Computer System Interface) connection, NFS (Network File System), or the like is used. Therefore, if the storage apparatus 100 corresponding to the same protocol is prepared in the second computer system 40b, the distributed storage controller 36a is connected to the network 50b of the second computer system 40b via the network 50c, and Can read and write storage resources. Here, the distributed storage controller 36a at the boundary between the first computer system 40a and the second computer system 40b may be referred to as a storage gateway.

Thus, the storage area in the second computer system 40b recognized by the storage gateway is in a readable / writable state from the first computer system 40a. However, when the first computer system 40a performs parallel distributed processing that exhibits its original performance, as described above, a procedure for relocating necessary data to the distributed file system on the first computer system 40a is performed. . In order to distinguish the procedure of placing the data of the second computer system 40b in the distributed file system of the first computer system 40a from the data transfer between the first computer system 40a and the second computer system 40b, This is called “development” of data.

The distributed storage controller 36 a is a virtual machine that operates on virtual resources partitioned by the hypervisor 31. However, unlike the other instances 35, the distributed storage controller 36a does not provide the application 30a to the user. The storage controller 36a is not necessarily a virtual machine, and may be a storage appliance 36c that occupies a single physical server 10 and does not require the hypervisor 31, for example.

In the first computer system 40a, the management computer 200a is composed of a management program group installed on the physical server 10 or the virtual machine. The management computer 200a is mainly connected to each physical server 10 (distributed processing node) and storage appliance 36c via the network 50a, and controls their configuration. Details of the management program constituting the management computer 200a will be described later.

FIG. 4 is a block diagram showing a physical configuration of the second computer system 40b. The main components of the second computer system 40b are one or more physical servers 20, one or more storage apparatuses 100 for operating an application 30b and a management program used by the user, and interconnecting them.

Networks

55 and 65.

The physical server 20 has the same physical configuration as the physical server 10 in the first computer system 40a shown in FIG. However, it may be connected to a SAN (Storage Area Network) 55 for the purpose of increasing the speed and reliability of data communication with the storage apparatus 100. As a more specific implementation example of SAN, there is FC SAN composed of one or a plurality of fiber channel switches and HBA (host bus adapter) 51 for connecting each data input / output device. The HBA is mounted on the physical server 20 or the storage apparatus 100 and connected to another HBA or a fiber channel switch by a port installed on the HBA 51. However, the implementation of SAN 55 is not limited to fiber channel, but may be anything that can achieve the same purpose of large-capacity data communication. For example, it uses another type of device and protocol such as iSCSI, FCoE, and Infini-band. May be.

The storage apparatus 100 has a function of providing storage resources to each physical server 20 and the physical server 10 in the first computer system 40a. The storage apparatus 100 is physically composed of one or more shared storage controllers 150 and one or more storage media 105. With the function of the shared storage controller 150, a logical configuration of storage resources is constructed. The shared storage controller 150 has a general computer architecture, and provides a function specialized for controlling storage resources by a control program group 153b read by the processor (CPU) 151 onto the memory 152.

In this embodiment, the control program group 153 includes at least a response program 153a for returning a response to a data read or write request from the server, and a storage control for controlling the logical storage configuration described above. A program 153b and a storage configuration management table 153c for holding a storage configuration controlled by the storage control program 153b are included. Data communication from the server to the storage resource is performed via the SAN 55 connected by the HBA 51 or the LAN 65 connected by the NIC 155. The storage response program 153a interprets a plurality of protocols according to the communication method to be used. More specifically, for example, a read / write function for storage resources is provided by a fiber channel connection for the HBA 51 on the storage controller 150 and an iSCSI connection or NFS for the NIC 155.

Communication from the management computer 200b to the shared storage controller 150 is mainly transmitted and received by the NIC 155.

In addition, the processor 151 uses the SAS IF 154 for transmitting / receiving data to / from the storage medium 105 or the cache 153, which is a temporary data area installed for the purpose of performing a response to the server at a high speed, to function the shared storage controller 150. Is realized. In particular, the cache 153 is smaller in capacity than the non-volatile storage medium 105, but utilizes the characteristic that data can be read and written at a very high speed. The data requested from the server is stored in the cache 153. The higher the cache hit rate is, the higher the IO performance can be expected.

Reading from and writing to the storage medium 105 from the cache 153 also involves processing costs for the CPU 151 such as calculation of parity values and Read Modify Write, for example. These processing costs vary depending on the IO characteristics such as the IO length or sequential access ratio and the setting of the shared storage controller 150 such as the threshold of the allowable dirty rate that affects the timing of accessing the medium. The IO length here is the data size transferred for each IO command.

The storage control program 153b does not provide the storage medium 105 to the server as it is, but defines a logical data structure on the storage medium 105 to realize efficient operation management of storage resources. The storage control program 153b provides storage resources in the unit of volume 101 to the physical server 20 in the second computer system 40b and the physical server 10 in the first computer system 40a.

In the storage apparatus 100, any volume 101 is defined as a data structure in a RAID (Redundant Arrays of Independent Disks) group 104 composed of a plurality of storage media (HDDs in this case) 105. However, the storage resource providing method is roughly divided into a method (physical volume) in which the volume 101 defined directly from the RAID group 104 is provided to the server, and a storage unit 106 that is once incorporated into another structural unit. There are two types of methods of providing a volume (virtual volume 107) allocated from the pool 106 to the server.

The latter is, for example, for realizing a storage capacity virtualization function in which only the capacity in which data is actually written is allocated from the storage pool 106 while recognizing it as a virtually defined volume 107 for the physical server 20. Necessary data structure. When a plurality of shared storage controllers 150 are mounted on one storage device 100, the plurality of shared storage controllers 150 cooperate with each other through a data bus 156 that connects the shared storage controllers 150 to each other. Control is performed so that no inconsistency occurs in the data area. The storage control program 153 b further has a function of defining a path that is a logical connection from the volume 101 to the provision destination physical server 20 or creating a copy of a specific volume in the storage apparatus 100.

The above-described data structure controlled by the storage control program 153b is held by the storage configuration management table 153c. A specific example of the storage configuration management table 153c is shown in FIG.

In the present embodiment, the table 153c indicates an identifier (volume ID) 501 unique to a single storage apparatus 100 assigned to the volume 101, a volume type 502, a capacity 503, and a logical connection. A path 504, a connection destination ID 505 indicating a storage resource providing destination, and a medium 506 constituting the volume are held.

As described above, in the case of a physical volume directly defined from the RAID group 104, “physical” is described in the type field 502, and in the case of the virtual volume 107 assigned from the storage pool 106, “virtual” is described. .

The media field 506 is information indicating the allocation source. The identifier of the storage pool 106 is described in the case of a virtual volume, and the storage tier and identifier of the RAID group 104 are described in the case of a physical volume. Storage tiers are assigned numbers such as “T1”, “T2”, and “T3” because of their high speed.

When the physical volume is a component of the storage pool 106, the connection destination ID field 505 describes the identifier of the storage pool 106. When the volume is provided to the physical server 20 in the second computer system 40b, the connection destination ID field 505 describes the identifier of the physical server 20. When the volume is provided to the physical server 10 in the first computer system 40a, the connection destination ID field 505 describes “External” in the sense that it is not included in the second computer system 40b. The When creating a copy of the specific volume (backup volume 108) by the storage control program 153b, the connection destination ID field 505 holds the volume ID of the volume that is the copy source.

FIG. 6 is a block diagram showing a logical configuration of the second computer system 40b. As shown in the figure, in the second computer system 40 b, a plurality of physical servers 20 that operate the application 30 b operate by sharing a smaller number of storage devices 100 than the physical servers 20. A network 50b that interconnects these devices is physically implemented as a network 65 or a network 55.

In the system of this mode, since a large amount of data copy or data capacity aggregation can be realized by a function mounted on the dedicated storage apparatus 100 side, the application 30b can efficiently use the capability provided by the physical server 20. . At this time, it can be said that the second computer system 40b uses a shared storage type storage subsystem.

In this computer system 40b, the physical server 20 is operated in two ways: a bare metal host and a virtual machine host. When used as a bare metal host, an OS 32 that directly controls physical processing resources is running on the physical server 20, and an application administrator or user directly uses this as the guest OS 32 to directly The logical configuration of the host 20 can be controlled.

Similarly to the first computer system 40a, the user may refer to an area where the application 30b is operated and is called an instance 35. At this time, the guest OS 32 directly reads from and writes to the volume 101 in the storage apparatus 100.

On the other hand, in the case of a virtual machine host, the hypervisor 31 is operated on the physical server 20, and the physical resources of the physical server 20 are partitioned into virtual resources. At this time, as with the first computer system 40a, the guest OS 32 is logically operating in another partition, and the user configures an instance 35 that operates the application 30b.

According to a general implementation example of the hypervisor 31 on the virtual machine host, a file system is further defined for the volume 101 in the storage apparatus 100, and the guest OS 32 recognizes a file called the virtual disk 102 as a storage area. Therefore, the guest OS 32 recognizes the virtual disk 102 as the virtual volume 103 and performs reading and writing. Thereby, for example, even when the hypervisor 31 migrates the virtual disk 102 to another volume 101, it is possible to read and write continuously without changing the setting of the guest OS 32.

In the second computer system 40b, the management computer 200b is constructed by a management program group operating on the physical server 20 or a virtual machine. The management computer 200b is connected to each physical server 20 and the storage apparatus 100 or a network switch constituting the network 55 through the network 50b, and manages the respective configurations. Details of the management program constituting the management computer 200b will be described later.

FIG. 7 is a diagram showing a configuration of a management program that runs on the management computer 200. The management program on the management computer 200 is composed of a plurality of modules. The role of each module of the management program will be described below. Management programs having similar functions in the first computer system 40a and the second computer system 40b give the same name, but as shown in FIG. It may be distinguished whether the program is on the first computer system 40a or the second computer system 40b.

The management computer 200a that manages the first computer system 40a includes an application configuration management unit A 201a, a task execution management unit 202, a data arrangement management unit 204, and a data linkage control unit A 205a that are configured by modules of a management program. . Each of these modules may be operated simultaneously as long as consistency is not impaired, or may be distributed on a plurality of physical servers.

Application configuration management unit A 201 a manages the configuration of the application 30 a running on the cluster 11. The application 30a operates using a function provided by the OS and a computing resource provided by the first computer system 40a. The application configuration management unit 201a holds these settings, and changes the settings and reflects them in the system according to the user's specification. The settings managed by the application configuration management unit A 201a include, for example, the type of software that configures the application, the identifier that uniquely identifies the installation destination resource, the access control information of the user or processing process that uses the application, and the resource allocation amount This refers to the parameters to be executed and the operating status of the application.

The task execution management unit 202 has a function of dividing a series of processes (jobs) into processes called tasks according to the designation of the application configuration management unit A 201a and arranging them in each physical server 10. For example, when the job requested by the user is MapReduce processing that handles a large volume of data set, it is more detailed such as allocation for each distributed processing node in consideration of the dependency between data and processing, or aggregation of calculation results for each node. It is divided into processing steps of fine granularity. Here, an execution unit like the processing step is called a task.

¡To place tasks on each node, at least a schedule for determining the task execution order and an algorithm for determining the placement destination node for each task are implemented. In the scale-out type base 40a, it is necessary to consider the tradeoff between the processing performance realized by parallel execution and the amount of consumed resources, and the redundancy to cope with hardware failure and logical failure. Each process is asynchronously and dynamically arranged by 202. For this reason, the task arrangement method is not constant, and is not deterministic depending on the operating state of the scale-out type base board 40a. Information on tasks managed by the task execution management unit 202 is held in a task management table 203 included in the task execution management unit 202.

FIG. 8 is a diagram showing a specific example of the task management table 203. The job accepted by the task execution management unit 202 is divided into tasks. Each task is managed by a set of job ID 203a and task ID 203b in the table. Each task is determined according to the execution order and the efficiency of parallelization, and is held as a record in the task management table 203 according to the order in which the tasks should be arranged. Hereinafter, a record held in the task management table 203 may be referred to as task execution information.

Priority is given to each task, for example, a small number is assigned from the highest priority. The priority is held in the priority field 203c.

Also, the number of distributed processing nodes or the number of processes arranged simultaneously is managed by the multiplicity 203d.

When the placement destination node 10 is determined by the task execution management unit 202, an identifier for identifying the node is recorded in the placement node ID field 203e.

Also, an area necessary for the task to read and write data is called a directory here, and its identifier (path) is held in the directory field 203f. When a data usage method is defined for the same directory, it is stored in the task management table 203 as an attribute 203g.

More specifically, the data utilization method indicates that the read-only data “R” and the write-only data “W” are both indefinite data “R / W”. Further, if the job designates an external data source not included in the first computer system 40a, specifically, data included in the second computer system 40b, the job is external data. “E” is added to indicate “R (E)”, “W (E)”, and the like.

The data arrangement management unit 204 manages the identifier of the storage device and the address on the device that are actually assigned to the directory required for task execution. Even if the data allocation management unit 204 manages data allocation (volume identifier such as device identifier or LUN (Logical Unit Number) or block address on the device), the data used by which application and in which task Whether it is an area cannot be grasped alone.

The data arrangement management unit 204 manages the data arrangement by the data arrangement management table 206, and associates the directory included in the task execution information with the data arranged in each distributed processing node 10.

A specific example of the data arrangement management table 206 is shown in FIG. In this embodiment, the data arrangement management table 206 includes a directory 206a indicating a logical data storage destination for each task, an assignment flag 206b indicating whether or not task assignment is received by the task execution management unit 202, and the data A master flag 206c indicating whether or not the area is a master, a state 206d indicating whether or not the data area can be used from a task, and a placement destination device 206e indicating the internal storage of the placement destination are held.

When the usage method from the task can be determined (for example, when the attribute 203g is held in the task management table 203), the data arrangement management table 206 may be further provided with an attribute 206f. The value of the attribute 206f conforms to the attribute 203g held in the task management table 203.

As a representation format of the placement destination device 206e, a combination of a node name and a drive number is used so that it is at least unique within the cluster 11 to which the node belongs. However, for external devices, a name other than the node name, for example, the name “G01” of the storage gateway is used in the sense that it is not secured in the internal storage of any node. Further, the media tier type may be added to the placement destination device 206e in the form of “(T1)” for the storage controller 36a to use for control.

Actually, since the storage controller 36a is responsible for processing such as securing, deleting, duplicating, and transferring the storage area, the master 206c and the state 206d follow the metadata held by the storage controller 36a on each node.

The data linkage control unit A 205a particularly controls the exchange of data held by the first computer system 40a and the second computer system 40b. The data linkage control unit A 205a communicates with a management program group operating on the other management computer 200a, and can interpret the configuration managed by each management program. Further, management information can be transmitted / received to / from the data linkage control unit B 205b operating in the second computer system 40b, and a linkage function characteristic of the present invention is realized.

Application configuration management unit B 201b manages the configuration of the application 30b running on the second computer system 40b.

The resource configuration management unit 207 manages the configuration of resources necessary for the operation of the application 30b. In the second computer system 40b, a management method in which resource design is performed in advance and necessary resources are reserved is generally used. Therefore, the second computer system 40b is static in the sense that there is less change compared to the first computer system 40a. is there.

FIG. 10 is a diagram illustrating an example of a resource configuration managed by the resource configuration management unit 207. A specific example of the resource configuration managed by the resource configuration management unit 207 will be described using the resource configuration management table 208 of FIG.

In the resource configuration management table 208, for each instance identifier 208a, a type 208b indicating whether it is a virtual machine or a bare metal host, a physical server identifier 208c indicating a working physical server, and a storage resource (volume 101) to be used are uniquely specified. A set of the target ID 208e and the LUN 208f specified in FIG.

Furthermore, a partition 208g indicating a mount point managed by the guest OS 32 is held. In general, since storage requirements differ for each type of file used by the application 30b, the resource configuration management unit 207 adjusts the storage destination device by the partition 208g.

More specifically, when a database is assumed as an example of the application 30b, a large-capacity partition “/ dev / db” for storing the data body and an update log before the transaction processing is completed are stored. A partition with high write performance “/ dev / log” is prepared separately, and the volume to be used is separated. When data needs to be quiesced, such as during backup acquisition, writing to the partition is stopped, and the state 208d at this time is expressed as “locking”, for example. The resource configuration management unit 207 records these configuration information in the resource configuration management table 208 when creating each instance, and updates the table when the configuration is changed.

The data cooperation control unit B 205b operates in cooperation with the data cooperation control unit A 205a operating in the first computer system 40a, and transfers data held by the second computer system 40b to the data cooperation control unit A 205a.

As with the first computer system 40a, the second computer system 40b is in operation at any time as a whole. Therefore, when the first computer system 40a requests the data in the second computer system 40b, if the data is staticized or transferred in error, a service response is sent to the end user of the second computer system 40b. There may be disadvantages such as delay or loss.

In this embodiment, data transfer management is performed so that both the data and the processing performance of both computer systems can be utilized while minimizing such a decrease in service level.

Here, the transfer of necessary data from the second computer system 40b to the first computer system 40a according to the job and task executed on the first computer system 40a is taken as an example, and data transfer management is performed. A method will be described.

The processing related to data transfer is roughly divided into two processes. The first process is a process for grasping the correspondence between the data arrangement recognized by the first computer system 40a and the data arrangement recognized by the second computer system 40b. This process will be described later with reference to FIG. The second process is a process of determining demand in the first computer system 40a, transferring data from the second computer system 40b to the first computer system 40a, and expanding the data in the first computer system 40a. is there. This process will be described later with reference to FIG.

First, the first process will be described.

FIG. 11 is a sequence diagram showing a flow from when a user submits a job in the first computer system 40a to when necessary data is specified in the second computer system 40b.

In the first computer system 40a and the second computer system 40b, different logical addresses are assigned even if the actual location of the data is the same. In order to realize data cooperation as a target in the present embodiment, a means for grasping the correspondence relationship of data arrangement between computer systems having different architectures is required.

In step 701, the user (job requester 700) submits a desired job from the client computer 250 to the application configuration management unit A 201a. The application configuration management unit A 201a responds to the client computer 250 that the job requested by the job requester 700 has been received if there is no problem in setting. At this time, when the job requester 700 uses an external data source that is not in the first computer system 40a, the identifier is designated in the job program or directory. As a format of the identifier, a format in which the identifier is represented by a URI (Uniform Resource Identifier), a database connection string (connection string), an NFS mount point, or a path pointing to a partial area in the storage gateway is used.

In step 702, the application configuration management unit A 201a newly registers the job submitted by the job requester 700 in the task execution management unit 202. The task execution management unit 202 divides the job received from the application configuration management unit A 201a into a plurality of tasks, and holds it in the task management table 203 (step 704). As a result, when there is an inquiry from another management program, the task execution information is managed so that the execution status and schedule of the job and related tasks can be transmitted (step 703).

On the other hand, the data linkage control unit A 205a monitors whether a new job has been input by a method such as periodically checking the application configuration management unit A 201a. If it is confirmed in step 705 that a new job has been submitted, the data linkage control unit A 205a searches the task management table 203 of the task execution management unit 202 and acquires task execution information of the task corresponding to the job. . This task execution information includes information such as a priority and a directory for a task to be executed. The data linkage control unit A 205a performs a task for accessing data stored in the second computer system 40b (with the attribute 203g indicating that the directory to be used does not exist in the first computer system 40a). Determine.

Further, the data linkage control unit A 205a inquires of the data arrangement management unit 204 about the directory arrangement location that is determined not to exist in the first computer system 40a, and from which device the data arrangement management unit 204 is reading the data source. Is determined (step 706). More specifically, with reference to the value of the field of the placement destination device 206e in the data placement management table 206, for example, a directory described as a storage gateway (G01, etc.) is scanned, and the field of the placement destination device 206e of the corresponding directory is scanned. The device identifier (D101 (T1), etc.) included in is acquired. The data cooperation control unit A 205a combines the obtained device identifier and task execution information and notifies the data cooperation control unit B 205b on the second computer system 40b.

The task execution information transmitted to the data linkage control unit B 205b includes an address (device information) given by the first computer system 40a to the resource in the second computer system 40b. As described above, for example, a Web application running on the first computer system 40a connects to a database in the second computer system 40b, a connection character string, an NFS mount point, or an iSCSI connection IQN ( This includes the iSCSI Qualified Name) and the SCSI ID (a set of target ID and LUN, etc.).

Since such access information (address) is given to the external resource by the first computer system 40a, it is common that another address is given in the second computer system 40b. is there. Therefore, in step 707, the data linkage control unit B 205b needs to read this access information into an address in the second computer system 40b.

For this reason, the data linkage control unit B 205b acquires the corresponding storage resource address (target ID and LUN pair) and usage status from the resource configuration management unit 207 (step 708), and further makes an inquiry to the shared storage controller 150. (Step 709), the storage configuration of the storage area storing the necessary data is uniquely identified.

More specifically, as a series of processing of these management program groups, first, from the connection destination ID 505 of the storage configuration management table 153c, it is provided to the outside, that is, the first computer system 40a in this embodiment. Search for. Further, a set of a connection destination ID 505 and a path 504 that uses the same volume 501 as the path 504 provided to the outside is specified. Further, the resource configuration management table 208 is searched by the set of the connection destination ID 505 and the path 504, and the instance ID 208a of the physical server to which the corresponding device belongs and the use status 208d of the device are specified.

For example, in FIG. 5, the volume having the connection destination ID 505 described as “External”, that is, the volume ID “VOL201” to which the path connected to the outside belongs has “Host 01” as the other connection destination ID, and the target They are connected by ID “TRG0” and LUN “LUN3”. At this time, in FIG. 10, when searching for a place connected with the target ID “TRG0” and LUN “LUN3”, the physical server ID 208c is “Host 01” and the instance ID 208a is “INS-01-032”. It can be seen that the device with the target ID 208e “TRG0” and LUN 208f “3” corresponds to it.

FIG. 12 is a sequence diagram showing a process for expanding the data of the second computer system into the first computer system. In the figure, when the task execution time is approaching, necessary data is transferred from the second computer system 40b to the first computer system 40a, and the data is expanded in the distributed storage system of the first computer system 40a. A series of processes to be performed is shown. It is assumed that the correspondence between data in the first computer system 40a and the second computer system 40b can be grasped by the processing in the previous stage shown in FIG.

In step 710, the data linkage control unit A 205 a on the first computer system 40 a requests and acquires the latest task execution information from the task execution management unit 202, and refers to the task execution information to place the task. The node 10 to be checked is checked (step 711). Furthermore, the data linkage control unit A 205a requests the data linkage control unit B 205b on the second computer system 40b to start data transfer. Further, the data linkage control unit A 205a inquires of the data arrangement management unit 204 about the data arrangement based on the directory path of the task, the arrangement destination device 206e serving as the load destination, the arrangement destination of the master, and the status 206d of the expansion process Such information is acquired (step 712).

In step 713, the data linkage control unit B 205b on the second computer system 40b receives a data transfer request and starts securing resources for performing the data transfer. At this time, the data linkage control unit B 205b first requests the resource configuration management unit 207 to acquire the lock of the corresponding volume 101 so that it is not accessed by the application of the other second computer system 40b (step 714). If the lock acquisition is successful, the data linkage control unit B 205b instructs the shared storage controller 150 in the second computer system 40b that manages the data of the volume to transfer the data.

The shared storage controller 150 actually waits for a request from the first computer system 40a in accordance with the data transfer instruction (step 715).

However, when the backup volume 108 as an alternative exists in the accessed volume 101, a path for connecting the backup volume 108 to the outside is set and the lock of the backup source volume 101 is exceptionally released, etc. Preparation is performed (step 716).

On the other hand, in step 717, the data linkage control unit A 205a uses a specific algorithm as to how to arrange (expand) the data scheduled to be transferred from the second computer system 40b on the first computer system 40a. Therefore, calculate. More specifically, as an algorithm for calculating this expansion method, a placement destination device 206e having the highest response speed is selected for the distributed processing node 10 where the task is placed, or a plurality of distributed processing nodes 10 are selected. A node having a low processing load at that time is selected from among the distributed storage controllers 36, or a storage tier (SSD 108, HDD 105, or flash 29) on the node 10 is selected according to the task priority. Can be considered.

The processing load relationship between the storage controller 36 and the distributed processing node can be controlled by aggregating the placement destination device 206e described in the data placement management table 206 and the placement node 203e described in the task management table 203. . For example, if the placement destination device 206e is specified so that the master or copy data accessed by the task is expanded in the internal storage of the distributed processing node where the task is placed, the IO response can be speeded up.

The data linkage control unit A 205a determines the transfer order and expansion method for the transfer target data by the above algorithm and instructs the data arrangement management unit 204 to expand the data. Upon receiving the instruction, the data arrangement management unit 204 instructs the corresponding storage gateway to acquire data (step 718). Data acquisition here is data copy or reading.

The distributed storage controller 36 in the first computer system 40a requests the shared storage controller 150 in the second computer system 40b, and the shared storage controller 150 receives the request and transfers data to the distributed storage controller 36. Thereby, the distributed storage controller 36 in the first computer system 40a acquires the data from the shared storage controller 150 in the second computer system 40b via the storage gateway. The acquired data is expanded to the internal storage 37 of each node 10 in accordance with designation from the data arrangement management unit 204.

After the data transfer is completed, the data linkage control unit B 205b of the second computer system 40b instructs the resource configuration management unit 207 to release the lock (step 720), and the data linkage control unit A 205a of the first computer system 40a. Return control to.

According to the present embodiment, based on job and task execution information defined in the scale-out type base (first computer system 40a), the data is efficiently scaled up (second computer system 40b). And a computer system are provided. As a result, the amount of data communication and the storage capacity between the scale-out type base and the scale-up type base can be reduced, and the time required for executing the processing across the two can be reduced. As a result, the user of the computer system can enjoy the benefits of reducing the cost per processing and using the collected and stored data more effectively.

Further, according to the present embodiment, the compound computer system has one or more (preferably a plurality (preferably the same below)) distributed storages constituting the resource pool, respectively, and executes parallel distributed processing of jobs. The first computer system 40a having a distributed processing computer (distributed processing node) and the second computer system in which one or more computers share a shared storage and are connected to the first computer system 40a to transmit and receive data. 40b. The first computer system 40a divides the job into one or more tasks, determines a method for expanding the data accessed by the task to the distributed storage, and requests the data from the second computer system 40b. When the second computer system 40b transmits the data to the first computer system 40a, the first computer system 40a expands the received data in the distributed storage by the expansion method. According to this, the distributed storage in the first computer system 40a is expanded in a manner suitable for the first computer system 40a, which is a scale-out type computer system, from the data of the second computer system 40b, which is a scale-up type computer system. Therefore, suitable data linkage between the scale-up computer system and the scale-out computer system can be realized.

In the present embodiment, the second computer system 40b determines the logical address (first logical address) in the first computer system 40a and the logical address (second logical address) in the second computer system 40b. The management information (storage configuration management table 153c, resource configuration management table 208) to be associated is held in advance. The first computer system 40a transmits the first logical address of the data accessed by the task to the second computer system 40b. The second computer system 40b identifies the second logical address of the data based on the received first logical address and management information, acquires the data using the second logical address, and obtains the first computer Send to system 40a. According to this, since the second computer system 40b that is a scale-up type computer system can be associated with the logical address of the first computer system 40a that is a scale-out type computer system, the scale-up type computer system Data linkage is possible even if different logical addresses are used in the scale-out computer system.

In the present embodiment, when a task execution request occurs, the first computer system 40a requests preparation for transmitting data to the second computer system 40b before starting the calculation of the data expansion method. To do. When the second computer system 40b receives a request for preparation, the second computer system 40b prepares for data transmission and waits for a data transmission request from the first computer system 40a. According to this, since the second computer system 40b can prepare to transmit data while the first computer system 40a is calculating the expansion method, the first computer system 40a and the second computer system 40a The data linkage of the computer system 40b can be efficiently performed.

Further, in the present embodiment, the first computer system 40a determines the expansion method so that the data accessed by the task is arranged in the distributed storage in the vicinity of the distributed processing computer that executes the task. According to this, in the scale-out type computer system, the distributed processing computer and the distributed storage accessed by the distributed processing computer can be arranged close to each other, so that the storage can be accessed efficiently.

Further, in this embodiment, the first computer system 40a may determine the expansion method so that the data accessed by the task is arranged in the distributed storage of the distributed processing computer that executes the task.

According to this, in the scale-out type computer system, the storage accessed by the distributed processing computer can be arranged inside the distributed processing computer, so that the storage can be efficiently accessed.

Further, in the present embodiment, the first computer system 40a determines a deployment method so that the load is distributed to one or more distributed storages. According to this, in the scale-out type computer system, data is expanded so that the load of the distributed storage is not biased, so that access can be prevented from being concentrated on a specific storage device, and access can be performed efficiently. .

In this embodiment, when a job is submitted, the first computer system 40a transmits the first logical address of the data accessed by the task to the second computer system 40b, and the second computer system 40b identifies the second logical address of the data based on the received first logical address. When a task execution request is generated, the first computer system 40a determines a method for expanding the data accessed by the task to the distributed storage, requests the data from the second computer system 40b, and the second computer system 40b. Acquires the data using the second logical address and transmits the data to the first computer system 40a. The first computer system 40a distributes the data transmitted from the second computer system 40b according to the expansion method. Deploy to storage and perform tasks. According to this, the logical addresses of the scale-out computer system and the scale-up computer system are associated with each other when a job is submitted, and data is transmitted from the scale-up computer system to the scale-out computer system when a task execution request occurs. Since the task is executed by the scale-out type computer system, the data necessary for the execution of the task can be efficiently and quickly developed in the scale-out type computer system.

In this embodiment, the distributed processing computer (distributed processing node) is a computer having a CPU 21, a memory 22, and a flash memory 29. A distributed storage is configured on the flash memory 29, and the CPU 21 executes a software program. The distributed storage controller 36a realized by this expands the data to the distributed storage by the expansion method instructed by the data arrangement management module (data arrangement management unit 204) of the management computer 200a.

In this embodiment, the type of application executed in each computer system or the specification is not particularly limited, but more specifically, the following utilization examples are given.

In the application example of the compound computer system of this embodiment to the data analysis application, for example, a product transaction transaction stored in the scale-up type base may be used for analyzing the customer consumption tendency executed on the scale-out type base. It is done. At this time, the data linkage control unit A 205a detects a task that is going to refer to past product transaction information, and transfers the transaction data from the scale-up base prior to the analysis processing.

Since necessary data is acquired from the scale-up type base prior to the analysis processing, the scale-out type base can continue the processing without particularly worrying about the location of the data when performing the analysis processing. For example, there is no need to perform processing for waiting for completion of data transfer from the scale-up type base to the scale-out type base while performing analysis processing, or processing for checking whether there is an error in the transferred data. Further, since the data accumulated in the scale-up type base can be analyzed in the scale-out type base, the range of information to be incorporated into the analysis process can be widened, and the analysis accuracy can be improved.

Also, because the scale-up platform data is transferred to the scale-out platform, deployed, and used for analysis processing on the scale-out platform, the time during which data is locked in the scale-up platform can be reduced. it can. For example, when a scale-up base is used as a database for merchandise transactions as a main purpose, it takes less time to stop updating the database.

Further, in the application example of the compound computer system according to the present embodiment to the Web application use, the scale-out type platform is responsible for the processing of distributing the moving image to many clients in parallel, such as image data or a program executed on the client side. An example is that static content is held in a scale-up type base and is used by expanding it to a scale-out type base if necessary.

Specifically, the page to be browsed next is determined when the end user logs in to the Web site. Further, it is determined whether or not the static content of the page in the scale-up type base has been updated, and if it has been updated, the latest static content is transferred from the scale-up type base to the scale-out type base. Processing related to the development and update of static content is performed on a scale-up type platform, and on the scale-out type platform, production of moving image content to be distributed or program debugging is concentrated.

The scale-out-type infrastructure is suitable for processing to send reference-only data to a large number of clients in parallel. In such applications, the scale of deployment can be adjusted according to the amount requested by the clients.

On the other hand, by consolidating the development of static content on a scale-up platform, it is possible to ensure the consistency of the entire website and at the same time suppress unnecessary transfer of content. In addition, for pages that are expected to be accessed by end users in a short period of time, such as campaign sites, it is possible to increase the number of data deployments and flexibly configure the usage to actively increase investment efficiency according to the purpose. Can be easily handled with a scale-up base.

Also, the present invention can be applied to the VDI (Virtual Desktop Infrastructure, virtual desktop platform) of the compound computer system according to the present embodiment. In this application example, for example, a master image for starting a virtual desktop that a user company uses for an employee is created in a scale-up type base board according to the management policy of the head office. In the scale-out type platform, when the employee uses VDI at a remote site constructed in a branch office or the like, an example is considered in which the startup master image is transferred from the scale-up type platform to the scale-out type platform.

At this time, it is also possible to change the master image displayed on the employee's terminal by transferring from the scale-up type base to the scale-out type base according to the order or time when the employee requested the login job on the remote site side. The type of startup master image to be transferred is determined based on the order or timing when the login job is requested, and the data is expanded in the vicinity of the node that operates the virtual desktop. As a result, the startup time or login time on the remote site side can be shortened and the required storage capacity can be reduced. On the other hand, by centrally managing the startup master image on a scale-up platform, security patches can be applied consistently, and predetermined settings can be applied collectively to employee software. Can be done easily. While ensuring the high level of governance, it is possible to suppress the scale-up type base that performs centralized management from becoming a bottleneck.

Example 1 exemplifies a compound computer system that loads data from a scale-up type base to a scale-out type base, but is not limited thereto. As another example, the second embodiment exemplifies a computer system that efficiently loads data from a scale-out type base to a scale-up type base.

As described above, the scale-out type base is good at large-scale parallel distributed processing, and the scale-up type base is good at processing requiring reliability or processing speed by itself. Therefore, it is conceivable that the access log and the unstructured data (natural language text, image data, etc.) are processed on the scale-out base and the result is read by the application on the scale-up base.

The system configuration of the computer system shown in the first embodiment and the management program stored in the computer system are not only transferred from the scale-up base to the scale-out base but also from the scale-out base to the scale-up base. Data transfer is also realized.

More specifically, for example, the data arrangement management table 206 includes the storage gateway 36b in the data development destination node for the data on the first computer system 40a that would be necessary in the second computer system 40b. And the volume 101 secured in the storage device 100 in the second computer system 40b is designated as the data write destination.

At this time, the data cooperation control unit A 205a and the data cooperation control unit B 205b can control the order of executing a plurality of tasks obtained by dividing the job and the timing of transferring data from the storage gateway 36b. Further, data can be efficiently transferred by the control.

The physical configuration and logical configuration of the

computer systems

40a and 40b in the present embodiment are the same as those in the first embodiment. Hereinafter, unless otherwise required, each term in the present embodiment refers to the same term as in the first embodiment.

FIG. 13 is a diagram illustrating a data transfer operation according to the second embodiment.

In this embodiment, an area for transferring data from the first computer system 40a is secured as the virtual volume 107 in the second computer system 40b.

In the second computer system 40b, the data linkage control unit B 205b instructs the shared storage controller 150 to create the virtual volume 107, and further sets a port 732 for release to the first computer 40a.

The data cooperation control unit B 205b of the second computer system 40b cooperates with the data cooperation control unit A 205a of the first computer system 40a, sets a part of the distributed storage controller 36a as the storage gateway 36b, The volume 107 is mapped. As a result, the virtual volume 107 in the first computer system 40 a to the second computer system 40 b can be used as a part of the distributed file system 730.

Consider the case where the above-described virtual volume 107 is provided in addition to the volume 101 used for main processing when the application 30b is constructed in the second computer system 40b. At this time, the data linkage control unit B205b of the second computer system 40b creates a job for the first computer system 40a via the data linkage control unit A205a and inputs it to the application configuration management unit A201a.

Thereafter, similarly to the sequence shown in FIG. 11, the task execution management unit 202 generates a task (step 704), and the task and the placement destination device are set by the data placement management unit 204. Here, the data linkage control unit A 205a searches the task management table 203 for a task in charge of aggregation processing (for example, processing for obtaining the final job result such as logical sum and logical product of the processing results of each task), and stores the data. The directory 203f is specified. Furthermore, the arrangement destination device 206e of the directory 206a held in the data arrangement management table 206 is changed to the device ID of the area for the second computer system 40b managed by the storage gateway 36b via the data arrangement management unit 204. . At this time, even if the device is not a master (an area in which data is directly written from the arithmetic process), the device is replicated to the device by the autonomous operation of the distributed storage controller 36a (aggregation operation 731 in FIG. 13). . In this way, data transfer 733 from the first computer system 40a to the second computer system 40b is performed only when the tabulation process is actually performed.

In the second computer system 40b, by accessing the virtual volume 107, it is possible to refer to the processing results of the first computer system 40a as if the aggregation results were generated in the same system. Further, the second computer system 40b does not always have to update the data being processed for the first computer system 40a. When the series of tasks in the first computer system 40a is completed, the necessary processing result is obtained. Only need to be transferred.

As described above, as described in the present embodiment, the compound computer system not only transfers data from the second computer system 40b to the first computer system 40a but also from the first computer system 40a to the second computer system. Data transfer to 40b is also efficiently realized.

As a more specific application example of this embodiment, for example, an improvement of a server program using a web application process log can be mentioned. A web application server that automatically scales in response to a user request is deployed on the scale-out platform (first computer system 40a), and the development environment for the application is installed on the scale-up platform (second computer system 40b). To construct. The data linkage control unit B 205b inputs a job for extracting the processing steps that are the bottleneck of the whole by collecting the logs of the processing steps of the application and the time required for the processing steps into the application configuration management unit A 201a. At this time, by setting the storage gateway 36b to be included in the placement destination device at the time of data expansion, the aggregation result is loaded into the scale-up type base (second computer system 40b). This presents the processing steps that are expected to provide the highest speed of processing and the improvement of the processing method in the web application to the developer who develops the application server program on the scale-up type platform. be able to.

According to this application example, it is possible to make a highly accurate improvement in that the performance test of the server program is not specially created and executed, and the actual user behavior and the operation of the Web application are reflected. In addition, the data transfer required for this can be minimized.

Hereinafter, the configuration of the second embodiment will be described in an organized manner.

As in the first embodiment, the compound computer system according to the second embodiment includes one or more distributed storages constituting a resource pool and one or more distributed storage controllers 36a that control the distributed storage 37, respectively. The first computer system 40a having one or more distributed processing computers 10 that execute processing and the one or more computers 20 share the shared storage, and are connected to the first computer system 40a to transmit / receive data. A second computer system 40b. The second computer system 40b creates a virtual volume and publishes it to the first computer system 40a. The first computer system 40a makes the virtual volume available by setting at least some of the distributed storage controllers 36a as gateways. Further, the second computer system 40b submits a job to the first computer system 40a. The first computer system 40a processes the job and returns the processing result to the second computer system 40b via the virtual volume.

According to this, the second computer system 40b, which is a scale-up type base, causes the first computer system 40a to execute processing via the virtual volume, so that processing is performed as if using the virtual volume in the same system. In this way, jobs can be executed on a scale-up platform.

In this embodiment, when the second computer system 40b submits a job to the first computer system 40a, the first computer system 40a divides the job into a plurality of tasks and performs parallel distributed processing. Of the plurality of tasks, the location of the directory for storing the processing result of the task that collects the processing results is set in the area of the second computer system 40b managed by the gateway. Furthermore, the second computer system 40b refers to the aggregation processing result of the first computer system 40a by accessing the virtual volume.

According to this, the first computer system 40a divides the job into a plurality of tasks, performs parallel distributed processing, aggregates the processing results of the parallel distributed processing, and collects the aggregated final processing results as the second computer system. Since the reference can be made from 40b, the second computer system 40b can use the resources of the first computer system 40a without being aware of how to process the job in the first computer system 40a.

In the present embodiment, the second computer system 40b inputs a predetermined job included in the application to the first computer system 40a when the application is constructed.

According to this, when an application is constructed in the second computer system 40b, a job that realizes processing suitable for the scale-out type base included in the application is assigned to the first computer system 40a. Each job is executed by a computer system suitable for the job, and the entire application can be executed efficiently. For example, the main processing provides a suitable configuration when large-scale processing that analyzes access logs and unstructured data is included in a part of an application that requires reliability or processing speed. can do.

Further, in the present embodiment, the second computer system 40b has a web application development environment constructed, and the first computer system 40a has an application server for the web application. The second computer system 40b submits a job to the first computer system 40a to analyze a log of executing a Web application and extract a processing step that is a bottleneck. The first computer system 40a extracts the bottleneck processing step and sends it to the second computer system 40a. The second computer system 40b presents the bottleneck processing step to the developer.

According to this, in the first computer system 40a that is a computer system in which a log is generated by the operation of a Web application and is suitable for large-scale processing, the log of the Web application is recorded. Analyzing the bottle net by analysis, it is possible to suppress the transfer of log data between systems and to perform bottleneck extraction by log analysis on a suitable computer system for large-scale processing in web application development it can.

In this embodiment, the job that the second computer system 40b submits to the first computer system 40a counts the time required for each processing step based on the log of the Web application, and determines the bottleneck based on the required time. It is a job to extract the processing step. According to this, processing steps with a long required time can be extracted and presented to the developer, so that it is possible to select a processing step with a long required time from the processing of the Web application and improve them.

The application server deployed in the first computer system 40a is an application server having a function of performing automatic resource scaling, and information indicating processing steps that are highly requested by users and are bottlenecks. Presented to the developer from the second computer system 40b.

According to this, a processing step that is highly requested by the user and that takes a long time is presented to the developer, so the developer selects a processing step that is expected to improve relatively, and efficiently improves it. be able to.

The above-described embodiments are examples for explaining the present invention, and are not intended to limit the scope of the present invention only to those embodiments. Those skilled in the art can implement the present invention in various other modes without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS 10 ... Node, physical server, distributed processing node, 100 ... Storage device, 101 ... Volume, 102 ... Virtual disk, 103 ... Volume, 104 ... RAID group, 105 ... HDD, Storage media, 106 ... Storage pool, 107 ... Virtual volume 108, SSD, backup volume, 11 ... cluster, 12 ... application, 150 ... shared storage controller, 151 ... CPU, processor, 152 ... memory, 153a ... response program, 153b ... storage control program, 153c ... storage configuration management table, 154 ... SAS IF, 155 ... NIC, 156 ... data bus, 20 ... host, 20 ... physical server, 200 ... management computer, 200a ... management computer, 200b ... management computer 201 ... VOL, 201a ... Application configuration management unit A, 201b ... Application configuration management unit B, 202 ... Task execution management unit, 203 ... Task management table, 203a ... Job ID, 203b ... Task ID, 203c ... Priority field , 203d ... multiplicity, 203e ... placement node ID field, 203f ... directory field, 203g ... attribute, 204 ... data placement management unit, 205a ... data linkage control unit A, 205b ... data linkage control unit B, 206 ... data placement management Table 206a ... Directory concerned 206b ... Flag 206c ... Master flag 206d ... Status 206e ... Placement device 206f ... Attribute ... 207 ... Resource configuration management unit 208 ... Resource configuration management table 208a ... Instance ID Instance identifier, 208b ... Type, 208c ... Physical server ID, physical server identifier, 208d ... Use status, 208e ... Target ID, 208f ... LUN, 208g ... Partition, 21 ... CPU, 22 ... Memory, 23 ... Data bus, 24 ... PCIe switch, 25 ... NIC, 250 ... client computer, 27 ... disk controller, 29 ... flash memory, 3 ... LUN, 30a ... application, 30b ... application, 31 ... hypervisor, 32 ... OS, 35 ... instance, 36 ... distributed storage Controller, storage controller, 36a ... storage controller, distributed storage controller, 36b ... storage gateway, 36c ... storage appliance, 37 ... built-in storage, 40 ... scale-out type base, computer system, 40b ... scale-up type base, computer system, 501 ... volume, 502 ... type, 503 ... capacity, 504 ... path, 505 ... connection destination ID, 506 ... media, 50a ... network, 50b ... Network, 50c ... Network, 51 ... HBA, 55 ... SAN, Network, 65 ... LAN, Network, 700 ... Job requester, 730 ... Distributed file system, 731 ... Aggregation operation, 732 ... Port, 733 ... Data transfer

Claims

A first computer system having one or more distributed processing computers each having one or more distributed storages constituting a resource pool and executing parallel distributed processing of jobs;
One or more computers share a shared storage and have a second computer system connected to the first computer system to transmit and receive data;
The first computer system divides the job into one or more tasks, determines a method for expanding the data accessed by the task to the distributed storage, and requests the data from the second computer system. ,
The second computer system transmits the data to the first computer system;
The first computer system expands the received data to the distributed storage by the expansion method;
Compound computer system.
The second computer system holds in advance management information that associates the first logical address in the first computer system with the second logical address in the second computer system,
The first computer system sends a first logical address of data accessed by the task to the second computer system;
The second computer system specifies a second logical address of the data based on the received first logical address and the management information, and acquires the data using the second logical address. To the first computer system
The compound computer system according to claim 1.
The first computer system requests preparation for transmitting data to the second computer system before starting the calculation of the data expansion method when the task execution request occurs,
When the second computer system receives the preparation request, the second computer system prepares the data transmission and waits for the data transmission request from the first computer system.
The compound computer system according to claim 1.
The said 1st computer system determines the said expansion | deployment method so that the said data accessed by the said task may be arrange | positioned to the distributed storage near the distributed processing computer which performs the said task. Complex computer system.
The composite computer according to claim 4, wherein the first computer system determines the expansion method so that the data accessed by the task is arranged in a distributed storage included in a distributed processing computer that executes the task. system.
The composite computer system according to claim 1, wherein the first computer system determines the deployment method so that a load is distributed to one or more of the distributed storages.
When the job is submitted, the first computer system transmits a first logical address of data accessed by the task to the second computer system, and the second computer system receives the first logical address. Identifying a second logical address of the data based on the first logical address;
When the task execution request occurs, the first computer system determines the expansion method of the data accessed by the task to the distributed storage, requests the data from the second computer system, and The second computer system acquires the data using the second logical address and transmits the data to the first computer system, and the first computer system is transmitted from the second computer system. Deploying the data to the distributed storage by the deployment method and executing the task;
The compound computer system according to claim 1.
The first computer system and the second computer system are connected, and the second computer system shares one or more computers with a shared storage, and the first computer system cooperates with the second computer system. In the first computer system of the compound computer system that executes jobs, one or more distributed processing computers each having one or more distributed storages that constitute a resource pool and executing parallel distributed processing of the jobs are provided. A management computer to manage,
A memory for storing software programs;
A CPU for executing a software program of the memory,
The software program includes
A task execution management module that divides the job into one or more tasks;
A data linkage control module for determining a method of expanding data accessed in the task to the distributed storage;
A data placement management module that instructs the distributed processing computer to expand the data to the distributed storage by the expansion method;
Management computer that contains.
When the execution request of the task occurs, the data linkage control module requests preparation for transmitting data to the second computer system before starting calculation of the expansion method of the data.
The management computer according to claim 8.
The said data cooperation control module determines the said expansion | deployment method so that the said data accessed by the said task may be arrange | positioned to the distributed storage near the distributed processing computer which performs the said task. Management computer.
The management computer according to claim 10, wherein the data linkage control module determines the expansion method so that the data accessed by the task is arranged in a distributed storage included in a distributed processing computer that executes the task. .
The management computer according to claim 8, wherein the data linkage control module determines the deployment method so that a load is distributed to one or more of the distributed storages.
The data linkage control module is
When the job is submitted, the first logical address of the data accessed by the task is transmitted to the second computer system, and the second logical address of the data is set based on the first logical address. To identify
When an execution request for the task occurs, the method for expanding the data accessed by the task to the distributed storage is determined,
The data arrangement management module is
Requesting the data from the second computer system, causing the data to be acquired using the second logical address, and transmitting the data to the distributed processing computer;
The management computer according to claim 8.
A plurality of distributed processing nodes that each have a plurality of distributed storages constituting a resource pool and a distributed storage controller that controls the distributed storages, and executes parallel distributed processing of jobs, a task execution management module, and a first data linkage control module And a first management computer that executes a software program including a data arrangement management module, and
A second program that executes a software program that includes a shared storage having a shared storage controller, a plurality of physical servers that share and use the shared storage and execute processing, a second data linkage control module, and a resource configuration management module A second computer system having a management computer, and
(1) When specifying data to be accessed,
In the first computer system,
When the job is submitted, the application configuration management module sends the job execution request to the task execution management module,
The task execution management module that has received the job execution request divides the job into a plurality of tasks,
The data linkage control module periodically checks whether a new job has been submitted to the application configuration management module, and if a new job has been submitted, includes a directory for accessing a task corresponding to the job The task execution information is acquired from the task execution management module, the information of the device in which the directory exists is acquired from the data arrangement management module, and if the directory is external, it is given by the first computer system. The task execution information including access information to the device where the directory is arranged is transmitted to the second computer system,
In the second computer system,
When the second data linkage control module receives the task execution information, the access information in the second computer system corresponding to the access information in the first computer system is acquired from the resource configuration management module. And inquires of the shared storage controller to identify the area of the shared storage that stores the data,
(2) When expanding the data to be accessed,
In the first computer system,
The first data linkage control module acquires task execution information from the data arrangement management module, and requests the second data linkage control module to transfer data accessed by the task based on the task execution information. And obtaining the data load destination information from the data arrangement management module,
In the second computer system,
The second data linkage control module requested to transfer the data causes the resource configuration management module to lock the volume corresponding to the data, and instructs the shared storage controller to transfer the data,
The shared storage controller instructed to transfer the data prepares to transfer the data,
In the first computer system,
The first data linkage control module calculates a data expansion method based on the data load destination information, and obtains the data and expands to the load destination by the expansion method to the data arrangement management module. Direct,
The data placement management module instructs the distributed storage controller that controls the load-destination distributed storage to acquire and deploy the data;
The distributed storage controller acquires and expands the data from the shared storage controller;
In the second computer system,
The shared storage controller that has given the data to the distributed storage controller notifies the second data linkage control module that transfer has been completed,
The second data linkage control module releases the lock and notifies the first data linkage control module of the transfer completion;
Compound computer system.