WO2013073005A1

WO2013073005A1 - Computer system and duplication control method

Info

Publication number: WO2013073005A1
Application number: PCT/JP2011/076283
Authority: WO
Inventors: 浩也松葉; 鵜飼　敏之
Original assignee: 株式会社日立製作所
Priority date: 2011-11-15
Filing date: 2011-11-15
Publication date: 2013-05-23
Also published as: JPWO2013073005A1; JP5731665B2

Abstract

In order to provide a function with which it is possible to reduce the amount of storage capacity consumed when a duplicate is saved in a substitute computer in preparation for malfunctions in a computer that executes jobs, and with which the job can be restored in a re-executable state when a malfunction occurs, a computer system, wherein multiple computers that execute jobs formed by dividing one process into multiple parts are connected via a network, is characterized in that the multiple computers include a first computer that executes jobs, and a second computer that saves duplicate data of the data stored in the first computer, and in that the computer system is equipped with a duplication control unit that creates duplicate data for the data that is written to a storage medium of the first computer by means of a job, and that writes the duplicate data to a storage medium of the second computer under prescribed conditions.

Description

Computer system and replication control method

The present invention relates to a distributed shared file system constructed using a plurality of computers. In particular, the present invention relates to a computer for managing duplicate data and a method for managing duplicate data.

When performing a large amount of data processing using a computer, the entire target processing may be divided into multiple jobs, and each job may be executed in parallel by multiple computers to speed up the processing. In such a case, the batch job system determines the assignment of computer resources to each job.

Since the batch job system dynamically assigns computers while monitoring the progress of processing, the computer to which each job is assigned is not fixed until immediately before the execution of each job. For this reason, files that store job input/output data may be placed on a distributed shared file system so that the same file can be used by all computers.

In a distributed shared file system composed of disk drives built into a plurality of computers, a failure may occur in a computer that is executing a job among the computers constituting the file system.

In this case, an alternative computer needs to access the information that was stored in the failed computer and re-execute the job that was being executed on that computer. In other words, the alternative computer must discard all changes made to the files up to the occurrence of the failure and then re-execute the job. This is to prevent problems such as the result of the job executed by the computer being overwritten before the failure occurs.

A general method for handing data over to an alternative computer is to store duplicate data in the alternative computer. In addition, Japanese Patent Application Laid-Open No. 2004-151867 describes a method for providing a job management system that, even when a file has been changed at the time a job abnormality occurs, can prevent the file being changed from being referenced, without requiring work to return the file to a re-executable state.

Specifically, Patent Document 1 discloses a job management system that includes a storage unit that stores at least a real directory, and control means that generates, in the storage unit, a virtual directory duplicating the real directory at the start of job processing execution and that controls file access by the job processing so that it is performed on the virtual directory.

JP 2009-251764 A

In order for the alternative computer to take over the functions of the distributed shared file system when a failure occurs in the computer executing the job, the alternative computer only needs to hold the replicated data. However, this doubles the disk consumption.

Furthermore, when the above-described known technology is used, it is necessary to restore the file state before execution in order to cause the alternative computer to re-execute the job. To realize this, two different versions of each file, one from before and one from after the job execution, must be held. As a result, the disk consumption becomes twice the actual file size.

Therefore, when a combination of the known technology and the technology described in Patent Document 1 is used, a storage area that is four times the actual file size is consumed, and the disk utilization efficiency is poor.

The present invention has been made in view of the above-mentioned problems. An object of the present invention is to provide a computer system and a method that can improve the fault tolerance of the distributed shared file system while reducing disk consumption, and that can restore the file state when a job is re-executed.

A typical example of the invention disclosed in the present application is as follows. That is, a computer system in which a plurality of computers that execute jobs, each formed by dividing one process into a plurality of parts, are connected via a network. Each of the plurality of computers includes a processor, a memory connected to the processor, a storage medium connected to the processor, and a network interface for connecting to other devices. The plurality of computers include a first computer that executes the job and a second computer that holds duplicate data of the data stored in the storage medium of the first computer. The computer system includes a replication control unit that generates duplicate data of the data written to the storage medium of the first computer by the job and writes the duplicate data to the storage medium of the second computer under a specific condition.

According to the present invention, it is possible to secure the fault tolerance of the computer system while suppressing the consumption of the storage capacity of the storage medium, and to automatically restore a job to a re-executable state when a failure occurs in the computer executing the job.

Block diagram showing the configuration of the computer system according to the first embodiment of the present invention.
Block diagram showing the hardware configuration of the scheduler computer in the first embodiment of the present invention.
Block diagram showing the hardware configuration of the server computer in the first embodiment of the present invention.
Block diagram showing the hardware configuration of the server computer in the first embodiment of the present invention.
Explanatory diagram showing an example of the duplicate file list in the first embodiment of the present invention.
Explanatory diagram showing an example of the delayed writing information in the first embodiment of the present invention.
Flowchart explaining the processing executed by the end notification unit in the first embodiment of the present invention.
Flowchart explaining the processing executed by the file server unit in the first embodiment of the present invention.
Flowchart explaining the processing executed by the delayed replication control unit in the first embodiment of the present invention.
Flowchart explaining the processing executed by the delayed replication control unit in the first embodiment of the present invention.
Flowchart explaining the processing executed by the replication reception setting unit in the first embodiment of the present invention.
Flowchart explaining the processing executed by the replication transmission setting unit in the first embodiment of the present invention.
Flowchart explaining the processing executed by the recovery processing unit in the first embodiment of the present invention.
Flowchart explaining the processing executed by the failover processing unit in the first embodiment of the present invention.
Flowchart explaining the processing executed by the failover processing unit in the first embodiment of the present invention.
Block diagram showing the configuration of the computer system according to the second embodiment of the present invention.
Flowchart explaining the processing executed by the replication control unit in the second embodiment of the present invention.
Block diagram showing the configuration of the computer system according to the third embodiment of the present invention.
Flowchart explaining the details of the processing executed by the replication division unit in the third embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

[First embodiment]

FIG. 1 is a block diagram showing a configuration of a computer system according to the first embodiment of this invention.

The computer system includes a scheduler computer 101, a server computer 102A, and a server computer 102B. The scheduler computer 101, the server computer 102A, and the server computer 102B are connected to each other via the network 104. The network 104 is, for example, a LAN (Local Area Network). However, the present invention is not limited to the connection type of the network 104. Hereinafter, when the server computer 102A and the server computer 102B are not distinguished, they are described as the server computer 102.

In this embodiment, it is assumed that the server computer 102A operates as a computer that executes an assigned job, and the server computer 102B operates as a substitute computer when a failure occurs in the server computer 102A. In FIG. 1, there are two server computers 102, but there may be three or more. Hereinafter, a computer that can execute a job is referred to as an execution computer, and a computer that can operate as a substitute for another server computer 102 when a failure occurs is referred to as an alternative computer.

In this embodiment, one distributed shared file system is constructed on a storage area in which the storage areas provided by the server computer 102 are integrated. Data is stored for each predetermined range in the storage area provided by each server computer 102, and each server computer 102 manages data in the range. Each server computer 102 executes processing (writing, reading, updating, etc.) as a master of data included in the range managed by itself. Each server computer 102 holds duplicate data of a predetermined range of data managed as a master by other server computers 102.

In the following description, it is assumed that a job is executed on each server computer 102 under the configuration described above.

The scheduler computer 101 is a computer that assigns jobs to the server computer 102 and manages the execution status of each job. The scheduler computer 101 includes a job scheduler 141. The job scheduler 141 is a program for realizing a job management function, and includes a plurality of modules and information.

Specifically, the job scheduler 141 includes an end notification unit 151, a scheduling unit 152, a start processing unit 153, an end processing unit 154, and job information 171.

The end notification unit 151 detects the end of the job and notifies the server computer 102 of the end of the job. Details of the process executed by the end notification unit 151 will be described later with reference to FIG. 5. The scheduling unit 152 refers to the job information 171 and determines the job to be started and the server computer 102 to which the job is to be assigned.

The start processing unit 153 instructs the server computer 102 determined by the scheduling unit 152 to start job execution. The end processing unit 154 monitors the end of the job and, when detecting the end of the job, notifies the scheduling unit 152 that the server computer 102 on which the job has ended is now available.

The job information 171 stores information related to job assignment. The job information 171 includes at least the identifier of the job and the identifier of the server computer 102 to which the job is assigned.

Note that the scheduling unit 152, the start processing unit 153, the end processing unit 154, and the job information 171 are publicly known, and thus details thereof will not be described in this specification.

The server computer 102A is an execution computer that receives an access request to a file and executes various processes. The server computer 102A includes a file system program 142 and a user application 143.

The user application 143 is a program started by the job scheduler 141, and the processing content is arbitrary. When executing the process, the user application 143 outputs an access request for the file to its own file system program 142 or transmits an access request to the file system program 142 of another server computer 102. The access request may be received via the network 104 from another computer (not shown).

The file system program 142 is a program that executes various processes using a file based on an access request input from the user application 143, and includes a plurality of modules and information.

Specifically, the file system program 142 includes a file server unit 161, a delayed replication control unit 162, a disk driver 163A, a network disk driver 164A, a replication transmission setting unit 165, a recovery processing unit 166, a duplicate file list 180, delayed writing information 181, and duplicate transmission setting information 182.

The file server unit 161 executes processing such as writing to a file and reading a file based on the access request. Details of processing executed by the file server unit 161 will be described later with reference to FIG. 6.

The delayed replication control unit 162 temporarily withholds from the alternative computer the data written by the file server unit 161, and writes the replicated data to the server computer 102B, which is the standby computer, after the job ends. Details of the processing executed by the delayed replication control unit 162 will be described later with reference to FIGS. 7 and 8.

As described above, the delayed replication control unit 162 suspends writing of replication data to the server computer 102B until the job is completed. As a result, even if the job ends abnormally due to a failure in the server computer 102A, the server computer 102B can resume the job from the state before the job started by using the files (duplicate data) that it holds.
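The effect of deferring replication can be illustrated with the following minimal, non-limiting Python sketch. The class name `DelayedReplicator`, its method names, and the dictionaries standing in for the disk devices 126A and 126B are assumptions introduced only for illustration; copying the recorded blocks at job end is likewise one possible way to realize the behavior and is not prescribed by this specification.

```python
from collections import defaultdict


class DelayedReplicator:
    """Toy model of deferred replication (all names are hypothetical)."""

    def __init__(self):
        self.primary = {}   # stands in for disk device 126A: block -> data
        self.standby = {}   # stands in for disk device 126B: block -> data
        self.pending = defaultdict(list)  # job id -> blocks written by that job

    def write(self, job_id, block, data):
        # Write locally right away; remember the block, but do not replicate yet.
        self.primary[block] = data
        if block not in self.pending[job_id]:
            self.pending[job_id].append(block)

    def on_job_end(self, job_id):
        # Only after the job has ended are its blocks copied to the standby.
        for block in self.pending.pop(job_id, []):
            self.standby[block] = self.primary[block]


rep = DelayedReplicator()
rep.write("job-0", 7, b"input")
rep.on_job_end("job-0")               # job-0 finished: block 7 is replicated

rep.write("job-1", 7, b"partial")     # job-1 overwrites block 7 and never finishes
state_after_failure = rep.standby[7]  # the standby still holds the pre-job data
```

In this sketch the standby copy is untouched by the unfinished job, so a job restarted on the standby side would see the file contents as they were before the job began.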

The disk driver 163A is a program that realizes access to the disk device 126A (see FIG. 2) that stores file data.

The network disk driver 164A is a program that realizes access to the disk device 126B (see FIG. 2) for storing file data via the network 104. The network disk driver 164A may be implemented using iSCSI or the like.

For example, when the network disk driver 164B is set to permit access from the network disk driver 164A, the server computer 102A can access the disk device 126B (see FIG. 2) by executing, on the network disk driver 164A, the same processing as that used to access the disk device 126A (see FIG. 2) through the disk driver 163A.

The replication transmission setting unit 165 sets the server computer 102 that is the transmission destination of the replication data. Details of processing executed by the duplicate transmission setting unit 165 will be described later with reference to FIG.

The recovery processing unit 166 executes recovery processing after a failure has occurred in the server computer 102A. Details of the processing executed by the recovery processing unit 166 will be described later with reference to FIG.

The duplicate file list 180 stores information indicating the relationship between files and jobs. The duplicate file list 180 is generated by the file server unit 161. The file server unit 161 refers to the duplicate file list 180 and executes exclusive control on the file. Details of the duplicate file list 180 will be described later with reference to FIG. 3.

The delayed writing information 181 stores information related to replicated data. The delayed writing information 181 is generated by the delayed replication control unit 162. Details of the delayed writing information 181 will be described later with reference to FIG. 4.

The duplicate transmission setting information 182 stores information related to the server computer 102 that is the destination of the duplicate data. Specifically, the duplicate transmission setting information 182 includes a network address, a MAC address, and the like of the server computer 102. In this embodiment, it is assumed that the network address of the server computer 102B is stored.

The server computer 102B is a computer that operates as an alternative computer when a failure occurs in the server computer 102A. The server computer 102B includes a copy receiving unit 144.

The copy receiving unit 144 is a program for executing writing of copy data transmitted from the server computer 102A and recovery processing when a failure occurs, and includes a plurality of modules. Specifically, the copy receiving unit 144 includes a disk driver 163B, a network disk driver 164B, a copy reception setting unit 169, and a failover processing unit 170.

The disk driver 163B and the network disk driver 164B are the same as the disk driver 163A and the network disk driver 164A.

The replication reception setting unit 169 sets whether or not to permit an access request from the server computer 102 that is an execution computer. Details of the process executed by the duplicate reception setting unit 169 will be described later with reference to FIG.

The failover processing unit 170 performs failover. Details of the processing executed by the failover processing unit 170 will be described later with reference to FIGS. 12A and 12B.

In the example shown in FIG. 1, the server computer 102A operates as an execution computer and the server computer 102B operates as an alternative computer to the server computer 102A. However, the present invention is not limited to the configuration described above. For example, the server computer 102A and the server computer 102B may both include the file system program 142 and the copy receiving unit 144 and operate as execution computers that execute jobs independently, with each of the server computers 102A and 102B also operating as an alternative computer for the other server computer 102.

In the present embodiment, the scheduler computer 101 and the server computer 102 are separate computers, but the present invention is not limited to this configuration. A configuration in which at least one of the plurality of server computers 102 includes the job scheduler 141 may be employed.

Next, the hardware configuration of the scheduler computer 101, the server computer 102A, and the server computer 102B will be described.

FIG. 2A is a block diagram showing a hardware configuration of the scheduler computer 101 in the first embodiment of the present invention. FIG. 2B is a block diagram showing a hardware configuration of the server computer 102A in the first embodiment of the present invention. FIG. 2C is a block diagram showing a hardware configuration of the server computer 102B in the first embodiment of the present invention.

As shown in FIG. 2A, the scheduler computer 101 includes a processor 111, a memory 112, a storage device 113, and a network interface 114.

The processor 111 executes a program stored in the memory 112. The function of the scheduler computer 101 can be realized by the processor 111 executing the program. Note that when processing is described with the program as the subject, it indicates that the program is being executed by the processor 111.

The memory 112 stores a program executed by the processor 111 and data necessary to execute the program. In the present embodiment, the memory 112 stores a job scheduler 141. The memory 112 may be a semiconductor memory such as a DRAM, and can be accessed at a higher speed than the storage device 113.

The storage device 113 stores programs and data. The storage device 113 may be, for example, an HDD or an SSD. The network interface 114 is an interface for connecting to other devices via the network 104.

Note that the job scheduler 141 may be stored in the storage device 113 or an external device (not shown). In this case, the job scheduler 141 is read from the storage device 113, or the job scheduler 141 is read from an external device via the network 104 and stored in the memory 112.

As shown in FIG. 2B and FIG. 2C, the server computer 102A and the server computer 102B have the same hardware configuration. The server computer 102A includes a processor 121A, a memory 122A, a storage device 123A, a network interface 124A, a storage interface 125A, and a disk device 126A. The server computer 102B includes a processor 121B, a memory 122B, a storage device 123B, a network interface 124B, a storage interface 125B, and a disk device 126B.

The processors 121A and 121B execute programs stored in the memories 122A and 122B. The function of the server computer 102A can be realized by the processor 121A executing the program, and the function of the server computer 102B can be realized by the processor 121B executing the program. Note that when processing is described with the program as the subject, it indicates that the program is being executed by the processors 121A and 121B.

The memories 122A and 122B store programs executed by the processors 121A and 121B and data necessary for executing the programs. In the present embodiment, the user application 143 and the file system program 142 are stored in the memory 122A, and the copy receiving unit 144 is stored in the memory 122B. The memories 122A and 122B may be, for example, semiconductor memories such as DRAM, and can be accessed at a higher speed than the storage devices 123A and 123B.

The storage devices 123A and 123B store programs and data. The storage devices 123A and 123B may be, for example, HDDs or SSDs. The network interfaces 124A and 124B are interfaces for connecting to other devices via the network 104.

The storage interfaces 125A and 125B are interfaces for connecting to storage devices (disk devices 126A and 126B) that can store a large amount of data. In this embodiment, the server computer 102A is connected to the disk device 126A via the storage interface 125A, and the server computer 102B is connected to the disk device 126B via the storage interface 125B.

The disk devices 126A and 126B store files necessary for processing executed by the user application 143. There may be a plurality of disk devices 126A and 126B. Also, a RAID may be configured using a plurality of disk devices 126A and 126B. Further, the disk device 126A may be externally attached to the server computer 102A, and the disk device 126B may be externally attached to the server computer 102B.

The user application 143 and the file system program 142 may be stored in the storage device 123A or an external device (not shown). In this case, each program is read from the storage device 123A, or each program is read from an external device via the network 104 and stored in the memory 122A. Further, the copy receiving unit 144 may be stored in the storage device 123B or an external device (not shown). In this case, each program is read from the storage device 123B, or each program is read from an external device via the network 104 and stored in the memory 122B.

Next, information stored in the server computer 102A will be described.

FIG. 3 is an explanatory diagram showing an example of the duplicate file list 180 according to the first embodiment of this invention.

The duplicate file list 180 stores information in a list format indicating the correspondence between files and jobs.

In the first information 610, the first entry 611 is stored. The first entry 611 stores the address of the memory 122 indicating the storage destination of the first entry included in the list.

The entry 620 indicated by the first entry 611 stores the correspondence between the file and the job. Specifically, the entry 620 includes a file ID 621, a job ID 622, and a next entry 623.

The file ID 621 is an identifier that uniquely identifies the file. The job ID 622 is an identifier of an application (job) that executes processing using a file corresponding to the file ID 621.

Also, the next entry 623 is an address of the memory 122 indicating the storage destination of the next entry 620. When “0x0” indicating that there is no next entry 620 is stored in the next entry 623, it indicates that the entry 620 is the end of the list.

Note that the duplicate file list 180 shown in FIG. 3 is an example, and the duplicate file list 180 may be configured by other methods as long as it includes a job identifier and a file identifier.
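As one illustrative way of picturing the list structure of FIG. 3, the following Python sketch models each entry 620 as an object holding a file ID, a job ID, and a reference to the next entry. The class names `Entry` and `DuplicateFileList`, the use of `None` in place of the address "0x0", and the append and lookup helpers are assumptions made for this sketch and are not part of the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    file_id: str                     # corresponds to file ID 621
    job_id: str                      # corresponds to job ID 622
    next: Optional["Entry"] = None   # corresponds to next entry 623; None plays the role of 0x0


class DuplicateFileList:
    """Singly linked list of (file, job) pairs; a hypothetical model of list 180."""

    def __init__(self):
        self.head: Optional[Entry] = None   # corresponds to first entry 611

    def find(self, file_id):
        # Walk the list until the file ID matches or the end is reached.
        node = self.head
        while node is not None:
            if node.file_id == file_id:
                return node
            node = node.next
        return None

    def add(self, file_id, job_id):
        # Simplification: skip the insert if the file is already listed.
        if self.find(file_id) is not None:
            return
        new = Entry(file_id, job_id)
        if self.head is None:
            self.head = new
            return
        node = self.head
        while node.next is not None:
            node = node.next
        node.next = new                     # link the new entry at the end of the list


lst = DuplicateFileList()
lst.add("file-a", "job-1")
lst.add("file-b", "job-2")
```

A lookup such as `lst.find("file-b")` then returns the entry whose job ID identifies the job currently using that file, and the last entry's `next` field marks the end of the list.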

FIG. 4 is an explanatory diagram showing an example of the delayed writing information 181 in the first embodiment of the present invention.

The delayed writing information 181 stores information in a list format indicating the correspondence between a job and the position of data written by the job. Specifically, the delayed writing information 181 includes a job list and a block list.

In the first information 700, the first entry 701 is stored. The first entry 701 stores the address of the memory 122 indicating the storage location of the first entry in the job list.

The entry 710 indicated by the first entry 701 stores the correspondence between the job and the data position. The entry 710 includes a job ID 711, a block list 712, and the next entry 713.

The job ID 711 is an identifier that uniquely identifies the job. The block list 712 is an address of the memory 122 indicating the storage location of the first entry in the block list indicating the position of data written by the job.

The next entry 713 is an address of the memory 122 indicating the storage destination of the next entry 710 in the job list. When “0x0” indicating that there is no next entry 710 is stored in the next entry 713, this indicates that the entry 710 is the end of the list.

In the first entry 720 in the block list, information indicating the position of data written by the job corresponding to the job ID 711 is stored. The entry 720 includes a block number 721 and the next entry 722.

The block number 721 is a block number indicating the position of a block that stores data written by a job. The next entry 722 is an address of the memory 122 indicating the storage destination of the next entry 720 in the block list.

Note that the delayed writing information 181 shown in FIG. 4 is an example, and the delayed writing information 181 may be configured by other methods as long as it includes a job identifier and a block number.
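The two-level structure of FIG. 4, a job list whose entries each point to a block list, can be sketched in Python as follows. The names `JobEntry`, `BlockEntry`, `record`, and `blocks_of` are hypothetical, `None` stands in for the address "0x0", and inserting new entries at the front of each list is a simplification chosen for brevity rather than something stated in this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockEntry:
    block_number: int                        # corresponds to block number 721
    next: Optional["BlockEntry"] = None      # corresponds to next entry 722


@dataclass
class JobEntry:
    job_id: str                              # corresponds to job ID 711
    block_list: Optional[BlockEntry] = None  # corresponds to block list 712
    next: Optional["JobEntry"] = None        # corresponds to next entry 713


def record(head, job_id, block_number):
    """Record block_number under job_id; return the (possibly new) job-list head."""
    job = head
    while job is not None and job.job_id != job_id:
        job = job.next
    if job is None:
        job = JobEntry(job_id, next=head)    # assumption: new jobs go to the front
        head = job
    job.block_list = BlockEntry(block_number, next=job.block_list)
    return head


def blocks_of(head, job_id):
    """Collect the block numbers recorded for job_id (empty list if unknown)."""
    job = head
    while job is not None and job.job_id != job_id:
        job = job.next
    blocks = []
    block = job.block_list if job is not None else None
    while block is not None:
        blocks.append(block.block_number)
        block = block.next
    return blocks


head = None
for job_id, block in [("job-1", 10), ("job-1", 11), ("job-2", 42)]:
    head = record(head, job_id, block)
```

With this layout, all blocks written by a given job can be found by walking a single block list, which is the information needed when that job's data is later replicated or discarded.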

Next, processing executed by each program will be described. First, the processing of the job scheduler 141 will be described.

FIG. 5 is a flowchart for explaining processing executed by the end notification unit 151 according to the first embodiment of the present invention.

The end notification unit 151 starts processing upon receiving a job end notification from the end processing unit 154 (step S201). The end notification includes at least a job identifier.

The end processing unit 154 can detect that the job has ended by receiving a notification that the job has ended from the server computer 102 that executes the job. Here, it is assumed that a notification indicating that the job has ended is received from the server computer 102A.

The end notification unit 151 refers to the job information 171 based on the job identifier, and acquires the identifier of the server computer 102A to which the completed job is assigned (step S202). Here, the identifier of the server computer 102A in the network 104 is acquired.

The end notification unit 151 transmits end information notifying that the job has ended to the delayed replication control unit 162 of the server computer 102 corresponding to the acquired identifier (step S203). After the notification, the end notification unit 151 ends the process (step S204). Note that the end information includes at least a job identifier.
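Steps S202 and S203 amount to a lookup followed by a notification, which the following Python sketch illustrates. The function name `notify_job_end`, the dictionary used for the job information 171, the server identifier strings, and the `send` callback that stands in for transmission over the network 104 are all assumptions made for illustration.

```python
def notify_job_end(job_id, job_info, send):
    """Hypothetical model of steps S202 and S203 of the end notification unit."""
    server_id = job_info[job_id]      # S202: server to which the ended job was assigned
    end_info = {"job_id": job_id}     # end information carries at least the job identifier
    send(server_id, end_info)         # S203: deliver to that server's delayed replication control unit
    return server_id


sent = []  # records what would have been transmitted
job_info = {"job-1": "server-102A", "job-2": "server-102B"}
notify_job_end("job-1", job_info, lambda server, info: sent.append((server, info)))
```

Passing the transport as a callback keeps the sketch independent of any particular network mechanism, which this specification leaves open.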

Next, processing of the file system program 142 will be described.

FIG. 6 is a flowchart for explaining processing executed by the file server unit 161 according to the first embodiment of the present invention.

When the file server unit 161 receives an access request from the user application 143 or the like, the file server unit 161 starts processing (step S301). The access request includes an identifier (file ID) of the target file. Hereinafter, the file that is the target of the access request is also referred to as the target file.

The file server unit 161 determines whether or not the target file information is recorded in the duplicate file list 180 (step S302). Specifically, the file server unit 161 determines whether an entry 620 that matches the file ID included in the access request is recorded in the duplicate file list 180.

If it is determined that the target file information is not recorded in the duplicate file list 180, the file server unit 161 proceeds to step S305.

When it is determined that the target file information is recorded in the duplicate file list 180, the file server unit 161 refers to the entry 620 recorded in the duplicate file list 180 and determines whether or not the identifier of the job that is accessing the file matches the identifier of the job that issued the access request (step S303).

If it is determined that the identifier of the job being accessed does not match the identifier of the job that issued the access request, the file server unit 161 ends in error (step S304). This is to indicate that the file is being accessed by another user application 143. This eliminates contention access in which different jobs access the same file at the same time.

If it is determined that the identifier of the job being accessed matches the identifier of the job that issued the access request, the file server unit 161 proceeds to step S305.

When the determination result of step S302 is NO or the determination result of step S303 is YES, the file server unit 161 determines whether the access request is a write request (step S305).

If it is determined that the access request is not a write request, that is, the access request is a read request, the file server unit 161 calculates the block number on the disk device 126A in which the data of the target file to be read is stored (step S306).

Next, the file server unit 161 outputs a read request including the calculated block number to the disk driver 163A (step S307). Upon receiving the read request, the disk driver 163A reads the data of the target file from the predetermined disk device 126A. The read data is output from the disk driver 163A to the file server unit 161.

Next, the file server unit 161 outputs the data read from the disk device 126A to the user application 143, and ends the process (steps S308 and S312).

When it is determined in step S305 that the access request is a write request, the file server unit 161 records the identifier of the file and the identifier of the job to be executed in the duplicate file list 180 (step S309). By this processing, an entry 620 as shown in FIG. 3 is generated. When a new entry 620 is generated, the address of the new entry 620 is stored in the next entry 623 of the entry 620 generated immediately before it.

Note that when the same information is already recorded in the duplicate file list 180, control is performed so as not to duplicately record.

Next, the file server unit 161 calculates a block number indicating the storage destination of the data to be written on the disk device 126A (step S310).

Next, the file server unit 161 outputs a write request to the delayed replication control unit 162 and ends the process (steps S311 and S312). The write request includes a job identifier, a block number, and data.

As described above, the file server unit 161 refers to the duplicate file list 180 and controls each file to accept access from only one user application 143 at most.

This control has the effect of suppressing the influence on other user applications 143 even when the writing process by the user application 143 is canceled when the user application 143 terminates abnormally.

When a plurality of user applications 143 simultaneously execute read processing, there is no influence on the file due to abnormal termination of the user applications 143, and thus it is not necessary to perform the control as described above. Therefore, no information is registered in the duplicate file list 180.
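The access control performed by the file server unit 161 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the class `FileServer`, the method `handle`, and the use of a dictionary in place of the linked entries of the duplicate file list 180 are assumptions made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FileServer:
    # Hypothetical stand-in for the duplicate file list 180:
    # file identifier -> identifier of the job that is writing the file.
    duplicate_file_list: dict = field(default_factory=dict)

    def handle(self, job_id: str, file_id: str, is_write: bool) -> str:
        owner = self.duplicate_file_list.get(file_id)
        # A file registered by a different job is not accessible.
        if owner is not None and owner != job_id:
            return "denied"
        # Read requests do not register anything in the list.
        if not is_write:
            return "read"
        # Write requests register the (file, job) pair; assigning the same
        # pair again leaves the list unchanged, so nothing is duplicated.
        self.duplicate_file_list[file_id] = job_id
        return "write"
```

For example, after job "job-1" has written to file "out.dat", a request for the same file from job "job-2" is denied, while read requests for files that no job has written are always accepted.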

FIGS. 7 and 8 are flowcharts for explaining processing executed by the delayed replication control unit 162 according to the first embodiment of the present invention. The delayed replication control unit 162 is activated when end information is received from the end notification unit 151 of the job scheduler 141 and when a write request is received from the file server unit 161.

FIG. 7 shows processing executed when a write request is received from the file server unit 161.

The delayed replication control unit 162 starts processing upon receiving a write request from the file server unit 161 (step S401).

The delayed replication control unit 162 extracts the job identifier and block number included in the write request, and records the job identifier and block number in the delayed write information 181 in association with each other (step S402).

The delayed replication control unit 162 outputs a write request for the disk device 126A, including the block number and data, to the disk driver 163A, and ends the process (steps S403 and S404). The disk driver 163A that has received the write request writes the data to the specified block of the disk device 126A.

FIG. 8 shows processing executed when the end information is received from the end notification unit 151.

The delayed replication control unit 162 starts the process when receiving the end information from the end notification unit 151 (step S501). Note that the end information includes a job identifier.

The delayed replication control unit 162 refers to the delayed writing information 181 based on the job identifier, and searches for a block list corresponding to the completed job (step S502). Specifically, an entry 720 in which the job ID 711 matches the job identifier is searched.

After that, the following processing is repeatedly executed for all entries in the block list.

First, the delayed replication control unit 162 determines whether or not the block list is an empty set (step S503). That is, it is determined whether or not processing has been completed for all entries 720 in the block list.

When it is determined that the block list is not an empty set, the delayed replication control unit 162 reads an entry 720 included in the list, acquires the block number from the entry 720, and deletes the read entry 720 from the block list (step S506).

Next, the delayed replication control unit 162 reads the data stored in the acquired block number (step S507). Specifically, the delayed replication control unit 162 outputs a read request including the acquired block number to the disk driver 163A. The disk driver 163A that has received the read request reads data from the block on the disk device 126A corresponding to the block number, and outputs the data to the delayed replication control unit 162. Here, the read data is duplicated data.

The delayed replication control unit 162 outputs a write request including the block number and the read data (replicated data) to the network disk driver 164A, and returns to step S503 (step S508). The network disk driver 164A that has received the write request transfers the write request to the server computer 102B, which is an alternative computer. The server computer 102B stores the duplicate data in the disk device 126B based on the transferred write request.

As will be described later with reference to FIG. 10, a write request including duplicate data is set to be transferred to the server computer 102B. The transferred data is stored in the disk device 126B via the network disk driver 164B and the disk driver 163B of the server computer 102B.

If it is determined in step S503 that the block list is an empty set, the delayed replication control unit 162 deletes the entry 710 corresponding to the job from the delayed writing information 181 and ends the process (steps S504 and S505).

The delayed replication control unit 162 writes data to its own disk device 126A as soon as it receives the data. On the other hand, the delayed replication control unit 162 suspends writing to the server computer 102B until it receives end information from the end notification unit 151. That is, the creation of the duplicate data is suspended until the job scheduler 141 confirms the end of the job.

Thus, even when the user application 143 that executes the job is abnormally terminated, the data written by the user application 143 that has abnormally terminated is not reflected on the disk device 126B of the server computer 102B. Therefore, by referring to the disk device 126B of the server computer 102B, the user application 143 that has ended abnormally can be returned to the state before starting the job.
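The two flows of FIG. 7 and FIG. 8 can be illustrated with the following sketch. It is a simplified model under stated assumptions: the class `DelayedReplicator` and its method names are hypothetical, and plain dictionaries stand in for the disk devices 126A and 126B, the disk driver 163A, and the network disk driver 164A.

```python
from collections import defaultdict


class DelayedReplicator:
    def __init__(self):
        self.local_disk = {}    # stands in for disk device 126A
        self.remote_disk = {}   # stands in for disk device 126B
        # Stands in for delayed write information 181:
        # job identifier -> list of block numbers written by that job.
        self.delayed_write_info = defaultdict(list)

    def on_write(self, job_id, block_no, data):
        # FIG. 7: record (job, block), then write to the local disk only.
        self.delayed_write_info[job_id].append(block_no)
        self.local_disk[block_no] = data

    def on_job_end(self, job_id):
        # FIG. 8: when the job has ended, copy each recorded block to the
        # alternative computer and drop the job's entry.
        blocks = self.delayed_write_info.pop(job_id, [])
        while blocks:
            block_no = blocks.pop(0)
            self.remote_disk[block_no] = self.local_disk[block_no]
```

In this model, a job that terminates abnormally never triggers `on_job_end`, so none of its blocks reach `remote_disk`, which mirrors the property described above for the disk device 126B.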

Next, an example of a setting method for transmission / reception of replicated data in the server computer 102 will be described.

In the example described below, it is assumed that the server computer 102B operates not only as an alternative computer for the server computer 102A but also as an execution computer. That is, the server computer 102B includes a file system program 142 equivalent to that of the server computer 102A and executes the program. Accordingly, the server computers 102A and 102B are each set as the alternative computer of the other. Specifically, the duplicate reception setting unit 169 performs settings for receiving data as an alternative computer, and the duplicate transmission setting unit 165 performs settings for transmitting data as an execution computer.

FIG. 9 is a flowchart for explaining processing executed by the copy reception setting unit 169 according to the first embodiment of the present invention.

The copy reception setting unit 169 starts processing upon receiving an activation command from the administrator (step S1001).

First, the duplicate reception setting unit 169 acquires the network address set in its own network interface 124 (step S1002). The acquired network address is, for example, an IP address.

The replication reception setting unit 169 determines whether or not the acquired network address is an even number (step S1003). If the IP address is an IPv4 IP address, it is determined whether or not the numerical value of the host address portion is an even number. For example, when the acquired IP address is “192.168.1.160”, it is determined that the IP address is an even number.

When it is determined that the network address is an even number, the copy reception setting unit 169 treats the server computer 102 assigned the network address obtained by subtracting “1” from its own network address as the execution computer, makes a setting that permits access from that server computer 102 via the network disk driver 164, and ends the process (steps S1004 and S1006). The access permission setting information is stored in the memory 122.

If it is determined that the network address is not an even number, that is, the network address is an odd number, the copy reception setting unit 169 treats the server computer 102 assigned the network address obtained by adding “1” to its own network address as the execution computer, makes a setting that permits access from that server computer 102 via the network disk driver 164, and ends the process (steps S1005 and S1006).

FIG. 10 is a flowchart for describing processing executed by the duplicate transmission setting unit 165 according to the first embodiment of this invention.

The copy transmission setting unit 165 starts processing upon receiving an activation command from the administrator (step S1101).

First, the duplicate transmission setting unit 165 acquires the network address set in its own network interface 124 (step S1102). The acquired network address is, for example, an IP address.

The duplicate transmission setting unit 165 determines whether or not the acquired network address is an even number (step S1103). The process of step S1103 is the same process as step S1003.

When it is determined that the network address is an even number, the duplicate transmission setting unit 165 treats the server computer 102 assigned the network address obtained by adding “1” to its own network address as the alternative computer, sets that server computer 102 as the access destination via the network disk driver 164, and proceeds to step S1106 (step S1104).

When it is determined that the network address is not an even number, that is, the network address is an odd number, the duplicate transmission setting unit 165 treats the server computer 102 assigned the network address obtained by subtracting “1” from its own network address as the alternative computer, sets that server computer 102 as the access destination via the network disk driver 164, and proceeds to step S1106 (step S1105).

The replication transmission setting unit 165 records the network address of the server computer 102 set as the alternative computer in the replication transmission setting information 182 and ends the process (steps S1106 and S1107).

Through the processing of FIGS. 9 and 10, pairs of server computers 102 that hold duplicate data can be generated automatically from among a large number of server computers 102 having the same configuration.
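The address calculation of steps S1103 to S1105 can be sketched as follows, assuming an IPv4 address. The function name `alternative_address` is hypothetical; the sketch covers only the transmission-side rule of FIG. 10 (even address: add 1, odd address: subtract 1) and uses the standard `ipaddress` module.

```python
import ipaddress


def alternative_address(own_address: str) -> str:
    """Return the address of the alternative computer for own_address."""
    value = int(ipaddress.IPv4Address(own_address))
    # The parity of the whole 32-bit value equals the parity of the
    # host address portion, because both share the lowest bit.
    partner = value + 1 if value % 2 == 0 else value - 1
    return str(ipaddress.IPv4Address(partner))
```

With this rule, "192.168.1.160" and "192.168.1.161" each select the other as the transmission destination, so two computers with consecutive addresses form a pair.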

In this embodiment, it is assumed that each server computer 102 is assigned a continuous address that increases by one, but the present invention is not limited to this. That is, the present invention can be implemented as long as the data transmission destination and the data reception source can be set in the network disk driver 164.

Next, processing when a failure occurs in the server computer 102A that is an execution computer will be described. In the following, it is assumed that the server computer 102A is restarted after the cause of the failure is removed by the system administrator. The recovery processing unit 166 is activated upon restart.

FIG. 11 is a flowchart illustrating processing executed by the recovery processing unit 166 according to the first embodiment of this invention.

The recovery processing unit 166 starts processing upon restart, and then activates the duplicate transmission setting unit 165 to set the network disk driver 164 (steps S1201 and S1202).

The recovery processing unit 166 reads the duplicate transmission setting information 182 (step S1203). In the present embodiment, the copy transmission setting information 182 records the network address of the server computer 102B.

The recovery processing unit 166 sets the network disk driver 164A so as to permit access to the disk device 126A from the server computer 102B, which is an alternative computer (step S1204).

Thereafter, the recovery processing unit 166 waits for processing until a completion notification is received from the failover processing unit 170 of the server computer 102B (step S1205).

The recovery processing unit 166 cancels the access permission setting set in the network disk driver 164A in step S1204, and ends the process (step S1206). Thereafter, the server computers 102A and 102B return to the normal state.

FIG. 12A and FIG. 12B are flowcharts for explaining processing executed by the failover processing unit 170 in the first embodiment of the present invention.

When the server computer 102B detects the stop of the server computer 102A, the server computer 102B activates the failover processing unit 170 (step S1301). Thereafter, the failover processing unit 170 repeatedly executes the following processing until the server computer 102A recovers. The server computer 102B can detect that the server computer 102A is stopped using a heartbeat or the like.

First, the failover processing unit 170 receives an access request instead of the file server unit 161 of the server computer 102A where the failure has occurred (step S1302). It is assumed that the server computer 102B also has a file system program 142 and is operating as an execution computer. Further, since the method for switching the access request reception destination is a known technique, a description thereof will be omitted.

Next, the failover processing unit 170 determines whether or not the received access request is a write request (step S1303).

When it is determined that the received access request is not a write request, that is, a read request, the failover processing unit 170 calculates a block number in which target data is recorded (step S1304).

Further, the failover processing unit 170 outputs a read request including the calculated block number to the disk driver 163B, returns to step S1302, and then waits for an access request (step S1305). The disk driver 163B that has received the read request reads data from the disk device 126B. The read data is transmitted to the access request source.

If it is determined in step S1303 that the received access request is a write request, the failover processing unit 170 executes the following processing in order to reflect the target data in both the server computer 102B and the server computer 102A.

First, the failover processing unit 170 determines whether or not the server computer 102A that has been stopped due to a failure has been restarted (step S1307).

When it is determined that the server computer 102A has not been restarted, the failover processing unit 170 checks the state of the server computer 102A (step S1309).

Next, the failover processing unit 170 determines whether or not the restart of the server computer 102A has been detected based on the result of the confirmation described above (step S1310). For example, when a notification indicating that the server computer 102A has been restarted is received, and there is a response due to a heartbeat, it is determined that the restart of the server computer 102A has been detected.

If it is determined that the restart of the server computer 102A has not been detected, the failover processing unit 170 outputs a write request including the calculated block number and the data to the disk driver 163B in order to reflect the data in the disk device 126B, and returns to step S1302 (step S1311).

When it is determined that the restart of the server computer 102A has been detected, the failover processing unit 170 starts copy processing for reflecting all the contents of the disk device 126B on the disk device 126A (step S1312). However, in the copy process, the process is skipped for a storage area in which data has already been reflected.

If it is determined in step S1307 that the server computer 102A has been restarted, or after step S1312, the failover processing unit 170 outputs a write request including the calculated block number and the data to both the disk driver 163B and the network disk driver 164B (step S1313). The network disk driver 164B that has received the write request transfers the write request including the data to the server computer 102A.

By this processing, the same data is written to both the disk device 126A and the disk device 126B.

Next, the failover processing unit 170 confirms the progress status of the copy process, and determines whether or not the copy process is completed (step S1314).

If it is determined that the copy process has not ended, the failover processing unit 170 returns to step S1302.

If it is determined that the copy process has been completed, the failover processing unit 170 transmits a completion notification to the recovery processing unit 166 and ends the process (steps S1315 and S1316).

The recovery processing unit 166 and the failover processing unit 170 permit access from the server computer 102B, which is an alternative computer, to the disk device 126A via the network disk driver 164B. By this operation, the server computer 102B can continue processing as an execution computer, and at the same time, data written while the server computer 102A is stopped can be reflected in the disk device 126A of the server computer 102A.
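The write path and the copy process of FIG. 12A and FIG. 12B can be modeled roughly as follows. This is a simplified sketch under several assumptions: the class `Failover` and its methods are hypothetical, dictionaries stand in for the disk devices 126A and 126B, restart detection is reduced to an explicit call, and the background copy is advanced one block at a time by the caller.

```python
class Failover:
    def __init__(self, disk_b):
        self.disk_b = disk_b   # stands in for disk device 126B
        self.disk_a = None     # disk device 126A, unreachable until restart
        self.pending = []      # blocks still to be copied from 126B to 126A

    def on_restart_detected(self, disk_a):
        # Corresponds to step S1312: start copying all contents of 126B.
        self.disk_a = disk_a
        self.pending = list(self.disk_b)

    def write(self, block_no, data):
        # Before restart: write to 126B only (step S1311).
        # After restart: write to both disks (step S1313).
        self.disk_b[block_no] = data
        if self.disk_a is not None:
            self.disk_a[block_no] = data
            # The block is already reflected, so the copy can skip it.
            if block_no in self.pending:
                self.pending.remove(block_no)

    def copy_step(self):
        """Copy one pending block; return True once the copy is complete."""
        if self.pending:
            block_no = self.pending.pop(0)
            self.disk_a[block_no] = self.disk_b[block_no]
        return not self.pending
```

Once `copy_step` returns True, both dictionaries hold the same contents, which corresponds to the point at which the completion notification is sent to the recovery processing unit 166.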

[Second embodiment]

The second embodiment differs from the first embodiment in that the server computer 102B, which is an alternative computer, delays the writing of replicated data to the disk device 126B. Hereinafter, the second embodiment will be described focusing on the differences from the first embodiment.

FIG. 13 is a block diagram showing a configuration of a computer system according to the second embodiment of this invention.

In the second embodiment, the configurations of the server computer 102A and the server computer 102B are different.

Specifically, the file system program 142 includes a replication control unit 862 instead of the delayed replication control unit 162. In addition, the copy receiving unit 144 newly includes a receiving side delayed copy control unit 871, delayed writing information 884, and a temporary writing area 890.

In the second embodiment, the process executed by the file server unit 861 is different from that of the file server unit 161. Specifically, in step S311, the file server unit 861 outputs a write request to the replication control unit 892. Other processes are the same as those in the first embodiment.

In this embodiment, the server computer 102A always transmits data to the server computer 102B when writing data. Therefore, the server computer 102A does not include the delayed writing information 181. Further, the server computer 102B suspends data writing to the disk device 126B until receiving an end notification from the job scheduler 141. In order to implement the above-described processing, the server computer 102B includes a receiving-side delayed replication control unit 871, delayed writing information 884, and a temporary writing area 890.

Hereinafter, processing executed by a component different from the first embodiment will be described.

FIG. 14 is a flowchart illustrating processing executed by the replication control unit 892 according to the second embodiment of the present invention.

When the replication control unit 892 receives a write request from the file server unit 861, the replication control unit 892 starts processing (step S1401).

The replication control unit 892 outputs the write request to the disk driver 163A and the network disk driver 164A without changing the information included in the received write request, and ends the processing (steps S1402 and S1403).

When the receiving-side delayed replication control unit 871 receives a write request, it executes the same processing as in FIG. 7.

Specifically, in step S401, the receiving-side delayed replication control unit 871 starts processing upon receiving a write request from the replication control unit 862. In step S402, the receiving-side delayed replication control unit 871 records the job identifier and block number in the delayed writing information 884. In step S403, the receiving-side delayed replication control unit 871 stores the data in the temporary write area 890.

Here, when writing data to the temporary writing area 890, the receiving-side delayed replication control unit 871 may write the data directly to the temporary writing area 890, or may output to the disk driver 163B a write request for writing the data to the temporary writing area 890.

The temporary writing area 890 is realized by, for example, a magnetic disk, a semiconductor disk, a virtual disk realized by software, or a combination of them, and the mounting format is arbitrary.

Further, when receiving the end information from the end notification unit 151 of the job scheduler 141, the receiving-side delayed replication control unit 871 executes the same processing as in FIG. 8. The differences from the process shown in FIG. 8 are that the process is executed based on the delayed write information 884, that the data read source in step S507 is the temporary write area 890, and that the output destination of the write request in step S508 is the disk driver 163B.
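The receiving-side behavior of the second embodiment can be sketched as follows. The class `ReceivingSideReplicator` and its method names are hypothetical, and dictionaries stand in for the temporary writing area 890, the delayed writing information 884, and the disk device 126B; keying the temporary area by job identifier and block number is an assumption of this example.

```python
class ReceivingSideReplicator:
    def __init__(self):
        self.temporary_area = {}      # stands in for temporary writing area 890
        self.disk = {}                # stands in for disk device 126B
        self.delayed_write_info = {}  # stands in for 884: job id -> block numbers

    def on_replica_write(self, job_id, block_no, data):
        # Replicated data is staged, not yet written to the disk device.
        self.delayed_write_info.setdefault(job_id, []).append(block_no)
        self.temporary_area[(job_id, block_no)] = data

    def on_job_end(self, job_id):
        # On the end notification, move the staged blocks of the job to disk.
        for block_no in self.delayed_write_info.pop(job_id, []):
            key = (job_id, block_no)
            # A block written several times is staged once; skip repeats.
            if key in self.temporary_area:
                self.disk[block_no] = self.temporary_area.pop(key)
```

Compared with the first embodiment, the delay is applied on the alternative computer, so the execution computer forwards every write immediately and keeps no delayed writing information of its own.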

Other processing is the same as that of the first embodiment, and thus description thereof is omitted.

[Third embodiment]

In the third embodiment, in addition to the configuration of the first embodiment, the replicated data is divided and written to a plurality of server computers 102 that are alternative computers. Hereinafter, the differences from the first embodiment will be mainly described.

FIG. 15 is a block diagram showing a configuration of a computer system according to the third embodiment of this invention.

The third embodiment is different in that the computer system includes a server computer 102C that is an alternative computer. The server computer 102C has the same configuration as the server computer 102B and includes a disk device 126. The server computer 102A further includes a replication division unit 1401.

In this embodiment, the replication division unit 1401 divides the replication data generated by the delayed replication control unit 162 and stores it in the server computers 102B and 102C.

Hereinafter, processing in the third embodiment will be described.

In the third embodiment, the process executed when the delayed replication control unit 162 receives the end information from the end notification unit 151 is different. Specifically, in step S508, the delayed replication control unit 162 outputs a replication data write request to the replication division unit 1401. Other processes are the same as those in the first embodiment.

FIG. 16 is a flowchart for explaining details of processing executed by the replication division unit 1401 according to the third embodiment of the present invention.

The replication division unit 1401 starts processing upon receiving a write request from the delayed replication control unit 162 (step S1501).

The replication division unit 1401 refers to the block number included in the write request and determines whether the block number is an even number (step S1502). Here, the first block number of data is targeted.

If it is determined that the block number is an even number, the replication dividing unit 1401 transmits a request for writing the replicated data to the server computer 102C via the network disk driver 164A, and ends the process (step S1503).

When it is determined that the block number is not an even number, that is, the block number is an odd number, the replication division unit 1401 transmits a replication data write request to the server computer 102B via the network disk driver 164A, and ends the processing (step S1504).

In the third embodiment, two server computers 102 which are alternative computers are used, but there may be three or more. In this case, a method of determining the transfer destination of the duplicate data based on the remainder obtained by dividing the write address by the number of server computers 102 can be considered.
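The remainder-based selection mentioned above can be sketched as follows. The function name `choose_alternative` and the representation of the alternative computers as a list of labels are assumptions made only for this illustration.

```python
def choose_alternative(block_no: int, alternatives: list) -> str:
    """Pick the transfer destination from the remainder of the block number."""
    return alternatives[block_no % len(alternatives)]
```

With the list `["102C", "102B"]`, even block numbers are sent to the server computer 102C and odd block numbers to the server computer 102B, which matches steps S1503 and S1504; a longer list distributes the blocks over three or more alternative computers in the same way.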

By placing multiple server computers that are alternative computers, the time required to write replicated data can be shortened. Therefore, a disk drive for storing duplicate data can be configured using a low-speed and inexpensive disk device.

According to the present invention, the writing of replicated data to the server computer 102 that is an alternative computer can be delayed, and the replicated data can be reflected in the alternative computer when the job ends. As a result, even if a failure occurs in the server computer 102 that is an execution computer, the server computer 102 that is an alternative computer can resume the job from the job start state.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the above-described embodiments have been described in detail for easy understanding of the present invention, and the invention is not necessarily limited to embodiments having all the configurations described. Further, a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Moreover, for a part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

In addition, each of the above-described configurations, functions, processing units, processing means, and the like may be realized using hardware by designing a part or all of them with, for example, an integrated circuit. In addition, each of the above-described configurations, functions, and the like may be realized using software by the processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be stored in a recording device such as a memory, a hard disk, and an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, and a DVD. Further, the control lines and information lines indicate what is considered necessary for the explanation, and not all the control lines and information lines on the product are necessarily shown. Actually, it may be considered that almost all the components are connected to each other.

Claims

A computer system in which a plurality of computers that execute a job in which one process is divided into a plurality are connected via a network,
Each of the plurality of computers has a processor, a memory connected to the processor, a storage medium connected to the processor, and a network interface for connecting to another device,
The plurality of computers includes a first computer that executes the job, and a second computer that holds duplicate data of data stored in the storage medium of the first computer,
The computer system is characterized by comprising a replication control unit that generates replication data of data written to the storage medium of the first computer by the job and writes the replication data to the storage medium of the second computer under a specific condition.
The computer system according to claim 1,
The computer system, wherein the duplication control unit writes the duplication data in a storage medium of the second computer after receiving a notification that the job is finished.
The computer system according to claim 2,
The computer system further includes an end notification unit that detects the end of the job and transmits a notification that the job has ended.
The computer system according to claim 3,
The computer system includes a file system configured on a storage in which storage areas provided by the storage medium included in the plurality of computers are integrated,
The file system manages data of a plurality of files used when the job is executed,
In each of the plurality of computers that provide the storage area constituting the storage, the data of the file is distributed and arranged for each predetermined range,
The plurality of computers further includes a scheduler computer that manages an execution schedule of the job,
The scheduler computer has the end notification unit,
The first computer has a first replication control unit,
The end notification unit
If the end of the job is detected, the identifier of the first computer that was executing the ended job is acquired;
Transmitting the notification including the acquired identifier of the first computer to the first replication control unit;
The first replication control unit includes:
Generating write information in which information indicating a write destination of data written to the storage medium of the first computer is associated with an identifier of the job;
When the notification is received, the data is read from the storage medium of the first computer with reference to the write information,
The computer system, wherein the read data is transmitted to the second computer as the duplicate data.
A computer system according to claim 4, wherein
The first computer includes:
A file server unit that processes an access request to the file managed by the file system when the job is executed;
Storing file information in which the identifier of the job being executed is associated with the identifier of the file accessed by the job;
The file server unit
When an access request output by the job is received, it is determined whether or not information matching the identifier of the file that is the target of the access request is registered with reference to the file information,
When information that matches the identifier of the file that is the target of the access request is registered, the identifier of the job that is associated with the file that is the target of the access request with reference to the registered information is Determine whether it matches the identifier of the job that output the access request,
When the identifier of the job associated with the file that is the target of the access request does not match the identifier of the job that has output the access request, the access to the file that is the target of the access request is denied,
When the identifier of the job associated with the file that is the target of the access request matches the identifier of the job that has output the access request, access to the file that is the target of the access request is permitted.
The computer system according to claim 5,
The file server unit
When access to the file is permitted and the access request is a write request, the file identifier and the job identifier included in the write request are extracted and registered in the file information,
Calculating information indicating the destination of the data;
Sending a write request including the job identifier, information indicating the calculated data write destination and the data to the first replication control unit;
The first replication control unit includes:
When the write request is received from the file server unit, the write information is generated based on the write request,
A computer system, wherein data included in the write request is written to a storage medium of the first computer.
The computer system according to claim 3,
The computer system includes a file system configured on a storage in which storage areas provided by the storage medium included in the plurality of computers are integrated,
The file system manages data of a plurality of files used when the job is executed,
In each of the plurality of computers that provide the storage area constituting the storage, the data of the file is distributed and arranged for each predetermined range,
The plurality of computers further includes a scheduler computer that manages an execution schedule of the job,
The scheduler computer has the end notification unit,
The first computer has a first replication control unit,
The second computer has a second replication control unit and a temporary writing area,
The end notification unit
If the end of the job is detected, the identifier of the first computer that was executing the ended job is acquired;
Transmitting the notification including the acquired identifier of the first computer to the second replication control unit;
The first replication control unit includes:
Sending the data written in the storage medium of the first computer to the second computer as the duplicate data;
The second replication control unit includes:
Generating write information in which information indicating a write destination of data written to the storage medium of the first computer is associated with an identifier of the job;
Write the received duplicate data to the temporary write area,
A computer system characterized in that, when the notification is received, the duplicate data stored in the temporary write area is written to a storage medium of the second computer with reference to the write information.
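As an illustrative, non-limiting sketch, the behavior recited above for the second replication control unit (staging duplicate data in the temporary write area and writing it to the storage medium only after the end notification arrives) might be modeled as follows. The class name, the method names, and the use of in-memory dictionaries in place of real storage are hypothetical. The sketch also assumes that each write-information entry is recorded together with the identifier of the sending computer, so that a notification carrying only that identifier can select the entries to commit; the claim does not state how this lookup is performed.

```python
from dataclasses import dataclass, field


@dataclass
class SecondReplicationControlUnit:
    """Hypothetical in-memory model; not the claimed implementation."""

    # Temporary write area: (source computer id, write destination) -> duplicate data.
    temp_area: dict = field(default_factory=dict)
    # Write information: source computer id -> list of (job id, write destination).
    # Keying by source computer id is an assumption made for this sketch.
    write_info: dict = field(default_factory=dict)
    # Stand-in for the storage medium of the second computer.
    storage: dict = field(default_factory=dict)

    def receive_duplicate(self, source_id, job_id, destination, data):
        """Generate write information and stage the duplicate data."""
        self.write_info.setdefault(source_id, []).append((job_id, destination))
        self.temp_area[(source_id, destination)] = data

    def on_job_end(self, source_id):
        """Commit staged data for the computer named in the end notification."""
        for _job_id, destination in self.write_info.pop(source_id, []):
            data = self.temp_area.pop((source_id, destination), None)
            if data is not None:  # already committed if a destination repeats
                self.storage[destination] = data
```

In this sketch, nothing reaches `storage` until `on_job_end` is called, which mirrors the condition that the duplicate data is written to the storage medium of the second computer only when the notification is received.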
The computer system according to claim 7,
The first computer includes:
A file server unit that processes an access request to the file managed by the file system when the job is executed;
Storing file information in which the identifier of the job being executed is associated with the identifier of the file accessed by the job;
The file server unit
When an access request output by the job is received, it is determined whether or not information matching the identifier of the file that is the target of the access request is registered with reference to the file information,
When information that matches the identifier of the file that is the target of the access request is registered, it is determined, with reference to the registered information, whether the identifier of the job associated with the file that is the target of the access request matches the identifier of the job that output the access request,
When the identifier of the job associated with the file that is the target of the access request does not match the identifier of the job that has output the access request, the access to the file that is the target of the access request is denied,
When the identifier of the job associated with the file that is the target of the access request matches the identifier of the job that has output the access request, access to the file that is the target of the access request is permitted. A computer system.
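As an illustrative, non-limiting sketch, the access determination recited above might be expressed as a single lookup in the file information. The function name and the representation of the file information as a dictionary from file identifier to job identifier are hypothetical. The claim does not state the outcome when no matching entry is registered; the sketch assumes access is permitted in that case.

```python
def check_access(file_info, file_id, job_id):
    """Return True when access is permitted and False when it is denied.

    file_info is a hypothetical mapping: file identifier -> job identifier.
    """
    registered_job = file_info.get(file_id)
    if registered_job is None:
        # No matching entry is registered. Permitting access here is an
        # assumption of this sketch, not something the claim recites.
        return True
    # Permit only when the registered job is the job that issued the request.
    return registered_job == job_id
```

For example, with `file_info = {"f1": "job-A"}`, a request from `"job-A"` for `"f1"` is permitted and a request from `"job-B"` for `"f1"` is denied.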
A computer system according to claim 8, wherein
The file server unit
When access to the file is permitted and the access request is a write request, the file identifier and the job identifier included in the write request are extracted and registered in the file information,
Calculating information indicating the destination of the data;
A write request including the job identifier, information indicating the calculated data write destination, and the data is transmitted to the first replication control unit;
The first replication control unit includes:
Writing data included in the write request to a storage medium of the first computer;
Transferring the write request to the second replication control unit;
The second replication control unit generates the write information when the write request is received,
A computer system, wherein data included in the received write request is written into the temporary write area as the duplicate data.
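As an illustrative, non-limiting sketch, the calculation of information indicating the write destination might look as follows when file data is distributed across the computers for each predetermined range. The claims do not specify the distribution rule; round-robin striping, the function name, and the default range size are assumptions made only for this sketch.

```python
def write_destination(offset, num_computers, range_size=4):
    """Map a file offset to (computer index, offset within that computer).

    Assumes round-robin striping with a fixed range size; both the rule
    and the default size are hypothetical.
    """
    range_index = offset // range_size          # which range the offset falls in
    computer = range_index % num_computers      # ranges rotate over the computers
    local_offset = (range_index // num_computers) * range_size + offset % range_size
    return computer, local_offset
```

With three computers and a range size of 4, offset 5 falls in the second range and maps to `(1, 1)`, and offset 13 falls in the fourth range and maps to `(0, 5)`.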
A replication control method in a computer system in which a plurality of computers that execute a job in which one process is divided into a plurality are connected via a network,
Each of the plurality of computers has a processor, a memory connected to the processor, a storage medium connected to the processor, and a network interface for connecting to another device,
The plurality of computers includes a first computer that executes the job, and a second computer that holds duplicate data of data stored in the storage medium of the first computer,
The method
A first step in which at least one of the plurality of computers generates duplicate data of data written to the storage medium of the first computer by the job;
A second step in which at least one of the plurality of computers writes the duplicated data to the storage medium of the second computer under a specific condition;
A replication control method comprising:
The replication control method according to claim 10, comprising:
In the second step, the duplicate data is written to the storage medium of the second computer after a notification that the job has ended is received.
The replication control method according to claim 11, comprising:
The method further includes a third step of detecting the end of the job and transmitting a notification that the job has ended.
The replication control method according to claim 12, comprising:
The computer system includes a file system configured on a storage in which storage areas provided by the storage medium included in the plurality of computers are integrated,
The file system manages data of a plurality of files used when the job is executed,
In each of the plurality of computers that provide the storage area constituting the storage, the data of the file is distributed and arranged for each predetermined range,
The plurality of computers further includes a scheduler computer that manages an execution schedule of the job,
The third step includes
Obtaining an identifier of the first computer that was executing the completed job when the scheduler computer detects the end of the job;
The scheduler computer transmitting the notification including the acquired identifier of the first computer to the first computer;
Including
The first step includes a step in which the first computer generates write information in which information indicating a write destination of data written to the storage medium of the first computer is associated with an identifier of the job,
The second step includes
When the first computer receives the notification, reading the data from the storage medium of the first computer with reference to the write information;
The first computer sending the read data as the duplicate data to the second computer;
A replication control method comprising:
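As an illustrative, non-limiting sketch, the steps recited above, in which the first computer records write information while the job runs and, upon the end notification, reads the data back with reference to the write information and sends it as the duplicate data, might be modeled as follows. The class name, the method names, the `send_to_second` callback, and the in-memory dictionaries are hypothetical.

```python
class FirstReplicationControlUnit:
    """Hypothetical in-memory model; not the claimed implementation."""

    def __init__(self):
        self.storage = {}     # stand-in for the first computer's storage medium
        self.write_info = {}  # write destination -> job identifier

    def write(self, job_id, destination, data):
        """Write the data locally and record the write information."""
        self.storage[destination] = data
        self.write_info[destination] = job_id

    def on_job_end(self, send_to_second):
        """On the end notification, read each recorded destination and send it."""
        for destination in list(self.write_info):
            send_to_second(destination, self.storage[destination])
        self.write_info.clear()
```

In this sketch, a destination that the job overwrites several times is sent once, with its final contents, because the data is read from the storage medium only after the notification is received.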
The replication control method according to claim 13, comprising:
The first computer includes:
A file server unit that processes an access request to the file managed by the file system when the job is executed;
Storing file information in which the identifier of the job being executed is associated with the identifier of the file accessed by the job;
The method further comprises:
A step in which, when the first computer receives an access request output by the job, the first computer determines, with reference to the file information, whether information matching the identifier of the file that is the target of the access request is registered;
A step in which, when information that matches the identifier of the file that is the target of the access request is registered, the first computer determines, with reference to the registered information, whether the identifier of the job associated with the file that is the target of the access request matches the identifier of the job that output the access request;
A step in which, when the identifier of the job associated with the file that is the target of the access request does not match the identifier of the job that output the access request, the first computer denies access to the file that is the target of the access request; and
A step in which, when the identifier of the job associated with the file that is the target of the access request matches the identifier of the job that output the access request, the first computer permits access to the file that is the target of the access request;
A replication control method comprising:
The replication control method according to claim 14, wherein
The method further comprises:
A step in which, when access to the file is permitted and the access request is a write request, the first computer extracts the identifier of the file and the identifier of the job included in the write request and registers them in the file information;
The first computer calculating information indicating a write destination of the data;
The first computer outputting a write request including the identifier of the job, information indicating a write destination of the calculated data, and the data;
Including
The first step includes
When the first computer receives the output write request, generating the write information based on the write request;
The first computer writing the data included in the output write request to a storage medium of the first computer;
A replication control method comprising:
The replication control method according to claim 12, comprising:
The computer system includes a file system configured on a storage in which storage areas provided by the storage medium included in the plurality of computers are integrated,
The file system manages data of a plurality of files used when the job is executed,
In each of the plurality of computers that provide the storage area constituting the storage, the data of the file is distributed and arranged for each predetermined range,
The plurality of computers further includes a scheduler computer that manages an execution schedule of the job,
The second computer has a temporary write area;
The third step includes
Obtaining an identifier of the first computer that was executing the completed job when the scheduler computer detects the end of the job;
The scheduler computer transmitting the notification including the acquired identifier of the first computer to the second computer;
Including
The first step includes
The first computer sending the data written in the storage medium of the first computer to the second computer as the duplicate data;
The second computer generates write information in which information indicating a write destination of data written to the storage medium of the first computer is associated with an identifier of the job;
The second computer writing the received replicated data to the temporary write area;
Including
In the second step, when the second computer receives the notification, the second computer writes the duplicate data stored in the temporary write area to the storage medium of the second computer with reference to the write information.
The replication control method according to claim 16, wherein
The first computer includes:
A file server unit that processes an access request to the file managed by the file system when the job is executed;
Storing file information in which the identifier of the job being executed is associated with the identifier of the file accessed by the job;
The method further comprises:
A step in which, when the first computer receives an access request output by the job, the first computer determines, with reference to the file information, whether information matching the identifier of the file that is the target of the access request is registered;
A step in which, when information that matches the identifier of the file that is the target of the access request is registered, the first computer determines, with reference to the registered information, whether the identifier of the job associated with the file that is the target of the access request matches the identifier of the job that output the access request;
A step in which, when the identifier of the job associated with the file that is the target of the access request does not match the identifier of the job that output the access request, the first computer denies access to the file that is the target of the access request; and
A step in which, when the identifier of the job associated with the file that is the target of the access request matches the identifier of the job that output the access request, the first computer permits access to the file that is the target of the access request;
A replication control method comprising:
The replication control method according to claim 17, wherein
The method further comprises:
A step in which, when access to the file is permitted and the access request is a write request, the first computer extracts the identifier of the file and the identifier of the job included in the write request and registers them in the file information;
The first computer calculating information indicating a write destination of the data;
The first computer outputting a write request including the identifier of the job, information indicating a write destination of the calculated data, and the data;
The first computer writing the data included in the output write request to a storage medium of the first computer;
The first computer transferring the output write request to the second replication control unit;
Including
The first step includes
Generating the write information when the second computer receives the write request;
The second computer writing the data included in the received write request as the duplicate data in the temporary write area;
A replication control method comprising: