WO2017005116A1 - Intermediate file processing method, client, server, and system - Google Patents

Intermediate file processing method, client, server, and system

Info

Publication number
WO2017005116A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
client
intermediate file
cluster
cluster information
Prior art date
Application number
PCT/CN2016/087462
Other languages
English (en)
French (fr)
Inventor
朱家稷
姚文辉
谢巍
Original Assignee
阿里巴巴集团控股有限公司
朱家稷
姚文辉
谢巍
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 朱家稷, 姚文辉, 谢巍
Publication of WO2017005116A1
Priority to US15/862,570 (US11500812B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/122File system administration, e.g. details of archiving or snapshots using management policies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/14Details of searching files based on file metadata
    • G06F16/144Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/06Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Definitions

  • the present invention belongs to the field of Internet technologies, and in particular, to an intermediate file processing method, a client, a server, and a system.
  • each job (Job) for data processing contains multiple tasks, each of which performs a corresponding conversion operation on the data and is usually completed by multiple task worker (Task worker) threads processing different data fragments in parallel.
  • a large amount of intermediate result data is usually exchanged between task execution threads, and this intermediate result data is accessed in the form of intermediate files. How to process intermediate files efficiently is a key technology for distributed data processing.
  • each task processing thread directly accesses the server's local file system to store the intermediate file it outputs, and reports the file location information to the job management node (the node responsible for managing task execution in the job); the job management node informs the next task of the storage address of the intermediate file to be read, and the execution thread of the next task remotely reads the required intermediate file data over HTTP.
  • the job management node notifies the agent component on each machine to delete the intermediate file of the job.
  • some intermediate files may be large, or some servers or disks may host a disproportionate share of the intermediate files, which usually results in uneven load. The read and write scheduling of intermediate files cannot be dynamically adjusted based on the load of each server and disk. Although a backup instance (Backup Instance) can reduce the impact of long-tail task execution threads, finer-grained optimization is not possible.
  • the present application provides an intermediate file processing method, a client, a server, and a system to solve the problem that read and write operations on intermediate files cannot be dynamically adjusted in the prior art.
  • an intermediate file processing method applicable to a cluster management client, comprising: receiving a message from a first client indicating that an intermediate file is to be written to a first server; requesting a second server to create cluster information of the intermediate file; after the cluster information is successfully created, receiving the cluster information returned by the second server, wherein the cluster information includes a cluster name and a priority; and sending the cluster information to the first client and a second client, wherein the first client uploads the intermediate file to the first server, so that the first server writes the intermediate file according to a local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  • the present application further discloses an intermediate file processing method, including: receiving a request for writing an intermediate file from a first client, the request including cluster information of the intermediate file, wherein the cluster information is created by the second server, sent to the cluster management client, and then sent by the cluster management client to the first client and the second client, and the cluster information includes a cluster name and a priority; verifying the received cluster information; when the received cluster information is successfully verified, sending a message that the verification succeeded to the first client; and receiving the intermediate file uploaded by the first client and writing the intermediate file according to a local disk load and the priority.
  • an intermediate file processing method applicable to a second server, including: creating cluster information according to a request from a cluster management client, where the cluster information is cluster information of an intermediate file written by a first client to a first server and includes a cluster name and a priority; and sending the cluster information to the cluster management client, where the cluster management client sends the cluster information to the first client and a second client, the first client uploads the intermediate file to the first server, so that the first server writes the intermediate file according to a local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  • an intermediate file processing method applicable to a first client, including: sending, to the cluster management client, a message indicating that an intermediate file is to be written to the first server, where the cluster management client requests the second server to create cluster information of the intermediate file; receiving the cluster information returned by the cluster management client, where the cluster information includes a cluster name and a priority; sending a request for writing the intermediate file to the first server, where the request includes the cluster information and the cluster information is verified by the first server; and, after the cluster information is successfully verified by the first server, uploading the intermediate file to the first server, where the first server writes the intermediate file to disk according to the local disk load and the priority.
  • the present application further discloses an intermediate file processing method applicable to a second client, comprising: receiving cluster information from a cluster management client, wherein the cluster information is cluster information of an intermediate file written to a first server and is created by the second server at the request of the cluster management client, and the cluster information includes a cluster name and a priority; querying the first server, according to the cluster information, for write location information on disk of the intermediate file corresponding to the cluster information, wherein the intermediate file is uploaded by the first client to the first server and written by the first server according to the local disk load and the priority; and reading the intermediate file from the first server according to the write location information.
  • an intermediate file processing client, comprising: a first receiving module, configured to receive a message from a first client indicating that an intermediate file is to be written to a first server; a request module, configured to request a second server to create cluster information of the intermediate file; a second receiving module, configured to receive, after the cluster information is successfully created, the cluster information returned by the second server, wherein the cluster information includes a cluster name and a priority; and a first sending module, configured to send the cluster information to the first client and the second client, wherein the first client uploads the intermediate file to the first server, so that the first server writes the intermediate file according to the local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  • an intermediate file processing server, including: a fourth receiving module, configured to receive a request for writing an intermediate file from a first client, where the request includes cluster information of the intermediate file, the cluster information is created by the second server, sent to the cluster management client, and then sent by the cluster management client to the first client and the second client, and the cluster information includes a cluster name and a priority; a verification module, configured to verify the received cluster information; a third sending module, configured to send a message that the verification succeeded to the first client when the received cluster information is successfully verified; and a first write module, configured to receive the intermediate file uploaded by the first client and write the intermediate file according to a local disk load and the priority.
  • an intermediate file processing server, including: a creating module, configured to create cluster information according to a request from a cluster management client, where the cluster information is cluster information of an intermediate file written by the first client to the first server and includes a cluster name and a priority; and a seventh sending module, configured to send the cluster information to the cluster management client, where the cluster management client sends the cluster information to the first client and the second client, the first client uploads the intermediate file to the first server, so that the first server writes the intermediate file according to the local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  • an intermediate file processing client, comprising: an eighth sending module, configured to send, to the cluster management client, a message indicating that an intermediate file is to be written to the first server, where the cluster management client requests the second server to create cluster information of the intermediate file; a seventh receiving module, configured to receive the cluster information returned by the cluster management client, where the cluster information includes a cluster name and a priority; a ninth sending module, configured to send a request for writing the intermediate file to the first server, where the request includes the cluster information and the cluster information is verified by the first server; and a second writing module, configured to upload the intermediate file to the first server after the cluster information is successfully verified by the first server, where the first server writes the intermediate file according to a local disk load and the priority.
  • an intermediate file processing client, including: an eighth receiving module, configured to receive cluster information from a cluster management client, wherein the cluster information is cluster information of an intermediate file written to the first server and is created by the second server at the request of the cluster management client, and the cluster information includes a cluster name and a priority; a second query module, configured to query the first server, according to the cluster information, for write location information on disk of the intermediate file corresponding to the cluster information, where the intermediate file is uploaded by the first client to the first server and written by the first server according to the local disk load and the priority; and a reading module, configured to read the intermediate file from the first server according to the write location information.
  • an intermediate file processing system, including: a first client, a second client, a first server, a second server, and a cluster management client. Before writing the intermediate file to the first server, the first client sends the cluster management client a message indicating that the intermediate file is to be written to the first server. The cluster management client requests the second server to create the cluster information of the intermediate file, receives the cluster information, and sends it to the first client and the second client, where the cluster information includes a cluster name and a priority. The first client requests, according to the cluster information, the first server to write the intermediate file and, after receiving the message returned by the first server that the cluster information has been successfully verified, uploads the intermediate file to the first server. The first server writes the intermediate file according to the local disk load and the priority. The second client reads the intermediate file from the first server according to the cluster information.
  • the present application can obtain the following technical effects: ensuring that tasks with higher priority jobs are processed in time, preventing higher priority jobs from being slowed down, and maintaining load balancing of disks.
  • FIG. 1 is a schematic structural diagram of an intermediate file processing system according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an intermediate file processing method according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an intermediate file processing method according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an intermediate file processing method according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart diagram of an intermediate file processing method according to an embodiment of the present application.
  • FIG. 6 is a schematic flowchart diagram of an intermediate file processing method according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an intermediate file processing client according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an intermediate file processing server according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an intermediate file processing server according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an intermediate file processing client according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an intermediate file processing client according to an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a topological structure of an intermediate file processing system according to an embodiment of the present application.
  • the system includes a first server 10, a second server 11, a cluster management client 12, a first client 13, and a second client 14.
  • the first client 13 is a client for writing an intermediate file to the first server 10
  • the second client 14 is a client for reading the intermediate file from the first server 10.
  • the overall job execution plan includes a plurality of tasks; the first client 13 is the executor of the upper-level task, and the second client 14 is the executor of the next-level task. The intermediate file generated after data processing by the first client 13 needs to be read by the second client 14 to continue the subsequent processing.
  • the first server 10 is a server for storing intermediate files, and completes the read and write operations of the first client 13 and the second client 14 for the intermediate files.
  • the second server 11 is configured to create and store cluster information of the intermediate file, and manage the intermediate file stored by the first server 10 by the cluster information. Each cluster information has an expiration time, and after the cluster information expires, the second server 11 deletes the expired cluster information.
  • the second server 11 synchronizes the cluster information to the first server 10 through the background heartbeat, so that the first server 10 deletes the intermediate file corresponding to the deleted cluster information, thereby completing the cleaning of the intermediate file.
  • the cluster management client 12 is configured to request the second server 11 to create cluster information, and periodically sends a request to the second server 11 to extend the expiration time of the cluster information and thereby extend its life cycle. After the second client 14 has read the intermediate file from the first server 10, the cluster management client 12 requests the second server 11 to set the expiration time of the cluster information of the intermediate file to the current time, so that the second server 11 deletes the cluster information of the intermediate file.
  • the writing and reading process for the intermediate file is as follows.
  • the result data needs to be uploaded to the first server 10 in the form of an intermediate file, and the first client 13 sends a message for writing the intermediate file to the first server 10 to the cluster management client 12.
  • the message includes the identifier of the first client 13, the amount of data of the intermediate file, and the user identifier (User id) of the job plan executed by the first client 13.
  • the data volume of the intermediate file is the storage space it is expected to occupy, for example 1042Kb; the user identifier can be a string of numbers or letters.
  • after receiving the above message from the first client 13, the cluster management client 12 requests the second server 11 to create cluster information of the intermediate file; the request includes the identifier of the first client 13, the data volume of the intermediate file, and the user identifier of the job plan executed by the first client 13.
  • creating the corresponding cluster information includes: performing verification according to the user identifier and determining the priority with which the user represented by the user identifier executes tasks; generating a globally unique cluster name from the identifier of the first client 13 (Task Id) by adding a random string in front of and/or after the identifier (for example, if the identifier of the first client is 1673, the generated globally unique cluster name may be bcd_1673);
  • the storage space quota is determined according to the data volume of the intermediate file; a default expiration time is generated, for example, 30 minutes from the current time.
  • the cluster information created includes: cluster name, expiration time, priority, and storage space quota. For example, cluster name: bcd_1673; expiration time: 6:35 (current time 6:05); priority: 1; storage space quota: 1042Kb.
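  • a minimal Python sketch of the cluster information creation described above. The names `ClusterInfo`, `create_cluster_info`, and the `USER_PRIORITY` lookup table are hypothetical; the source states only that the priority is determined from the user identifier, that a random string is added to the client identifier, and that the default expiration time is, for example, 30 minutes. A short random prefix is not a strict uniqueness guarantee, so a real implementation would also need a collision check.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ClusterInfo:
    cluster_name: str
    expiration_time: datetime
    priority: int
    quota_kb: int


# Hypothetical table: the source does not say how a user maps to a priority.
USER_PRIORITY = {"user_a": 1, "user_b": 2}


def create_cluster_info(client_id, data_kb, user_id, now,
                        ttl=timedelta(minutes=30)):
    """Build cluster information for one intermediate file."""
    # Verify the user identifier before creating anything.
    if user_id not in USER_PRIORITY:
        raise PermissionError(f"unknown user identifier: {user_id}")
    # Random string in front of the client identifier, as in "bcd_1673".
    prefix = secrets.token_hex(3)
    return ClusterInfo(
        cluster_name=f"{prefix}_{client_id}",
        expiration_time=now + ttl,          # default expiration time
        priority=USER_PRIORITY[user_id],
        quota_kb=data_kb,                   # quota from expected data volume
    )
```

  • called with client identifier "1673", a data volume of 1042 Kb, and a current time of 6:05, the sketch yields a cluster name ending in "_1673", an expiration time of 6:35, and a storage space quota of 1042 Kb, matching the example in the text.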
  • after receiving the cluster information returned by the second server 11, the cluster management client 12 sends the cluster information to the first client 13 and, according to the execution plan of the job, to the second client 14 that performs the subsequent processing.
  • the first client 13 sends a request to write an intermediate file to the first server 10, the request to write the intermediate file includes cluster information, and the cluster information is verified by the first server 10.
  • the first server 10 verifies to the second server 11 whether the cluster information exists. If the cluster information exists in the second server 11, the first server 10 sends a message that the verification is successful to the first client 13; if the verification fails, the first one is blocked.
  • after receiving the intermediate file uploaded by the first client 13, the first server 10 writes the intermediate file to a lightly loaded disk according to the local disk load, so that the load of the local disks is balanced.
  • the first server 10 adjusts the writing order according to the priority in the cluster information sent by each first client 13, and an intermediate file with a higher priority is written first, which ensures that tasks of higher-priority jobs are processed in time and prevents higher-priority jobs from being slowed down.
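  • a minimal Python sketch of the write scheduling described above: pending writes are ordered by priority and each write goes to the least loaded local disk. The class `WriteScheduler` and its methods are hypothetical, and the sketch assumes that a smaller priority number means a higher priority, which the source does not specify.

```python
import heapq
import itertools


class WriteScheduler:
    """Orders pending writes by priority and picks the least loaded disk."""

    def __init__(self, disk_load_kb):
        # disk name -> KB already stored on that disk
        self.disk_load_kb = dict(disk_load_kb)
        self._queue = []
        # Tie-breaker so equal priorities are served in arrival order.
        self._seq = itertools.count()

    def submit(self, priority, cluster_name, size_kb):
        # Assumption: a smaller number means a higher priority.
        heapq.heappush(
            self._queue, (priority, next(self._seq), cluster_name, size_kb)
        )

    def write_next(self):
        """Pop the highest-priority write and assign it to a disk."""
        _, _, cluster_name, size_kb = heapq.heappop(self._queue)
        disk = min(self.disk_load_kb, key=self.disk_load_kb.get)
        self.disk_load_kb[disk] += size_kb
        return cluster_name, disk
```

  • a heap keeps the next write selection cheap as requests arrive, and recomputing the least loaded disk for every write lets the placement follow the load as it changes.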
  • after receiving the cluster information, the second client 14 periodically queries the first server 10 for the location on disk of the intermediate file corresponding to the cluster information. For example, the first server 10 is queried every 30 seconds, and each query request includes the cluster information. After the first server 10 finds the write location information of the intermediate file corresponding to the cluster information, it sends the write location information to the second client 14. The second client 14 then reads the intermediate file written by the first client 13 from the first server 10 based on the returned write location information.
  • an intermediate file with a higher priority in its cluster information is read first by the corresponding second client 14, so the reading order of the intermediate files follows the adjusted writing order, which ensures that tasks of higher-priority jobs are processed in time and prevents higher-priority jobs from being slowed down.
  • the cluster management client 12 updates the expiration time of the cluster information on the second server 11 every first preset time period to extend the life cycle of the cluster information. For example, the first preset time period is 5 minutes, and the expiration time of the cluster information is extended by 5 minutes each time it is updated.
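  • a minimal Python sketch of the periodic location query performed by the second client. The function `wait_for_location`, the `query` callback, and the `max_attempts` limit are hypothetical; the source gives only the example polling interval of 30 seconds, and for brevity the sketch passes just the cluster name rather than the full cluster information.

```python
import time


def wait_for_location(query, cluster_name, interval_s=30.0,
                      max_attempts=10, sleep=time.sleep):
    """Poll the first server until the write location is known.

    `query` stands in for the request to the first server; it returns the
    write location, or None while the file has not been written yet.
    """
    for attempt in range(max_attempts):
        location = query(cluster_name)
        if location is not None:
            return location
        # Wait before the next query, but not after the last attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return None
```

  • passing `sleep` as a parameter keeps the polling loop testable without real waiting.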
  • the length of each extension of the expiration time is preset in the submitted job execution plan. It is usually set to the maximum time that generated intermediate files are retained after the cluster management client 12 that submitted the job execution plan becomes abnormal. If the cluster management client 12 returns to normal within this time, the intermediate files that have been generated are not deleted, the cluster information of the intermediate files does not expire, and the job continues to execute. If the cluster management client 12 does not return to normal within this time, the expiration time of the cluster information is no longer updated on the second server 11, so the second server 11 deletes the expired cluster information and the first server 10 in turn deletes the corresponding intermediate file. Therefore, by updating the expiration time of the cluster information on the second server 11 every first preset time period, the cluster management client 12 ensures that the intermediate file is not deleted because its cluster information has expired, and can thus be successfully read by the second client 14.
  • after the second client 14 successfully reads all the data of the intermediate file, it sends a message that the intermediate file has been successfully read to the cluster management client 12; the message includes the cluster name from the cluster information of the intermediate file, such as the cluster name in the above example: bcd_1673.
  • the cluster management client 12 requests the second server 11 to set the expiration time of the cluster information corresponding to the cluster name to the current time, so that the cluster information immediately becomes expired on the second server 11 and needs to be cleaned up.
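  • a minimal Python sketch of the expiration-time renewal described above, with an in-memory stand-in for the second server. The class `ClusterRegistry` and its methods are hypothetical names; the 30-minute default and the 5-minute extension come from the examples in the text.

```python
from datetime import datetime, timedelta


class ClusterRegistry:
    """In-memory stand-in for the second server's cluster information."""

    def __init__(self):
        self.expiry = {}  # cluster name -> expiration time

    def create(self, name, now, ttl=timedelta(minutes=30)):
        self.expiry[name] = now + ttl

    def renew(self, name, extension=timedelta(minutes=5)):
        # Called by the cluster management client every preset period.
        self.expiry[name] += extension

    def purge_expired(self, now):
        """Delete and return the names of expired cluster information."""
        expired = [n for n, t in self.expiry.items() if t <= now]
        for name in expired:
            del self.expiry[name]
        return expired
```

  • if the cluster management client stops calling `renew`, for example because it has crashed, the entry eventually appears in the result of `purge_expired`, which is what triggers deletion of the intermediate file.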
  • the second server 11 deletes the cluster information, and synchronizes the locally saved cluster information to the first server 10 through the background heartbeat.
  • after determining that the cluster information has been deleted by the second server 11, the first server 10 deletes the intermediate file corresponding to the cluster information.
  • the second server 11 deletes the cluster information "cluster name: bcd_1673; expiration time: 6:35 (current time 6:05); priority: 1; storage space quota: 1042Kb", and the first server 10 also deletes the intermediate file corresponding to the cluster information, thereby freeing disk storage space and reducing the disk load.
  • the second server 11 is queried as to whether the cluster information still exists, to determine whether to continue the current job.
  • the cluster management client 12 sends the identifier of the first client 13 to the second server 11, and the second server 11 searches the stored cluster information, according to the identifier of the first client 13, for a cluster name that includes the identifier, to determine whether the cluster information corresponding to the intermediate file uploaded by the first client 13 still exists.
  • for example, the cluster management client 12 uploads the identifier "1673" of the first client 13, and the second server 11 checks whether a cluster name containing "1673" exists in the saved cluster information. If the cluster name bcd_1673 is found, the cluster information has not expired after the restart of the cluster management client 12; if no cluster name containing "1673" is found, the cluster information has expired and been deleted by the second server 11 by the time the cluster management client 12 has restarted.
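  • a minimal Python sketch of the cleanup on the first server after cluster information has been synchronized from the second server. The function `sync_and_clean` and its data layout are hypothetical; the source states only that the first server deletes intermediate files whose cluster information has been deleted by the second server.

```python
def sync_and_clean(stored_files, live_clusters):
    """Drop intermediate files whose cluster information no longer exists.

    stored_files: dict mapping cluster name -> list of file paths held by
                  the first server (modified in place).
    live_clusters: set of cluster names the second server still stores,
                   as received through the heartbeat synchronization.
    Returns the list of file paths that should be deleted from disk.
    """
    deleted = []
    # Iterate over a copy of the keys because entries are removed.
    for name in list(stored_files):
        if name not in live_clusters:
            deleted.extend(stored_files.pop(name))
    return deleted
```

  • the sketch only computes which paths to delete; removing the files from disk is left to the caller.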
  • if the cluster information still exists, the cluster management client 12 updates the expiration time of the cluster information on the second server 11 so that the cluster information does not expire and the corresponding intermediate file is not deleted by the first server 10; the intermediate file processing system can then continue to execute the current job.
  • if the corresponding cluster information is not found, the intermediate file corresponding to the cluster information has also been deleted by the first server 10, and the cluster management client 12 resubmits the current job plan so that the current job is executed again.
  • in this way, if the intermediate file processing system becomes abnormal during the current job, the corresponding intermediate files can be reclaimed in time to free disk space, and if the abnormality is resolved within a short time, the current job can continue.
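  • a minimal Python sketch of the decision made after the cluster management client restarts. The function `recover_after_restart` is a hypothetical name, and the sketch assumes that the random string and the client identifier are joined with an underscore, as in the example name bcd_1673; matching whole underscore-separated parts avoids treating identifier "1673" as part of "16730".

```python
def recover_after_restart(client_id, cluster_names):
    """Decide whether to continue or resubmit the current job.

    cluster_names: names of the cluster information still stored on the
                   second server.
    Returns ("continue", name) if cluster information for the client
    still exists, otherwise ("resubmit", None).
    """
    for name in cluster_names:
        # Assumption: parts of a cluster name are separated by "_".
        if client_id in name.split("_"):
            return ("continue", name)
    return ("resubmit", None)
```

  • on "continue", the caller would go on renewing the expiration time of the returned cluster name; on "resubmit", it would submit the job plan again.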
  • the first server 10 transmits the usage information of the intermediate file corresponding to the cluster information to the second server 11 every second preset time period. For example, the usage information of the intermediate file corresponding to the cluster information is transmitted every one minute.
  • the usage information includes the number of intermediate files, the storage space occupied, the usage rate of the storage space quota, and the like, so that the second server 11 knows the global usage of the intermediate files corresponding to each piece of cluster information and can provide an interface for the cluster management client 12 to query.
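  • a minimal Python sketch of the usage information that the first server reports to the second server. The names `UsageInfo` and `build_usage` are hypothetical; the fields follow the items listed in the text (number of files, occupied storage space, usage rate of the storage space quota).

```python
from dataclasses import dataclass


@dataclass
class UsageInfo:
    cluster_name: str
    file_count: int
    used_kb: int
    quota_kb: int

    @property
    def quota_usage(self) -> float:
        """Fraction of the storage space quota in use (0.0 if no quota)."""
        return self.used_kb / self.quota_kb if self.quota_kb else 0.0


def build_usage(cluster_name, file_sizes_kb, quota_kb):
    """Summarize the intermediate files stored for one cluster."""
    return UsageInfo(
        cluster_name=cluster_name,
        file_count=len(file_sizes_kb),
        used_kb=sum(file_sizes_kb),
        quota_kb=quota_kb,
    )
```

  • in the scheme described above, a report of this kind would be sent every second preset time period, for example every minute.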
  • the embodiment of the present application provides an intermediate file processing method. As shown in FIG. 2, the method includes the following steps.
  • in step S201, a message from the first client indicating that the intermediate file is to be written to the first server is received.
  • in step S202, the second server is requested to create cluster information of the intermediate file.
  • in step S203, after the cluster information is successfully created, the cluster information returned by the second server is received, wherein the cluster information includes a cluster name and a priority.
  • in step S204, the cluster information is sent to the first client and the second client; the first client uploads the intermediate file to the first server, so that the first server writes the intermediate file according to the local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  • the cluster information further includes an expiration time
  • the foregoing intermediate file processing method applicable to the cluster management client may further include the following steps.
  • the expiration time of the cluster information is updated to the second server every first preset time period to extend the life cycle of the cluster information.
  • the cluster management client sends the identifier of the first client to the second server to query whether the cluster information of the intermediate file exists; when the cluster information of the intermediate file exists, the current job is continued; when the cluster information of the intermediate file does not exist, the current job is re-executed.
  • the embodiment of the present application provides an intermediate file processing method. As shown in FIG. 3, the method includes the following steps.
  • in step S301, a request for writing an intermediate file is received from the first client; the request includes cluster information of the intermediate file, wherein the cluster information is created by the second server, sent to the cluster management client, and then sent by the cluster management client to the first client and the second client; the cluster information includes a cluster name and a priority.
  • in step S302, the received cluster information is verified.
  • in step S303, when the received cluster information is successfully verified, a message that the verification succeeded is sent to the first client.
  • in step S304, the intermediate file uploaded by the first client is received, and the intermediate file is written according to the local disk load and the priority.
  • the foregoing intermediate file processing method applicable to the first server may further include the following steps.
  • the usage information of the intermediate file corresponding to the cluster information is sent to the second server every second preset time period.
  • the cluster information further includes an expiration time.
  • the intermediate file processing method further includes: synchronizing the cluster information saved by the second server; and deleting the intermediate file corresponding to the cluster information when the cluster information of the intermediate file has been deleted by the second server.
  • the embodiment of the present application provides an intermediate file processing method. As shown in FIG. 4, the method includes the following steps.
  • step S401 cluster information is created according to a request from the cluster management client, wherein the cluster information is cluster information of an intermediate file written by the first client to the first server, and the cluster information includes a cluster name and a priority.
  • step S402 the cluster information is sent to the cluster management client; the cluster management client sends the cluster information to the first client and the second client, and the first client then uploads the intermediate file to the first server, so that the first server writes the intermediate file according to the local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  • the cluster information further includes an expiration time
  • the foregoing intermediate file processing method applicable to the second server may further include the following steps.
  • the expiration time is updated at every first preset interval, as requested by the cluster management client, to extend the life cycle of the cluster information.
  • the expiration time is updated to the current time, as requested by the cluster management client; the cluster information of the intermediate file is then deleted.
  • the intermediate file corresponding to the cluster information is deleted by the first server.
  • the usage information of the intermediate file corresponding to the cluster information from the first server is received every second preset time period.
  • after the cluster management client restarts, the identifier of the first client is received from the cluster management client; whether the cluster information exists is queried according to the identifier of the first client; the query result is returned to the cluster management client, so that the cluster management client determines whether to continue the current job.
  • the embodiment of the present application provides an intermediate file processing method. As shown in FIG. 5, the method includes the following steps.
  • step S501 a message for writing an intermediate file to the first server is sent to the cluster management client, and the cluster management client requests the second server to create cluster information of the intermediate file.
  • step S502 the cluster information returned by the cluster management client is received, wherein the cluster information includes a cluster name and a priority.
  • step S503 a request to write an intermediate file is sent to the first server, and the request to write the intermediate file includes cluster information, and the cluster information is verified by the first server.
  • step S504 after the cluster information is successfully verified by the first server, an intermediate file is written to the first server, and the intermediate file is written by the first server according to the local disk load and priority.
  • the embodiment of the present application provides an intermediate file processing method. As shown in FIG. 6, the method includes the following steps.
  • step S601 cluster information is received from the cluster management client, wherein the cluster information is the cluster information of the intermediate file written to the first server and is created by the second server at the request of the cluster management client; the cluster information includes a cluster name and a priority.
  • step S602 the first server is queried, according to the cluster information, for the on-disk write location of the intermediate file corresponding to the cluster information; the intermediate file is uploaded to the first server by the first client and written by the first server according to the local disk load and the priority.
  • step S603 the intermediate file is read from the first server based on the write location information.
  • the cluster information further includes an expiration time
  • the foregoing intermediate file processing method applicable to the second client may further include the following steps.
  • FIG. 7 shows an intermediate file processing client according to an embodiment of the present application, including:
  • the first receiving module 70 is configured to receive a message from the first client about writing an intermediate file to the first server;
  • the requesting module 71 is configured to request the second server to create cluster information of the intermediate file;
  • the second receiving module 72 is configured to receive the cluster information returned by the second server after the cluster information is successfully created, wherein the cluster information includes a cluster name and a priority;
  • the first sending module 73 is configured to send the cluster information to the first client and the second client; the first client uploads the intermediate file to the first server, so that the first server writes the intermediate file according to the local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  • the cluster information further includes an expiration time
  • the client further includes:
  • the first update module is configured to update the expiration time of the cluster information to the second server every first preset duration to extend the life cycle of the cluster information.
  • a third receiving module configured to receive a message from the second client that has successfully read the intermediate file from the first server
  • a second update module configured to update the expiration time to the second server as the current time
  • a second sending module configured to send the identifier of the first client to the second server after the restart, to query whether the cluster information of the intermediate file exists;
  • the first executing module is configured to continue the current job when the cluster information of the intermediate file exists; and the second executing module is configured to re-execute the current job when the cluster information of the intermediate file does not exist.
  • FIG. 8 shows an intermediate file processing server according to an embodiment of the present application, including:
  • the fourth receiving module 80 is configured to receive a request for writing an intermediate file from the first client, where the request includes cluster information of the intermediate file; the cluster information is created by the second server, sent to the cluster management client, and then sent by the cluster management client to the first client and the second client; the cluster information includes a cluster name and a priority;
  • a verification module 81 configured to verify received cluster information
  • the third sending module 82 is configured to: when the received cluster information is successfully verified, send a message that the verification succeeds to the first client;
  • the first writing module 83 is configured to receive an intermediate file uploaded by the first client, and write the intermediate file according to the local disk load and priority.
  • the cluster information further includes an expiration time
  • the server further includes:
  • the fourth sending module is configured to send the write location of the intermediate file corresponding to the cluster information to the second client in response to a query request from the second client, where the query request includes the cluster information;
  • the fifth sending module is configured to send the intermediate file corresponding to the cluster information to the second client in response to a read request from the second client, where the read request includes the write location.
  • the sixth sending module is configured to send the usage information of the intermediate file corresponding to the cluster information to the second server every second preset time period.
  • the first synchronization module is configured to synchronize cluster information saved by the second server
  • the first deletion module is configured to delete the intermediate file corresponding to the cluster information when the cluster information of the intermediate file has been deleted by the second server.
  • FIG. 9 shows an intermediate file processing server according to an embodiment of the present application, including:
  • a creating module 90 configured to create cluster information according to a request from the cluster management client, where the cluster information is the cluster information of an intermediate file written by the first client to the first server and includes a cluster name and a priority;
  • the seventh sending module 91 is configured to send the cluster information to the cluster management client; the cluster management client sends the cluster information to the first client and the second client, and the first client uploads the intermediate file to the first server, so that the first server writes the intermediate file according to the local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  • the cluster information further includes an expiration time
  • the server further includes:
  • the third update module is configured to update the expiration time at every first preset interval, as requested by the cluster management client, to extend the life cycle of the cluster information.
  • the fourth update module is configured to update the expiration time to the current time, as requested by the cluster management client, after the second client successfully reads the intermediate file from the first server; and the second deleting module is configured to delete the cluster information of the intermediate file.
  • the second synchronization module is configured to synchronize the locally saved cluster information to the first server, so that when the cluster information of the intermediate file has been deleted, the intermediate file corresponding to the cluster information is deleted by the first server.
  • the fifth receiving module is configured to receive the usage information of the intermediate file corresponding to the cluster information from the first server every second preset duration.
  • a sixth receiving module is configured to receive the identifier of the first client from the cluster management client after the cluster management client restarts; the first query module is configured to query whether the cluster information exists according to the identifier of the first client; and a further module is configured to return the query result to the cluster management client, so that the cluster management client determines whether to continue the current job.
  • FIG. 10 shows an intermediate file processing client according to an embodiment of the present application, including:
  • the eighth sending module 100 is configured to send a message for writing an intermediate file to the first server to the cluster management client, where the cluster management client requests the second server to create cluster information of the intermediate file;
  • the seventh receiving module 101 is configured to receive cluster information returned by the cluster management client, where the cluster information includes a cluster name and a priority;
  • the ninth sending module 102 is configured to send a request for writing the intermediate file to the first server; the request includes the cluster information, and the cluster information is verified by the first server;
  • the second writing module 103 is configured to: after the cluster information is successfully verified by the first server, write an intermediate file to the first server, and the first server writes the intermediate file to the disk according to the local disk load and priority.
  • FIG. 11 shows an intermediate file processing client according to an embodiment of the present application, including:
  • the eighth receiving module 110 is configured to receive cluster information from the cluster management client, where the cluster information is the cluster information of the intermediate file written to the first server and is created by the second server at the request of the cluster management client; the cluster information includes a cluster name and a priority;
  • the second query module 111 is configured to query the first server, according to the cluster information, for the on-disk write location of the intermediate file corresponding to the cluster information, where the intermediate file is uploaded to the first server by the first client and written by the first server according to the local disk load and the priority;
  • the reading module 112 is configured to read the intermediate file from the first server according to the writing location information.
  • the cluster information further includes an expiration time
  • the client further includes:
  • the tenth sending module is configured to send a message that the intermediate file has been successfully read to the cluster management client; the cluster management client updates the expiration time on the second server to the current time, so that the second server deletes the cluster information and the first server then deletes the intermediate file corresponding to the cluster information.
  • Each job plan includes multiple tasks.
  • the intermediate result data is written to the first server as an intermediate file, and the next task reads the intermediate result data from the first server.
  • the cluster management client has currently submitted three job plans: Job1, Job2, and Job3.
  • the user identifier corresponding to Job1 is SamZhang
  • the user identifier corresponding to Job2 is SissiLi
  • the user identifier corresponding to Job3 is LeoZhao.
  • For each job plan, there is a first client that has performed the current task and a second client that needs to perform the next task.
  • the first client needs to write an intermediate file to the first server, and the second client needs to read the intermediate file from the first server.
  • Job1 includes the first client that has performed task 1147 (Task id) and the second client that continues with task 1148
  • Job2 includes the first client that has performed task 1214 and the second client that continues with task 1215
  • Job3 includes the first client that has performed task 1359 and the second client that continues with task 1360.
  • the first client that has performed task 1147 needs to write intermediate file A to the first server.
  • It sends the cluster management client a message about writing the intermediate file to the first server; the message includes the identifier 1147 of the first client, the data amount 2264Kb of the intermediate file A, and the user identifier SamZhang.
  • the first client that has performed task 1214 needs to write intermediate file B to the first server.
  • the first client that has performed task 1359 needs to write intermediate file C to the first server.
  • the message includes the identifier 1359 of the first client, the data amount 4043Kb of the intermediate file C, and the user identifier LeoZhao.
  • the cluster management client requests the second server to create cluster information of the intermediate file A, the intermediate file B, and the intermediate file C, and the request also includes the above information.
  • the second server After receiving the request from the cluster management client, the second server starts to create corresponding cluster information.
  • the second server generates the cluster name according to the identifier of the first client, and generates the cluster names abcd_1147, efgh_1214, and ijkl_1359 respectively; the priority is determined according to the user identifier, the priority corresponding to the user identifier SamZhang is 2, and the priority corresponding to the user identifier SissiLi is 3.
  • all three pieces of cluster information have an expiration time of 5:36:50.
  • the cluster information created by the second server for the intermediate file A, the intermediate file B, and the intermediate file C are respectively:
  • the second server sends the cluster information created above to the cluster management client.
  • the cluster management client sends the cluster information of the intermediate file A to the first client that has performed task 1147 and the second client that continues with task 1148, sends the cluster information of the intermediate file B to the first client that has performed task 1214 and the second client that continues with task 1215, and sends the cluster information of the intermediate file C to the first client that has performed task 1359 and the second client that continues with task 1360.
  • the above-mentioned requests for writing intermediate files include cluster information of the intermediate file A, cluster information of the intermediate file B, and cluster information of the intermediate file C, respectively.
  • After receiving the cluster information, the first server verifies it with the second server according to the cluster name and, once the verification succeeds, returns a verification success message to the three first clients. After receiving the verification success message, the three first clients upload the intermediate file A, the intermediate file B, and the intermediate file C to the first server, respectively.
  • each intermediate file is written to a lightly loaded disk according to the current disk load, and the write order is determined by the priority in the cluster information of each intermediate file.
  • the priority of the cluster information of the intermediate file C is 1, that of the intermediate file A is 2, and that of the intermediate file B is 3. Therefore, the first server writes the intermediate file C first, then the intermediate file A, and finally the intermediate file B. This ensures that the tasks of higher-priority jobs are processed in a timely manner and prevents higher-priority jobs from being slowed down.
  • After receiving the cluster information from the cluster management client, the second clients that continue with tasks 1148, 1215, and 1360 periodically query the first server, according to the cluster information, for the write location of the corresponding intermediate file. Since the intermediate file C, whose cluster information has priority 1, is written first, the second client that continues with task 1360 is the first to obtain a write location; it then reads the intermediate file C from the first server according to that location. Next, the second client that continues with task 1148 obtains the write location of the intermediate file A and reads the intermediate file A from the first server.
  • the second client continuing to execute task 1215 will query the write location of intermediate file B and read intermediate file B from the first server.
  • the order in which the intermediate files are read can also be adjusted accordingly, so that higher priority tasks can be prioritized.
  • the three second clients each send a message to the cluster management client stating that the intermediate file has been successfully read; each message includes the cluster name from the cluster information of the intermediate file that was read, namely abcd_1147, efgh_1214, and ijkl_1359.
  • the cluster management client updates the expiration time of the three corresponding pieces of cluster information to the current time according to the three cluster names received, so that the cluster information of the intermediate file A, the intermediate file B, and the intermediate file C expires immediately and the second server deletes it. After deleting the cluster information, the second server synchronizes its locally saved cluster information to the first server. When the first server finds that the cluster information corresponding to the intermediate file A, the intermediate file B, and the intermediate file C has been deleted by the second server, it deletes the intermediate file A, the intermediate file B, and the intermediate file C stored on its local disks. This releases storage space on the first server and the second server, so that the intermediate files generated during job execution are cleaned up in time.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media include persistent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • if a first device is coupled to a second device, the first device can be directly electrically coupled to the second device, or indirectly electrically coupled to the second device through other devices or coupling means.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

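An intermediate file processing method, client, server, and system, comprising: receiving a message from a first client about writing an intermediate file to a first server (S201); requesting a second server to create cluster information for the intermediate file (S202); after the cluster information is created successfully, receiving the cluster information returned by the second server, wherein the cluster information includes a cluster name and a priority (S203); and sending the cluster information to the first client and a second client, whereupon the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information (S204). This ensures that the tasks of higher-priority jobs are processed in a timely manner, prevents higher-priority jobs from being slowed down, and keeps the disk load balanced.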

Description

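Intermediate file processing method, client, server, and system
Technical Field
The present invention belongs to the field of Internet technologies and, in particular, relates to an intermediate file processing method, client, server, and system.
Background
In distributed data processing (e.g., MapReduce, Spark), each data processing job (Job) contains multiple tasks (Task). Each task applies a transformation to the data and is usually completed by multiple task worker threads that process different data shards in parallel. Task worker threads usually need to exchange large amounts of intermediate result data, which is made accessible in the form of intermediate files. Handling intermediate files efficiently is a key technique in distributed data processing.
The existing approach is mainly as follows: each task worker thread directly accesses the server's local file system to store its output intermediate file and reports the file location to the job management node (the node responsible for managing the execution of tasks within a job). The job management node tells the next task that needs to read the intermediate file where it is stored, and the worker thread of that task reads the required intermediate file data remotely over HTTP. When the job finishes, the job management node notifies the agent component on each machine to delete the job's intermediate files.
When intermediate files are stored and retrieved by directly accessing each server's local file system, some intermediate files may be very large, or some servers or disks may hold many intermediate files, which usually leads to unbalanced load. The read/write scheduling of intermediate files cannot be adjusted dynamically according to the load of each server and disk. Although backup instances (Backup Instance) can reduce the impact of the long tail in the running time of some task worker threads, finer-grained optimization is not possible.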
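Summary
In view of this, the present application provides an intermediate file processing method, client, server, and system to solve the problem in the prior art that read and write operations on intermediate files cannot be adjusted dynamically.
To solve the above technical problem, the present application discloses an intermediate file processing method applicable to a cluster management client, comprising: receiving a message from a first client about writing an intermediate file to a first server; requesting a second server to create cluster information for the intermediate file; after the cluster information is created successfully, receiving the cluster information returned by the second server, wherein the cluster information includes a cluster name and a priority; and sending the cluster information to the first client and a second client, so that the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
To solve the above technical problem, the present application further discloses an intermediate file processing method, comprising: receiving a request for writing an intermediate file from a first client, the request including cluster information of the intermediate file, wherein the cluster information is created by a second server, sent to a cluster management client, and then sent by the cluster management client to the first client and a second client, and the cluster information includes a cluster name and a priority; verifying the received cluster information; when the received cluster information is verified successfully, sending a verification success message to the first client; and receiving the intermediate file uploaded by the first client and writing the intermediate file according to the local disk load and the priority.
To solve the above technical problem, the present application further discloses an intermediate file processing method applicable to a second server, comprising: creating cluster information according to a request from a cluster management client, wherein the cluster information is the cluster information of an intermediate file written by a first client to a first server and includes a cluster name and a priority; and sending the cluster information to the cluster management client, so that the cluster management client sends the cluster information to the first client and a second client, the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
To solve the above technical problem, the present application further discloses an intermediate file processing method applicable to a first client, comprising: sending a message about writing an intermediate file to a first server to a cluster management client, so that the cluster management client requests a second server to create cluster information for the intermediate file; receiving the cluster information returned by the cluster management client, wherein the cluster information includes a cluster name and a priority; sending a request for writing the intermediate file to the first server, the request including the cluster information, which is verified by the first server; and, after the first server verifies the cluster information successfully, writing the intermediate file to the first server, whereupon the first server writes the intermediate file to disk according to its local disk load and the priority.
To solve the above technical problem, the present application further discloses an intermediate file processing method applicable to a second client, comprising: receiving cluster information from a cluster management client, wherein the cluster information is the cluster information of an intermediate file written to a first server, is created by a second server at the request of the cluster management client, and includes a cluster name and a priority; querying the first server, according to the cluster information, for the on-disk write location information of the intermediate file corresponding to the cluster information, wherein the intermediate file is uploaded to the first server by a first client and written by the first server according to its local disk load and the priority; and reading the intermediate file from the first server according to the write location information.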
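To solve the above technical problem, the present application further discloses an intermediate file processing client, comprising: a first receiving module, configured to receive a message from a first client about writing an intermediate file to a first server; a requesting module, configured to request a second server to create cluster information for the intermediate file; a second receiving module, configured to receive the cluster information returned by the second server after the cluster information is created successfully, wherein the cluster information includes a cluster name and a priority; and a first sending module, configured to send the cluster information to the first client and a second client, so that the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
To solve the above technical problem, the present application further discloses an intermediate file processing server, comprising: a fourth receiving module, configured to receive a request for writing an intermediate file from a first client, the request including cluster information of the intermediate file, wherein the cluster information is created by a second server, sent to a cluster management client, and then sent by the cluster management client to the first client and a second client, and the cluster information includes a cluster name and a priority; a verification module, configured to verify the received cluster information; a third sending module, configured to send a verification success message to the first client when the received cluster information is verified successfully; and a first writing module, configured to receive the intermediate file uploaded by the first client and write the intermediate file according to the local disk load and the priority.
To solve the above technical problem, the present application further discloses an intermediate file processing server, comprising: a creating module, configured to create cluster information according to a request from a cluster management client, wherein the cluster information is the cluster information of an intermediate file written by a first client to a first server and includes a cluster name and a priority; and a seventh sending module, configured to send the cluster information to the cluster management client, so that the cluster management client sends the cluster information to the first client and a second client, the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
To solve the above technical problem, the present application further discloses an intermediate file processing client, comprising: an eighth sending module, configured to send a message about writing an intermediate file to a first server to a cluster management client, so that the cluster management client requests a second server to create cluster information for the intermediate file; a seventh receiving module, configured to receive the cluster information returned by the cluster management client, wherein the cluster information includes a cluster name and a priority; a ninth sending module, configured to send a request for writing the intermediate file to the first server, the request including the cluster information, which is verified by the first server; and a second writing module, configured to write the intermediate file to the first server after the first server verifies the cluster information successfully, whereupon the first server writes the intermediate file according to its local disk load and the priority.
To solve the above technical problem, the present application further discloses an intermediate file processing client, comprising: an eighth receiving module, configured to receive cluster information from a cluster management client, wherein the cluster information is the cluster information of an intermediate file written to a first server, is created by a second server at the request of the cluster management client, and includes a cluster name and a priority; a second query module, configured to query the first server, according to the cluster information, for the on-disk write location information of the intermediate file corresponding to the cluster information, wherein the intermediate file is uploaded to the first server by a first client and written by the first server according to its local disk load and the priority; and a reading module, configured to read the intermediate file from the first server according to the write location information.
To solve the above technical problem, the present application further discloses an intermediate file processing system, comprising a first client, a second client, a first server, a second server, and a cluster management client. Before writing an intermediate file to the first server, the first client sends a message about writing the intermediate file to the first server to the cluster management client. The cluster management client requests the second server to create cluster information for the intermediate file and, after receiving the cluster information returned by the second server, sends it to the first client and the second client, wherein the cluster information includes a cluster name and a priority. The first client requests the first server to write the intermediate file according to the cluster information and, after receiving a message from the first server that the cluster information has been verified successfully, uploads the intermediate file to the first server. The first server writes the intermediate file according to its local disk load and the priority. The second client reads the intermediate file from the first server according to the cluster information.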
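Compared with the prior art, the present application can achieve technical effects including the following: the tasks of higher-priority jobs are processed in a timely manner, higher-priority jobs are not slowed down, and the disk load stays balanced.
Of course, a product implementing the present application does not necessarily need to achieve all of the technical effects described above at the same time.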
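Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
FIG. 1 is a schematic structural diagram of an intermediate file processing system according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of an intermediate file processing method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of an intermediate file processing method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of an intermediate file processing method according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of an intermediate file processing method according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of an intermediate file processing method according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an intermediate file processing client according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an intermediate file processing server according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an intermediate file processing server according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an intermediate file processing client according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an intermediate file processing client according to an embodiment of the present application.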
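Detailed Description
Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the present invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented accordingly.
FIG. 1 is a schematic diagram of the topology of an intermediate file processing system provided by an embodiment of the present application. The system includes a first server 10, a second server 11, a cluster management client 12, a first client 13, and a second client 14.
The first client 13 is a client that writes an intermediate file to the first server 10, and the second client 14 is a client that reads that intermediate file from the first server 10. The overall job (Job) execution plan includes multiple tasks (Task): the first client 13 executes the upstream task and the second client 14 executes the downstream task. The intermediate file produced by the data processing of the first client 13 must be read by the second client 14 for subsequent processing.
The first server 10 is a server that stores intermediate files and serves the read and write operations of the first client 13 and the second client 14 on them. The second server 11 creates and stores the cluster information of intermediate files and uses it to manage the intermediate files stored on the first server 10. Every piece of cluster information has an expiration time; after it expires, the second server 11 deletes it. The second server 11 synchronizes cluster information to the first server 10 through a background heartbeat, so that the first server 10 deletes the intermediate files corresponding to deleted cluster information, which completes the cleanup of intermediate files.
The cluster management client 12 requests the second server 11 to create cluster information and periodically sends the second server 11 a request to extend the expiration time of that cluster information, thereby extending its life cycle. After the second client 14 has read the intermediate file from the first server 10, the cluster management client 12 updates the expiration time of the file's cluster information on the second server 11 to the current time, so that the second server 11 deletes the cluster information.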
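In this intermediate file processing system, intermediate files are written and read as follows.
After the first client 13 finishes its task, it needs to upload the result data to the first server 10 in the form of an intermediate file. The first client 13 sends the cluster management client 12 a message about writing an intermediate file to the first server 10. The message includes the identifier of the first client 13, the data amount of the intermediate file, and the user identifier (User id) of the job plan executed by the first client 13. The identifier of the first client 13 is the task identifier (Task id) of the task it executes, usually a unique number, for example Task id=1673; the data amount of the intermediate file is the storage space it is expected to occupy, for example 1042Kb; the user identifier may be a string of digits or letters.
After receiving this message from the first client 13, the cluster management client 12 requests the second server 11 to create cluster information for the intermediate file. The request includes the identifier of the first client 13, the data amount of the intermediate file, and the user identifier of the job plan executed by the first client 13.
After receiving the request, the second server 11 creates the corresponding cluster information as follows: it performs verification based on the user identifier and determines the priority with which the user represented by that identifier executes the task; it generates the cluster name from the identifier (Task id) of the first client 13 by adding a random string before and/or after the identifier to obtain a globally unique cluster name (for example, if the identifier of the first client is 1673, the globally unique cluster name bcd_1673 is generated); it determines the storage space quota from the data amount of the intermediate file; and it generates a default expiration time, for example 30 minutes from the current time. The created cluster information includes the cluster name, the expiration time, the priority, and the storage space quota. For example: cluster name: bcd_1673; expiration time: 6:35 (current time 6:05); priority: 1; storage space quota: 1042Kb. After successfully creating the cluster information, the second server 11 stores a backup of it on its local disk and sends it to the cluster management client 12.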
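The creation step on the second server can be sketched as follows. This is a minimal illustration, not the patented implementation: the `USER_PRIORITY` table, the three-letter random prefix, the `ClusterInfo` type, and the `existing_names` set used to check uniqueness are all assumptions introduced here; the source only states that the priority is derived from the user identifier and that a random string is added to the task identifier.

```python
import secrets
import string
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical lookup table: the source only says the priority is derived
# from the user identifier, not how.
USER_PRIORITY = {"LeoZhao": 1, "SamZhang": 2, "SissiLi": 3}


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    expires_at: datetime
    priority: int
    quota_kb: int


def create_cluster_info(task_id, size_kb, user_id, now, existing_names,
                        ttl=timedelta(minutes=30)):
    """Build cluster information for one intermediate file."""
    if user_id not in USER_PRIORITY:
        raise PermissionError(f"unknown user identifier: {user_id}")
    # Prepend a random string until the name is unique among known names.
    while True:
        prefix = "".join(secrets.choice(string.ascii_lowercase) for _ in range(3))
        name = f"{prefix}_{task_id}"
        if name not in existing_names:
            break
    existing_names.add(name)
    # Quota equals the announced data amount; expiry defaults to now + 30 min.
    return ClusterInfo(name, now + ttl, USER_PRIORITY[user_id], size_kb)
```

Uniqueness is only checked against the names the caller passes in; a real deployment would need whatever global guarantee the second server actually provides.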
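After receiving the cluster information returned by the second server 11, the cluster management client 12 sends it to the first client 13 and, according to the job execution plan, also sends it to the second client 14 that needs the intermediate file for subsequent processing.
The first client 13 sends a request for writing the intermediate file to the first server 10. The request includes the cluster information, which the first server 10 verifies by checking with the second server 11 whether the cluster information exists. If the cluster information exists on the second server 11, the first server 10 sends a verification success message to the first client 13; if verification fails, the first server 10 blocks this write of the intermediate file by the first client 13. For example, the first server 10 checks with the second server 11 whether the cluster information "cluster name: bcd_1673; expiration time: 6:35 (current time 6:05); priority: 1; storage space quota: 1042Kb" exists; after successful verification it sends a message to the first client 13, and the first client 13 then uploads the intermediate file corresponding to the cluster information to the first server 10.
After receiving the intermediate file uploaded by the first client 13, the first server 10 writes it to a lightly loaded disk according to the local disk load, so that the load across its local disks is balanced. When multiple first clients 13 have intermediate files to write, the first server 10 adjusts the write order according to the priority in the cluster information sent by each first client 13: intermediate files with higher priority are written first, which ensures that the tasks of higher-priority jobs are processed in a timely manner and prevents higher-priority jobs from being slowed down.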
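A minimal sketch of this write scheduling is shown below. It assumes that disk load is measured as kilobytes already stored and that a smaller priority number means a higher priority (consistent with the worked example, where priority 1 is written first); the function name, disk names, and the load metric are illustrative assumptions, not details given by the source.

```python
def schedule_writes(pending, disk_load_kb):
    """Return (cluster_name, disk) placements in write order.

    pending: list of (priority, cluster_name, size_kb) tuples.
    disk_load_kb: mapping of disk name to current load in Kb (assumed metric).
    """
    load = dict(disk_load_kb)  # work on a copy; the caller's view is untouched
    placements = []
    # Lower priority number is written first; sorted() is stable for ties.
    for _priority, name, size_kb in sorted(pending, key=lambda p: p[0]):
        disk = min(load, key=load.get)  # pick the least-loaded disk right now
        load[disk] += size_kb
        placements.append((name, disk))
    return placements
```

With the files from the worked example (the size of intermediate file B is not given in the source, so 1000 is a placeholder), `schedule_writes([(2, "abcd_1147", 2264), (3, "efgh_1214", 1000), (1, "ijkl_1359", 4043)], {"disk0": 0, "disk1": 3000})` places file C first, then A, then B.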
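After receiving the cluster information, the second client 14 periodically queries the first server 10 for the on-disk write location of the intermediate file corresponding to the cluster information, for example once every 30 seconds, with each query request including the cluster information. Once the first server 10 has found the write location information of the intermediate file corresponding to the cluster information, it sends that information to the second client 14. The second client 14 then reads the intermediate file written by the first client 13 from the first server 10 according to the write location information. When multiple second clients 14 need to read intermediate files at the same time, the intermediate file with the higher priority in its cluster information is read first by its second client 14. The read order of intermediate files therefore follows the adjusted write order, which ensures that the tasks of higher-priority jobs are processed in a timely manner and prevents higher-priority jobs from being slowed down.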
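The polling behaviour of the second client can be sketched as below. The `query` callable stands in for the request to the first server and is assumed to return `None` until the file has been written; the attempt limit and the injectable `sleep` parameter are additions made here for testability and are not described in the source.

```python
import time


def wait_for_write_location(query, cluster_name, interval_s=30.0,
                            max_attempts=10, sleep=time.sleep):
    """Poll until the first server reports where the file was written.

    query: hypothetical callable taking the cluster name and returning the
    write location, or None while the file is not yet on disk.
    """
    for attempt in range(max_attempts):
        location = query(cluster_name)
        if location is not None:
            return location
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait one polling interval before retrying
    raise TimeoutError(f"no write location for {cluster_name}")
```

The 30-second default mirrors the example interval in the text; whether a real client gives up after a fixed number of attempts is an open design choice.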
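Until the intermediate file uploaded by the first client 13 has been completely read by the second client 14, the cluster management client 12 updates the expiration time of the cluster information on the second server 11 at every first preset interval to extend the life cycle of the cluster information. For example, the first preset interval is 5 minutes, and each update extends the expiration time of the cluster information by 5 minutes. The length of each extension is preset in the submitted job execution plan. It is usually set to the maximum time allowed, after the cluster management client 12 that submitted the job execution plan fails, before the intermediate files produced are deleted. That is, if the cluster management client 12 recovers within this time, the intermediate files already produced are not deleted, the job continues, and the cluster information of the intermediate file is not deleted for having expired. If the cluster management client 12 does not recover within this time, it does not update the expiration time on the second server 11, so the second server 11 deletes the expired cluster information and the first server 10 in turn deletes the corresponding intermediate file. By updating the expiration time on the second server 11 at every first preset interval, the cluster management client 12 therefore ensures that the intermediate file is not deleted because its cluster information has expired, so that the second client 14 can read it successfully.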
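The renewal scheme behaves like a lease. The sketch below models the second server's side with an in-memory dictionary; the class and method names are hypothetical, and the 30-minute default lifetime and 5-minute extension are taken from the examples in the text.

```python
from datetime import datetime, timedelta


class ClusterRegistry:
    """In-memory stand-in for the second server's cluster table (illustrative)."""

    def __init__(self):
        self.expires_at = {}

    def create(self, name, now, ttl=timedelta(minutes=30)):
        self.expires_at[name] = now + ttl

    def renew(self, name, extension=timedelta(minutes=5)):
        # Called by the cluster management client every preset interval.
        # Raises KeyError if the cluster information has already been purged.
        self.expires_at[name] += extension

    def purge_expired(self, now):
        """Delete and return the names whose expiration time has passed."""
        expired = [n for n, t in self.expires_at.items() if t <= now]
        for n in expired:
            del self.expires_at[n]
        return expired
```

If the cluster management client stops calling `renew`, the next `purge_expired` call after the expiration time removes the entry, which is what allows the first server to reclaim the file.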
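After successfully reading all the data of the intermediate file, the second client 14 sends a message to the cluster management client 12 stating that the intermediate file has been successfully read. The message includes the cluster name from the cluster information of the intermediate file, for example bcd_1673 in the example above. The cluster management client 12 updates the expiration time of the cluster information with that cluster name on the second server 11 to the current time, so that the cluster information immediately becomes expired on the second server 11 and must be cleaned up. The second server 11 deletes the cluster information and synchronizes its locally saved cluster information to the first server 10 through the background heartbeat. After determining that the cluster information has been deleted on the second server 11, the first server 10 deletes the intermediate file corresponding to it. For example, after the second server 11 deletes the cluster information "cluster name: bcd_1673; expiration time: 6:35 (current time 6:05); priority: 1; storage space quota: 1042Kb", the first server 10 also deletes the corresponding intermediate file, which frees disk storage space and reduces disk load.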
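The cleanup chain described here (expire now, delete on the second server, synchronize, delete on the first server) can be sketched with plain dictionaries. The function names, the path strings, and the second cluster `xyz_2000` in the usage note are hypothetical and serve only to show that unrelated files survive.

```python
from datetime import datetime


def expire_now(expirations, name, now):
    """Cluster management client: set the expiration time to the current time."""
    expirations[name] = now


def purge_expired(expirations, now):
    """Second server: delete expired cluster information, return deleted names."""
    expired = {n for n, t in expirations.items() if t <= now}
    for n in expired:
        del expirations[n]
    return expired


def sync_to_file_server(stored_files, live_names):
    """First server: drop files whose cluster information no longer exists.

    stored_files maps cluster name to a file path (assumed bookkeeping).
    """
    removed = [n for n in stored_files if n not in live_names]
    for n in removed:
        del stored_files[n]
    return removed
```

For instance, with two live clusters `bcd_1673` and `xyz_2000`, calling `expire_now` for `bcd_1673`, then `purge_expired`, then `sync_to_file_server(files, set(expirations))` removes only the file that belongs to `bcd_1673`.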
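In the intermediate file processing system of this embodiment, if the cluster management client 12 restarts because of a failure during the above processing, it queries the second server 11 after restarting to find out whether the cluster information still exists, in order to decide whether to continue the current work. The cluster management client 12 sends the identifier of the first client 13 to the second server 11, and the second server 11 searches its stored cluster information for a cluster name that contains that identifier, thereby determining whether the cluster information corresponding to the intermediate file uploaded by that first client 13 still exists. For example, the cluster management client 12 sends the identifier "1673" of the first client 13, and the second server 11 searches the stored cluster information for a cluster name containing "1673". If the cluster name bcd_1673 is found, the cluster information had not yet expired when the cluster management client 12 restarted. If no cluster name containing "1673" is found, the cluster information had already expired and been deleted by the second server 11 by the time the cluster management client 12 restarted.
When the corresponding cluster information is found to still exist, the cluster management client 12 updates its expiration time on the second server 11 so that the cluster information does not expire. The intermediate file corresponding to the cluster information is then not deleted by the first server 10, and the intermediate file processing system can continue executing the current job. When the corresponding cluster information cannot be found, the intermediate file corresponding to it has also been deleted by the first server 10, so the cluster management client 12 resubmits the current job plan and the current job is executed again. In this way, when the current job encounters an exception, the intermediate file processing system can reclaim the corresponding intermediate files promptly and free disk space, and if the exception is resolved within a short time, it can also continue executing the current job.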
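The restart decision can be sketched as a lookup followed by either a renewal or a resubmission. The names `find_cluster` and `recover` and the two callbacks are hypothetical. The text only says the cluster name "contains" the identifier; this sketch deliberately matches a whole underscore-separated token instead, so that identifier 1673 does not match a name built from 16730.

```python
def find_cluster(cluster_names, task_id):
    """Return the cluster name built from task_id, or None if it is gone."""
    for name in cluster_names:
        # Stricter than a plain substring test: compare whole tokens.
        if task_id in name.split("_"):
            return name
    return None


def recover(cluster_names, task_id, renew, resubmit):
    """After a restart, resume the job if its cluster information survived."""
    name = find_cluster(cluster_names, task_id)
    if name is None:
        resubmit()   # cluster information expired: run the job again
        return "resubmitted"
    renew(name)      # cluster information survived: extend it and carry on
    return "resumed"
```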
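Every second preset duration, for example once a minute, the first server 10 sends the second server 11 usage information for the intermediate files corresponding to each piece of cluster information. The usage information includes the number of intermediate files, the storage space already occupied, the utilization of the storage quota, and similar data. This lets the second server 11 know the global usage of the intermediate files for every piece of cluster information, and it can provide an interface through which the cluster management client 12 queries that usage.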
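A usage report of the kind described above could be assembled as follows. The `Usage` record and the `usage_report` function are hypothetical, and sizes are assumed to be tracked in kilobytes per cluster name.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    file_count: int
    used_kb: int
    quota_kb: int

    @property
    def utilization(self):
        """Fraction of the storage quota already in use."""
        return self.used_kb / self.quota_kb if self.quota_kb else 0.0


def usage_report(file_sizes_kb, quotas_kb):
    """Build one Usage record per cluster name.

    file_sizes_kb: cluster name -> list of file sizes in Kb
    quotas_kb:     cluster name -> storage quota in Kb
    """
    return {
        name: Usage(len(sizes), sum(sizes), quotas_kb[name])
        for name, sizes in file_sizes_kb.items()
    }
```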
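For the cluster management client in the above intermediate file processing system, an embodiment of the present application provides an intermediate file processing method. As shown in FIG. 2, the method includes the following steps.
In step S201, a message about writing an intermediate file to the first server is received from the first client.
In step S202, the second server is requested to create cluster information for the intermediate file.
In step S203, after the cluster information has been created successfully, the cluster information returned by the second server is received, where the cluster information includes a cluster name and a priority.
In step S204, the cluster information is sent to the first client and the second client. The first client uploads the intermediate file to the first server, so that the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
In one embodiment, the cluster information further includes an expiration time, and the above intermediate file processing method for the cluster management client may further include the following steps.
The expiration time of the cluster information is updated on the second server every first preset duration to extend the life cycle of the cluster information.
A message indicating that the intermediate file has been read successfully from the first server is received from the second client; the expiration time is updated on the second server to the current time, so that the second server deletes the cluster information and the first server in turn deletes the intermediate file corresponding to the cluster information.
After restarting from a crash, the cluster management client sends the identifier of the first client to the second server to query whether the cluster information of the intermediate file exists; when the cluster information of the intermediate file exists, the current job continues; when it does not exist, the current job is executed again.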
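For the first server in the above intermediate file processing system, an embodiment of the present application provides an intermediate file processing method. As shown in FIG. 3, the method includes the following steps.
In step S301, a request to write an intermediate file is received from the first client. The request includes the cluster information of the intermediate file, where the cluster information is created by the second server and sent to the cluster management client, which then sends it to the first client and the second client; the cluster information includes a cluster name and a priority.
In step S302, the received cluster information is verified.
In step S303, when the received cluster information is verified successfully, a verification-success message is sent to the first client.
In step S304, the intermediate file uploaded by the first client is received and written according to the local disk load and the priority.
In one embodiment, the above intermediate file processing method for the first server may further include the following steps.
In response to a query request from the second client, the write location of the intermediate file corresponding to the cluster information is sent to the second client, where the query request includes the cluster information.
In response to a read request from the second client, the intermediate file corresponding to the cluster information is sent to the second client, where the read request includes the write location.
Usage information for the intermediate file corresponding to the cluster information is sent to the second server every second preset duration.
The cluster information further includes an expiration time, and the above intermediate file processing method further includes: synchronizing the cluster information stored on the second server; and, when the cluster information of the intermediate file has been deleted by the second server, deleting the intermediate file corresponding to the cluster information.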
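For the second server in the above intermediate file processing system, an embodiment of the present application provides an intermediate file processing method. As shown in FIG. 4, the method includes the following steps.
In step S401, cluster information is created in response to a request from the cluster management client, where the cluster information is the cluster information of an intermediate file written by the first client to the first server and includes a cluster name and a priority.
In step S402, the cluster information is sent to the cluster management client, which sends it to the first client and the second client. The first client then uploads the intermediate file to the first server, so that the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
In one embodiment, the cluster information further includes an expiration time, and the above intermediate file processing method for the second server may further include the following steps.
The expiration time is updated at the request of the cluster management client every first preset duration, extending the life cycle of the cluster information.
After the second client has successfully read the intermediate file from the first server, the expiration time is updated to the current time at the request of the cluster management client, and the cluster information of the intermediate file is deleted.
The locally stored cluster information is synchronized to the first server, so that once the cluster information of the intermediate file has been deleted, the first server deletes the intermediate file corresponding to that cluster information.
Usage information for the intermediate file corresponding to the cluster information is received from the first server every second preset duration.
After the cluster management client restarts from a crash, the identifier of the first client is received from the cluster management client; whether the cluster information exists is queried according to that identifier; and the query result is returned to the cluster management client so that it can decide whether to continue executing the current job.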
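For the first client in the above intermediate file processing system, an embodiment of the present application provides an intermediate file processing method. As shown in FIG. 5, the method includes the following steps.
In step S501, a message about writing an intermediate file to the first server is sent to the cluster management client, and the cluster management client requests the second server to create cluster information for the intermediate file.
In step S502, the cluster information returned by the cluster management client is received, where the cluster information includes a cluster name and a priority.
In step S503, a request to write the intermediate file is sent to the first server. The request includes the cluster information, which the first server verifies.
In step S504, after the first server has verified the cluster information successfully, the intermediate file is uploaded to the first server, and the first server writes it according to its local disk load and the priority.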
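For the second client in the above intermediate file processing system, an embodiment of the present application provides an intermediate file processing method. As shown in FIG. 6, the method includes the following steps.
In step S601, cluster information is received from the cluster management client, where the cluster information is the cluster information of an intermediate file written to the first server and is created by the second server at the request of the cluster management client; the cluster information includes a cluster name and a priority.
In step S602, the first server is queried, according to the cluster information, for the write location on disk of the intermediate file corresponding to the cluster information. The intermediate file is uploaded to the first server by the first client and written by the first server according to its local disk load and the priority.
In step S603, the intermediate file is read from the first server according to the write location information.
In one embodiment, the cluster information further includes an expiration time, and the above intermediate file processing method for the second client may further include the following step.
A message indicating that the intermediate file has been read successfully is sent to the cluster management client, which updates the expiration time on the second server to the current time, so that the second server deletes the cluster information and the first server in turn deletes the intermediate file corresponding to the cluster information.
The steps of the above intermediate file processing methods for the cluster management client, the first server, the second server, the first client, and the second client have already been described in the embodiment of the intermediate file processing system that these components form, and are not repeated here.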
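FIG. 7 shows an intermediate file processing client provided by an embodiment of the present application, including:
a first receiving module 70, configured to receive, from the first client, a message about writing an intermediate file to the first server;
a request module 71, configured to request the second server to create cluster information for the intermediate file;
a second receiving module 72, configured to receive the cluster information returned by the second server after the cluster information has been created successfully, where the cluster information includes a cluster name and a priority;
a first sending module 73, configured to send the cluster information to the first client and the second client, so that the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
In one embodiment, the cluster information further includes an expiration time, and the client further includes:
a first update module, configured to update the expiration time of the cluster information on the second server every first preset duration to extend the life cycle of the cluster information.
a third receiving module, configured to receive, from the second client, a message indicating that the intermediate file has been read successfully from the first server; and a second update module, configured to update the expiration time on the second server to the current time, so that the second server deletes the cluster information and the first server in turn deletes the intermediate file corresponding to the cluster information.
a second sending module, configured to send, after restarting from a crash, the identifier of the first client to the second server to query whether the cluster information of the intermediate file exists; a first execution module, configured to continue executing the current job when the cluster information of the intermediate file exists; and a second execution module, configured to execute the current job again when the cluster information of the intermediate file does not exist.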
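FIG. 8 shows an intermediate file processing server provided by an embodiment of the present application, including:
a fourth receiving module 80, configured to receive, from the first client, a request to write an intermediate file, the request including the cluster information of the intermediate file, where the cluster information is created by the second server and sent to the cluster management client, which then sends it to the first client and the second client; the cluster information includes a cluster name and a priority;
a verification module 81, configured to verify the received cluster information;
a third sending module 82, configured to send a verification-success message to the first client when the received cluster information is verified successfully;
a first writing module 83, configured to receive the intermediate file uploaded by the first client and write it according to the local disk load and the priority.
In one embodiment, the cluster information further includes an expiration time, and the server further includes:
a fourth sending module, configured to send, in response to a query request from the second client, the write location of the intermediate file corresponding to the cluster information to the second client, where the query request includes the cluster information; and a fifth sending module, configured to send, in response to a read request from the second client, the intermediate file corresponding to the cluster information to the second client, where the read request includes the write location.
a sixth sending module, configured to send usage information for the intermediate file corresponding to the cluster information to the second server every second preset duration.
a first synchronization module, configured to synchronize the cluster information stored on the second server; and a first deletion module, configured to delete the intermediate file corresponding to the cluster information when the cluster information of the intermediate file has been deleted by the second server.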
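FIG. 9 shows an intermediate file processing server provided by an embodiment of the present application, including:
a creation module 90, configured to create cluster information in response to a request from the cluster management client, where the cluster information is the cluster information of an intermediate file written by the first client to the first server and includes a cluster name and a priority;
a seventh sending module 91, configured to send the cluster information to the cluster management client, which sends it to the first client and the second client, so that the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
In one embodiment, the cluster information further includes an expiration time, and the server further includes:
a third update module, configured to update the expiration time at the request of the cluster management client every first preset duration, extending the life cycle of the cluster information.
a fourth update module, configured to update the expiration time to the current time at the request of the cluster management client after the second client has successfully read the intermediate file from the first server; and a second deletion module, configured to delete the cluster information of the intermediate file.
a second synchronization module, configured to synchronize the locally stored cluster information to the first server, so that once the cluster information of the intermediate file has been deleted, the first server deletes the intermediate file corresponding to that cluster information.
a fifth receiving module, configured to receive, every second preset duration, usage information from the first server for the intermediate file corresponding to the cluster information.
a sixth receiving module, configured to receive the identifier of the first client from the cluster management client after the cluster management client restarts from a crash; a first query module, configured to query whether the cluster information exists according to the identifier of the first client; and a feedback module, configured to return the query result to the cluster management client so that it can decide whether to continue executing the current job.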
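FIG. 10 shows an intermediate file processing client provided by an embodiment of the present application, including:
an eighth sending module 100, configured to send a message about writing an intermediate file to the first server to the cluster management client, so that the cluster management client requests the second server to create cluster information for the intermediate file;
a seventh receiving module 101, configured to receive the cluster information returned by the cluster management client, where the cluster information includes a cluster name and a priority;
a ninth sending module 102, configured to send a request to write the intermediate file to the first server, the request including the cluster information, which the first server verifies;
a second writing module 103, configured to upload the intermediate file to the first server after the first server has verified the cluster information successfully, so that the first server writes the intermediate file to disk according to its local disk load and the priority.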
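FIG. 11 shows an intermediate file processing client provided by an embodiment of the present application, including:
an eighth receiving module 110, configured to receive cluster information from the cluster management client, where the cluster information is the cluster information of an intermediate file written to the first server and is created by the second server at the request of the cluster management client; the cluster information includes a cluster name and a priority;
a second query module 111, configured to query the first server, according to the cluster information, for the write location on disk of the intermediate file corresponding to the cluster information, where the intermediate file is uploaded to the first server by the first client and written by the first server according to its local disk load and the priority;
a reading module 112, configured to read the intermediate file from the first server according to the write location information.
In one embodiment, the cluster information further includes an expiration time, and the client further includes:
a tenth sending module, configured to send a message indicating that the intermediate file has been read successfully to the cluster management client, which updates the expiration time on the second server to the current time, so that the second server deletes the cluster information and the first server in turn deletes the intermediate file corresponding to the cluster information.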
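The intermediate file processing system provided by the embodiments of the present application is further described below through an application example.
A cloud computing platform executes multiple job (Job) plans at the same time. Each job plan includes multiple tasks (Task). For the intermediate result data passed between tasks, the preceding task writes the data to the first server as an intermediate file after it finishes, and the following task reads the intermediate result data from the first server.
The cluster management client has currently submitted three job plans: Job1, Job2, and Job3. The user identifier for Job1 is SamZhang, the user identifier for Job2 is SissiLi, and the user identifier for Job3 is LeoZhao. Each job plan has a first client that has finished the current task and a second client that needs to continue with the next task. The first client needs to write an intermediate file to the first server, and the second client needs to read the intermediate file from the first server.
At present, Job1 includes a first client that has finished task 1147 (Task id) and a second client that continues with task 1148; Job2 includes a first client that has finished task 1214 and a second client that continues with task 1215; Job3 includes a first client that has finished task 1359 and a second client that continues with task 1360.
For Job1, the first client that has finished task 1147 needs to write intermediate file A to the first server. It sends a message about writing an intermediate file to the first server to the cluster management client; the message includes the identifier 1147 of that first client, the data size 2264Kb of intermediate file A, and the user identifier SamZhang. For Job2, the first client that has finished task 1214 needs to write intermediate file B to the first server. It sends a corresponding message to the cluster management client; the message includes the identifier 1214 of that first client, the data size 3376Kb of intermediate file B, and the user identifier SissiLi. For Job3, the first client that has finished task 1359 needs to write intermediate file C to the first server. It sends a corresponding message to the cluster management client; the message includes the identifier 1359 of that first client, the data size 4043Kb of intermediate file C, and the user identifier LeoZhao. The cluster management client requests the second server to create cluster information for intermediate files A, B, and C, and the request likewise includes the above information.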
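After receiving the request from the cluster management client, the second server starts creating the corresponding cluster information. It generates cluster names from the identifiers of the first clients, producing abcd_1147, efgh_1214, and ijkl_1359. It determines the priorities from the user identifiers: the priority for SamZhang is 2, the priority for SissiLi is 3, and the priority for LeoZhao is 1. It determines each storage quota from the data size of the corresponding intermediate file. It generates a default expiration time, for example expiration time = current time + 30 minutes; if the current time is 5:06:50, the expiration time of the three pieces of cluster information is 5:36:50. The cluster information created by the second server for intermediate files A, B, and C is, respectively:
"abcd_1147,5:36:50,2,2264Kb";
"efgh_1214,5:36:50,3,3376Kb";
"ijkl_1359,5:36:50,1,4043Kb".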
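The second server sends the cluster information created above to the cluster management client. The cluster management client sends the cluster information of intermediate file A to the first client that has finished task 1147 and to the second client that continues with task 1148, sends the cluster information of intermediate file B to the first client that has finished task 1214 and to the second client that continues with task 1215, and sends the cluster information of intermediate file C to the first client that has finished task 1359 and to the second client that continues with task 1360.
The first clients that have finished tasks 1147, 1214, and 1359 send the first server requests to write intermediate files A, B, and C respectively. These requests include the cluster information of intermediate file A, intermediate file B, and intermediate file C respectively. After receiving the cluster information, the first server verifies it with the second server by cluster name and, once verification succeeds, returns a verification-success message to the three first clients. On receiving that message, the three first clients upload intermediate files A, B, and C to the first server.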
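When writing the received intermediate files, the first server writes each file to a lightly loaded disk according to the current disk load and determines the write order from the priority in each file's cluster information. The priority in the cluster information of intermediate file C is 1, that of intermediate file A is 2, and that of intermediate file B is 3. The first server therefore writes intermediate file C first, then intermediate file A, and finally intermediate file B. This ensures that the tasks of higher-priority jobs are processed promptly and prevents those jobs from being slowed down.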
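The write order in this example can be reproduced by parsing the three cluster information records and sorting them by priority. The comma-separated layout follows the records quoted in the example; the `parse` helper and its field names are hypothetical.

```python
RECORDS = [
    "abcd_1147,5:36:50,2,2264Kb",  # intermediate file A
    "efgh_1214,5:36:50,3,3376Kb",  # intermediate file B
    "ijkl_1359,5:36:50,1,4043Kb",  # intermediate file C
]


def parse(record):
    """Split one record into cluster name, expiration time, priority, quota."""
    name, expiry, priority, quota = record.split(",")
    return {
        "name": name,
        "expiry": expiry,
        "priority": int(priority),
        "quota_kb": int(quota[:-2]),  # strip the trailing "Kb"
    }


# A smaller priority number is written first, as in the example.
write_order = [c["name"] for c in sorted(map(parse, RECORDS), key=lambda c: c["priority"])]
print(write_order)  # ['ijkl_1359', 'abcd_1147', 'efgh_1214'], i.e. C, A, B
```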
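After receiving cluster information from the cluster management client, the second clients that continue with tasks 1148, 1215, and 1360 periodically query the first server for the write location of the corresponding intermediate file according to that cluster information. Because intermediate file C, whose cluster information has priority 1, is written first, the second client that continues with task 1360 is the first to obtain a write location, and it reads intermediate file C from the first server according to that location. Next, the second client that continues with task 1148 obtains the write location of intermediate file A and reads intermediate file A from the first server. Then the second client that continues with task 1215 obtains the write location of intermediate file B and reads intermediate file B from the first server. Adjusting the write order of the intermediate files by priority thus adjusts the read order as well, so that higher-priority tasks are processed first. After the three second clients have each successfully read their intermediate file, they each send a message to the cluster management client indicating that the intermediate file has been read successfully; the messages include the cluster names abcd_1147, efgh_1214, and ijkl_1359 from the cluster information of the files they read.
Based on the three cluster names received, the cluster management client updates the expiration time of the three corresponding pieces of cluster information on the second server to the current time, so that the cluster information of intermediate files A, B, and C expires immediately and the second server deletes the three records. After deleting them, the second server synchronizes its locally stored cluster information to the first server. When the first server finds that the cluster information for intermediate files A, B, and C has been deleted by the second server, it deletes intermediate files A, B, and C from its local disks. This frees storage space on the first server and the second server, so that intermediate files produced during job execution are cleaned up promptly.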
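In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.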
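Certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art will understand that hardware manufacturers may use different names for the same component. This description and the claims do not distinguish components by differences in name, but by differences in function. The term "comprising" as used throughout the description and claims is open-ended and should therefore be interpreted as "including but not limited to". "Approximately" means within an acceptable error range, within which those skilled in the art can solve the technical problem and substantially achieve the technical effect. In addition, the term "coupled" here covers any direct or indirect means of electrical coupling. Therefore, if a first device is described as coupled to a second device, the first device may be electrically coupled to the second device directly, or electrically coupled to it indirectly through other devices or coupling means. The following description sets out preferred embodiments for implementing the present invention, but it is intended to illustrate the general principles of the invention and not to limit its scope. The scope of protection of the present invention is defined by the appended claims.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a product or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a product or system. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the product or system that includes the element.
The above description shows and describes several preferred embodiments of the present invention. As stated above, however, the invention is not limited to the forms disclosed herein, which should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications, and environments, and can be modified within the scope of the inventive concept described herein through the above teachings or through the skill or knowledge of the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the present invention fall within the scope of protection of the appended claims.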

Claims (35)

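  1. An intermediate file processing method, applicable to a cluster management client, characterized by comprising:
    receiving, from a first client, a message about writing an intermediate file to a first server;
    requesting a second server to create cluster information for the intermediate file;
    after the cluster information has been created successfully, receiving the cluster information returned by the second server, wherein the cluster information includes a cluster name and a priority;
    sending the cluster information to the first client and a second client, so that the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  2. The method according to claim 1, characterized in that the cluster information further includes an expiration time, and the method further comprises:
    updating the expiration time of the cluster information on the second server every first preset duration to extend the life cycle of the cluster information.
  3. The method according to claim 2, characterized in that the method further comprises:
    receiving, from the second client, a message indicating that the intermediate file has been read successfully from the first server;
    updating the expiration time on the second server to the current time, so that the second server deletes the cluster information and the first server in turn deletes the intermediate file corresponding to the cluster information.
  4. The method according to claim 2, characterized in that the method further comprises:
    after restarting from a crash, sending the identifier of the first client to the second server to query whether the cluster information of the intermediate file exists;
    when the cluster information of the intermediate file exists, continuing to execute the current job;
    when the cluster information of the intermediate file does not exist, executing the current job again.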
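  5. An intermediate file processing method, applicable to a first server, characterized by comprising:
    receiving, from a first client, a request to write an intermediate file, the request including the cluster information of the intermediate file, wherein the cluster information is created by a second server and sent to a cluster management client, which then sends it to the first client and a second client; the cluster information includes a cluster name and a priority;
    verifying the received cluster information;
    when the received cluster information is verified successfully, sending a verification-success message to the first client;
    receiving the intermediate file uploaded by the first client, and writing the intermediate file according to the local disk load and the priority.
  6. The method according to claim 5, characterized in that the method further comprises:
    in response to a query request from the second client, sending the write location of the intermediate file corresponding to the cluster information to the second client, wherein the query request includes the cluster information;
    in response to a read request from the second client, sending the intermediate file corresponding to the cluster information to the second client, wherein the read request includes the write location.
  7. The method according to claim 5, characterized in that the method further comprises:
    sending usage information for the intermediate file corresponding to the cluster information to the second server every second preset duration.
  8. The method according to claim 5, characterized in that the cluster information further includes an expiration time, and the method further comprises:
    synchronizing the cluster information stored on the second server;
    when the cluster information of the intermediate file has been deleted by the second server, deleting the intermediate file corresponding to the cluster information.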
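  9. An intermediate file processing method, applicable to a second server, characterized by comprising:
    creating cluster information in response to a request from a cluster management client, wherein the cluster information is the cluster information of an intermediate file written by a first client to a first server, and the cluster information includes a cluster name and a priority;
    sending the cluster information to the cluster management client, which sends the cluster information to the first client and a second client, so that the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  10. The method according to claim 9, characterized in that the cluster information further includes an expiration time, and the method further comprises:
    updating the expiration time at the request of the cluster management client every first preset duration, extending the life cycle of the cluster information.
  11. The method according to claim 10, characterized in that the method further comprises:
    after the second client has successfully read the intermediate file from the first server, updating the expiration time to the current time at the request of the cluster management client;
    deleting the cluster information of the intermediate file.
  12. The method according to claim 10, characterized in that the method further comprises:
    synchronizing the locally stored cluster information to the first server, so that once the cluster information of the intermediate file has been deleted, the first server deletes the intermediate file corresponding to the cluster information.
  13. The method according to claim 9, characterized in that the method further comprises:
    receiving, every second preset duration, usage information from the first server for the intermediate file corresponding to the cluster information.
  14. The method according to claim 10, characterized in that the method further comprises:
    after the cluster management client restarts from a crash, receiving the identifier of the first client from the cluster management client;
    querying whether the cluster information exists according to the identifier of the first client;
    returning the query result to the cluster management client, so that the cluster management client determines whether to continue executing the current job.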
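  15. An intermediate file processing method, applicable to a first client, characterized by comprising:
    sending a message about writing an intermediate file to a first server to a cluster management client, so that the cluster management client requests a second server to create cluster information for the intermediate file;
    receiving the cluster information returned by the cluster management client, wherein the cluster information includes a cluster name and a priority;
    sending a request to write the intermediate file to the first server, the request including the cluster information, which the first server verifies;
    after the first server has verified the cluster information successfully, uploading the intermediate file to the first server, so that the first server writes the intermediate file according to its local disk load and the priority.
  16. An intermediate file processing method, applicable to a second client, characterized by comprising:
    receiving cluster information from a cluster management client, wherein the cluster information is the cluster information of an intermediate file written to a first server and is created by a second server at the request of the cluster management client; the cluster information includes a cluster name and a priority;
    querying the first server, according to the cluster information, for the write location information on disk of the intermediate file corresponding to the cluster information, wherein the intermediate file is uploaded to the first server by a first client and written by the first server according to its local disk load and the priority;
    reading the intermediate file from the first server according to the write location information.
  17. The method according to claim 16, characterized in that the cluster information further includes an expiration time, and the method further comprises:
    sending a message indicating that the intermediate file has been read successfully to the cluster management client, which updates the expiration time on the second server to the current time, so that the second server deletes the cluster information and the first server in turn deletes the intermediate file corresponding to the cluster information.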
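  18. An intermediate file processing client, characterized by comprising:
    a first receiving module, configured to receive, from a first client, a message about writing an intermediate file to a first server;
    a request module, configured to request a second server to create cluster information for the intermediate file;
    a second receiving module, configured to receive the cluster information returned by the second server after the cluster information has been created successfully, wherein the cluster information includes a cluster name and a priority;
    a first sending module, configured to send the cluster information to the first client and a second client, so that the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  19. The client according to claim 18, characterized in that the cluster information further includes an expiration time, and the client further comprises:
    a first update module, configured to update the expiration time of the cluster information on the second server every first preset duration to extend the life cycle of the cluster information.
  20. The client according to claim 19, characterized in that the client further comprises:
    a third receiving module, configured to receive, from the second client, a message indicating that the intermediate file has been read successfully from the first server;
    a second update module, configured to update the expiration time on the second server to the current time, so that the second server deletes the cluster information and the first server in turn deletes the intermediate file corresponding to the cluster information.
  21. The client according to claim 19, characterized in that the client further comprises:
    a second sending module, configured to send, after restarting from a crash, the identifier of the first client to the second server to query whether the cluster information of the intermediate file exists;
    a first execution module, configured to continue executing the current job when the cluster information of the intermediate file exists;
    a second execution module, configured to execute the current job again when the cluster information of the intermediate file does not exist.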
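  22. An intermediate file processing server, characterized by comprising:
    a fourth receiving module, configured to receive, from a first client, a request to write an intermediate file, the request including the cluster information of the intermediate file, wherein the cluster information is created by a second server and sent to a cluster management client, which then sends it to the first client and a second client; the cluster information includes a cluster name and a priority;
    a verification module, configured to verify the received cluster information;
    a third sending module, configured to send a verification-success message to the first client when the received cluster information is verified successfully;
    a first writing module, configured to receive the intermediate file uploaded by the first client and write the intermediate file according to the local disk load and the priority.
  23. The server according to claim 22, characterized in that the server further comprises:
    a fourth sending module, configured to send, in response to a query request from the second client, the write location of the intermediate file corresponding to the cluster information to the second client, wherein the query request includes the cluster information;
    a fifth sending module, configured to send, in response to a read request from the second client, the intermediate file corresponding to the cluster information to the second client, wherein the read request includes the write location.
  24. The server according to claim 22, characterized in that the server further comprises:
    a sixth sending module, configured to send usage information for the intermediate file corresponding to the cluster information to the second server every second preset duration.
  25. The server according to claim 22, characterized in that the cluster information further includes an expiration time, and the server further comprises:
    a first synchronization module, configured to synchronize the cluster information stored on the second server;
    a first deletion module, configured to delete the intermediate file corresponding to the cluster information when the cluster information of the intermediate file has been deleted by the second server.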
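  26. An intermediate file processing server, characterized by comprising:
    a creation module, configured to create cluster information in response to a request from a cluster management client, wherein the cluster information is the cluster information of an intermediate file written by a first client to a first server, and the cluster information includes a cluster name and a priority;
    a seventh sending module, configured to send the cluster information to the cluster management client, which sends the cluster information to the first client and a second client, so that the first client uploads the intermediate file to the first server, the first server writes the intermediate file according to its local disk load and the priority, and the second client reads the intermediate file from the first server according to the cluster information.
  27. The server according to claim 26, characterized in that the cluster information further includes an expiration time, and the server further comprises:
    a third update module, configured to update the expiration time at the request of the cluster management client every first preset duration, extending the life cycle of the cluster information.
  28. The server according to claim 27, characterized in that the server further comprises:
    a fourth update module, configured to update the expiration time to the current time at the request of the cluster management client after the second client has successfully read the intermediate file from the first server;
    a second deletion module, configured to delete the cluster information of the intermediate file.
  29. The server according to claim 27, characterized in that the server further comprises:
    a second synchronization module, configured to synchronize the locally stored cluster information to the first server, so that once the cluster information of the intermediate file has been deleted, the first server deletes the intermediate file corresponding to the cluster information.
  30. The server according to claim 26, characterized in that the server further comprises:
    a fifth receiving module, configured to receive, every second preset duration, usage information from the first server for the intermediate file corresponding to the cluster information.
  31. The server according to claim 27, characterized in that the server further comprises:
    a sixth receiving module, configured to receive the identifier of the first client from the cluster management client after the cluster management client restarts from a crash;
    a first query module, configured to query whether the cluster information exists according to the identifier of the first client;
    a feedback module, configured to return the query result to the cluster management client, so that the cluster management client determines whether to continue executing the current job.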
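  32. An intermediate file processing client, characterized by comprising:
    an eighth sending module, configured to send a message about writing an intermediate file to a first server to a cluster management client, so that the cluster management client requests a second server to create cluster information for the intermediate file;
    a seventh receiving module, configured to receive the cluster information returned by the cluster management client, wherein the cluster information includes a cluster name and a priority;
    a ninth sending module, configured to send a request to write the intermediate file to the first server, the request including the cluster information, which the first server verifies;
    a second writing module, configured to upload the intermediate file to the first server after the first server has verified the cluster information successfully, so that the first server writes the intermediate file according to its local disk load and the priority.
  33. An intermediate file processing client, characterized by comprising:
    an eighth receiving module, configured to receive cluster information from a cluster management client, wherein the cluster information is the cluster information of an intermediate file written to a first server and is created by a second server at the request of the cluster management client; the cluster information includes a cluster name and a priority;
    a second query module, configured to query the first server, according to the cluster information, for the write location information on disk of the intermediate file corresponding to the cluster information, wherein the intermediate file is uploaded to the first server by a first client and written by the first server according to its local disk load and the priority;
    a reading module, configured to read the intermediate file from the first server according to the write location information.
  34. The client according to claim 33, characterized in that the cluster information further includes an expiration time, and the client further comprises:
    a tenth sending module, configured to send a message indicating that the intermediate file has been read successfully to the cluster management client, which updates the expiration time on the second server to the current time, so that the second server deletes the cluster information and the first server in turn deletes the intermediate file corresponding to the cluster information.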
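  35. An intermediate file processing system, characterized by comprising: a first client, a second client, a first server, a second server, and a cluster management client;
    before writing an intermediate file to the first server, the first client sends a message about writing the intermediate file to the first server to the cluster management client; the cluster management client requests the second server to create cluster information for the intermediate file and, after receiving the cluster information returned by the second server, sends the cluster information to the first client and the second client, wherein the cluster information includes a cluster name and a priority;
    the first client requests the first server to write the intermediate file according to the cluster information and, after receiving a message from the first server indicating that the cluster information has been verified successfully, uploads the intermediate file to the first server; the first server writes the intermediate file according to its local disk load and the priority; the second client reads the intermediate file from the first server according to the cluster information.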
PCT/CN2016/087462 2015-07-08 2016-06-28 Intermediate file processing method, client, server, and system WO2017005116A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/862,570 US11500812B2 (en) 2015-07-08 2018-01-04 Intermediate file processing method, client, server, and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510398346.6 2015-07-08
CN201510398346.6A CN106339176B (zh) 2015-07-08 2015-07-08 Intermediate file processing method, client, server, and system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/862,570 Continuation US11500812B2 (en) 2015-07-08 2018-01-04 Intermediate file processing method, client, server, and system

Publications (1)

Publication Number Publication Date
WO2017005116A1 true WO2017005116A1 (zh) 2017-01-12

Family

ID=57684862

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/087462 WO2017005116A1 (zh) 2015-07-08 2016-06-28 Intermediate file processing method, client, server, and system

Country Status (3)

Country Link
US (1) US11500812B2 (zh)
CN (1) CN106339176B (zh)
WO (1) WO2017005116A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11500812B2 (en) 2015-07-08 2022-11-15 Alibaba Group Holding Limited Intermediate file processing method, client, server, and system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11381446B2 (en) * 2020-11-23 2022-07-05 Zscaler, Inc. Automatic segment naming in microsegmentation
CN108932236B (zh) * 2017-05-22 2021-05-07 北京金山云网络技术有限公司 File management method and apparatus
CN115277707A (zh) * 2022-07-15 2022-11-01 京东科技信息技术有限公司 Service processing method and apparatus, electronic device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5682507A (en) * 1995-06-07 1997-10-28 Tandem Computers, Incorporated Plurality of servers having identical customer information control procedure functions using temporary storage file of a predetermined server for centrally storing temporary data records
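CN102214184A (zh) * 2010-04-07 2011-10-12 腾讯科技(深圳)有限公司 Intermediate file processing apparatus and method for a distributed computing system
CN102541460A (zh) * 2010-12-20 2012-07-04 中国移动通信集团公司 Disk management method and device for a multi-disk scenario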

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6973657B1 (en) * 2001-01-30 2005-12-06 Sprint Communications Company L.P. Method for middle-tier optimization in CORBA OTS
US6895472B2 (en) * 2002-06-21 2005-05-17 Jp Morgan & Chase System and method for caching results
US8255373B2 (en) * 2008-10-24 2012-08-28 Microsoft Corporation Atomic multiple modification of data in a distributed storage system
CN101493844B (zh) * 2009-03-06 2012-06-06 无锡紫芯集成电路系统有限公司 Method and apparatus for implementing a multi-master interface for embedded memory
CN102694860A (zh) * 2012-05-25 2012-09-26 北京邦诺存储科技有限公司 Data processing method, device, and system for cloud storage
CN102937964B (zh) * 2012-09-28 2015-02-11 无锡江南计算技术研究所 Intelligent data service method based on a distributed system
US9565252B2 (en) * 2013-07-31 2017-02-07 International Business Machines Corporation Distributed storage network with replication control and methods for use therewith
CN103401931B (zh) * 2013-08-05 2017-07-25 天闻数媒科技(湖南)有限公司 Method and system for downloading files
US9389994B2 (en) * 2013-11-26 2016-07-12 International Business Machines Corporation Optimization of map-reduce shuffle performance through shuffler I/O pipeline actions and planning
CN106339176B (zh) 2015-07-08 2020-04-10 阿里巴巴集团控股有限公司 Intermediate file processing method, client, server, and system

Also Published As

Publication number Publication date
US11500812B2 (en) 2022-11-15
US20180129668A1 (en) 2018-05-10
CN106339176A (zh) 2017-01-18
CN106339176B (zh) 2020-04-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16820767

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16820767

Country of ref document: EP

Kind code of ref document: A1