CN111629028A - Data transmission scheduling system for distributed multi-cloud storage - Google Patents

Data transmission scheduling system for distributed multi-cloud storage

Info

Publication number
CN111629028A
Authority
CN
China
Prior art keywords
cloud storage
data
transmission
data transmission
bandwidth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010281456.5A
Other languages
Chinese (zh)
Other versions
CN111629028B (en
Inventor
鄂金龙
李振华
刘云浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202010281456.5A priority Critical patent/CN111629028B/en
Publication of CN111629028A publication Critical patent/CN111629028A/en
Application granted granted Critical
Publication of CN111629028B publication Critical patent/CN111629028B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/61 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/63 Routing a service request depending on the request content or context

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data transmission scheduling system for distributed multi-cloud storage. The system comprises a proxy server, a scheduling controller and a local manager. The proxy server acts as a relay for user uploads and downloads to optimize data transmission, feeds back the link bandwidth of the cloud storage node to the scheduling controller, and migrates data to cloud storage nodes in other regions. The scheduling controller queues and processes user operation requests online according to a preset priority, selects the set of nodes with the best access performance according to the link bandwidths of the cloud storage nodes, and adaptively adjusts the number of proxy servers at each node. The local manager sends user operation requests to the scheduling controller and completes data transmission by interacting with the returned set of best-performing nodes according to a preset data storage mode. The system provides general and complete data transmission support for the upper-layer applications of most cloud storage services, optimizes data transmission performance while using resources efficiently, and effectively balances user experience against the resource overhead of application service providers.

Description

Data transmission scheduling system for distributed multi-cloud storage
Technical Field
The invention belongs to the technical field of cloud storage, and particularly relates to a data transmission scheduling system for distributed multi-cloud storage.
Background
Compared with traditional network storage, cloud storage services offer easy access, low maintenance cost and high scalability, and in recent years have been adopted by more and more enterprise applications and individual users. Popular cloud providers such as Amazon AWS, Microsoft Azure, Google Cloud Platform and Alibaba Cloud all deploy globally distributed data centers to provide quality-assured online data storage and content distribution services to users in different regions. On this basis, cloud storage applications use the distributed nodes of a public cloud platform to build a Content Delivery Network (CDN) and provide low-latency data access to users. Personal cloud services such as Google Drive and Microsoft OneDrive use their own global backbone networks for high-bandwidth data transmission among distributed nodes, improving the overall efficiency of file synchronization.
However, access delay at distributed cloud storage nodes is generally highly unstable, and over time the best access performance of a cloud storage service may be obtained from different nodes. For a user in a given region, the cloud storage node that usually performs best may perform very poorly or even be unavailable during certain periods, and the access node whose IP address is returned when DNS resolves the cloud storage service domain name during file upload/download is often not the best-performing node at that moment. Measurement experiments show that data access delay can be reduced by integrating the distributed nodes of multiple clouds into one storage node deployment, but how to determine the amount of data each node should transmit so as to optimize overall performance still requires further research. On the other hand, a cloud storage service must complete the file uploads/downloads and directory operations requested by users within a certain time limit to guarantee user experience. Statistics show that user requests to cloud storage services arrive in periodic bursts; processing large numbers of bursty requests all at once, or over-provisioning resources for them, imposes a huge cost on the cloud servers. Therefore, how to adjust the allocation of cloud server and network bandwidth resources so as to balance user experience against resource overhead urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a data transmission scheduling system for distributed multi-cloud storage that optimizes data transmission performance to meet users' real-time requirements while reducing, as far as possible, the computing, storage and network bandwidth costs of the application service provider, thereby solving the practical problems in the prior art.
To achieve this purpose, the invention adopts the following technical solution:
A data transmission scheduling system for distributed multi-cloud storage comprises a proxy server, a scheduling controller and a local manager, wherein:
the proxy server is deployed adjacent to a cloud storage node within its region, acts as a relay for user uploads and downloads to optimize data transmission, estimates the link bandwidth of the cloud storage node online and feeds it back to the scheduling controller, and migrates data to cloud storage nodes in other regions when bandwidth is idle;
the scheduling controller processes user operation requests online according to a preset priority, periodically collects the link bandwidths of the cloud storage nodes, selects the set of nodes with the best access performance and provides it to the client currently requesting an operation, and adaptively adjusts the number of proxy servers at each node;
and the local manager monitors user operation requests at the client, sends them to the scheduling controller, and completes data transmission by interacting with the returned set of best-performing nodes according to a preset data storage mode.
The proxy server comprises a bandwidth measuring module, a transmission optimizing module and a data migration module; wherein:
the bandwidth measurement module estimates the link bandwidth between users and the cloud storage node from periodic transmissions of a preset measurement file and from the recorded upload and download times within a preset time period. Specifically, during idle time a test file of preset size is transmitted to and from a plurality of clients to obtain the measured link transmission time, from which an initial measured bandwidth is calculated; time is divided into periods according to the state feedback cycle, and the historical upload and download times of all clients are recorded in each period; when each transmission request arrives, the harmonic mean of the link bandwidths of the latest preset number of files in the current period is calculated as the current available bandwidth; if fewer files than the preset number have been transmitted, the mean bandwidth over the actual number of files is used as the current available bandwidth; and if no file has been transmitted, the initial measured bandwidth is used as the current available bandwidth.
The transmission optimization module accesses the data of the adjacent storage node with low latency and applies optimization processing during data transmission with the client. Specifically, when a client sends an upload or download request, the module receives a checksum list of the requested file generated by the client and searches the metadata list it maintains for a similar file stored in the node. If a similar file exists, then for a file upload the module returns the checksum list of that file to the client so that the client can generate an incremental file for upload, receives the incremental file, generates the new version of the file, and calls the cloud API (application programming interface) to store the new version in the node; for a file download, the module calls the cloud API to fetch the file version stored in the node, generates an incremental file, and returns it to the client, which generates the new version of the file. If no similar file exists, conventional upload or download is used, with files exceeding a preset size compressed before transmission.
The data migration module, for users who move across geographic regions, builds hash trees to compare metadata consistency between cloud storage nodes when no data is being transmitted, and migrates the files to be migrated to cloud storage nodes in other regions through optimized transmission. Specifically, the hash values of all files of a user are calculated and clustered according to the files' local storage paths to generate a hash tree; data migration starts when no data is being transmitted between the two cloud storage nodes, and the two proxy servers exchange the hash values of each layer of the hash tree from top to bottom to determine the files to be migrated; the source proxy server calls the cloud API to fetch each file to be migrated from the source node and packs or compresses it according to its size; a persistent network connection is established between the proxy servers to transmit the files; the destination proxy server decompresses the files and calls the cloud API to store them in the destination node; and after all files have been migrated, the files and their metadata are deleted from the source node.
The scheduling controller comprises a request queuing module and an agent adapting module; wherein:
the request queuing module defines a preset priority as the sum of the request arrival time and the predicted completion time, and queues and processes user file operation requests according to this priority. Specifically, when a user request arrives, its arrival time is recorded and the request is added to a processing queue; the predicted completion time of the file to be transmitted is calculated; the completion time of file operations that do not involve transmission is set to a fixed value; the processing queue is sorted by the preset priority defined as the sum of the request arrival time and the predicted completion time; and the sorted operations are dispatched by type to a plurality of message queues, where the processing thread of each message queue notifies the client to execute the operation and provides the current set of best-performing nodes.
The agent adaptation module compares the ratio of actual bandwidth to maximum bandwidth fed back by the proxy servers of each cloud storage node with a preset congestion threshold and a preset idle threshold, dynamically adjusts the number of proxy servers, and determines the relay agent for each data transmission. Specifically, when the ratio of actual bandwidth to maximum bandwidth is below the preset congestion threshold, an additional proxy server is deployed at the node; when the ratio is above the preset idle threshold, the proxy server with the lowest bandwidth is recycled at the node; and when a client requests data transmission, the proxy server with the highest bandwidth is selected as the relay agent for that transmission according to the current actual bandwidths fed back by all proxy servers.
The local manager includes a transmission management module. According to the set of best-performing nodes returned by the scheduling controller, the transmission management module selects nodes for uploading and downloading files under a multi-copy storage mode or a redundancy coding storage mode, and handles file deletion by querying metadata. Specifically, in the multi-copy storage mode, data is uploaded to the node with the best current performance, and a preset number of nodes are randomly selected to receive copies during idle time; on download, the download volume is allocated according to the link bandwidths of the nodes holding the data. In the redundancy coding storage mode, data is split into blocks and encoded with an erasure code, and a preset number of nodes are randomly selected for upload; on download, choosing the nodes from which to download redundant blocks is formulated as a 0-1 integer programming problem, and a near-optimal solution is obtained with the branch and bound method. For file deletion, which involves no data transmission, the data and blocks stored on all nodes are deleted.
The beneficial effect of the data transmission scheduling system for distributed multi-cloud storage is that it provides general and complete data transmission support for the upper-layer applications of most cloud storage services. Unlike existing cloud storage transmission optimization and task scheduling techniques, which address the optimization problem only in specific scenarios, the system considers both optimizing data transmission performance and using resources efficiently. By combining transmission optimization for two data storage modes, priority-based request queuing and adaptive adjustment of the number of proxy servers, it effectively balances user experience against the resource overhead of the application service provider.
Drawings
FIG. 1 is an overall block diagram of the system of the present invention;
FIG. 2 is a flow chart of the bandwidth measurement module operation of the present invention;
FIG. 3 is a flow chart of the operation of the transmission optimization module of the present invention;
FIG. 4 is a data migration module workflow diagram of the present invention;
FIG. 5 is a flow diagram of the operation of the request queuing module of the present invention;
FIG. 6 is a flowchart of the agent adaptation module operation of the present invention;
FIG. 7 is a flow chart of the operation of the transmission management module of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings:
The invention provides a data transmission scheduling system for distributed multi-cloud storage, whose overall structure is shown in FIG. 1. The system comprises a proxy server, a scheduling controller and a local manager, wherein:
the proxy server is deployed adjacent to a cloud storage node within its region, acts as a relay for user uploads and downloads to optimize data transmission, estimates the link bandwidth of the cloud storage node online and feeds it back to the scheduling controller, and migrates data to cloud storage nodes in other regions when bandwidth is idle;
the scheduling controller processes user operation requests online according to a preset priority, periodically collects the link bandwidths of the cloud storage nodes, selects the set of nodes with the best access performance and provides it to the client currently requesting an operation, and adaptively adjusts the number of proxy servers at each node;
and the local manager monitors user operation requests at the client, sends them to the scheduling controller, and completes data transmission by interacting with the returned set of best-performing nodes according to a preset data storage mode.
The proxy server comprises a bandwidth measuring module, a transmission optimizing module and a data migration module; wherein:
the bandwidth measurement module estimates the currently available bandwidth of the cloud storage node from user upload/download delays measured in real time. As shown in FIG. 2, the workflow is as follows:
1) periodically, during idle time, transmit a test file of a specific size (denoted $S_F$) to and from the clients, measure the link transmission time $T_l$, and use $BW_l = S_F / T_l$ to estimate the uplink and downlink bandwidth of each link $l$, recorded as the initial measured bandwidth;
2) divide time into periods according to the cycle of state feedback to the scheduling controller, and record the historical upload/download times of each client within each period;
3) each time a transmission request arrives, take the link bandwidths $BW_1, BW_2, \ldots, BW_k$ of the latest k files (k is usually 5) in the current period and compute their harmonic mean $BW = k / \sum_{i=1}^{k} (1 / BW_i)$ as the current available bandwidth;
4) if fewer than k files have been transmitted in the current period, compute the mean over the actual number of files and use it as the current available bandwidth;
5) if no file has been transmitted in the current period, use the initial measured bandwidth as the current available bandwidth.
It should be noted that, because the bandwidth of a transmission link may change frequently over time, relying only on periodic coarse-grained bandwidth estimates can significantly distort the optimal allocation of the transmission workload. Network conditions are observed to be stable on short time scales and can be predicted relatively accurately, so the above workflow estimates changes in link bandwidth at fine granularity.
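The bandwidth estimation workflow above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and method names are hypothetical, and it assumes that when fewer than k samples exist the same harmonic mean is taken over the samples actually available.

```python
from collections import deque


class BandwidthEstimator:
    """Per-link estimator (hypothetical names, illustration only)."""

    def __init__(self, initial_bw, k=5):
        self.initial_bw = initial_bw    # BW_l = S_F / T_l from the idle-time probe
        self.samples = deque(maxlen=k)  # bandwidths of the latest k transfers

    def start_period(self):
        """Call at each state-feedback cycle boundary to begin a new period."""
        self.samples.clear()

    def record_transfer(self, size, seconds):
        """Record one completed upload or download in the current period."""
        self.samples.append(size / seconds)

    def current_bandwidth(self):
        if not self.samples:            # step 5: fall back to the probe value
            return self.initial_bw
        # steps 3 and 4: harmonic mean over the samples available (assumption
        # for the fewer-than-k case)
        return len(self.samples) / sum(1.0 / bw for bw in self.samples)


est = BandwidthEstimator(initial_bw=100.0)
est.record_transfer(size=100, seconds=1)   # 100 units/s
est.record_transfer(size=100, seconds=4)   # 25 units/s
print(est.current_bandwidth())             # close to 40, well below the arithmetic mean 62.5
```

The harmonic mean weights slow transfers more heavily than the arithmetic mean does, which makes the estimate conservative when a link has recently degraded.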
The transmission optimization module applies optimizations such as incremental (delta) encoding and compression to data during relay transmission. As shown in FIG. 3, the workflow is as follows:
1) when a client sends an upload/download request, receive the checksum list of the requested file f generated by the client; the checksum list is used for every upload and for every download where a local version exists;
2) using the checksum list, search the maintained metadata for a similar file f' stored in the node; if a similar file exists, execute step 3) for an upload or step 4) for a download; if no file matches, go to step 5);
3) for a file upload, return the checksum list of f' to the client so that it can generate an incremental file, then receive the uploaded incremental file, generate the new version f, and call the cloud API (application programming interface) to store it in the node;
4) for a file download where the client holds a local version f, call the cloud API to fetch the version f' stored in the node, generate an incremental file, and return it to the client, which uses it to produce the new version f';
5) in all other cases, use ordinary upload/download, in which a large file (for example, larger than 10 MB) is compressed before transmission, for example with bzip2, and decompressed by the receiving party.
It should be noted that the proxy server can access data on the cloud storage node with low latency (for example, the data transmission rate between Amazon EC2 and S3 in the same region reaches hundreds of Mbps). Moreover, cloud storage data sets indicate that most files are modified by users, so the versions stored in the cloud and on the client are markedly similar. The above workflow can therefore effectively reduce overall transmission delay and network traffic.
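The upload path of steps 1) to 3) and the compression rule of step 5) can be sketched as follows. This is a simplified illustration under stated assumptions: blocks are matched only at fixed offsets (a production delta encoder such as rsync uses a rolling checksum to find matches at arbitrary offsets), SHA-256 stands in for whatever checksum the system uses, and all function names are hypothetical.

```python
import bz2
import hashlib

BLOCK = 4                               # tiny block size, for illustration only
COMPRESS_THRESHOLD = 10 * 1024 * 1024   # 10 MB, the example threshold in step 5)


def checksum_list(data, block=BLOCK):
    """Checksums of consecutive fixed-size blocks of a file."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def make_delta(new, old_checksums, block=BLOCK):
    """Client side: describe `new` as references into the old file plus raw bytes."""
    index = {c: n for n, c in enumerate(old_checksums)}
    delta = []
    for i in range(0, len(new), block):
        chunk = new[i:i + block]
        ref = index.get(hashlib.sha256(chunk).hexdigest())
        delta.append(("ref", ref) if ref is not None else ("raw", chunk))
    return delta


def apply_delta(old, delta, block=BLOCK):
    """Proxy side: rebuild the new version from the stored file and the delta."""
    out = bytearray()
    for kind, value in delta:
        out += old[value * block:(value + 1) * block] if kind == "ref" else value
    return bytes(out)


def pack(data, threshold=COMPRESS_THRESHOLD):
    """Step 5): compress only large payloads; the flag tells the receiver what to do."""
    if len(data) > threshold:
        return True, bz2.compress(data)
    return False, data


def unpack(compressed, payload):
    return bz2.decompress(payload) if compressed else payload


old = b"aaaabbbbcccc"                        # f' stored in the node
new = b"aaaaXXXXcccc"                        # f held by the client
delta = make_delta(new, checksum_list(old))  # only b"XXXX" travels as raw bytes
assert apply_delta(old, delta) == new
```

Only the changed block is sent as literal data; unchanged blocks are replaced by short references, which is where the reduction in transmitted volume comes from.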
The data migration module migrates the data of users who move across geographic regions to cloud storage nodes in other regions during idle periods when no data is being transmitted. As shown in FIG. 4, the workflow is as follows:
1) compute the hash values of all files of a user and cluster the files by their client storage paths. Using string concatenation "+", the hash value of a directory d containing n files $f_i$ ($i = 1, 2, \ldots, n$) is computed as $\mathrm{Hash}(d) = \mathrm{Hash}(d_0) + \sum_{i=1}^{n} \mathrm{Hash}(f_i)$, where $\mathrm{Hash}(d_0)$ is the hash value computed from the storage path. The computation is iterated layer by layer along the storage path, finally producing a hash tree of height h;
2) start data migration when no data is being transmitted between the two cloud storage nodes. The two proxy servers exchange and compare directory hash values layer by layer, from the top of the hash tree down. Matching hash values mean that all files under the directory are consistent; for directories that do not match, the lower-level directories or files are compared, down to the lowest-level files. All files that need to be migrated can thus be determined within at most h interactions, and steps 3) to 5) are executed for each of them;
3) the source proxy server calls the cloud API to fetch the file from the source cloud storage node and checks whether its size exceeds a threshold (again 10 MB); large files are compressed with bzip2, while small files wait to be packed into transmission blocks of the threshold size;
4) a persistent network connection is established between the proxy servers to transmit the packed blocks of small files or the large files, and the destination proxy server asynchronously acknowledges each file or packed block that is transmitted successfully;
5) the destination proxy server decompresses the file or unpacks the transmission block and calls the cloud API to store the files in the destination cloud storage node;
6) when all files have been migrated, the files and their metadata are deleted from the source cloud storage node.
It should be noted that, in order not to affect user data transmission performance, data migration between cloud storage nodes is performed only during idle periods with no user data transmission, and a migration may be suspended. Therefore, each time migration starts, the consistency of user data between the source and destination nodes must be compared to determine the files that actually need to be migrated. Applying transmission optimizations such as compression and packing between the proxy servers improves the transmission efficiency of data migration and shortens the time it occupies.
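The hash tree comparison of steps 1) and 2) can be sketched as follows. This is an illustrative sketch with hypothetical names: a directory is modeled as a nested dictionary, a file as its content hash, and the concatenated string is hashed again so that every node has a fixed-length digest (that re-hash is an assumption; the text only specifies concatenation).

```python
import hashlib


def h(text):
    return hashlib.sha256(text.encode()).hexdigest()


def build_tree(path, node):
    """node is a file hash (str) or a dict mapping names to nodes (a directory)."""
    if isinstance(node, str):
        return {"path": path, "hash": node, "children": None}
    children = [build_tree(f"{path}/{name}", child)
                for name, child in sorted(node.items())]
    # Directory digest: path hash concatenated with the child hashes, re-hashed.
    digest = h(h(path) + "".join(c["hash"] for c in children))
    return {"path": path, "hash": digest, "children": children}


def files_to_migrate(src, dst):
    """Top-down comparison; returns source files missing or different at dst."""
    if dst is not None and src["hash"] == dst["hash"]:
        return []                       # whole subtree is consistent
    if src["children"] is None:
        return [src["path"]]            # reached a file that differs
    known = {}
    if dst is not None and dst["children"] is not None:
        known = {c["path"]: c for c in dst["children"]}
    out = []
    for child in src["children"]:
        out += files_to_migrate(child, known.get(child["path"]))
    return out


source = {"docs": {"a.txt": h("A"), "b.txt": h("B v2")}, "img": {"c.png": h("C")}}
target = {"docs": {"a.txt": h("A"), "b.txt": h("B v1")}, "img": {"c.png": h("C")}}
print(files_to_migrate(build_tree("", source), build_tree("", target)))
# ['/docs/b.txt']
```

Because matching directory digests prune the whole subtree, the two proxies exchange hashes only along the paths that lead to changed files, one tree level per interaction.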
The scheduling controller comprises a request queuing module and an agent adapting module; wherein:
the request queuing module queues user operation requests such as file uploads/downloads in a priority-based processing queue. As shown in FIG. 5, the workflow is as follows:
1) when a user request arrives at the scheduling controller, record its arrival time $t_A$ and add the request to the processing queue;
2) for a file transfer operation (upload/download) of size $S_f$, compute the predicted completion time $t_{OT} = S_f (1 - \beta) / BW_a$, where $BW_a$ is the maximum estimated available link bandwidth between the current proxy servers and the client, and the redundancy rate $\beta$ is the fraction of transmission volume saved by transmission optimization, predicted from the file's history ($\beta = 0$ for files not subject to transmission optimization);
3) for file deletion and other operations that do not involve data transmission, set $t_{OT}$ to a small fixed value (for example 0.5 s, based on the processing capability of current cloud storage services);
4) define the priority RP of each request as the sum of its arrival time $t_A$ and predicted completion time $t_{OT}$, and sort the queue by RP value; requests with smaller RP values are served first;
5) dispatch the sorted operations by type to a plurality of message queues, whose processing threads notify the client to execute the corresponding operation and provide the cloud storage nodes with the best current performance.
It should be noted that, in a cloud storage scenario, users are usually more sensitive to the timeliness of small tasks (small-file operations) than of large tasks (large-file operations). Traditional first-come-first-served scheduling makes a small task that arrives after a large one wait a long time, while shortest-job-first scheduling starves large tasks when small tasks keep arriving. The above workflow therefore takes both the request arrival time and the task size into account.
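The priority rule RP = t_A + t_OT can be sketched with a binary heap. This is a minimal single-queue illustration with hypothetical names; the dispatch to multiple message queues and processing threads described in step 5) is omitted.

```python
import heapq
import itertools


class RequestQueue:
    """Orders requests by RP = arrival time + predicted completion time."""

    def __init__(self, fixed_cost=0.5):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker for equal RP values
        self.fixed_cost = fixed_cost    # t_OT for operations without transfer

    def submit(self, name, t_arrival, size=None, beta=0.0, bw=None):
        if size is not None and bw:
            t_ot = size * (1 - beta) / bw   # t_OT = S_f * (1 - beta) / BW_a
        else:
            t_ot = self.fixed_cost          # e.g. file deletion
        heapq.heappush(self._heap, (t_arrival + t_ot, next(self._seq), name))

    def pop(self):
        rp, _, name = heapq.heappop(self._heap)
        return name, rp


q = RequestQueue()
q.submit("big upload", t_arrival=0.0, size=1000, bw=10)   # RP = 0 + 100
q.submit("small upload", t_arrival=5.0, size=10, bw=10)   # RP = 5 + 1
q.submit("delete", t_arrival=6.0)                         # RP = 6 + 0.5
print([q.pop()[0] for _ in range(3)])
# ['small upload', 'delete', 'big upload']
```

The small upload overtakes the earlier large one, yet the large request cannot starve: its RP is fixed at submission, while every later arrival starts from a larger arrival time.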
The agent adaptation module dynamically adjusts the number of proxy servers at each cloud storage node according to the available bandwidth, fed back by each proxy server, of all its current links to clients. As shown in FIG. 6, the workflow is as follows:
1) when, for every proxy server in use at a cloud storage node, the ratio of actual bandwidth (the maximum available bandwidth over all of its current links) to rated maximum bandwidth is below the congestion threshold $\theta_c$, an additional proxy server is deployed at the node to balance the load of the existing proxy servers;
2) when, for every proxy server in use at a cloud storage node, the ratio of actual bandwidth to maximum bandwidth is above the idle threshold $\theta_l$, the proxy server with the lowest bandwidth at the node is recycled after it finishes its current data transmission, in order to save resources;
3) when a client requests data transmission, the proxy server with the highest bandwidth is selected as the relay agent for the transmission, according to the current actual bandwidths fed back by all proxy servers of the relevant cloud storage node.
It should be noted that, to avoid frequent fluctuation in the number of proxy servers, the above workflow uses a lazy mechanism: the number of proxy servers is increased or decreased only when all proxy servers are below or above the respective threshold. Based on measurements of transmission bandwidth variation in mainstream cloud storage services and analysis of a typical file transmission data set, the congestion threshold and the idle threshold can be set to 20% and 60% respectively.
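The lazy scaling rule and relay selection can be sketched as follows. This is an illustration with hypothetical function names; each proxy is represented by a pair of actual bandwidth and rated maximum bandwidth, and the default thresholds are the 20% and 60% values mentioned above.

```python
def scaling_decision(proxies, theta_c=0.2, theta_l=0.6):
    """proxies maps a proxy name to (actual_bw, rated_max_bw).

    Returns "add", "recycle" or "keep". The rule is lazy: it acts only when
    every proxy at the node is on the same side of a threshold.
    """
    ratios = [actual / rated for actual, rated in proxies.values()]
    if all(r < theta_c for r in ratios):
        return "add"        # every proxy is congested
    if all(r > theta_l for r in ratios):
        return "recycle"    # every proxy has spare bandwidth
    return "keep"


def recycle_candidate(proxies):
    """Proxy with the lowest actual bandwidth (released after its transfer ends)."""
    return min(proxies, key=lambda name: proxies[name][0])


def relay_proxy(proxies):
    """Proxy with the highest actual bandwidth, used to relay a new transfer."""
    return max(proxies, key=lambda name: proxies[name][0])


node = {"p1": (70.0, 100.0), "p2": (90.0, 100.0)}
print(scaling_decision(node), recycle_candidate(node), relay_proxy(node))
# recycle p1 p2
```

Requiring all proxies to cross a threshold, rather than any one of them, is what damps oscillation: a single congested or idle proxy does not trigger a deployment change.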
In addition to interacting with the transmission optimization module of the proxy server to optimize transmission, the local manager comprises a transmission management module.
The transmission management module sends user operation requests to the scheduling controller and, according to the feedback, uploads data to or downloads data from nodes in the set of best-performing cloud storage nodes. As shown in FIG. 7, the workflow is as follows:
1) for an upload in multi-copy storage mode, upload the data to the cloud storage node with the largest current link bandwidth (that is, the best performance), then use consistent hashing to randomly select m nodes from the set of best-performing nodes and upload copies to them as backups during idle time (the total number of copies is n = m + 1);
2) for an upload in redundancy coding storage mode, the data is generally split into fixed-size blocks and redundantly encoded with a Reed-Solomon (R-S) erasure code before upload; for the redundant blocks of each data block, n nodes are randomly selected from the set of best-performing nodes using consistent hashing, and the blocks are uploaded asynchronously in one pass;
3) for a download in multi-copy storage mode, the total download workload $w = \sum_{r=1}^{n} w_r$ is distributed over all n nodes storing the data ($r = 1, 2, \ldots, n$) so that the download times $w_r / BW_r$ are equal, which minimizes the overall time ($BW_r$ is the available bandwidth of each download link);
4) for a download in redundancy coding storage mode, p redundant blocks must be downloaded for each block to recover the data. The storage indicator variable $s_{ri}$ and the download indicator variable $d_{ri}$ between node r and block i satisfy $d_{ri} \le s_{ri}$ and $\sum_{r} d_{ri} = p$. The download time of the whole file is the maximum over the service nodes, $\max_{r} \sum_{i} d_{ri} b_i / BW_r$ ($b_i$ is the redundant block size and $BW_r$ is the available bandwidth of each download link). The 0-1 integer programming problem of minimizing this time is converted into an integer linear programming problem, and a near-optimal solution for $\{d_{ri}\}$ is obtained with the branch and bound method;
5) for a file deletion operation, which involves no data transmission, the proxy server responsible for maintaining the metadata of all nodes is queried as to whether the relevant data or blocks exist; if so, an API is called to delete them.
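Step 3) can be illustrated with a short Python sketch. It assumes that the download time of a node is its share of the workload divided by the bandwidth available on its link, so that equal download times mean each share is proportional to that bandwidth. The function names and the sample figures are hypothetical.

```python
from typing import List, Sequence


def split_download(total: float, bandwidths: Sequence[float]) -> List[float]:
    """Split `total` units of work so that every node finishes at the same time."""
    if any(b < 0 for b in bandwidths):
        raise ValueError("bandwidths must be non-negative")
    capacity = sum(bandwidths)
    if capacity <= 0:
        raise ValueError("at least one link must have positive bandwidth")
    # Equal times w_r / bandwidth_r imply w_r is proportional to bandwidth_r.
    return [total * b / capacity for b in bandwidths]


def finish_time(shares: Sequence[float], bandwidths: Sequence[float]) -> float:
    """Overall time is set by the slowest node that has work assigned."""
    return max(w / b for w, b in zip(shares, bandwidths) if w > 0)
```

For a workload of 600 units and link bandwidths of 10, 20 and 30 units per second, the shares are 100, 200 and 300 and every node finishes after 10 seconds, whereas an equal split of 200 units per node would take 20 seconds because of the slowest link.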
It should be noted that, to ensure data recoverability while reducing transmission overhead, at least one copy of each file or redundant block may be stored on each of 3 nodes (i.e., n = 3 above), and accordingly p = 2 for redundancy coding with R-S erasure codes. The conclusion in step 3) that the overall time is minimal when the download times from all nodes are equal can be proved simply by contradiction: if the times were unequal, moving part of the workload from the slowest node to a faster one would shorten the overall time.
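The node selection in step 4) can be sketched in Python for small instances such as n = 3 and p = 2. The formulation used here is an assumption, since the original formulas are only available as images: for each block exactly p of the nodes that store it are chosen, and the objective is to minimize the largest per-node download time, taken as the bytes assigned to a node divided by its bandwidth. The sketch is a direct combinatorial branch and bound rather than the integer linear programming conversion mentioned in the text, and all names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def choose_sources(
    block_sizes: Sequence[float],
    stored_on: Sequence[Sequence[int]],
    bandwidths: Sequence[float],
    p: int,
) -> Tuple[float, List[Tuple[int, ...]]]:
    """Pick p source nodes per block so that the slowest node finishes earliest.

    block_sizes[i]  size of one redundant block of block i
    stored_on[i]    indices of the nodes holding a redundant block of block i
    bandwidths[r]   available download bandwidth of node r (must be positive)
    Returns (finish_time, chosen nodes per block); (inf, []) if infeasible.
    """
    best_time = float("inf")
    best_plan: List[Tuple[int, ...]] = []
    load = [0.0] * len(bandwidths)  # bytes assigned to each node so far
    plan: List[Tuple[int, ...]] = []

    def search(i: int) -> None:
        nonlocal best_time, best_plan
        # Bound: loads only grow, so the current finish time is a lower bound.
        current = max(assigned / b for assigned, b in zip(load, bandwidths))
        if current >= best_time:
            return
        if i == len(block_sizes):
            best_time, best_plan = current, list(plan)
            return
        # Branch: every way of picking p of the nodes that store block i.
        for nodes in combinations(stored_on[i], p):
            for r in nodes:
                load[r] += block_sizes[i]
            plan.append(nodes)
            search(i + 1)
            plan.pop()
            for r in nodes:
                load[r] -= block_sizes[i]

    search(0)
    return best_time, best_plan
```

With three blocks whose redundant blocks are 40 units each and stored on all three nodes, and bandwidths of 10, 20 and 40 units per second, the search returns a finish time of 4.0: the slowest node serves a single redundant block and the two faster nodes serve the rest.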
From the above description, those skilled in the art will clearly understand that the embodiments can be implemented by software plus a necessary hardware platform. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, may be embodied as software products deployed on devices such as cloud servers and personal computers to execute the methods described for the whole system or for parts thereof.
The above embodiments are only for illustrating the technical solutions of the present invention, and not for limiting the same; those of ordinary skill in the art will understand that: the technical solutions described in the embodiments may be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (4)

1. A data transmission scheduling system for distributed multi-cloud storage, characterized by comprising a proxy server, a scheduling controller and a local manager, wherein:
the proxy server is deployed adjacent to the cloud storage node in its region, acts as a relay that optimizes the transmission of data uploaded and downloaded by users, estimates the link bandwidth of the cloud storage node online and feeds it back to the scheduling controller, and migrates data to cloud storage nodes in other regions when bandwidth is idle;
the scheduling controller processes user operation requests online based on a preset priority, periodically collects the link bandwidths of the cloud storage nodes, selects the node set with the best access performance and provides it to the client currently requesting an operation, and adaptively adjusts the number of proxy servers of the plurality of nodes;
and the local manager monitors user operation requests at the client, sends them to the scheduling controller, and completes data transmission by interacting with the returned best-performing node set according to a preset data storage mode.
2. The distributed multi-cloud storage oriented data transmission scheduling system of claim 1, wherein the proxy server comprises a bandwidth measurement module, a transmission optimization module and a data migration module; wherein:
the bandwidth measurement module estimates the link bandwidth between a user and the cloud storage node according to a preset file and the delay records of data uploads and downloads within a preset time period;
the transmission optimization module accesses data on the adjacent storage node with low latency and applies optimization processing during data transmission with the client;
the data migration module, when no data is being transmitted, builds a hash tree to compare metadata consistency among cloud storage nodes for users spanning geographic regions, and migrates the files to be migrated to cloud storage nodes in other regions through optimized transmission.
3. The distributed multi-cloud storage oriented data transmission scheduling system of claim 1, wherein the scheduling controller comprises a request queuing module and an agent adaptation module; wherein:
the request queuing module defines the preset priority as the sum of the request arrival time and the expected completion time, and queues and processes user file operation requests according to that priority;
and the agent adaptation module compares the actual bandwidth and the maximum bandwidth fed back by the plurality of proxy servers of a cloud storage node with a preset congestion threshold and a preset idle threshold respectively, dynamically adjusts the number of proxy servers, and determines the relay proxy for each single data transmission.
4. The distributed multi-cloud storage oriented data transmission scheduling system of claim 1, wherein the local manager comprises a transmission management module, and the transmission management module selects nodes from the best-performing node set returned by the scheduling controller to upload and download files according to a multi-copy storage mode and a redundancy-coding storage mode, and processes file deletion operations by querying metadata.
CN202010281456.5A 2020-04-10 2020-04-10 Data transmission scheduling system for distributed multi-cloud storage Active CN111629028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010281456.5A CN111629028B (en) 2020-04-10 2020-04-10 Data transmission scheduling system for distributed multi-cloud storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010281456.5A CN111629028B (en) 2020-04-10 2020-04-10 Data transmission scheduling system for distributed multi-cloud storage

Publications (2)

Publication Number Publication Date
CN111629028A (en) 2020-09-04
CN111629028B (en) 2022-02-25

Family

ID=72259631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010281456.5A Active CN111629028B (en) 2020-04-10 2020-04-10 Data transmission scheduling system for distributed multi-cloud storage

Country Status (1)

Country Link
CN (1) CN111629028B (en)

Also Published As

Publication number Publication date
CN111629028B (en) 2022-02-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant