WO2016146023A1 - Distributed computing system and method - Google Patents

Distributed computing system and method

Info

Publication number
WO2016146023A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
data
target data
copy
server
Application number
PCT/CN2016/076123
Inventor
徐凯
尹小明
何乐
罗李
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date
2015-03-19
Application filed by Alibaba Group Holding Limited
Publication of WO2016146023A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications

Definitions

  • FIG. 1 is a schematic structural diagram of a distributed computing system according to an embodiment of the present application.
  • FIG. 2 is a flow chart of a distributed computing method in accordance with one embodiment of the present application.
  • a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
  • examples of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
  • multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • for example, if implemented in hardware, as in another embodiment, the steps or methods can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and so on.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

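This application discloses a distributed computing system and method. The system includes: multiple clusters for performing distributed computing; a global metadata server for storing and managing a multi-cluster data replica distribution view and a multi-cluster topology view; a cross-cluster data replication server for copying specified data, according to a replication instruction, to the cluster specified by the replication instruction; and a data access allocation server for allocating location information of target data according to a cluster's data usage request, the multi-cluster data replica distribution view, and the multi-cluster topology view, and for generating a replication instruction according to the location information of the target data, so that the cross-cluster data replication server copies the target data replica corresponding to the target data to the cluster specified by the data usage request. In a multi-cluster environment with constrained network conditions, where replicas of business data are dynamically distributed across multiple locations, the system ensures that computing tasks obtain the business data within a bounded waiting time.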

Description

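Distributed computing system and method
This application claims priority to Chinese patent application No. 201510122729.0, filed on March 19, 2015 and entitled "Distributed computing system and method", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of Internet technologies, and in particular to a distributed computing system and method.
Background
At present, offline data processing at mainstream data-business companies mostly takes the form of distributed computing tasks running on clusters. As data volumes keep growing, multiple clusters working together has become the mainstream way to process large-scale offline data. In cross-region multi-cluster collaboration with limited network bandwidth, accessing data efficiently between clusters has become an important problem.
In the related art, when a computing task needs to access data in another cluster, multi-cluster data access usually works as follows: the data is read directly across clusters over a network that directly connects the clusters.
However, the existing approach of locating data through metadata and accessing it directly has problems in three respects: (1) when a piece of hot data is accessed by a large number of computing tasks across clusters and regions at the same time, network bandwidth becomes a bottleneck, causing access latency, degraded communication quality, and similar problems, and in extreme cases the network can suffer an avalanche collapse; (2) it constrains the network topology of the clusters, for example by requiring every pair of clusters to be connected, which is hard to achieve across regions; (3) it places high demands on network stability, for example requiring the network to be available at all times, which long-haul links can hardly guarantee.
Therefore, the existing approach of locating data through metadata and accessing it directly is not suitable for complex network environments spanning multiple regions and clusters.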
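Summary
This application aims to solve, at least to some extent, one of the technical problems described above.
To this end, a first object of this application is to propose a distributed computing system. In a multi-cluster environment with constrained network conditions, where replicas of business data are dynamically distributed across multiple locations, the system ensures that computing tasks obtain the business data within a bounded waiting time, improving the efficiency of data access under the limited bandwidth between clusters.
A second object of this application is to propose a distributed computing method.
To achieve the above objects, a distributed computing system according to embodiments of the first aspect of this application includes: multiple clusters for performing distributed computing; a global metadata server for storing and managing a multi-cluster data replica distribution view and a multi-cluster topology view; a cross-cluster data replication server for copying specified data, according to a replication instruction, to the cluster specified by the replication instruction; and a data access allocation server for allocating location information of target data according to a data usage request of the cluster, the multi-cluster data replica distribution view, and the multi-cluster topology view, and for generating the replication instruction according to the location information of the target data, so that the cross-cluster data replication server copies the target data replica corresponding to the target data to the cluster specified by the data usage request.
With the distributed computing system of the embodiments of this application, computing tasks obtain business data within a bounded waiting time even when replicas of that data are dynamically distributed across multiple locations in a multi-cluster environment with constrained network conditions. Introducing a global data access allocation server turns cross-cluster data access requests from uncoordinated into controlled ones; together with the cross-cluster data replication server, this balances resource usage for cross-cluster data access globally. In addition, by exposing the progress information of the cross-cluster data replication server and offering computing tasks the option to wait, the system avoids short-lived data access peaks, and through a bounded replication waiting time it spreads data access requests that would otherwise converge on a single point across multiple points globally, improving the efficiency of data access under the limited bandwidth between clusters.
To achieve the above objects, a distributed computing method according to embodiments of the second aspect of this application includes: multiple clusters performing distributed computing; a global metadata server storing and managing a multi-cluster data replica distribution view and a multi-cluster topology view; a cross-cluster data replication server copying specified data, according to a replication instruction, to the cluster specified by the replication instruction; a data access allocation server allocating location information of target data according to a data usage request of the cluster, the multi-cluster data replica distribution view, and the multi-cluster topology view; and the data access allocation server generating the replication instruction according to the location information of the target data, so that the cross-cluster data replication server copies the target data replica corresponding to the target data to the cluster specified by the data usage request.
With the distributed computing method of the embodiments of this application, computing tasks obtain business data within a bounded waiting time even when replicas of that data are dynamically distributed across multiple locations in a multi-cluster environment with constrained network conditions. Introducing a global data access allocation server turns cross-cluster data access requests from uncoordinated into controlled ones; together with the cross-cluster data replication server, this balances resource usage for cross-cluster data access globally. In addition, by exposing the progress information of the cross-cluster data replication server and offering computing tasks the option to wait, the method avoids short-lived data access peaks, and through a bounded replication waiting time it spreads data access requests that would otherwise converge on a single point across multiple points globally, improving the efficiency of data access under the limited bandwidth between clusters.
Additional aspects and advantages of this application will be given in part in the following description, will in part become apparent from it, or will be learned through practice of this application.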
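Brief Description of the Drawings
The above and/or additional aspects and advantages of this application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic structural diagram of a distributed computing system according to an embodiment of this application; and
FIG. 2 is a flowchart of a distributed computing method according to an embodiment of this application.
Detailed Description
Embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain this application, and are not to be construed as limiting it.
The distributed computing system and method of the embodiments of this application are described below with reference to the drawings.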
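FIG. 1 is a schematic structural diagram of a distributed computing system according to an embodiment of this application. As shown in FIG. 1, the distributed computing system may include: multiple clusters 10, a global metadata server 20, a cross-cluster data replication server 30, and a data access allocation server 40.
Specifically, the multiple clusters 10 may be used to perform distributed computing. In the embodiments of this application, a cluster 10 can be understood as a parallel or distributed system made up of interconnected computers; seen from outside, a cluster 10 is a single system that provides a unified service.
The global metadata server 20 may be used to store and manage the multi-cluster data replica distribution view and the multi-cluster topology view. More specifically, the global metadata server 20 may store a globally visible multi-cluster data replica distribution view and expose add, delete, modify, and query interfaces for external modules to call; it may also store a globally visible multi-cluster topology view and expose interfaces such as query and administrator modification. The global metadata server 20 works in request/response mode.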
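The following Python sketch illustrates one way the global metadata server 20 could hold its two views in memory and expose the interfaces listed above. The class and method names, the `LinkInfo` fields, and the choice of a dictionary keyed by data identifier are assumptions for illustration; the application does not define the data model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkInfo:
    """Hypothetical attributes of the link between two clusters."""
    distance_km: float
    bandwidth_mbps: float


class GlobalMetadataServer:
    """In-memory sketch of the two globally visible views.

    Each public method handles one request and returns one response,
    mirroring the request/response mode described in the text.
    """

    def __init__(self) -> None:
        # Replica distribution view: data id -> clusters holding a replica.
        self._replicas: dict[str, set[str]] = {}
        # Topology view: (cluster_a, cluster_b) -> link info, stored both ways.
        self._links: dict[tuple[str, str], LinkInfo] = {}

    # Replica distribution view: add / delete / modify / query.
    def add_replica(self, data_id: str, cluster: str) -> None:
        self._replicas.setdefault(data_id, set()).add(cluster)

    def delete_replica(self, data_id: str, cluster: str) -> None:
        holders = self._replicas.get(data_id)
        if holders is None:
            return
        holders.discard(cluster)
        if not holders:
            del self._replicas[data_id]  # forget data with no replicas left

    def move_replica(self, data_id: str, old: str, new: str) -> None:
        # "Modify": record that a replica now lives in a different cluster.
        self.delete_replica(data_id, old)
        self.add_replica(data_id, new)

    def query_replicas(self, data_id: str) -> set[str]:
        # Return a copy so callers cannot mutate the stored view.
        return set(self._replicas.get(data_id, ()))

    # Topology view: administrator modification / query.
    def admin_set_link(self, a: str, b: str, link: LinkInfo) -> None:
        self._links[(a, b)] = link
        self._links[(b, a)] = link

    def query_link(self, a: str, b: str) -> Optional[LinkInfo]:
        return self._links.get((a, b))
```

Keeping both views behind one request/response object is a design choice of the sketch: every other component reads the same global picture instead of keeping its own partial copy.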
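The cross-cluster data replication server 30 may be used to copy specified data, according to a replication instruction, to the cluster 10 specified by the replication instruction. More specifically, the cross-cluster data replication server 30 may copy the specified data to the specified cluster 10 according to the replication instruction; the replication process is transparent to the outside, and the server may expose add, delete, modify, replication request, and query interfaces for external management calls. In other words, the cross-cluster data replication server 30 has two workflows: a background data replication workflow and a request-driven data replication workflow.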
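A minimal sketch of the two replication workflows follows, assuming that request-driven copies are served ahead of background copies; the application only names the two workflows and does not state how they are scheduled. `CopyJob`, `CrossClusterReplicationServer`, and the byte-budget `run()` loop that simulates transfer progress are hypothetical names. The per-job progress kept here is the kind of status and progress information that the data access allocation server 40 queries.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyJob:
    data_id: str
    target_cluster: str
    total_bytes: int
    copied_bytes: int = 0

    @property
    def done(self) -> bool:
        return self.copied_bytes >= self.total_bytes


class CrossClusterReplicationServer:
    """Sketch: request-driven jobs (priority 0) run before background jobs
    (priority 1); within one priority level, jobs run in submission order."""

    REQUEST, BACKGROUND = 0, 1

    def __init__(self) -> None:
        self._queue: list[tuple[int, int, str]] = []  # (priority, seq, job_id)
        self._jobs: dict[str, CopyJob] = {}
        self._seq = itertools.count()

    def submit(self, job_id: str, job: CopyJob, background: bool = False) -> None:
        self._jobs[job_id] = job
        prio = self.BACKGROUND if background else self.REQUEST
        heapq.heappush(self._queue, (prio, next(self._seq), job_id))

    def cancel(self, job_id: str) -> None:
        # Removing the job is enough; run() pops entries whose job is gone.
        self._jobs.pop(job_id, None)

    def query(self, job_id: str) -> Optional[CopyJob]:
        return self._jobs.get(job_id)

    def run(self, byte_budget: int) -> None:
        """Simulate copying up to byte_budget bytes, highest priority first."""
        while byte_budget > 0 and self._queue:
            _, _, job_id = self._queue[0]
            job = self._jobs.get(job_id)
            if job is None or job.done:
                heapq.heappop(self._queue)
                continue
            step = min(byte_budget, job.total_bytes - job.copied_bytes)
            job.copied_bytes += step
            byte_budget -= step
            if job.done:
                heapq.heappop(self._queue)
```

Because the jobs stay queryable after submission, a caller can read `copied_bytes` at any time, which is what makes a completion-time estimate possible.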
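The data access allocation server 40 may be used to allocate location information of the target data according to the data usage request of a cluster 10, the multi-cluster data replica distribution view, and the multi-cluster topology view, and to generate a replication instruction according to the location information of the target data, so that the cross-cluster data replication server 30 copies the target data replica corresponding to the target data to the cluster 10 specified by the data usage request.
More specifically, the data access allocation server 40 may receive a data usage request from a cluster 10. According to the request, it accesses the global metadata server 20 to obtain the global distribution information of the requested data list, the multi-cluster topology view, and the corresponding bandwidth description information; accesses the cross-cluster data replication server 30 to obtain the status and progress of replica generation; and allocates location information of the target data for each request by jointly considering all of this information.
In one embodiment of this application, in the process of accessing the cross-cluster data replication server 30 to obtain the status and progress of replica generation, the data access allocation server 40 may determine, according to the data usage request, whether the target data exists, and if it does not, return to the cluster 10 a message that the target data does not exist. Specifically, after receiving the data usage request, the data access allocation server 40 may access the global metadata server 20 according to the request to obtain data distribution information and determine whether the target data exists; if it does not, the server stops processing and reports to the cluster 10 that the target data does not exist.
In the embodiments of this application, if the target data exists and the cluster 10 cannot wait, the data access allocation server 40 may obtain the available target data replicas corresponding to the target data, determine the final target data replica according to the distance and bandwidth between the requesting cluster 10 and each cluster 10 holding an available replica, and send it to the cluster 10. If the target data exists and the cluster 10 can wait, the data access allocation server 40 may further determine whether any cluster 10 directly reachable by the computing task of the requesting cluster 10 holds a corresponding available target data replica. If none does, the final target data replica is determined according to the distance and bandwidth between the requesting cluster 10 and the clusters 10 holding available replicas, and sent to the cluster 10; if one does, the target data replica is obtained from the cluster 10 directly reachable by the computing task and returned to the cluster 10. In the embodiments of this application, a computing task can be understood as a computer program that processes data: it reads data according to a given computing model (such as Map/Reduce), processes it, and writes the results to a storage medium.
Specifically, when the data access allocation server 40 determines from the data usage request that the target data exists and the cluster 10 cannot wait (that is, the computing task must access the data immediately), it may select, from the currently available data replicas, one whose distance and bandwidth are both suitable and return it to the cluster 10; that is, it may analyze the replica distribution metadata of the requested data and the network structure between the clusters 10, and choose a replica with a larger network bandwidth margin and a shorter physical distance as the final target data replica, returning it to the cluster 10. When the data access allocation server 40 determines that the target data exists and the cluster 10 can wait (that is, the computing task can wait), it may further determine whether any cluster 10 directly reachable by the corresponding computing task holds an available target data replica. If none does, it may analyze the replica distribution metadata and the inter-cluster network structure in the same way, choose a replica with a larger bandwidth margin and a shorter physical distance as the final target data replica, and return it to the cluster 10; if one does, it obtains the target data replica from the directly reachable cluster 10 and returns it to the cluster 10. In the embodiments of this application, the evaluation function for data distance and bandwidth may be defined according to the business scenario.
Specifically, in the embodiments of this application, the data access allocation server 40 may obtain the target data replica from a cluster directly reachable by the computing task as follows: according to the multi-cluster topology view, among the clusters 10 directly reachable by the computing task, find the cluster 10 that has the lowest access cost from the requesting cluster 10 and holds the target data replica, and obtain the target data replica from it.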
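The branching described in the preceding paragraphs (the existence check, the can-wait and cannot-wait cases, and the lowest-cost directly reachable cluster) can be sketched in Python as follows. This is an illustrative reading of the text, not the applicant's implementation: the `Link` and `Replica` types, the status strings, and the `default_score` formula (spare bandwidth divided by one plus distance) are all hypothetical, since the application leaves the evaluation function to the business scenario. The sketch also assumes that a waiting task may be pointed at a replica that is still being copied, because a completion-time estimate is then returned for it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Link:
    """Hypothetical topology-view entry for the path requester -> holder."""
    distance_km: float
    bandwidth_margin_mbps: float  # spare bandwidth (assumed metric)
    access_cost: float            # ranks directly reachable clusters
    direct: bool                  # True if the computing task reaches it directly


@dataclass(frozen=True)
class Replica:
    cluster: str
    complete: bool  # False while the replication server is still copying it


def default_score(link: Link) -> float:
    # Hypothetical evaluation function: more spare bandwidth and a shorter
    # distance give a higher score.
    return link.bandwidth_margin_mbps / (1.0 + link.distance_km)


def allocate(
    requester: str,
    replicas: list[Replica],
    links: dict[tuple[str, str], Link],
    can_wait: bool,
    score: Callable[[Link], float] = default_score,
) -> tuple[str, Optional[Replica]]:
    """Pick the replica a data usage request should be served from."""
    if not replicas:
        return ("NOT_FOUND", None)  # target data does not exist

    def link_to(r: Replica) -> Optional[Link]:
        return links.get((requester, r.cluster))

    if can_wait:
        # Waiting tasks first look at directly reachable clusters and take
        # the one with the lowest access cost, even if still being copied.
        direct = []
        for r in replicas:
            lk = link_to(r)
            if lk is not None and lk.direct:
                direct.append(r)
        if direct:
            return ("DIRECT", min(direct, key=lambda r: link_to(r).access_cost))

    # Cannot wait, or nothing directly reachable: score completed replicas.
    usable = [r for r in replicas if r.complete and link_to(r) is not None]
    if not usable:
        return ("NO_USABLE_REPLICA", None)
    return ("SCORED", max(usable, key=lambda r: score(link_to(r))))
```

Passing `score` as a parameter keeps the scenario-specific evaluation function swappable without touching the branching logic.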
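Further, in the embodiments of this application, while obtaining the target data replica, the data access allocation server 40 queries the cross-cluster data replication server 30 for an estimate of when replication of the target data replica will complete, and returns it to the cluster 10. Specifically, while the data access allocation server 40 selects, according to the multi-cluster topology view, the target data replica in the cluster 10 with the lowest access cost, if that replica is still being replicated, it queries the cross-cluster data replication server 30 for an estimate of when replication of that replica will complete and returns this estimate to the cluster 10.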
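The application does not say how the completion-time estimate is computed. One simple possibility, shown below as a hypothetical sketch, extrapolates the average throughput observed so far. The `decide()` helper and its `max_wait_s` threshold illustrate how a computing task might use the estimate to stay within a bounded waiting time; they are likewise assumptions, not part of the described system.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyProgress:
    """Hypothetical progress record returned by the replication server."""
    total_bytes: int
    copied_bytes: int
    started_at: float  # seconds, on the same clock as `now`


def estimate_completion(p: CopyProgress, now: float) -> Optional[float]:
    """Seconds until the copy should finish, or None if no rate is known yet."""
    if p.copied_bytes >= p.total_bytes:
        return 0.0
    elapsed = now - p.started_at
    if elapsed <= 0 or p.copied_bytes == 0:
        return None  # nothing observed yet, so no throughput to extrapolate
    throughput = p.copied_bytes / elapsed  # average bytes per second so far
    return (p.total_bytes - p.copied_bytes) / throughput


def decide(p: CopyProgress, now: float, max_wait_s: float) -> str:
    """Hypothetical task-side policy: wait only if the estimate fits the budget."""
    eta = estimate_completion(p, now)
    if eta is not None and eta <= max_wait_s:
        return "WAIT"
    return "FALLBACK"  # e.g. re-request as a task that cannot wait
```

For example, a 1000-byte copy that has moved 250 bytes in 10 seconds is estimated to need another 30 seconds, so a task willing to wait 60 seconds would wait.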
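With the distributed computing system of the embodiments of this application, computing tasks obtain business data within a bounded waiting time even when replicas of that data are dynamically distributed across multiple locations in a multi-cluster environment with constrained network conditions. Introducing a global data access allocation server turns cross-cluster data access requests from uncoordinated into controlled ones; together with the cross-cluster data replication server, this balances resource usage for cross-cluster data access globally. In addition, by exposing the progress information of the cross-cluster data replication server and offering computing tasks the option to wait, the system avoids short-lived data access peaks, and through a bounded replication waiting time it spreads data access requests that would otherwise converge on a single point across multiple points globally, improving the efficiency of data access under the limited bandwidth between clusters.
To implement the above embodiments, this application further proposes a distributed computing method.
FIG. 2 is a flowchart of a distributed computing method according to an embodiment of this application. As shown in FIG. 2, the distributed computing method may include:
S201: multiple clusters perform distributed computing.
In the embodiments of this application, a cluster can be understood as a parallel or distributed system made up of interconnected computers; seen from outside, a cluster is a single system that provides a unified service.
S202: the global metadata server stores and manages the multi-cluster data replica distribution view and the multi-cluster topology view.
Specifically, the global metadata server may store a globally visible multi-cluster data replica distribution view and expose add, delete, modify, and query interfaces for external modules to call; it may also store a globally visible multi-cluster topology view and expose interfaces such as query and administrator modification. The global metadata server works in request/response mode.
S203: the cross-cluster data replication server copies specified data, according to a replication instruction, to the cluster specified by the replication instruction.
Specifically, the cross-cluster data replication server may copy the specified data to the specified cluster according to the replication instruction; the replication process is transparent to the outside, and the server may expose add, delete, modify, replication request, and query interfaces for external management calls. In other words, the cross-cluster data replication server has two workflows: a background data replication workflow and a request-driven data replication workflow.
S204: the data access allocation server allocates location information of the target data according to the cluster's data usage request, the multi-cluster data replica distribution view, and the multi-cluster topology view.
Specifically, the data access allocation server may receive a data usage request from a cluster. According to the request, it accesses the global metadata server to obtain the global distribution information of the requested data list, the multi-cluster topology view, and the corresponding bandwidth description information; accesses the cross-cluster data replication server to obtain the status and progress of replica generation; and allocates location information of the target data for each request by jointly considering all of this information.
S205: the data access allocation server generates a replication instruction according to the location information of the target data, so that the cross-cluster data replication server copies the target data replica corresponding to the target data to the cluster specified by the data usage request.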
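Step S205 can be sketched as turning an allocated location into a replication instruction. The instruction fields and the `InstructionIssuer` class are hypothetical, and so is the deduplication rule: an instruction already in flight for the same data and target cluster is reused. This is one possible way the controlled, global allocation could keep concurrent requests for hot data from triggering repeated cross-cluster copies; the application does not specify it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataUsageRequest:
    data_id: str
    target_cluster: str  # cluster named in the data usage request


@dataclass(frozen=True)
class ReplicationInstruction:
    data_id: str
    source_cluster: str  # location allocated in S204
    target_cluster: str


class InstructionIssuer:
    """Turns allocated locations into replication instructions (S205).

    Assumption: an instruction already in flight for the same
    (data_id, target_cluster) is reused instead of issuing a new copy.
    """

    def __init__(self) -> None:
        self._in_flight: dict[tuple[str, str], ReplicationInstruction] = {}

    def issue(self, req: DataUsageRequest, source_cluster: str) -> Optional[ReplicationInstruction]:
        if source_cluster == req.target_cluster:
            return None  # the data is already in the requested cluster
        key = (req.data_id, req.target_cluster)
        if key not in self._in_flight:
            self._in_flight[key] = ReplicationInstruction(
                req.data_id, source_cluster, req.target_cluster
            )
        return self._in_flight[key]

    def complete(self, instr: ReplicationInstruction) -> None:
        # Called once the replication server reports the copy as finished.
        self._in_flight.pop((instr.data_id, instr.target_cluster), None)
```

Under this assumption, many tasks in one cluster asking for the same hot data share a single copy, and later requests can be served from the new replica instead of the original source.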
其中,在本申请的一个实施例中,在数据访问分配服务器访问跨集群数据复制服务器以获取数据副本产生的状态和进度的过程中,分布式计算方法还可包括:数据访问分配服务器根据数据使用请求判断目标数据是否存在;如果目标数据不存在,则数据访问分配服务器向集群反馈目标数据不存在信息。具体地,数据访问分配服务器在接收到数据使用请求之后,可根据数据使用请求访问全局元数据服务器以获取数据分布,信息,并根据数据使用请求判断目标数据是否存在,如果否,则终止运行,并向集群反馈目标数据不存在。
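In one embodiment of the present application, the distributed computing method may further include the following steps. If the target data exists and the cluster cannot wait, the data access allocation server obtains the available target data copies corresponding to the target data. It then determines the final target data copy according to the distance and bandwidth between the cluster where each available copy resides and the requesting cluster, and sends it to the cluster. If the target data exists and the cluster can wait, the data access allocation server further determines whether any cluster directly reachable by the compute task of the requesting cluster holds a corresponding available target data copy. If not, the data access allocation server determines the final target data copy according to the distance and bandwidth between the cluster where each available copy resides and the requesting cluster, and sends it to the cluster. If so, the data access allocation server obtains the target data copy from the directly reachable cluster and returns it to the cluster. In the embodiments, a compute task can be understood as a computer program that processes data: it reads data according to a computing model (such as Map/Reduce), processes it, and writes the results to a storage medium.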
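Specifically, when the data access allocation server determines from the data usage request that the target data exists and the cluster cannot wait (i.e., the compute task must access the data immediately), it may select, from the currently available data copies, one whose distance and bandwidth are both suitable and return it to the cluster. That is, it analyzes the replica distribution metadata for the requested data together with the network structure between the clusters, selects a copy with ample spare network bandwidth and a short physical distance as the final target data copy, and returns it to the cluster. When the data access allocation server determines that the target data exists and the cluster can wait (i.e., the compute task is able to wait), it may further determine whether any cluster directly reachable by the corresponding compute task holds an available target data copy. If not, it analyzes the replica distribution metadata and the inter-cluster network structure in the same way, selects a copy with ample spare bandwidth and a short physical distance as the final target data copy, and returns it to the cluster. If so, it obtains the target data copy from the directly reachable cluster and returns it to the cluster. In the embodiments, the evaluation function for data distance and bandwidth may be defined according to the business scenario.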
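Since the evaluation function is scenario-specific, one way to keep it configurable is to pass it in as a parameter. The three evaluators below are hypothetical examples, not functions taken from the source. Note that `weighted` combines Mbit/s and km without normalisation, so its weight would need tuning for each deployment.

```python
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    cluster: str
    distance_km: float
    spare_mbps: float


Evaluator = Callable[[Candidate], float]  # a higher score is better


def bandwidth_first(c: Candidate) -> float:
    return c.spare_mbps


def distance_first(c: Candidate) -> float:
    return -c.distance_km


def weighted(alpha: float) -> Evaluator:
    """Linear trade-off: alpha in [0, 1] weights spare bandwidth against distance."""
    return lambda c: alpha * c.spare_mbps - (1.0 - alpha) * c.distance_km


def pick(candidates: List[Candidate], evaluate: Evaluator) -> str:
    return max(candidates, key=evaluate).cluster
```

With candidates `("bj", 1200 km, 800 Mbit/s)` and `("hz", 150 km, 300 Mbit/s)`, `bandwidth_first` selects `"bj"` while `distance_first` selects `"hz"`. This shows how the choice of evaluation function alone changes which copy is returned.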
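More specifically, in the embodiments, the target data copy may be obtained from the directly reachable clusters as follows: according to the multi-cluster topology view, the server identifies, among the clusters directly reachable by the compute task, the cluster that holds the target data copy and has the lowest access cost from the requesting cluster, and obtains the target data copy from it.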
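Further, in the embodiments, while obtaining the target data copy, the data access allocation server also queries the cross-cluster data replication server for an estimate of when replication of the target data copy will complete, and returns the estimate to the cluster. Specifically, when the data access allocation server selects, according to the multi-cluster topology view, the target data copy in the cluster with the lowest access cost, and that copy is still being replicated, it queries the cross-cluster data replication server for the estimated completion time and returns the estimate to the cluster. Exposing the replication server's progress information, while giving compute tasks the option to wait, avoids short-lived spikes in data access. The bounded replication wait also spreads requests that would otherwise converge on a single point across many points globally.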
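On the compute-task side, the bounded wait reduces to comparing the returned estimate with a wait budget. The sketch below assumes each task carries a wait budget in seconds. The names `should_wait` and `plan` are hypothetical. The point it illustrates is that tasks which wait all read the one copy being replicated into their cluster, so only the tasks that cannot wait cross the inter-cluster link.

```python
from typing import Dict, List, Optional, Tuple


def should_wait(eta_seconds: Optional[float], wait_budget_seconds: float) -> bool:
    """Wait for the copy being replicated nearby only if it is expected to
    finish within the task's wait budget; with no estimate, do not wait."""
    return eta_seconds is not None and eta_seconds <= wait_budget_seconds


def plan(tasks: List[Tuple[str, float]],
         eta_by_cluster: Dict[str, float]) -> Tuple[int, int]:
    """Count tasks that wait for a nearby copy versus tasks that read
    immediately. Each task is (cluster, wait budget in seconds)."""
    waiting = immediate = 0
    for cluster, budget in tasks:
        if should_wait(eta_by_cluster.get(cluster), budget):
            waiting += 1
        else:
            immediate += 1
    return waiting, immediate
```

For three tasks `[("A", 60.0), ("A", 5.0), ("B", 60.0)]`, where the copies in both clusters are estimated to finish in 30 seconds, two tasks wait and one reads immediately.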
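In a multi-cluster environment with constrained network conditions, where copies of business data are dynamically distributed across multiple locations, the distributed computing method of the embodiments ensures that compute tasks obtain business data within a bounded waiting time. The global data access allocation server turns cross-cluster data access requests from uncoordinated into controlled requests. Working with the cross-cluster data replication server, it balances the resources used for cross-cluster data access across the whole system. Exposing the replication server's progress information, while giving compute tasks the option to wait, avoids short-lived spikes in data access. The bounded replication wait also spreads requests that would otherwise converge on a single point across many points globally. Together, these measures improve the efficiency of data access under the limited bandwidth between clusters.
Compared with the related art, the distributed computing method of the embodiments suits multi-cluster environments in which compute tasks access data across clusters at large scale. Exposing the progress information of the cross-cluster data replication server, while giving compute tasks the option to wait, avoids short-lived spikes in data access. This prevents network bandwidth from becoming a bottleneck when a hotspot dataset is accessed by many compute tasks across clusters and regions at the same time, which would otherwise cause access latency and degraded communication quality. In addition, obtaining business data copies imposes no restriction on the network topology of the clusters. The copy in the cluster with the lowest access cost, selected according to the multi-cluster network topology, serves as the final target data copy. As a result, the overall process places only modest demands on network stability.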
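In the description of this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where they do not contradict each other, those skilled in the art may join and combine the different embodiments or examples described in this specification and their features.
In addition, the terms "first" and "second" are used for descriptive purposes only. They shall not be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "multiple" means at least two, for example two or three, unless otherwise expressly and specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved. This should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions. They may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device, such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them. For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed. In that case the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that parts of the present application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiment methods may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, may exist physically as separate units, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or of a software functional module. If it is implemented as a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present application have been shown and described above, it should be understood that they are exemplary and shall not be construed as limiting the present application. Those of ordinary skill in the art may change, modify, substitute and vary the above embodiments within the scope of the present application.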

Claims (10)

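  1. A distributed computing system, characterized by comprising:
    multiple clusters, configured to perform distributed computing;
    a global metadata server, configured to store and manage a multi-cluster replica distribution view and a multi-cluster topology view;
    a cross-cluster data replication server, configured to copy specified data, according to a replication instruction, to the cluster specified by the replication instruction; and
    a data access allocation server, configured to allocate location information of target data according to a data usage request of the cluster, the multi-cluster replica distribution view and the multi-cluster topology view, and to generate the replication instruction according to the location information of the target data, so that the cross-cluster data replication server copies the target data copy corresponding to the target data to the cluster specified by the data usage request.
  2. The distributed computing system according to claim 1, characterized in that the data access allocation server determines, according to the data usage request, whether the target data exists, and, if the target data does not exist, returns information indicating that the target data does not exist to the cluster.
  3. The distributed computing system according to claim 2, characterized in that:
    if the target data exists and the cluster cannot wait, the data access allocation server obtains available target data copies corresponding to the target data, determines a final target data copy according to the distance and bandwidth between the cluster where each available target data copy resides and the cluster, and sends it to the cluster;
    if the target data exists and the cluster can wait, the data access allocation server further determines whether a cluster directly reachable by the compute task corresponding to the cluster has a corresponding available target data copy;
    if not, it determines a final target data copy according to the distance and bandwidth between the cluster where each available target data copy resides and the cluster, and sends it to the cluster;
    if so, it obtains the target data copy from the cluster directly reachable by the compute task and returns it to the cluster.
  4. The distributed computing system according to claim 3, characterized in that obtaining the target data copy from the cluster directly reachable by the compute task specifically comprises:
    determining, according to the multi-cluster topology view, the cluster that, among the clusters directly reachable by the compute task, has the lowest access cost from the cluster and holds the target data copy, and obtaining the target data copy.
  5. The distributed computing system according to claim 4, characterized in that, while obtaining the target data copy, the data access allocation server queries the cross-cluster data replication server for an estimated time at which replication of the target data copy will complete, and returns the estimate to the cluster.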
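  6. A distributed computing method, characterized by comprising the following steps:
    performing distributed computing by multiple clusters;
    storing and managing, by a global metadata server, a multi-cluster replica distribution view and a multi-cluster topology view;
    copying, by a cross-cluster data replication server according to a replication instruction, specified data to the cluster specified by the replication instruction;
    allocating, by a data access allocation server, location information of target data according to a data usage request of the cluster, the multi-cluster replica distribution view and the multi-cluster topology view; and
    generating, by the data access allocation server, the replication instruction according to the location information of the target data, so that the cross-cluster data replication server copies the target data copy corresponding to the target data to the cluster specified by the data usage request.
  7. The distributed computing method according to claim 6, characterized by further comprising:
    determining, by the data access allocation server according to the data usage request, whether the target data exists;
    if the target data does not exist, returning, by the data access allocation server, information indicating that the target data does not exist to the cluster.
  8. The distributed computing method according to claim 7, characterized by further comprising:
    if the target data exists and the cluster cannot wait, obtaining, by the data access allocation server, available target data copies corresponding to the target data, determining a final target data copy according to the distance and bandwidth between the cluster where each available target data copy resides and the cluster, and sending it to the cluster;
    if the target data exists and the cluster can wait, further determining, by the data access allocation server, whether a cluster directly reachable by the compute task corresponding to the cluster has a corresponding available target data copy;
    if not, determining, by the data access allocation server, a final target data copy according to the distance and bandwidth between the cluster where each available target data copy resides and the cluster, and sending it to the cluster;
    if so, obtaining, by the data access allocation server, the target data copy from the cluster directly reachable by the compute task, and returning it to the cluster.
  9. The distributed computing method according to claim 8, characterized in that obtaining the target data copy from the cluster directly reachable by the compute task specifically comprises:
    determining, according to the multi-cluster topology view, the cluster that, among the clusters directly reachable by the compute task, has the lowest access cost from the cluster and holds the target data copy, and obtaining the target data copy.
  10. The distributed computing method according to claim 9, characterized in that, while obtaining the target data copy, the data access allocation server further performs:
    querying the cross-cluster data replication server for an estimated time at which replication of the target data copy will complete, and returning the estimate to the cluster.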
PCT/CN2016/076123 2015-03-19 2016-03-11 Distributed computing system and method WO2016146023A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510122729.0A CN106034160B (zh) 2015-03-19 2015-03-19 Distributed computing system and method
CN201510122729.0 2015-03-19

Publications (1)

Publication Number Publication Date
WO2016146023A1 true WO2016146023A1 (zh) 2016-09-22

Family

ID=56918389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/076123 WO2016146023A1 (zh) 2015-03-19 2016-03-11 Distributed computing system and method

Country Status (2)

Country Link
CN (1) CN106034160B (zh)
WO (1) WO2016146023A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
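CN108319618A (zh) * 2017-01-17 2018-07-24 阿里巴巴集团控股有限公司 Data distribution control method, system and apparatus for a distributed storage system
CN108390771A (zh) * 2018-01-25 2018-08-10 中国银联股份有限公司 Network topology reconstruction method and apparatus
CN111885123A (zh) * 2020-07-06 2020-11-03 苏州浪潮智能科技有限公司 Method and apparatus for constructing an access channel to a target service across K8s
CN114936090A (zh) * 2022-04-28 2022-08-23 北京辰行科技有限公司 Cloud-platform-based financial big data information storage method and device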

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
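CN108234566B (zh) * 2016-12-21 2021-04-23 阿里巴巴集团控股有限公司 Data processing method and apparatus for a cluster
CN108282378B (zh) * 2017-01-05 2021-11-09 阿里巴巴集团控股有限公司 Method and apparatus for monitoring network traffic
CN109582686B (zh) * 2018-12-13 2021-01-15 中山大学 Method, apparatus, system and application for guaranteeing consistency of distributed metadata management
CN110532802B (zh) * 2019-09-02 2021-06-22 中国农业银行股份有限公司 Data processing method and system
CN110795257B (zh) * 2019-09-19 2023-06-16 平安科技(深圳)有限公司 Method, apparatus, device and storage medium for processing multi-cluster job records
CN111049898A (zh) * 2019-12-10 2020-04-21 杭州东方通信软件技术有限公司 Method and system for implementing a cross-domain architecture for computing cluster resources
CN111290712B (zh) * 2020-01-22 2021-06-18 腾讯科技(深圳)有限公司 Block device creation method and apparatus, cloud computing management system, and storage medium
CN113138722B (zh) * 2021-04-30 2024-01-12 北京百度网讯科技有限公司 Snapshot replication method, system and medium for a distributed block storage system
CN114827145B (zh) * 2022-04-24 2024-01-05 阿里巴巴(中国)有限公司 Server cluster system, and metadata access method and apparatus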

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697526A (zh) * 2009-10-10 2010-04-21 中国科学技术大学 Load balancing method and system for metadata management in a distributed file system
US20130218934A1 (en) * 2012-02-17 2013-08-22 Hitachi, Ltd. Method for directory entries split and merge in distributed file system
CN103647797A (zh) * 2013-11-15 2014-03-19 北京邮电大学 Distributed file system and data access method thereof
CN103916467A (zh) * 2014-03-25 2014-07-09 中国科学院计算技术研究所 Method and system for load transfer in a metadata cluster

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8051113B1 (en) * 2009-09-17 2011-11-01 Netapp, Inc. Method and system for managing clustered and non-clustered storage systems
GB2484086A (en) * 2010-09-28 2012-04-04 Metaswitch Networks Ltd Reliability and performance modes in a distributed storage system
GB2495079A (en) * 2011-09-23 2013-04-03 Hybrid Logic Ltd Live migration of applications and file systems in a distributed system
GB2496111A (en) * 2011-10-28 2013-05-08 Intergence Systems Ltd Tracing the real-world storage location of critical data items to form part of physical network map

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
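CN108319618A (zh) * 2017-01-17 2018-07-24 阿里巴巴集团控股有限公司 Data distribution control method, system and apparatus for a distributed storage system
CN108319618B (zh) * 2017-01-17 2022-05-06 阿里巴巴集团控股有限公司 Data distribution control method, system and apparatus for a distributed storage system
CN108390771A (zh) * 2018-01-25 2018-08-10 中国银联股份有限公司 Network topology reconstruction method and apparatus
CN108390771B (zh) * 2018-01-25 2021-04-16 中国银联股份有限公司 Network topology reconstruction method and apparatus
CN111885123A (zh) * 2020-07-06 2020-11-03 苏州浪潮智能科技有限公司 Method and apparatus for constructing an access channel to a target service across K8s
CN111885123B (zh) * 2020-07-06 2022-06-03 苏州浪潮智能科技有限公司 Method and apparatus for constructing an access channel to a target service across K8s
CN114936090A (zh) * 2022-04-28 2022-08-23 北京辰行科技有限公司 Cloud-platform-based financial big data information storage method and device
CN114936090B (zh) * 2022-04-28 2024-01-23 Tcl金融科技(深圳)有限公司 Cloud-platform-based financial big data information storage method and device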

Also Published As

Publication number Publication date
CN106034160A (zh) 2016-10-19
CN106034160B (zh) 2019-06-11

Similar Documents

Publication Publication Date Title
WO2016146023A1 (zh) Distributed computing system and method
US11379428B2 (en) Synchronization of client machines with a content management system repository
US10841234B2 (en) Constructing virtual motherboards and virtual storage devices
US11275622B2 (en) Utilizing accelerators to accelerate data analytic workloads in disaggregated systems
US10642491B2 (en) Dynamic selection of storage tiers
US9697247B2 (en) Tiered data storage architecture
US10528527B2 (en) File management in thin provisioning storage environments
US9400792B1 (en) File system inline fine grained tiering
US9705986B2 (en) Elastic scalability of a content transformation cluster
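CN109196461A (zh) Techniques for providing dynamically managed quality of service in a distributed storage system
KR20130055515A (ko) Method for tracking memory usage of a data processing system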
US20200081874A1 (en) Data placement and sharding
US11082517B2 (en) Content transformations using a transformation node cluster
US20160070475A1 (en) Memory Management Method, Apparatus, and System
US10972555B2 (en) Function based dynamic traffic management for network services
US20220075757A1 (en) Data read method, data write method, and server
US9448791B1 (en) Synchronizing source code objects and software development workflow objects
US10592469B1 (en) Converting files between thinly and thickly provisioned states
US10474653B2 (en) Flexible in-memory column store placement
CN111767169A (zh) Data processing method and apparatus, electronic device, and storage medium
US10019969B2 (en) Presenting digital images with render-tiles
CN110168513B (zh) Partial storage of large files in distinct storage systems
US9473799B1 (en) Resource data query processing
US10713103B2 (en) Lightweight application programming interface (API) creation and management
US20140143457A1 (en) Determining a mapping mode for a dma data transfer

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16764203

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16764203

Country of ref document: EP

Kind code of ref document: A1