WO2017114178A1 - Method for adjusting data fragment distribution, and data server - Google Patents

Method for adjusting data fragment distribution, and data server (一种调整数据分片分布的方法及数据服务器)

Info

Publication number
WO2017114178A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
client
distribution information
information
access frequency
Prior art date
Application number
PCT/CN2016/110238
Other languages
English (en)
French (fr)
Inventor
张海勇
陆靖
姚文辉
董乘宇
朱家稷
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
张海勇
陆靖
姚文辉
董乘宇
朱家稷
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited), 张海勇, 陆靖, 姚文辉, 董乘宇, 朱家稷
Priority to US15/780,380 (US10956990B2)
Publication of WO2017114178A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • The present invention relates to the field of communications technologies, and in particular to a method for adjusting the distribution of data fragments.
  • The invention also relates to a data server.
  • As the underlying storage layer, a distributed file system provides upper-layer services with nearly unlimited, scalable storage.
  • Data centers go offline increasingly often because of physical problems (such as severed fiber or machine-room fires), and once a data center goes offline, serious service-availability problems follow.
  • DC: DataCenter (data center).
  • When data is distributed across multiple data centers (hereinafter DataCenter or DC), data is read and written between DCs: for example, user operations read and write data across machine rooms, and hardware or software failures require the file system to replicate data between DCs. This places high demands on cross-DC network connectivity and bandwidth.
  • Existing solutions often have the data operator build its own data centers and networks to guarantee sufficient bandwidth between data centers. However, the lines between data centers must usually be leased, which is costly and still does not guarantee sufficient bandwidth.
  • The present invention provides a method for adjusting data fragment distribution that maximizes data access performance while reducing bandwidth requirements.
  • The method is applied to a distributed file storage system that includes multiple data centers, in which multiple copies of a to-be-processed data fragment are stored in one or more of the data centers. The method comprises:
  • Specifically, the access frequency information of the to-be-processed data fragment is obtained as follows:
  • The access frequency information consists of the sub-access frequency information of the to-be-processed data fragment for each data center. The sub-access frequency information includes at least the data fragment size, the number of times the fragment is accessed from the data center corresponding to that sub-access frequency information, the data traffic generated by accessing the fragment from that data center, and the average cross-data-center bandwidth.
  • The benefit data is proportional to the number of accesses, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data fragment size.
  • Specifically, the optimal distribution information is generated from the benefit data of each data center and the number of copies as follows:
  • Specifically, the location of each copy among the data centers is adjusted according to the optimal distribution information as follows:
  • The method further comprises:
  • The number of accesses is incremented by one before the data center returns the copy of the data fragment requested by the client to the client;
  • The data traffic is increased by the data amount of the copy before the data center returns the copy of the data fragment requested by the client to the client.
  • The method further comprises:
  • When a data write request sent by a user through a client is received, the to-be-written data fragment carried in the request is obtained, and it is determined whether the request also carries write option information;
  • If the data write request carries write option information specifying distribution across data centers, the data center(s) to which the to-be-written data fragment is allocated are determined according to the distribution information specified by the user, and the result is returned to the client so that the client writes the data fragment accordingly;
  • If the data write request carries the default write option information or no write option information, the data center to which the to-be-written data fragment is allocated is determined from the identifier of the client's data center carried in the request, and the result is returned to the client so that the client writes the data fragment accordingly.
  • The method further comprises:
  • When a data read request sent by a user through a client is received, the distribution information of the data fragment corresponding to the request is returned to the client, so that the client selects, according to the distribution information, the copy located in its own data center and reads it.
  • The present application further provides an apparatus for adjusting data fragment distribution. The apparatus is applied to a distributed file storage system that includes multiple data centers, in which multiple copies of a to-be-processed data fragment are stored in one or more of the data centers. The apparatus comprises:
  • a determining module, configured to determine, according to the access frequency information and a preset benefit function, the benefit data of the data fragment for each data center;
  • an adjusting module, configured to adjust the location of each copy among the data centers according to the optimal distribution information.
  • The obtaining module is specifically configured to:
  • The access frequency information consists of the sub-access frequency information of the to-be-processed data fragment for each data center. The sub-access frequency information includes at least the data fragment size, the number of times the fragment is accessed from the data center corresponding to that sub-access frequency information, the data traffic generated by accessing the fragment from that data center, and the average cross-data-center bandwidth.
  • The benefit data is proportional to the number of accesses, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data fragment size.
  • The generating module further includes:
  • a processing sub-module, configured to obtain the identifiers of the data centers ranked within the top positions, as many as there are copies, and use the obtained identifiers as the optimal distribution information.
  • The generating module further includes:
  • an obtaining sub-module, configured to obtain the real-time distribution information of the data fragment, which consists of the identifiers of the data centers in which each copy is currently located;
  • a determining sub-module, configured to determine whether the real-time distribution information is consistent with the optimal distribution information and, if it is not, to generate data replication tasks from the identifiers in the real-time and optimal distribution information, so that each copy is stored in the data center identified in the optimal distribution information.
  • The apparatus further comprises:
  • a counting module, configured to increment the number of accesses by one before the data center returns the copy of the data fragment requested by the client to the client;
  • a metering module, configured to increase the data traffic by the data amount of the copy before the data center returns the copy of the data fragment requested by the client to the client.
  • The apparatus further comprises:
  • a write module, configured to, when a data write request sent by a user through a client is received, obtain the to-be-written data fragment carried in the request and determine whether the request also carries write option information;
  • if the request carries write option information specifying distribution across data centers, the write module determines, according to the distribution information specified by the user, the data center(s) to which the to-be-written data fragment is allocated, and returns the result to the client so that the client writes the data fragment accordingly;
  • if the request carries the default write option information or no write option information, the write module determines, from the identifier of the client's data center carried in the request, the data center to which the to-be-written data fragment is allocated, and returns the result to the client so that the client writes the data fragment accordingly.
  • The apparatus further comprises:
  • a reading module, configured to, when a data read request sent by a user through a client is received, return the distribution information of the data fragment corresponding to the request to the client, so that the client selects, according to the distribution information, the copy located in its own data center and reads it.
  • The present application further provides a distributed file storage system comprising at least one client and further comprising:
  • one or more data centers, configured to store multiple copies of to-be-processed data fragments; and
  • an apparatus for adjusting data fragment distribution, configured to: obtain the access frequency information of a to-be-processed data fragment when the adjustment time for that fragment is reached; determine, according to the access frequency information and a preset benefit function, the benefit data of the fragment for each data center; generate optimal distribution information from the benefit data of each data center and the number of copies; and adjust the location of each copy among the data centers according to the optimal distribution information.
  • With the technical solution of the present application, the access frequency information of a to-be-processed data fragment is obtained; the benefit data of the fragment for each data center is then determined from the access frequency information and the preset benefit function; finally, optimal distribution information is generated from the benefit data of each data center and the number of copies, and the location of each copy among the data centers is adjusted accordingly. As a result, there is no need to set up …, and the distribution of data fragments is dynamically optimized according to their access frequency and characteristics, which reduces the transmission bandwidth required between data centers.
  • FIG. 1 is a schematic flowchart of a method for adjusting data fragment distribution according to the present application;
  • FIG. 2 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an apparatus for adjusting data fragment distribution according to the present application.
  • The present application divides the data in an equipment room into data fragments, which may also be referred to as data blocks.
  • As the physical record of data, a data fragment is a set of logically consecutive records. Each data fragment has multiple copies, and a copy of a data fragment is the unit of data transferred between a data center and an input/output device or another data center. By dynamically adjusting data distribution according to how data fragments are accessed from different machine rooms, the application optimizes cross-data-center network traffic, maximizing data access performance while reducing bandwidth requirements.
  • FIG. 1 is a schematic flowchart of the method for adjusting data fragment distribution proposed in the present application. The application aims to optimize the distribution of existing data fragments across multiple data centers according to user access patterns.
  • The method is applied to a distributed file system comprising multiple data centers, in which multiple copies of a to-be-processed data fragment are stored in one or more of the data centers (that is, the copies may all be stored in one data center, or be distributed across several data centers). It includes the following steps:
  • The access frequency information consists of the sub-access frequency information of the to-be-processed data fragment in each data center, where the sub-access frequency information includes at least the data fragment size, the number of times the fragment is accessed from the corresponding data center, the data traffic generated by accessing the fragment from that data center, and the average cross-data-center bandwidth.
  • Adjustment of data fragments may be triggered automatically by the system or manually. When triggered automatically, once the current time is determined to be an adjustment time according to a preset period, the sub-access frequency information reported by each data center during that period is obtained; when triggered manually, the sub-access frequency information reported by each data center within a preset period is obtained upon receipt of an adjustment trigger message.
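The trigger rule above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: `Report`, `is_adjustment_due`, and `reports_in_window` are hypothetical names, and measuring the period from the last adjustment is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    """Hypothetical sub-access frequency report from one data center."""
    timestamp: datetime
    dc: str
    accesses: int
    traffic_mb: float


def is_adjustment_due(now: datetime, last_adjustment: datetime,
                      period: timedelta, manual_trigger: bool = False) -> bool:
    # Manual trigger: an adjustment trigger message was received.
    if manual_trigger:
        return True
    # Automatic trigger (assumption): the preset period has elapsed
    # since the last adjustment.
    return now - last_adjustment >= period


def reports_in_window(reports: list, now: datetime, period: timedelta) -> list:
    # Keep only the reports that fall inside the most recent preset period.
    start = now - period
    return [r for r in reports if start <= r.timestamp <= now]
```

With a one-day period, a fragment last adjusted twelve hours ago is not yet due unless an administrator sends a trigger message.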
  • The access frequency information consists of the sub-access frequency information of the to-be-processed data fragment in each data center. For each data center, the record contains not only the data fragment size and the average cross-data-center bandwidth but also the number of accesses to the fragment in each period and the data traffic generated by accessing the fragment from that data center, which provides the basis for the subsequent benefit calculation.
  • The number of accesses is incremented by one, and the data traffic is increased by the data amount of the copy, before the data center returns the copy of the data fragment requested by the client to the client.
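A minimal sketch of the counting rule, assuming a hypothetical in-memory `DataServer` class whose counters are keyed by fragment ID and by the DC name the client sends with each read. Measuring traffic in bytes is an assumption; the text does not fix a unit here.

```python
from collections import defaultdict


class DataServer:
    """Hypothetical data server that serves fragment copies from memory."""

    def __init__(self, dc: str, copies: dict):
        self.dc = dc
        self.copies = copies  # fragment id -> bytes of the local copy
        # (fragment id, requesting DC) -> [access count, bytes served];
        # held in memory only, so it is lost if the server crashes.
        self.stats = defaultdict(lambda: [0, 0])

    def read(self, fragment_id: str, client_dc: str) -> bytes:
        data = self.copies[fragment_id]
        entry = self.stats[(fragment_id, client_dc)]
        entry[0] += 1          # number of accesses +1
        entry[1] += len(data)  # data traffic += size of the returned copy
        return data            # counters are updated before the copy is returned
```

Keying by the requesting DC is what lets the metadata server later see how often each room reads each fragment.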
  • By default, all data fragments are placed in the data center where the client is located, which avoids cross-data-center traffic when the fragments are written.
  • If the user-specified write option is to write across data centers, the fragments are written according to that option, which may generate cross-data-center traffic. Therefore, an implementation of the present application provides corresponding method steps for writing data fragments, as follows:
  • The data center to which the to-be-written data fragment is allocated is determined from the identifier of the client's data center, and the result is returned to the client so that the client writes the data fragment accordingly.
  • Optimizing the reading and writing of data fragments in this way greatly reduces the cross-data-center traffic generated during reads and writes, which lowers bandwidth requirements.
  • A specific embodiment of the present application provides a distributed storage system that includes a primary server and a client, where the client consists of one or more client components.
  • The primary server includes several modules, such as data servers, a metadata server, and a data distribution management component. Each module is described below:
  • Client component: when opening a file, it obtains the location information of the data fragments from the primary server and selects the nearest DC (the same DC as the client) for access; when writing data, by default it always places all data fragments in the data center where the writer is located, avoiding the cross-data-center traffic that writes would otherwise generate.
  • Data server: manages the data and access frequency information of the file copies it holds, and provides read and write operations on those copies.
  • Frequency reporting: data servers report access frequency information to the metadata server through a periodic reporting mechanism. Because data has multiple copies, several data servers report access data for the same data fragment.
  • Metadata server: records the location information of each file's data fragments and aggregates, for each data fragment, the access frequency information from each DC (accumulated over a period, such as one day).
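Because several data servers report on the same fragment, the metadata server has to sum their counters. A minimal sketch, assuming a hypothetical `aggregate_reports` function and a hypothetical tuple format for each report:

```python
from collections import Counter


def aggregate_reports(reports):
    """Sum per-(fragment, DC) counters reported by several data servers.

    reports: iterable of (fragment_id, dc, accesses, traffic) tuples,
    one per data server report (hypothetical format).
    """
    accesses = Counter()
    traffic = Counter()
    for fragment_id, dc, n, volume in reports:
        accesses[(fragment_id, dc)] += n
        traffic[(fragment_id, dc)] += volume
    return accesses, traffic
```

For example, two servers holding copies of fragment d that report 3 and 2 reads from dc2 yield a combined count of 5 for (d, dc2).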
  • Data distribution management component: a component on the metadata server. Periodically (for example, daily) it uses the location and access frequency information of all data fragments to calculate the benefit of redistributing data; if the benefit exceeds a certain threshold, it asynchronously triggers a data fragment relocation request to complete the distribution adjustment.
  • The data write and read processes before data fragment adjustment are as follows:
  • Step a) Client program C receives a request to write data in dc1.
  • C requests data fragment locations from the primary server, including in the request the name of its data center, dc1. By default, the primary server allocates all data fragments to dc1, so the subsequent write generates no cross-DC traffic. If the user specifies a cross-DC write option, the primary server (master) tries to allocate data fragments according to the user-specified distribution, in which case the subsequent write may generate cross-DC traffic.
  • Step b) The client program completes the write according to the data fragments allocated by the primary server.
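The primary server's allocation decision in this write flow can be sketched as below. This is a minimal illustration, assuming a hypothetical `allocate_write` function and a hypothetical option name `"cross_dc"`; the real request format and option names are not given in the text.

```python
from typing import List, Optional, Sequence

CROSS_DC = "cross_dc"  # hypothetical name for the user-specified cross-DC write option


def allocate_write(client_dc: str, num_copies: int,
                   write_option: Optional[str] = None,
                   user_distribution: Optional[Sequence[str]] = None) -> List[str]:
    """Return the data center chosen for each copy of a new data fragment."""
    if write_option == CROSS_DC:
        # Follow the user-specified distribution; this may cause cross-DC traffic.
        if user_distribution is None or len(user_distribution) != num_copies:
            raise ValueError("cross-DC write needs one data center per copy")
        return list(user_distribution)
    # Default option or no option: keep every copy in the writer's own DC,
    # so the write generates no cross-DC traffic.
    return [client_dc] * num_copies
```

With no option, `allocate_write("dc1", 3)` places all three copies in dc1, matching the default described above.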
  • Step a) Client program C asks the primary server to open file f for reading.
  • Step b) The primary server returns the data fragment locations to C.
  • Step c) C preferentially selects the copy in its own data center and connects directly to the corresponding data server to read it; in the request, C passes its DC name dc1 to the data server and specifies the data fragment d to read.
  • Step d) Before returning the data to C, the data server increments the access count of data fragment d by one and adds the amount of data in this request to the corresponding access data volume.
  • Step e) C receives the data sent back by the data server and returns it to the user.
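Step c's local-first choice can be sketched as follows, assuming a hypothetical `choose_replica` function and that the primary server returns `(dc, data_server_address)` pairs. The fallback when no local copy exists is an assumption; the text only says the local copy is preferred.

```python
from typing import List, Tuple


def choose_replica(locations: List[Tuple[str, str]], client_dc: str) -> str:
    """Pick the data server to read from.

    locations: (dc, data_server_address) pairs returned by the primary server.
    """
    for dc, server in locations:
        if dc == client_dc:
            return server  # same-DC copy: no cross-DC read traffic
    # Assumption: with no local copy, fall back to the first listed copy.
    return locations[0][1]
```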
  • The benefit data is proportional to the number of accesses, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data fragment size.
  • A reference formula for the benefit function is as follows:
  • d denotes a particular data fragment, and S(d) is the size of the fragment (MB);
  • A(d, dc) is the number of times data fragment d is accessed from equipment room dc;
  • B(d, dc) is the data traffic generated by accessing data fragment d from equipment room dc;
  • C is the bandwidth (MB) that each CS can obtain, that is, the cross-room bandwidth of the entire cluster divided among the CSs of the cluster;
  • The result f(d, dc) is the benefit obtained by placing data fragment d in dc.
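The reference formula itself is missing from this text. The sketch below therefore assumes the simplest form consistent with the stated proportionalities, f(d, dc) = A(d, dc) · B(d, dc) · C / S(d). This product form is an assumption, not the patent's formula; `benefit` and `benefits_for_fragment` are hypothetical names.

```python
def benefit(accesses: float, traffic_mb: float,
            avg_bandwidth_mb: float, size_mb: float) -> float:
    """Assumed form f(d, dc) = A(d, dc) * B(d, dc) * C / S(d)."""
    if size_mb <= 0:
        raise ValueError("fragment size must be positive")
    return accesses * traffic_mb * avg_bandwidth_mb / size_mb


def benefits_for_fragment(size_mb: float, per_dc: dict,
                          avg_bandwidth_mb: float) -> dict:
    """per_dc maps dc -> (A(d, dc), B(d, dc)); returns dc -> f(d, dc)."""
    return {dc: benefit(a, b, avg_bandwidth_mb, size_mb)
            for dc, (a, b) in per_dc.items()}
```

For a 64 MB fragment read 10 times from a room, producing 640 MB of traffic with C = 100, this form gives 10 × 640 × 100 / 64 = 10000.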
  • The benefit function is evaluated for every data fragment d against every dc (including the dc where the data currently resides). The results are sorted so that copies of the data fragment are placed in the highest-benefit equipment rooms, with the number of copies proportional to the data access frequency.
  • The above formula is only a preferred solution of this specific embodiment. Provided that the benefit data remains proportional to the number of accesses, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data fragment size, those skilled in the art may modify the benefit function, and such modifications fall within the protection scope of the present application.
  • The data centers are first sorted by benefit data in descending order; the identifiers of the data centers ranked within the top N, where N is the number of copies, are obtained and used as the optimal distribution information.
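The ranking rule can be sketched in a few lines, assuming a hypothetical `optimal_distribution` function that takes the per-DC benefit values for one fragment.

```python
def optimal_distribution(benefits: dict, num_copies: int) -> list:
    """Return the IDs of the data centers ranked 1..num_copies by benefit."""
    # sorted() is stable even with reverse=True, so ties keep the order
    # in which the data centers were inserted into the dict.
    ranked = sorted(benefits, key=benefits.get, reverse=True)
    return ranked[:num_copies]
```

Read literally, this rule yields distinct data centers and returns fewer than N entries when there are fewer data centers than copies; the embodiment's example, which puts two copies in dc2, implies a placement variant that the text does not spell out.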
  • The real-time distribution information of the data fragment is first obtained.
  • The real-time distribution information consists of the identifiers of the data centers where each copy currently resides. It is then determined whether the real-time distribution information is consistent with the optimal distribution information; if not, data replication tasks are generated from the identifiers in the two, so that each copy is stored in the data center identified in the optimal distribution information.
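One way to derive the replication tasks is a multiset difference between the current and the optimal distribution. A minimal sketch, assuming a hypothetical `replication_tasks` function, equal copy counts on both sides, and that each task copies data to the target DC before the surplus source copy is dropped:

```python
from collections import Counter


def replication_tasks(current: list, optimal: list) -> list:
    """Return (source_dc, target_dc) moves that turn current into optimal."""
    # Counter subtraction keeps only positive counts.
    surplus = Counter(current) - Counter(optimal)  # copies in the wrong place
    deficit = Counter(optimal) - Counter(current)  # places still missing a copy
    sources = sorted(surplus.elements())
    targets = sorted(deficit.elements())
    return list(zip(sources, targets))
```

Applied to the embodiment's example, `replication_tasks(["dc1", "dc1", "dc1"], ["dc1", "dc2", "dc2"])` produces two dc1 to dc2 tasks, and consistent distributions produce none.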
  • The data fragment adjustment process in this specific embodiment is as follows:
  • Step a) Data servers periodically report the access frequency information of data fragments to the primary server.
  • Step b) The primary server aggregates the information by data fragment.
  • Step c) Periodically (daily) or when manually triggered by a system administrator, the data distribution management component of the primary server recalculates the distribution benefit function f(d, dc) for all data fragments, using the formula above.
  • The distribution data of each data fragment is updated according to the calculation result. For example, for data fragment d, the distribution (dc1, dc1, dc1) before the calculation is adjusted to (dc1, dc2, dc2) afterwards.
  • Step d) The data distribution management module scans the data distribution in the background. If it finds that the current distribution differs from the ideal distribution (adjusted in step c), it initiates a low-priority data replication task to reorganize the data layout.
  • Step e) In subsequent reads, the client preferentially accesses data in its local room.
  • The present application further provides an apparatus for adjusting data fragment distribution. The apparatus is applied to a distributed file system that includes multiple data centers, in which multiple copies of a to-be-processed data fragment are stored in one or more data centers of the distributed file system. The apparatus includes:
  • an obtaining module 310, configured to obtain the access frequency information of a to-be-processed data fragment when the adjustment time for that fragment is reached;
  • a determining module 320, configured to determine, according to the access frequency information and a preset benefit function, the benefit data of the data fragment for each data center;
  • a generating module 330, configured to generate optimal distribution information according to the benefit data of each data center and the number of copies;
  • an adjusting module 340, configured to adjust the location of each copy among the data centers according to the optimal distribution information.
  • The obtaining module is specifically configured to:
  • The access frequency information consists of the sub-access frequency information of the to-be-processed data fragment in each data center, and the sub-access frequency information includes at least the data fragment size, the number of times the fragment is accessed from the corresponding data center, the data traffic generated by accessing the fragment from that data center, and the average cross-data-center bandwidth.
  • The benefit data is proportional to the number of accesses, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data fragment size.
  • The generating module further includes:
  • a processing sub-module, configured to obtain the identifiers of the data centers ranked within the top positions, as many as there are copies, and use the obtained identifiers as the optimal distribution information.
  • The generating module further includes:
  • an obtaining sub-module, configured to obtain the real-time distribution information of the data fragment, which consists of the identifiers of the data centers in which each copy is currently located;
  • a determining sub-module, configured to determine whether the real-time distribution information is consistent with the optimal distribution information and, if it is not, to generate data replication tasks from the identifiers in the real-time and optimal distribution information, so that each copy is stored in the data center identified in the optimal distribution information.
  • a counting module wherein the number of accesses is increased by one before the data center returns a copy corresponding to the data fragment requested by the client to the client;
  • the metering module increases the amount of data of the copy before the data center returns a copy corresponding to the data slice requested by the client to the client.
  • a write module when receiving a data write request sent by the user by the client, acquiring a data fragment to be written carried in the data write request, and determining whether the data write request is further carried Write option information;
  • the write module determining, according to the distribution information specified by the user, a data center for allocating the data slice to be written, and determining Returning the result to the client, so that the client writes the data segment to be written according to the determination result;
  • the data write request carries the default write option information or does not carry any write option information, and the write module determines to be used according to the identifier of the data center where the client is located in the data write request. Allocating the data center of the data fragment to be written, and returning the determination result to the client, so that the client writes the data fragment to be written according to the determination result.
  • a reading module when receiving a data read request sent by the user through the client, returning distribution information of the data fragment corresponding to the data read request to the client, so that the client
  • the terminal selects a data slice corresponding to the data center in which it is located to perform reading according to the distribution information.
  • the present application further provides a distributed file storage system, including at least one client, the file storage system further including:
  • one or more data centers for storing multiple replicas of a data shard to be processed;
  • an apparatus for adjusting data shard distribution, configured to: obtain access frequency information of the data shard to be processed when the adjustment time corresponding to that shard is reached; determine, according to the access frequency information and a preset benefit function, benefit data of the data shard for each data center; generate optimal distribution information according to the benefit data of each data center and the number of replicas; and adjust the location of each replica among the data centers according to the optimal distribution information.
  • the access frequency information kept by a data server only needs to be held in memory; if the data server crashes for any reason, the corresponding counters are cleared. Because abnormal downtime is a low-probability event, the resulting inaccuracy in access frequency has little effect across the whole cluster, and a reasonable layout is restored automatically by the shard distribution adjustment of the next cycle.
  • the present invention can be implemented in hardware, or by means of software plus a necessary general-purpose hardware platform.
  • the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes instructions for causing a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the implementation scenarios of the present invention.
  • the modules of the apparatus in an implementation scenario may be distributed in the apparatus as described in that scenario or, with corresponding changes, may be located in one or more apparatuses different from those of the scenario.
  • the modules of the above implementation scenarios may be combined into one module, or may be further split into multiple sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

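A method for adjusting data shard distribution: when the adjustment time corresponding to a data shard to be processed is reached, access frequency information of the shard is obtained (S101); benefit data of the shard for each data center is then determined according to the access frequency information and a preset benefit function (S102); optimal distribution information is generated according to the benefit data of each data center and the number of replicas (S103); and the location of each replica among the data centers is adjusted according to the optimal distribution information (S104). The distribution of data shards is thus optimized dynamically according to their access frequency and characteristics, without provisioning additional memory or disk for storage, which reduces the transmission bandwidth required between data centers.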

Description

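Method for adjusting data shard distribution, and data server
Technical Field
The present invention relates to the field of communication technologies, and in particular to a method for adjusting data shard distribution. The present invention also relates to a data server.
Background
In cloud computing and big data processing environments, a distributed file system serves as the underlying storage layer and provides nearly unlimited, scalable storage to the services above it. However, as data centers grow in scale and global deployment becomes common, incidents in which an entire data center goes offline because of physical problems (for example, a severed optical fiber or a machine room fire) are increasingly frequent, and once a data center goes offline, serious service availability problems follow.
To improve the availability and continuity of data services, a common industry practice is to distribute multiple copies of the data across several data centers within a region and rely on redundancy between data centers to keep services available. By distributing data across data centers, data operators allow some of their services to tolerate the loss of any single data center.
When data is distributed across multiple data centers (hereinafter DataCenter or DC), data is read and written across DCs, for example when user jobs read or write data in another machine room, or when software or hardware failures force the file system to replicate data outward to other data centers. This places high demands on inter-DC network connectivity and bandwidth. Existing solutions typically have the data operator build its own data centers and network to guarantee sufficient bandwidth between them. However, the lines between data centers have to be leased at high cost, and sufficient bandwidth still cannot be guaranteed.
To address these problems, one existing approach adds a cache layer inside each data center to avoid cross-data-center reads as far as possible. Although this effectively avoids the network traffic of cross-data-center reads, cached data is held in memory, whose capacity is orders of magnitude smaller than that of disk (GB versus TB), so the effectiveness of the cache declines as the data volume grows. Moreover, if the cache system spills the cache to disk, it consumes I/O capacity that would otherwise be available to user data. In addition, it is difficult to coordinate the cache with the underlying file system: for example, rewriting part of a file invalidates the cached data of the entire file, which reduces cache efficiency.
How to optimize data distribution to save network bandwidth has therefore become a technical problem that those skilled in the art urgently need to solve.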
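Summary
The present invention provides a method for adjusting data shard distribution that reduces bandwidth requirements while maximizing data access performance. The method is applied to a distributed file storage system that includes multiple data centers, in which multiple replicas of a data shard to be processed are stored in one or more of the data centers. The method includes:
when the adjustment time corresponding to the data shard to be processed is reached, obtaining access frequency information of the data shard;
determining, according to the access frequency information and a preset benefit function, benefit data of the data shard for each data center;
generating optimal distribution information according to the benefit data of each data center and the number of replicas;
adjusting the location of each replica among the data centers according to the optimal distribution information.
Preferably, obtaining the access frequency information of the data shard to be processed when the corresponding adjustment time is reached specifically comprises:
when the current time is determined to be an adjustment time according to a preset time period, obtaining the sub-access frequency information reported by each data center within that time period;
or, upon receiving an adjustment trigger message, obtaining the sub-access frequency information reported by each data center within the preset time period.
Preferably, the access frequency information consists of the sub-access frequency information of the data shard in each data center, and the sub-access frequency information includes at least the data shard size, the number of times the data shard is accessed from the data center corresponding to that sub-access frequency information, the data traffic generated by the data shard from that data center, and the average cross-data-center bandwidth.
Preferably, the benefit data is directly proportional to the access count, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data shard size.
Preferably, generating the optimal distribution information according to the benefit data of each data center and the number of replicas specifically comprises:
ranking the data centers in descending order of benefit data;
obtaining the identifiers of the data centers whose rank does not exceed the number of replicas, and using the obtained identifiers as the optimal distribution information.
Preferably, adjusting the location of each replica among the data centers according to the optimal distribution information specifically comprises:
obtaining real-time distribution information of the data shard, the real-time distribution information consisting of the identifiers of the data centers where the replicas are currently located;
determining whether the real-time distribution information is consistent with the optimal distribution information;
if the real-time distribution information is inconsistent with the optimal distribution information, generating a data replication task according to the identifiers that differ between them, so as to store each replica in the data center corresponding to an identifier in the optimal distribution information.
Preferably, the method further comprises:
incrementing the access count by one before the data center returns, to the client, the replica corresponding to the data shard requested by the client;
increasing the data traffic by the data volume of the replica before the data center returns, to the client, the replica corresponding to the data shard requested by the client.
Preferably, the method further comprises:
upon receiving a data write request sent by a user through the client, obtaining the data shard to be written carried in the data write request, and determining whether the data write request also carries write option information;
if the data write request carries write option information for cross-data-center distribution, determining, according to the distribution information specified by the user, the data centers to which the data shard to be written is allocated, and returning the determination result to the client so that the client writes the data shard according to the determination result;
if the data write request carries default write option information or no write option information, determining the data centers to which the data shard to be written is allocated according to the identifier, carried in the data write request, of the data center where the client is located, and returning the determination result to the client so that the client writes the data shard according to the determination result.
Preferably, the method further comprises:
upon receiving a data read request sent by the user through the client, returning the distribution information of the data shard corresponding to the data read request to the client, so that the client selects, according to the distribution information, the data shard in its own data center for reading.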
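Correspondingly, the present application further provides a device for adjusting data shard distribution, characterized in that the device is applied to a distributed file storage system including multiple data centers, in which multiple replicas of a data shard to be processed are stored in one or more of the data centers, and the device includes:
an acquisition module, which, when the adjustment time corresponding to the data shard to be processed is reached, obtains access frequency information of the data shard;
a determination module, which determines, according to the access frequency information and a preset benefit function, benefit data of the data shard for each data center;
a generation module, which generates optimal distribution information according to the benefit data of each data center and the number of replicas;
an adjustment module, which adjusts the location of each replica among the data centers according to the optimal distribution information.
Preferably, the acquisition module is specifically configured to:
when the current time is determined to be an adjustment time according to a preset time period, obtain the sub-access frequency information reported by each data center within that time period;
or, upon receiving an adjustment trigger message, obtain the sub-access frequency information reported by each data center within the preset time period.
Preferably, the access frequency information consists of the sub-access frequency information of the data shard in each data center, and the sub-access frequency information includes at least the data shard size, the number of times the data shard is accessed from the data center corresponding to that sub-access frequency information, the data traffic generated by the data shard from that data center, and the average cross-data-center bandwidth.
Preferably, the benefit data is directly proportional to the access count, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data shard size.
Preferably, the generation module further includes:
a ranking sub-module, which ranks the data centers in descending order of benefit data;
a processing sub-module, which obtains the identifiers of the data centers whose rank does not exceed the number of replicas and uses the obtained identifiers as the optimal distribution information.
Preferably, the generation module further includes:
an acquisition sub-module, which obtains real-time distribution information of the data shard, the real-time distribution information consisting of the identifiers of the data centers where the replicas are currently located;
a judgment sub-module, which determines whether the real-time distribution information is consistent with the optimal distribution information and, when it is not, generates a data replication task according to the identifiers that differ between them, so as to store each replica in the data center corresponding to an identifier in the optimal distribution information.
Preferably, the device further includes:
a counting module, which increments the access count by one before the data center returns, to the client, the replica corresponding to the data shard requested by the client;
a metering module, which increases the data traffic by the data volume of the replica before the data center returns, to the client, the replica corresponding to the data shard requested by the client.
Preferably, the device further includes:
a write module, which, upon receiving a data write request sent by a user through the client, obtains the data shard to be written carried in the data write request and determines whether the data write request also carries write option information;
when the data write request carries write option information for cross-data-center distribution, the write module determines, according to the distribution information specified by the user, the data centers to which the data shard to be written is allocated, and returns the determination result to the client so that the client writes the data shard according to the determination result;
when the data write request carries default write option information or no write option information, the write module determines the data centers to which the data shard to be written is allocated according to the identifier, carried in the data write request, of the data center where the client is located, and returns the determination result to the client so that the client writes the data shard according to the determination result.
Preferably, the device further includes:
a read module, which, upon receiving a data read request sent by the user through the client, returns the distribution information of the data shard corresponding to the data read request to the client, so that the client selects, according to the distribution information, the data shard in its own data center for reading.
Correspondingly, in another aspect the present application further provides a distributed file storage system, characterized by including at least one client, the file storage system further including:
one or more data centers, which store multiple replicas of a data shard to be processed;
a device for adjusting data shard distribution, configured to: when the adjustment time corresponding to the data shard to be processed is reached, obtain access frequency information of the data shard; determine, according to the access frequency information and a preset benefit function, benefit data of the data shard for each data center; generate optimal distribution information according to the benefit data of each data center and the number of replicas; and adjust the location of each replica among the data centers according to the optimal distribution information.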
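It can be seen that, with the technical solution of the present invention, when the adjustment time corresponding to a data shard to be processed is reached, access frequency information of the shard is obtained; benefit data of the shard for each data center is then determined according to the access frequency information and a preset benefit function; finally, optimal distribution information is generated according to the benefit data of each data center and the number of replicas, and the location of each replica among the data centers is adjusted accordingly. The distribution of data shards is thus optimized dynamically according to their access frequency and characteristics, without provisioning additional memory or disk for storage, which reduces the transmission bandwidth required between data centers.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for adjusting data shard distribution proposed in the present application;
FIG. 2 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a device for adjusting data shard distribution proposed in the present application.
Detailed Description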
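As described in the Background, avoiding the limits on cross-data-center traffic by increasing the bandwidth between machine rooms is very costly, while avoiding cross-data-center traffic by deploying a cache in each machine room is constrained by memory and lowers overall storage efficiency. The present application therefore divides the data in a data center into data shards, also called data blocks. As a physical way of recording data, a data shard is a group of records arranged logically contiguously and is stored as multiple replicas; a replica of a data shard is the unit of data transferred between a data center and input/output devices or other data centers. By dynamically adjusting the distribution of the data shards across data centers, the application optimizes cross-data-center network traffic, reducing bandwidth requirements while maximizing data access performance.
FIG. 1 is a schematic flowchart of the method for adjusting data shard distribution proposed in the present application. Because the present application aims to optimize the distribution of existing data shards across multiple data centers according to how users access them, the method is applied to a distributed file system that includes multiple data centers, in which multiple replicas of the data shard to be processed are stored in one or more data centers (that is, all replicas may be stored in a single data center, or they may be spread across several data centers). Specifically, the method includes the following steps:
S101: When the adjustment time corresponding to the data shard to be processed is reached, obtain access frequency information of the data shard.
The access frequency information consists of the sub-access frequency information of the data shard in each data center, and the sub-access frequency information includes at least the data shard size, the number of times the data shard is accessed from the data center corresponding to that sub-access frequency information, the data traffic generated by the data shard from that data center, and the average cross-data-center bandwidth.
In a preferred embodiment of the present application, the adjustment of data shards may be initiated automatically by the system or triggered manually. In the automatic case, when the current time is determined to be an adjustment time according to a preset time period, the sub-access frequency information reported by each data center within that period is obtained; in the manual case, the sub-access frequency information reported by each data center within the preset time period is obtained upon receipt of an adjustment trigger message.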
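The two trigger modes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the use of Python `datetime`, and the `(timestamp, dc_id, shard_id, access_count)` report tuple are all hypothetical.

```python
from datetime import datetime, timedelta


def is_adjustment_time(now: datetime, last_adjustment: datetime,
                       period: timedelta, manual_trigger: bool = False) -> bool:
    """Hypothetical check: has the adjustment time for a shard been reached?"""
    if manual_trigger:
        # Manual mode: an adjustment trigger message was received.
        return True
    # Automatic mode: one full preset period has elapsed since the last run.
    return now - last_adjustment >= period


def reports_in_window(reports, now: datetime, period: timedelta):
    """Keep the sub-access frequency reports that fall in (now - period, now].

    Each report is assumed to be a (timestamp, dc_id, shard_id, access_count) tuple.
    """
    return [r for r in reports if now - period < r[0] <= now]


if __name__ == "__main__":
    day = timedelta(days=1)
    t0 = datetime(2016, 1, 1)
    print(is_adjustment_time(t0 + timedelta(hours=23), t0, day))  # False
    print(is_adjustment_time(t0 + day, t0, day))                  # True
    print(is_adjustment_time(t0, t0, day, manual_trigger=True))   # True
    reports = [(t0 + timedelta(hours=1), "dc1", "d", 5),
               (t0 - timedelta(hours=1), "dc2", "d", 7)]
    print(len(reports_in_window(reports, t0 + day, day)))         # 1
```

In both modes the statistics come from the same preset window; only the event that starts the computation differs.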
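Because the technical solution of the present application needs to determine, for each data center, whether a replica of the data shard is suitable for storage there, the access frequency information consists of the sub-access frequency information of the shard in each data center. Each data center records not only the data shard size and the average cross-data-center bandwidth, but also, in each collection period, the number of accesses to the data shard and the data traffic generated by the data shard from that data center, which provide the basis for determining the benefit later.
Note that, as variables in the sub-access frequency information, the access count is incremented by one before the data center returns, to the client, the replica corresponding to the data shard requested by the client, and the data traffic is increased by the data volume of that replica before it is returned.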
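A minimal sketch of this bookkeeping on a data server, assuming a hypothetical `DataServer` class that keeps counters per (shard, client DC) pair in an in-memory dictionary. Both counters are updated before the replica is handed back, as the text requires; `report_and_reset` is an assumed name standing in for the periodic report to the metadata server.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AccessStats:
    access_count: int = 0   # A(d, dc): number of accesses from one client DC
    traffic_bytes: int = 0  # B(d, dc): bytes served to that DC


class DataServer:
    """Hypothetical data server keeping per-(shard, DC) counters in memory only."""

    def __init__(self, replicas: dict):
        self._replicas = replicas               # shard id -> replica bytes
        self._stats = defaultdict(AccessStats)  # cleared if the process crashes

    def read(self, shard_id: str, client_dc: str) -> bytes:
        data = self._replicas[shard_id]
        stats = self._stats[(shard_id, client_dc)]
        # Update both counters before the replica is returned to the client.
        stats.access_count += 1
        stats.traffic_bytes += len(data)
        return data

    def report_and_reset(self) -> dict:
        """Hand this period's counters to the metadata server and start afresh."""
        snapshot, self._stats = dict(self._stats), defaultdict(AccessStats)
        return snapshot
```

Keeping the counters in memory matches the later observation that a crash merely zeroes them until the next adjustment cycle.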
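When the present application dynamically adjusts data distribution, data shards are inevitably written and read. Optimizing the read and write paths of data shards therefore effectively reduces the cross-data-center traffic they generate.
During a data shard write, the default behavior is to place all data shards in the data center where the client is located, which avoids cross-data-center traffic during the write. If, however, the user specifies a cross-data-center write option, the data shards are written according to that option, and cross-data-center traffic may then be generated. The embodiments of the present application therefore provide the following method steps for the data shard write process:
a) upon receiving a data write request sent by a user through the client, obtain the data shard to be written carried in the data write request, and determine whether the data write request also carries write option information;
b) if the data write request carries write option information for cross-data-center distribution, determine, according to the distribution information specified by the user, the data centers to which the data shard to be written is allocated, and return the determination result to the client so that the client writes the data shard according to the determination result;
c) if the data write request carries default write option information or no write option information, determine the data centers to which the data shard to be written is allocated according to the identifier, carried in the data write request, of the data center where the client is located, and return the determination result to the client so that the client writes the data shard according to the determination result.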
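Steps b) and c) reduce to a small placement decision. The sketch below assumes a hypothetical `choose_write_dcs` function that receives the client's DC identifier, the replica count, and the user's cross-DC distribution (or `None` for the default or missing write option), and returns one DC identifier per replica. How the master handles a user distribution it cannot fully honor is not specified in the text and is not modeled here.

```python
from typing import Optional, Sequence


def choose_write_dcs(client_dc: str, replica_count: int,
                     cross_dc_distribution: Optional[Sequence[str]] = None) -> list:
    """Return one data-center ID per replica of a shard being written.

    cross_dc_distribution: the user's cross-DC write option (one DC ID per
    replica), or None when the request carries the default option or none.
    """
    if cross_dc_distribution:
        # Cross-DC option: follow the user-specified distribution; the write
        # itself may then generate cross-DC traffic.
        return list(cross_dc_distribution)
    # Default: keep every replica in the writer's own DC, so the write
    # generates no cross-DC traffic.
    return [client_dc] * replica_count


if __name__ == "__main__":
    print(choose_write_dcs("dc1", 3))                         # ['dc1', 'dc1', 'dc1']
    print(choose_write_dcs("dc1", 3, ["dc1", "dc2", "dc3"]))  # ['dc1', 'dc2', 'dc3']
```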
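During a data shard read, cross-data-center traffic is avoided by reading only the data shards placed in the client's own data center. The embodiments of the present application therefore provide the following method step for the data shard read process:
a) upon receiving a data read request sent by the user through the client, return the distribution information of the data shard corresponding to the data read request to the client, so that the client selects, according to the distribution information, the data shard in its own data center for reading.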
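On the client side, the returned distribution information is used to prefer a local replica. In this sketch the distribution information is assumed to be a list of `(dc_id, data_server_address)` pairs, and `choose_read_replica` is a hypothetical name; the fallback to the first listed replica when no local copy exists is also an assumption, since the text only says that the same DC takes precedence over external ones.

```python
def choose_read_replica(client_dc: str, replicas: list) -> tuple:
    """Pick the replica a client should read.

    replicas: (dc_id, data_server_address) pairs returned by the master.
    A replica in the client's own DC is preferred; otherwise the first listed
    replica is used (a real client might rank remote DCs by distance instead).
    """
    if not replicas:
        raise LookupError("shard has no replicas")
    for dc_id, server in replicas:
        if dc_id == client_dc:
            return dc_id, server
    return replicas[0]


if __name__ == "__main__":
    locations = [("dc2", "ds-7"), ("dc1", "ds-3")]
    print(choose_read_replica("dc1", locations))  # ('dc1', 'ds-3')
    print(choose_read_replica("dc9", locations))  # ('dc2', 'ds-7')
```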
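By optimizing the read and write processes of data shards in this way, the embodiments of the present application greatly reduce the cross-data-center traffic generated by reads and writes, and thereby reduce bandwidth requirements.
To explain the technical solution of the present application clearly, as shown in FIG. 2, a specific embodiment provides a distributed storage system that includes a master server and clients. The client side consists of one or more client components, and the master server includes several modules, such as data servers, a metadata server, and a data distribution management component. Each module is described as follows:
Client component: when opening a file, the client component obtains the location information of the data shards from the master server and accesses the nearest DC (the same DC takes precedence over external DCs). When writing data, it by default always places all data shards in the writer's data center, avoiding cross-data-center traffic caused by writes.
Data server: manages the replica data of files and their access frequency information, and provides read and write operations on the replicas it manages. Data servers report frequency information to the metadata server through a periodic reporting mechanism. Because data is stored as multiple replicas, several data servers report access data for the same data shard.
Metadata server: records the data shard locations of files and aggregates how often each data shard is accessed from each DC (accumulated over a certain time, for example one day).
Data distribution management component: a component on the metadata server. It periodically (for example, daily) processes the location information and access frequency information of all data shards and calculates the benefit of redistributing the data. If the benefit exceeds a certain threshold, it asynchronously triggers a data shard relocation request to complete the distribution adjustment.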
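Because every replica's data server reports separately, the metadata server has to sum the reports per shard and per accessing DC. A minimal sketch, assuming each report is a hypothetical `(shard_id, client_dc, access_count, traffic_bytes)` tuple and using `collections.Counter` as the accumulator:

```python
from collections import Counter
from typing import Iterable, Tuple


def aggregate_reports(reports: Iterable[Tuple[str, str, int, int]]):
    """Sum data-server reports into per-(shard, DC) totals.

    Each report is (shard_id, client_dc, access_count, traffic_bytes). Several
    data servers may report the same shard because it has several replicas.
    """
    counts: Counter = Counter()
    traffic: Counter = Counter()
    for shard_id, client_dc, n, nbytes in reports:
        counts[(shard_id, client_dc)] += n
        traffic[(shard_id, client_dc)] += nbytes
    return counts, traffic


if __name__ == "__main__":
    reports = [("d", "dc1", 3, 300),   # replica of d on one data server
               ("d", "dc1", 2, 200),   # another replica of d, same client DC
               ("d", "dc2", 1, 100)]
    counts, traffic = aggregate_reports(reports)
    print(counts[("d", "dc1")], traffic[("d", "dc1")])  # 5 500
```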
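Based on the above modules, the data write and data read flows before data shard adjustment are as follows:
(1) Data write flow
Step a) Client program C, located in dc1, receives a request to write data. C asks the master server for data shard locations and includes the name of its own data center, dc1, in the request. By default, the master server allocates all data shards in dc1, so subsequent writes generate no cross-DC traffic. If the user has specified a cross-DC distribution as the write option, the master tries to allocate data shards according to the user-specified distribution; subsequent writes may then generate cross-DC traffic.
Step b) The client program writes the data to the data shards allocated by the master server.
(2) Data read flow
Step a) Client program C asks the master server to open file f for reading.
Step b) The master server returns the data shard locations to C.
Step c) C prefers data shards in its own data center and connects directly to the corresponding data server to read. In the request, C passes its DC name dc1 to the data server and specifies the data shard d to be read.
Step d) Before returning data to C, the data server increments the access count of data shard d by one and adds the amount of data in this request to the corresponding access data volume.
Step e) C receives the data returned by the data server and returns it to the user.
S102: Determine, according to the access frequency information and a preset benefit function, benefit data of the data shard for each data center.
The benefit data is directly proportional to the access count, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data shard size.
In a specific embodiment of the present application, a reference formula for the benefit function is as follows: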
[Formula for f(d, dc): image PCTCN2016110238-appb-000001, not reproduced]
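where d is a particular data shard and S(d) is its size (MB); A(d, dc) is the number of times data shard d is accessed from data center dc; B(d, dc) is the data traffic generated by data shard d from data center dc; and C is the average bandwidth available to each server (CS), obtained by dividing the cross-data-center bandwidth (MB) by the number of servers in the whole cluster.
With this benefit function, the result f(d, dc) is the benefit obtained by placing data shard d in dc. The function is then evaluated for every data shard d and every dc (including the dcs where the data already resides). The results are sorted so that a replica of the data shard exists in the data center with the highest benefit and the number of replicas is proportional to the data access frequency.
Note that the above formula is only a preferred solution proposed in a specific embodiment of the present application. As long as the benefit data remains directly proportional to the access count, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data shard size, those skilled in the art may modify or vary the benefit function, and such variations fall within the protection scope of the present application.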
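The reference formula survives in this text only as a figure identifier, so the code below does not reproduce it. Purely as an illustration, it assumes the simplest form consistent with the stated proportionality, f(d, dc) = A(d, dc) * B(d, dc) * C / S(d), and computes C as the cross-data-center bandwidth divided by the number of servers, as defined above. Treat the product form and the function names as placeholders, not as the patent's actual function.

```python
def per_server_bandwidth(cross_dc_bandwidth_mb: float, num_servers: int) -> float:
    """C: cross-DC bandwidth shared evenly among all servers in the cluster."""
    return cross_dc_bandwidth_mb / num_servers


def benefit(size_mb: float, access_count: int, traffic_mb: float, c: float) -> float:
    """ASSUMED benefit f(d, dc) = A(d, dc) * B(d, dc) * C / S(d).

    Grows with access count A, traffic B and bandwidth C; shrinks with shard
    size S. The patent's actual formula is in a figure not reproduced here.
    """
    if size_mb <= 0:
        raise ValueError("shard size must be positive")
    return access_count * traffic_mb * c / size_mb


if __name__ == "__main__":
    c = per_server_bandwidth(1000, 100)                                # 10.0
    print(benefit(size_mb=50, access_count=10, traffic_mb=100, c=c))   # 200.0
    print(benefit(size_mb=100, access_count=10, traffic_mb=100, c=c))  # 100.0
```

Doubling the shard size halves the benefit while every other input is held fixed, which is the inverse proportionality the text requires of any valid variant.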
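S103: Generate optimal distribution information according to the benefit data of each data center and the number of replicas.
To distribute the replicas of a data shard evenly across the data centers while taking the benefit data of each data center into account, a preferred embodiment of the present application first ranks the data centers in descending order of benefit data, then obtains the identifiers of the data centers whose rank does not exceed the number of replicas, and uses the obtained identifiers as the optimal distribution information.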
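The ranking step can be sketched in a few lines, assuming the benefit values for one shard are already available as a hypothetical mapping from DC identifier to f(d, dc):

```python
def optimal_distribution(benefits: dict, replica_count: int) -> list:
    """Rank DCs by benefit (highest first) and keep the top replica_count IDs."""
    # sorted() stays stable with reverse=True, so ties keep their input order.
    ranked = sorted(benefits, key=benefits.get, reverse=True)
    return ranked[:replica_count]


if __name__ == "__main__":
    print(optimal_distribution({"dc1": 5.0, "dc2": 9.0, "dc3": 1.0}, 2))  # ['dc2', 'dc1']
```

With fewer DCs than replicas this sketch returns fewer identifiers than replicas. The example later in this embodiment, (dc1, dc2, dc2), implies that one DC may hold several replicas; any rule for assigning those extra replicas would be an additional policy not shown here.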
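S104: Adjust the location of each replica among the data centers according to the optimal distribution information.
Specifically, in a preferred embodiment of the present application, real-time distribution information of the data shard is obtained first. This real-time distribution information consists of the identifiers of the data centers where the replicas are currently located, so it can then be compared with the optimal distribution information. If the two are inconsistent, a data replication task is generated according to the identifiers that differ between them, so as to store each replica in the data center corresponding to an identifier in the optimal distribution information.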
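Comparing the real-time and optimal distributions as multisets of DC identifiers yields the replicas that must be created and those that become surplus. The sketch below, with the hypothetical name `replication_tasks`, pairs each missing replica with a surplus source where possible and otherwise copies from an existing holder; scheduling the tasks at low priority and deleting surplus replicas afterwards are not modeled.

```python
from collections import Counter


def replication_tasks(current: list, optimal: list) -> list:
    """Return (source_dc, target_dc) copy tasks that move `current` toward `optimal`.

    Both arguments hold one DC identifier per replica; order is ignored.
    Assumes the shard has at least one existing replica.
    """
    missing = Counter(optimal) - Counter(current)  # replicas to create
    surplus = Counter(current) - Counter(optimal)  # replicas no longer wanted
    sources = list(surplus.elements())
    tasks = []
    for i, target in enumerate(missing.elements()):
        # Prefer copying from a surplus replica; otherwise use any current holder.
        source = sources[i] if i < len(sources) else current[0]
        tasks.append((source, target))
    return tasks


if __name__ == "__main__":
    print(replication_tasks(["dc1", "dc1", "dc1"], ["dc1", "dc2", "dc2"]))
    # [('dc1', 'dc2'), ('dc1', 'dc2')]
```

For the adjustment example in the process below, (dc1, dc1, dc1) to (dc1, dc2, dc2), the function yields two copy tasks from dc1 to dc2; when the two distributions already match, it returns an empty list.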
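Taking the distributed storage system described under S101 as an example, the data shard adjustment process in this specific embodiment is as follows:
Step a) Data servers periodically report the access frequency information of data shards to the master server.
Step b) The master server aggregates this information per data shard.
Step c) The data distribution management component of the master server recalculates the distribution benefit function f(d, dc) for all data shards periodically (daily) or when manually triggered by the system administrator, using the formula above, and updates the distribution of each data shard according to the results. For example, data shard d, distributed as (dc1, dc1, dc1) before the calculation, is adjusted to (dc1, dc2, dc2) afterwards.
Step d) The data distribution management module scans the data distribution in the background. If it finds that the current distribution of the data differs from the ideal distribution (as adjusted in step c), it initiates low-priority data replication tasks to reorganize the data layout.
Step e) In subsequent reads, the client preferentially accesses data in its own data center.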
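To achieve the above technical objectives, the present application further provides a device for adjusting data shard distribution. The device is applied to a distributed file system including multiple data centers, in which multiple replicas of a data shard to be processed are stored in one or more of the data centers, and the device includes:
an acquisition module 310, which, when the adjustment time corresponding to the data shard to be processed is reached, obtains access frequency information of the data shard;
a determination module 320, which determines, according to the access frequency information and a preset benefit function, benefit data of the data shard for each data center;
a generation module 330, which generates optimal distribution information according to the benefit data of each data center and the number of replicas;
an adjustment module 340, which adjusts the location of each replica among the data centers according to the optimal distribution information.
In a specific application scenario, the acquisition module is specifically configured to:
when the current time is determined to be an adjustment time according to a preset time period, obtain the sub-access frequency information reported by each data center within that time period;
or, upon receiving an adjustment trigger message, obtain the sub-access frequency information reported by each data center within the preset time period.
In a specific application scenario, the access frequency information consists of the sub-access frequency information of the data shard in each data center, and the sub-access frequency information includes at least the data shard size, the number of times the data shard is accessed from the data center corresponding to that sub-access frequency information, the data traffic generated by the data shard from that data center, and the average cross-data-center bandwidth.
In a specific application scenario, the benefit data is directly proportional to the access count, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data shard size.
In a specific application scenario, the generation module further includes:
a ranking sub-module, which ranks the data centers in descending order of benefit data;
a processing sub-module, which obtains the identifiers of the data centers whose rank does not exceed the number of replicas and uses the obtained identifiers as the optimal distribution information.
In a specific application scenario, the generation module further includes:
an acquisition sub-module, which obtains real-time distribution information of the data shard, the real-time distribution information consisting of the identifiers of the data centers where the replicas are currently located;
a judgment sub-module, which determines whether the real-time distribution information is consistent with the optimal distribution information and, when it is not, generates a data replication task according to the identifiers that differ between them, so as to store each replica in the data center corresponding to an identifier in the optimal distribution information.
In a specific application scenario, the device further includes:
a counting module, which increments the access count by one before the data center returns, to the client, the replica corresponding to the data shard requested by the client;
a metering module, which increases the data traffic by the data volume of the replica before the data center returns, to the client, the replica corresponding to the data shard requested by the client.
In a specific application scenario, the device further includes:
a write module, which, upon receiving a data write request sent by a user through the client, obtains the data shard to be written carried in the data write request and determines whether the data write request also carries write option information;
when the data write request carries write option information for cross-data-center distribution, the write module determines, according to the distribution information specified by the user, the data centers to which the data shard to be written is allocated, and returns the determination result to the client so that the client writes the data shard according to the determination result;
when the data write request carries default write option information or no write option information, the write module determines the data centers to which the data shard to be written is allocated according to the identifier, carried in the data write request, of the data center where the client is located, and returns the determination result to the client so that the client writes the data shard according to the determination result.
In a specific application scenario, the device further includes:
a read module, which, upon receiving a data read request sent by the user through the client, returns the distribution information of the data shard corresponding to the data read request to the client, so that the client selects, according to the distribution information, the data shard in its own data center for reading.
In another aspect, the present application further provides a distributed file storage system, characterized by including at least one client, the file storage system further including:
one or more data centers, which store multiple replicas of a data shard to be processed;
a device for adjusting data shard distribution, configured to: when the adjustment time corresponding to the data shard to be processed is reached, obtain access frequency information of the data shard; determine, according to the access frequency information and a preset benefit function, benefit data of the data shard for each data center; generate optimal distribution information according to the benefit data of each data center and the number of replicas; and adjust the location of each replica among the data centers according to the optimal distribution information.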
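After data shards are stored using the above solution, the frequency information kept by a data server only needs to be held in memory; if the data server crashes for any reason, the corresponding data is cleared. Abnormal downtime is a low-probability event, so the resulting inaccuracy in access frequency has little effect across the whole cluster, and a reasonable layout is restored automatically by the data shard distribution adjustment of the next cycle.
From the description of the above embodiments, those skilled in the art can clearly understand that the present invention may be implemented in hardware, or by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes instructions for causing a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the implementation scenarios of the present invention.
Those skilled in the art can understand that the drawings are merely schematic diagrams of a preferred implementation scenario, and the modules or flows in the drawings are not necessarily required for implementing the present invention.
Those skilled in the art can understand that the modules of the apparatus in an implementation scenario may be distributed in the apparatus as described in that scenario or, with corresponding changes, may be located in one or more apparatuses different from those of the scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules.
The serial numbers of the present invention above are for description only and do not indicate the relative merits of the implementation scenarios.
What is disclosed above is only several specific implementation scenarios of the present invention; however, the present invention is not limited thereto, and any variation conceivable by those skilled in the art shall fall within the protection scope of the present invention.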

Claims (19)

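  1. A method for adjusting data shard distribution, characterized in that the method is applied to a distributed file storage system including multiple data centers, in which multiple replicas of a data shard to be processed are stored in one or more data centers of the distributed file storage system, the method comprising:
    when an adjustment time corresponding to the data shard to be processed is reached, obtaining access frequency information of the data shard to be processed;
    determining, according to the access frequency information and a preset benefit function, benefit data of the data shard for each of the data centers;
    generating optimal distribution information according to the benefit data of each of the data centers and the number of the replicas;
    adjusting the location of each of the replicas in each of the data centers according to the optimal distribution information.
  2. The method according to claim 1, characterized in that obtaining the access frequency information of the data shard to be processed when the adjustment time corresponding to the data shard is reached specifically comprises:
    when the current time is determined to be an adjustment time according to a preset time period, obtaining sub-access frequency information reported by each of the data centers within the time period;
    or, upon receiving an adjustment trigger message, obtaining sub-access frequency information reported by each of the data centers within the preset time period.
  3. The method according to claim 1, characterized in that the access frequency information consists of sub-access frequency information of the data shard to be processed in each of the data centers, and the sub-access frequency information includes at least a data shard size, an access count of accesses to the data shard from the data center corresponding to the sub-access frequency information, data traffic generated by the data shard from the data center, and an average cross-data-center bandwidth.
  4. The method according to claim 1 or 3, characterized in that the benefit data is directly proportional to the access count, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data shard size.
  5. The method according to claim 1, characterized in that generating the optimal distribution information according to the benefit data of each of the data centers and the number of the replicas specifically comprises:
    ranking the data centers in descending order of benefit data;
    obtaining identifiers of the data centers whose rank does not exceed the number of replicas, and using the obtained identifiers as the optimal distribution information.
  6. The method according to claim 5, characterized in that adjusting the location of each of the replicas in each of the data centers according to the optimal distribution information specifically comprises:
    obtaining real-time distribution information of the data shard, the real-time distribution information consisting of identifiers of the data centers where the replicas are currently located;
    determining whether the real-time distribution information is consistent with the optimal distribution information;
    if the real-time distribution information is inconsistent with the optimal distribution information, generating a data replication task according to the identifiers that differ between the real-time distribution information and the optimal distribution information, so as to store each of the replicas in the data center corresponding to an identifier in the optimal distribution information.
  7. The method according to claim 6, characterized by further comprising:
    incrementing the access count by one before the data center returns, to a client, the replica corresponding to the data shard requested by the client;
    increasing the data traffic by the data volume of the replica before the data center returns, to the client, the replica corresponding to the data shard requested by the client.
  8. The method according to claim 7, characterized by further comprising:
    upon receiving a data write request sent by a user through the client, obtaining a data shard to be written carried in the data write request, and determining whether the data write request further carries write option information;
    if the data write request carries write option information for cross-data-center distribution, determining, according to distribution information specified by the user, the data centers to which the data shard to be written is allocated, and returning a determination result to the client, so that the client writes the data shard to be written according to the determination result;
    if the data write request carries default write option information or carries no write option information, determining the data centers to which the data shard to be written is allocated according to an identifier, carried in the data write request, of the data center where the client is located, and returning a determination result to the client, so that the client writes the data shard to be written according to the determination result.
  9. The method according to claim 7, characterized by further comprising:
    upon receiving a data read request sent by the user through the client, returning distribution information of the data shard corresponding to the data read request to the client, so that the client selects, according to the distribution information, the data shard in the data center where the client is located for reading.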
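  10. A device for adjusting data shard distribution, characterized in that the device is applied to a distributed file storage system including multiple data centers, in which multiple replicas of a data shard to be processed are stored in one or more data centers of the distributed file storage system, the device comprising:
    an acquisition module, configured to obtain access frequency information of the data shard to be processed when an adjustment time corresponding to the data shard is reached;
    a determination module, configured to determine, according to the access frequency information and a preset benefit function, benefit data of the data shard for each of the data centers;
    a generation module, configured to generate optimal distribution information according to the benefit data of each of the data centers and the number of the replicas;
    an adjustment module, configured to adjust the location of each of the replicas in each of the data centers according to the optimal distribution information.
  11. The device according to claim 10, characterized in that the acquisition module is specifically configured to:
    when the current time is determined to be an adjustment time according to a preset time period, obtain sub-access frequency information reported by each of the data centers within the time period;
    or, upon receiving an adjustment trigger message, obtain sub-access frequency information reported by each of the data centers within the preset time period.
  12. The device according to claim 10, characterized in that the access frequency information consists of sub-access frequency information of the data shard to be processed in each of the data centers, and the sub-access frequency information includes at least a data shard size, an access count of accesses to the data shard from the data center corresponding to the sub-access frequency information, data traffic generated by the data shard from the data center, and an average cross-data-center bandwidth.
  13. The device according to claim 10 or 12, characterized in that the benefit data is directly proportional to the access count, the data traffic, and the average cross-data-center bandwidth, and inversely proportional to the data shard size.
  14. The device according to claim 10, characterized in that the generation module further comprises:
    a ranking sub-module, configured to rank the data centers in descending order of benefit data;
    a processing sub-module, configured to obtain identifiers of the data centers whose rank does not exceed the number of replicas and use the obtained identifiers as the optimal distribution information.
  15. The device according to claim 10, characterized in that the generation module further comprises:
    an acquisition sub-module, configured to obtain real-time distribution information of the data shard, the real-time distribution information consisting of identifiers of the data centers where the replicas are currently located;
    a judgment sub-module, configured to determine whether the real-time distribution information is consistent with the optimal distribution information and, when it is not, generate a data replication task according to the identifiers that differ between the real-time distribution information and the optimal distribution information, so as to store each of the replicas in the data center corresponding to an identifier in the optimal distribution information.
  16. The device according to claim 10, characterized by further comprising:
    a counting module, configured to increment the access count by one before the data center returns, to a client, the replica corresponding to the data shard requested by the client;
    a metering module, configured to increase the data traffic by the data volume of the replica before the data center returns, to the client, the replica corresponding to the data shard requested by the client.
  17. The device according to claim 16, characterized by further comprising:
    a write module, configured to, upon receiving a data write request sent by a user through the client, obtain a data shard to be written carried in the data write request and determine whether the data write request further carries write option information;
    when the data write request carries write option information for cross-data-center distribution, the write module determines, according to distribution information specified by the user, the data centers to which the data shard to be written is allocated, and returns a determination result to the client, so that the client writes the data shard to be written according to the determination result;
    when the data write request carries default write option information or carries no write option information, the write module determines the data centers to which the data shard to be written is allocated according to an identifier, carried in the data write request, of the data center where the client is located, and returns a determination result to the client, so that the client writes the data shard to be written according to the determination result.
  18. The device according to claim 16, characterized by further comprising:
    a read module, configured to, upon receiving a data read request sent by the user through the client, return distribution information of the data shard corresponding to the data read request to the client, so that the client selects, according to the distribution information, the data shard in the data center where the client is located for reading.
  19. A distributed file storage system, characterized by comprising at least one client, the file storage system further comprising: one or more data centers, configured to store multiple replicas of a data shard to be processed;
    a device for adjusting data shard distribution, configured to: when an adjustment time corresponding to the data shard to be processed is reached, obtain access frequency information of the data shard to be processed; determine, according to the access frequency information and a preset benefit function, benefit data of the data shard for each of the data centers; generate optimal distribution information according to the benefit data of each of the data centers and the number of the replicas; and adjust the location of each of the replicas in each of the data centers according to the optimal distribution information.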
PCT/CN2016/110238 2015-12-30 2016-12-16 Method for adjusting data shard distribution, and data server WO2017114178A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/780,380 US10956990B2 (en) 2015-12-30 2016-12-16 Methods and apparatuses for adjusting the distribution of partitioned data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511024615.9A CN106933868B (zh) 2015-12-30 2015-12-30 Method for adjusting data shard distribution, and data server
CN201511024615.9 2015-12-30

Publications (1)

Publication Number Publication Date
WO2017114178A1 true WO2017114178A1 (zh) 2017-07-06

Family

ID=59225929

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110238 WO2017114178A1 (zh) 2015-12-30 2016-12-16 Method for adjusting data shard distribution, and data server

Country Status (3)

Country Link
US (1) US10956990B2 (zh)
CN (1) CN106933868B (zh)
WO (1) WO2017114178A1 (zh)

Also Published As

Publication number Publication date
CN106933868A (zh) 2017-07-07
CN106933868B (zh) 2020-04-24
US20180357727A1 (en) 2018-12-13
US10956990B2 (en) 2021-03-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16880968

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16880968

Country of ref document: EP

Kind code of ref document: A1