WO2016070341A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2016070341A1
Authority
WO
WIPO (PCT)
Prior art keywords: partition, partitions, key, value, total number
Prior art date
Application number
PCT/CN2014/090299
Other languages
English (en)
French (fr)
Inventor
罗雄
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to AU2014410705A priority Critical patent/AU2014410705B2/en
Priority to CN201480075293.8A priority patent/CN106063226B/zh
Priority to EP14905367.0A priority patent/EP3128716B1/en
Priority to CA2941163A priority patent/CA2941163C/en
Priority to CN201710379148.4A priority patent/CN107357522B/zh
Priority to JP2016560892A priority patent/JP6288596B2/ja
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2014/090299 priority patent/WO2016070341A1/zh
Priority to CN201910052954.XA priority patent/CN109918021B/zh
Priority to KR1020167026230A priority patent/KR101912728B1/ko
Publication of WO2016070341A1 publication Critical patent/WO2016070341A1/zh
Priority to US15/587,051 priority patent/US9952778B2/en
Priority to US15/946,484 priority patent/US10628050B2/en


Classifications

    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G06F3/0611 Improving I/O performance in relation to response time
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/061 Improving I/O performance
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0671 In-line storage system
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD
    • H04L65/40 Support for services or applications
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • The present application relates to data processing techniques, and more particularly to a data processing method and apparatus.
  • Storage systems consisting of multiple physical storage nodes have appeared, where each storage node can provide storage space.
  • This storage method is called distributed storage.
  • One distributed storage method is called Key-Value storage.
  • In Key-Value storage, a piece of stored data (or a data fragment) is called a Value, and each Value has a unique identifier in the entire storage system. This identifier is the Key, and Keys and Values correspond one-to-one.
  • A Key together with its corresponding Value is called a Key-Value, or K-V for short.
  • Each Key-Value is stored in a storage disk of the storage system.
  • The mapping rule from Keys to storage disks is typically based on a distributed hash table (DHT).
  • This mapping rule is based on a hash value generated by hashing the Key.
  • Each hash value belongs to a partition, and the partition corresponds to a storage disk, so that each Key-Value corresponds to one storage disk.
  • If the hash values of two Keys belong to the same partition, the Key-Values corresponding to the two Keys are stored on the same storage disk.
  • the correspondence between partitions and storage disks is called a partitioned view.
  • Generally, the hash value calculated from the Key falls within the integer interval [0, 2^32-1]. At system initialization, this large integer interval is divided into segments of equal or approximately equal size; each such segment is a partition, and the number of hash values in each partition is basically the same.
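As an illustration, the segmentation described above can be sketched in Python. The choice of hash function (here, a 32-bit slice of an MD5 digest) and the function name `partition_of` are assumptions for the example; this application does not specify them.

```python
import hashlib

def partition_of(key: str, num_partitions: int) -> int:
    """Map a Key to a partition ID by hashing it into [0, 2**32 - 1]
    and segmenting that interval into (approximately) equal partitions."""
    # 32-bit hash of the Key; MD5 is an arbitrary stand-in for the
    # hash function, which this application does not specify.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    segment = 2 ** 32 // num_partitions   # size of each equal segment
    # Clamp so the slightly larger last segment absorbs the remainder.
    return min(h // segment, num_partitions - 1)
```

Two Keys whose hash values fall in the same segment land in the same partition, and therefore on the same storage disk.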
  • When the number of storage disks in the cluster is small, each storage disk has too many partitions, which makes the partition view too complicated and makes forwarding packets according to the partition view inefficient.
  • A specific example is as follows.
  • Assuming a cluster of 25,000 storage disks in which each storage disk has roughly 100 partitions, the entire cluster has a total of 2,500,000 partitions. Assuming that the information of each partition occupies 4 bytes of storage space, the partition information alone occupies 10 MB, so the partition view is larger than 10 MB.
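A quick sanity check of the arithmetic in this example, assuming 4 bytes per partition entry:

```python
disks = 25_000
partitions_per_disk = 100
total_partitions = disks * partitions_per_disk   # 2,500,000 partitions
bytes_per_entry = 4                              # assumed size of one partition record
view_size = total_partitions * bytes_per_entry   # 10,000,000 bytes, about 10 MB
print(total_partitions, view_size)
```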
  • Embodiments of the present invention provide a data processing method and device that reduce the consumption of system resources when processing Key-Value data.
  • In a first aspect, the present invention provides a data processing method applied to a partition management device, where the partition management device stores a partition view that records the correspondence between current partition IDs and storage disk addresses.
  • The method includes: obtaining the Key from the Key-Value data, where the Key-Value data includes a Value and a Key uniquely corresponding to the Value; calculating the final partition ID corresponding to the Key-Value data according to the Key; calculating the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to a plurality of final partition IDs; querying the partition view to obtain the storage disk address corresponding to the current partition ID; and generating a Key-Value packet using the storage disk address as the destination address and sending the Key-Value packet, which carries the Key-Value data, to the storage disk.
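A minimal sketch of this forwarding flow in Python. The hash function, the modulo mapping from final to current partition IDs, and all names are illustrative assumptions; the application does not fix these details.

```python
import hashlib

def final_partition(key: str, total_final: int) -> int:
    """Hash the Key into a final partition ID (hash choice is an assumption)."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return h % total_final

def build_packet(key: str, value: bytes, total_final: int,
                 total_current: int, partition_view: dict) -> dict:
    """Resolve a Key-Value pair to its storage disk and build a packet."""
    final_id = final_partition(key, total_final)
    # Each current partition ID corresponds to several final partition IDs.
    current_id = final_id % total_current
    # The partition view maps current partition IDs to storage disk addresses.
    disk_addr = partition_view[current_id]
    return {"dst": disk_addr, "key": key, "value": value}

# Example: 16 final partitions served by 4 current partitions on 4 disks.
view = {i: f"10.0.0.{i + 1}" for i in range(4)}
pkt = build_packet("object-42-frag-0", b"...", 16, 4, view)
```

Because forwarding only consults the current partitions, the partition view stays small even though the number of final partitions is large.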
  • In a second aspect, the present invention provides a data processing apparatus, including: a storage module configured to store a partition view, where the partition view records the correspondence between current partition IDs and storage disk addresses; a final partition calculation module configured to obtain the Key from the Key-Value data and calculate the final partition ID corresponding to the Key-Value data according to the Key, where the Key-Value data includes a Value and a Key uniquely corresponding to the Value; a current partition calculation module configured to calculate the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to a plurality of final partition IDs; a query module configured to query the partition view stored by the storage module to obtain the storage disk address corresponding to the current partition ID;
  • and a sending module configured to generate a Key-Value packet using the storage disk address as the destination address and send the Key-Value packet, which carries the Key-Value data, to the storage disk.
  • In a third aspect, the present invention provides a data processing device, including: a memory configured to store a partition view, where the partition view records the correspondence between current partition IDs and storage disk addresses; an interface configured to provide an external connection; a computer-readable medium configured to store a computer program; and a processor connected to the memory, the interface, and the computer-readable medium, configured to perform the following steps by running the program: obtaining the Key from the Key-Value data, where the Key-Value data includes a Value and a Key uniquely corresponding to the Value; calculating the final partition ID corresponding to the Key-Value data according to the Key; calculating the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to a plurality of final partition IDs; querying the partition view to obtain the storage disk address corresponding to the current partition ID; and generating a Key-Value packet using the storage disk address as the destination address and sending the Key-Value packet, which carries the Key-Value data, from the interface to the storage disk.
  • A fourth aspect provides a partition management method executed by a controller, where the controller performs partition management on the storage disks in a cluster that includes multiple storage disks. The method includes: when it is detected that N new storage disks are ready to join the cluster, obtaining the number M of storage disks currently in the cluster and the total number T of partitions currently existing in the cluster, where M, N, and T are natural numbers; determining whether the mathematical relationship between the total number of partitions T and the total number of storage disks M+N satisfies a first predetermined condition; and, if the first predetermined condition is satisfied, splitting at least one current partition such that the total number of partitions after splitting is S, and allocating the split partitions to the M+N storage disks, where the mathematical relationship between the total number of partitions S after splitting and the total number of storage disks M+N satisfies a second predetermined condition, the total number of partitions after splitting is not greater than the total number L of final partitions supported by the cluster, and L and S are natural numbers.
  • After the split, the operation of updating the partition view is further performed, so that the partition view records the correspondence between the current partitions and the IP disks.
  • In a fifth aspect, the present invention provides a partition management apparatus configured to perform partition management on the storage disks in a cluster that includes a plurality of storage disks. The apparatus includes: a storage disk detection module configured to, when it is detected that N new storage disks are ready to join the cluster, obtain the number M of storage disks currently in the cluster and the total number T of partitions currently existing in the cluster, where M, N, and T are natural numbers; a first predetermined condition determining module configured to determine whether the mathematical relationship between the total number of partitions T and the total number of storage disks M+N satisfies a first predetermined condition; and a partition splitting module configured to, if the first predetermined condition is met, split at least one current partition such that the total number of partitions after splitting is S, and allocate the split partitions to the M+N storage disks, where the mathematical relationship between the total number of partitions S after splitting and the total number of storage disks M+N satisfies a second predetermined condition, and the total number of partitions after splitting is not greater than the total number L of final partitions supported by the cluster.
  • The partition splitting module is further configured to update the partition view, so that the partition view records the correspondence between the current partitions and the IP disks.
  • In a sixth aspect, the present invention provides a partition management device connected to a cluster and configured to perform partition management on the storage disks in the cluster, where the cluster includes a plurality of storage disks. The partition management device includes: a memory configured to store a partition view, where the partition view records the correspondence between current partition IDs and storage disk addresses; an interface configured to provide an external connection; a computer-readable medium configured to store a computer program; and a processor connected to the memory, the interface, and the computer-readable medium, configured to perform the following steps by running the program: when it is detected through the interface that N new storage disks are ready to join the cluster, obtaining the number M of storage disks currently in the cluster and the total number T of partitions currently existing in the cluster, where M, N, and T are natural numbers; determining whether the mathematical relationship between the total number of partitions T and the total number of storage disks M+N satisfies a first predetermined condition; and, if the first predetermined condition is satisfied, splitting at least one current partition such that the total number of partitions after splitting is S, and allocating the split partitions to the M+N storage disks.
  • The processor further performs the operation of updating the partition view, so that the partition view records the correspondence between the current partitions and the IP disks.
  • With the above aspects, the partition management device can forward Key-Value data using current partitions. Since the number of current partitions is smaller than the number of final partitions, this scheme consumes fewer resources than the prior-art scheme of forwarding using final partitions.
  • FIG. 1 is a topological view of an embodiment of a storage system of the present invention;
  • FIG. 2 is a flowchart of an embodiment of a partition management method of the present invention;
  • FIG. 3 is a structural diagram of an embodiment of a controller of the present invention;
  • FIG. 4 is a schematic diagram of an embodiment of a partition management apparatus of the present invention;
  • FIG. 5 is a flowchart of an embodiment of a data processing method of the present invention;
  • FIG. 6 is a structural diagram of an embodiment of a data processing device of the present invention;
  • FIG. 7 is a schematic diagram of an embodiment of a data processing apparatus of the present invention.
  • FIG. 1 is a topological view of a storage system of an embodiment of the invention.
  • As shown in FIG. 1, the application server 11 is connected to the management server cluster 12, the management server cluster 12 is connected to the switching cluster 13, the switching cluster 13 is connected to the IP disk cluster 14, and the controller 15 is connected to the management server cluster 12 and the IP disk cluster 14.
  • the management server cluster 12 is composed of at least one management server 121; the IP disk cluster 14 is composed of at least one IP disk 141.
  • The application server 11 issues commands to read data or write data.
  • The management server 121 runs, for example, a distributed object pool (DOP) and provides an object interface to the application server 11. An object can be large, for example on the order of gigabytes.
  • The management server 121 can split an object into small fragments, for example 1 MB fragments; each fragment is a Value, and each Value has a label called a Key.
  • the management server 121 may perform a hash operation on the Key and associate the calculated hash value with the partition.
  • The management server 121 may also store a partition view, in which the correspondence between IP disks and partitions is recorded, specifically the correspondence between current partition IDs and IP disks. Therefore, the management server can find the IP disk corresponding to each Key-Value.
  • the IP disk that is found is called the target disk.
  • the management server 121 can use the address of the target IP disk as the destination address to generate an IP packet and send it to the switching cluster 13.
  • the contents of the partition view record are as shown in Table 1 below, where i is a natural number and m is a natural number greater than 1.
  • the switching cluster 13 is composed of at least one switch for data exchange between the management server 121 and the IP disk 141 at the time of reading data or writing data.
  • The operation of querying the partition view can also be offloaded to the switches in the switching cluster 13; that is, the switches in the switching cluster 13 store the partition view.
  • In this case, the management server generates a new type of packet, which can be called a Key-Value packet.
  • A Key-Value packet is the same as an IP packet except for two fields: the destination address is the partition number calculated from the Key, and a packet type field distinguishes Key-Value packets from IP packets.
  • The switch queries the mapping table between partition numbers and IP disk addresses, replaces the destination address with the IP disk's address, and modifies the packet type, thereby converting the Key-Value packet into an IP packet that is forwarded to the corresponding IP disk for storage.
  • The correspondence between partitions and IP disks may be recorded as a correspondence between partition numbers (also referred to as partition IDs) and IP disk addresses.
  • the partition belongs to the IP disk; and another way of expression is: the IP disk has a partition.
  • Partitioning is a logical concept.
  • a partition does not have storage space.
  • the actual data is stored in an IP disk.
  • each partition has a corresponding IP disk.
  • The IP disk corresponding to a partition, obtained by querying the partition view, performs the actual storage. From the user's point of view, the data appears to be stored in the partition; therefore, storing data in the IP disk corresponding to a partition is sometimes also described as storing the data in the partition.
  • The controller 15 is configured to calculate the total number of partitions, obtain IP disk information, establish the correspondence between IP disks and partitions, and publish the correspondence to each management server in a timely manner. This correspondence is also called the partition view.
  • If the total number of partitions is fixed and large, each IP disk may correspond to too many partitions.
  • For example, with 25,000 storage disks, each storage disk has 100 partitions; but with only 75 storage disks, each storage disk would have about 33,333 partitions. Such a number of partitions is not suitable because it causes at least the following problems.
  • When the management server uses the partition view to forward IP packets, it needs to search the correspondences in the partition view; the more entries there are, the slower the search.
  • In addition, the controller needs to publish the partition view to each management server, which occupies a large amount of bandwidth. An overly large partition view therefore occupies too many processing resources of the management server and too much bandwidth of the storage system.
  • Each partition has replica partitions on other IP disks; a partition and its replica partitions are located on different IP disks but store the same data. This is called multiple replicas.
  • For example, if the B partition and the C partition are replica partitions of the A partition, then when a packet is stored in the IP disk where the A partition is located, copies of the packet are simultaneously stored in the IP disk where the B partition is located and in the IP disk where the C partition is located.
  • The embodiment of the present invention introduces the concepts of a parent partition and a final partition. The total number of final partitions is constant, and a final partition is similar to a partition in the prior art.
  • A parent partition can be split into multiple child partitions. If a child partition can itself be split into new child partitions, then that child partition is the parent partition of the next-level child partitions. As the number of IP disks in the cluster increases, this splitting process can continue until the splits become final partitions; at that point, the number of partitions owned by the entire cluster reaches the total number of final partitions.
  • In the prior art, the partitions seen by the user and the partitions managed by the storage system are the same, and the number of partitions remains unchanged regardless of the number of IP disks in the cluster.
  • In the embodiment of the present invention, the user still sees the final partitions.
  • The storage system, however, uses parent partitions for management; that is, the partition view and packet forwarding are processed using parent partitions.
  • In the following description, partitions refer to partitions managed by the storage system, unless otherwise specified.
  • For example, if the number of IP disks doubles and each parent partition is split into 2 child partitions, then each IP disk still corresponds to 100 partitions. If each partition has one replica, then when an IP disk fails, the number of other IP disks affected is limited to 100.
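A back-of-the-envelope check of this example, under the assumption that the disk count doubles and every parent partition splits in two:

```python
disks, per_disk = 25_000, 100
total = disks * per_disk      # current partitions in the cluster

# The cluster doubles and each current (parent) partition splits into 2:
disks_after = disks * 2
total_after = total * 2
per_disk_after = total_after // disks_after

print(per_disk_after)   # still 100 partitions per IP disk
```

With one replica per partition, a failed disk shares data with at most `per_disk_after` other disks, so the failure impact stays bounded as the cluster grows.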
  • The present invention can realize customization of the number of partitions, so that the number of partitions owned by each IP disk can be controlled. Based on the method provided by the embodiment of the present invention, when the controller sends the partition view to each management server in the management server cluster, less bandwidth is occupied. Moreover, with fewer correspondence entries, the management server is faster in querying the storage disk address corresponding to a Key.
  • When reading or writing, the data sequentially passes through the management server 121 and the switching cluster 13 to reach the IP disk.
  • An IP disk can use a magnetic disk or flash memory as its storage medium and provides a Key-Value interface in software.
  • In hardware, it provides an Ethernet interface, and it obtains IP packets by decapsulating the Ethernet frames received through the Ethernet interface.
  • An example of an IP disk is Seagate's Kinetic product line.
  • Memories storing Key-Value data are collectively referred to as storage disks in the embodiments of the present invention.
  • The storage disk may also be another storage device that supports the Key-Value interface, possibly with a non-Ethernet hardware interface.
  • the storage medium used by the storage disk may be a hard disk or a flash memory.
  • FIG. 2 shows a partition management method according to an embodiment of the present invention.
  • The method is executed by the controller and includes the following steps.
  • Step 21: According to the maximum capacity of the IP disks in the cluster, determine the number of final partitions L, where L is a natural number, and, based on the initial number of IP disks, determine the initial number of partitions owned by each IP disk. This step is a preparatory step that is executed when the controller is initialized, and it is optional. Each initial partition corresponds to a storage disk, and the correspondence is recorded in the partition view.
  • The number of initial IP disks is recorded as the current number of IP disks, and the number of initial partitions owned by each IP disk is recorded as the current number of partitions P.
  • The controller also records the final partition number L. If the number of IP disks in the cluster changes, or the number of partitions of each IP disk changes, the current IP disk number M and the current partition number P are updated accordingly.
  • In the following, the symbol "×" indicates a product.
  • The number of final partitions is constant; final partitions cannot be split and can be perceived by the user.
  • The initial partitions are usually parent partitions.
  • A parent partition is a partition that can be split; splitting it generates next-level parent partitions or final partitions. Parent partitions are used by the storage system, and the user cannot perceive their existence.
  • The current partitions are the partitions used by the management server at the current moment, issued by the controller to the management server. If a partition split has been performed, the current partitions are the partitions after the split.
  • the current partition can be a parent partition or a final partition.
  • the initial number of partitions can be set by the user or automatically by the system.
  • The number of partitions owned by each IP disk in the storage system can be freely set, so that the number of partitions meets user requirements without occupying excessive storage resources, computing resources, or bandwidth resources.
  • Each final partition corresponds to an initial partition.
  • Each initial partition has an ID, and the ID number is an integer greater than or equal to zero.
  • Each final partition has an ID, and the number is an integer greater than or equal to zero.
  • The method for obtaining the initial partition corresponding to a final partition is: take the final partition ID modulo the total number of initial partitions; the remainder is the initial partition ID corresponding to that final partition.
  • At this point, the initial partitions are the current partitions.
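The modulo rule above can be illustrated as follows (the partition counts are chosen arbitrarily for the example):

```python
total_final = 16     # final partitions: constant, user-visible
total_initial = 4    # initial (current) partitions at system initialization

# Final partition ID -> initial partition ID, by taking the remainder.
mapping = {f: f % total_initial for f in range(total_final)}

# Every initial partition therefore serves the same number of final partitions.
per_initial = {i: [f for f, c in mapping.items() if c == i]
               for i in range(total_initial)}
print(per_initial[0])   # [0, 4, 8, 12]
```

This is why each current partition ID corresponds to a plurality of final partition IDs, while each final partition ID maps to exactly one current partition ID.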
  • Step 22: When it is detected that N new IP disks are ready to join the IP disk cluster, obtain the number M of storage disks currently in the cluster and the total number T of partitions currently existing in the cluster, where M, N, and T are natural numbers.
  • The controller is connected to the IP disk cluster, so new IP disks that are ready to join the cluster can be detected by the controller.
  • there are currently M IP disks in the cluster, and the current total number of partitions is M × P. The N new IP disks are physically connected to the cluster and can be detected by the controller, but partitions have not yet been allocated to them, so they cannot store data yet.
  • current refers to the moment when the step is to be performed.
  • in this embodiment, no IP disk has been added since the controller was initialized, so the current number of IP disks is M.
  • in other embodiments, if the number of IP disks in the cluster has changed before this step is performed, the current number of IP disks is not M. If partitions have been split before this step, the current number of partitions per IP disk is larger than P. The T partitions are distributed roughly evenly among the M IP disks.
  • Step 23 Determine whether the current number of partitions meets the common needs of the current IP disks and the newly added IP disks; that is, judge whether the mathematical relationship between the total number of partitions T and the total number of storage disks M+N satisfies the first predetermined condition. Specifically, the determination can be made by comparing T/(M+N) (here T = M × P) with the first threshold.
  • the first threshold is a natural number.
  • the first threshold may be an integer greater than 10 and less than 20, such as 16, 17, 18 or 19.
  • One judging method is: if T/(M+N) is less than the first threshold, the first preset condition is met, and splitting is required.
  • that is, if the average number of partitions owned by each IP disk is less than (or, optionally, less than or equal to) the split threshold, each IP disk would own too few partitions without splitting, and the total number of partitions must be increased by splitting; step 24 is then performed. Otherwise, step 25 is performed.
  • Another judging method is: if, after one split, the average number of partitions per IP disk would be greater than (or, optionally, greater than or equal to) a certain threshold, splitting would leave each IP disk with too many partitions; step 25 is then performed. Otherwise, step 24 is performed.
  • the two judgment methods can be combined to judge and select a scheme that is most satisfactory to the business.
  • the first method of judging is taken as an example to introduce the present invention.
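The first judging method can be sketched as follows (a minimal illustration; the function name and the use of a strict inequality are assumptions consistent with the description above):

```python
def needs_split(total_partitions: int, total_disks: int, first_threshold: int) -> bool:
    """First judging method: splitting is required when the average number of
    partitions per disk, T / (M + N), falls below the first threshold."""
    return total_partitions / total_disks < first_threshold
```

For example, with T = 160 partitions, M+N = 12 disks, and a first threshold of 16, the average is about 13.3 partitions per disk, so splitting is required.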
  • Step 24 Split at least one current partition. A partition may be split once or multiple times, until the number of partitions after splitting meets the requirement; then go to step 26.
  • the number of partitions after splitting meets the requirement when the mathematical relationship between the total number of partitions S after splitting and the total number of storage disks M+N satisfies the second predetermined condition.
  • in addition, the total number of partitions after splitting is not greater than the number of final partitions L.
  • splitting multiple times means that the partitions resulting from a split are themselves split again one or more times.
  • the mathematical relationship between the total number of partitions S after splitting and the total number of storage disks M+N satisfies the second predetermined condition.
  • S/(M+N) is greater than or equal to a second threshold
  • the second threshold is a natural number.
  • the second threshold may, for example, be a natural number greater than 25 and less than 50, such as 26, 27, 48 or 49.
  • splitting stops once the mathematical relationship between the total number of partitions S after splitting and the total number of storage disks M+N satisfies the second predetermined condition. Specifically, for example, if the average number of partitions owned by each IP disk after splitting is greater than or equal to the second threshold, the second preset condition is satisfied and splitting stops. Alternatively, the requirement may be considered met when the average number of partitions owned by each IP disk after splitting falls within a predetermined threshold range.
  • each partition is split into multiple sub-partitions, and every partition is split into the same number of sub-partitions. From the first split until the end of splitting, the overall multiple by which the partition count changes is called the splitting coefficient. Assume that in each split, each parent partition is split into 2 child partitions. Then, if splitting ends after a single split, the splitting coefficient is 2; if splitting ends after 2 rounds of splitting, the splitting coefficient is 2² = 4.
  • the former splitting mode splits faster; the latter splitting mode makes the total number of partitions after splitting more adjustable. For convenience of description, the foregoing embodiment of the present invention is described using the former splitting mode.
  • for example, if there are 512 current partitions and a total of 800 is desired, 288 of the 512 current partitions can be split with splitting coefficient 2, so that the total number of current partitions after splitting is exactly 512 + 288 = 800. With this latter splitting method, it is even possible to split only one partition at a time; if the splitting coefficient is 2, the difference between the total number of partitions before and after the split is then 1, so the granularity of partition splitting is the smallest.
  • the splitting coefficient can also be changed.
  • for example, the first split uses a splitting coefficient of 2, while the second split uses a splitting coefficient of 5. This also makes the total number of partitions obtained by splitting easier to adjust.
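The splitting-coefficient arithmetic above can be sketched as follows (the helper name is an assumption; each round multiplies the partition count by that round's coefficient, so the overall coefficient is the product of the per-round coefficients):

```python
def total_after_splits(initial_total: int, coefficients: list[int]) -> int:
    """Apply successive rounds of splitting: each round multiplies the
    partition count by that round's splitting coefficient."""
    total = initial_total
    for c in coefficients:
        total *= c
    return total
```

For instance, two rounds with coefficient 2 give an overall splitting coefficient of 2² = 4, while rounds of 2 and then 5 give an overall coefficient of 10.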
  • the current partition has a corresponding relationship with the final partition. This correspondence can be stored in the controller and can be issued by the controller to each management server.
  • Each current partition has an ID, and the current partition ID may be an integer greater than or equal to zero.
  • there are T current partitions in total, and the IDs of the T current partitions form an arithmetic progression whose first term is 0, whose common difference is 1, and whose number of terms is T.
  • Each final partition has an ID, and the final partition ID may be an integer greater than or equal to zero.
  • the number of terms is S. For example, when 12 partitions are split into 24 partitions, the partition IDs before the split are 0, 1, 2, 3, ..., 9, 10, 11, and the partition IDs after the split are 0, 1, 2, 3, ..., 21, 22, 23.
  • the partition ID generation rule after partition splitting can be as follows: among the partitions produced by a split, one partition retains the original partition ID, and the IDs of the remaining partitions, together with the original partition ID, form an increasing arithmetic progression whose common difference is the total number of partitions before the split.
  • for example, there are 200 partitions in total before the split, and each partition is split into 3; the three partitions generated by splitting the partition with ID 21 then have IDs 21, 221, and 421.
  • this ID generation rule can be changed, as long as, after the entire splitting process ends, the IDs of the current partitions still form an increasing arithmetic progression with first term 0 and common difference 1.
  • for example, splitting the partition with ID 0 may yield partition IDs 0, 201, and 202; splitting the partition with ID 1 yields partition IDs 1, 203, and 204; splitting the partition with ID 3 yields partition IDs 3, 205, and 206; the IDs of the remaining partitions are deduced by analogy.
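The main ID generation rule can be sketched as follows (a hypothetical helper, assuming as described that one child retains the original ID and the common difference is the pre-split partition total):

```python
def child_partition_ids(original_id: int, coefficient: int, total_before: int) -> list[int]:
    """One child retains the original ID; the remaining children continue an
    arithmetic progression whose common difference is the pre-split total."""
    return [original_id + i * total_before for i in range(coefficient)]
```

With 200 partitions before the split and a split into 3, the partition with ID 21 yields children with IDs 21, 221, and 421.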
  • Step 25 Perform partition migration: migrate some of the partitions owned by the original M IP disks to the newly added N IP disks, so that the M × P partitions are evenly distributed among the M+N IP disks. After step 25 is performed, the total number of partitions in the entire system is unchanged, and the average number of partitions owned by each IP disk decreases. Steps 24 and 25 are alternatives: in a complete partition management method embodiment, after step 25 is performed, step 24 or step 26 is no longer performed.
  • Step 26 Update the total number of current IP disks recorded in the controller to M+N, and update the total number of current partitions to S. This step can also be performed together with step 24.
  • since the partitions are distributed roughly evenly, the current number of partitions of each IP disk is roughly S/(M+N). Therefore, instead of recording the total number of partitions S, it may be recorded that the current number of partitions of each IP disk in the cluster is roughly S/(M+N).
  • Step 26 is to prepare for the next split, so it is not a necessary step for this partition management operation.
  • the S current partitions are allocated to M+N IP disks.
  • the operation of updating the partitioned view may also be performed.
  • the partition view records the IP disk corresponding to each current partition; specifically, it may record the correspondence between current partition IDs and IP disk addresses.
  • the operation of updating the partition view may be performed at step 24 or at step 26. Subsequent data processing methods can use the partition view updated in this step.
  • since a partition can be used only after the correspondence between the current partition and its IP disk is correctly recorded, in other embodiments of the present invention the partition view update operation needs to be performed whenever the current partitions change.
  • the controller 3 includes an interface 31, a processor 32, and a storage medium 33.
  • the interface 31 is configured to provide an external connection, for example to the storage disk cluster and the management server.
  • the storage medium 33 is used to store computer program code.
  • the processor 32 executes the above-described partition management method by running program code in the storage medium 33.
  • an embodiment of the present invention further provides a partition management apparatus 4.
  • the partition management apparatus 4 may be hardware or virtual hardware formed by software.
  • the partition management device 4 can execute the above-described partition management method.
  • the partition management device 4 includes a storage disk detecting module 41, a first predetermined condition determining module 42, and a partition splitting module 43.
  • the partition management device 4 may further include an initialization module 40.
  • the partition management device 4 may further include an update module 44.
  • the initialization module 40 is configured to determine the final partition number L according to the maximum capacity of the IP disks in the cluster, where L is a natural number, and to determine, according to the initial number of IP disks, the initial number of partitions owned by each IP disk. This is a preparatory step performed only when the controller is initialized, so the module is optional.
  • the update module 44 also records the final partition number L. If the number of IP disks in the cluster changes or the number of partitions of each IP disk changes, the current IP disk number M and the current partition number P are updated.
  • the symbol "×" indicates a product.
  • the number of final partitions is constant; a final partition cannot be split, and final partitions can be perceived by the user.
  • the initial partition is usually the parent partition.
  • the parent partition is a partition that can be split. The split generates the next-level parent partition or generates the final partition.
  • the parent partition is used by the storage system, and the user cannot perceive the existence of the parent partition.
  • the number of initial partitions can be set by the user or automatically by the system.
  • the number of partitions owned by each IP disk in the storage system can be freely set; it should be large enough to meet user requirements, but not so large that it occupies an excessive amount of resources such as storage resources, computing resources, and bandwidth.
  • the storage disk detection module 41 is configured to acquire, when N new storage disks are ready to join the cluster, the current number of storage disks M in the cluster and the total number of existing partitions T in the cluster, where M, N, and T are natural numbers.
  • the partition management device 4 is connected to the IP disk cluster, so that the IP disk newly added to the cluster can be detected by the storage disk detecting module 41.
  • there are currently M IP disks in the IP disk cluster, and the current total number of partitions is M × P.
  • the N IP disks are physically connected to the cluster and can be detected by the storage disk detection module 41, but partitions have not yet been allocated to them, so they cannot store data yet.
  • “current” refers to the moment when an operation is ready to be performed.
  • in this embodiment, no IP disk has been added since the partition management device 4 was initialized, so the current number of IP disks is M. In other embodiments, if the number of IP disks in the IP disk cluster has changed before this operation, the current number of IP disks is not M. If partitions have been split before this operation, the current number of partitions per IP disk is larger than P. The T partitions are distributed roughly evenly among the M IP disks.
  • the first predetermined condition determining module 42 is configured to determine whether the mathematical relationship between the total number of partitions T and the total number of storage disks M+N satisfies the first predetermined condition.
  • the first threshold may be an integer greater than 10 and less than 20, such as 16, 17, 18 or 19.
  • One judging method is: if T/(M+N) is smaller than the first threshold, the first preset condition is satisfied, and splitting is required.
  • that is, if the average number of partitions owned by each IP disk is less than (or, optionally, less than or equal to) the split threshold, each IP disk would own too few partitions without splitting; step 24 is then performed. Otherwise, step 25 is performed.
  • Another judging method is: if, after one split, the average number of partitions per IP disk would be greater than (or, optionally, greater than or equal to) a certain threshold, splitting would leave each IP disk with too many partitions; step 25 is then performed. Otherwise, step 24 is performed.
  • the two judgment methods can be combined to judge and select a scheme that is most satisfactory to the business.
  • the first method of judging is taken as an example to introduce the present invention.
  • if the first predetermined condition determining module 42 determines that the first predetermined condition is met, the partition splitting module 43 is configured to split at least one of the current partitions so that the total number of partitions after splitting is S, and to assign the split partitions to the M+N storage disks.
  • the mathematical relationship between the total number of partitions S after splitting and the total number of storage disks M+N satisfies a second predetermined condition, and the total number of partitions after splitting is not greater than the total number of final partitions L supported by the cluster, where L and S are both natural numbers greater than 1.
  • assigning the split partitions to the M+N storage disks may be an even distribution, or a distribution that is as even as possible.
  • splitting at least one current partition may involve splitting once or splitting multiple times, until the number of partitions after splitting meets the requirement.
  • the mathematical relationship between the total number of partitions S after splitting and the total number of storage disks M+N satisfies the second predetermined condition. Specifically, S/(M+N) is greater than or equal to a second threshold, and the second threshold is a natural number.
  • the second threshold may, for example, be a natural number greater than 25 and less than 50, such as 47, 48 or 49.
  • splitting stops once the mathematical relationship between the total number of partitions S after splitting and the total number of storage disks M+N satisfies the second predetermined condition. Specifically, for example, if the average number of partitions owned by each IP disk after splitting is greater than or equal to the second threshold, the second preset condition is satisfied and splitting stops. Alternatively, the requirement may be considered met when the average number of partitions owned by each IP disk after splitting falls within a predetermined threshold range.
  • each partition is split into multiple sub-partitions, and every partition is split into the same number of sub-partitions. From the first split until the end of splitting, the overall multiple by which the partition count changes is called the splitting coefficient. Assume that in each split, each parent partition is split into 2 child partitions. Then, if splitting ends after a single split, the splitting coefficient is 2; if splitting ends after 2 rounds of splitting, the splitting coefficient is 2² = 4.
  • the former splitting mode splits faster; the latter splitting mode makes the total number of split partitions more scalable.
  • the following is an example of the latter splitting method. Assume the total number of final partitions is 1000, the current total number of partitions is 512, and the splitting coefficient is 2. If every partition is split, 1024 partitions are obtained after splitting. This is not allowed, because the current total number of partitions would be greater than the total number of final partitions. To avoid this, only 488 of the current partitions can be split, and 488 + 512 = 1000, which means the total number of current partitions after splitting reaches exactly 1000 and does not exceed the total number of final partitions. In addition, the user may consider 1000 current partitions too many; for example, the user may consider a post-split total of 800 most suitable.
  • in that case, 288 of the 512 current partitions can be split with splitting coefficient 2, so that the total number of current partitions after splitting is exactly 512 + 288 = 800. With this latter splitting method, it is even possible to split only one partition at a time; if the splitting coefficient is 2, the difference between the total number of partitions before and after the split is then 1, so the granularity of partition splitting is the smallest.
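The 488 and 288 figures above follow from simple arithmetic: with coefficient c, each split of one partition adds c − 1 partitions. A small, hypothetical helper makes this checkable:

```python
def partitions_to_split(current_total: int, target_total: int, coefficient: int = 2) -> int:
    """Number of partitions that must be split (each adding coefficient - 1
    partitions) so the total reaches exactly the target."""
    added_per_split = coefficient - 1
    needed = target_total - current_total
    if needed % added_per_split != 0:
        raise ValueError("target not reachable with this coefficient")
    return needed // added_per_split
```

Starting from 512 partitions with coefficient 2, reaching 1000 requires splitting 488 partitions, and reaching 800 requires splitting 288.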
  • the splitting coefficient can also be changed.
  • for example, the first split uses a splitting coefficient of 2, while the second split uses a splitting coefficient of 5. This also makes the total number of partitions obtained by splitting easier to adjust.
  • the current partition has a corresponding relationship with the final partition. This correspondence may be stored by the update module 44 or may be issued by the update module 44 to the data processing apparatus.
  • Each current partition has an ID, and the current partition ID may be an integer greater than or equal to zero.
  • the IDs of all current partitions form an arithmetic progression whose first term is 0 and whose common difference is 1.
  • Each final partition has an ID, and the final partition ID may be an integer greater than or equal to zero.
  • the IDs of all the final partitions form an arithmetic progression whose first term is 0 and whose common difference is 1. For example, when 12 partitions are split into 24 partitions, the partition IDs before the split are 0, 1, 2, 3, ..., 9, 10, 11, and the partition IDs after the split are 0, 1, 2, 3, ..., 21, 22, 23.
  • the partition ID generation rule after partition splitting may be as follows: among the partitions produced by a split, one partition retains the original partition ID, and the IDs of the remaining partitions, together with the original partition ID, form an increasing arithmetic progression whose common difference is the total number of partitions before the split, M. For example, there are 200 partitions in total before the split and each partition is split into three; the three partitions generated by splitting the partition with ID 21 then have IDs 21, 221, and 421. This ID generation rule can be changed, as long as, after the entire splitting process ends, the IDs of the current partitions still form an increasing arithmetic progression with first term 0 and common difference 1.
  • the update module 44 is configured to update the total number of current IP disks recorded in the partition management device 4 to M+N after the operation of the partition splitting module 43 is completed, and update the total number of current partitions to S.
  • since the partitions are distributed roughly evenly, the current number of partitions of each IP disk is roughly S/(M+N). Therefore, instead of recording the total number of partitions S, it may be recorded that the current number of partitions of each IP disk in the cluster is roughly S/(M+N).
  • the partition splitting module 43 or the update module 44 may also perform the operation of updating the partition view.
  • the partition view records the IP disk corresponding to each current partition, for example, the correspondence between the partition ID of a current partition and the IP address of the corresponding IP disk. That is, the partition view records which of the current S partitions corresponds to which of the M+N IP disks. Subsequent data processing devices may use the updated partition view.
  • a partition migration module (not shown) may further be included. If partition splitting is not performed, the partition migration module may perform partition migration: some of the partitions owned by the original M IP disks are migrated to the newly added N IP disks, so that the M × P partitions are evenly distributed among the M+N IP disks.
  • the present invention further provides a data processing method, which is applied to a partition management device.
  • a partition view is stored in the partition management device, and the partition view records a correspondence between a current partition ID and a storage disk (for example, an IP disk) address.
  • the data processing method is executed after, and on the basis of, the partition management method, but the two are relatively independent.
  • the partition management device is connected to the controller.
  • the partition management device is, for example, a management server or a cluster of switching devices. The following is an example of a management server.
  • the data processing method embodiment may be executed based on the partition view provided by the partition management method described above, and the partition view is generated by the controller and sent to each partition management device in the cluster of the partition management device for storage.
  • Step 51 Generate Key-Value data according to the data to be written.
  • the data to be written is split into a set of values (Values), and a Key is generated for each Value to form Key-Value data; a Key-Value is the combination of a Key and the Value corresponding to that Key. Since one piece of data to be written can be split into multiple Values, multiple Key-Values are generated accordingly. For convenience of description, the subsequent steps describe the processing of only one specific Key-Value.
  • the data to be written comes from the application server, such as a file or a data stream.
  • for convenience of storage, the management server can split the data. For example, the data can be split into equal-sized slices of 1 MB, and each slice is called a Value. A Key uniquely identifies a Value, so the Keys of different Values are different; for example, "data file name + slice number" can be used as the Key of a Value. Smaller data need not be split: its Key can be generated directly to form Key-Value data. In some special scenarios, large data can also be formed directly into corresponding Key-Value data without splitting, and then sent to an IP disk for storage.
  • Step 52 Obtain the Key in the Key-Value data, and calculate the final partition ID corresponding to the Key-Value data according to the Key.
  • the Key-Value data includes a value Value and a Key uniquely corresponding to the Value.
  • a method for calculating the final partition ID is: perform a hash operation on the Key to obtain the hash value of the Key, take the hash value modulo the total number of final partitions L, and use the remainder as the final partition ID.
  • L is a natural number greater than or equal to 2.
  • the final partition ID thus obtained is a number.
  • an equivalent transformation is to map this number to another token, for example a letter label, and use that label as the final partition ID.
  • the initial partitions and the current partitions can likewise be represented by letter labels.
  • a letter label can be mapped back to a number, and the number obtained by the mapping is then used with the "modulo" method in the same way as a numeric partition ID.
  • the concept of the final partition can be found in the description of the previous partition management method embodiment.
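A minimal sketch of step 52, assuming SHA-1 as a stand-in for the patent's unspecified hash function:

```python
import hashlib

def final_partition_id(key: str, total_final: int) -> int:
    """Hash the Key and take the remainder modulo the final partition total L."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % total_final
```

Any hash that spreads Keys evenly would do; the essential property is that the result is deterministic for a given Key and always falls in the range 0 to L − 1.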
  • Step 53 Calculate a current partition ID corresponding to the final partition ID, where each current partition ID corresponds to multiple final partition IDs.
  • An algorithm for calculating the current partition from the final partition is: take the final partition ID modulo the current total number of partitions T, and use the remainder as the current partition ID.
  • the current total number of partitions T is a natural number.
  • the current partition ID is a number; an equivalent transformation maps this number to another token, which is then used as the current partition ID.
  • the current partition refers to the partition owned by the IP disk cluster at the current moment, and each current partition corresponds to an IP disk. See the description of the current partition in the previous partition management method embodiment.
  • the final partition is the child partition of the current partition.
  • the current partition ID and the multiple final partition IDs have a corresponding relationship, and the corresponding relationship may refer to the partition management method embodiment.
  • the current partition and the final partition have a corresponding relationship, and the corresponding relationship may be stored in the controller, and the corresponding relationship is read when step 53 is performed.
  • alternatively, the correspondence may not be pre-stored but obtained according to an algorithm.
  • Each current partition has an ID, and the current partition ID may be an integer greater than or equal to zero.
  • the set of IDs of all current partitions forms an arithmetic progression whose first term is 0 and whose common difference is 1.
  • Each final partition has an ID, and the final partition ID may be an integer greater than or equal to zero.
  • the set of IDs of all the final partitions forms an arithmetic progression whose first term is 0 and whose common difference is 1.
  • the method of obtaining the current partition corresponding to a final partition is: take the final partition ID modulo the current total number of partitions; the value of the remainder is the current partition ID corresponding to that final partition.
  • the ID generation rule can be as follows: among the partitions produced by a split, one partition retains the original partition ID, and the IDs of the remaining partitions, together with the original partition ID, form an increasing arithmetic progression whose common difference is the total number of partitions before the split. For example, there are 200 partitions in total before the split and each partition is split into three; the three partitions generated by splitting the partition with ID 21 then have IDs 21, 221, and 421. This ID generation rule can be changed, as long as, after the entire splitting process ends, the IDs of the current partitions still form an increasing arithmetic progression with first term 0 and common difference 1.
  • alternatively, splitting the partition with ID 0 may yield partition IDs 0, 201, and 202; splitting the partition with ID 1 yields partition IDs 1, 203, and 204; splitting the partition with ID 3 yields partition IDs 3, 205, and 206; the IDs of the remaining partitions are deduced by analogy.
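Step 53's mapping is a plain modulo, and it composes naturally with the split-ID rule above; a sketch (the helper name is an assumption):

```python
def current_partition_id(final_id: int, total_current: int) -> int:
    """The current partition ID is the final partition ID modulo the
    current total number of partitions T."""
    return final_id % total_current
```

Note how this meshes with the ID generation rule: with 200 partitions before a split, IDs 21, 221, and 421 all reduce to 21 modulo 200, so data routed by the pre-split layout still lands on a consistent partition.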
  • Step 54 Query the partitioned view to obtain a storage disk address corresponding to the current partition ID.
  • a partition view is stored in the partition management device, and the partition view records the correspondence between current partition IDs and storage disk addresses. If the storage disk is an IP disk, the storage disk address can be an IP address. If the storage disk is based on another type of protocol, such as ATM or IPX, the storage disk address is an ATM address or an IPX address.
  • Step 55 Generate a Key-Value packet with the storage disk address as the destination address, and send the Key-Value packet to the storage disk.
  • the payload of the Key-Value packet carries the Key-Value data.
  • after receiving the Key-Value packet, the storage disk stores the Key-Value data.
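Steps 52 through 55 can be strung together as follows. This is only a sketch: the SHA-1 hash and the dict-based partition view are assumptions, and the "packet" is reduced to an (address, payload) pair rather than a real network message:

```python
import hashlib

def route_key_value(key: str, value: bytes, total_final: int,
                    total_current: int, partition_view: dict[int, str]):
    """Route one Key-Value to a storage disk address via the partition view."""
    hashed = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    final_id = hashed % total_final            # step 52: Key -> final partition
    current_id = final_id % total_current      # step 53: final -> current partition
    disk_address = partition_view[current_id]  # step 54: query the partition view
    return disk_address, (key, value)          # step 55: destination + payload
```

For example, with a hypothetical view `{i: f"10.0.0.{i}" for i in range(8)}`, 64 final partitions, and 8 current partitions, each Key deterministically resolves to one of the eight disk addresses.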
  • the data processing device includes: a memory 61 configured to store the partition view, where the partition view records the correspondence between current partition IDs and storage disk addresses; an interface 62 configured to provide an external connection; a computer readable medium 63 configured to store a computer program; and a processor 64, connected to the memory 61, the interface 62, and the computer readable medium 63, configured to perform the data processing method described above by running the program, for example the following steps.
  • the system includes a storage module 71, a final partition calculation module 72, a current partition calculation module 73, a query module 74, and a sending module 75.
  • a Key-Value data generating module 76 may also be included.
  • the storage module 71 is configured to store a partition view, where the partition view records a correspondence between a current partition ID and a storage disk address.
  • the storage medium used by the storage module 71 may be a flash memory or a hard disk.
  • the partitioned view of storage module 71 is from a partition management device, such as partition management device 4 in FIG.
  • the storage module 71 can be connected to the partition management device 4 to receive the partition view.
  • the final partition calculation module 72 is configured to obtain the Key in the Key-Value data, and to calculate, according to the Key, the final partition ID corresponding to the Key-Value data, where the Key-Value data includes a Value and the Key uniquely corresponding to the Value.
  • the method by which the final partition calculation module 72 calculates the final partition ID is: perform a hash operation on the Key to obtain the hash value of the Key, take the hash value modulo the total number of final partitions L, and use the remainder as the final partition ID.
  • L is a natural number greater than or equal to 2.
  • the final partition ID thus obtained is a number.
  • an equivalent transformation is to map this number to another token, for example a letter label, and use that label as the final partition ID.
  • the initial partitions and the current partitions can likewise be represented by letter labels.
  • a letter label can be mapped back to a number, and the number obtained by the mapping is then used with the "modulo" method in the same way as a numeric partition ID.
  • the concept of the final partition can be found in the description of the previous partition management method embodiment.
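The final-partition computation described above (hash the Key, then take the hash value modulo L) can be sketched as follows. This is a minimal illustration; SHA-1 is an assumed hash function, since the embodiment does not mandate a particular one.

```python
import hashlib

def final_partition_id(key: bytes, total_final_partitions: int) -> int:
    # Hash the Key, then take the hash value modulo the total number of
    # final partitions L; the remainder is the final partition ID.
    hash_value = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return hash_value % total_final_partitions
```

The same Key always maps to the same final partition, which is what makes later lookups deterministic.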
  • the current partition calculation module 73 is configured to calculate a current partition ID corresponding to the final partition ID, where each current partition ID corresponds to multiple final partition IDs.
  • An algorithm by which the current partition calculation module 73 calculates the current partition of a final partition is: take the final partition ID modulo the current total number of partitions T, and use the remainder as the current partition ID.
  • the current total number of partitions T is a natural number.
  • the current partition ID is a number, and an equivalent transformation maps this number to another token as the current partition ID.
  • the current partition refers to the partition owned by the IP disk cluster at the current moment, and each current partition corresponds to an IP disk. See the description of the current partition in the previous partition management method embodiment.
  • the final partition is the child partition of the current partition.
  • the current partition ID and the plurality of final partition IDs have a corresponding relationship, and the corresponding relationship can be referred to the partition management device embodiment.
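The modulo mapping from final partitions to current partitions can be sketched as follows. The helper `final_ids_of_current` is a hypothetical addition, included only to illustrate that each current partition ID corresponds to multiple final partition IDs.

```python
def current_partition_id(final_id: int, total_current_partitions: int) -> int:
    # Take the final partition ID modulo the current partition total T;
    # the remainder is the current partition ID.
    return final_id % total_current_partitions

def final_ids_of_current(current_id: int, total_current: int, total_final: int):
    # All final partition IDs that map onto one current partition
    # (illustrative helper, not part of the embodiment).
    return [f for f in range(total_final) if f % total_current == current_id]
```

For example, with 12 current partitions and 24 final partitions, current partition 5 corresponds to final partitions 5 and 17.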
  • the partition management device generates a correspondence and then issues the information to each data processing device.
  • the current partition and the final partition have a corresponding relationship.
  • the corresponding relationship may be stored in the current partition calculation module 73, or the corresponding relationship may not be pre-stored, and the current partition calculation module 73 calculates and obtains the corresponding relationship.
  • Each current partition has an ID, and the current partition ID may be an integer greater than or equal to 0.
  • the set of IDs of all current partitions may form an increasing arithmetic progression with a first term of 0 and a common difference of 1.
  • Each final partition has an ID, and the final partition ID may be an integer greater than or equal to 0.
  • the set of IDs of all final partitions forms an increasing arithmetic progression with a first term of 0 and a common difference of 1.
  • the method for obtaining the current partition corresponding to a final partition is: take the final partition ID modulo the current total number of partitions, and the value of the remainder is the ID of the corresponding current partition. For example, 12 partitions are split into 24 partitions; the partition IDs before the split are 0, 1, 2, 3, ..., 9, 10, 11, and the partition IDs after the split are 0, 1, 2, 3, ..., 21, 22, 23.
  • the ID generation rule can be as follows: among the split partitions, one partition keeps the original partition ID, and the IDs of the remaining partitions together with the original partition ID form an arithmetic progression whose terms increase and whose common difference is the total number of partitions before the split. For example, there are 200 partitions in total before splitting, and each partition splits into three after splitting; the IDs of the three partitions generated by splitting the partition whose ID is 21 are: 21, 221, and 421. This ID generation rule can be changed, as long as after the entire splitting process ends, the IDs of the current partitions still form an increasing arithmetic progression with a first term of 0 and a common difference of 1.
  • under another rule, splitting the partition with ID 0 may yield partition IDs 0, 201, and 202; splitting the partition with ID 1 yields partition IDs 1, 203, and 204; splitting the partition with ID 2 yields partition IDs 2, 205, and 206; the IDs of the remaining partitions follow by analogy.
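The first ID-generation rule above can be sketched as follows. This is a non-authoritative sketch under the stated rule that one child partition keeps the original ID and the common difference equals the pre-split partition total.

```python
def split_partition_ids(original_id: int, total_before_split: int, split_factor: int):
    # One child keeps the original ID; the remaining children's IDs extend an
    # arithmetic progression whose common difference is the pre-split total.
    return [original_id + i * total_before_split for i in range(split_factor)]
```

A useful property of this rule: splitting every one of the T pre-split partitions with factor k yields exactly the IDs 0 .. k·T-1, so the post-split IDs again form an arithmetic progression with first term 0 and common difference 1.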
  • the querying module 74 is configured to query the partitioned view stored by the storage module 71 to obtain a storage disk address corresponding to the current partition ID.
  • the partition view records the correspondence between the current partition ID and the storage disk address. If the storage disk is an IP disk, the storage disk address can be an IP address. If the storage disk is based on other types of protocols, such as ATM or IPX protocols, then the storage disk address is the ATM address, or the IPX address.
  • the sending module 75 is configured to generate a Key-Value packet with the storage disk address as the destination address and send the Key-Value packet to the storage disk, where the Key-Value packet carries the Key-Value data.
  • the storage disk is configured to receive the Key-Value packet through the switch cluster, and then store the Key-Value data.
  • the Key-Value data generating module 76 is configured to generate Key-Value data.
  • the data to be written is divided into a set of Values, and a Key is generated for each Value to form Key-Value data; a Key-Value is the combination of a Key and the Value corresponding to that Key. Since one piece of data to be written can be split into multiple Values, multiple corresponding Key-Values are generated.
  • for convenience of description, the embodiment of the invention describes the processing of only one specific Key-Value.
  • the data to be written comes from the application server, such as a file or a data stream.
  • if the data to be written is large, the management server can split it for convenient storage. For example, it can be split into equal-sized data slices of 1 MB, and each slice is called a Value. A Key is used to uniquely mark a Value, so the Keys of different Values differ. For example, "data file name + sequence number" can be used as the Key of a Value. Data of small size can be used directly as a Value to generate Key-Value data without splitting. In some special scenarios, large data can also be used directly to generate Key-Value data without splitting.
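The slicing scheme just described can be sketched as below. The function name `make_key_values` and the 1 MB default slice size are illustrative assumptions; the Key format follows the "data file name + sequence number" example.

```python
def make_key_values(data: bytes, file_name: str, slice_size: int = 1 << 20):
    # Split the data to be written into equal-sized Value slices and give
    # each slice a Key of the form "file name + sequence number".
    return [(f"{file_name}+{i}", data[off:off + slice_size])
            for i, off in enumerate(range(0, len(data), slice_size))]
```

Because the sequence number differs per slice, the Keys of different Values are guaranteed to differ within one file.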
  • the final partition calculation module 72 can be connected to the Key-Value data generating module 76. If the data processing device 7 does not have the Key-Value data generating module 76, the final partition calculation module 72 can obtain the Key-Value data directly from an application server through the external interface.
  • aspects of the invention, or possible implementations of various aspects, may be embodied as a system, a method, or a computer program product.
  • aspects of the invention, or possible implementations of various aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and so on), or an embodiment combining software and hardware aspects, collectively referred to herein as a "circuit," "module," or "system."
  • aspects of the invention, or possible implementations of various aspects, may take the form of a computer program product, which refers to computer readable program code stored in a computer readable medium.
  • the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • the computer readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, or a portable compact disc read-only memory (CD-ROM).
  • a processor in a computer reads the computer readable program code stored in a computer readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, in the flowcharts, and an apparatus implementing the functional actions specified in each block, or combination of blocks, of the block diagrams is generated.


Abstract

A data processing technique applied to a partition management device. The partition management device stores a partition view, and the partition view records the correspondence between current partition IDs and storage disk addresses. The total number of current partitions may be smaller than the total number of final partitions. With this technique, Key-Value data can be forwarded using the current partitions, reducing the complexity of the partition view.

Description

Data Processing Method and Apparatus
Technical Field
This application relates to data processing technology, and in particular to a data processing method and apparatus.
Background Art
With the development of society, the scale of data that needs to be stored and managed keeps growing, to the point of being called massive data. When ultra-large-scale data is managed with traditional centralized storage, it is difficult to provide efficient read and write operations and difficult to achieve good scalability and high availability.
Against this background, storage systems composed of multiple physical storage nodes have emerged, where each storage node can provide storage space; this storage approach is called distributed storage. One distributed storage approach is called key-value (Key-Value) storage. In Key-Value storage, the stored data (or data slice) is called a value (Value), and each piece of data has an identifier that is unique across the entire storage system; this identifier is the key (Key). Keys and Values correspond one to one.
A Key together with the Value corresponding to the Key is called a Key-Value, or K-V for short. Each Key-Value is stored on one storage disk of the storage system. In distributed hash table (DHT) technology, the storage disk that stores a particular Key-Value can be determined by a mapping rule. The mapping rule is based on the hash value generated by performing a hash operation on the Key; each hash value belongs to a partition, and partitions correspond to storage disks, so that each Key-Value corresponds to one storage disk. Under this method, if two different Keys produce the same hash value, the Key-Values corresponding to these two Keys are stored on the same storage disk. The correspondence between partitions and storage disks is called a partition view.
In the prior art, according to DHT technology, the hash value computed from a Key falls, for example, into the integer interval [0, 2^32-1]. At system initialization, this large integer interval is divided into segments of equal or approximately equal size; each such segment is a partition, and each partition contains roughly the same number of hash values. When the number of storage disks in the storage disk cluster is small, each storage disk owns too many partitions, making the partition view overly complex and making packet forwarding based on the partition view inefficient. A specific example follows.
Suppose a storage disk cluster supports at most 25,000 storage disks, and at the maximum disk count each storage disk owns roughly 100 partitions; that is, the whole cluster owns 2,500,000 partitions. If the information of each partition occupies 4 bytes of storage space, the partition information occupies 10 MB of storage space in total, and the partition view information exceeds 10 MB.
Using such a partition view consumes a large amount of system resources.
Summary of the Invention
The present invention provides a data processing method and apparatus that can reduce the consumption of system resources when processing Key-Value data.
In a first aspect, the present invention provides a data processing method applied in a partition management device, where the partition management device stores a partition view and the partition view records the correspondence between current partition IDs and storage disk addresses. The method includes: obtaining the Key in key-value (Key-Value) data, and calculating, according to the Key, the final partition ID corresponding to the Key-Value data, where the Key-Value data includes a Value and a Key uniquely corresponding to the Value; calculating the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to multiple final partition IDs; querying the partition view to obtain the storage disk address corresponding to the current partition ID; and generating a Key-Value packet with the storage disk address as the destination address and sending the Key-Value packet to the storage disk, the Key-Value packet carrying the Key-Value data.
In a second aspect, the present invention provides a data processing apparatus, including: a storage module configured to store a partition view, the partition view recording the correspondence between current partition IDs and storage disk addresses; a final partition calculation module configured to obtain the Key in Key-Value data and calculate, according to the Key, the final partition ID corresponding to the Key-Value data, where the Key-Value data includes a Value and a Key uniquely corresponding to the Value; a current partition calculation module configured to calculate the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to multiple final partition IDs; a query module configured to query the partition view stored by the storage module to obtain the storage disk address corresponding to the current partition ID; and a sending module configured to generate a Key-Value packet with the storage disk address as the destination address and send the Key-Value packet to the storage disk, the Key-Value packet carrying the Key-Value data.
In a third aspect, the present invention provides a data processing device, including: a memory configured to store a partition view, the partition view recording the correspondence between current partition IDs and storage disk addresses; an interface configured to provide an external connection; a computer readable medium configured to store a computer program; and a processor, connected to the memory, the interface, and the computer readable medium, configured to perform the following steps by running the program: obtaining the Key in Key-Value data and calculating, according to the Key, the final partition ID corresponding to the Key-Value data, where the Key-Value data includes a Value and a Key uniquely corresponding to the Value; calculating the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to multiple final partition IDs; querying the partition view to obtain the storage disk address corresponding to the current partition ID; and generating a Key-Value packet with the storage disk address as the destination address and sending the Key-Value packet from the interface to the storage disk, the Key-Value packet carrying the Key-Value data.
In a fourth aspect, the present invention provides a partition management method, executed by a controller that performs partition management on storage disks in a cluster, the cluster including multiple storage disks. The method includes: when it is detected that N new storage disks are about to join the cluster, obtaining the current number of storage disks M in the cluster and the current total number of partitions T in the cluster, where M, N, and T are all natural numbers; determining whether the mathematical relationship between the current total number of partitions T and the total number of storage disks M+N satisfies a first predetermined condition; and if the first predetermined condition is satisfied, splitting at least one of the current partitions so that the total number of partitions after splitting is S, and allocating the split partitions to the M+N storage disks, where the mathematical relationship between the post-split total number of partitions S and the total number of storage disks M+N satisfies a second predetermined condition, and the post-split total number of partitions is not greater than the total number of final partitions L supported by the cluster, where L and S are both natural numbers greater than 1.
In a first implementation of the fourth aspect, an operation of updating the partition view is also performed, the partition view recording the correspondence between current partitions and IP disks.
In a fifth aspect, the present invention provides a partition management apparatus for performing partition management on storage disks in a cluster, the cluster including multiple storage disks. The apparatus includes: a storage disk detection module configured to, when it is detected that N new storage disks are about to join the cluster, obtain the current number of storage disks M in the cluster and the current total number of partitions T in the cluster, where M, N, and T are all natural numbers; a first predetermined condition determination module configured to determine whether the mathematical relationship between the current total number of partitions T and the total number of storage disks M+N satisfies a first predetermined condition; and a partition splitting module configured to, if the first predetermined condition is satisfied, split at least one of the current partitions so that the total number of partitions after splitting is S, and allocate the split partitions to the M+N storage disks, where the mathematical relationship between the post-split total number of partitions S and the total number of storage disks M+N satisfies a second predetermined condition, and the post-split total number of partitions is not greater than the total number of final partitions L supported by the cluster, where L and S are both natural numbers greater than 1.
In a first implementation of the fifth aspect, the partition splitting module is further configured to perform the operation of updating the partition view, the partition view recording the correspondence between current partitions and IP disks.
In a sixth aspect, the present invention provides a partition management device, connected to a cluster and configured to perform partition management on storage disks in the cluster, the cluster including multiple storage disks. The partition management device includes: a memory configured to store a partition view, the partition view recording the correspondence between current partition IDs and storage disk addresses; an interface configured to provide an external connection; a computer readable medium configured to store a computer program; and a processor, connected to the memory, the interface, and the computer readable medium, configured to perform the following steps by running the program: when it is detected through the interface that N new storage disks are about to join the cluster, obtaining the current number of storage disks M in the cluster and the current total number of partitions T in the cluster, where M, N, and T are all natural numbers; determining whether the mathematical relationship between the current total number of partitions T and the total number of storage disks M+N satisfies a first predetermined condition; and if the first predetermined condition is satisfied, splitting at least one of the current partitions so that the total number of partitions after splitting is S, and allocating the split partitions to the M+N storage disks, where the mathematical relationship between the post-split total number of partitions S and the total number of storage disks M+N satisfies a second predetermined condition, and the post-split total number of partitions is not greater than the total number of final partitions L supported by the cluster, where L and S are both natural numbers greater than 1.
In a first implementation of the sixth aspect, the processor further performs the operation of updating the partition view, the partition view recording the correspondence between current partitions and IP disks.
By applying the solution of the present invention, the partition management device can forward Key-Value data using the current partitions. Since the number of current partitions is smaller than the number of final partitions, resource consumption is reduced compared with the prior-art solution of forwarding using final partitions.
Brief Description of the Drawings
FIG. 1 is a topology diagram of a storage system embodiment of the present invention;
FIG. 2 is a flowchart of a partition management method embodiment of the present invention;
FIG. 3 is a structural diagram of a controller embodiment of the present invention;
FIG. 4 is a schematic diagram of a partition management apparatus embodiment of the present invention;
FIG. 5 is a flowchart of a data processing method embodiment of the present invention;
FIG. 6 is a structural diagram of a data processing device embodiment of the present invention;
FIG. 7 is a schematic diagram of a data processing apparatus embodiment of the present invention.
Description of Embodiments
The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings of the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained based on the embodiments of the present invention shall fall within the protection scope of the present invention.
FIG. 1 is a topology diagram of a storage system according to an embodiment of the invention. An application server 11 is connected to a management server cluster 12; the management server cluster 12 is connected to a switch cluster 13; the switch cluster 13 is connected to an IP disk cluster 14; and a controller 15 is connected to the management server cluster 12 and the IP disk cluster 14. The management server cluster 12 consists of at least one management server 121; the IP disk cluster 14 consists of at least one IP disk 141.
The application server 11 issues commands to read data or write data. The management server 121 is, for example, a distributed object pool (DOP) that provides an object interface to the application server 11; an object can be very large, for example with GB as the basic unit.
If an object is too large, the management server 121 can split the object into small slices, for example slices of 1 MB; each slice is a Value, and each Value has a label called a Key. The management server 121 can perform a hash operation on the Key and associate the computed hash value with a partition. In addition, the management server 121 can store a partition view, which records the correspondence between IP disks and partitions — specifically, the correspondence between current partition IDs and IP disks. The management server can therefore find the IP disk corresponding to each Key-Value. The found IP disk is called the target disk; the management server 121 can use the address of the target IP disk as the destination address, generate an IP packet, and send it to the switch cluster 13. The content recorded in the partition view is, for example, as shown in Table 1 below, where i is a natural number and m is a natural number greater than 1.
Figure PCTCN2014090299-appb-000001
Table 1
The switch cluster 13 consists of at least one switch and is used for data exchange between the management servers 121 and the IP disks 141 when data is read or written. The operation of querying the partition view can also be delegated to the switches in the switch cluster 13; that is, the switches in the switch cluster 13 store the partition view. In this scenario, when the management server 121 sends data to the switch cluster 13, it may use, instead of an IP packet, a new type of packet that can be called a Key-Value packet. A Key-Value packet is identical to an IP packet except that its destination address is the partition number computed from the Key, and a packet type field distinguishes IP packets from Key-Value packets. The switch looks up its stored mapping table from partition numbers to IP disk addresses, replaces the destination address with the address of the IP disk corresponding to the partition, and modifies the packet type, thereby converting the Key-Value packet into an IP packet, which is then forwarded to the corresponding IP disk for storage.
In the management server 121, the correspondence between partitions and IP disks can be recorded as the correspondence between partition numbers (also called partition IDs) and IP disk addresses. For ease of understanding, another way to express this correspondence is: a partition belongs to an IP disk; yet another is: an IP disk owns a partition.
A partition is a logical concept; a partition does not own storage space — the data is actually stored on the IP disks. But each partition has a corresponding IP disk. During data storage, the IP disk corresponding to a partition can be learned by querying the partition view, so that actual storage can take place. From the user's point of view, the data seems to be stored into partitions; therefore, the process of storing data onto the IP disk corresponding to a partition is sometimes also called storing data into the partition.
The controller 15 is used to calculate the total number of partitions, obtain IP disk information, establish the correspondence between IP disks and partitions, and promptly update this correspondence to each management server. This correspondence is also called the partition view.
In the prior art, since the total number of partitions is fixed, as the number of IP disks grows, the number of partitions per IP disk decreases in inverse proportion. Hence, when the number of IP disks is small, each IP disk corresponds to too many partitions. Taking the example in the Background: with 25,000 storage disks, 100 partitions per disk is appropriate; but with 75 storage disks, each disk would own 33,333 partitions, which is inappropriate because it causes at least the following problems.
(1) The partition view is overly complex, containing 75 × 33,333 = 2,499,975 correspondences in total, close to the number of correspondences in the 25,000-disk case. When a management server uses this partition view to forward IP packets, it must search the correspondences in the view; having to search such a huge number of correspondences even though there are few IP disks makes forwarding inefficient. Moreover, when publishing the partition view, the controller must publish it to every management server, occupying considerable bandwidth. Too much of the management servers' processing resources and of the storage system's bandwidth resources is therefore consumed.
(2) To enhance data reliability, each partition has replica partitions on other IP disks. A partition and its replica partitions are located on different IP disks but store the same data; this situation is called multi-replica. Suppose partitions B and C are replica partitions of partition A; then when a packet is stored to the IP disk where partition A resides, copies of the packet are simultaneously stored to the IP disk where partition B resides and the IP disk where partition C resides.
In a multi-replica scenario, suppose an IP disk fails; the data on the failed storage disk must then be recovered from the replicas to ensure data reliability. Specifically, the replica partitions of each partition on the failed IP disk are located, the data in the replica partitions is copied, and the copied data is stored to IP disks that have not failed. When each IP disk has too many partitions, those partitions also have many replica partitions, distributed across a large number of other IP disks — so that when one IP disk fails, a large number of other IP disks must participate in data recovery, and the performance of all the participating IP disks is affected during recovery.
(3) In a multi-replica scenario, if each IP disk owns too many partitions, then for a given IP disk the replicas of its partitions are scattered across a large number of IP disks. Because the number of IP disks storing the replicas is large, the probability of multiple simultaneous IP disk failures grows, reducing data reliability.
(4) The embodiments of the present invention introduce the concepts of mother partition and final partition. The total number of final partitions is fixed, analogous to the partitions of the prior art. A mother partition can split into multiple child partitions; if a child partition can itself split into new child partitions, that child partition is the mother partition of the next-level child partitions. As the number of IP disks in the cluster grows, the splitting process can continue until the splits produce final partitions, at which point the number of partitions owned by the whole cluster reaches the total number of final partitions.
It should be noted that in the prior art the partitions seen by the user and the partitions managed internally by the storage system are identical in number: no matter how many IP disks the cluster has, the number of partitions never changes. In the embodiments of the present invention, what the user sees is still the number of final partitions; however, before the mother partitions have split into final partitions, the storage system uses mother partitions for management — that is, both the partition view and packet forwarding are handled with mother partitions. In the embodiments of the present invention, unless otherwise specified, "partition" refers to a partition managed by the storage system.
When the cluster has few IP disks, each IP disk is allocated an appropriate number of mother partitions; the number of mother partitions per IP disk can be specified by the administrator or set automatically by the system and is not limited by the number of final partitions. For example, when the cluster has 75 IP disks, each IP disk corresponds to 100 mother partitions, and the partition view contains only 75 × 100 = 7,500 correspondences. When the cluster has 150 IP disks, each mother partition splits into 2 child partitions, so each IP disk still corresponds to 100 partitions; assuming each partition has one replica, when an IP disk fails, the number of other affected IP disks is limited to 100. In other words, the present invention enables customization of the number of partitions, making the number of partitions owned by each IP disk controllable. Based on the method provided by the embodiments of the present invention, the controller occupies less bandwidth when sending the partition view to the management servers in the management server cluster; and with fewer correspondence entries, a management server looks up the storage disk address corresponding to a Key faster.
When the application server 11 needs to read or write data, the data passes through the management server 121 and the switch cluster 13 in turn to reach an IP disk.
An IP disk can use magnetic disk or flash memory as its storage medium; it provides a Key-Value interface in software and an Ethernet interface in hardware, and it decapsulates Ethernet frames received through the Ethernet interface to obtain IP packets. An example of an IP disk is Seagate's Kinetic product.
In the embodiments of the present invention, the storage devices storing Key-Value data are collectively called storage disks. Besides IP disks, in other implementations the storage disks may be other storage devices that support a Key-Value interface and use non-Ethernet interfaces in hardware. The storage medium used by a storage disk may be a hard disk or flash memory.
FIG. 2 illustrates a partition management method according to an embodiment of the present invention, describing how to increase the number of partitions to meet the needs of newly added IP disks when the IP disk cluster is expanded. The method is executed by the controller and includes the following steps.
Step 21: Determine the total number of final partitions L according to the maximum IP disk capacity of the cluster, where L is a natural number, and determine the number of initial partitions owned by each IP disk according to the initial number of IP disks. This step is a preliminary step executed when the controller is initialized, and it is optional. Each initial partition corresponds to one storage disk, and the partition view records this correspondence.
After this step is executed, the controller records the initial number of IP disks as the current number of IP disks M, and the number of initial partitions per IP disk as the current partition count P; the current total number of partitions in the cluster is T = M · P. The controller also records the final partition total L. Subsequently, if the number of IP disks in the cluster changes, or the number of partitions on each IP disk changes, the current number of IP disks M and the current partition count P are updated. In the embodiments of the present invention, the symbol "·" denotes multiplication.
The number of final partitions is fixed; final partitions cannot split and can be perceived by the user. The initial partitions are usually mother partitions. A mother partition is a partition that can split, producing next-level mother partitions or final partitions; mother partitions are used by the storage system, and the user cannot perceive their existence. In the embodiments of the present invention, the current partitions are the partitions used by the management servers at the current moment, published to the management servers by the controller. If partition splitting has been performed, the current partitions are the post-split partitions. Current partitions can be mother partitions and can also include final partitions.
The value of L can be set by the user or assigned automatically by the system; it is usually determined jointly by the maximum number of IP disks and the number of partitions per IP disk at that maximum: L = maximum number of IP disks × number of partitions per IP disk. Suppose X partitions per IP disk is the preferred value; for example, the user considers 100 partitions per IP disk a reasonable value, i.e. X = 100, and the maximum number of IP disks the IP disk cluster can support is 10,000. Then L = 10,000 × 100 = 1,000,000.
Similarly, the number of initial partitions can be set by the user or assigned automatically by the system. One option: suppose initially the number of IP disks M = 75 and each IP disk is allocated 100 partitions; the total number of initial partitions is then 75 × 100 = 7,500. With the method of the present invention, the number of partitions owned by each IP disk in the storage system can be set freely from the initial stage, to a value that meets user needs without occupying excessive storage resources, computing resources, bandwidth resources, and other resources.
Each initial partition corresponds to final partitions. Each initial partition has an ID that is an integer greater than or equal to 0; each final partition has an ID that is an integer greater than or equal to 0. The method for obtaining the initial partition corresponding to a final partition is: take the final partition ID modulo the total number of initial partitions; the value of the remainder is the ID of the initial partition corresponding to that final partition. At this point, the initial partitions are also the current partitions.
Step 22: When it is detected that N new IP disks are about to join the IP disk cluster, obtain the current number of storage disks M in the cluster and the current total number of partitions T in the cluster, where M, N, and T are all natural numbers.
The controller is connected to the IP disk cluster, so new IP disks preparing to join the cluster can be detected by the controller. There are currently M IP disks, and the current total number of partitions is M · P. The N new IP disks are physically connected to the cluster and can be detected by the controller, but they have not yet been allocated partitions and therefore cannot yet store data.
It should be noted that "current" refers to the moment when this step is about to be executed. In this embodiment, no IP disk has been added since the controller was initialized, so the current number of IP disks is M. In other embodiments, if the number of IP disks in the cluster has changed before this step is executed, the current number of IP disks is not M; if partitions have been split before this step, the number of partitions currently owned by each IP disk is larger than P. The T partitions are roughly evenly distributed across the M IP disks.
Step 23: Determine whether the current number of partitions meets the combined needs of the existing IP disks and the newly added IP disks — that is, determine whether the mathematical relationship between the current total number of partitions T and the total number of storage disks M+N satisfies the first predetermined condition. Specifically, the determination can be made by comparing the formula M·P/(M+N) with a first threshold, where the first threshold is a natural number and can be preset by the controller. The first time this step is executed after the controller is initialized, T = M·P. Optionally, the first threshold can be an integer greater than 10 and less than 20, for example 16, 17, 18, or 19.
One determination method is: if T/(M+N) is less than the first threshold, the first predetermined condition is satisfied and splitting is needed. The average number of partitions per IP disk being less than (or, alternatively, less than or equal to) the split threshold means that without splitting each IP disk would own too few partitions, so it is necessary to increase the total number of partitions by splitting; in that case, execute step 24, otherwise execute step 25.
Another determination method is: if, after one round of partition splitting, the average number of partitions per IP disk would be greater than (or, alternatively, greater than or equal to) some threshold, this means splitting would leave each IP disk with too many partitions; in that case, execute step 25, otherwise execute step 24. The two determination methods can also be combined, selecting whichever solution best satisfies the service. Subsequent embodiments use the first determination method as an example to describe the present invention.
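The first determination method (split when T/(M+N) falls below the first threshold) can be sketched as follows. The threshold value 16 is only an assumed example from the stated 10-20 range.

```python
def needs_split(current_total: int, disks_now: int, new_disks: int,
                first_threshold: int = 16) -> bool:
    # First predetermined condition: split when the average number of
    # partitions per disk, counting the N new disks, is less than the
    # first threshold (16 is an assumed example value in the 10-20 range).
    return current_total / (disks_now + new_disks) < first_threshold
```

For example, 7,500 partitions across 75 + 425 disks average 15 per disk and trigger a split, while 7,500 partitions across 75 + 75 disks average 50 per disk and do not.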
Step 24: Split at least one current partition — once, or multiple times — until the number of partitions after splitting meets the requirement, and then execute step 26. The post-split number of partitions meeting the requirement may mean that the mathematical relationship between the post-split total number of partitions S and the total number of storage disks M+N satisfies the second predetermined condition. The post-split total number of partitions is not greater than the number of final partitions L. Splitting multiple times means splitting the partitions produced by a split several more times.
The mathematical relationship between the post-split total S and the disk total M+N satisfying the second predetermined condition may specifically be: S/(M+N) is greater than or equal to a second threshold, where the second threshold is a natural number — for example, a natural number greater than 25 and less than 50, such as 26, 27, 48, or 49.
There can be multiple ways to judge whether the post-split partition count meets the requirement. For example, splitting stops when the mathematical relationship between S and M+N satisfies the second predetermined condition — specifically, if after splitting the average number of partitions per IP disk is greater than or equal to the second threshold, the second predetermined condition is considered satisfied and splitting stops. Alternatively, the requirement is considered met when the post-split average number of partitions per IP disk falls within a preset threshold range.
In each split, every partition splits into multiple child partitions, and every partition splits into the same number of child partitions. The multiple by which the number of partitions changes, from before the first split until splitting ends, is called the split factor. Suppose in each split every mother partition splits into 2 child partitions: if splitting ends after 1 split, the split factor is 2; if splitting ends after 2 splits, the split factor is 2² = 4.
If splitting at least one of the current partitions means splitting all current partitions, then the post-split partition total is S = T × split factor. If it means splitting only some of the current partitions, then the post-split partition total is S < T × split factor. The former splitting approach splits faster; the latter makes the post-split partition total more adjustable. For ease of description, the embodiments of the present invention are described using the former splitting approach.
An example of the latter splitting method: suppose the total number of final partitions is 1000, the current partition total is 512, and the split factor is 2. If every partition were split, 1024 partitions would result, exceeding the total number of final partitions — and a current partition total greater than the final partition total is not allowed. To avoid this, only 488 of the current partitions can be split: 488 + 512 = 1000, i.e. the post-split current partition total reaches exactly 1000 without exceeding the final partition total. Furthermore, if the user considers 1000 current partitions too many — for example, the user considers a post-split total of 800 most suitable — then 288 of the 512 current partitions can be split with split factor 2, so that the post-split current partition total reaches exactly 800. With this latter splitting method, it is even possible to split only one partition at a time; if the split factor is 2, the difference between the pre-split and post-split totals is 1, giving partition splitting the finest granularity.
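The selective-split arithmetic in the example above can be sketched as follows. `partitions_to_split` is a hypothetical helper; it rests on the observation that splitting one partition into k children adds k-1 partitions to the total.

```python
def partitions_to_split(current_total: int, target_total: int, split_factor: int) -> int:
    # Splitting one partition into `split_factor` children adds
    # (split_factor - 1) partitions, so reaching target_total requires
    # (target_total - current_total) / (split_factor - 1) splits.
    growth_per_split = split_factor - 1
    delta = target_total - current_total
    if delta % growth_per_split:
        raise ValueError("target total not reachable with this split factor")
    return delta // growth_per_split
```

This reproduces the figures from the example: growing 512 partitions to 1000 with split factor 2 requires splitting 488 of them, and growing to 800 requires splitting 288.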
Moreover, building on the above embodiments, the split factor can also be changed — for example, using split factor 2 in the first split but changing it to 5 in the second. This also makes the total number of partitions obtained by splitting easier to tune.
The current partitions and the final partitions have a correspondence, which can be stored in the controller and published by the controller to each management server. Each current partition has an ID, which can be an integer greater than or equal to 0; there are T current partitions in total, and their IDs form an arithmetic progression with first term 0, common difference 1, and T terms. Each post-split partition has an ID, which can be an integer greater than or equal to 0; there are S post-split partitions, and their IDs form an arithmetic progression with first term 0, common difference 1, and S terms. For example, 12 partitions split into 24 partitions: the partition IDs before the split are 0, 1, 2, 3, ..., 9, 10, 11, and the partition IDs after the split are 0, 1, 2, 3, ..., 21, 22, 23.
The partition ID generation rule after a split can be as follows: among the post-split partitions, one partition keeps the original partition ID, and the IDs of the remaining partitions together with the original partition ID form an arithmetic progression whose terms increase and whose common difference is the total number of partitions before the split. For example, there are 200 partitions in total before splitting and each partition splits into 3; the IDs of the 3 partitions generated by splitting the partition whose ID is 21 are, in order: 21, 221, and 421. This ID generation rule can be changed, as long as after the whole splitting process ends, the IDs of the current partitions still form an increasing arithmetic progression with first term 0 and common difference 1.
For example, under another partition ID generation rule: the 3 partitions split from the partition with ID 0 can have IDs 0, 201, and 202; the partition with ID 1 splits into partitions with IDs 1, 203, and 204; the partition with ID 2 splits into partitions with IDs 2, 205, and 206; the IDs of the remaining partitions follow by analogy.
It should be noted that when the current partition total S is about to reach the final partition total L, a situation may arise in which splitting every partition once would produce a partition total greater than the final partition total L. In that case, only some of the partitions can be split, or the split factor can be reduced, so that the partition total is increased by splitting without exceeding the final partition total L.
Step 25: Perform partition migration — migrate some of the partitions owned by the original M IP disks to the N newly added IP disks, so that the M·P partitions are evenly distributed across the M+N IP disks. After step 25 is executed, the total number of partitions in the whole system is unchanged, and the average number of partitions per IP disk decreases. Step 24 and step 25 are alternatives: in a complete partition management method embodiment, after step 24 has been executed, step 25 is not executed.
Step 26: Update the current total number of IP disks recorded in the controller to M+N, and the current partition total to S. This step can also be executed together with step 24. The current number of partitions per IP disk is roughly S/(M+N); therefore, instead of recording the total partition count S, it is also possible to record that the current number of partitions per IP disk in the cluster is roughly S/(M+N).
Step 26 prepares for the next split, and is therefore not a necessary step for the present partition management operation.
It should be noted that the S current partitions are allocated to the M+N IP disks. An operation of updating the partition view can also be executed; the partition view records the IP disk corresponding to each current partition — specifically, the correspondence between current partition IDs and IP disk addresses. The partition view update can be executed in step 24 or in step 26. Subsequent data processing methods can use the partition view updated in this step. In fact, a partition can be used only after the correspondence between the current partitions and the IP disks has been correctly recorded; therefore, in other embodiments of the present invention, whenever the current partitions change, the partition view update operation needs to be executed.
The partition management method above can be executed by the hardware shown in FIG. 3. In FIG. 3, the controller 3 includes an interface 31, a processor 32, and a storage medium 33.
The interface 31 provides external interfaces, for example connecting the storage disk cluster and the management servers. The storage medium 33 stores computer program code. The processor 32 executes the partition management method above by running the program code in the storage medium 33.
Referring to FIG. 4, an embodiment of the present invention further provides a partition management apparatus 4, which may be hardware, or virtual hardware formed by software. The partition management apparatus 4 can execute the partition management method above. The partition management apparatus 4 includes a storage disk detection module 41, a first predetermined condition determination module 42, and a partition splitting module 43. Optionally, the partition management apparatus 4 may further include an initialization module 40. Optionally, the partition management apparatus 4 may further include an update module 44.
The initialization module 40 is configured to determine the total number of final partitions L according to the maximum IP disk capacity of the cluster, where L is a natural number, and to determine the number of initial partitions owned by each IP disk according to the initial number of IP disks. This is a preliminary operation executed only when the controller is initialized and is therefore optional.
The update module 44 can record the initial number of IP disks as the current number of IP disks M, and the number of initial partitions per IP disk as the current partition count P; the current total number of partitions in the cluster is T = M · P. The update module 44 also records the final partition total L. Subsequently, if the number of IP disks in the cluster changes, or the number of partitions on each IP disk changes, the current number of IP disks M and the current partition count P are updated. In the embodiments of the present invention, the symbol "·" denotes multiplication.
The number of final partitions is fixed; final partitions cannot split and can be perceived by the user. The initial partitions are usually mother partitions. A mother partition is a partition that can split, producing next-level mother partitions or final partitions; mother partitions are used by the storage system, and the user cannot perceive their existence.
The value of L can be set by the user or assigned automatically by the system; it is usually determined jointly by the maximum number of IP disks and the number of partitions per IP disk at that maximum: L = maximum number of IP disks × number of partitions per IP disk. Suppose X partitions per IP disk is the preferred value; for example, the user considers 100 partitions per IP disk a reasonable value, i.e. X = 100, and the maximum number of IP disks the IP disk cluster can support is 10,000. Then L = 10,000 × 100 = 1,000,000.
Similarly, the number of initial partitions can be set by the user or assigned automatically by the system. One option: suppose initially the number of IP disks M = 75 and each IP disk is allocated 100 partitions; the total number of initial partitions is then 75 × 100 = 7,500. With the method of the present invention, the number of partitions owned by each IP disk in the storage system can be set freely from the initial stage, to a value that meets user needs without occupying excessive storage resources, computing resources, bandwidth resources, and other resources.
The storage disk detection module 41 is configured to, when it is detected that N new storage disks are about to join the cluster, obtain the current number of storage disks M in the cluster and the current total number of partitions T in the cluster, where M, N, and T are all natural numbers.
The partition management apparatus 4 is connected to the IP disk cluster, so IP disks newly joining the cluster can be detected by the storage disk detection module 41. There are currently M IP disks, and the current total number of partitions is M · P. The N new IP disks are physically connected to the cluster and can be detected by the storage disk detection module 41, but they have not yet been allocated partitions and therefore cannot yet store data.
It should be noted that "current" refers to the moment when an operation is about to be executed. In this embodiment, no IP disk has been added since the partition management apparatus 4 was initialized, so the current number of IP disks is M. In other embodiments, if the number of IP disks in the cluster has changed beforehand, the current number of IP disks is not M; if partitions have been split beforehand, the number of partitions currently owned by each IP disk is larger than P. The T partitions are roughly evenly distributed across the M IP disks.
The first predetermined condition determination module 42 is configured to determine whether the mathematical relationship between the current total number of partitions T and the total number of storage disks M+N satisfies the first predetermined condition.
That is, it determines whether the current number of partitions meets the combined needs of the existing IP disks and the newly added IP disks — whether the relationship between the current partition total T and the disk total M+N satisfies the first predetermined condition — which can be done by comparing the formula T/(M+N) with the first threshold. The split threshold can be preset by the controller. The first time this operation is executed after the controller is initialized, T = M·P. The first threshold can be an integer greater than 10 and less than 20, for example 16, 17, 18, or 19.
One determination method is: if T/(M+N) is less than the first threshold, the first predetermined condition is satisfied and splitting is needed. The average number of partitions per IP disk being less than (or, alternatively, less than or equal to) the split threshold means that without splitting each IP disk would own too few partitions; in that case, execute step 24, otherwise execute step 25.
Another determination method is: if, after one round of partition splitting, the average number of partitions per IP disk would be greater than (or, alternatively, greater than or equal to) some threshold, this means splitting would leave each IP disk with too many partitions; in that case, execute step 25, otherwise execute step 24. The two determination methods can also be combined, selecting whichever solution best satisfies the service. Subsequent embodiments use the first determination method as an example to describe the present invention.
Partition splitting module 43: if the determination by the first predetermined condition determination module 42 concludes that the first predetermined condition is satisfied, the partition splitting module 43 is configured to split at least one of the current partitions so that the post-split partition total is S, and to allocate the split partitions to the M+N storage disks. The mathematical relationship between the post-split partition total S and the disk total M+N satisfies the second predetermined condition, and the post-split partition total is not greater than the final partition total L supported by the cluster, where L and S are both natural numbers greater than 1. Allocating the split partitions to the M+N storage disks can be an even allocation, or as even as possible when an exactly even allocation is not achievable.
Splitting at least one current partition can be done once, or multiple times, until the post-split partition count meets the requirement. The relationship between S and M+N satisfying the second predetermined condition may specifically be: S/(M+N) is greater than or equal to the second threshold, where the second threshold is a natural number — for example, a natural number greater than 25 and less than 50, such as 47, 48, or 49.
There can be multiple ways to judge whether the post-split partition count meets the requirement. For example, splitting stops when the relationship between S and M+N satisfies the second predetermined condition — specifically, if after splitting the average number of partitions per IP disk is greater than or equal to the second threshold, the second predetermined condition is considered satisfied and splitting stops. Alternatively, the requirement is considered met when the post-split average number of partitions per IP disk falls within a preset threshold range.
In each split, every partition splits into multiple child partitions, and every partition splits into the same number of child partitions. The multiple by which the number of partitions changes, from before the first split until splitting ends, is called the split factor. Suppose in each split every mother partition splits into 2 child partitions: if splitting ends after 1 split, the split factor is 2; if splitting ends after 2 splits, the split factor is 2² = 4.
If splitting at least one of the current partitions means splitting all current partitions, then the post-split partition total is S = T × split factor. If it means splitting only some of the current partitions, then the post-split partition total is S < T × split factor. The former splitting approach splits faster; the latter makes the post-split partition total more adjustable. For details, see the description of the partition management method embodiment. For ease of description, the embodiments of the present invention are described using the former splitting approach.
An example of the latter splitting method: suppose the total number of final partitions is 1000, the current partition total is 512, and the split factor is 2. If every partition were split, 1024 partitions would result, exceeding the total number of final partitions — and a current partition total greater than the final partition total is not allowed. To avoid this, only 488 of the current partitions can be split: 488 + 512 = 1000, i.e. the post-split current partition total reaches exactly 1000 without exceeding the final partition total. Furthermore, if the user considers 1000 current partitions too many — for example, the user considers a post-split total of 800 most suitable — then 288 of the 512 current partitions can be split with split factor 2, so that the post-split current partition total reaches exactly 800. With this latter splitting method, it is even possible to split only one partition at a time; if the split factor is 2, the difference between the pre-split and post-split totals is 1, giving partition splitting the finest granularity.
Moreover, building on the above embodiments, the split factor can also be changed — for example, using split factor 2 in the first split but changing it to 5 in the second — which also makes the post-split partition total easier to tune. The current partitions and the final partitions have a correspondence; this correspondence can be stored by the update module 44 and can also be published by the update module 44 to the data processing apparatuses. Each current partition has an ID, which can be an integer greater than or equal to 0; the IDs of all current partitions form an arithmetic progression with first term 0 and common difference 1. Each final partition has an ID, which can be an integer greater than or equal to 0; the IDs of all final partitions form an arithmetic progression with first term 0 and common difference 1. For example, 12 partitions split into 24 partitions: the partition IDs before the split are 0, 1, 2, 3, ..., 9, 10, 11, and the partition IDs after the split are 0, 1, 2, 3, ..., 21, 22, 23.
The partition ID generation rule after a split can be as follows: among the post-split partitions, one partition keeps the original partition ID, and the IDs of the remaining partitions together with the original partition ID form an arithmetic progression whose terms increase and whose common difference is the total number of partitions before the split. For example, there are 200 partitions in total before splitting and each partition splits into 3; the IDs of the 3 partitions generated by splitting the partition whose ID is 21 are, in order: 21, 221, and 421. This ID generation rule can be changed, as long as after the whole splitting process ends, the IDs of the current partitions still form an increasing arithmetic progression with first term 0 and common difference 1.
It should be noted that when the current partition total S is about to reach the final partition total L, a situation may arise in which splitting every partition once would produce a partition total greater than the final partition total L. In that case, only some of the partitions can be split, or the split factor can be reduced, so that the partition total is increased by splitting without exceeding the final partition total L.
The update module 44 is configured to, after the operation of the partition splitting module 43 is completed, update the current IP disk total recorded in the partition management apparatus 4 to M+N, and the current partition total to S. The current number of partitions per IP disk is roughly S/(M+N); therefore, instead of recording the total partition count S, it is also possible to record that the current number of partitions per IP disk in the cluster is roughly S/(M+N).
Since the operation of the update module 44 prepares for the next split, it is not a necessary module for the present partition management operation.
Optionally, the partition splitting module 43 or the update module 44 can also perform the partition view update operation; the partition view records the IP disk corresponding to each current partition, for example the correspondence between a current partition's ID and the IP disk address of the corresponding IP disk. That is, the partition view records, for each of the current S partitions, which of the M+N IP disks the partition corresponds to. Subsequent data processing apparatuses can use the updated partition view.
Optionally, a partition migration module (not shown) may also be included. If partition splitting is not performed, the partition migration module can be used to perform partition migration, migrating some of the partitions owned by the original M IP disks to the N newly added IP disks, so that the M·P partitions are evenly distributed across the M+N IP disks.
Referring to FIG. 5, the present invention further provides a data processing method applied in a partition management device. The partition management device stores a partition view, which records the correspondence between current partition IDs and storage disk (for example, IP disk) addresses. The data processing method is executed after the partition management method and builds on it, but the two are relatively independent. The partition management device is connected to the controller. The partition management device is, for example, a management server, or a switch cluster; a management server is used as the example below. The data processing method embodiment can be executed based on the partition view provided by the partition management method described above; the partition view is generated by the controller and sent to each partition management device in the partition management device cluster for storage.
Step 51: Generate Key-Value data from the data to be written. For example, split the data to be written into a set of Values and generate a Key for each Value to form Key-Value data; a Key-Value is the combination of a Key and the Value corresponding to that Key. Since one piece of data to be written can be split into multiple Values, multiple Key-Values are generated correspondingly; for ease of description, the subsequent steps describe the processing of only one specific Key-Value.
The data to be written comes from the application server and is, for example, a file or a data stream. If the size of the data to be written is large, the management server can split the data for convenient storage — for example, into equal-sized data slices of 1 MB, each slice being called a Value. A Key uniquely marks a Value, so the Keys of different Values differ; for example, "data file name + sequence number" can be used as the Key of a Value. Data of small size need not be split: its Key can be generated directly to form the Key-Value data. In some special scenarios, large data also need not be split and can directly form the corresponding Key-Value data, which is then sent to an IP disk for storage.
Step 52: Obtain the Key in the Key-Value data, and calculate, according to the Key, the final partition ID corresponding to the Key-Value data. As noted above, the Key-Value data includes a Value and a Key uniquely corresponding to the Value.
One method of calculating the final partition ID is: perform a hash operation on the Key to obtain the Key's hash value, take the hash value modulo the final partition total L, and use the remainder as the final partition ID, where L is a natural number greater than or equal to 2. The final partition ID obtained this way is a number; in another embodiment, an equivalent transformation is to map this number to another token, for example an alphabetic label, and use that label as the final partition ID. Besides final partitions, initial partitions and current partitions can also be represented by alphabetic labels. When calculating the final partition corresponding to a Key, or the correspondence between current partitions and final partitions, the alphabetic label can be remapped to a number, and the mapped number is counted using the "modulo" method just like a numeric partition ID. In this algorithm, the concept of the final partition can be found in the description of the partition management method embodiment above.
Step 53: Calculate the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to multiple final partition IDs.
One algorithm for calculating the current partition of a final partition is: take the final partition ID modulo the current partition total T, and use the remainder as the current partition ID, where the current partition total T is a natural number. Likewise, the current partition ID is a number, and an equivalent transformation maps this number to another token used as the current partition ID. The current partitions are the partitions owned by the IP disk cluster at the current moment; each current partition corresponds to one IP disk. See the description of current partitions in the partition management method embodiment above. Final partitions are child partitions of current partitions.
It can be seen that a current partition ID corresponds to multiple final partition IDs; for the correspondence, refer to the partition management method embodiment. The correspondence between current partitions and final partitions can be stored in the controller and read when step 53 is executed; alternatively, the correspondence need not be pre-stored and can instead be computed by the algorithm when step 53 is executed. Each current partition has an ID, which can be an integer greater than or equal to 0; the set of all current partition IDs forms an arithmetic progression with first term 0 and common difference 1. Each final partition has an ID, which can be an integer greater than or equal to 0; the set of all final partition IDs forms an arithmetic progression with first term 0 and common difference 1. The method for obtaining the current partition corresponding to a final partition is: take the final partition ID modulo the current partition total; the value of the remainder is the current partition ID corresponding to that final partition.
Moreover, after a partition split, the ID generation rule can be as follows: among the post-split partitions, one partition keeps the original partition ID, and the IDs of the remaining partitions together with the original partition ID form an arithmetic progression whose terms increase and whose common difference is the pre-split partition total. For example, there are 200 partitions in total before splitting and each partition splits into 3; the 3 partitions generated by splitting the partition whose ID is 21 have IDs 21, 221, and 421, in order. This ID generation rule can be changed, as long as after the whole splitting process ends, the current partition IDs still form an increasing arithmetic progression with first term 0 and common difference 1. For example, under another rule: splitting the partition with ID 0 can yield partitions with IDs 0, 201, and 202; the partition with ID 1 splits into partitions with IDs 1, 203, and 204; the partition with ID 2 splits into partitions with IDs 2, 205, and 206; the IDs of the remaining partitions follow by analogy.
Step 54: Query the partition view to obtain the storage disk address corresponding to the current partition ID.
The partition management device stores a partition view, which records the correspondence between current partition IDs and storage disk addresses. If the storage disk is an IP disk, the storage disk address can be an IP address. If the storage disk is based on another type of protocol, such as ATM or IPX, the storage disk address is an ATM address or an IPX address.
Step 55: Generate a Key-Value packet with the storage disk address as the destination address, and send the Key-Value packet to the storage disk, the payload of the Key-Value packet carrying the Key-Value data.
After receiving the Key-Value packet, the storage disk stores the Key-Value data.
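Steps 52 through 54 can be sketched end to end as follows. SHA-1 and the sample addresses are assumptions made for illustration; the embodiment does not fix a particular hash function, and the partition view here is modeled as a plain dictionary from current partition IDs to disk addresses.

```python
import hashlib

def route_key_value(key: bytes, total_final: int, total_current: int,
                    partition_view: dict) -> str:
    # Step 52: Key -> final partition ID (hash the Key, take it mod L).
    final_id = int.from_bytes(hashlib.sha1(key).digest(), "big") % total_final
    # Step 53: final partition ID -> current partition ID (mod T).
    current_id = final_id % total_current
    # Step 54: look up the storage disk address in the partition view.
    return partition_view[current_id]
```

The returned address then serves as the destination address of the Key-Value packet in step 55.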
FIG. 6 shows a data processing device embodiment of the present invention. The data processing device includes: a memory 61 configured to store a partition view, the partition view recording the correspondence between current partition IDs and storage disk addresses; an interface 62 configured to provide an external connection; a computer readable medium 63 configured to store a computer program; and a processor 64, connected to the memory 61, the interface 62, and the computer readable medium 63, configured to execute the data processing method above by running the program, for example including the following steps.
Obtain the Key in the key-value (Key-Value) data, and calculate, according to the Key, the final partition ID corresponding to the Key-Value data, where the Key-Value data includes a Value and a Key uniquely corresponding to the Value; calculate the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to multiple final partition IDs; query the partition view to obtain the storage disk address corresponding to the current partition ID; and generate a Key-Value packet with the storage disk address as the destination address and send the Key-Value packet to the storage disk, the Key-Value packet carrying the Key-Value data.
FIG. 7 is a drawing of a data processing apparatus 7 according to an embodiment of the present invention. The apparatus includes a storage module 71, a final partition calculation module 72, a current partition calculation module 73, a query module 74, and a sending module 75. Optionally, a Key-Value data generating module 76 may also be included.
The storage module 71 is configured to store a partition view, the partition view recording the correspondence between current partition IDs and storage disk addresses. The storage medium used by the storage module 71 may be flash memory or a hard disk.
The partition view in the storage module 71 comes from a partition management apparatus, such as the partition management apparatus 4 in FIG. 4. The storage module 71 can be connected to the partition management apparatus 4 in order to receive the partition view.
The final partition calculation module 72 is configured to obtain the Key in the key-value (Key-Value) data and calculate, according to the Key, the final partition ID corresponding to the Key-Value data, where the Key-Value data includes a Value and a Key uniquely corresponding to the Value.
One method by which the final partition calculation module 72 calculates the final partition ID is: perform a hash operation on the Key to obtain the Key's hash value, take the hash value modulo the final partition total L, and use the remainder as the final partition ID, where L is a natural number greater than or equal to 2. The final partition ID obtained this way is a number; in another embodiment, an equivalent transformation is to map this number to another token, for example an alphabetic label, and use that label as the final partition ID. Besides final partitions, initial partitions and current partitions can also be represented by alphabetic labels. When calculating the final partition corresponding to a Key, or the correspondence between current partitions and final partitions, the alphabetic label can be remapped to a number, and the mapped number is counted using the "modulo" method just like a numeric partition ID. For the concept of the final partition, see the description in the partition management method embodiment above.
The current partition calculation module 73 is configured to calculate the current partition ID corresponding to the final partition ID, where each current partition ID corresponds to multiple final partition IDs.
One algorithm by which the current partition calculation module 73 calculates the current partition of a final partition is: take the final partition ID modulo the current partition total T, and use the remainder as the current partition ID, where the current partition total T is a natural number. Likewise, the current partition ID is a number, and an equivalent transformation maps this number to another token used as the current partition ID. The current partitions are the partitions owned by the IP disk cluster at the current moment; each current partition corresponds to one IP disk. See the description of current partitions in the partition management method embodiment above. Final partitions are child partitions of current partitions.
It can be seen that a current partition ID corresponds to multiple final partition IDs; for the correspondence, refer to the partition management apparatus embodiment — the partition management apparatus generates the correspondence and then publishes it to each data processing apparatus. The correspondence between current partitions and final partitions can be stored in the current partition calculation module 73; alternatively, the correspondence need not be pre-stored and can instead be computed by the current partition calculation module 73. Each current partition has an ID, which can be an integer greater than or equal to 0; the set of all current partition IDs can form an increasing arithmetic progression with first term 0 and common difference 1. Each final partition has an ID, which can be an integer greater than or equal to 0; the set of all final partition IDs forms an increasing arithmetic progression with first term 0 and common difference 1. The method for obtaining the current partition corresponding to a final partition is: take the final partition ID modulo the current partition total; the value of the remainder is the current partition ID corresponding to that final partition. For example, 12 partitions split into 24 partitions: the partition IDs before the split are 0, 1, 2, 3, ..., 9, 10, 11, and the partition IDs after the split are 0, 1, 2, 3, ..., 21, 22, 23.
Moreover, after a partition split, the ID generation rule can be as follows: among the post-split partitions, one partition keeps the original partition ID, and the IDs of the remaining partitions together with the original partition ID form an arithmetic progression whose terms increase and whose common difference is the pre-split partition total. For example, there are 200 partitions in total before splitting and each partition splits into 3; the 3 partitions generated by splitting the partition whose ID is 21 have IDs 21, 221, and 421, in order. This ID generation rule can be changed, as long as after the whole splitting process ends, the current partition IDs still form an increasing arithmetic progression with first term 0 and common difference 1. For example, under another rule: splitting the partition with ID 0 can yield partitions with IDs 0, 201, and 202; the partition with ID 1 splits into partitions with IDs 1, 203, and 204; the partition with ID 2 splits into partitions with IDs 2, 205, and 206; the IDs of the remaining partitions follow by analogy.
The query module 74 is configured to query the partition view stored by the storage module 71 to obtain the storage disk address corresponding to the current partition ID.
The partition view records the correspondence between current partition IDs and storage disk addresses. If the storage disk is an IP disk, the storage disk address can be an IP address. If the storage disk is based on another type of protocol, such as ATM or IPX, the storage disk address is an ATM address or an IPX address.
The sending module 75 is configured to generate a Key-Value packet with the storage disk address as the destination address and send the Key-Value packet to the storage disk, the Key-Value packet carrying the Key-Value data.
Next, as the destination device of the Key-Value packet, the storage disk receives the Key-Value packet through the switch cluster and then stores the Key-Value data.
The Key-Value data generating module 76 is configured to generate Key-Value data — for example, splitting the data to be written into a set of Values and generating the Key of each Value to form Key-Value data, a Key-Value being the combination of a Key and the Value corresponding to that Key. Since one piece of data to be written can be split into multiple Values, multiple Key-Values are generated correspondingly; for ease of description, the embodiment of the present invention describes only the processing of one specific Key-Value.
The data to be written comes from the application server and is, for example, a file or a data stream. If the size of the data to be written is large, the management server can split the data for convenient storage — for example, into equal-sized data slices of 1 MB, each slice being called a Value. A Key uniquely marks a Value, so the Keys of different Values differ; for example, "data file name + sequence number" can be used as the Key of a Value. Data of small size need not be split and can be used directly as a Value to generate Key-Value data. In some special scenarios, large data also need not be split and can be used directly as a Value to generate Key-Value data.
If the data processing device 7 has the Key-Value data generating module 76, the final partition calculation module 72 can be connected to the Key-Value data generating module 76; if the data processing device 7 does not have the Key-Value data generating module 76, the final partition calculation module 72 can obtain the Key-Value data directly from an application server through the external interface.
Aspects of the present invention, or possible implementations of the aspects, may be embodied as a system, a method, or a computer program product. Accordingly, aspects of the present invention, or possible implementations of the aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and so on), or an embodiment combining software and hardware aspects, all generally referred to herein as a "circuit," "module," or "system." Furthermore, aspects of the present invention, or possible implementations of the aspects, may take the form of a computer program product, which refers to computer readable program code stored in a computer readable medium.
A computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or apparatus, or any suitable combination of the foregoing, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, or a portable compact disc read-only memory (CD-ROM).
A processor in a computer reads the computer readable program code stored in a computer readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, in the flowcharts, and an apparatus implementing the functional actions specified in each block, or combination of blocks, of the block diagrams is generated.

Claims (32)

  1. A data processing method, applied in a partition management device, wherein the partition management device stores a partition view, the partition view recording a correspondence between current partition IDs and storage disk addresses, the method comprising:
    obtaining the Key in key-value (Key-Value) data, and calculating, according to the Key, a final partition ID corresponding to the Key-Value data, wherein the Key-Value data comprises a Value and a Key uniquely corresponding to the Value;
    calculating a current partition ID corresponding to the final partition ID, wherein each current partition ID corresponds to multiple final partition IDs;
    querying the partition view to obtain a storage disk address corresponding to the current partition ID; and
    generating a Key-Value packet with the storage disk address as the destination address, and sending the Key-Value packet to the storage disk, the Key-Value packet carrying the Key-Value data.
  2. The data processing method according to claim 1, wherein the partition management device further records a total number of final partitions L, and calculating the final partition ID corresponding to the Key-Value data according to the Key specifically comprises:
    performing a hash operation on the Key to obtain a hash value of the Key, taking the hash value modulo the total number of final partitions L, and using the remainder as the final partition ID.
  3. The data processing method according to claim 1 or 2, wherein the partition management device further records a current total number of partitions T, and calculating the current partition ID corresponding to the final partition ID specifically comprises:
    taking the final partition ID modulo the current total number of partitions T, and using the remainder as the current partition ID, wherein all final partition IDs are integers greater than or equal to 0.
  4. The data processing method according to claim 1, wherein:
    the final partitions are child partitions of the current partitions, obtained by splitting the current partitions.
  5. The data processing method according to claim 1, wherein the data processing method is specifically a data writing method, and before obtaining the Key in the Key-Value data, the method further comprises:
    splitting the data to be written into a Value set containing the Value, and generating the Key of the Value to form the Key-Value data.
  6. A data processing apparatus, comprising:
    a storage module, configured to store a partition view, the partition view recording a correspondence between current partition IDs and storage disk addresses;
    a final partition calculation module, configured to obtain the Key in key-value (Key-Value) data, and calculate, according to the Key, a final partition ID corresponding to the Key-Value data, wherein the Key-Value data comprises a Value and a Key uniquely corresponding to the Value;
    a current partition calculation module, configured to calculate a current partition ID corresponding to the final partition ID, wherein each current partition ID corresponds to multiple final partition IDs;
    a query module, configured to query the partition view stored by the storage module to obtain a storage disk address corresponding to the current partition ID; and
    a sending module, configured to generate a Key-Value packet with the storage disk address as the destination address and send the Key-Value packet to the storage disk, the Key-Value packet carrying the Key-Value data.
  7. The data processing apparatus according to claim 6, wherein the storage module is further configured to record a total number of final partitions L, and the final partition calculation module is specifically configured to:
    perform a hash operation on the Key to obtain a hash value of the Key, take the hash value modulo the total number of final partitions L, and use the remainder as the final partition ID.
  8. The data processing apparatus according to claim 6 or 7, wherein the storage module is further configured to record a current total number of partitions T, and the current partition calculation module is specifically configured to:
    take the final partition ID modulo the current total number of partitions T and use the remainder as the current partition ID, wherein the final partition ID is an integer greater than or equal to 0.
  9. The data processing apparatus according to claim 6, wherein:
    the final partitions are child partitions of the current partitions, obtained by splitting the current partitions.
  10. The data processing apparatus according to claim 6, further comprising: a Key-Value data generating module, configured to split the data to be written into a Value set containing the Value, and generate the Key of the Value to form the Key-Value data.
  11. A data processing device, comprising:
    a memory, configured to store a partition view, the partition view recording a correspondence between current partition IDs and storage disk addresses;
    an interface, configured to provide an external connection;
    a computer readable medium, configured to store a computer program; and
    a processor, connected to the memory, the interface, and the computer readable medium, configured to perform the following steps by running the program:
    obtaining the Key in key-value (Key-Value) data, and calculating, according to the Key, a final partition ID corresponding to the Key-Value data, wherein the Key-Value data comprises a Value and a Key uniquely corresponding to the Value;
    calculating a current partition ID corresponding to the final partition ID, wherein each current partition ID corresponds to multiple final partition IDs;
    querying the partition view to obtain a storage disk address corresponding to the current partition ID; and
    generating a Key-Value packet with the storage disk address as the destination address, and sending the Key-Value packet from the interface to the storage disk, the Key-Value packet carrying the Key-Value data.
  12. The data processing device according to claim 11, wherein the memory is further configured to record a total number of final partitions L, and calculating the final partition ID corresponding to the Key-Value data according to the Key specifically comprises:
    performing a hash operation on the Key to obtain a hash value of the Key, taking the hash value modulo the total number of final partitions L, and using the remainder as the final partition ID.
  13. The data processing device according to claim 11 or 12, wherein the memory is further configured to record a current total number of partitions T, and calculating the current partition ID corresponding to the final partition ID specifically comprises:
    taking the final partition ID modulo the current total number of partitions T and using the remainder as the current partition ID, wherein the final partition ID is an integer greater than or equal to 0.
  14. The data processing device according to claim 11, wherein:
    the final partitions are child partitions of the current partitions, obtained by splitting the current partitions.
  15. The data processing device according to claim 12, wherein before obtaining the Key in the Key-Value data, the steps further comprise:
    splitting the data to be written into a Value set containing the Value, and generating the Key of the Value to form the Key-Value data.
  16. A partition management method, performed by a controller that performs partition management on storage disks in a cluster, wherein the cluster comprises multiple storage disks, the method comprising:
    when it is detected that N new storage disks are to join the cluster, obtaining a current number M of storage disks in the cluster and a current total number T of partitions in the cluster, wherein M, N, and T are all natural numbers;
    determining whether a mathematical relationship between the current total number T of partitions and the total number M+N of storage disks satisfies a first predetermined condition; and
    if the first predetermined condition is satisfied, splitting at least one current partition so that a total number of partitions after splitting is S, S>T, and allocating the partitions after splitting to the M+N storage disks, wherein a mathematical relationship between the total number S of partitions after splitting and the total number M+N of storage disks satisfies a second predetermined condition, the total number of partitions after splitting is not greater than a total number L of final partitions supported by the cluster, and L and S are both natural numbers greater than 1.
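The decision in claim 16 can be sketched numerically. The function name, its parameters, and the threshold values below are ours; the thresholds are illustrative picks from the ranges recited in claims 18 and 20.

```python
def plan_split(M: int, N: int, T: int, L: int,
               first_threshold: int = 16, second_threshold: int = 32) -> int:
    """Return the total number of partitions after N disks join M disks.

    first_threshold and second_threshold are illustrative values inside
    the claimed ranges (10 < first < 20, 25 < second < 50).
    """
    disks = M + N
    if T / disks >= first_threshold:
        return T  # first predetermined condition not met: no split needed
    # Split so that S / disks >= second_threshold, without exceeding the
    # cluster-wide cap of L final partitions.
    S = min(L, second_threshold * disks)
    return max(S, T)  # the split can only increase the partition count
```

For example, 10 disks carrying 640 partitions stay at 640 when 10 disks join (640/20 = 32, above the first threshold), but a cluster with only 100 partitions (100/20 = 5) would split up to 32 × 20 = 640, capped by L.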
  17. The method according to claim 16, wherein that the current total number T of partitions satisfies the first predetermined condition means:
    T/(M+N) is less than a first threshold, wherein the first threshold is a natural number.
  18. The method according to claim 17, wherein
    the first threshold is greater than 10 and less than 20.
  19. The method according to claim 16, wherein that the mathematical relationship between the total number S of partitions after splitting and the total number M+N of storage disks satisfies the second predetermined condition means:
    S/(M+N) is greater than or equal to a second threshold, wherein the second threshold is a natural number.
  20. The method according to claim 19, wherein
    the second threshold is greater than 25 and less than 50.
  21. The method according to claim 16, wherein:
    each current partition ID is an integer greater than or equal to 0, and the set of partition IDs of all current partitions is an arithmetic progression whose first term is 0, whose number of terms is T, and whose common difference is 1; and
    each partition ID of the partitions after splitting is an integer greater than or equal to 0, and the set of partition IDs of all partitions after splitting is an arithmetic progression whose first term is 0, whose number of terms is S, and whose common difference is 1.
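The ID scheme of claim 21 combines naturally with the modulo mapping used for routing (claims 3, 8, and 13): when the post-split count S is a multiple of T, the split partitions that map back onto a given current partition are exactly those congruent to its ID modulo T. A sketch; the multiple-of-T assumption and the helper names are ours, not the claims':

```python
def partition_ids(count: int) -> list[int]:
    """Claim 21: partition IDs form the arithmetic progression 0, 1, ..., count-1."""
    return list(range(count))

def children_after_split(current_id: int, T: int, S: int) -> list[int]:
    """Split partitions that fall back onto `current_id` under the modulo-T
    routing mapping. Assumes S is a multiple of T so the mapping nests."""
    return [pid for pid in partition_ids(S) if pid % T == current_id]

# Splitting T=4 current partitions into S=8:
# children_after_split(1, 4, 8) -> [1, 5]
```

Because (f mod S) mod T equals f mod T whenever T divides S, data already routed to a current partition stays within that partition's children after the split.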
  22. A partition management apparatus, configured to perform partition management on storage disks in a cluster, wherein the cluster comprises multiple storage disks, the apparatus comprising:
    a storage disk detection module, configured to: when it is detected that N new storage disks are to join the cluster, obtain a current number M of storage disks in the cluster and a current total number T of existing partitions in the cluster, wherein M, N, and T are all natural numbers;
    a first predetermined condition determination module, configured to determine whether a mathematical relationship between the current total number T of partitions and the total number M+N of storage disks satisfies a first predetermined condition; and
    a partition splitting module, configured to: if the first predetermined condition is satisfied, split at least one current partition so that a total number of partitions after splitting is S, and allocate the partitions after splitting to the M+N storage disks, wherein a mathematical relationship between the total number S of partitions after splitting and the total number M+N of storage disks satisfies a second predetermined condition, the total number of partitions after splitting is not greater than a total number L of final partitions supported by the cluster, and L and S are both natural numbers greater than 1.
  23. The partition management apparatus according to claim 22, wherein that the current total number T of partitions satisfies the first predetermined condition means:
    T/(M+N) is less than a first threshold, wherein the first threshold is a natural number.
  24. The partition management apparatus according to claim 23, wherein
    the first threshold is greater than 10 and less than 20.
  25. The partition management apparatus according to claim 22, wherein that the mathematical relationship between the total number S of partitions after splitting and the total number M+N of storage disks satisfies the second predetermined condition means:
    S/(M+N) is greater than or equal to a second threshold, wherein the second threshold is a natural number.
  26. The partition management apparatus according to claim 25, wherein:
    the second threshold is greater than 25 and less than 50.
  27. The partition management apparatus according to claim 25, wherein:
    each partition ID is an integer greater than or equal to 0, and the set of partition IDs of all current partitions is an arithmetic progression whose first term is 0, whose number of terms is T, and whose common difference is 1; and
    each partition ID of the partitions after splitting is an integer greater than or equal to 0, and the set of partition IDs of all partitions after splitting is an arithmetic progression whose first term is 0, whose number of terms is S, and whose common difference is 1.
  28. A partition management device, connected to a cluster and configured to perform partition management on storage disks in the cluster, wherein the cluster comprises multiple storage disks, the partition management device comprising:
    a memory, configured to store a partition view, wherein the partition view records a correspondence between current partition IDs and storage disk addresses;
    an interface, configured to provide an external connection;
    a computer-readable medium, configured to store a computer program; and
    a processor, connected to the memory, the interface, and the computer-readable medium, and configured to perform, by running the program, the following steps:
    when it is detected through the interface that N new storage disks are to join the cluster, obtaining a current number M of storage disks in the cluster and a current total number T of existing partitions in the cluster, wherein M, N, and T are all natural numbers;
    determining whether a mathematical relationship between the current total number T of partitions and the total number M+N of storage disks satisfies a first predetermined condition; and
    if the first predetermined condition is satisfied, splitting at least one current partition so that a total number of partitions after splitting is S, and allocating the partitions after splitting to the M+N storage disks, wherein a mathematical relationship between the total number S of partitions after splitting and the total number M+N of storage disks satisfies a second predetermined condition, the total number of partitions after splitting is not greater than a total number L of final partitions supported by the cluster, and L and S are both natural numbers greater than 1.
  29. The partition management device according to claim 28, wherein that the current total number T of partitions satisfies the first predetermined condition means:
    T/(M+N) is less than a first threshold, wherein the first threshold is a natural number.
  30. The partition management device according to claim 29, wherein
    the first threshold is greater than 10 and less than 20.
  31. The partition management device according to claim 28, wherein that the mathematical relationship between the total number S of partitions after splitting and the total number M+N of storage disks satisfies the second predetermined condition means:
    S/(M+N) is greater than or equal to a second threshold, wherein the second threshold is a natural number.
  32. The partition management device according to claim 31, wherein
    the second threshold is greater than 25 and less than 50.
PCT/CN2014/090299 2014-11-05 2014-11-05 Data processing method and apparatus WO2016070341A1 (zh)

Priority Applications (11)

Application Number Priority Date Filing Date Title
CN201480075293.8A CN106063226B (zh) 2014-11-05 2014-11-05 Data processing method, apparatus and device
EP14905367.0A EP3128716B1 (en) 2014-11-05 2014-11-05 Data processing method and apparatus
CA2941163A CA2941163C (en) 2014-11-05 2014-11-05 Data processing method and apparatus
CN201710379148.4A CN107357522B (zh) 2014-11-05 2014-11-05 Data processing method and apparatus
JP2016560892A JP6288596B2 (ja) 2014-11-05 2014-11-05 Data processing method and apparatus
AU2014410705A AU2014410705B2 (en) 2014-11-05 2014-11-05 Data processing method and apparatus
PCT/CN2014/090299 WO2016070341A1 (zh) 2014-11-05 2014-11-05 Data processing method and apparatus
CN201910052954.XA CN109918021B (zh) 2014-11-05 2014-11-05 Data processing method and apparatus
KR1020167026230A KR101912728B1 (ko) 2014-11-05 2014-11-05 Data processing method and apparatus
US15/587,051 US9952778B2 (en) 2014-11-05 2017-05-04 Data processing method and apparatus
US15/946,484 US10628050B2 (en) 2014-11-05 2018-04-05 Data processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/090299 WO2016070341A1 (zh) 2014-11-05 2014-11-05 Data processing method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/587,051 Continuation US9952778B2 (en) 2014-11-05 2017-05-04 Data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2016070341A1 true WO2016070341A1 (zh) 2016-05-12

Family

ID=55908360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/090299 WO2016070341A1 (zh) 2014-11-05 2014-11-05 Data processing method and apparatus

Country Status (8)

Country Link
US (2) US9952778B2 (zh)
EP (1) EP3128716B1 (zh)
JP (1) JP6288596B2 (zh)
KR (1) KR101912728B1 (zh)
CN (3) CN106063226B (zh)
AU (1) AU2014410705B2 (zh)
CA (1) CA2941163C (zh)
WO (1) WO2016070341A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101912728B1 (ko) * 2014-11-05 2018-10-29 Huawei Technologies Co., Ltd. Data processing method and apparatus
US10698628B2 (en) 2015-06-09 2020-06-30 Ultrata, Llc Infinite memory fabric hardware implementation with memory
CN109783002B (zh) * 2017-11-14 2021-02-26 Huawei Technologies Co., Ltd. Data read/write method, management device, client, and storage system
EP3803587A1 (en) * 2018-05-29 2021-04-14 Telefonaktiebolaget LM Ericsson (publ) Improved performance of function as a service
TWI723410B (zh) * 2019-05-31 2021-04-01 eCloudvalley Digital Technology Co., Ltd. Cloud resource management system, cloud resource management method, and non-transitory computer-readable recording medium
CN116997894A (zh) * 2021-03-31 2023-11-03 Fuji Corporation Data storage system
CN113468187B (zh) * 2021-09-02 2021-11-23 Taiping Financial Technology Services (Shanghai) Co., Ltd. Shenzhen Branch Multi-party data integration method, apparatus, computer device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567464A (zh) * 2011-11-29 2012-07-11 Xi'an Jiaotong University Knowledge resource organization method based on extended topic maps
EP2721504A1 (en) * 2011-06-17 2014-04-23 Alibaba Group Holding Limited File processing method, system and server-clustered system for cloud storage
CN103797770A (zh) * 2012-12-31 2014-05-14 Huawei Technologies Co., Ltd. Method and system for sharing storage resources
CN103812934A (zh) * 2014-01-28 2014-05-21 Zhejiang University Remote sensing data publishing method based on a cloud storage system
US20140189128A1 (en) * 2012-12-31 2014-07-03 Huawei Technologies Co., Ltd. Cluster system with calculation and storage converged

Family Cites Families (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5675769A (en) * 1995-02-23 1997-10-07 Powerquest Corporation Method for manipulating disk partitions
JP4206586B2 (ja) * 1999-11-12 2009-01-14 Hitachi, Ltd. Database management method and apparatus, and storage medium storing a database management program
US7036126B2 (en) * 2000-12-11 2006-04-25 International Business Machines Corporation Method and an apparatus for logical volume manager plug-ins
US7395402B2 (en) * 2004-04-15 2008-07-01 Broadcom Corporation Method and system of data storage capacity allocation and management using one or more data storage drives
CN100476812C (zh) * 2004-04-15 2009-04-08 Broadcom Corporation Method and system for storage capacity allocation and management using at least one data storage
CN100372299C (zh) * 2004-08-13 2008-02-27 Huawei Technologies Co., Ltd. Network management method supporting a distributed management information tree
JP2006079495A (ja) * 2004-09-13 2006-03-23 Hitachi Ltd Storage system and logical partition setting method
US7809763B2 (en) * 2004-10-15 2010-10-05 Oracle International Corporation Method(s) for updating database object metadata
US7469241B2 (en) * 2004-11-30 2008-12-23 Oracle International Corporation Efficient data aggregation operations using hash tables
US20060168398A1 (en) * 2005-01-24 2006-07-27 Paul Cadaret Distributed processing RAID system
US7685398B2 (en) * 2006-05-18 2010-03-23 Dell Products L.P. Intelligent system for determination of optimal partition size in a build to order environment
US7990979B2 (en) * 2006-08-25 2011-08-02 University Of Florida Research Foundation, Inc. Recursively partitioned static IP router tables
CN101201796B (zh) * 2006-12-14 2010-05-19 Inventec Corporation Method for automatically adjusting the write synchronous replication disk space size of a snapshot device
CN101515254B (zh) * 2008-02-18 2010-12-08 Hongfujin Precision Industry (Shenzhen) Co., Ltd. Storage space management system and method
CN101639835A (zh) * 2008-07-30 2010-02-03 International Business Machines Corporation Method and apparatus for applying database partitioning in a multi-tenant scenario
SE532996C2 (sv) * 2008-10-03 2010-06-08 Oricane Ab Method, device and computer program product for representing the part of n-bit intervals belonging to d-bit data in a data communication network
US9996572B2 (en) * 2008-10-24 2018-06-12 Microsoft Technology Licensing, Llc Partition management in a partitioned, scalable, and available structured storage
US8886796B2 (en) 2008-10-24 2014-11-11 Microsoft Corporation Load balancing when replicating account data
US8078825B2 (en) * 2009-03-11 2011-12-13 Oracle America, Inc. Composite hash and list partitioning of database tables
US8510538B1 (en) * 2009-04-13 2013-08-13 Google Inc. System and method for limiting the impact of stragglers in large-scale parallel data processing
CN103488680B (zh) * 2009-06-19 2017-09-29 International Business Machines Corporation Method for counting items in a database system
US8156304B2 (en) * 2009-12-04 2012-04-10 Oracle International Corporation Dynamic data storage repartitioning
US9401967B2 (en) * 2010-06-09 2016-07-26 Brocade Communications Systems, Inc. Inline wire speed deduplication system
CN102486798A (zh) * 2010-12-03 2012-06-06 Tencent Technology (Shenzhen) Co., Ltd. Data loading method and apparatus
JP5600573B2 (ja) * 2010-12-07 2014-10-01 Japan Broadcasting Corporation Load balancing apparatus and program
US8560584B2 (en) * 2010-12-15 2013-10-15 Teradata Us, Inc. Database partition management
US10055480B2 (en) * 2015-05-29 2018-08-21 Sap Se Aggregating database entries by hashing
CN102681899B (zh) * 2011-03-14 2015-06-10 金剑 Dynamic management method for virtual computing resources of a cloud computing service platform
US9002871B2 (en) * 2011-04-26 2015-04-07 Brian J. Bulkowski Method and system of mapreduce implementations on indexed datasets in a distributed database environment
CN102841894A (zh) * 2011-06-22 2012-12-26 BYD Co., Ltd. Data storage method for a file allocation table
CN102244685B (zh) * 2011-08-11 2013-09-18 Institute of Software, Chinese Academy of Sciences Distributed cache dynamic scaling method and system supporting load balancing
US9235396B2 (en) * 2011-12-13 2016-01-12 Microsoft Technology Licensing, Llc Optimizing data partitioning for data-parallel computing
US20130159365A1 (en) * 2011-12-16 2013-06-20 Microsoft Corporation Using Distributed Source Control in a Centralized Source Control Environment
US8880565B2 (en) * 2011-12-23 2014-11-04 Sap Se Table creation for partitioned tables
US8762378B2 (en) * 2011-12-23 2014-06-24 Sap Ag Independent table nodes in parallelized database environments
US8880510B2 (en) * 2011-12-23 2014-11-04 Sap Se Unique value calculation in partitioned tables
US9852010B2 (en) * 2012-02-03 2017-12-26 Microsoft Technology Licensing, Llc Decoupling partitioning for scalability
EP2784675B1 (en) * 2012-02-09 2016-12-28 Huawei Technologies Co., Ltd. Method, device and system for data reconstruction
US9218630B2 (en) * 2012-03-22 2015-12-22 Microsoft Technology Licensing, Llc Identifying influential users of a social networking service
US8996464B2 (en) * 2012-06-11 2015-03-31 Microsoft Technology Licensing, Llc Efficient partitioning techniques for massively distributed computation
GB201210702D0 (en) * 2012-06-15 2012-08-01 Qatar Foundation A system and method to store video fingerprints on distributed nodes in cloud systems
CN102799628B (zh) * 2012-06-21 2015-10-07 Sina.com Technology (China) Co., Ltd. Method and apparatus for data partitioning in a key-value database
US9015212B2 (en) * 2012-10-16 2015-04-21 Rackspace Us, Inc. System and method for exposing cloud stored data to a content delivery network
US8775464B2 (en) * 2012-10-17 2014-07-08 Brian J. Bulkowski Method and system of mapreduce implementations on indexed datasets in a distributed database environment
EP2725491B1 (en) * 2012-10-26 2019-01-02 Western Digital Technologies, Inc. A distributed object storage system comprising performance optimizations
US9009421B2 (en) * 2012-11-13 2015-04-14 International Business Machines Corporation Dynamically improving memory affinity of logical partitions
CN102968503B (zh) * 2012-12-10 2015-10-07 Dawning Information Industry (Beijing) Co., Ltd. Data processing method for a database system and database system
CN103064890B (zh) * 2012-12-11 2015-12-23 泉州豪杰信息科技发展有限公司 GPS mass data processing method
ES2658188T3 (es) * 2012-12-27 2018-03-08 Huawei Technologies Co., Ltd. Partition extension method and apparatus
US9298398B2 (en) * 2013-04-16 2016-03-29 International Business Machines Corporation Fine-grained control of data placement
US8688718B1 (en) * 2013-07-31 2014-04-01 Linkedin Corporation Management of data segments for analytics queries
KR20150030332A (ko) * 2013-09-12 2015-03-20 Samsung Electronics Co., Ltd. Data distributed processing system and operating method thereof
JP6281225B2 (ja) * 2013-09-30 2018-02-21 NEC Corporation Information processing apparatus
IN2013MU03836A (zh) * 2013-12-06 2015-07-31 Tata Consultancy Services Ltd
CN103744975A (zh) * 2014-01-13 2014-04-23 锐达互动科技股份有限公司 Efficient cache server based on distributed files
CN103929500A (zh) * 2014-05-06 2014-07-16 刘跃 Data sharding method for a distributed storage system
US9721021B2 (en) * 2014-05-27 2017-08-01 Quixey, Inc. Personalized search results
US10210171B2 (en) * 2014-06-18 2019-02-19 Microsoft Technology Licensing, Llc Scalable eventual consistency system using logical document journaling
US10002148B2 (en) * 2014-07-22 2018-06-19 Oracle International Corporation Memory-aware joins based in a database cluster
US20160092493A1 (en) * 2014-09-29 2016-03-31 International Business Machines Corporation Executing map-reduce jobs with named data
US9875263B2 (en) * 2014-10-21 2018-01-23 Microsoft Technology Licensing, Llc Composite partition functions
KR101912728B1 (ko) * 2014-11-05 2018-10-29 Huawei Technologies Co., Ltd. Data processing method and apparatus
US9934871B2 (en) * 2015-04-17 2018-04-03 Western Digital Technologies, Inc. Verification of storage media upon deployment
US10482076B2 (en) * 2015-08-14 2019-11-19 Sap Se Single level, multi-dimension, hash-based table partitioning
US10977212B2 (en) * 2018-05-03 2021-04-13 Sap Se Data partitioning based on estimated growth


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3128716A4 *

Also Published As

Publication number Publication date
CA2941163A1 (en) 2016-05-12
US20170235501A1 (en) 2017-08-17
CN109918021B (zh) 2022-01-07
JP2017514217A (ja) 2017-06-01
EP3128716A4 (en) 2017-08-02
CN109918021A (zh) 2019-06-21
AU2014410705B2 (en) 2017-05-11
US20180225048A1 (en) 2018-08-09
CN106063226B (zh) 2019-03-08
US9952778B2 (en) 2018-04-24
CN107357522B (zh) 2019-11-15
EP3128716B1 (en) 2019-09-04
KR20160124885A (ko) 2016-10-28
CN107357522A (zh) 2017-11-17
US10628050B2 (en) 2020-04-21
JP6288596B2 (ja) 2018-03-07
KR101912728B1 (ko) 2018-10-29
AU2014410705A1 (en) 2016-09-15
EP3128716A1 (en) 2017-02-08
CA2941163C (en) 2019-04-16
CN106063226A (zh) 2016-10-26

Similar Documents

Publication Publication Date Title
WO2016070341A1 (zh) Data processing method and apparatus
US11354039B2 (en) Tenant-level sharding of disks with tenant-specific storage modules to enable policies per tenant in a distributed storage system
CN110431542B (zh) Managing I/O operations in a storage network
US11055014B2 (en) Storage system providing automatic configuration updates for remote storage objects in a replication process
US11249834B2 (en) Storage system with coordinated recovery across multiple input-output journals of different types
US10534547B2 (en) Consistent transition from asynchronous to synchronous replication in hash-based storage systems
WO2019144553A1 (zh) Data storage method, apparatus, and storage medium
WO2017201977A1 (zh) Data writing and reading method and apparatus, and distributed object storage cluster
US10320905B2 (en) Highly available network filer super cluster
AU2015360953A1 (en) Dataset replication in a cloud computing environment
US20150143065A1 (en) Data Processing Method and Apparatus, and Shared Storage Device
US9733835B2 (en) Data storage method and storage server
US11099767B2 (en) Storage system with throughput-based timing of synchronous replication recovery
TW201531862A Memory data versioning techniques
US20150347043A1 (en) Cluster consistent logical storage object naming
US11681475B2 (en) Methods, devices, and a computer program product for processing an access request and updating a storage system
CN110597809A (zh) 一种支持树状数据结构的一致性算法系统及其实现方法
US20200342065A1 (en) Replicating user created snapshots
US11068500B1 (en) Remote snapshot access in a replication setup
JPWO2015037205A1 (ja) Data processing system, data processing method, and data processing program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14905367; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2941163; Country of ref document: CA)
ENP Entry into the national phase (Ref document number: 2014410705; Country of ref document: AU; Date of ref document: 20141105; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20167026230; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2016560892; Country of ref document: JP; Kind code of ref document: A)
REEP Request for entry into the european phase (Ref document number: 2014905367; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2014905367; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)