WO2020019749A1 - Data sharding method, related device, and computer storage medium - Google Patents

A data sharding method, related device, and computer storage medium

Info

Publication number
WO2020019749A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2019/080697
Inventor
毕杰山 (Bi Jieshan)
钟延辉 (Zhong Yanhui)
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2020019749A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The present invention relates to the technical field of data storage, and in particular to a data sharding method, a related device, and a computer storage medium.
  • A traditional centralized storage system stores all data on one centralized storage server, and every service of the storage system runs on that server. This limits data processing speed and also creates a single point of failure.
  • Distributed storage spreads data across multiple independent storage nodes. It lets multiple storage nodes share the storage load. It also improves system reliability, scalability, and access efficiency.
  • Data can be sharded with a hash-based method or a range-based method, which determines the storage node of the distributed storage system on which each piece of data is stored.
  • Hash-based sharding distributes data evenly across storage nodes, but it destroys the lexicographic ordering of the keys. Data can then no longer be assigned to the corresponding nodes in key order, which degrades performance when data is read in order.
  • Range-based sharding stores data on the storage nodes in the lexicographic order of the keys, but it can distribute data unevenly.
  • An embodiment of the present application discloses a data sharding method. It configures how the keyword in user data is processed and combines hash sharding with range sharding to determine where the user data is stored among distributed storage nodes.
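  • An embodiment of the present application provides a data sharding method, including: obtaining user data, where the user data includes a keyword; intercepting the keyword to obtain a target field; determining a target feature value of the target field according to a hash algorithm; and sending the user data, according to the target feature value, to the storage node corresponding to that value. Different storage nodes correspond to different feature value ranges, and the target feature value falls within one of these ranges.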
  • Intercepting the keyword to obtain a target field may include: obtaining a preset offset, and intercepting the keyword according to the preset offset to obtain the target field.
  • The preset offset may include a first preset offset and a second preset offset. The first preset offset is used to intercept a first target field, and the second is used to intercept a second target field. The target field then includes both the first and the second target field.
  • Alternatively, the keyword includes separators. In that case, intercepting the keyword to obtain a target field includes intercepting the keyword according to the separators.
  • The separators may include a first group of separators and a second group of separators. The first group is used to intercept a first target field, and the second group is used to intercept a second target field. The target field then includes both the first and the second target field.
  • The database that stores the user data is a schema-free database distributed across multiple storage nodes.
  • Sending the user data to the storage node corresponding to the target feature value may include: combining the target feature value with the keyword to obtain a new keyword, and sending the user data to that storage node according to the new keyword.
  • An embodiment of the present application provides a data sharding device, including:
  • a communication unit, configured to obtain user data, where the user data includes a keyword;
  • a processing unit, configured to intercept the keyword to obtain a target field and to determine a target feature value of the target field according to a hash algorithm;
  • the communication unit is further configured to send the user data, according to the target feature value of the target field, to the storage node corresponding to that value. Different storage nodes correspond to different feature value ranges, and the target feature value falls within one of these ranges.
  • In the device, the keyword can be intercepted in the same ways as in the method:
    • by one preset offset, or by a first and a second preset offset;
    • by the separators in the keyword, optionally using a first and a second group of separators.
    When two offsets or two groups are used, the target field includes the first and second target fields. The user data may likewise be stored in a schema-free database distributed across multiple storage nodes.
  • The processing unit is further configured to add the target feature value in front of the most significant byte of the keyword to obtain a new keyword. The communication unit is specifically configured to send the user data, according to the new keyword, to the storage node corresponding to the target feature value.
  • An embodiment of the present application provides a network device, including units configured to perform the method according to the first aspect.
  • An embodiment of the present application provides a network device, including a processor, an input-output device, and a memory. The memory stores instructions and the processor executes them. The input-output device communicates with other devices under the control of the processor. When the processor executes the instructions, the method according to the first aspect is performed.
  • An embodiment of the present application provides a computer storage medium that stores a computer program. When executed by a processor, the computer program implements the method according to the first aspect.
  • A preset offset or separator can be set according to how the keyword is composed, and the keyword in the user data is then intercepted accordingly. This allows a field with a specific meaning to be extracted as the target field even in a schema-free database.
  • The target field is hashed to obtain a target feature value. The user data is then sent, using range sharding, to the storage node whose feature value range contains that value.
  • Because the target feature value is produced by a hash operation, it is evenly distributed over the feature value ranges of the storage nodes. Data is therefore spread evenly across the nodes, while user data with adjacent target feature values is stored on the same or neighboring storage nodes.
  • FIG. 1 is a schematic diagram of a system for implementing data sharding according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a data sharding method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of how a keyword is composed according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a configuration interface for offset-based interception according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of intercepting a keyword by offset according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another configuration interface for offset-based interception according to an embodiment of the present application.
  • FIGS. 7A-7B are schematic diagrams of intercepting a keyword by offset according to another embodiment of the present application.
  • FIG. 8 is a schematic diagram of a configuration interface for separator-based interception according to an embodiment of the present application.
  • FIG. 9 is another schematic diagram of intercepting a keyword by separator according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of another configuration interface for separator-based interception according to an embodiment of the present application.
  • FIGS. 11A-11B are schematic diagrams of intercepting a keyword by separator according to another embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a data sharding device according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a network device according to an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a data sharding system according to an embodiment of the present application.
  • A distributed storage system stores data on multiple storage nodes and provides high-performance, efficient, and scalable services for large-scale storage.
  • To do so, data must be spread across the storage nodes.
  • Data sharding is the technique that determines how data is distributed among the storage nodes.
  • Data sharding distributes user data that would otherwise be stored centrally to different storage nodes according to certain rules. Typically, the key or the hash value of the key determines which storage node a piece of user data is stored on.
  • User data to be stored usually includes multiple fields. For example, as shown in Table 1, the user data includes an identifier, a name, an address, an age, a gender, and a phone number.
  • The key may consist of one or more fields of the user data to be stored. For example, it may combine the identifier field and the name field of Table 1.
  • Data sharding serves three purposes:
    • Even data distribution: the amount of data on each storage node should be as close as possible.
    • Load balancing: the read and write requests to each storage node should be as close as possible.
    • Minimal migration: when storage nodes are added or removed, as little data as possible should have to be migrated.
  • Data sharding methods include hash sharding and range sharding, both described below.
  • In range sharding, each storage node is responsible for one or more intervals, and each piece of data is stored according to the interval its key belongs to. For example, suppose there are three storage nodes, node0, node1, and node2, and the identifier field of the user data in Table 1 is used as the key. The three nodes are responsible for the intervals node0 (0,33], node1 (33,66], and node2 (66,100]. Range sharding then stores the user data of Table 1 on the three nodes as shown in Table 2.
  • The user data of Table 1 is thus stored in the lexicographic order of the key values. However, it is unevenly distributed among the three nodes, which unbalances the load across the storage nodes.
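The range-sharding lookup described above can be sketched as follows. This is a minimal Python illustration. It assumes numeric keys and the three left-open, right-closed intervals from the example. The identifiers 1 to 10 in the usage lines are hypothetical, since the contents of Tables 1 and 2 are not reproduced here.

```python
from bisect import bisect_left
from collections import Counter

# Interval layout from the example: node0 (0,33], node1 (33,66], node2 (66,100].
UPPER_BOUNDS = [33, 66, 100]   # inclusive upper bound of each left-open interval
NODES = ["node0", "node1", "node2"]


def range_shard(key: int) -> str:
    """Return the node whose (low, high] interval contains the key."""
    if not 0 < key <= UPPER_BOUNDS[-1]:
        raise ValueError(f"key {key} is outside (0, {UPPER_BOUNDS[-1]}]")
    # bisect_left finds the first upper bound >= key, i.e. the owning interval
    return NODES[bisect_left(UPPER_BOUNDS, key)]


# Hypothetical consecutive identifiers: order is preserved, but all land on node0.
print(Counter(range_shard(k) for k in range(1, 11)))
```

Consecutive keys stay together, which keeps ordered reads cheap. The same property concentrates them on a single node, which is the imbalance noted above.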
  • Hash sharding computes a feature value for each key with a hash algorithm. It then allocates user data to storage nodes according to a mapping between feature values and storage nodes. Continuing with Table 1 and using the identifier field as the key, each record's identifier is hashed to obtain its feature value. Each record is then stored on the storage node that its feature value maps to.
  • Suppose the feature values are 010, 074, 018, 037, 085, 024, 043, 055, 063, and 091, with the following mapping:
    • 010, 085, and 043 map to node0;
    • 018, 024, 055, and 091 map to node1;
    • 074, 037, and 063 map to node2.
    Hash sharding then stores the user data of Table 1 on the three nodes as shown in Table 3.
  • The user data of Table 1 is thus evenly distributed among the three storage nodes.
  • However, the user data in Table 1 was arranged in the lexicographic order of the key values. Hash sharding spreads the records evenly across the nodes, but that ordering is lost.
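A minimal sketch of the hash-sharding step follows. It reuses the feature values and the feature-value-to-node mapping listed above. How the hash algorithm produces those feature values is not modeled here, and the names `FEATURE_TO_NODE` and `hash_shard` are illustrative.

```python
from collections import Counter

# Feature values from the text, in the key order of Table 1.
FEATURE_VALUES = ["010", "074", "018", "037", "085", "024", "043", "055", "063", "091"]

# Feature-value-to-node mapping given in the text.
FEATURE_TO_NODE = {
    **{fv: "node0" for fv in ("010", "085", "043")},
    **{fv: "node1" for fv in ("018", "024", "055", "091")},
    **{fv: "node2" for fv in ("074", "037", "063")},
}


def hash_shard(feature_value: str) -> str:
    """Route a record by looking up its hash-derived feature value."""
    return FEATURE_TO_NODE[feature_value]


placement = [hash_shard(fv) for fv in FEATURE_VALUES]
print(Counter(placement))  # 3 records on node0, 4 on node1, 3 on node2
print(placement)           # neighbouring records land on different nodes
```

The counts confirm the even spread. The placement list shows why key order is lost: consecutive records in Table 1 alternate between nodes.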
  • Now suppose the three storage nodes cover the intervals node0 (0,33], node1 (33,66], and node2 (66,100]. The feature values computed from the keys of the user data in Table 1 with the hash algorithm are 010, 074, 018, 037, 085, 024, 043, 055, 063, and 091.
  • If each feature value is placed in front of its key to form a new key, and the new keys are range-sharded, the user data is stored on the three storage nodes as shown in Table 4.
  • The feature value at the front of each new key determines how the data is routed. Because every feature value is obtained with the hash algorithm, the feature values are evenly distributed over (0, 100), and range sharding is applied to the new keys.
  • Routing by the new key has three effects. Multiple pieces of data with the same identifier are stored on the same storage node. Data is distributed evenly across the nodes. Data with adjacent feature values is stored next to each other.
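Prepending the feature value and range-sharding on it can be sketched as below. The sketch assumes three-digit, zero-padded feature values, so that the string order of new keys matches the numeric order of feature values. It also assumes hypothetical keys A001 to A010, paired in order with the feature values listed above. `new_key` and `route` are illustrative names, not part of the described system.

```python
from bisect import bisect_left

UPPER_BOUNDS = [33, 66, 100]          # node0 (0,33], node1 (33,66], node2 (66,100]
NODES = ["node0", "node1", "node2"]


def new_key(feature_value: int, key: str, width: int = 3) -> str:
    """Prepend the zero-padded feature value before the most significant byte of the key."""
    return f"{feature_value:0{width}d}{key}"


def route(nk: str, width: int = 3) -> str:
    """Range-shard a new key on its feature-value prefix."""
    prefix = int(nk[:width])
    return NODES[bisect_left(UPPER_BOUNDS, prefix)]


feature_values = [10, 74, 18, 37, 85, 24, 43, 55, 63, 91]   # from the text
keys = [f"A{n:03d}" for n in range(1, 11)]                  # hypothetical keys
nodes: dict[str, list[str]] = {}
for fv, k in zip(feature_values, keys):
    nk = new_key(fv, k)
    nodes.setdefault(route(nk), []).append(nk)

for node in NODES:
    print(node, sorted(nodes[node]))
```

Each node receives a contiguous band of feature values. Within a node, the new keys sort by feature value, which is the property the text relies on for ordered storage.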
  • A database system with a schema can perceive field information. In the example above, it knows that the key combines an identifier field and a name field.
  • Such a system can therefore identify the identifier field within the key and hash only that field.
  • Schema-less database systems such as HBase, Google Bigtable, and MongoDB cannot identify what the fields that make up the key represent. They therefore cannot hash only a specified field with a specific meaning in the key, and so cannot combine hash sharding with range sharding to shard the data.
  • To address this, an embodiment of the present application provides a data sharding method.
  • On a configuration page of a management terminal, an administrator configures how the key of the user data is to be processed, based on how the key is composed.
  • The management terminal generates configuration information and sends it to the management server. A user terminal (any one of one or more user terminals) then processes user data as follows:
    • it obtains the user data;
    • it obtains the configuration information from the management server;
    • it processes the key according to the configuration information;
    • it sends the user data to a storage node according to the processed key.
  • FIG. 2 is a schematic flowchart of a data sharding method according to an embodiment of the present application.
  • The data sharding method includes:
  • When the keyword is intercepted, either part of the keyword or the whole keyword may be taken as the target field. This is not specifically limited in the embodiments of the present application.
  • The hash algorithm may be, for example, a Secure Hash Algorithm (SHA).
  • Specifically, the hash algorithm may be any one of Message Digest Algorithm 5 (MD5), SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512. This is not specifically limited in the embodiments of the present application.
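One way to turn the selected hash algorithm into a feature value is sketched below with Python's `hashlib`. The digest is reduced into the range (0, 100] by taking its first 8 bytes modulo 100. That reduction is an assumption for illustration; the text does not specify how the digest becomes a feature value. Note also that MD5 may be unavailable on FIPS-restricted Python builds.

```python
import hashlib

# Algorithms listed in the text, under their hashlib names.
SUPPORTED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def feature_value(target_field: str, algorithm: str = "sha256", buckets: int = 100) -> int:
    """Hash the target field and fold the digest into 1..buckets.

    The modulo folding is an assumption; the source does not define it.
    """
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    digest = hashlib.new(algorithm, target_field.encode("utf-8")).digest()
    # Read the first 8 digest bytes as an unsigned integer, then map to 1..buckets
    return int.from_bytes(digest[:8], "big") % buckets + 1


print(feature_value("A001"), feature_value("A001", "sha512"))
```

The same target field always yields the same feature value. This is what keeps records that share a target field together.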
  • Optionally, before the keyword is intercepted in step 102 to obtain the target field, the method further includes obtaining configuration information from a management server. The configuration information includes the interception method for the keyword, the interception length, and the selected hash algorithm.
  • The administrator sets the configuration information according to how the key is composed.
  • The key may consist of one or more fields of the user data to be stored.
  • The key can be composed in either of the following two ways.
  • First method: the field length of at least one type of information in the user data to be stored is fixed. When designing the key, a type of information with a fixed field length can be combined with a type of information of variable length. Alternatively, several types of information whose fields all have fixed lengths can be combined as the key.
  • In Table 1, for example, the fields representing the identifier, age, and phone number have fixed lengths.
  • For example, A001Guangdong may be used as the key of the user data with identifier A001. Because the identifier field has a fixed length, the key can be parsed by taking the first 4 bytes as the identifier field; the remaining part (Guangdong) is the other field.
  • Alternatively, the three fields {ID + AGE + TEL} may be combined as a key, such as A094251536492xxxx. Because all three fields have fixed lengths, the key can be split according to each field's length and combination order, recovering each type of information it contains.
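Parsing a key built from fixed-length fields, as in the first method, amounts to slicing it by field lengths in combination order. In this sketch, the 4-byte identifier length comes from the text. The 2-byte age and 11-byte phone lengths are assumptions chosen to fit the example key A094251536492xxxx, and `split_fixed` is a hypothetical helper.

```python
from typing import Optional, Sequence


def split_fixed(key: str, lengths: Sequence[Optional[int]]) -> list[str]:
    """Split a key into fields by their fixed lengths, in combination order.

    A length of None (allowed only for the last field) takes the variable-length remainder.
    """
    fields, pos = [], 0
    for i, n in enumerate(lengths):
        if n is None:
            if i != len(lengths) - 1:
                raise ValueError("only the last field may have a variable length")
            fields.append(key[pos:])
            pos = len(key)
        else:
            if pos + n > len(key):
                raise ValueError("key is shorter than the declared field lengths")
            fields.append(key[pos:pos + n])
            pos += n
    if pos != len(key):
        raise ValueError("key has trailing characters not covered by any field")
    return fields


print(split_fixed("A001Guangdong", [4, None]))       # ['A001', 'Guangdong']
print(split_fixed("A094251536492xxxx", [4, 2, 11]))  # ['A094', '25', '1536492xxxx']
```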
  • Second method: a separator is used to join the fields representing different types of information, and the result is used as the key.
  • Taking the data in Table 1 as an example, if the two fields {identifier + name} are combined as the key, a separator can be inserted between them. For example, A001Zhaosan is represented as A001$Zhaosan.
  • The interception method depends on how the key was designed:
    • If the key of the user data is designed with the first method above, the interception method in the configuration information may be interception of the key by a preset offset.
    • If the key is designed with the second method, the interception method may be interception of the key by separator.
  • The administrator configures the interception method and interception length for the key in a configuration interface provided by the management server.
  • For example, the administrator selects interception by offset in the configuration interface of the management terminal. The administrator sets the interception length to run from the i-th byte to the j-th byte, where i and j are positive integers and j is greater than i.
  • After the administrator completes the configuration, the management terminal generates the following configuration information and sends it to the management server: {interceptiontype: "offset"; properties: {start: i, stop: j}}.
  • FIG. 5 is a schematic diagram of intercepting a keyword by offset according to an embodiment of the present application.
  • According to the configuration information, the user terminal intercepts the key of the user data from the most significant byte toward the least significant. It takes the number of bytes given by the preset offset as the target field. For example, suppose the key is A001B01234220180523 and the administrator configured i = 1 and j = 8, that is, a preset offset of 8 bytes. The target field intercepted from the most significant character A is then A001B012.
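The offset-based interception above reduces to slicing the key. This sketch treats i and j as 1-based, inclusive byte positions counted from the most significant byte, which matches the A001B012 example. The dictionary is a hypothetical Python rendering of the configuration information. Keys are assumed to be ASCII, so characters and bytes coincide.

```python
def intercept_by_offset(key: str, start: int, stop: int) -> str:
    """Return bytes start..stop (1-based, inclusive) counted from the most significant byte."""
    if not 1 <= start < stop <= len(key):
        raise ValueError(f"invalid offset range {start}..{stop} for key of length {len(key)}")
    return key[start - 1:stop]


# Hypothetical rendering of {interceptiontype: "offset"; properties: {start: i, stop: j}}
config = {"interceptiontype": "offset", "properties": {"start": 1, "stop": 8}}
props = config["properties"]
print(intercept_by_offset("A001B01234220180523", props["start"], props["stop"]))  # A001B012
```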
  • Alternatively, as shown in FIG. 6, the administrator selects interception by offset in the configuration interface of the management terminal and sets two interception lengths: from the i-th byte to the j-th byte, and from the k-th byte to the m-th byte.
  • In this case the preset offset includes a first preset offset and a second preset offset. The first is used to intercept a first target field, and the second is used to intercept a second target field. The target field includes both the first and the second target field.
  • After the administrator completes the configuration, the management terminal generates the following configuration information and sends it to the management server: {interceptiontype: "offset"; properties: {start: i, stop: j}; {start: k, stop: m}}.
  • The user terminal obtains the configuration information from the management server and intercepts the key in the user data according to the offsets it contains.
  • The first target field and the second target field may overlap, that is, i ≤ k ≤ j ≤ m or i ≤ k ≤ m ≤ j.
  • For the key A001B01234220180523, for example, the first target field obtained in this way may be A001B0 and the second target field 1B01, giving the target field A001B01B01. As shown in FIG.
  • The first and second preset offsets may also not overlap, that is, i < j < k < m. For example, the first preset offset may run from the 1st to the 4th byte counted from the most significant byte of the key. The second may run from the 1st to the 8th byte counted from the least significant byte.
  • Continuing with the key A001B01234220180523, the first target field intercepted in this way is A001 and the second target field is 20180523. The target field is therefore A00120180523.
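With two preset offsets, the two intercepted fields are concatenated in configuration order. The sketch below reproduces both examples above. The `from_low` flag for counting from the least significant byte, and the `OffsetRange` and `intercept_multi` names, are illustrative assumptions. They are not the configuration format used by the management terminal.

```python
from typing import NamedTuple


class OffsetRange(NamedTuple):
    start: int              # 1-based, inclusive
    stop: int               # 1-based, inclusive
    from_low: bool = False  # count from the least significant byte instead


def slice_range(key: str, r: OffsetRange) -> str:
    """Cut one field out of the key according to a single offset range."""
    n = len(key)
    if not 1 <= r.start <= r.stop <= n:
        raise ValueError(f"invalid range {r} for key of length {n}")
    if r.from_low:
        return key[n - r.stop:n - r.start + 1]
    return key[r.start - 1:r.stop]


def intercept_multi(key: str, ranges: list[OffsetRange]) -> str:
    """Concatenate the fields cut by each preset offset, in configuration order."""
    return "".join(slice_range(key, r) for r in ranges)


KEY = "A001B01234220180523"
# Overlapping offsets: bytes 1-6 and 4-7 give the A001B0 / 1B01 example
print(intercept_multi(KEY, [OffsetRange(1, 6), OffsetRange(4, 7)]))                 # A001B01B01
# Non-overlapping: bytes 1-4 from the high end, bytes 1-8 from the low end
print(intercept_multi(KEY, [OffsetRange(1, 4), OffsetRange(1, 8, from_low=True)]))  # A00120180523
```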
  • When the key is composed with separators, the administrator configures the interception method and interception length for the key in a configuration interface provided by the management server. As shown in FIG. 8, the administrator selects interception by separator in the configuration interface of the management terminal. The administrator sets the interception length to run from the i-th separator to the j-th separator, where i and j are integers and j is greater than i. A value of i = 0 means that interception starts from the most significant byte of the key.
  • After the administrator completes the configuration, the management terminal generates the following configuration information and sends it to the management server: {interceptiontype: "splitchars"; properties: {split: "$", start: i, stop: j}}.
  • After acquiring user data, the user terminal obtains the configuration information from the management server and intercepts the key according to the interception length in the configuration information. As shown in FIG. 9, the terminal intercepts the key from the most significant byte toward the least significant according to the separators and uses the result as the target field.
  • For example, suppose the start separator in the configuration information is the 0th separator and the end separator is the 2nd. Starting from the most significant byte of the key, the field before the 1st separator and the field between the 1st and 2nd separators are taken as the target field. The target field intercepted according to the configuration information is A001B012.
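Separator-based interception can be sketched by splitting the key on the configured separator and joining the fields between the start and stop separators. Separator 0 stands for the most significant end of the key. The key A001$B012$20180523 is hypothetical, since the key shown in FIG. 9 is not reproduced here. It is chosen so that start 0 and stop 2 yield the A001B012 result described above.

```python
def intercept_by_separator(key: str, split: str, start: int, stop: int) -> str:
    """Join the fields lying between the start-th and stop-th separators.

    Separator 0 is the most significant end of the key, so start=0, stop=2 takes the
    field before the 1st separator plus the field between the 1st and 2nd separators.
    A stop equal to the number of fields means "up to the end of the key".
    """
    fields = key.split(split)
    if not 0 <= start < stop <= len(fields):
        raise ValueError(f"invalid separator range {start}..{stop}")
    return "".join(fields[start:stop])


# Hypothetical rendering of {interceptiontype: "splitchars"; properties: {split: "$", start: 0, stop: 2}}
config = {"interceptiontype": "splitchars", "properties": {"split": "$", "start": 0, "stop": 2}}
p = config["properties"]
print(intercept_by_separator("A001$B012$20180523", p["split"], p["start"], p["stop"]))  # A001B012
```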
  • Alternatively, as shown in FIG. 10, the administrator selects interception by separator in the configuration interface of the management terminal and sets two interception lengths. The first runs from the i-th to the j-th separator, and the second from the k-th to the m-th separator. Here i, j, k, and m are integers, j > i, and m > k; a value of i = 0 means that interception starts from the most significant byte of the key.
  • In this case the separators include a first group and a second group. The first group is used to intercept a first target field, and the second group is used to intercept a second target field. The target field includes both the first and the second target field.
  • The first target field and the second target field may overlap, that is, i ≤ k ≤ j ≤ m or i ≤ k ≤ m ≤ j.
  • For example, suppose the start separator of the second group is the 1st separator and its end separator is the 2nd. The first target field obtained is A001B012 and the second target field is B012, so the combined target field is A001B012B012.
  • The two groups may also not overlap. For example, the first group starts at the 1st separator and ends at the 2nd, and the second group starts at the 3rd separator and ends at the 4th.
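For two groups of separators, each group is intercepted separately and the results are concatenated. The key below is hypothetical. Assuming the first group runs from separator 0 to separator 2, as in the earlier example, the groups (0, 2) and (1, 2) reproduce the overlapping result A001B012B012. The non-overlapping result is specific to this made-up key, since the text does not give one.

```python
def intercept_groups(key: str, split: str, groups: list[tuple[int, int]]) -> str:
    """Intercept one field per (start, stop) separator group and concatenate them in order."""
    fields = key.split(split)
    parts = []
    for start, stop in groups:
        if not 0 <= start < stop <= len(fields):
            raise ValueError(f"invalid separator group {start}..{stop}")
        parts.append("".join(fields[start:stop]))
    return "".join(parts)


KEY = "A001$B012$C345$D678$E901"  # hypothetical key with four separators
print(intercept_groups(KEY, "$", [(0, 2), (1, 2)]))  # A001B012B012 (overlapping groups)
print(intercept_groups(KEY, "$", [(1, 2), (3, 4)]))  # B012D678 (non-overlapping groups)
```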
  • The target field is then hashed with the hash algorithm selected in the configuration information to obtain the target feature value. The user data is allocated to the corresponding storage node by combining the target feature value with range-based data sharding.
  • Specifically, the target feature value is added in front of the most significant byte of the keyword to obtain a new keyword, and the user data is sent according to that new keyword.
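Putting the steps together, a user terminal might process each record as sketched below:
1. intercept the target field per the configuration;
2. hash it with the selected algorithm;
3. prepend the feature value to the keyword;
4. route on that prefix.

The configuration dictionary, node bounds, three-digit prefix width, and modulo reduction of the digest are all assumptions for illustration. They are not the format or algorithm fixed by the text.

```python
import hashlib
from bisect import bisect_left

# Hypothetical configuration: interception method, interception length, hash algorithm.
CONFIG = {
    "interceptiontype": "offset",
    "properties": {"start": 1, "stop": 4},
    "hash": "sha256",
}
UPPER_BOUNDS = [33, 66, 100]          # assumed node ranges: (0,33], (33,66], (66,100]
NODES = ["node0", "node1", "node2"]
WIDTH = 3                             # zero-padded digits of the feature-value prefix


def target_field(key: str, cfg: dict) -> str:
    """Intercept the target field by offset or by separator, per the configuration."""
    p = cfg["properties"]
    if cfg["interceptiontype"] == "offset":
        return key[p["start"] - 1:p["stop"]]          # 1-based, inclusive byte range
    if cfg["interceptiontype"] == "splitchars":
        return "".join(key.split(p["split"])[p["start"]:p["stop"]])
    raise ValueError(f"unknown interception type: {cfg['interceptiontype']!r}")


def feature_value(field: str, algorithm: str) -> int:
    """Hash the field and fold the digest into 1..100 (the folding rule is assumed)."""
    digest = hashlib.new(algorithm, field.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % UPPER_BOUNDS[-1] + 1


def place(key: str, cfg: dict = CONFIG) -> tuple[str, str]:
    """Return (new_key, node): prefix the feature value, then range-route on it."""
    fv = feature_value(target_field(key, cfg), cfg["hash"])
    new_key = f"{fv:0{WIDTH}d}{key}"                  # in front of the most significant byte
    return new_key, NODES[bisect_left(UPPER_BOUNDS, fv)]


for k in ("A001B01234220180523", "A001B01234220180524", "A002B01234220180523"):
    print(place(k))
```

The first two example keys share the target field A001, so they receive the same prefix, land on the same node, and sort next to each other. The third key has a different target field and may land elsewhere.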
  • In this way, the administrator can set a preset offset or separator according to how keywords are composed, and the keywords in user data are intercepted accordingly. A field with a specific meaning is thereby extracted as the target field.
  • The target field is hashed to obtain a target feature value. The user data is then sent, by combining the target feature value with range sharding, to the storage node corresponding to that value.
  • Because the target feature value is produced by a hash operation, it is evenly distributed over the feature value ranges of the storage nodes. Data is therefore distributed evenly across the storage nodes, while user data with adjacent target feature values is stored on the same or adjacent storage nodes.
  • FIG. 12 is a schematic structural diagram of a data sharding apparatus according to an embodiment of the present application.
  • The network device 200 includes at least a communication unit 210 and a processing unit 220, where:
  • the communication unit 210 is configured to obtain user data, where the user data includes a keyword;
  • the processing unit 220 is configured to intercept the keyword to obtain a target field and to determine a target feature value of the target field according to a hash algorithm; and
  • the communication unit 210 is further configured to send the user data, according to the target feature value of the target field, to the storage node corresponding to that value. Different storage nodes correspond to different feature value ranges, and the target feature value falls within one of these ranges.
  • Specifically, the processing unit 220 intercepts the keyword to obtain the target field. It then hashes the target field with the hash algorithm selected in the configuration information to obtain the target feature value. The communication unit 210 sends the user data to the corresponding storage node according to that target feature value.
  • FIG. 13 is a schematic structural diagram of a network device according to an embodiment of the present application.
  • The network device 300 includes at least a processor 310, an input-output device 320, and a memory 330, which are connected to each other through a bus 340, where:
  • the processor 310 may be a central processing unit (CPU) or a combination of a CPU and a hardware chip;
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof;
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof; and
  • the memory 330 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or flash memory. It stores program code and data and can transfer the stored data to the processor 310.
  • The processor 310 in the network device 300 reads the relevant instructions from the memory 330 and performs the following operations:
  • the processor 310 controls the input-output device 320 to receive user data, where the user data includes a keyword;
  • the processor 310 intercepts the keyword to obtain a target field and determines a target feature value of the target field according to a hash algorithm; and
  • the processor 310 controls the input-output device 320 to send the user data to the storage node corresponding to the target feature value of the target field.
  • the system shown in FIG. 14 includes a user equipment cluster (that is, the above-mentioned network equipment) composed of multiple user equipments and a storage node cluster composed of multiple storage nodes.
  • the system of FIG. 14 may provide a database cloud service to a user, wherein a storage node cluster may provide a storage cloud service.
  • after an input device of any one or more of the user equipments receives user data, the user equipment obtains configuration information from a management server; the processor in the user equipment calls the program code stored in the memory, intercepts the key in the user data according to the configuration information to obtain a target field, and hashes the target field with the hash algorithm selected in the configuration information to obtain the target feature value of the target field;
  • the output device in the user equipment sends the user data to the corresponding storage node according to the target feature value.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk (SSD)).

Abstract

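Embodiments of this application provide a data sharding method and a related device. In the method, an interception mode for intercepting the key in user data is configured; the key in obtained user data is intercepted according to that mode to obtain a target field, and a target feature value of the target field is determined according to a hash algorithm. The user data is then sent, according to the target feature value of the target field, to the storage node corresponding to the target feature value, where different storage nodes correspond to different feature value ranges and the target feature value falls within one of those ranges. In a schema-less database system, the method distributes data evenly across the storage nodes while storing user data with adjacent target feature values on the same or adjacent storage nodes.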

Description

Data Sharding Method, Related Device, and Computer Storage Medium
Technical Field
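The present invention relates to the field of data storage technologies, and in particular, to a data sharding method, a related device, and a computer storage medium.
Background
A traditional centralized storage system stores all data on a centralized storage server, and all services of the storage system run on that server. This not only limits data processing speed but also creates a single point of failure. Distributed storage spreads data across multiple independent storage nodes; it not only lets multiple storage nodes share the storage load, but also improves system reliability, scalability, and access efficiency.
In a distributed storage system, data can be sharded with a hash-based method or a range-based method to determine which storage node of the system stores the data. Hash-based sharding distributes data evenly across storage nodes, but it destroys the lexicographic order of the data by key, so data cannot be assigned to nodes in key order, which degrades sequential-read performance. Range-based sharding stores data on the storage nodes in lexicographic key order, but it can distribute data unevenly.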
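Summary
Embodiments of this application disclose a data sharding method that configures how the key in user data is processed and combines hash sharding with range sharding to determine where the user data is stored among distributed storage nodes.
According to a first aspect, an embodiment of this application provides a data sharding method, including:
obtaining user data, where the user data includes a key;
intercepting the key to obtain a target field;
determining a target feature value of the target field according to a hash algorithm; and
sending, according to the target feature value of the target field, the user data to the storage node corresponding to the target feature value, where different storage nodes correspond to different feature value ranges, and the target feature value falls within one of the feature value ranges.
In a possible embodiment, intercepting the key to obtain a target field includes: obtaining a preset offset; and intercepting the key according to the preset offset to obtain the target field.
In a possible embodiment, the preset offset includes a first preset offset and a second preset offset, where the first preset offset is used to intercept a first target field, the second preset offset is used to intercept a second target field, and the target field includes the first target field and the second target field.
In a possible embodiment, the key includes delimiters, and intercepting the key to obtain a target field includes: intercepting the key according to the delimiters to obtain the target field.
In a possible embodiment, the delimiters include a first group of delimiters and a second group of delimiters, where the first group of delimiters is used to intercept a first target field, the second group of delimiters is used to intercept a second target field, and the target field includes the first target field and the second target field.
In a possible embodiment, the database used to store the user data is a database without a schema definition, and the schema-less database is stored in a distributed manner across multiple storage nodes.
In a possible embodiment, sending, according to the target feature value of the target field, the user data to the storage node corresponding to the target feature value includes: combining the target feature value with the key to obtain a new key, and sending, according to the new key, the user data to the storage node corresponding to the target feature value.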
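According to a second aspect, an embodiment of this application provides a data sharding apparatus, including:
a communication unit, configured to obtain user data, where the user data includes a key; and
a processing unit, configured to intercept the key to obtain a target field, and determine a target feature value of the target field according to a hash algorithm;
where the communication unit is further configured to send, according to the target feature value of the target field, the user data to the storage node corresponding to the target feature value, where different storage nodes correspond to different feature value ranges, and the target feature value falls within one of the feature value ranges.
In a possible embodiment, intercepting the key to obtain a target field includes: obtaining a preset offset; and intercepting the key according to the preset offset to obtain the target field.
In a possible embodiment, the preset offset includes a first preset offset and a second preset offset, where the first preset offset is used to intercept a first target field, the second preset offset is used to intercept a second target field, and the target field includes the first target field and the second target field.
In a possible embodiment, the key includes delimiters, and intercepting the key to obtain a target field includes: intercepting the key according to the delimiters to obtain the target field.
In a possible embodiment, the delimiters include a first group of delimiters and a second group of delimiters, where the first group of delimiters is used to intercept a first target field, the second group of delimiters is used to intercept a second target field, and the target field includes the first target field and the second target field.
In a possible embodiment, the database used to store the user data is a database without a schema definition, and the schema-less database is stored in a distributed manner across multiple storage nodes.
In a possible embodiment, the processing unit is further configured to prepend the target feature value to the most significant position of the key to obtain a new key; and the communication unit is specifically configured to send, according to the new key, the user data to the storage node corresponding to the target feature value.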
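According to a third aspect, an embodiment of this application provides a network device, including units for performing the method according to the first aspect.
According to a fourth aspect, an embodiment of this application provides a network device, including a processor, an input-output device, and a memory. The memory is configured to store instructions, the processor is configured to execute the instructions, and the input-output device is configured to communicate with other devices under control of the processor; when executing the instructions, the processor performs the method according to the first aspect.
According to a fifth aspect, an embodiment of this application provides a computer storage medium. The computer storage medium stores a computer program, and when the computer program is executed by a processor, the method according to the first aspect is implemented.
In the embodiments of this application, a preset offset or delimiters can be set according to how the key is composed, and the key in the user data is intercepted according to the preset offset or delimiters. In this way, even in a schema-less database system, a field with a specific meaning can be extracted as the target field. The target field is hashed to obtain a target feature value, and the user data is then sent, by range sharding on the target feature value, to the storage node corresponding to that value. Because the target feature value is produced by a hash operation, it is evenly distributed over the feature value ranges of the storage nodes. As a result, data is distributed evenly across the storage nodes, while user data with adjacent target feature values is stored on the same or adjacent storage nodes.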
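Brief Description of Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic diagram of a system for implementing data sharding according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a data sharding method according to an embodiment of this application;
FIG. 3 is a schematic diagram of the composition of a key according to an embodiment of this application;
FIG. 4 is a schematic diagram of a configuration interface for offset-based configuration according to an embodiment of this application;
FIG. 5 is a schematic diagram of intercepting a key by offset according to an embodiment of this application;
FIG. 6 is a schematic diagram of another configuration interface for offset-based configuration according to an embodiment of this application;
FIG. 7A and FIG. 7B are schematic diagrams of another way of intercepting a key by offset according to an embodiment of this application;
FIG. 8 is a schematic diagram of a configuration interface for delimiter-based configuration according to an embodiment of this application;
FIG. 9 is a schematic diagram of intercepting a key by delimiter according to an embodiment of this application;
FIG. 10 is a schematic diagram of another configuration interface for delimiter-based configuration according to an embodiment of this application;
FIG. 11A and FIG. 11B are schematic diagrams of another way of intercepting a key by delimiter according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of a data sharding apparatus according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a network device according to an embodiment of this application; and
FIG. 14 is a schematic diagram of a data sharding system according to an embodiment of this application.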
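Detailed Description
A distributed storage system spreads data across multiple storage nodes and provides high-performance, efficient, and scalable storage for large-scale workloads. Because the data must be spread across multiple storage nodes, data sharding is the technique that determines how the data is distributed among them.
Data sharding distributes the user data to be stored onto different storage nodes according to a given rule; typically, the key or the hash value of the key determines which of the storage nodes should store a piece of user data. User data to be stored usually includes multiple fields. For example, as shown in Table 1, the user data to be stored includes an ID, a name, an address, an age, a gender, and a phone number. The key may consist of one or more fields of the user data; for example, the key may be a combination of the ID field and the name field in Table 1.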
Table 1 User data to be stored (provided as an image in the original publication; fields: ID, name, address, age, gender, and phone number)
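Data sharding serves three purposes: first, even data distribution, meaning the amount of data on each storage node should be as similar as possible; second, load balancing, meaning the volume of read and write requests on each storage node should be as similar as possible; third, minimal migration, meaning as little data as possible should need to move when storage nodes are added or removed. Data sharding methods include hash sharding and range sharding, which are described below.
In range sharding, each storage node is responsible for one or more intervals, and data is stored according to the interval its key falls into. For example, assume there are three storage nodes, numbered node0, node1, and node2. If the ID field of the user data in Table 1 is used as the key and the three nodes are responsible for the intervals node0 (0,33], node1 (33,66], and node2 (66,100], then with range sharding the user data in Table 1 is stored on the three nodes as shown in Table 2: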
Table 2 Data storage
Storage node  Data stored on the node
node0  A001, A007, A016, A025, A033
node1  A046, A055, A066
node2  A082, A094
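As Table 2 shows, with range sharding the user data in Table 1 is stored in lexicographic key order, but the data is unevenly distributed across the three nodes, which in turn unbalances the load among the storage nodes.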
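The range-sharding example can be reproduced with a short Python sketch. It assumes, as Table 2 implies, that each ID is matched to an interval by its numeric part (A033 is treated as 33); the helper name `range_shard` is hypothetical. Because the intervals are left-open and right-closed, `bisect.bisect_left` over the upper bounds returns the index of the owning node.

```python
import bisect
from collections import defaultdict

# Intervals from the example: node0 (0,33], node1 (33,66], node2 (66,100].
UPPER_BOUNDS = [33, 66, 100]
NODES = ["node0", "node1", "node2"]


def range_shard(value: int) -> str:
    """Return the node whose left-open, right-closed interval contains value."""
    if not 0 < value <= UPPER_BOUNDS[-1]:
        raise ValueError(f"{value} is outside (0, {UPPER_BOUNDS[-1]}]")
    # bisect_left returns the first upper bound >= value, i.e. the owning interval.
    return NODES[bisect.bisect_left(UPPER_BOUNDS, value)]


# IDs as listed in Table 2, in key order.
IDS = ["A001", "A007", "A016", "A025", "A033",
       "A046", "A055", "A066", "A082", "A094"]

placement = defaultdict(list)
for record_id in IDS:
    placement[range_shard(int(record_id[1:]))].append(record_id)

for node in NODES:
    print(node, placement[node])
```

Each node receives a contiguous run of keys, but node0 holds five records while node2 holds two, which is the imbalance described above.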
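Hash sharding computes a feature value for the key with a hash algorithm and assigns user data to storage nodes according to a mapping between feature values and storage nodes. Continuing with the user data in Table 1: if the ID field is used as the key, the ID of each record is hashed to obtain its feature value, and each record is then stored on the storage node that the mapping associates with that feature value. On this basis, suppose the feature values computed by the hash algorithm for the keys of the user data in Table 1 are, in order, 010, 074, 018, 037, 085, 024, 043, 055, 063, and 091; feature values 010, 085, and 043 map to node0; 018, 024, 055, and 091 map to node1; and 074, 037, and 063 map to node2. After hash sharding, the user data in Table 1 is stored on the three nodes as shown in Table 3: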
Table 3 Data storage
Storage node  Data stored on the node
node0  A001, A033, A055
node1  A016, A046, A066, A094
node2  A007, A025, A082
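As Table 3 shows, the user data in Table 1 is distributed fairly evenly across the three storage nodes. However, the user data in Table 1 was arranged in lexicographic key order; after hash sharding, the data is evenly spread across the storage nodes, but that ordering is lost.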
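A minimal sketch of the hash-sharding example follows. Rather than computing real hashes, it replays the feature values and the feature-value-to-node mapping given above, so it only illustrates the lookup step; the variable names are hypothetical.

```python
from collections import defaultdict

IDS = ["A001", "A007", "A016", "A025", "A033",
       "A046", "A055", "A066", "A082", "A094"]
# Example feature values, in the same order as IDS (normally produced by hashing each key).
FEATURE_VALUES = ["010", "074", "018", "037", "085",
                  "024", "043", "055", "063", "091"]
# Example mapping between feature values and storage nodes.
FEATURE_TO_NODE = {
    "010": "node0", "085": "node0", "043": "node0",
    "018": "node1", "024": "node1", "055": "node1", "091": "node1",
    "074": "node2", "037": "node2", "063": "node2",
}

placement = defaultdict(list)
for record_id, feature in zip(IDS, FEATURE_VALUES):
    placement[FEATURE_TO_NODE[feature]].append(record_id)

for node in ("node0", "node1", "node2"):
    print(node, placement[node])
```

Consecutive IDs such as A001 and A007 land on different nodes, which is the loss of key order described above.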
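To address the problems of both methods, hash sharding and range sharding can be combined. Continuing with Table 1, suppose {ID + name} is selected as the key. With the combined method, only the ID part of the key is hashed to obtain its feature value; that feature value is then prepended to the most significant position of the key to form a new key; finally, range sharding routes the data to the corresponding storage node according to the new key. For example, suppose the three storage nodes are responsible for the intervals node0 (0,33], node1 (33,66], and node2 (66,100], and the feature values computed by the hash algorithm for the user data in Table 1 are, in order, 010, 074, 018, 037, 085, 024, 043, 055, 063, and 091. After range sharding routes the user data to the corresponding nodes, the user data in Table 1 is stored on the three storage nodes as shown in Table 4.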
Table 4 Data storage
Storage node  Data stored on the node
node0  A001, A016, A046
node1  A025, A055, A066, A082
node2  A007, A033, A094
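When routing by the new key, the feature value at the front of the new key determines the route. Because each feature value is produced by a hash algorithm and is evenly distributed over (0,100], range sharding on the new key stores multiple records with the same ID on the same storage node, distributes data evenly across the nodes, and stores data whose keys have adjacent feature values next to each other.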
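The combined scheme can be sketched as follows, again replaying the example feature values instead of hashing. Only the ID follows the feature value because the names in Table 1 are not reproduced here (an assumption of this sketch); the leading three digits of each new key decide the interval, as in the range-sharding example. `route_new_key` is a hypothetical name.

```python
import bisect
from collections import defaultdict

UPPER_BOUNDS = [33, 66, 100]          # node0 (0,33], node1 (33,66], node2 (66,100]
NODES = ["node0", "node1", "node2"]
IDS = ["A001", "A007", "A016", "A025", "A033",
       "A046", "A055", "A066", "A082", "A094"]
# Example feature values of the ID part, in the same order as IDS.
FEATURE_VALUES = ["010", "074", "018", "037", "085",
                  "024", "043", "055", "063", "091"]


def route_new_key(new_key: str) -> str:
    """Range-route a new key; its leading 3-digit feature value decides the node."""
    feature = int(new_key[:3])
    return NODES[bisect.bisect_left(UPPER_BOUNDS, feature)]


placement = defaultdict(list)
for record_id, feature in zip(IDS, FEATURE_VALUES):
    new_key = feature + record_id     # feature value prepended to the most significant position
    placement[route_new_key(new_key)].append(new_key)

for node in NODES:
    print(node, sorted(placement[node]))
```

Sorting each node's new keys shows that keys with adjacent feature values (for example 037A025 and 043A055) sit next to each other on the same node, while the load stays close to even.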
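However, this solution applies only to database systems with a schema. A schema-aware database system knows the meaning of each field; in the example above, it knows that the key combines the ID field and the name field, so when combining hash sharding and range sharding it can identify the ID field in the key and hash only that field. In a schema-less database system, such as HBase, Google Bigtable, or MongoDB, the system cannot tell what information each field of the key represents, so it cannot hash a specific meaningful field of the key and therefore cannot combine hash sharding and range sharding in this way.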
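To address this problem, an embodiment of this application provides a data sharding method. As shown in FIG. 1, an administrator configures, on the configuration page of a management terminal and according to how the key of the user data is composed, how that key is to be processed; configuration information is generated and sent to a management server. After obtaining user data, any one of the one or more user terminals obtains the configuration information from the management server, processes the key according to the configuration information, and then sends the user data to a storage node according to the processed key.
Specifically, refer to FIG. 2, which is a schematic flowchart of a data sharding method according to an embodiment of this application. As shown in FIG. 2, the data sharding method includes the following steps.
101. Obtain user data, where the user data includes a key.
102. Intercept the key to obtain a target field.
In this embodiment of this application, either part of the key or the entire key may be intercepted as the target field; this is not specifically limited.
103. Determine a target feature value of the target field according to a hash algorithm.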
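After the target field is obtained by intercepting the key, a hash operation is performed on the target field with a hash algorithm, for example a secure hash algorithm (SHA), to obtain a fixed-length target feature value corresponding to the target field.
Optionally, the hash algorithm may be any one of Message-Digest Algorithm 5 (MD5), SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, and the like; this is not specifically limited in the embodiments of this application.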
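As a sketch of step 103, Python's standard `hashlib` module provides every algorithm named above through `hashlib.new`. The function name and the choice to return a hexadecimal digest are assumptions; each algorithm yields an output of fixed length regardless of how long the target field is.

```python
import hashlib

# Algorithms named in the text, spelled as hashlib identifiers.
SUPPORTED_ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def target_feature_value(target_field: str, algorithm: str = "sha256") -> str:
    """Hash the target field and return a fixed-length hexadecimal digest."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return hashlib.new(algorithm, target_field.encode("utf-8")).hexdigest()


for name in SUPPORTED_ALGORITHMS:
    digest = target_feature_value("A001B012", name)
    print(f"{name:7s} {len(digest) * 4:4d}-bit  {digest[:16]}...")
```

The loop prints digest sizes of 128, 160, 224, 256, 384, and 512 bits, which is the fixed-length property the step relies on.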
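104. Send, according to the target feature value of the target field, the user data to the storage node corresponding to the target feature value, where different storage nodes correspond to different feature value ranges, and the target feature value falls within one of the feature value ranges.
In this embodiment of this application, before the key is intercepted to obtain the target field in step 102, the method further includes: obtaining configuration information from a management server, where the configuration information includes the interception mode for the key, the interception length, and the selected hash algorithm. The administrator sets the configuration information according to how the key is composed. The key may consist of one or more fields of the user data to be stored and may be composed in either of the following two ways: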
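In the first way, if at least one type of information in the user data to be stored has a fixed field length, the key can be designed by combining a fixed-length type of information with a variable-length type, or by combining several fixed-length types. Continuing with Table 1, the ID, age, and phone fields have fixed lengths. As shown in FIG. 3, where the user data consists of multiple fields, the combination {ID + address} can be used as the key; for example, A001Guangdong is the key of the user data whose ID is A001. When the user data is read, because the ID field has a fixed length, the ID can be extracted directly from the first 4 bytes of the key, and the remaining characters form the address field. Alternatively, the combination {ID + age + phone} can be used as the key, for example A094251536492xxxx; because all three fields have fixed lengths, the key can be split according to each field's length and the order of combination to recover every type of information it contains.
In the second way, the fields representing different types of information are joined with delimiters to form the key. Continuing with Table 1, if {ID + name} is used as the key, a delimiter can be inserted between the two fields; for example, A001Zhaosan becomes A001^Zhaosan. When the user data is read, the key can then be split directly at the delimiters to recover every type of information it contains.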
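The two key designs can be parsed as sketched below. The field widths for {ID + age + phone} (a 4-character ID, a 2-digit age, and an 11-character phone number) are assumptions inferred from the example key A094251536492xxxx, and the function names are hypothetical.

```python
from __future__ import annotations


def parse_fixed_length_key(key: str, widths: list[int]) -> list[str]:
    """Split a key built from fixed-width fields; any remainder after the
    fixed-width prefix is returned as one trailing variable-length field."""
    fields, pos = [], 0
    for width in widths:
        fields.append(key[pos:pos + width])
        pos += width
    if pos < len(key):
        fields.append(key[pos:])
    return fields


def parse_delimited_key(key: str, delimiter: str = "^") -> list[str]:
    """Split a delimiter-joined key, dropping empty pieces such as a trailing delimiter."""
    return [field for field in key.split(delimiter) if field]


print(parse_fixed_length_key("A001Guangdong", [4]))             # {ID + address}
print(parse_fixed_length_key("A094251536492xxxx", [4, 2, 11]))  # {ID + age + phone}
print(parse_delimited_key("A001^Zhaosan"))                      # {ID + name}
```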
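If the key of the user data is designed in the first way, the administrator can set the interception mode in the configuration information to interception by preset offset; if it is designed in the second way, the administrator can set the interception mode to interception by delimiter.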
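In a possible embodiment, if the key of the user data is designed in the first way, the administrator configures, on the configuration interface provided by the management server, the interception mode and interception length for the key. As shown in FIG. 4, the administrator selects offset-based interception on the configuration interface of the management terminal and sets the interception length to bytes i through j, where i and j are positive integers and j is greater than i. After the administrator completes the configuration, the management terminal generates the following configuration information: {interceptiontype:“offset”;properties:{start:i,stop:j}}, and sends it to the management server. After obtaining the user data, the user terminal obtains the configuration information from the management server and intercepts the key in the user data according to the offset in the configuration information. FIG. 5 is a schematic diagram of intercepting a key by offset according to an embodiment of this application. According to the configuration information, the user terminal intercepts the bytes within the preset offset, counting from the most significant byte of the key toward the least significant byte, as the target field. If the key of the user data is A001B01234220180523 and the administrator configures i = 1 and j = 8, that is, the preset offset in the configuration information is 8, the target field intercepted from the most significant byte of the key is A001B012.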
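Offset-based interception amounts to a slice. This sketch assumes an ASCII key, so one character equals one byte, and treats `start` and `stop` as 1-based inclusive positions counted from the most significant (leftmost) byte, matching i and j in the configuration information; `intercept_by_offset` is a hypothetical name.

```python
def intercept_by_offset(key: str, start: int, stop: int) -> str:
    """Return bytes start..stop (1-based, inclusive) counted from the most
    significant (leftmost) byte of an ASCII key."""
    if not 1 <= start <= stop <= len(key):
        raise ValueError(f"offset range ({start}, {stop}) is outside the key")
    return key[start - 1:stop]


print(intercept_by_offset("A001B01234220180523", 1, 8))  # i = 1, j = 8 -> A001B012
```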
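In another possible embodiment, if the key of the user data is designed in the first way, the administrator can configure, on the configuration interface provided by the management server, the interception mode and interception length for the key. As shown in FIG. 6, the administrator selects offset-based interception on the configuration interface of the management terminal and sets the interception length to bytes i through j and bytes k through m, where i, j, k, and m are positive integers, j > i, and m > k. That is, the preset offset includes a first preset offset and a second preset offset; the first preset offset is used to intercept a first target field, the second preset offset is used to intercept a second target field, and the target field includes the first target field and the second target field. After the administrator completes the configuration, the management terminal generates the following configuration information: {interceptiontype:“offset”;properties:{start:i,stop:j};{start:k,stop:m}}, and sends it to the management server. After obtaining the user data, the user terminal obtains the configuration information from the management server and intercepts the key in the user data according to the offsets in the configuration information.
Optionally, as shown in FIG. 7A, the first target field and the second target field may overlap, that is, i<k<j≤m or i≤k<m≤j. For example, if the administrator configures i=1, j=6, k=4, and m=7, the first preset offset covers bytes 1 through 6 counted from the most significant byte of the key, and the second preset offset covers bytes 4 through 7. Continuing with the key A001B01234220180523, the first target field intercepted is A001B0, the second target field is 1B01, and the target field is A001B01B01. As shown in FIG. 7B, the first preset offset and the second preset offset may also not overlap, that is, i<j<k<m. For example, the first preset offset covers bytes 1 through 4 counted from the most significant byte of the key, and the second preset offset covers bytes 1 through 8 counted from the least significant byte. Continuing with the key A001B01234220180523, the first target field is A001, the second target field is 20180523, and the target field is A00120180523.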
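With two preset offsets, the intercepted pieces are concatenated in configuration order. The sketch below (hypothetical names, ASCII key assumed) expresses every range from the most significant byte; for FIG. 7B, the last eight bytes of this 19-byte key are therefore written as bytes 12 through 19.

```python
from __future__ import annotations


def intercept_by_offsets(key: str, ranges: list[tuple[int, int]]) -> str:
    """Concatenate the 1-based, inclusive byte ranges of an ASCII key in the
    order they are configured; ranges may overlap."""
    pieces = []
    for start, stop in ranges:
        if not 1 <= start <= stop <= len(key):
            raise ValueError(f"offset range ({start}, {stop}) is outside the key")
        pieces.append(key[start - 1:stop])
    return "".join(pieces)


key = "A001B01234220180523"
print(intercept_by_offsets(key, [(1, 6), (4, 7)]))    # FIG. 7A: A001B0 + 1B01
print(intercept_by_offsets(key, [(1, 4), (12, 19)]))  # FIG. 7B: A001 + 20180523
```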
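In a possible embodiment, if the key of the user data is designed in the second way, the administrator configures, on the configuration interface provided by the management server, the interception mode and interception length for the key. As shown in FIG. 8, the administrator selects delimiter-based interception on the configuration interface of the management terminal and sets the interception length to run from the i-th delimiter to the j-th delimiter, where i and j are integers, j is greater than i, and i = 0 means interception starts from the most significant byte of the key. After the administrator completes the configuration, the management terminal generates the following configuration information: {interceptiontype:“splitchars”;properties:{split:“^”,start:i,stop:j}}, and sends it to the management server. After obtaining the user data, the user terminal obtains the configuration information from the management server and intercepts the key in the user data according to the interception length in the configuration information. As shown in FIG. 9, according to the configuration information, the terminal intercepts the key at the delimiters, from the most significant byte toward the least significant byte, to form the target field. If the key of the user data is A001^B012^342^20180523^, the start delimiter in the configuration information is the 0th delimiter, and the end delimiter is the 2nd delimiter, then, starting from the most significant byte of the key, the field before the first delimiter and the field between the first and second delimiters are intercepted as the target field; the target field obtained with this configuration is A001B012.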
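Delimiter-based interception can be sketched by splitting the key and joining the fields between the two configured delimiters. The sketch assumes, as in the example key, that every field is terminated by the delimiter and that delimiter 0 denotes the start of the key; `intercept_by_delimiters` is a hypothetical name.

```python
def intercept_by_delimiters(key: str, delimiter: str, start: int, stop: int) -> str:
    """Join the fields between the start-th and stop-th delimiter, where
    delimiter 0 stands for the beginning of the key and each field is assumed
    to be terminated by a delimiter (as in A001^B012^342^20180523^)."""
    fields = key.split(delimiter)
    if not 0 <= start < stop <= key.count(delimiter):
        raise ValueError(f"delimiter range ({start}, {stop}) is outside the key")
    return "".join(fields[start:stop])


print(intercept_by_delimiters("A001^B012^342^20180523^", "^", 0, 2))  # A001 + B012
```

The delimiters themselves are not part of the target field, which is why the result is A001B012 rather than A001^B012.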
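In another possible embodiment, if the key of the user data is designed in the second way, the administrator can configure, on the configuration interface provided by the management server, the interception mode and interception length for the key. As shown in FIG. 10, the administrator selects delimiter-based interception on the configuration interface of the management terminal and sets the interception length to run from the i-th delimiter to the j-th delimiter and from the k-th delimiter to the m-th delimiter, where i, j, k, and m are integers, j > i, m > k, and i = 0 means interception starts from the most significant byte of the key. That is, the delimiters include a first group of delimiters and a second group of delimiters; the first group is used to intercept a first target field, the second group is used to intercept a second target field, and the target field includes the first target field and the second target field. After the administrator completes the configuration, the management terminal generates the following configuration information: {interceptiontype:“splitchars”;properties:{split:“^”,start:i,stop:j};{split:“^”,start:k,stop:m}}, and sends it to the management server. After obtaining the user data, the user terminal obtains the configuration information from the management server and intercepts the key in the user data according to the delimiters in the configuration information.
Optionally, as shown in FIG. 11A, the first target field and the second target field may overlap, that is, i<k<j≤m or i≤k<m≤j. For example, if the administrator configures i=0, j=m=2, and k=1, the first group of delimiters in the configuration information starts at the 0th delimiter and ends at the 2nd delimiter, and the second group starts at the 1st delimiter and ends at the 2nd delimiter. If the key of the user data is A001^B012^342^20180523^, the first target field intercepted according to this configuration is A001B012, the second target field is B012, and the combined target field is A001B012B012. As shown in FIG. 11B, the first target field and the second target field may also be non-adjacent, that is, i<j<k<m. For example, with i=1, j=2, k=3, and m=4, the first group of delimiters starts at the 1st delimiter and ends at the 2nd delimiter, and the second group starts at the 3rd delimiter and ends at the 4th delimiter. If the key of the user data is A001^B012^342^20180523^, the first target field intercepted according to this configuration is B012, the second target field is 20180523, and the combined target field is B01220180523.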
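Extending delimiter-based interception to several delimiter groups, each group yields one piece and the pieces are concatenated in configuration order. The sketch uses the same conventions as the single-group case (delimiter 0 is the start of the key, every field is terminated by a delimiter); the function name is hypothetical.

```python
from __future__ import annotations


def intercept_by_delimiter_groups(key: str, delimiter: str,
                                  groups: list[tuple[int, int]]) -> str:
    """For each (start, stop) delimiter group, join the fields between the
    start-th and stop-th delimiter (delimiter 0 = start of the key), then
    concatenate the per-group results in configuration order."""
    fields = key.split(delimiter)
    count = key.count(delimiter)
    pieces = []
    for start, stop in groups:
        if not 0 <= start < stop <= count:
            raise ValueError(f"delimiter range ({start}, {stop}) is outside the key")
        pieces.append("".join(fields[start:stop]))
    return "".join(pieces)


key = "A001^B012^342^20180523^"
print(intercept_by_delimiter_groups(key, "^", [(0, 2), (1, 2)]))  # FIG. 11A
print(intercept_by_delimiter_groups(key, "^", [(1, 2), (3, 4)]))  # FIG. 11B
```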
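In this embodiment of this application, after the target field is obtained by intercepting the key, a hash operation is performed on the target field with the hash algorithm selected in the configuration information to obtain the target feature value, and the user data is then assigned to the corresponding storage node by range-based sharding on the target feature value.
In a possible embodiment, after the target field is obtained by intercepting the key, a hash operation is performed on the target field with the hash algorithm selected in the configuration information to obtain the target feature value; the target feature value is then prepended to the most significant position of the key to obtain a new key; finally, the user data is assigned to the corresponding storage node by range-based sharding on the new key. For example, suppose the hash algorithm selected in the configuration information is R = H(S), where H is the hash function of the selected hash algorithm, S is the target field, and R is the target feature value obtained by hashing S. If the target field S is A001B012 (for instance, when the entire key A001B012 is taken as the target field) and the resulting target feature value is R = H(A001B012) = 5683, the new key is 5683A001B012.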
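The new-key construction and range routing can be sketched as follows. The 4-digit decimal feature value, the MD5 choice, the modulo reduction of the digest, and the split points are all assumptions made for illustration; the sketch does not reproduce the value 5683, which is only an example in the text. Routing compares the new key lexicographically against split points, similar to how row-key ranges are assigned to regions in HBase.

```python
import bisect
import hashlib

FEATURE_DIGITS = 4                # hypothetical: feature values are 4-digit decimals
# Hypothetical split points: node0 [0000, 3334), node1 [3334, 6667), node2 [6667, 9999].
SPLIT_POINTS = ["3334", "6667"]


def feature_value(target_field: str, algorithm: str = "md5") -> str:
    """R = H(S): hash the target field and reduce the digest to a zero-padded
    decimal of FEATURE_DIGITS digits (the reduction is a simplification)."""
    digest = hashlib.new(algorithm, target_field.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") % 10 ** FEATURE_DIGITS
    return str(value).zfill(FEATURE_DIGITS)


def make_new_key(key: str, target_field: str, algorithm: str = "md5") -> str:
    """Prepend the target feature value to the most significant position of the key."""
    return feature_value(target_field, algorithm) + key


def route(new_key: str) -> int:
    """Range sharding on the new key: compare it lexicographically with the split points."""
    return bisect.bisect_right(SPLIT_POINTS, new_key)


example_key = make_new_key("A001B012", "A001B012")
print(example_key, "-> node", route(example_key))
print("5683A001B012 -> node", route("5683A001B012"))  # the example new key from the text
```

Because the fixed-width prefix dominates the lexicographic comparison, all records sharing a target field share a prefix and therefore a node, while different target fields spread across the feature space.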
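By implementing the embodiments of this application, an administrator can set a preset offset or delimiters according to how the key is composed, and the key in the user data is intercepted according to the preset offset or delimiters. Even in a schema-less database system, a field with a specific meaning can thus be extracted as the target field and hashed to obtain a target feature value, and the user data is sent, by range sharding on the target feature value, to the storage node corresponding to that value. Because the target feature value is produced by a hash operation, it is evenly distributed over the feature value ranges of the storage nodes; consequently, data is distributed evenly across the storage nodes, while user data with adjacent target feature values is stored on the same or adjacent storage nodes.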
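Refer to FIG. 12, which is a schematic structural diagram of a data sharding apparatus according to an embodiment of this application. As shown in FIG. 12, the network device 200 includes at least a communication unit 210 and a processing unit 220, where:
the communication unit 210 is configured to obtain user data, where the user data includes a key;
the processing unit 220 is configured to intercept the key to obtain a target field, and determine a target feature value of the target field according to a hash algorithm; and
the communication unit 210 is further configured to send, according to the target feature value of the target field, the user data to the storage node corresponding to the target feature value, where different storage nodes correspond to different feature value ranges, and the target feature value falls within one of the feature value ranges.
In this embodiment of this application, after the communication unit 210 obtains the user data and obtains, from the management server, the configuration information for processing the key in the user data, the processing unit 220 intercepts the key according to the configuration information to obtain a target field and hashes the target field with the hash algorithm selected in the configuration information to obtain the target feature value of the target field; the communication unit 210 then sends the user data to the corresponding storage node according to the target feature value.
Specifically, for the implementation of the operations performed by the network device 200, refer to the specific operations in the foregoing method embodiment; details are not described herein again.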
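Refer to FIG. 13, which is a schematic structural diagram of a network device according to an embodiment of this application. The network device 300 includes at least a processor 310, an input-output device 320, and a memory 330, which are connected to each other through a bus 340, where:
the processor 310 may be a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The memory 330 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or flash memory. The memory 330 is configured to store program code and data and can transmit the stored data to the processor 310.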
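The processor 310 in the network device 300 is configured to read the related instructions in the memory 330 and perform the following operations:
the processor 310 controls the input-output device 320 to receive user data, where the user data includes a key;
the processor 310 intercepts the key to obtain a target field, and determines a target feature value of the target field according to a hash algorithm; and
the processor 310 controls the input-output device 320 to send, according to the target feature value of the target field, the user data to the storage node corresponding to the target feature value.
The network device is applied in the data sharding system shown in FIG. 14. The system shown in FIG. 14 includes a user equipment cluster composed of multiple user equipments (that is, the network devices described above) and a storage node cluster composed of multiple storage nodes. The system of FIG. 14 can provide a database cloud service to users, in which the storage node cluster provides a storage cloud service. After the input device of any one or more of the user equipments receives user data, the user equipment obtains configuration information from a management server; the processor in the user equipment calls the program code stored in the memory, intercepts the key in the user data according to the configuration information to obtain a target field, and hashes the target field with the hash algorithm selected in the configuration information to obtain the target feature value of the target field; the output device in the user equipment then sends the user data to the corresponding storage node according to the target feature value.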
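The end-to-end flow on a user equipment can be sketched as below: read the configuration information, intercept the key, hash the target field, prepend the feature value, and pick a storage node by range. The dictionary layout of the configuration (including the `hash` entry), the 4-hex-digit prefix, and the split points for three storage nodes are all hypothetical; the original configuration strings are not JSON and their exact encoding is not specified.

```python
from __future__ import annotations

import bisect
import hashlib

# Hypothetical in-memory form of the configuration information fetched from
# the management server; the "hash" entry names the selected hash algorithm.
CONFIG = {
    "interceptiontype": "splitchars",
    "properties": [{"split": "^", "start": 0, "stop": 2}],
    "hash": "sha256",
}
# Hypothetical split points on a 4-hex-digit prefix: three storage nodes.
SPLIT_POINTS = ["5555", "aaaa"]


def intercept(key: str, config: dict) -> str:
    """Intercept the target field by offset or by delimiter, per the configuration."""
    pieces = []
    for prop in config["properties"]:
        if config["interceptiontype"] == "offset":
            pieces.append(key[prop["start"] - 1:prop["stop"]])  # 1-based, inclusive
        elif config["interceptiontype"] == "splitchars":
            fields = key.split(prop["split"])
            pieces.append("".join(fields[prop["start"]:prop["stop"]]))
        else:
            raise ValueError(f"unknown interception type: {config['interceptiontype']}")
    return "".join(pieces)


def place(key: str, config: dict) -> tuple[str, int]:
    """Return (new key, storage node index) for one record's key."""
    target_field = intercept(key, config)
    prefix = hashlib.new(config["hash"], target_field.encode("utf-8")).hexdigest()[:4]
    new_key = prefix + key
    # Lowercase hex of equal width sorts lexicographically in numeric order.
    return new_key, bisect.bisect_right(SPLIT_POINTS, new_key)


new_key, node = place("A001^B012^342^20180523^", CONFIG)
print(new_key, "-> storage node", node)
```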
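Specifically, for the implementation of the operations performed by the network device 300, refer to the specific operations in the foregoing method embodiment; details are not described herein again.
All or some of the foregoing embodiments may be implemented by software, hardware, or any combination thereof. When software is used, the embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are all or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk (SSD)).
The embodiments of this application are described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the descriptions of the foregoing embodiments are merely intended to help understand the method and core idea of this application. In addition, a person of ordinary skill in the art may make modifications to the specific implementations and application scope based on the idea of this application. In conclusion, the content of this specification shall not be construed as a limitation on this application.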

Claims (16)

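  1. A data sharding method, comprising:
    obtaining user data, wherein the user data comprises a key;
    intercepting the key to obtain a target field;
    determining a target feature value of the target field according to a hash algorithm; and
    sending, according to the target feature value of the target field, the user data to a storage node corresponding to the target feature value, wherein different storage nodes correspond to different feature value ranges, and the target feature value falls within one of the feature value ranges.
  2. The method according to claim 1, wherein intercepting the key to obtain a target field comprises:
    obtaining a preset offset; and
    intercepting the key according to the preset offset to obtain the target field.
  3. The method according to claim 2, wherein the preset offset comprises a first preset offset and a second preset offset, the first preset offset is used to intercept a first target field, the second preset offset is used to intercept a second target field, and the target field comprises the first target field and the second target field.
  4. The method according to claim 1, wherein the key comprises delimiters, and intercepting the key to obtain a target field comprises:
    intercepting the key according to the delimiters to obtain the target field.
  5. The method according to claim 4, wherein the delimiters comprise a first group of delimiters and a second group of delimiters, the first group of delimiters is used to intercept a first target field, the second group of delimiters is used to intercept a second target field, and the target field comprises the first target field and the second target field.
  6. The method according to any one of claims 1 to 5, wherein a database used to store the user data is a database without a schema definition, and the schema-less database is stored in a distributed manner across a plurality of storage nodes.
  7. The method according to any one of claims 1 to 6, wherein sending, according to the target feature value of the target field, the user data to the storage node corresponding to the target feature value comprises:
    combining the target feature value with the key to obtain a new key, and sending, according to the new key, the user data to the storage node corresponding to the target feature value.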
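  8. A data sharding apparatus, comprising:
    a communication unit, configured to obtain user data, wherein the user data comprises a key; and
    a processing unit, configured to intercept the key to obtain a target field,
    and determine a target feature value of the target field according to a hash algorithm;
    wherein the communication unit is further configured to send, according to the target feature value of the target field, the user data to a storage node corresponding to the target feature value, wherein different storage nodes correspond to different feature value ranges, and the target feature value falls within one of the feature value ranges.
  9. The apparatus according to claim 8, wherein intercepting the key to obtain a target field comprises:
    obtaining a preset offset; and
    intercepting the key according to the preset offset to obtain the target field.
  10. The apparatus according to claim 9, wherein the preset offset comprises a first preset offset and a second preset offset, the first preset offset is used to intercept a first target field, the second preset offset is used to intercept a second target field, and the target field comprises the first target field and the second target field.
  11. The apparatus according to claim 8, wherein the key comprises delimiters, and intercepting the key to obtain a target field comprises:
    intercepting the key according to the delimiters to obtain the target field.
  12. The apparatus according to claim 11, wherein the delimiters comprise a first group of delimiters and a second group of delimiters, the first group of delimiters is used to intercept a first target field, the second group of delimiters is used to intercept a second target field, and the target field comprises the first target field and the second target field.
  13. The apparatus according to any one of claims 8 to 12, wherein a database used to store the user data is a database without a schema definition, and the schema-less database is stored in a distributed manner across a plurality of storage nodes.
  14. The apparatus according to any one of claims 8 to 13, wherein
    the processing unit is further configured to combine the target feature value with the key to obtain a new key; and
    the communication unit is specifically configured to send, according to the new key, the user data to the storage node corresponding to the target feature value.
  15. A network device, comprising a processor, an input-output device, and a memory, wherein the memory is configured to store instructions, the processor is configured to execute the instructions, and the input-output device is configured to communicate with other devices under control of the processor; and when executing the instructions, the processor performs the method according to any one of claims 1 to 7.
  16. A non-transitory computer storage medium, wherein the computer storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.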
PCT/CN2019/080697 2018-07-24 2019-03-30 Data sharding method, related device, and computer storage medium WO2020019749A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810824276.XA CN110851525B (zh) 2018-07-24 2018-07-24 Data sharding method, related device, and computer storage medium
CN201810824276.X 2018-07-24

Publications (1)

Publication Number Publication Date
WO2020019749A1 true WO2020019749A1 (zh) 2020-01-30

Family

ID=69182120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/080697 WO2020019749A1 (zh) 2018-07-24 2019-03-30 Data sharding method, related device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN110851525B (zh)
WO (1) WO2020019749A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552695A (zh) * 2020-06-04 2020-08-18 支付宝(杭州)信息技术有限公司 Data storage and query method and apparatus, and machine-readable storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
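CN113301084A (zh) * 2020-06-30 2021-08-24 阿里巴巴集团控股有限公司 Data processing method and apparatus
CN112364251B (zh) * 2020-12-03 2021-08-17 腾讯科技(深圳)有限公司 Data recommendation method and apparatus, electronic device, and storage medium
CN114253747B (zh) * 2021-12-27 2023-04-28 北京宇信科技集团股份有限公司 Distributed message management system and method
CN116346826B (zh) * 2023-05-30 2023-08-04 工业富联(佛山)创新中心有限公司 Database node deployment method and apparatus, electronic device, and storage medium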

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
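CN102724063A (zh) * 2012-05-11 2012-10-10 北京邮电大学 Log collection server, data packet distribution and log clustering methods, and network
CN105100146A (zh) * 2014-05-07 2015-11-25 腾讯科技(深圳)有限公司 Data storage method, apparatus, and system
CN106503010A (zh) * 2015-09-07 2017-03-15 北京国双科技有限公司 Method and apparatus for writing database changes to partitions
CN106844676A (zh) * 2017-01-24 2017-06-13 北京奇虎科技有限公司 Data storage method and apparatus
CN107154957A (zh) * 2016-12-29 2017-09-12 贵州电网有限责任公司铜仁供电局 Distributed storage control method based on a virtual-ring load-balancing algorithm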

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110307484A1 (en) * 2010-06-11 2011-12-15 Nitin Dinesh Anand System and method of addressing and accessing information using a keyword identifier
CN108153849B (zh) * 2017-12-20 2020-10-23 杭州登虹科技有限公司 Database table splitting method, apparatus, system, and medium

Also Published As

Publication number Publication date
CN110851525B (zh) 2022-08-26
CN110851525A (zh) 2020-02-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19840488

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19840488

Country of ref document: EP

Kind code of ref document: A1