CN102968498B

CN102968498B - Data processing method and device

Info

Publication number: CN102968498B
Application number: CN201210516613.1A
Authority: CN
Inventors: Zhang Wei (张巍); Lei Xiaosong (雷晓松)
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2012-12-05
Filing date: 2012-12-05
Publication date: 2016-08-10
Anticipated expiration: 2032-12-05
Also published as: CN102968498A

Abstract

Embodiments of the present invention relate to a data processing method and device. The method includes: obtaining data to be stored and the data identifier of the data to be stored; calculating, from the data identifier, a first partition in which the data is to be stored, and obtaining the first node to which the first partition belongs; storing the data identifier and the data separately on the first node and recording the data storage address; and generating index information from the data identifier, the identifier of the first partition in which the data resides, and the data storage address, and adding the index information to the index area of the first node. Embodiments of the present invention avoid a full scan of the hard disk, reduce the number of hard-disk read operations during partition migration, and improve data read efficiency.

Description

Data processing method and device

Technical field

The present invention relates to the field of storage system technologies, and in particular to a data processing method and device.

Background art

Cloud storage, like cloud computing, refers to a system that uses functions such as cluster applications, grid technology, or distributed file systems to aggregate a large number of storage devices of various types in a network through application software so that they work together, jointly providing data storage and business access functions to external users.

Cloud storage uses distributed hash table (DHT) technology to form a distributed file system (that is, a distributed storage cluster, hereinafter "cluster"), in which each storage node is assigned multiple independent partitions (Partitions) according to a consistent hashing algorithm. Because cloud storage builds its mass storage pool from cheap, low-reliability hardware, storage hardware failures are the norm; at the same time, to supply resources elastically, storage nodes frequently join or leave the cluster dynamically. When a node fails, or when capacity is dynamically expanded or reduced, the partitions are redistributed across the nodes and the range of partitions managed by each node changes. A load-balancing (rebalance) migration is then performed: some partitions must move to other nodes, the partitions held by a failed node are taken over by other nodes, and a newly added node takes over some of the partitions held by existing nodes, so that load remains balanced across the storage nodes.

Existing data storage formats generally adopt the organization of the open-source Tokyo Cabinet HDB/BDB/FDB types (hash database / B+ tree database / fixed-length database). The data stored on a storage node consists of a series of key-value pairs. The node's bucket array (Bucket Array) holds key-value linked lists in order, each corresponding to a sequential bucket identifier (Bucket ID), while the storage addresses of the key-value pairs on the node's storage medium (for example, a hard disk) are scattered. That is, key-value pairs belonging to different bucket IDs, and even different key-value pairs belonging to the same bucket ID, are spread across the hard disk.

Consequently, when migrating data, a storage node must scan the entire hard disk: for each key-value pair scanned, it compares the partition information stored in the value with the partitions designated for migration, and migrates the pair if they match. This wastes hard-disk reads severely: the partitions that actually need to migrate may occupy less than 5% of the disk's total capacity, yet the whole disk is scanned. Moreover, during migration key-value pairs can only be matched and sent serially, so the number of disk I/Os is excessive and efficiency is low.

Summary of the invention

In view of this, an object of the present invention is to provide a data processing method and device that do not require a full scan of the hard disk, reduce the number of hard-disk read operations during partition migration, and improve data read efficiency. Key-value pairs can be sent concurrently during data migration, which increases inter-node bandwidth utilization and improves data read performance.

To achieve the above object, a first aspect of the embodiments of the present invention provides a data processing method, including:

Obtaining data to be stored and the data identifier of the data to be stored;

Calculating, from the data identifier, a first partition in which the data to be stored is to be stored, and obtaining a first node to which the first partition belongs;

Storing the data identifier and the data to be stored separately on the first node, and recording the data storage address; and

Generating index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and adding the index information to the index area of the first node.

With reference to the first aspect, in a first possible implementation of the first aspect, the index area includes at least one sub-index area. The sub-index area in which the generated index information is to be stored is determined either from the result of hashing the data identifier in the generated index information or from the sort order of the data identifiers, and the generated index information is stored in the determined sub-index area.

With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, generating index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and adding the index information to the index area of the first node, includes:

Comparing the data identifier in the generated index information with the data identifiers of the existing index information in the index area of the first node; determining, according to a preset order, the storage position of the generated index information among the existing index information; and adding the generated index information at that storage position.

With reference to the first aspect or the first possible implementation of the first aspect, in a third possible implementation, after the first node to which the first partition belongs is obtained, the method further includes:

Determining whether the index area of the first node contains index information with the same data identifier as the data to be stored. If it does not, the step of storing the data identifier and the data to be stored separately on the first node is performed; if it does, that step is not performed.

With reference to the first aspect or the first possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:

Obtaining the data identifier of input data to be queried;

Calculating, from the data identifier of the data to be queried, a second partition in which the data to be queried resides, and obtaining a second node to which the second partition belongs;

Querying the index area of the second node for index information that matches the data identifier of the data to be queried, where each piece of index information in the index area includes the data identifier of stored data, the partition in which it resides, and its data storage address; and

Reading the data to be queried from the second node according to the data storage address in the matched index information of the data to be queried.

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, before the data to be queried is read from the second node, the method further includes:

Sorting the data storage addresses of multiple pieces of data to be queried, and reading the data to be queried sequentially, in sorted order, from the corresponding second nodes.

With reference to the first aspect or the first possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the method further includes:

Obtaining a partition to be migrated when a preset partition migration condition is met;

Obtaining a third node to which the partition to be migrated belongs;

Matching, in the index area of the third node, all index information whose partition identifier is the same as that of the partition to be migrated, where each piece of index information in the index area of the third node includes the data identifier of stored data, the partition in which it resides, and its data storage address; and

Sorting the data storage addresses in the matched index information and sending the sorted result to the third node, so that the third node reads the data to be migrated sequentially and migrates it to a target node.

In a second aspect, the embodiments of the present invention further provide a data processing device, including:

An acquiring unit, configured to obtain data to be stored and the data identifier of the data to be stored;

A computing unit, configured to calculate, from the data identifier obtained by the acquiring unit, a first partition in which the data to be stored is to be stored, and to obtain a first node to which the first partition belongs;

A storage unit, configured to store the data identifier and the data to be stored separately on the first node and to record the data storage address; and

An indexing unit, configured to generate index information from the data identifier of the data to be stored obtained by the acquiring unit, the identifier of the first partition determined by the computing unit, and the data storage address recorded by the storage unit, and to add the index information to the index area of the first node determined by the computing unit.

With reference to the second aspect, in a first possible implementation of the second aspect, the index area includes at least one sub-index area. The sub-index area in which the generated index information is to be stored is determined either from the result of hashing the data identifier in the generated index information or from the sort order of the data identifiers, and the generated index information is stored in the determined sub-index area.

With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the indexing unit compares the data identifier in the generated index information with the data identifiers of the existing index information in the index area of the first node, determines according to a preset order the storage position of the generated index information among the existing index information, and adds the generated index information at that storage position.

With reference to the second aspect or the first possible implementation of the second aspect, in a third possible implementation of the second aspect, the device further includes:

A deduplication unit, configured to determine whether the index area of the first node obtained by the computing unit contains index information with the same data identifier as the data to be stored; to trigger the storage unit when no such index information exists; and not to trigger the storage unit when such index information exists.

With reference to the second aspect or the first possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the acquiring unit is further configured to obtain the data identifier of input data to be queried;

The computing unit is further configured to calculate, from the data identifier of the data to be queried obtained by the acquiring unit, a second partition in which the data to be queried resides, and to obtain a second node to which the second partition belongs;

The device further includes:

A matching unit, configured to query the index area of the second node for index information matching the data identifier of the data to be queried, where each piece of index information in the index area includes the data identifier of stored data, the partition in which it resides, and its data storage address; and

A reading unit, configured to read the data to be queried from the second node according to the data storage address in the index information of the data to be queried obtained by the matching unit.

With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the device further includes:

A sorting unit, configured to sort the data storage addresses that the matching unit matched for multiple pieces of data to be queried; the reading unit then reads the data to be queried sequentially, according to the sorting result of the sorting unit, from the corresponding second nodes.

With reference to the second aspect or the first possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the acquiring unit is further configured to obtain a partition to be migrated when a preset partition migration condition is met;

Described computing unit is additionally operable to obtain the 3rd node belonging to described subregion to be migrated；

The device further includes:

A matching unit, configured to match, in the index area of the third node, all index information whose partition identifier is the same as that of the partition to be migrated, where each piece of index information in the index area includes the data identifier of stored data, the partition in which it resides, and its data storage address; and

A sorting unit, configured to sort the data storage addresses in the index information matched by the matching unit and to send the sorted result to the third node, so that the third node reads the data to be migrated sequentially and migrates it to a target node.

With the data processing method and device provided by the embodiments of the present invention, keys and values are stored separately, and the index area records each key, its partition, and the address offset of its value on the hard disk. During data migration, only the index area needs to be scanned to find the keys and value addresses to be migrated, so a full scan of the hard disk is unnecessary. This reduces the number of hard-disk read operations during partition migration and improves data read efficiency; key-value pairs can also be sent concurrently during migration, which increases inter-node bandwidth utilization and improves data read performance.

Brief description of the drawings

Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the stored content of an index area according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the data storage format of an index area according to an embodiment of the present invention;

Fig. 4 is a flowchart of a data query step according to an embodiment of the present invention;

Fig. 5 is a flowchart of a data migration step according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a data processing device according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a data storage system according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of a storage management node according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of the program in the memory of the storage management node shown in Fig. 8;

Fig. 10 is a schematic diagram of a data storage node according to an embodiment of the present invention;

Fig. 11 is a schematic diagram of the program in the memory of the data storage node shown in Fig. 10.

Detailed description of the invention

The technical solutions of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.

In the prior art, distributed storage clusters generally use key-value structured data storage systems, in which stored data is represented as key-value pairs: the key uniquely identifies the data within the entire cluster during data access, and the value is the accessed data itself.

To ensure the robustness and balance of the system, a cloud storage distributed storage cluster uses data migration very widely internally. The two most basic uses are: first, redundant data backup, which ensures the fault tolerance of the cluster and data reliability; second, because the cluster consists of many dynamic nodes (some nodes may suddenly go down at any time, while others may rejoin the cluster at some point), the system, to keep global storage balanced, automatically or manually triggers a command that balances storage utilization across nodes. This requires migrating some partitions of existing nodes to a newly added node, or redistributing the partitions of a failed node evenly onto other nodes.

A partition is one of the regions into which the system divides the storage space of the storage nodes according to a consistent hashing algorithm; each storage node holds multiple independent partitions. For example, suppose the system has 10 storage nodes that must carry 100 partitions, identified as Partition 0 to Partition 99. These 100 partitions are assigned to the 10 nodes: for instance, the first node is assigned Partition 0 to Partition 9, the second node Partition 10 to Partition 19, the third node Partition 20 to Partition 29, and so on, with the tenth node assigned Partition 90 to Partition 99. In practice, partitions may also be assigned non-contiguously, for example by modulo (mod 10).

When a node is added, the system needs to rebalance some partitions from each existing node onto the new node. Afterwards, for example, the first node is assigned Partition 0 to Partition 9, the second node Partition 10 to Partition 18, the third node Partition 19 to Partition 27, and so on; the tenth node is assigned Partition 82 to Partition 90, and the eleventh node Partition 91 to Partition 99. That is, the system can calculate by hashing which partitions need to migrate: for example, Partition 19 must move from the second node to the third node, Partition 28 and Partition 29 must move from the third node to the fourth node, and so on, and Partition 91 to Partition 99 must move from the tenth node to the eleventh node. With a modulo (mod 11) assignment instead, the set of partitions to migrate would be different.
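The partition bookkeeping in this example can be checked with a short Python sketch. It hard-codes the two layouts described above (10 partitions per node before expansion, and post-expansion ranges starting at Partition 0, 10, 19, 28, ..., 91) and lists every partition whose owning node changes. The function names are illustrative only; nodes are numbered from 1 as in the text.

```python
from bisect import bisect_right

NUM_PARTITIONS = 100

def old_owner(p: int) -> int:
    """Original layout: node n (1-based) owns partitions 10*(n-1) .. 10*n - 1."""
    return p // 10 + 1

# First partition owned by each of the 11 nodes after expansion (from the example).
NEW_STARTS = [0, 10, 19, 28, 37, 46, 55, 64, 73, 82, 91]

def new_owner(p: int) -> int:
    """Layout after expansion: node n owns [NEW_STARTS[n-1], NEW_STARTS[n])."""
    return bisect_right(NEW_STARTS, p)

def partitions_to_migrate(before, after, num_partitions=NUM_PARTITIONS):
    """Return {partition: (source_node, target_node)} for every partition whose owner changed."""
    moves = {}
    for p in range(num_partitions):
        src, dst = before(p), after(p)
        if src != dst:
            moves[p] = (src, dst)
    return moves

moves = partitions_to_migrate(old_owner, new_owner)
print(moves[19], moves[28], moves[29], moves[91])  # (2, 3) (3, 4) (3, 4) (10, 11)
```

Under these layouts the sketch reports 45 moved partitions. Passing `lambda p: p % 10` and `lambda p: p % 11` instead models the modulo placement mentioned above, and yields a different set: 90 of the 100 partitions change owner.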

The data processing method provided by the embodiments of the present invention may be executed by the controller of any storage node in a key-value storage system that receives a data processing task, or by a third-party processor separate from the storage nodes. Fig. 1 is a flowchart of the data processing method provided by this embodiment. As shown in Fig. 1, the data processing method of the embodiments of the present invention includes:

Step S101: obtain data to be stored and the data identifier of the data to be stored.

Data to be stored is generally represented as a key-value pair, where the key uniquely identifies the data within the entire cluster during data access and the value is the accessed data itself.

Taking the storage of a microblog post as an example, the key that the system generates for the data is a character string that may include a time, user information, and a sequence number, indicating which user posted the microblog and when; the value is the actual microblog content, such as "Had a meal at *** today".

For data to be stored in key-value form, this step obtains the data itself (the value) and its data identifier (the key).

Step S102: calculate, from the data identifier, the first partition in which the data to be stored is to be stored, and obtain the first node to which the first partition belongs.

A feature value that uniquely represents the key is obtained. One way to do so is to hash the key and use the resulting hash value as the key's feature value; the partition in which the data to be stored resides is then determined from this feature value. Specifically, a modulo operation is applied to the key's value to determine the partition.

Then a consistent hash calculation is applied to the identifier of the partition in which the data to be stored resides, so that the corresponding node is determined from the partition. For example, if the calculated partition identifier is 70 and the distributed cluster has 10 nodes, a modulo (mod 10) operation gives 70 mod 10 = 0, so the data should be stored on the first node (counting nodes from 0).

Alternatively, the node to which a partition belongs can be obtained by looking it up in a partition segment information table configured for the cluster. This table may be preconfigured, or updated in real time according to the actual storage state of the cluster.
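A minimal sketch of the two-stage placement in step S102, assuming MD5 as the hash function, 100 partitions, and a plain modulo mapping from partition to node. The text does not fix the hash function or the partition count, and a real cluster might consult the partition segment information table instead of `node_of`. Nodes are numbered from 0, so `node_of(70)` returning 0 corresponds to the first node in the example above.

```python
import hashlib

NUM_PARTITIONS = 100  # assumed cluster-wide partition count
NUM_NODES = 10        # as in the example above

def key_feature(key: str) -> int:
    """Feature value of a key: its MD5 digest read as an integer (assumed hash choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Partition holding the key: feature value modulo the partition count."""
    return key_feature(key) % num_partitions

def node_of(partition: int, num_nodes: int = NUM_NODES) -> int:
    """0-based index of the node owning a partition under modulo placement."""
    return partition % num_nodes

key = "20121205T0930|user42|000017"  # hypothetical microblog key: time|user|sequence
p = partition_of(key)
print(p, node_of(p))
print(node_of(70))  # 0, i.e. the first node
```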

After the node to which the partition belongs is obtained, the method further includes determining whether the index area of the first node already contains index information with the same data identifier (key) as the data to be stored. If it does not, step S103 is performed to store the data; if it does, the data is not stored, because the data to be stored already exists on this storage node.

Step S103: store the data identifier and the data to be stored separately on the first node, and record the data storage address.

The key and the value of the data to be stored are stored separately on the storage medium of the first node obtained in step S102.

The data storage address is the address offset of the data to be stored on the storage medium (for example, a hard disk) of the node, that is, the value address: the offset of the value on this node's hard disk, which may be expressed, for example, as an LBA (logical block address).
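To illustrate keys and values being written separately and the value address being recorded, here is a sketch that models the storage medium as an in-memory, append-only byte buffer. The `DataArea` class and its 4-byte length prefix are assumptions for illustration, not the on-disk format of the embodiment; the returned offset stands in for the LBA-style value address.

```python
class DataArea:
    """Append-only stand-in for a node's storage medium (hypothetical layout).

    Keys and values are written as separate records; the offset returned by
    append() plays the role of the key address or value address.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def append(self, payload: bytes) -> int:
        """Write payload at the end of the area and return its starting offset."""
        offset = len(self._buf)
        self._buf += len(payload).to_bytes(4, "big") + payload  # 4-byte length prefix
        return offset

    def read(self, offset: int) -> bytes:
        """Read back the record that starts at offset."""
        size = int.from_bytes(self._buf[offset:offset + 4], "big")
        return bytes(self._buf[offset + 4:offset + 4 + size])


area = DataArea()
key_addr = area.append(b"user42|000017")                        # key record
value_addr = area.append("Had lunch at ***".encode("utf-8"))   # value record
print(key_addr, value_addr)  # 0 17
```

The value address (17 here) is what step S104 puts into the index information alongside the key and its partition.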

Step S104: generate index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and add the index information to the index area of the first node.

The index area of a storage node contains index information for each piece of data stored on that node. Each piece of index information includes the data identifier of the stored data, the identifier of the first partition in which it resides, and its data storage address; that is, the key, the partition, and the value address of the stored data.

The index area may include multiple sub-index areas, and the generated index information is stored in one of them.

Fig. 2 is a schematic diagram of the stored content of an index area according to an embodiment of the present invention. As shown in Fig. 2, the index area includes m bucket IDs, and Bucket1 holds N pieces of index information: key11 denotes the key of the first piece of index information in Bucket1, partition_K11 denotes the partition in which key11 resides, and value_LBA_k11 denotes the value address, that is, the address offset on the hard disk of the value corresponding to key11. To query data, it suffices to match the key in the index area to find the corresponding stored data. To migrate data, the partition of each key can be calculated from the key, and matching the corresponding partitions in the index area finds the stored data belonging to the partitions to be migrated.
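The layout in Fig. 2 and its two access paths can be modelled as follows. This is a sketch with made-up sample entries: each `IndexEntry` holds the key, its partition, and the value address, and the two helpers show that a query matches on the key while a migration matches on the partition, in both cases without reading the data itself. For brevity, `find_by_key` scans every bucket; hashing the key to a single bucket would avoid that.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    key: str        # data identifier, e.g. key11
    partition: int  # partition of the key, e.g. partition_K11
    value_lba: int  # value address on disk, e.g. value_LBA_k11

# Index area with m = 2 buckets; the contents are hypothetical sample values.
index_area = [
    [IndexEntry("key11", 19, 4096), IndexEntry("key12", 28, 512)],  # Bucket1
    [IndexEntry("key21", 19, 1024), IndexEntry("key22", 7, 8192)],  # Bucket2
]

def find_by_key(area, key):
    """Query path: return the index entry whose key matches, or None."""
    for bucket in area:
        for entry in bucket:
            if entry.key == key:
                return entry
    return None

def find_by_partition(area, partition):
    """Migration path: collect every index entry in the given partition."""
    return [e for bucket in area for e in bucket if e.partition == partition]

print(find_by_key(index_area, "key12").value_lba)          # 512
print([e.key for e in find_by_partition(index_area, 19)])  # ['key11', 'key21']
```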

Storing the generated index information in a sub-index area includes: determining the identifier of the sub-index area in which the generated index information is to be stored, either from the result of hashing the data identifier in the generated index information or from the sort order of the data identifiers, and storing the generated index information in the determined sub-index area.

When new index information is generated, its key is compared with the keys of the existing index information in the index area of the first node; the storage position of the new index information among the existing index information is determined according to a preset order, and the new index information is added at that position.
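One way to realise this ordered insertion, assuming the "preset order" is ascending key order, is a binary search with Python's standard `bisect` module. The sketch also rejects a key that is already indexed, mirroring the duplicate check before step S103; the tuple layout `(key, partition, value_address)` and the function name are assumptions.

```python
import bisect
from typing import List, Tuple

# Hypothetical index entry layout: (key, partition, value_address), kept sorted by key.
Entry = Tuple[str, int, int]

def insert_sorted(index: List[Entry], entry: Entry) -> int:
    """Insert entry at the position given by ascending key order and return that position.

    Raises ValueError if the key is already indexed (the duplicate check).
    """
    keys = [e[0] for e in index]  # a parallel key list keeps this portable across Python versions
    pos = bisect.bisect_left(keys, entry[0])
    if pos < len(index) and index[pos][0] == entry[0]:
        raise ValueError(f"key {entry[0]!r} is already stored on this node")
    index.insert(pos, entry)
    return pos
```

For example, inserting `("b", 3, 200)` into `[("a", 1, 0), ("c", 2, 100)]` returns position 1 and leaves the keys in the order a, b, c.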

Alternatively, when new index information is generated, the key in it can be hashed to determine the bucket ID in which the index information is to be stored; for example, the key can be taken modulo m to obtain the bucket ID, and the generated index information is then stored in that bucket.
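The hash-bucket variant can be sketched as below. The use of SHA-1 (first 8 bytes of the digest) and the class name `BucketIndex` are assumptions; the text only requires that the bucket ID be derived from the key, for example modulo m. Because a key always hashes to the same bucket, both the duplicate check and a later query only need to search that one bucket.

```python
import hashlib

class BucketIndex:
    """Index area split into m buckets; bucket ID = hash(key) mod m (hash choice assumed)."""

    def __init__(self, m: int) -> None:
        self.m = m
        # Each bucket holds (key, partition, value_address) tuples.
        self.buckets = [[] for _ in range(m)]

    def bucket_id(self, key: str) -> int:
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def lookup(self, key: str):
        """Search only the key's own bucket; return the entry or None."""
        for entry in self.buckets[self.bucket_id(key)]:
            if entry[0] == key:
                return entry
        return None

    def add(self, key: str, partition: int, value_address: int) -> bool:
        """Add an index entry unless the key is already present; return whether it was added."""
        if self.lookup(key) is not None:
            return False  # already stored on this node, so it is not stored again
        self.buckets[self.bucket_id(key)].append((key, partition, value_address))
        return True
```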

After the storage step for the data to be stored is completed, the data processing method of the embodiments of the present invention may further include a data query step for querying or reading stored data; the data query step is performed on the basis of the embodiment shown in Fig. 1. Fig. 3 is a flowchart of the data query step provided by this embodiment. As shown in Fig. 3, the data query step includes:

Step S201: obtain the data identifier of the input data to be queried.

For data stored in key-value form, reading the data to be queried involves receiving a key entered by the user; alternatively, the user can enter a query term, which the system converts into the corresponding key.

Step S202: calculate, from the data identifier of the data to be queried, the second partition in which that data resides, and obtain the second node to which the second partition belongs.

The partition in which the data to be queried resides is calculated in the same way as in step S102, and the second node on which that data resides is determined from the calculated partition.

Step S203: query the index area of the second node for index information that matches the data identifier of the data to be queried.

The index information in the index area includes the data identifier of stored data, the partition in which it resides, and its data storage address.

In the index area of the second node on which the data to be queried resides, the key of the data to be queried is matched against the data identifiers of the data stored on that node, yielding the index information that matches the key and hence the data storage address (the value address) of the data to be queried. If multiple pieces of data to be queried are stored on several different second nodes, the data identifiers are matched in the index area of each of those nodes, yielding the matching index information on each node and thus the value addresses of all the queried data.
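The matching in step S203, including the case where the queried keys live on several second nodes, might look like this. The sketch assumes each node's index area is a dictionary from key to `(partition, value_address)` and takes the key-to-node routing of step S202 as a caller-supplied function; both are simplifications for illustration, and `resolve_queries` is a hypothetical name.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Simplified per-node index area: {key: (partition, value_address)}.
IndexArea = Dict[str, Tuple[int, int]]

def resolve_queries(keys: List[str],
                    node_for_key: Callable[[str], int],
                    index_areas: Dict[int, IndexArea]) -> Dict[int, List[Tuple[str, int]]]:
    """Group queried keys by node and look up each key's value address in that node's index."""
    found: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key in keys:
        node = node_for_key(key)                    # routing, as in step S202
        entry = index_areas.get(node, {}).get(key)  # match the key in that node's index area
        if entry is not None:
            _partition, value_address = entry
            found[node].append((key, value_address))
    return dict(found)
```

Keys without matching index information are simply skipped, which corresponds to the queried data not being stored on the node it routes to.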

Step S204: read the data to be queried from the second node according to the data storage address in the matched index information of the data to be queried.

If there are multiple pieces of data to be queried, the method further includes, after step S203, sorting their data storage addresses; step S204 then reads the data to be queried sequentially, in sorted order, from the storage media of the corresponding second nodes.
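Sorting before reading turns scattered lookups into a single pass over ascending offsets. A minimal sketch, with the storage medium simulated by a dictionary from offset to value bytes (an assumption made so the example runs without a disk); the function name is hypothetical.

```python
from typing import Dict, List, Tuple

def read_in_address_order(hits: List[Tuple[str, int]], disk: Dict[int, bytes]) -> Dict[str, bytes]:
    """Sort (key, value_address) pairs by address, then read them in one ascending pass.

    `disk` simulates the node's storage medium as {offset: value bytes}.
    The returned dict preserves the order in which values were read.
    """
    results: Dict[str, bytes] = {}
    for key, address in sorted(hits, key=lambda kv: kv[1]):  # ascending offsets: sequential I/O
        results[key] = disk[address]
    return results
```

With hits `[("a", 800), ("c", 100), ("b", 42)]`, the values are read at offsets 42, 100, and 800, in that order.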

When storage nodes are added to or removed from a distributed storage system, a rebalance migration is required and partitions are redistributed across the storage nodes. The data processing method of the embodiments of the present invention therefore also includes a data migration step that migrates the partitions to be migrated; it is performed on the basis of the embodiment shown in Fig. 1. Fig. 4 is a flowchart of the data migration step provided by this embodiment. As shown in Fig. 4, the data migration step includes:

Step S301: when a preset partition migration condition is met, obtain the partition to be migrated.

The preset partition migration condition may include the addition or removal of a node.

When a node is added or removed, the system determines which partitions need to be migrated, and these partitions are obtained. For example, if partition 19 and partition 28 need to be migrated, those partitions are obtained.
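The text does not say how the system decides which partitions must move. One illustrative assumption is modulo placement: partitions are placed on nodes by `partition_id % node_count`, as in the computing-unit example later in this description. Under that assumption, the partitions to migrate are exactly those whose node index changes. `partitions_to_migrate` is a hypothetical name.

```python
def partitions_to_migrate(num_partitions: int, old_nodes: int, new_nodes: int) -> list[int]:
    """Partitions whose owning node changes when the node count changes.

    Assumes placement is partition_id % node_count; other placement schemes
    change which (and how many) partitions move.
    """
    return [p for p in range(num_partitions) if p % old_nodes != p % new_nodes]


print(partitions_to_migrate(20, 10, 11))
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```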

Alternatively, when a node is added or removed, the system may instead identify each individual data item that needs to be migrated. In that case, the key of each such data item is hashed to obtain the partition in which the data item is located.

Step S302: obtain the third node to which the partition to be migrated belongs.

The third node to which the partition to be migrated belongs is obtained using the same method as in step S102.

Step S303: match and obtain, from the index area of the third node, all index information whose partition identifier is the same as the identifier of the partition to be migrated.

In the index area of the third node where the partition to be migrated is located, the identifier of the partition to be migrated is matched against the partition identifiers of the stored data. This yields the index information that matches the partition to be migrated. If multiple partitions to be migrated are stored on several different third nodes, the identifiers are matched in the index areas of the respective third nodes. This yields the matching index information of the partitions to be migrated on each third node.

In the index area formed in step S104, the index information whose partition value equals the value of the partition to be migrated is matched. Each matched entry includes the key, the partition, and the value address. For example, all index information in the index area whose partition is partition 19 or partition 28 is matched.
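Matching by partition amounts to filtering the index records, as in the sketch below. The record layout and the helper name `match_partitions` are hypothetical. The sample records are made up, apart from reusing the key11 example.

```python
from __future__ import annotations

from typing import NamedTuple


class IndexEntry(NamedTuple):
    key: str
    partition: int
    value_lba: int


def match_partitions(index_area: list[IndexEntry], partitions: list[int]) -> list[IndexEntry]:
    """Return every index record whose partition is one of the partitions to migrate."""
    wanted = set(partitions)
    return [entry for entry in index_area if entry.partition in wanted]


index_area = [
    IndexEntry("key11", 19, 10),
    IndexEntry("key57", 3, 90),
    IndexEntry("key80", 28, 70),
]
print([e.key for e in match_partitions(index_area, [19, 28])])
# ['key11', 'key80']
```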

When the index area contains multiple bucket IDs, the index area may also be scanned in batches. The number of bucket IDs read per batch is set by the memory configuration, and the index information in those bucket IDs is read into memory.

In this case, matching and obtaining from the index area the index information whose partition value is the same as that of the partition to be migrated includes:

reading, in batches, the index information of at least one sub-index area of the index area, specifically the index information of different bucket IDs;

at each batch read, recording the bucket IDs read in this batch, so as to determine the starting bucket ID for the next batch read; and

matching, within the index information read in the current batch, the index information whose partition identifier is the same as that of the partition to be migrated.
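The batched scan with a recorded starting bucket can be sketched as a generator. It reads a fixed number of buckets per batch and reports where the next batch should begin. This is a minimal sketch that assumes the index area is a list of buckets indexed by bucket ID. `scan_in_batches` and `buckets_per_batch` are hypothetical names.

```python
from __future__ import annotations

from typing import Iterator, NamedTuple


class IndexEntry(NamedTuple):
    key: str
    partition: int
    value_lba: int


def scan_in_batches(
    buckets: list[list[IndexEntry]],
    partition: int,
    buckets_per_batch: int,
    start_bucket: int = 0,
) -> Iterator[tuple[int, list[IndexEntry]]]:
    """Scan the index area a few buckets at a time.

    Each yielded pair is (next_start_bucket, matches). next_start_bucket is the
    bucket ID recorded after the batch, so an interrupted scan can resume there.
    """
    bucket_id = start_bucket
    while bucket_id < len(buckets):
        end = min(bucket_id + buckets_per_batch, len(buckets))
        # Only this batch's buckets are loaded into memory.
        batch = [entry for bucket in buckets[bucket_id:end] for entry in bucket]
        matches = [entry for entry in batch if entry.partition == partition]
        bucket_id = end  # starting bucket ID for the next batch
        yield bucket_id, matches


buckets = [
    [IndexEntry("key11", 19, 10), IndexEntry("key57", 3, 90)],
    [IndexEntry("key41", 19, 20)],
    [IndexEntry("key80", 28, 70), IndexEntry("key22", 19, 60)],
]
for cursor, found in scan_in_batches(buckets, 19, buckets_per_batch=2):
    print(cursor, [e.key for e in found])
# 2 ['key11', 'key41']
# 3 ['key22']
```

Each batch yields the next starting bucket ID. A caller that stores this value can therefore resume an interrupted scan from that point instead of rescanning the whole index area.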

Step S304: sort the data storage addresses of the matched index information, and send the sorted result to the third node. The third node then reads the data to be migrated sequentially and migrates it to the destination node.

The index information matched in step S303 is sorted by the value address in each entry. For example, suppose the matched index information is <key11, partition_K11=19, value_LBA_K11=10>, <key12, partition_K12=19, value_LBA_K12=40>, <key22, partition_K22=19, value_LBA_K22=60>, <key34, partition_K34=19, value_LBA_K34=30>, and <key41, partition_K41=19, value_LBA_K41=20>. Sorting by value address gives the order K11, K41, K34, K12, K22. The sorted result is sent to the hard disk, which can then read in order. This avoids the full scan and unordered random reads required by the existing method, and improves overall performance.
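The worked example above can be checked with a few lines of Python. The tuples mirror the five matched index records (key, partition, value_LBA). `migration_read_order` is an illustrative name.

```python
from __future__ import annotations


def migration_read_order(entries: list[tuple[str, int, int]]) -> list[str]:
    """Return keys ordered by value_LBA, i.e. the order in which the disk should read them."""
    return [key for key, _partition, _lba in sorted(entries, key=lambda e: e[2])]


matched = [
    ("key11", 19, 10),
    ("key12", 19, 40),
    ("key22", 19, 60),
    ("key34", 19, 30),
    ("key41", 19, 20),
]
print(migration_read_order(matched))
# ['key11', 'key41', 'key34', 'key12', 'key22']
```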

The sorted result may be organized as a tree structure, for example with a B+ tree index or a bitmap index, though it is not limited to these. Fig. 5 is a schematic diagram of the data storage format of the index area provided by this embodiment. As shown in Fig. 5, the structure has multiple levels of non-leaf nodes. The non-leaf nodes store value_LBA values, and the leaf nodes store the key and value of the actual stored data.

When new index information is added, its value address is compared with the value addresses of the existing index information in the index area. This determines the position of the new index information in the B+ tree. The new value address is first compared with the current node of the B+ tree. If the new value is larger, it is placed in a node above the current node; if it is smaller, it is placed in a node below the current node. Proceeding in the same way, each new index entry is placed in the B+ tree.
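The sketch below is a simplified stand-in for the B+ tree of Fig. 5. It keeps index records ordered by value_LBA in a sorted list maintained with Python's `bisect` module. This swaps in a different data structure. Both keep records in value_LBA order, but inserting into a list shifts the later elements (O(n)), whereas a B+ tree insert is O(log n). `LbaOrderedIndex` is a hypothetical name.

```python
from __future__ import annotations

import bisect


class LbaOrderedIndex:
    """Index records kept in ascending value_LBA order (sorted-list stand-in for a B+ tree)."""

    def __init__(self) -> None:
        self._lbas: list[int] = []                     # sorted value_LBA values, for binary search
        self._records: list[tuple[str, int, int]] = []  # (key, partition, value_lba), parallel to _lbas

    def insert(self, key: str, partition: int, value_lba: int) -> None:
        # Compare the new value address with the existing ones to find its slot.
        pos = bisect.bisect_right(self._lbas, value_lba)
        self._lbas.insert(pos, value_lba)
        self._records.insert(pos, (key, partition, value_lba))

    def records(self) -> list[tuple[str, int, int]]:
        return list(self._records)


idx = LbaOrderedIndex()
for key, lba in [("key12", 40), ("key11", 10), ("key34", 30)]:
    idx.insert(key, 19, lba)
idx.insert("key41", 19, 20)  # lands between key11 and key34
print([r[0] for r in idx.records()])
# ['key11', 'key41', 'key34', 'key12']
```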

The data processing method provided by the embodiments of the present invention is suitable for key-value distributed storage systems. When data is stored, the key and the value are stored separately. The index area of each node records the address offset of the stored data on that node's hard disk. During migration, therefore, only the index area needs to be scanned to find the keys and value addresses to be migrated, which reduces the number of hard disk I/Os. In addition, each batch of data to be migrated is sorted, which optimizes the disk read/write order and improves overall performance.

The above is a detailed description of the data processing method provided by the embodiments of the present invention. The data processing apparatus provided by the embodiments is described in detail below.

The data processing apparatus provided by the embodiments of the present invention is applied in a key-value storage system. Fig. 6 is a schematic diagram of the data processing apparatus provided by this embodiment. As shown in Fig. 6, the apparatus includes an acquiring unit 701, a computing unit 702, a storage unit 703, an indexing unit 704, a matching unit 705, a reading unit 706, and a sorting unit 707.

The apparatus has three main working modes: data storage, data query, and data migration. Each is described below.

During data storage, the main working components are the acquiring unit 701, the computing unit 702, the storage unit 703, and the indexing unit 704.

The acquiring unit 701 is configured to obtain the data to be stored and the data identifier of the data to be stored.

Take storing a microblog post as an example. The key of the data to be stored, as obtained by the acquiring unit 701, is represented as a string. It may include a time, user information, and a sequence number, indicating that a certain user published a certain post at a certain time. The value is the actual content of the post, for example "Eating at *** today".

For data to be stored in key-value form, the acquiring unit 701 obtains the data itself (the value) and its data identifier (the key).

The computing unit 702 calculates, from the data identifier obtained by the acquiring unit 701, the first partition in which the data to be stored will be located. It then obtains the first node to which the first partition belongs.

The computing unit 702 obtains a feature value of the key that uniquely represents the key. The feature value can be obtained by hashing the key and using the resulting hash value as the feature value. The partition in which the data to be stored is located is then derived from this feature value.
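The key-to-partition step can be shown in a minimal sketch. It assumes MD5 as the hash and 1024 partitions; the text specifies neither. The key string imitates the microblog key format described above (time, user, sequence number) and is made up. `key_to_partition` is a hypothetical name. Strictly, a hash distinguishes keys only with high probability, but that is enough here because the feature value is used only to choose a partition.

```python
import hashlib

NUM_PARTITIONS = 1024  # hypothetical partition count; the text does not fix one


def key_to_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash the key to a feature value, then map the feature value to a partition.

    MD5 is an arbitrary choice for illustration; any stable hash would do.
    """
    feature = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")
    return feature % num_partitions


p = key_to_partition("20121205-user42-0001")
print(0 <= p < NUM_PARTITIONS)  # True
```

The sketch uses `hashlib` rather than Python's built-in `hash()`. The built-in result for strings is randomized per process, so the same key could land in different partitions on different runs.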

The computing unit 702 may then perform a consistent-hash calculation on the identifier of the partition in which the data to be stored is located, which determines the corresponding node. For example, suppose the computing unit 702 calculates that the partition of the data to be stored is identified as 70, and the distributed cluster has 10 nodes. Taking the identifier modulo 10 gives 0, so the data to be stored should be placed on the first node.
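The modulo placement in this example reduces to a single line. The text calls this step a consistent-hash calculation, but the worked example uses plain modulo, and the sketch reproduces the worked example. A consistent-hash ring would differ in that adding or removing a node moves only a fraction of the partitions. Node indices are assumed to start at 0, so remainder 0 means the first node. `partition_to_node` is a hypothetical name.

```python
def partition_to_node(partition_id: int, num_nodes: int) -> int:
    """Map a partition to a 0-based node index by modulo, as in the worked example."""
    return partition_id % num_nodes


print(partition_to_node(70, 10))  # 0, i.e. the first node
```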

The data processing apparatus of this embodiment may also include a deduplication unit (not shown). After the computing unit 702 obtains the node to which the partition belongs, the deduplication unit checks whether the index area of the first node already contains index information for the data identifier of the data to be stored. If the index area contains no such index information, the deduplication unit triggers the storage unit 703. If the index area already contains index information for the key, the deduplication unit does not trigger the storage unit 703, because the data to be stored is already on this storage node.

The storage unit 703 is configured to store the data identifier of the data to be stored and the data itself separately on the first node, and to record the data storage address.

The storage unit 703 stores the key and the value of the data to be stored separately on the first node and records the value address.

The data storage address is the address offset of the data to be stored on the node's hard disk, that is, the value address. It gives the offset of the value on that node's hard disk and may be expressed, for example, as an LBA (logical block address).

To facilitate subsequent operations such as query and migration, the data storage system of this embodiment also includes an indexing unit 704. The indexing unit 704 generates index information from the data identifier of the data to be stored, the identifier of the first partition in which it is located, and the data storage address. It then adds this index information to the index area of the first node.
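The storage flow of the deduplication unit, storage unit 703, and indexing unit 704 can be sketched with a toy node class. The sketch makes several assumptions. A `bytearray` plays the role of the node's hard disk, and `value_lba` is a byte offset rather than a true LBA. Each value is written with a 4-byte length header so that it can be read back from its offset alone. `StorageNode`, `put`, and `get` are hypothetical names.

```python
from __future__ import annotations

import struct
from typing import NamedTuple


class IndexEntry(NamedTuple):
    key: str
    partition: int
    value_lba: int  # here: byte offset of the value's length header


class StorageNode:
    """Toy node: values are appended to a data area; keys live only in the index."""

    def __init__(self) -> None:
        self.disk = bytearray()
        self.index: dict[str, IndexEntry] = {}

    def put(self, key: str, partition: int, value: bytes) -> bool:
        # Deduplication: skip storage if this key is already indexed on the node.
        if key in self.index:
            return False
        offset = len(self.disk)  # the value's storage address
        self.disk += struct.pack(">I", len(value)) + value  # length header + value
        self.index[key] = IndexEntry(key, partition, offset)  # <key, partition, value_LBA>
        return True

    def get(self, key: str) -> bytes:
        offset = self.index[key].value_lba
        (length,) = struct.unpack_from(">I", self.disk, offset)
        return bytes(self.disk[offset + 4 : offset + 4 + length])


node = StorageNode()
print(node.put("k1", 19, b"hello"))  # True
print(node.put("k1", 19, b"again"))  # False (duplicate key, nothing written)
print(node.put("k2", 19, b"world"))  # True
print(node.index["k2"].value_lba)    # 9
print(node.get("k2"))                # b'world'
```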

The index area of a storage node holds one piece of index information for each data item stored on that node. Each piece of index information includes the data identifier of the stored data, the identifier of the first partition in which it is located, and its data storage address. In other words, it contains the key, the partition, and the value address of the stored data.

The indexing unit 704 determines the identifier of the sub-index area in which the generated index information will be stored. It does this either by hashing the data identifier in the generated index information or by the ordering of data identifiers. It then stores the generated index information in the sub-index area with that identifier.

When new index information is generated, the indexing unit 704 compares its key with the keys of the existing index information in the index area of the first node. Using a preset ordering, it determines where the new key belongs among the existing index information, and adds the generated index information at that position.

Alternatively, when new index information is generated, the indexing unit 704 may hash the key in that index information to determine the bucket ID in which it will be stored. For example, the key may be taken modulo m to obtain the bucket ID, and the generated index information is then stored in that bucket ID.
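Bucket assignment for the sub-index areas can be sketched as follows. The text computes "key mod m". Because the keys in this description are strings, the sketch first converts each key to an integer with CRC-32 from `zlib`, which is an assumption rather than something the text specifies. `bucket_id` and `build_buckets` are hypothetical names.

```python
from __future__ import annotations

import zlib


def bucket_id(key: str, m: int) -> int:
    """Bucket (sub-index area) for a key: an integer form of the key, mod m."""
    return zlib.crc32(key.encode("utf-8")) % m


def build_buckets(entries: list[tuple[str, int, int]], m: int) -> list[list[tuple[str, int, int]]]:
    """Distribute <key, partition, value_LBA> records over m buckets."""
    buckets: list[list[tuple[str, int, int]]] = [[] for _ in range(m)]
    for entry in entries:
        buckets[bucket_id(entry[0], m)].append(entry)
    return buckets


entries = [("key11", 19, 10), ("key12", 19, 40), ("key22", 19, 60), ("key34", 19, 30)]
print([len(b) for b in build_buckets(entries, 4)])  # bucket sizes depend on CRC-32 values
```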

During data query, the main working components are the acquiring unit 701, the computing unit 702, the matching unit 705, and the reading unit 706.

The acquiring unit 701 is configured to obtain the data identifier of the input data to be queried. The computing unit 702 calculates, from that data identifier, the second partition in which the data to be queried is located, and obtains the second node to which the second partition belongs. The matching unit 705 searches the index area of the second node for the index information matching the data identifier of the data to be queried. The index information in the index area of the second node includes the data identifier of each stored data item, the partition in which it is located, and its data storage address. The reading unit 706 reads the data to be queried from the second node, using the data storage address in the index information obtained by the matching unit.

In the index area of the second node where the data to be queried is located, the matching unit 705 matches the data identifier (key) of the data to be queried against the data identifiers of the stored data. This yields the index information matching the key, and therefore the data storage address (the value address) of the data to be queried. If multiple data items to be queried are stored on several different second nodes, their data identifiers are matched in the index areas of the respective second nodes. This yields the matching index information on each second node and hence the value addresses of all the queried data.

When multiple data items are queried, a sorting unit 707 may also be used. The sorting unit 707 sorts the data storage addresses that the matching unit 705 matched for the data items. The reading unit 706 then reads the data items sequentially from the hard disk of the corresponding node, in the order produced by the sorting unit 707.

During data migration, the main working components are the acquiring unit 701, the computing unit 702, the matching unit 705, and the sorting unit 707.

The acquiring unit 701 is configured to obtain the partition to be migrated when a preset partition migration condition is met. The computing unit 702 is configured to obtain the third node to which the partition to be migrated belongs.

When a node is added or removed, the system determines which partitions need to be migrated, and the acquiring unit 701 obtains these partitions. For example, if partition 19 and partition 28 need to be migrated, the acquiring unit 701 obtains those partitions.

When a node is added or removed, the acquiring unit 701 may instead obtain each individual data item that needs to be migrated. In that case, the computing unit 702 hashes the key of each such data item to obtain the partition in which it is located, and then obtains the third node to which that partition belongs.

The matching unit 705 is configured to match, in the index area of the third node, the index information whose partition value is the same as that of the partition to be migrated.

The index information in the index area includes the data identifier of each stored data item, the partition in which it is located, and its data storage address, specifically the key, the partition identifier, and the value address. The matching unit 705 matches the index information whose partition value is the same as the identifier of the partition to be migrated.

In the index area of the third node where the partition to be migrated is located, the matching unit 705 matches the identifier of the partition to be migrated against the partition identifiers of the stored data. This yields the index information that matches the partition to be migrated. If multiple partitions to be migrated are stored on several different third nodes, the identifiers are matched in the index areas of the respective third nodes. This yields the matching index information of the partitions to be migrated on each third node.

When the index area contains multiple bucket IDs, the matching unit 705 may also scan the index area in batches. The number of bucket IDs read per batch is set by the memory configuration, and the index information in those bucket IDs is read into memory.

In this case, the matching unit 705 includes a batch subunit and a matching subunit (not shown). The batch subunit reads, in batches, the index information of at least one sub-index area of the index area, specifically the index information of different bucket IDs. At each batch read, the batch subunit records the bucket IDs read in this batch, so as to determine the starting bucket ID for the next batch read. The matching subunit then matches, within the index information read by the batch subunit in the current batch, the index information whose partition identifier is the same as that of the partition to be migrated.

The sorting unit 707 is configured to sort the data storage addresses of the index information matched by the matching unit 705 and to send the sorted result to the third node. The third node then reads the data to be migrated sequentially and migrates it to the destination node.

For example, suppose the acquiring unit 701 obtains partition 19 as the partition to be migrated. The matching unit 705 matches all index information in the index area whose partition is partition 19, for example <key11, partition_K11=19, value_LBA_K11=10>, <key12, partition_K12=19, value_LBA_K12=40>, <key22, partition_K22=19, value_LBA_K22=60>, <key34, partition_K34=19, value_LBA_K34=30>, and <key41, partition_K41=19, value_LBA_K41=20>. Sorting by value address, the sorting unit 707 obtains the order K11, K41, K34, K12, K22. The sorted result is sent to the hard disk, which can then read in order. This avoids the full-disk scan and unordered random reads required by the existing method, and improves overall performance.

Fig. 7 is a schematic diagram of a data storage system provided by an embodiment of the present invention. The storage system is a distributed storage system that stores data in key-value form. As shown in Fig. 7, the data storage system includes one storage management node 10 and multiple data storage nodes 20. The storage management node 10 and the data storage nodes 20 communicate with one another over a bus. The storage management node 10 is a data storage node with distributed coordination software installed, and it coordinates and manages the entire distributed storage system.

Fig. 8 is a schematic diagram of a storage management node 10 provided by an embodiment of the present invention. The storage management node 10 may be a host server with computing capability, a personal computer (PC), a portable computer, a terminal, or the like; the specific embodiments do not limit how the storage management node is implemented. As shown in Fig. 8, the storage management node 10 includes a processor 101, a communication interface 102, a memory 103, and a bus 104.

The processor 101, the communication interface 102, and the memory 103 of the storage management node 10 communicate with one another over the bus 104. The communication interface 102 communicates with network elements such as the data storage nodes 20, and is used to receive or send data storage, data query, or data migration task instructions. The processor 101 executes a program 1031. It may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The memory 103 stores the program 1031. It may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory. The program 1031 may include program code, and the program code includes computer operation instructions. As shown in Fig. 9, the program 1031 may include a computing unit 301.

During data storage, the storage management node 10 receives, through the communication interface 102, the data to be stored and its data identifier. The computing unit 301 calculates, from the data identifier obtained by the communication interface 102, the first partition in which the data will be stored. It then obtains the first node to which the first partition belongs; this first node is one of the data storage nodes 20. Based on the calculation result of the computing unit 301, three items are sent to the corresponding data storage node 20: the data to be stored and its data identifier, both obtained through the communication interface 102, and the identifier of the first partition determined for the data.

During data query, the communication interface 102 of the storage management node 10 receives the data identifier of the input data to be queried. The computing unit 301 calculates, from that data identifier, the second partition in which the data to be queried is located. It then obtains the second node to which the second partition belongs; this second node is one of the data storage nodes 20. Based on the calculation result of the computing unit 301, two items are sent to the corresponding data storage node 20: the data identifier of the data to be queried, obtained through the communication interface 102, and the identifier of the second partition determined for that data.

During data migration, the communication interface 102 of the storage management node 10 receives the partition to be migrated. The computing unit 301 calculates, from the identifier of that partition, the third node to which the partition to be migrated belongs. The communication interface 102 may instead receive the data identifier of data to be migrated. In that case, the computing unit 301 calculates from the data identifier the third partition in which the data to be migrated is located, and obtains the third node to which the third partition belongs. The third node is one of the data storage nodes 20. Based on the calculation result of the computing unit 301, one of two things is sent to the corresponding data storage node 20. It is either the identifier of the partition to be migrated, or the data identifier of the data to be migrated together with the identifier of the third partition in which that data is located.

Fig. 10 is a schematic diagram of a data storage node provided by an embodiment of the present invention. The data storage node 20 may be a host server with computing capability, a personal computer (PC), a portable computer, a terminal, or the like; the specific embodiments do not limit how the data storage node is implemented. As shown in Fig. 10, the data storage node 20 includes a processor 201, a communication interface 202, a memory 203, and a bus 204.

The processor 201, the communication interface 202, and the memory 203 of the data storage node 20 communicate with one another over the bus 204. The communication interface 202 communicates with network elements such as the storage management node 10, and receives the information sent by the communication interface 102 of the storage management node 10. The processor 201 executes a program 2031. It may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The memory 203 stores the program 2031. It may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory. As shown in Fig. 11, the program 2031 may include a storage unit 401, an indexing unit 402, a deduplication unit 403, a matching unit 404, a reading unit 405, and a sorting unit 406.

During data storage, the communication interface 202 receives three items sent by the communication interface 102 of the storage management node 10: the data to be stored, its data identifier, and the identifier of the first partition determined for the data. The storage unit 401 stores the data identifier and the data to be stored separately in the memory 203, and records the data storage address. The indexing unit 402 generates index information from three items: the data identifier obtained by the communication interface 202, the identifier of the first partition determined by the computing unit 301, and the data storage address recorded by the storage unit 401. It adds this index information to the index area of the data storage node 20, which is recorded in the memory 203.

A deduplication unit 403 runs before the storage unit 401. It checks whether the index area of the data storage node 20 already contains index information for the data identifier of the data to be stored. If the index area contains no such index information, it triggers the storage unit 401 to store the data. If the index area already contains such index information, it does not trigger the storage unit 401.

During data query, the communication interface 202 receives two items sent by the communication interface 102 of the storage management node 10: the data identifier of the data to be queried and the identifier of the second partition determined for that data. The matching unit 404 searches the index area of this data storage node for the index information matching the data identifier of the data to be queried. The index information in the index area includes the data identifier of each stored data item, the partition in which it is located, and its data storage address. The reading unit 405 takes the data storage address from the index information obtained by the matching unit 404, and reads the data to be queried from that address in the memory 203.

When multiple data items to be queried are processed, a sorting unit 406 is also used. The sorting unit 406 sorts the data storage addresses that the matching unit 404 matched for the data items, and sends the sorted result to the reading unit 405. The reading unit 405 then reads the data items sequentially from their data storage addresses, in the sorted order.

During data migration, the communication interface 202 receives what the communication interface 102 of the storage management node 10 sends. This is either the identifier of the partition to be migrated, or the data identifier of data to be migrated together with the identifier of the third partition determined for that data. In the index area of this data storage node, the matching unit 404 matches the identifier of the partition to be migrated against the partition identifiers of the stored data, which yields the index information that matches the partition to be migrated. The sorting unit 406 sorts the data storage addresses of the index information matched by the matching unit 404 and sends the sorted result to the memory 203. The data to be migrated is then read sequentially and migrated to the destination node. If multiple partitions to be migrated are stored on several different third nodes, their identifiers are matched in the index areas of the respective third nodes. This yields the matching index information of the partitions to be migrated on each third node.

The data processing method and apparatus provided by the embodiments of the present invention add at least <key, partition, value_LBA> information to the index area, and store the key and the value separately. Sorting by value address (value_LBA) turns what was random access to the hard disk into sequential access, so no full-disk scan is needed and overall performance improves. In addition, scanning the index area in batches quickly finds all <key, value_LBA> pairs of a partition to be migrated. This facilitates concurrent and batch operations, supports resumable transfer, improves the utilization of inter-node bandwidth, and improves data read performance.

Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that they are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A data processing method, applied to a key-value storage system, wherein the method includes:

generating index information from the data identifier of the data to be stored, the identifier of the first partition in which it is located, and the data storage address, and adding this index information to the index area of the first node;

wherein the index area comprises at least one sub-index area; the sub-index area in which the generated index information is to be stored is determined either from the result of hashing the data identifier in the generated index information or according to the order of data identifier values, and the generated index information is stored in the determined sub-index area;

obtaining the third node to which the partition to be migrated belongs;
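Claim 1 combines three ideas: each stored item receives an index entry of (data identifier, partition identifier, data storage address); entries are distributed over sub-index areas; and, as the abstract explains, a partition migration can then locate its data through the index rather than by scanning the whole hard disk. The Python sketch below illustrates that flow under stated assumptions. The partition count, the number of sub-index areas, the MD5-based hash, the length-prefixed data layout, and every name (`Node`, `partition_of`, `migrate_partition`, and so on) are hypothetical; the claims shown here do not specify them.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical parameters: the claims do not state these values.
NUM_PARTITIONS = 16
NUM_SUB_INDEX_AREAS = 4


@dataclass(frozen=True)
class IndexInfo:
    data_id: str       # data identifier (the key)
    partition_id: int  # partition in which the data resides
    address: int       # data storage address recorded at write time


def stable_hash(data_id: str) -> int:
    # Deterministic hash (Python's built-in str hash is salted per process).
    return int.from_bytes(hashlib.md5(data_id.encode()).digest()[:8], "big")


def partition_of(data_id: str) -> int:
    # Assumed mapping: hash of the data identifier modulo the partition count.
    return stable_hash(data_id) % NUM_PARTITIONS


@dataclass
class Node:
    data_area: bytearray = field(default_factory=bytearray)
    sub_index_areas: list = field(
        default_factory=lambda: [[] for _ in range(NUM_SUB_INDEX_AREAS)]
    )

    def store(self, data_id: str, value: bytes) -> IndexInfo:
        # Append a 4-byte length prefix plus the value; remember the offset.
        address = len(self.data_area)
        self.data_area += len(value).to_bytes(4, "big") + value
        info = IndexInfo(data_id, partition_of(data_id), address)
        # Choose the sub-index area from a hash of the data identifier.
        slot = stable_hash(data_id) % NUM_SUB_INDEX_AREAS
        self.sub_index_areas[slot].append(info)
        return info

    def read(self, address: int) -> bytes:
        # Decode the length prefix, then return exactly that many bytes.
        length = int.from_bytes(self.data_area[address:address + 4], "big")
        return bytes(self.data_area[address + 4:address + 4 + length])

    def entries_for_partition(self, partition_id: int) -> list:
        # Only index entries are examined; the data area is not scanned.
        return [e for area in self.sub_index_areas for e in area
                if e.partition_id == partition_id]

    def drop_partition(self, partition_id: int) -> None:
        # Remove the partition's index entries (data space is not reclaimed).
        self.sub_index_areas = [
            [e for e in area if e.partition_id != partition_id]
            for area in self.sub_index_areas
        ]


def migrate_partition(src: Node, dst: Node, partition_id: int) -> int:
    # Read each value at its recorded address and re-store it on the target node.
    moved = src.entries_for_partition(partition_id)
    for e in moved:
        dst.store(e.data_id, src.read(e.address))
    src.drop_partition(partition_id)
    return len(moved)
```

In this sketch, finding a partition's members touches only the index entries, and the data area is read solely at the recorded addresses, which is the reduction in hard-disk reads the abstract describes. Reclaiming the freed space on the source node is left out for brevity.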

The data processing method according to claim 1, characterized in that generating index information from the data identifier of the data to be stored, the identifier of the first partition in which the data resides, and the data storage address, and adding the index information to the index area of the first node, comprises:

The data processing method according to claim 1, characterized in that, after the first node to which the first partition belongs is obtained, the method further comprises:

The data processing method according to claim 1, characterized in that the method further comprises:

obtaining the data identifier of the input data to be queried;

The data processing method according to claim 4, characterized in that, before the data to be queried is read from the second node, the method further comprises:

6. A data processing device, characterized in that the device is applied to a key-value storage system and comprises:

an indexing unit, configured to generate index information from the data identifier of the data to be stored obtained by the acquiring unit, the identifier of the first partition in which the data resides as determined by the computing unit, and the data storage address recorded by the unit that stored the data, and to add the index information to the index area of the first node determined by the computing unit;

the acquiring unit is further configured to obtain a partition to be migrated when a preset partition migration condition is met;

the device further comprises:

The data processing device according to claim 6, characterized in that the indexing unit compares the data identifier in the generated index information with the data identifiers of the existing index information in the index area of the first node, determines, according to a preset ordering, the storage position of that data identifier within the existing index information, and adds the generated index information at that storage position.
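The ordered insertion in the claim above can be sketched as follows. This is a minimal illustration, assuming ascending lexicographic order of string identifiers as the preset ordering and using binary search (Python's `bisect`) to perform the comparisons; `SortedIndexArea` and its methods are hypothetical names, not taken from the patent.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexInfo:
    data_id: str       # data identifier
    partition_id: int  # partition in which the data resides
    address: int       # data storage address


class SortedIndexArea:
    """Index area kept in ascending order of data identifier (assumed ordering)."""

    def __init__(self) -> None:
        self._keys: List[str] = []           # data identifiers, sorted
        self._entries: List[IndexInfo] = []  # index information, same order

    def add(self, info: IndexInfo) -> int:
        # Compare the new identifier with the existing ones to find its
        # storage position, then insert there so the order is preserved.
        pos = bisect.bisect_left(self._keys, info.data_id)
        self._keys.insert(pos, info.data_id)
        self._entries.insert(pos, info)
        return pos

    def find(self, data_id: str) -> Optional[IndexInfo]:
        # The same ordering allows a binary-search lookup by identifier.
        pos = bisect.bisect_left(self._keys, data_id)
        if pos < len(self._keys) and self._keys[pos] == data_id:
            return self._entries[pos]
        return None

    def entries(self) -> List[IndexInfo]:
        return list(self._entries)
```

For example, adding entries for `"k3"`, `"k1"`, and then `"k2"` places `"k2"` at position 1, leaving the area ordered `k1, k2, k3`. Binary search keeps each position lookup to O(log n) comparisons, although list insertion still shifts elements; a persistent, on-disk index might use a tree or block structure instead.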

The data processing device according to claim 6, characterized in that the device further comprises:

The data processing device according to claim 6, characterized in that the acquiring unit is further configured to obtain the data identifier of the input data to be queried;

the device further comprises:

The data processing device according to claim 9, characterized in that the device further comprises: