CN102968498B - Data processing method and device - Google Patents
- Publication number
- CN102968498B (CN201210516613.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- index information
- stored
- identification
- index
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
Embodiments of the present invention relate to a data processing method and device. The method includes: obtaining data to be stored and the data identifier of the data to be stored; calculating, from the data identifier, the first partition in which the data to be stored is to be stored, and obtaining the first node to which the first partition belongs; storing the data identifier and the data to be stored separately on the first node, and recording the data storage address; and generating index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and adding this index information to the index area of the first node. The embodiments of the present invention do not require a full scan of the hard disk, reduce the number of hard-disk read operations when a partition is migrated, and improve data read efficiency.
Description
Technical field
The present invention relates to the field of storage system technologies, and in particular to a data processing method and device.
Background
Cloud storage, like cloud computing, refers to a system that uses functions such as cluster applications, grid technology, or distributed file systems, through application software, to make a large number of storage devices of various types in a network work together and jointly provide data storage and service access functions to external users.
Cloud storage uses distributed hash table (DHT) technology to form a distributed file system (that is, a distributed storage cluster, hereinafter referred to as the cluster), in which each storage node is assigned multiple independent partitions according to a consistent hashing algorithm. Because cloud storage builds a mass storage pool out of cheap, low-reliability hardware, storage hardware failures are routine; at the same time, to provide resources elastically, storage nodes are frequently added to or removed from the cluster dynamically. When a node fails, or when capacity is dynamically expanded or reduced, the partitions are redistributed among the nodes and the range of partitions managed by each node changes, so a load-balancing (rebalance) migration is performed: some partitions must be moved to other nodes, the partitions held by a failed node are taken over by other nodes, and a newly added node takes over some of the partitions previously held by other nodes, thereby keeping the load balanced across the storage nodes.
Existing data storage formats generally adopt the organization used by the open-source software Tokyo Cabinet HDB/BDB/FDB (hash database / B+ tree database / fixed-length database). The data stored on a storage node consists of a series of key-value pairs. The bucket array of the storage node holds the key-value linked lists in order, indexed by sequential bucket IDs, while the storage addresses of the key-value pairs on the node's storage medium (such as a hard disk) are scattered. In other words, key-value pairs belonging to different bucket IDs, and even different key-value pairs belonging to the same bucket ID, are scattered across the hard disk.
Consequently, when a storage node migrates data it must scan the entire hard disk: for each key-value pair scanned, it compares the partition information stored in the scanned value with the partitions designated for migration, and migrates the pair if they match. This wastes a great deal of hard-disk read and write capacity: the partitions that actually need to be migrated may occupy less than 5% of the disk's total capacity, yet the whole disk must be scanned. Moreover, during migration key-value pairs can only be matched and sent serially, so the number of hard-disk I/O operations is excessive and efficiency is low.
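To make the cost of this prior-art approach concrete, the following toy model treats the disk as a list of records in which the partition number is only available inside each stored value. It is a sketch, not Tokyo Cabinet's actual API: the `Record` type and `full_scan_migrate` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One key-value pair as laid out on disk in the prior-art format (hypothetical)."""
    key: str
    partition: int  # partition info is only available by reading the stored value
    value: bytes


def full_scan_migrate(disk, partitions_to_move):
    """Read every record on the disk and keep those whose partition is being migrated.

    Returns the keys selected for migration and the number of records read.
    """
    reads = 0
    selected = []
    for record in disk:  # each iteration models one hard-disk read
        reads += 1
        if record.partition in partitions_to_move:
            selected.append(record.key)  # matched and sent one pair at a time
    return selected, reads


disk = [Record(f"k{i}", i % 20, b"v") for i in range(100)]
selected, reads = full_scan_migrate(disk, {3})
print(len(selected), reads)  # 5 100
```

With 100 records spread over 20 partitions, migrating a single partition still costs 100 reads to locate its 5 records, which is the inefficiency the invention targets.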
Summary of the invention
In view of this, an object of the present invention is to provide a data processing method and device that do not require a full scan of the hard disk, reduce the number of hard-disk read operations when a partition is migrated, improve data read efficiency, allow key-value pairs to be sent concurrently during data migration, improve the utilization of inter-node bandwidth, and improve data read performance.
To achieve the above object, a first aspect of the embodiments of the present invention provides a data processing method, the method including:
obtaining data to be stored and the data identifier of the data to be stored;
calculating, from the data identifier, the first partition in which the data to be stored is to be stored, and obtaining the first node to which the first partition belongs;
storing the data identifier and the data to be stored separately on the first node, and recording the data storage address; and
generating index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and adding this index information to the index area of the first node.
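A minimal in-memory sketch of this storage flow, with the duplicate check of the third possible implementation folded in. Several details are assumptions not found in the patent: MD5 as the hash, 100 partitions on 10 nodes, partition-to-node mapping by modulo, a `bytearray` standing in for the hard disk, a dict standing in for the index area, and a stored value length alongside the offset so the value can be read back. The names `partition_of`, `node_of`, `Node`, and `store` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

NUM_PARTITIONS = 100  # assumed cluster layout, not from the patent
NUM_NODES = 10


def partition_of(key: str) -> int:
    """Hash the key (MD5 assumed) and reduce it modulo the partition count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def node_of(partition: int) -> int:
    """Map a partition to a node by modulo (one of the options the text mentions)."""
    return partition % NUM_NODES


@dataclass
class Node:
    data: bytearray = field(default_factory=bytearray)  # stands in for the hard disk
    index: dict = field(default_factory=dict)  # index area: key -> (partition, offset, length)

    def put(self, key: str, value: bytes, partition: int) -> bool:
        if key in self.index:  # duplicate key: already stored on this node, skip
            return False
        offset = len(self.data)  # data storage address of the value
        self.data += value  # value stored apart from the key
        self.index[key] = (partition, offset, len(value))
        return True


def store(nodes: list, key: str, value: bytes) -> bool:
    partition = partition_of(key)  # first partition
    return nodes[node_of(partition)].put(key, value, partition)  # first node


nodes = [Node() for _ in range(NUM_NODES)]
print(store(nodes, "user42:20121204:0001", b"hello"))  # True
print(store(nodes, "user42:20121204:0001", b"hello"))  # False (duplicate)
```

Because the index entry carries the partition next to the value's address, later steps can find data by key or by partition without touching the value bytes.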
With reference to the first aspect, in a first possible implementation of the first aspect, the index area includes at least one sub-index area; the sub-index area in which the generated index information is to be stored is determined either from the result of a hash calculation on the data identifier in the generated index information or from the sort order of data identifiers, and the generated index information is stored in the sub-index area so determined.
With reference to the first aspect or its first possible implementation, in a second possible implementation of the first aspect, generating index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and adding this index information to the index area of the first node, includes:
comparing the data identifier in the generated index information with the data identifiers of the index information already in the index area of the first node; determining, according to a preset ordering, the position at which the data identifier in the generated index information belongs among the existing index information; and adding the generated index information at that position.
With reference to the first aspect or its first possible implementation, in a third possible implementation of the first aspect, after the first node to which the first partition belongs is obtained, the method further includes:
determining whether the index area of the first node contains index information with the same data identifier as the data to be stored; if the index area contains no index information with that data identifier, performing the step of storing the data identifier and the data to be stored separately on the first node; if the index area already contains index information with that data identifier, not performing that step.
With reference to the first aspect or its first possible implementation, in a fourth possible implementation of the first aspect, the method further includes:
obtaining the data identifier of input data to be queried;
calculating, from the data identifier of the data to be queried, the second partition in which the data to be queried resides, and obtaining the second node to which the second partition belongs;
querying the index area of the second node for the index information that matches the data identifier of the data to be queried, where the index information in the index area includes the data identifier of stored data, the partition in which it resides, and its data storage address; and
reading the data to be queried from the second node according to the data storage address in the matched index information.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, before the data to be queried is read from the second node, the method further includes:
sorting the data storage addresses of multiple pieces of data to be queried, and reading the data to be queried sequentially, according to the sorting result, from the second nodes corresponding to that data.
With reference to the first aspect or its first possible implementation, in a sixth possible implementation of the first aspect, the method further includes:
obtaining a partition to be migrated when a preset partition migration condition is met;
obtaining the third node to which the partition to be migrated belongs;
matching, in the index area of the third node, all index information whose partition identifier is the same as that of the partition to be migrated, where the index information in the index area of the third node includes the data identifier of stored data, the partition in which it resides, and its data storage address; and
sorting the data storage addresses of the matched index information and sending the sorting result to the third node, so that the third node reads the data to be migrated sequentially and migrates it to the target node.
In a second aspect, an embodiment of the present invention further provides a data processing device, the device including:
an acquiring unit, configured to obtain data to be stored and the data identifier of the data to be stored;
a computing unit, configured to calculate, from the data identifier obtained by the acquiring unit, the first partition in which the data to be stored is to be stored, and to obtain the first node to which the first partition belongs;
a storage unit, configured to store the data identifier and the data to be stored separately on the first node and to record the data storage address; and
an indexing unit, configured to generate index information from the data identifier of the data to be stored obtained by the acquiring unit, the identifier of the first partition determined by the computing unit, and the data storage address recorded by the storage unit, and to add this index information to the index area of the first node determined by the computing unit.
With reference to the second aspect, in a first possible implementation of the second aspect, the index area includes at least one sub-index area; the sub-index area in which the generated index information is to be stored is determined either from the result of a hash calculation on the data identifier in the generated index information or from the sort order of data identifiers, and the generated index information is stored in the sub-index area so determined.
With reference to the second aspect or its first possible implementation, in a second possible implementation of the second aspect, the indexing unit compares the data identifier in the generated index information with the data identifiers of the index information already in the index area of the first node, determines, according to a preset ordering, the position at which the data identifier in the generated index information belongs among the existing index information, and adds the generated index information at that position.
With reference to the second aspect or its first possible implementation, in a third possible implementation of the second aspect, the device further includes:
a deduplication unit, configured to determine whether the index area of the first node obtained by the computing unit contains index information with the same data identifier as the data to be stored; when the index area contains no such index information, the deduplication unit triggers the storage unit; when the index area already contains such index information, it does not trigger the storage unit.
With reference to the second aspect or its first possible implementation, in a fourth possible implementation of the second aspect, the acquiring unit is further configured to obtain the data identifier of input data to be queried;
the computing unit is further configured to calculate, from the data identifier of the data to be queried obtained by the acquiring unit, the second partition in which the data to be queried resides, and to obtain the second node to which the second partition belongs;
the device further includes:
a matching unit, configured to query the index area of the second node for the index information that matches the data identifier of the data to be queried, where the index information in the index area includes the data identifier of stored data, the partition in which it resides, and its data storage address; and
a reading unit, configured to read the data to be queried from the second node according to the data storage address in the index information of the data to be queried obtained by the matching unit.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the device further includes:
a sorting unit, configured to sort the data storage addresses that the matching unit has matched for multiple pieces of data to be queried; the reading unit reads the data to be queried sequentially, according to the sorting result of the sorting unit, from the second nodes corresponding to that data.
With reference to the second aspect or its first possible implementation, in a sixth possible implementation of the second aspect, the acquiring unit is further configured to obtain a partition to be migrated when a preset partition migration condition is met;
the computing unit is further configured to obtain the third node to which the partition to be migrated belongs;
the device further includes:
a matching unit, configured to match, in the index area of the third node, all index information whose partition identifier is the same as that of the partition to be migrated, where the index information in the index area includes the data identifier of stored data, the partition in which it resides, and its data storage address; and
a sorting unit, configured to sort the data storage addresses of the index information matched by the matching unit and to send the sorting result to the third node, so that the third node reads the data to be migrated sequentially and migrates it to the target node.
With the data processing method and device provided by the embodiments of the present invention, keys and values are stored separately, and the index area records each key, its partition, and the address offset of its value on the hard disk. When data is migrated, only the index area needs to be scanned to find the keys and value addresses to be migrated, so no full scan of the hard disk is needed. This reduces the number of hard-disk read operations when a partition is migrated and improves data read efficiency; key-value pairs can also be sent concurrently during migration, which improves the utilization of inter-node bandwidth and data read performance.
Brief description of the drawings
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the data stored in an index area according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the data storage format of an index area according to an embodiment of the present invention;
Fig. 4 is a flowchart of a data query step according to an embodiment of the present invention;
Fig. 5 is a flowchart of a data migration step according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a data processing device according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a data storage system according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a storage management node according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the program in the memory of the storage management node shown in Fig. 8;
Fig. 10 is a schematic diagram of a data storage node according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of the program in the memory of the data storage node shown in Fig. 10.
Detailed description of the invention
The technical solutions of the embodiments of the present invention are described in further detail below with reference to the drawings and embodiments.
In the prior art, distributed storage clusters generally use key-value structured data storage systems, in which stored data is represented as key-value pairs: the key is the unique identifier of the data within the whole cluster when the data is accessed, and the value is the accessed data itself.
To ensure the robustness and balance of the system, a cloud storage distributed cluster uses data migration extensively. The two most basic uses are: first, redundant backup of data, to ensure the fault tolerance of the cluster and the reliability of the data; second, because the cluster consists of many dynamic nodes (some nodes may suddenly go down at some moment, and others may rejoin the cluster at some moment), the system, to keep overall storage balanced, automatically or manually triggers commands that balance storage utilization across nodes. This requires moving some partitions of existing nodes onto newly added nodes, or rebalancing the partitions of a failed node and migrating them to other nodes.
A partition is one of the divisions into which the system splits the storage space of the storage nodes according to a consistent hashing algorithm; each storage node holds multiple independent partitions. For example, suppose a system has 10 storage nodes that together hold 100 partitions, identified as Partition 0 to Partition 99. These 100 partitions are assigned to the 10 nodes: for example, the first node is assigned Partition 0 to Partition 9, the second node Partition 10 to Partition 19, the third node Partition 20 to Partition 29, and so on, up to the tenth node, which is assigned Partition 90 to Partition 99. In practice the partitions may also be assigned out of order, for example by taking the partition number modulo 10.
When a node is added, the system needs to rebalance some partitions from each existing node onto the new node. Suppose that afterwards the first node holds Partition 0 to Partition 9, the second node Partition 10 to Partition 18, the third node Partition 19 to Partition 27, and so on, the tenth node Partition 82 to Partition 90, and the eleventh node Partition 91 to Partition 99. The system can then determine by hash calculation which partitions need to be migrated: for example, Partition 19 must move from the second node to the third node; Partition 28 and Partition 29 must move from the third node to the fourth node; and so on, up to Partition 91 to Partition 99, which must move from the tenth node to the eleventh node. If partitions are instead assigned by taking the partition number modulo 11, the set of partitions to be migrated is different.
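The example above can be reproduced by assigning consecutive partition ranges to nodes and diffing the old and new owners. This is a sketch under one assumption not stated in the text: when the partitions do not divide evenly, the lowest-numbered nodes take the extra ones, which yields exactly the 10/9/.../9 split described. The function names are hypothetical, and node numbers are 0-based, so node 1 is the "second node".

```python
def contiguous_owner(num_partitions: int, num_nodes: int) -> dict:
    """Assign consecutive partition ranges to nodes (0-based node numbers).

    Assumption: when the split is uneven, the lowest-numbered nodes take one extra partition.
    """
    base, extra = divmod(num_partitions, num_nodes)
    owner, start = {}, 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)
        for partition in range(start, start + size):
            owner[partition] = node
        start += size
    return owner


def migration_plan(old: dict, new: dict) -> dict:
    """Partitions whose owner changes, mapped to (source node, target node)."""
    return {p: (old[p], new[p]) for p in old if old[p] != new[p]}


plan = migration_plan(contiguous_owner(100, 10), contiguous_owner(100, 11))
print(plan[19], plan[28], plan[29], plan[91])  # (1, 2) (2, 3) (2, 3) (9, 10)
```

The same `migration_plan` diff works for any assignment rule; feeding it a modulo-11 assignment instead would produce a different plan, as the text notes.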
The data processing method provided by the embodiments of the present invention may be executed by the controller of any storage node in the key-value storage system that receives a data processing task, or by a third-party processor of the storage node. Fig. 1 is a flowchart of the data processing method provided by this embodiment. As shown in Fig. 1, the data processing method of this embodiment includes:
Step S101: obtain data to be stored and the data identifier of the data to be stored.
Data to be stored is generally represented in key-value form, where the key is the unique identifier of the data within the whole cluster when the data is accessed and the value is the accessed data itself.
Taking a microblog post as an example, the key that the system generates for the data to be stored is a character string that may contain a time, user information, and a sequence number, indicating when a certain user posted a certain post; the value is the actual content of the post, for example "Having a meal at *** today".
For data to be stored that is represented in key-value form, this step obtains the data itself (that is, the value) and its data identifier, the key.
Step S102: calculate, from the data identifier, the first partition in which the data to be stored is to be stored, and obtain the first node to which the first partition belongs.
A feature value of the key, which uniquely represents the key, is obtained. The feature value may be obtained by performing a hash calculation on the key and using the resulting hash value as the feature value. The partition in which the data to be stored resides is then obtained from the key's feature value; specifically, a modulo operation is performed on the feature value to determine the partition.
Next, a consistent hash calculation is performed on the identifier of the partition in which the data to be stored resides, so as to determine the corresponding node from the partition. For example, if the calculated partition identifier is 70 and the distributed cluster has 10 nodes, a modulo-10 operation (70 mod 10 = 0) determines that the data to be stored belongs on the first node.
Alternatively, the node to which a partition belongs may be obtained by querying a partition segment information table configured by the system cluster. This table may be preconfigured, or updated in real time according to the actual storage state of the cluster.
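A sketch of step S102 under stated assumptions: the patent does not name a hash function, so MD5 is used here and the first 8 bytes of the digest serve as the key's feature value. The optional `table` argument is a hypothetical stand-in for the partition segment information table; without it, the node is chosen by partition modulo node count, as in the worked example.

```python
import hashlib
from typing import Optional


def key_feature(key: str) -> int:
    """Feature value of a key: the first 8 bytes of its MD5 digest (assumed hash)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def partition_of(key: str, num_partitions: int) -> int:
    """Partition = feature value modulo the number of partitions."""
    return key_feature(key) % num_partitions


def node_of(partition: int, num_nodes: int, table: Optional[dict] = None) -> int:
    """Use the partition segment information table if one is configured,
    otherwise fall back to partition modulo the node count."""
    if table is not None:
        return table[partition]
    return partition % num_nodes


print(node_of(70, 10))  # 0, i.e. the first node
```

A fixed, keyed hash such as MD5 is used rather than Python's built-in `hash()` because the latter is salted per process for strings and would place the same key differently after a restart.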
After the node to which the partition belongs is obtained, the method further includes a step of determining whether the index area of the first node contains index information with the same data identifier as the data to be stored. If the index area contains no index information with that key, step S103 is performed to store the data; if the index area already contains index information with the same key, the data is not stored, because this indicates that the data to be stored is already stored on this storage node.
Step S103: store the data identifier and the data to be stored separately on the first node, and record the data storage address.
The key and the value of the data to be stored are stored separately on the storage medium of the first node obtained in step S102.
The data storage address is the address offset of the data to be stored on the storage medium (such as a hard disk) of the node, that is, the value address: the offset of the value on the node's hard disk, which may be expressed, for example, as an LBA (logical block address).
Step S104: generate index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and add this index information to the index area of the first node.
The index area of a storage node contains the index information of each piece of data stored on that node. Each piece of index information includes the data identifier of the stored data, the identifier of the first partition in which it resides, and its data storage address; that is, the key, the partition, and the value address of the stored data.
The index area may include multiple sub-index areas, and the generated index information is stored in one of them.
Fig. 2 is a schematic diagram of the data stored in an index area according to an embodiment of the present invention. As shown in Fig. 2, the index area includes m bucket IDs, and Bucket1 holds N pieces of index information: key11 denotes the key of the first piece of index information in Bucket1, partition_K11 denotes the partition in which key11 resides, and value_LBA_k11 denotes the value address, that is, the address offset on the hard disk of the value corresponding to key11. To query data, it is only necessary to match the key in the index area to find the corresponding stored data. To migrate data, the partition of each key can be calculated from the key, and the corresponding partitions can be matched in the index area to find the stored data belonging to the partitions to be migrated.
Storing the generated index information in a sub-index area includes: determining the identifier of the sub-index area in which the generated index information is to be stored, either from the result of a hash calculation on the data identifier in the generated index information or from the sort order of data identifiers, and storing the generated index information in the sub-index area so determined.
When newly generated index information is available, its key is compared with the keys of the index information already in the index area of the first node; according to a preset ordering, the position at which that key belongs among the existing index information is determined, and the generated index information is added at that position.
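The ordered insertion just described can be sketched with Python's standard `bisect` module, assuming the "preset ordering" is plain lexicographic order of keys and modelling an index entry as a `(key, partition, value_lba)` tuple. Both choices, and the helper names, are illustrative assumptions rather than the patent's storage format.

```python
import bisect


def insert_index_entry(index: list, key: str, partition: int, value_lba: int) -> int:
    """Insert (key, partition, value_lba) so that `index` stays sorted by key.

    Keys are unique on a node (duplicates are rejected before storing), so
    comparing whole tuples orders the entries by key alone.
    """
    entry = (key, partition, value_lba)
    pos = bisect.bisect_left(index, entry)
    index.insert(pos, entry)
    return pos


def find_entry(index: list, key: str):
    """Binary-search the sorted index for `key`; return its entry or None."""
    pos = bisect.bisect_left(index, (key,))  # (key,) sorts before any (key, ...)
    if pos < len(index) and index[pos][0] == key:
        return index[pos]
    return None


index = []
insert_index_entry(index, "k3", 1, 100)
insert_index_entry(index, "k1", 2, 200)
print(insert_index_entry(index, "k2", 1, 300))  # 1
print(find_entry(index, "k2"))  # ('k2', 1, 300)
```

Keeping the index sorted lets a query locate a key by binary search instead of scanning every entry.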
Alternatively, when newly generated index information is available, a hash calculation may be performed on its key to determine the bucket ID in which it is to be stored. For example, the key may be reduced modulo m to obtain the bucket ID, and the generated index information is stored in that bucket.
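The bucket-based alternative reduces to computing `hash(key) mod m`. In this sketch CRC32 from the standard `zlib` module is an arbitrary stand-in for the unspecified hash, each bucket is a plain list of `(key, partition, value_lba)` tuples, and `bucket_id` and `add_to_buckets` are hypothetical helper names.

```python
import zlib


def bucket_id(key: str, m: int) -> int:
    """Sub-index area for `key`: hash(key) mod m.

    CRC32 is an arbitrary choice; Python's built-in hash() is salted per process
    for strings, so it would not give stable bucket IDs across restarts.
    """
    return zlib.crc32(key.encode("utf-8")) % m


def add_to_buckets(buckets: list, key: str, partition: int, value_lba: int) -> int:
    """Append the index entry to its bucket and return the bucket ID."""
    b = bucket_id(key, len(buckets))
    buckets[b].append((key, partition, value_lba))
    return b


buckets = [[] for _ in range(8)]  # m = 8 sub-index areas
b = add_to_buckets(buckets, "k11", 7, 4096)
print(buckets[b])  # [('k11', 7, 4096)]
```

A key lookup then only has to examine one bucket, while a partition scan for migration still walks all m buckets but reads only index entries, never the values.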
After the storing step for the data to be stored is completed, the data processing method of this embodiment may further include a data query step, used to query or read stored data; the data query step is thus performed on the basis of the embodiment shown in Fig. 1. Fig. 3 is a flowchart of the data query step provided by this embodiment. As shown in Fig. 3, the data query step includes:
Step S201: obtain the data identifier of the input data to be queried.
For data stored in key-value form, to read the data to be queried the system receives the key entered by the user; alternatively, the user may enter a query term, which the system converts into the corresponding key.
Step S202: calculate, from the data identifier of the data to be queried, the second partition in which that data resides, and obtain the second node to which the second partition belongs.
The partition in which the data to be queried resides is calculated using the same method as in step S102, and the second node on which the data resides is determined from the calculated partition.
Step S203: query the index area of the second node for the index information that matches the data identifier of the data to be queried.
The index information in the index area includes the data identifier of stored data, the partition in which it resides, and its data storage address.
In the index area of the second node on which the data to be queried resides, the key of the data to be queried is matched against the data identifiers of the stored data in that index area to obtain the matching index information, and thus the data storage address (that is, the value address) of the data to be queried. If multiple pieces of data to be queried are stored on several different second nodes, the data identifiers are matched in the index area of each of those second nodes, yielding the matching index information on each node and hence the value addresses of all the queried data.
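Steps S202 and S203 for a batch of keys can be sketched as a grouping by node followed by a key match in each node's index area. Here each index area is modelled as a dict from key to `(partition, value_lba)`, and the toy `partition_of` / `node_of` rules are hypothetical placeholders for the hash-based placement of step S102.

```python
from collections import defaultdict


def lookup_addresses(keys, partition_of, node_of, index_areas):
    """Resolve each key to its node, then match the key in that node's index area.

    index_areas maps node -> {key: (partition, value_lba)}. Returns
    {node: [(key, value_lba), ...]}; keys with no matching entry are skipped.
    """
    found = defaultdict(list)
    for key in keys:
        node = node_of(partition_of(key))  # second partition -> second node
        entry = index_areas[node].get(key)
        if entry is not None:
            found[node].append((key, entry[1]))
    return dict(found)


# Toy placement rules, standing in for the hash-based ones of step S102.
def partition_of(key):
    return int(key[1:]) % 4


def node_of(partition):
    return partition % 2


index_areas = {0: {"k0": (0, 500), "k2": (2, 100)}, 1: {"k1": (1, 300)}}
print(lookup_addresses(["k0", "k1", "k2", "k3"], partition_of, node_of, index_areas))
# {0: [('k0', 500), ('k2', 100)], 1: [('k1', 300)]}
```

Only index entries are consulted at this stage; no value bytes are read until the addresses are known.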
Step S204: read the data to be queried from the second node according to the data storage address in the matched index information.
If there are multiple pieces of data to be queried, the method further includes, after step S203, sorting their data storage addresses; step S204 then reads the data to be queried sequentially, according to the sorting result, from the storage medium of the second nodes corresponding to that data.
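The sorted-read refinement of step S204 amounts to ordering the hits by value address before reading. In this sketch a `bytes` object stands in for the disk, byte offsets stand in for LBAs, and each hit carries a value length; the length field and the function name are assumptions of the sketch, not part of the patent's index format.

```python
def read_in_address_order(disk: bytes, hits):
    """Sort query hits by value address, then read them in one forward pass.

    hits: (key, value_offset, value_length) tuples; byte offsets stand in for LBAs,
    and the stored length is an assumption of this sketch.
    Returns (key, value) pairs in the order they were read.
    """
    ordered = sorted(hits, key=lambda hit: hit[1])
    return [(key, disk[offset:offset + length]) for key, offset, length in ordered]


disk = b"AAAABBBBCCCC"
hits = [("c", 8, 4), ("a", 0, 4), ("b", 4, 4)]
print(read_in_address_order(disk, hits))
# [('a', b'AAAA'), ('b', b'BBBB'), ('c', b'CCCC')]
```

On a rotating disk, reading in ascending address order avoids backward seeks between requests, which is the motivation for sorting before reading.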
When a storage node is added to or removed from the distributed storage system, a rebalance migration must be performed and the partitions are redistributed among the storage nodes. The data processing method of this embodiment therefore also includes a data migration step that migrates the partitions to be migrated; it is performed on the basis of the embodiment shown in Fig. 1. Fig. 4 is a flowchart of the data migration step provided by this embodiment. As shown in Fig. 4, the data migration step includes:
Step S301: when a preset partition migration condition is met, obtain the partition to be migrated.
The preset partition migration condition may include a node being added or a node being removed.
When a node is added or removed, the system determines which partitions need to be migrated, and these partitions are then obtained. For example, if the partitions to be migrated are partition 19 and partition 28, those two partitions are obtained.
Alternatively, if, when a node is added or removed, the system determines the individual data items that need to be migrated, the key of each data item to be migrated is hashed to obtain the partition in which that data item is located.
Step S302: obtain the third node to which the partition to be migrated belongs.
The third node to which the partition to be migrated belongs is obtained in the same way as in step S102.
Step S303: match and obtain, from the index area of the third node, all index information whose partition identifier is identical to the identifier of the partition to be migrated.
The index information in the index area includes the data identifier of each stored data item, the partition in which it is located, and its data storage address.
In the index area of the third node where the partition to be migrated resides, the identifier of the partition to be migrated is matched against the partition identifiers of the stored data in that index area, yielding the index information that matches this partition identifier. If multiple partitions to be migrated are stored on several different third nodes, the identifier of each partition to be migrated is matched in the index area of each third node separately, yielding the matching index information of the partitions to be migrated on each third node.
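Step S303 can be illustrated with a short Python sketch. It assumes, hypothetically, that each item of index information is a `(key, partition, value_lba)` record and that several partitions may be migrated at once. The point it shows is that only the index area is scanned; no value data is read from the disk to decide what must move.

```python
from collections import namedtuple

# Hypothetical record layout for one item of index information.
IndexEntry = namedtuple("IndexEntry", ["key", "partition", "value_lba"])

def match_partitions(index_area, partitions_to_migrate):
    """Collect, per partition to migrate, every index entry whose partition
    identifier equals it. Only the index area is scanned."""
    matches = {p: [] for p in partitions_to_migrate}
    for entry in index_area:
        if entry.partition in matches:
            matches[entry.partition].append(entry)
    return matches

index_area = [
    IndexEntry("key11", 19, 10),
    IndexEntry("key20", 7, 15),
    IndexEntry("key12", 19, 40),
    IndexEntry("key30", 28, 25),
]
result = match_partitions(index_area, [19, 28])
print({p: [e.key for e in es] for p, es in result.items()})
# {19: ['key11', 'key12'], 28: ['key30']}
```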
That is, from the index area formed in step S104, the index information whose partition value equals the value of the partition to be migrated is matched; each matched item contains the key, the partition and the value address. For example, all index information in the index area whose partition is partition 19 or partition 28 is matched.
When the index area includes multiple bucket IDs, the index area can also be scanned in batches: according to the number of bucket IDs that the configured memory can hold, the index information in the corresponding bucket IDs is read into memory.
In this case, matching and obtaining from the index area the index information identical to the value of the partition to be migrated includes:
Reading the index information of at least one sub-index area of the index area in batches, specifically reading the index information of different buckets (bucket IDs) batch by batch.
Each time a batch is read, recording the bucket ID read in this batch, so as to obtain the starting bucket ID for the next batch.
Matching, from the index information read in this batch, the index information identical to the identifier of the partition to be migrated.
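The batched scan above can be sketched in Python as a generator, assuming (hypothetically) that the index area is a list indexed by bucket ID whose buckets hold `(key, partition, value_lba)` tuples, and that `batch_size` is the number of buckets the configured memory can hold. Instead of the bucket ID just read, the sketch records the next starting bucket ID, which carries the same information and lets an interrupted scan resume from it.

```python
def scan_buckets_in_batches(buckets, partition_id, batch_size, start_bucket=0):
    """Scan the sub-index areas (buckets) batch by batch.

    buckets: list indexed by bucket ID; each bucket is a list of
             (key, partition, value_lba) tuples.
    Yields (next_start_bucket, matches) per batch. Persisting
    next_start_bucket lets an interrupted scan resume at that bucket.
    """
    bucket_id = start_bucket
    while bucket_id < len(buckets):
        end = min(bucket_id + batch_size, len(buckets))
        # Only this batch of buckets needs to be held in memory.
        matches = [entry
                   for b in range(bucket_id, end)
                   for entry in buckets[b]
                   if entry[1] == partition_id]
        bucket_id = end  # record the starting bucket ID for the next batch
        yield bucket_id, matches

buckets = [[("a", 19, 10)], [("b", 28, 5)], [("c", 19, 30)], []]
for next_start, found in scan_buckets_in_batches(buckets, 19, batch_size=2):
    print(next_start, found)
# 2 [('a', 19, 10)]
# 4 [('c', 19, 30)]
```

Passing `start_bucket=2` restarts the scan after the first batch, which is the kind of resumable behaviour the description attributes to batched scanning.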
Step S304: sort the data storage addresses in the matched index information and send the sorting result to the third node, so that the third node reads the data to be migrated sequentially and migrates it to the destination node.
The multiple items of index information matched in step S303 are sorted by the value address in each item. For example, the matched index information includes <key11, partition_K11=19, value_LBA_K11=10>, <key12, partition_K12=19, value_LBA_K12=40>, <key22, partition_K22=19, value_LBA_K22=60>, <key34, partition_K34=19, value_LBA_K34=30> and <key41, partition_K41=19, value_LBA_K41=20>. Sorting by value address gives the order K11, K41, K34, K12, K22. The sorting result is sent to the hard disk, which can then read sequentially. This avoids the full scan and unordered random reads required by existing methods and improves overall performance.
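The worked example above can be checked with a few lines of Python. The function name `migration_read_order` is hypothetical; the tuples reproduce the five matched items of index information from the text.

```python
def migration_read_order(matched):
    """Sort matched (key, partition, value_lba) entries by value address and
    return the keys in the order the disk should read them."""
    return [key for key, _partition, _lba in sorted(matched, key=lambda e: e[2])]

matched = [
    ("key11", 19, 10),
    ("key12", 19, 40),
    ("key22", 19, 60),
    ("key34", 19, 30),
    ("key41", 19, 20),
]
print(migration_read_order(matched))
# ['key11', 'key41', 'key34', 'key12', 'key22']
```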
The sorting result may be organized as, but is not limited to, a tree structure, for example a B+ tree index or a bitmap index. Fig. 5 is a schematic diagram of the data storage format of the index area provided by this embodiment. As shown in Fig. 5, the structure includes multiple levels of non-leaf nodes; the non-leaf nodes store value_LBA values, and the leaf nodes store the key and value of the actual stored data.
When new index information is added, its value address is compared with the value addresses of the existing index information in the index area to determine where the new index information belongs in the B+ tree. The new value address is compared with the current node of the B+ tree: if the new value is larger, it is placed in a node above the current node; if it is smaller, it is placed in a node below the current node; and so on, until the new index information has been placed in the B+ tree.
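The ordering rule above can be approximated in Python. This is a simplified stand-in, not a B+ tree: a sorted list maintained with the standard `bisect` module keeps index entries ordered by value_LBA, so an in-order walk returns them in disk order. The class name `LbaOrderedIndex` and its methods are hypothetical. A real B+ tree would split nodes to keep insertion logarithmic, whereas `insort` into a list costs linear time per insert.

```python
import bisect

class LbaOrderedIndex:
    """Stand-in for the B+ tree of Fig. 5: index entries are kept ordered by
    value_LBA so that an in-order walk yields them in disk order. A sorted
    Python list replaces the tree here."""

    def __init__(self):
        self._entries = []  # (value_lba, key, partition) tuples, kept sorted

    def insert(self, key, partition, value_lba):
        # Compare the new value address with existing ones to find its slot.
        bisect.insort(self._entries, (value_lba, key, partition))

    def in_disk_order(self, partition=None):
        """Return (key, partition, value_lba) in ascending value_lba order,
        optionally restricted to one partition."""
        return [(key, part, lba) for lba, key, part in self._entries
                if partition is None or part == partition]

index = LbaOrderedIndex()
for key, part, lba in [("key12", 19, 40), ("key11", 19, 10), ("key50", 28, 25)]:
    index.insert(key, part, lba)
print(index.in_disk_order(partition=19))  # [('key11', 19, 10), ('key12', 19, 40)]
```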
The data processing method provided by the embodiments of the present invention is applicable to key-value distributed storage systems. When data is stored, the key and the value are stored separately, and the index area of the node records the address offset of the stored data on the node's hard disk. When data is migrated, only the index area needs to be scanned to find the keys and value addresses that must be migrated, which reduces the number of hard disk I/Os. The batch of data to be migrated is also sorted, which optimizes the disk read/write order and improves overall performance.
The above is a detailed description of the data processing method provided by the embodiments of the present invention. The data processing device provided by the embodiments of the present invention is described in detail below.
The data processing device provided by the embodiments of the present invention is applied in a key-value storage system. Fig. 6 is a schematic diagram of the data processing device provided by this embodiment. As shown in Fig. 6, the data processing device of the embodiment of the present invention includes: an obtaining unit 701, a computing unit 702, a storage unit 703, an indexing unit 704, a matching unit 705, a reading unit 706 and a sorting unit 707.
The data processing device mainly has three working states, data storage, data query and data migration, which are described separately below.
During data storage, the main working components are the obtaining unit 701, the computing unit 702, the storage unit 703 and the indexing unit 704.
The obtaining unit 701 is configured to obtain data to be stored and the data identifier of the data to be stored.
Data to be stored is generally represented in key-value form, where the key uniquely identifies the data across the whole cluster during access and the value is the accessed data itself.
Taking the storage of a microblog post as an example, the key of the data to be stored obtained by the obtaining unit 701 is a string that may include a time, user information and a sequence number, indicating that a certain user published a certain post at a certain time; the value is the actual post content, for example "having a meal at *** today".
For data to be stored in key-value form, the obtaining unit 701 obtains the data itself (the value) and its data identifier (the key).
The computing unit 702 calculates, according to the data identifier obtained by the obtaining unit 701, the first partition in which the data to be stored is to be located, and obtains the first node to which the first partition belongs.
The computing unit 702 obtains a feature value of the key that uniquely represents the key. The feature value may be obtained by hashing the key and using the resulting hash value as the feature value of the key; the partition in which the data to be stored is located is then obtained from this feature value.
The computing unit 702 may then apply consistent hashing to the identifier of the partition in which the data to be stored is located, so as to determine the corresponding node from the partition. For example, if the partition calculated by the computing unit 702 for the data to be stored is 70 and the distributed cluster has 10 nodes, a modulo-10 operation (70 mod 10 = 0) determines that the data to be stored belongs on the first node.
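A minimal Python sketch of the key-to-partition-to-node calculation follows. MD5 as the feature-value hash and the partition count passed as `num_partitions` are assumptions; the text fixes neither. Node placement uses plain modulo, as in the worked example. A consistent-hash ring, which the text also mentions, would move fewer partitions when the node count changes. All function names are hypothetical.

```python
import hashlib

def key_feature_value(key: str) -> int:
    """Feature value of a key: the first 8 bytes of its MD5 digest
    (MD5 is an assumption; any stable hash would do)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

def key_to_partition(key: str, num_partitions: int) -> int:
    # Partition in which the data to be stored is located.
    return key_feature_value(key) % num_partitions

def partition_to_node(partition_id: int, num_nodes: int) -> int:
    """Plain modulo placement, matching the worked example in the text."""
    return partition_id % num_nodes

print(partition_to_node(70, 10))  # 0, i.e. the first node
```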
The data processing system of this embodiment may further include a deduplication unit (not shown). When the computing unit 702 has obtained the node to which the partition belongs, the deduplication unit determines whether the index area of the first node already contains index information identical to the data identifier of the data to be stored. If the index area contains no such index information, the deduplication unit triggers the storage unit 703. If the index area already contains index information for that key, the deduplication unit does not trigger the storage unit 703, because the data to be stored has already been stored on this storage node.
The storage unit 703 is configured to store the data identifier of the data to be stored and the data to be stored separately in the first node, and to record the data storage address.
That is, the storage unit 703 stores the key and the value of the data to be stored separately in the first node and records the value address.
The data storage address is the address offset of the data to be stored on the hard disk of the node, i.e. the value address, which represents the offset of the value on this node's hard disk and may be expressed, for example, as an LBA (logical block address).
To facilitate subsequent operations such as query and migration, the data storage system of this embodiment further includes the indexing unit 704, which is configured to generate index information from the data identifier of the data to be stored, the identifier of the first partition in which it is located and the data storage address, and to add this index information to the index area of the first node.
The index area of a storage node holds one item of index information for each data item stored on that node. Each item includes the data identifier of the stored data, the identifier of the first partition in which the stored data is located, and the data storage address of the stored data; that is, the key, the partition and the value address of the stored data.
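The storage path described above (deduplication check, separate value storage, address recording, index generation) can be sketched in Python. The sketch models a node with hypothetical names: the value area is a `bytearray` standing in for the hard disk, and the value address is a byte offset rather than a real LBA. For brevity the key is kept only in the index dictionary. Recording the value length in the index is an extra assumption that the text does not state; without it the sketch could not tell where a value ends.

```python
from dataclasses import dataclass, field

@dataclass
class StorageNode:
    """Toy node: values are appended to a byte array standing in for the
    hard disk; the index area maps key -> (partition, value_lba, length)."""
    disk: bytearray = field(default_factory=bytearray)
    index_area: dict = field(default_factory=dict)

    def store(self, key: str, value: bytes, partition: int) -> bool:
        # Deduplication: an existing index entry means the data is already here.
        if key in self.index_area:
            return False
        value_lba = len(self.disk)  # byte offset where the value is written
        self.disk += value
        self.index_area[key] = (partition, value_lba, len(value))
        return True

    def read(self, key: str) -> bytes:
        _partition, value_lba, length = self.index_area[key]
        return bytes(self.disk[value_lba:value_lba + length])

node = StorageNode()
node.store("key11", b"hello", partition=19)
node.store("key12", b"world!", partition=19)
print(node.index_area["key12"], node.read("key12"))  # (19, 5, 6) b'world!'
```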
The index area may include multiple sub-index areas, and the generated index information is stored in one of them.
The indexing unit 704 determines the identifier of the sub-index area in which the generated index information is to be stored, either from the result of hashing the data identifier in the generated index information or from the order of the data identifiers, and stores the generated index information in the sub-index area corresponding to that identifier.
When index information is newly generated, the indexing unit 704 compares the key in the generated index information with the keys of the existing index information in the index area of the first node, determines, according to a preset ordering, the position of the new key among the existing index information, and adds the generated index information at that position.
Alternatively, when index information is newly generated, the indexing unit 704 may hash the key in the generated index information to determine the bucket ID in which it is to be stored, for example by computing the key modulo m to obtain the bucket ID, and then store the generated index information in that bucket ID.
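The bucket assignment above can be sketched in Python as bucket ID = hash(key) mod m. CRC32 from the standard `zlib` module is used here only as a stable hash, and m = 8 is a hypothetical choice; the text specifies neither. The function names are hypothetical.

```python
import zlib

def bucket_id_for(key: str, m: int) -> int:
    """Bucket (sub-index area) ID = hash(key) mod m, with CRC32 as the hash."""
    return zlib.crc32(key.encode("utf-8")) % m

def add_to_sub_index(buckets, key, partition, value_lba):
    """Append one item of index information to the bucket chosen for its key."""
    bid = bucket_id_for(key, len(buckets))
    buckets[bid].append((key, partition, value_lba))
    return bid

buckets = [[] for _ in range(8)]  # m = 8 sub-index areas (hypothetical)
bid = add_to_sub_index(buckets, "key11", 19, 10)
print(bid, buckets[bid])
```

Because the bucket depends only on the key, a later lookup for the same key recomputes the same bucket ID and searches only that sub-index area.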
During data query, the main working components are the obtaining unit 701, the computing unit 702, the matching unit 705 and the reading unit 706.
The obtaining unit 701 is configured to obtain the data identifier of the input data to be queried. The computing unit 702 is configured to calculate, from the data identifier obtained by the obtaining unit 701, the second partition in which the data to be queried is located, and to obtain the second node to which the second partition belongs. The matching unit 705 is configured to query the index area of the second node for the index information matching the data identifier of the data to be queried; the index information in the index area of the second node includes the data identifier of the stored data, the partition in which it is located and the data storage address. The reading unit 706 is configured to read the data to be queried from the second node according to the data storage address in the index information obtained by the matching unit.
In the index area of the second node where the data to be queried resides, the matching unit 705 matches the data identifier (key) of the data to be queried against the data identifiers of the stored data in that index area, obtaining the index information matching the key and thereby the data storage address (value address) of the data to be queried. If multiple data items to be queried are stored on several different second nodes, their data identifiers are matched in the index area of each second node separately, yielding the matching index information on each second node and hence the value addresses of all the queried data.
When multiple data items are queried, a sorting unit 707 may also be included, which sorts the data storage addresses matched by the matching unit 705 for the multiple data items to be queried. The reading unit 706 then reads the data to be queried sequentially from the hard disk of the corresponding node according to the sorting result of the sorting unit 707.
During data migration, the main working components are the obtaining unit 701, the computing unit 702, the matching unit 705 and the sorting unit 707.
The obtaining unit 701 is configured to obtain the partition to be migrated when a preset partition migration condition is met. The computing unit 702 is configured to obtain the third node to which the partition to be migrated belongs.
The preset partition migration condition may include a node being added or a node being removed.
When a node is added or removed, the system determines the partitions that need to be migrated, and the obtaining unit 701 obtains these partitions. For example, if the partitions to be migrated are partition 19 and partition 28, the obtaining unit 701 obtains those partitions.
If, when a node is added or removed, the obtaining unit 701 obtains the individual data items that need to be migrated, the computing unit 702 hashes the key of each data item to be migrated to obtain the partition in which it is located, and then obtains the third node to which that partition belongs.
The matching unit 705 is configured to match and obtain, from the index area of the third node, the index information identical to the value of the partition to be migrated.
The index information in the index area includes the data identifier of the stored data, the partition in which it is located and the data storage address, specifically the key, the partition identifier and the value address. The matching unit 705 matches the index information whose partition value is identical to the identifier of the partition to be migrated.
In the index area of the third node where the partition to be migrated resides, the matching unit 705 matches the identifier of the partition to be migrated against the partition identifiers of the stored data in that index area, obtaining the index information that matches this partition identifier. If multiple partitions to be migrated are stored on several different third nodes, the identifier of each partition to be migrated is matched in the index area of each third node separately, yielding the matching index information of the partitions to be migrated on each third node.
When the index area includes multiple bucket IDs, the matching unit 705 can also scan the index area in batches, reading the index information of the corresponding bucket IDs into memory according to the number of bucket IDs that the configured memory can hold.
In this case, the matching unit 705 specifically includes a batch subunit and a matching subunit (not shown). The batch subunit is configured to read the index information of at least one sub-index area of the index area in batches, specifically the index information of different buckets (bucket IDs). Each time it reads a batch, the batch subunit records the bucket ID read in that batch, so as to obtain the starting bucket ID for the next batch. The matching subunit is configured to match, from the index information read by the batch subunit in the current batch, the index information identical to the identifier of the partition to be migrated.
The sorting unit 707 is configured to sort the data storage addresses of the index information matched by the matching unit 705 and to send the sorting result to the third node, so that the third node reads the data to be migrated sequentially and migrates it to the destination node.
For example, suppose the partition to be migrated obtained by the obtaining unit 701 is partition 19. The matching unit 705 matches all index information in the index area whose partition is partition 19, for example <key11, partition_K11=19, value_LBA_K11=10>, <key12, partition_K12=19, value_LBA_K12=40>, <key22, partition_K22=19, value_LBA_K22=60>, <key34, partition_K34=19, value_LBA_K34=30> and <key41, partition_K41=19, value_LBA_K41=20>. Sorting by value address, the sorting unit 707 obtains K11, K41, K34, K12, K22. The sorting result is sent to the hard disk, which can then read sequentially, avoiding the full-disk scan and unordered random reads required by existing methods and improving overall performance.
Fig. 7 is a schematic diagram of a data storage system provided by an embodiment of the present invention. The storage system is a distributed storage system in key-value form. As shown in Fig. 7, the data storage system includes one storage management node 10 and multiple data storage nodes 20, which communicate with one another over a bus. The storage management node 10 is a data storage node on which distributed coordination system software is installed, and it coordinates and manages the whole distributed storage system.
Fig. 8 is a schematic diagram of a storage management node 10 provided by an embodiment of the present invention. The storage management node 10 may be a host server with computing capability, a personal computer (PC), a portable computer, a terminal or the like; the specific embodiments of the present invention do not limit its concrete implementation. As shown in Fig. 8, the storage management node 10 includes a processor 101, a communication interface 102, a memory 103 and a bus 104.
The processor 101, the communication interface 102 and the memory 103 of the storage management node 10 communicate with one another over the bus 104. The communication interface 102 is used to communicate with network elements, such as the data storage nodes 20, to receive or send data storage, data query or data migration task instructions. The processor 101 is used to execute a program 1031 and may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The memory 103 is used to store the program 1031; it may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory. The program 1031 may include program code, and the program code includes computer operation instructions. As shown in Fig. 9, the program 1031 may include a computing unit 301.
During data storage, the communication interface 102 of the storage management node 10 receives the data to be stored and its data identifier. The computing unit 301 calculates, from the data identifier obtained by the communication interface 102, the first partition in which the data to be stored is to be stored, and obtains the first node to which the first partition belongs; the first node is one of the data storage nodes 20. According to the calculation result of the computing unit 301, the data to be stored and its data identifier obtained by the communication interface 102, together with the determined identifier of the first partition in which the data is to be stored, are sent to the corresponding data storage node 20.
During data query, the communication interface 102 of the storage management node 10 receives the data identifier of the input data to be queried. The computing unit 301 calculates, from the data identifier of the data to be queried obtained by the communication interface 102, the second partition in which that data is located, and obtains the second node to which the second partition belongs; the second node is one of the data storage nodes 20. According to the calculation result of the computing unit 301, the data identifier of the data to be queried obtained by the communication interface 102 and the determined identifier of the second partition in which it is located are sent to the corresponding data storage node 20.
During data migration, the communication interface 102 of the storage management node 10 receives the partition to be migrated. The computing unit 301 calculates, from the identifier of the partition to be migrated obtained by the communication interface 102, the third node to which that partition belongs. If what the communication interface 102 receives is the data identifier of data to be migrated, the computing unit 301 calculates from that data identifier the third partition in which the data to be migrated is located, and obtains the third node to which the third partition belongs; the third node is one of the data storage nodes 20. According to the calculation result of the computing unit 301, the identifier of the partition to be migrated obtained by the communication interface 102, or the data identifier of the data to be migrated together with the determined identifier of the third partition in which it is located, is sent to the corresponding data storage node 20.
Fig. 10 is a schematic diagram of a data storage node provided by an embodiment of the present invention. The data storage node 20 may be a host server with computing capability, a personal computer (PC), a portable computer, a terminal or the like; the specific embodiments of the present invention do not limit its concrete implementation. As shown in Fig. 10, the data storage node 20 includes a processor 201, a communication interface 202, a memory 203 and a bus 204.
The processor 201, the communication interface 202 and the memory 203 of the data storage node 20 communicate with one another over the bus 204. The communication interface 202 is used to communicate with network elements, such as the storage management node 10, and receives the messages sent by the communication interface 102 of the storage management node 10. The processor 201 is used to execute a program 2031 and may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The memory 203 is used to store the program 2031; it may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory. As shown in Fig. 11, the program 2031 may include a storage unit 401, an indexing unit 402, a deduplication unit 403, a matching unit 404, a reading unit 405 and a sorting unit 406.
During data storage, the communication interface 202 receives, from the communication interface 102 of the storage management node 10, the data to be stored, its data identifier and the determined identifier of the first partition in which the data is to be stored. The storage unit 401 stores the data identifier and the data to be stored separately in the memory 203 and records the data storage address. The indexing unit 402 generates index information from the data identifier obtained by the communication interface 202, the identifier of the first partition determined by the computing unit 301 and the data storage address recorded by the storage unit 401, and adds this index information to the index area of this storage node 20, where it is recorded in the memory 203.
A deduplication unit 403 is also provided ahead of the storage unit 401. It determines whether the index area of this storage node 20 already contains index information identical to the data identifier of the data to be stored. When the index area contains no such index information, it triggers the storage unit 401 to store the data; when the index area already contains such index information, it does not trigger the storage unit 401.
During data query, the communication interface 202 receives, from the communication interface 102 of the storage management node 10, the data identifier of the data to be queried and the determined identifier of the second partition in which that data is located. The matching unit 404 queries the index area of this data storage node for the index information matching the data identifier of the data to be queried; the index information in the index area includes the data identifier of the stored data, the partition in which it is located and the data storage address. The reading unit 405 reads the data to be queried from the corresponding data storage address in the memory 203, according to the data storage address in the index information obtained by the matching unit 404.
When multiple data items to be queried are processed, a sorting unit 406 is also included, which sorts the data storage addresses matched by the matching unit 404 for the multiple data items and sends the sorting result to the reading unit 405. The reading unit 405 then reads the data to be queried sequentially from the corresponding data storage addresses according to the sorting result of the sorting unit 406.
During data migration, the communication interface 202 receives, from the communication interface 102 of the storage management node 10, the identifier of the partition to be migrated, or the data identifier of the data to be migrated together with the determined identifier of the third partition in which it is located. The matching unit 404 matches, in the index area of this data storage node, the identifier of the partition to be migrated against the partition identifiers of the stored data, obtaining the index information that matches that partition identifier. The sorting unit 406 sorts the data storage addresses of the index information matched by the matching unit 404 and sends the sorting result to the memory 203, so that the data to be migrated is read sequentially and migrated to the destination node. If multiple partitions to be migrated are stored on several different third nodes, the identifier of each partition to be migrated is matched in the index area of each third node separately, yielding the matching index information of the partitions to be migrated on each third node.
In the data processing method and device provided by the embodiments of the present invention, at least <key, partition, value_LBA> information is added to the index area of the data, and the key and the value are stored separately. Sorting by value address (value_LBA) turns what were random hard-disk accesses into sequential ones, so no full-disk scan is needed and overall performance improves. In addition, the index area can be scanned in batches to quickly find all <key, value_LBA> pairs corresponding to a partition to be migrated. This facilitates concurrent and batch operations, supports resumable transfer, raises the utilization of inter-node bandwidth and improves data read performance.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further explain the objectives, technical solutions and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A data processing method, characterized in that it is applied to a key-value storage system, the method comprising:
obtaining data to be stored and a data identifier of the data to be stored;
calculating, according to the data identifier, a first partition in which the data to be stored is to be stored, and obtaining a first node to which the first partition belongs;
storing the data identifier of the data to be stored and the data to be stored separately in the first node, and recording a data storage address;
generating index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and adding the index information to an index area of the first node;
wherein the index area comprises at least one sub-index area; the sub-index area in which the generated index information is to be placed is determined either from the result of a hash calculation on the data identifier in the generated index information or from the size order of the data identifier, and the generated index information is stored in the determined sub-index area;
when a preset partition migration condition is met, obtaining a partition to be migrated;
obtaining a third node to which the partition to be migrated belongs;
matching, in the index area of the third node, all index information whose partition identifier is the same as the identifier of the partition to be migrated, wherein the index information in the index area of the third node comprises the data identifier of stored data, the partition in which it resides, and the data storage address;
sorting the data storage addresses of the matched index information, and sending the sorting result to the third node, so that the third node reads the data to be migrated sequentially and migrates it to a target node.
2. The data processing method according to claim 1, characterized in that generating index information from the data identifier of the data to be stored, the identifier of the first partition in which it resides, and the data storage address, and adding the index information to the index area of the first node, comprises:
comparing the data identifier in the generated index information with the data identifiers of the existing index information in the index area of the first node; determining, according to a preset ordering, the storage position of the generated index information among the existing index information; and adding the generated index information at that storage position.
3. The data processing method according to claim 1, characterized in that, after obtaining the first node to which the first partition belongs, the method further comprises:
determining whether the index area of the first node contains index information with the same data identifier as the data to be stored; when the index area contains no index information with that data identifier, performing the step of storing the data identifier of the data to be stored and the data to be stored separately in the first node; and when the index area contains index information with that data identifier, not performing that step.
4. The data processing method according to claim 1, characterized by further comprising:
obtaining an input data identifier of data to be queried;
calculating, according to the data identifier of the data to be queried, a second partition in which the data to be queried resides, and obtaining a second node to which the second partition belongs;
querying the index area of the second node for index information matching the data identifier of the data to be queried, wherein the index information in the index area comprises the data identifier of stored data, the partition in which it resides, and the data storage address;
reading the data to be queried from the second node according to the data storage address in the matched index information of the data to be queried.
5. The data processing method according to claim 4, characterized in that, before reading the data to be queried from the second node, the method further comprises:
sorting the data storage addresses of multiple pieces of data to be queried, and reading the data to be queried sequentially, according to the sorting result, from the second node corresponding to the data to be queried.
6. A data processing device, characterized in that it is applied to a key-value storage system, the device comprising:
an acquiring unit, configured to obtain data to be stored and a data identifier of the data to be stored;
a computing unit, configured to calculate, according to the data identifier obtained by the acquiring unit, a first partition in which the data to be stored is to be stored, and to obtain a first node to which the first partition belongs;
a storage unit, configured to store the data identifier of the data to be stored and the data to be stored separately in the first node, and to record a data storage address;
an indexing unit, configured to generate index information from the data identifier of the data to be stored obtained by the acquiring unit, the identifier of the first partition determined by the computing unit, and the data storage address recorded by the storage unit, and to add the index information to the index area of the first node determined by the computing unit;
wherein the index area comprises at least one sub-index area; the sub-index area in which the generated index information is to be placed is determined either from the result of a hash calculation on the data identifier in the generated index information or from the size order of the data identifier, and the generated index information is stored in the determined sub-index area;
the acquiring unit is further configured to obtain a partition to be migrated when a preset partition migration condition is met;
the computing unit is further configured to obtain a third node to which the partition to be migrated belongs;
the device further comprises:
a matching unit, configured to match, in the index area of the third node, all index information whose partition identifier is the same as the identifier of the partition to be migrated, wherein the index information in the index area comprises the data identifier of stored data, the partition in which it resides, and the data storage address;
a sorting unit, configured to sort the data storage addresses of the index information matched by the matching unit and to send the sorting result to the third node, so that the third node reads the data to be migrated sequentially and migrates it to a target node.
7. The data processing device according to claim 6, characterized in that the indexing unit compares the data identifier in the generated index information with the data identifiers of the existing index information in the index area of the first node, determines, according to a preset ordering, the storage position of the generated index information among the existing index information, and adds the generated index information at that storage position.
8. The data processing device according to claim 6, characterized in that the device further comprises:
a deduplication unit, configured to determine whether the index area of the first node obtained by the computing unit contains index information with the same data identifier as the data to be stored; to trigger the storage unit when the index area contains no index information with that data identifier; and not to trigger the storage unit when the index area does contain index information with that data identifier.
9. The data processing device according to claim 6, characterized in that the acquiring unit is further configured to obtain an input data identifier of data to be queried;
the computing unit is further configured to calculate, according to the data identifier of the data to be queried obtained by the acquiring unit, a second partition in which the data to be queried resides, and to obtain a second node to which the second partition belongs;
the device further comprises:
a matching unit, configured to query the index area of the second node for index information matching the data identifier of the data to be queried, wherein the index information in the index area comprises the data identifier of stored data, the partition in which it resides, and the data storage address;
a reading unit, configured to read the data to be queried from the second node according to the data storage address in the index information of the data to be queried obtained by the matching unit.
10. The data processing device according to claim 9, characterized in that the device further comprises:
a sorting unit, configured to sort the data storage addresses that the matching unit matched for multiple pieces of data to be queried; the reading unit reads the data to be queried sequentially, according to the sorting result of the sorting unit, from the second node corresponding to the data to be queried.
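The storage and query steps of claims 1 to 5 can be sketched in a few lines of Python. This is a minimal single-process sketch under stated assumptions, not the patented implementation: `Node`, `Cluster`, `partition_of`, `NUM_PARTITIONS` and `NUM_SUB_INDEXES` are hypothetical names; a node's "disk" is a Python list whose positions stand in for data storage addresses (LBAs); partitions are assigned to nodes with a fixed modulo map; and the sub-index area is chosen by hashing the data identifier (claim 1 also allows choosing it by key order instead).

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple

NUM_PARTITIONS = 8   # assumed fixed number of partitions
NUM_SUB_INDEXES = 4  # assumed number of sub-index areas per node

# An index entry is (data identifier, partition identifier, data storage address).
Entry = Tuple[str, int, int]


def stable_hash(key: str) -> int:
    # Deterministic hash; Python's built-in hash() of str is salted per process.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def partition_of(key: str) -> int:
    """Compute the partition of a data identifier (claims 1 and 4)."""
    return stable_hash(key) % NUM_PARTITIONS


class Node:
    def __init__(self) -> None:
        self.data_area: List[Tuple[str, bytes]] = []  # list position stands in for the LBA
        # Each sub-index area is kept sorted by data identifier (claim 2).
        self.sub_indexes: List[List[Entry]] = [[] for _ in range(NUM_SUB_INDEXES)]

    def _sub_index(self, key: str) -> List[Entry]:
        # Pick the sub-index area from a hash of the data identifier (claim 1).
        return self.sub_indexes[stable_hash(key) % NUM_SUB_INDEXES]

    def lookup(self, key: str) -> Optional[Entry]:
        area = self._sub_index(key)
        # (key,) sorts before every (key, partition, lba) tuple with the same key.
        i = bisect.bisect_left(area, (key,))
        if i < len(area) and area[i][0] == key:
            return area[i]
        return None

    def put(self, key: str, value: bytes, partition: int) -> bool:
        if self.lookup(key) is not None:  # duplicate identifier: skip the write (claim 3)
            return False
        lba = len(self.data_area)            # record the data storage address
        self.data_area.append((key, value))  # store the identifier and the data
        bisect.insort(self._sub_index(key), (key, partition, lba))  # sorted insert
        return True

    def get_many(self, keys: List[str]) -> Dict[str, bytes]:
        hits = [e for e in map(self.lookup, keys) if e is not None]
        hits.sort(key=lambda e: e[2])  # read in ascending address order (claim 5)
        return {key: self.data_area[lba][1] for key, _, lba in hits}


class Cluster:
    def __init__(self, num_nodes: int) -> None:
        self.nodes = [Node() for _ in range(num_nodes)]
        # Assumed static partition-to-node map.
        self.owner = {p: p % num_nodes for p in range(NUM_PARTITIONS)}

    def put(self, key: str, value: bytes) -> bool:
        p = partition_of(key)
        return self.nodes[self.owner[p]].put(key, value, p)

    def get(self, key: str) -> Optional[bytes]:
        node = self.nodes[self.owner[partition_of(key)]]
        entry = node.lookup(key)
        return None if entry is None else node.data_area[entry[2]][1]

    def get_many(self, keys: List[str]) -> Dict[str, bytes]:
        # Group the keys by owning node, then let each node read in address order.
        groups: Dict[int, List[str]] = {}
        for key in keys:
            groups.setdefault(self.owner[partition_of(key)], []).append(key)
        result: Dict[str, bytes] = {}
        for node_id, node_keys in groups.items():
            result.update(self.nodes[node_id].get_many(node_keys))
        return result
```

Keeping each sub-index area sorted makes both the duplicate check and the point query a binary search; replacing the hash in `_sub_index` with a key-range lookup would give the size-order alternative that claim 1 also covers.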
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210516613.1A CN102968498B (en) | 2012-12-05 | 2012-12-05 | Data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102968498A (en) | 2013-03-13 |
CN102968498B (en) | 2016-08-10 |
Family ID: 47798636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210516613.1A Active CN102968498B (en) | Data processing method and device | 2012-12-05 | 2012-12-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102968498B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111581206B (en) * | 2019-03-15 | 2021-06-15 | 北京忆芯科技有限公司 | B + tree operation device and method |
Families Citing this family (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104252457B (en) * | 2013-06-25 | 2018-11-23 | 北京百度网讯科技有限公司 | A kind of method and apparatus for being managed to data acquisition system |
CN103718533B (en) * | 2013-06-29 | 2015-06-10 | 华为技术有限公司 | Zoning balance subtask issuing method, apparatus and system |
CN104298687B (en) * | 2013-07-18 | 2018-04-03 | 阿里巴巴集团控股有限公司 | A kind of hash partition management method and device |
CN104348862B (en) * | 2013-07-31 | 2018-03-16 | 华为技术有限公司 | Data Migration processing method, apparatus and system |
CN104683422B (en) | 2013-12-03 | 2019-01-29 | 腾讯科技(深圳)有限公司 | Data transmission method and device |
CN104765754A (en) * | 2014-01-08 | 2015-07-08 | 北大方正集团有限公司 | Data storage method and device |
CN104809129B (en) * | 2014-01-26 | 2018-07-20 | 华为技术有限公司 | A kind of distributed data storage method, device and system |
CN103929475B (en) * | 2014-03-27 | 2017-11-24 | 华为技术有限公司 | The hard disk storage system and hard disc data operating method of a kind of Ethernet architecture |
CN105468473B (en) * | 2014-07-16 | 2019-03-01 | 北京奇虎科技有限公司 | Data migration method and data migration device |
CN104113606B (en) * | 2014-08-02 | 2018-04-10 | 成都极驰科技有限公司 | The method of work of the distributed meta data node architecture of uniformity dynamic equalization |
CN106164898B (en) * | 2014-10-11 | 2018-06-26 | 华为技术有限公司 | Data processing method and device |
CN104462396B (en) * | 2014-12-10 | 2017-12-19 | 北京国双科技有限公司 | Character string processing method and device |
CN104702691B (en) * | 2015-03-13 | 2017-12-01 | 华为技术有限公司 | Distributed load equalizing method and device |
CN106201771B (en) * | 2015-05-06 | 2019-07-05 | 阿里巴巴集团控股有限公司 | Data-storage system and data read-write method |
CN106569732B (en) * | 2015-10-12 | 2021-04-20 | 中兴通讯股份有限公司 | Data migration method and device |
CN105404679B (en) * | 2015-11-24 | 2019-02-01 | 华为技术有限公司 | Data processing method and device |
CN106959820B (en) * | 2016-01-11 | 2020-05-01 | 杭州海康威视数字技术股份有限公司 | Data extraction method and system |
CN107204998B (en) * | 2016-03-16 | 2020-04-28 | 华为技术有限公司 | Method and device for processing data |
KR20170109108A (en) * | 2016-03-17 | 2017-09-28 | 에스케이하이닉스 주식회사 | Memory system including memory device and operation method thereof |
CN106210038B (en) * | 2016-07-06 | 2019-01-29 | 网易(杭州)网络有限公司 | The processing method and system of data operation request |
CN107704475B (en) * | 2016-08-10 | 2021-12-14 | 泰康保险集团股份有限公司 | Multilayer distributed unstructured data storage method, query method and device |
CN107783980B (en) * | 2016-08-24 | 2021-10-19 | 阿里巴巴集团控股有限公司 | Index data generation and data query method and device, and storage and query system |
CN106528018B (en) * | 2016-10-31 | 2019-08-30 | 努比亚技术有限公司 | A kind of information processing method and terminal |
CA2978927C (en) * | 2016-11-25 | 2019-09-17 | Huawei Technologies Co., Ltd. | Data check method and storage system |
CN106777230B (en) * | 2016-12-26 | 2020-01-07 | 东软集团股份有限公司 | Partition system, partition method and device |
CN106682215B (en) * | 2016-12-30 | 2020-04-28 | 华为技术有限公司 | Data processing method and management node |
CN107145521B (en) * | 2017-04-10 | 2019-05-21 | 杭州趣链科技有限公司 | A kind of data migration method towards block chain multistage intelligent contract |
CN108932256A (en) * | 2017-05-25 | 2018-12-04 | 中兴通讯股份有限公司 | Distributed data redistribution control method, device and data management server |
CN107609089B (en) * | 2017-09-07 | 2019-11-19 | 北京神州绿盟信息安全科技股份有限公司 | A kind of data processing method, apparatus and system |
CN110069488A (en) * | 2017-09-30 | 2019-07-30 | 北京国双科技有限公司 | A kind of date storage method, method for reading data and its device |
CN109597567B (en) * | 2017-09-30 | 2022-03-08 | 网宿科技股份有限公司 | Data processing method and device |
CN107885803B (en) * | 2017-10-31 | 2020-05-01 | 中国地质大学(武汉) | Method and device for coupling big data writing-in and reading-out speed and storage device |
CN108255958B (en) * | 2017-12-21 | 2022-05-03 | 百度在线网络技术(北京)有限公司 | Data query method, device and storage medium |
WO2019127021A1 (en) * | 2017-12-26 | 2019-07-04 | 华为技术有限公司 | Management method and apparatus for storage device in storage system |
CN108345643A (en) * | 2018-01-12 | 2018-07-31 | 联动优势电子商务有限公司 | A kind of data processing method and device |
CN108389124B (en) * | 2018-02-26 | 2020-11-03 | 平安普惠企业管理有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN109032500B (en) * | 2018-06-11 | 2021-12-14 | 广州视源电子科技股份有限公司 | Data storage method and device of single chip microcomputer, single chip microcomputer and storage medium |
CN108959510B (en) * | 2018-06-27 | 2022-04-19 | 北京奥星贝斯科技有限公司 | Partition level connection method and device for distributed database |
CN109218385B (en) * | 2018-06-28 | 2021-08-03 | 西安华为技术有限公司 | Method and device for processing data |
CN110750529B (en) * | 2018-07-04 | 2022-09-23 | 百度在线网络技术(北京)有限公司 | Data processing method, device, equipment and storage medium |
CN109299190B (en) * | 2018-09-10 | 2020-11-17 | 华为技术有限公司 | Method and device for processing metadata of object in distributed storage system |
CN109408599B (en) * | 2018-09-20 | 2021-09-28 | 佛山科学技术学院 | Distributed storage method for big data |
CN111414356A (en) * | 2019-01-07 | 2020-07-14 | 北京京东尚科信息技术有限公司 | Data storage method and device, non-relational database system and storage medium |
CN113157706A (en) * | 2019-03-15 | 2021-07-23 | 北京忆芯科技有限公司 | B + tree operation device with node index and method thereof |
CN111046129A (en) * | 2019-05-13 | 2020-04-21 | 国家计算机网络与信息安全管理中心 | Public number information storage method and retrieval system based on text content characteristics |
CN110263061A (en) * | 2019-06-17 | 2019-09-20 | 郑州阿帕斯科技有限公司 | A kind of data query method and system |
CN112463214B (en) * | 2019-09-09 | 2023-11-03 | 北京京东振世信息技术有限公司 | Data processing method and device, computer readable storage medium and electronic equipment |
CN110727702B (en) * | 2019-09-16 | 2024-01-26 | 平安科技(深圳)有限公司 | Data query method, device, terminal and computer readable storage medium |
CN110677348B (en) * | 2019-09-17 | 2021-07-27 | 创新先进技术有限公司 | Data distribution method, access method and respective devices based on cache cluster routing |
CN112765262B (en) * | 2019-11-05 | 2023-02-28 | 金篆信科有限责任公司 | Data redistribution method, electronic equipment and storage medium |
CN111001157B (en) * | 2019-11-29 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Method and device for generating reference information, storage medium and electronic device |
CN111506570A (en) * | 2020-03-05 | 2020-08-07 | 百度在线网络技术(北京)有限公司 | Data storage and query method and device, electronic equipment and storage medium |
CN111694867A (en) * | 2020-06-16 | 2020-09-22 | 北京同邦卓益科技有限公司 | Data management method and device, electronic equipment and storage medium |
CN111782632A (en) * | 2020-06-28 | 2020-10-16 | 百度在线网络技术(北京)有限公司 | Data processing method, device, equipment and storage medium |
CN111881317B (en) * | 2020-07-31 | 2021-08-20 | 北京达佳互联信息技术有限公司 | Data storage method and device based on key value system, electronic equipment and medium |
CN112015797A (en) * | 2020-08-31 | 2020-12-01 | 中国平安人寿保险股份有限公司 | Data reading method and computer equipment |
CN114490517A (en) * | 2020-10-23 | 2022-05-13 | 华为技术有限公司 | Data processing method, device, computing node and computer readable storage medium |
CN112506606A (en) * | 2020-11-23 | 2021-03-16 | 北京达佳互联信息技术有限公司 | Migration method, device, equipment and medium for containers in cluster |
CN112817976B (en) * | 2021-01-26 | 2024-04-05 | 广州欢网科技有限责任公司 | ID generation method, system, computer and readable instruction storage medium |
CN116049096B (en) * | 2022-05-05 | 2024-04-16 | 荣耀终端有限公司 | Data migration method, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101996217A (en) * | 2009-08-24 | 2011-03-30 | 华为技术有限公司 | Method for storing data and memory device thereof |
CN102521312A (en) * | 2011-12-01 | 2012-06-27 | 深圳市航天泰瑞捷电子有限公司 | Storage method of file index, and file system |
CN102567434A (en) * | 2010-12-31 | 2012-07-11 | 百度在线网络技术(北京)有限公司 | Data block processing method |
CN102662992A (en) * | 2012-03-14 | 2012-09-12 | 北京搜狐新媒体信息技术有限公司 | Method and device for storing and accessing massive small files |
CN102739622A (en) * | 2011-04-15 | 2012-10-17 | 北京兴宇中科科技开发股份有限公司 | Expandable data storage system |
Also Published As
Publication number | Publication date |
---|---|
CN102968498A (en) | 2013-03-13 |
Similar Documents
Publication | Title |
---|---|
CN102968498B (en) | Data processing method and device |
CN106233259B (en) | The method and system of more generation storing datas is retrieved in decentralized storage networks |
US10261693B1 (en) | Storage system with decoupling and reordering of logical and physical capacity removal |
CN105630955B (en) | A kind of data acquisition system member management method of high-efficiency dynamic |
CN101963982B (en) | Method for managing metadata of redundancy deletion and storage system based on location sensitive Hash |
CN103327052B (en) | Date storage method and system and data access method and system |
US20150331744A1 (en) | Data device grouping across multiple-data-storage-devices enclosures for data reconstruction |
US20150331775A1 (en) | Estimating data storage device lifespan |
US11762881B2 (en) | Partition merging method and database server |
CN103067525A (en) | Cloud storage data backup method based on characteristic codes |
CN103246549B (en) | A kind of method and system of data conversion storage |
CN104111936B (en) | Data query method and system |
US9424156B2 (en) | Identifying a potential failure event for a data storage device |
CN105677904B (en) | Small documents storage method and device based on distributed file system |
CN107977396A (en) | A kind of update method of the tables of data of KeyValue databases and table data update apparatus |
US9436524B2 (en) | Managing archival storage |
CN104054071A (en) | Method for accessing storage device and storage device |
US20150331621A1 (en) | Uncoordinated data retrieval across multiple-data-storage-devices enclosures |
CN103970875A (en) | Parallel repeated data deleting method |
CN106970929A (en) | Data lead-in method and device |
CN104823184A (en) | Data processing method, system and client |
CN105677252B (en) | Read method, data processing method and the associated storage device of data |
CN108268216A (en) | Data processing method, device and server |
CN105868218A (en) | Data processing method and electronic device |
CN109597903A (en) | Image file processing apparatus and method, document storage system and storage medium |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |