CN102495838B - Data processing method and data processing device - Google Patents

Data processing method and data processing device

Info

Publication number
CN102495838B
Authority
CN
China
Prior art keywords
data
tree
pending
group
root node
Legal status
Active
Application number
CN201110343278.5A
Other languages
Chinese (zh)
Other versions
CN102495838A (en)
Inventor
陈娟
杨定国
Current Assignee
Chengdu Huawei Technology Co Ltd
Original Assignee
Huawei Symantec Technologies Co Ltd
Application filed by Huawei Symantec Technologies Co Ltd
Priority to CN201110343278.5A
Publication of CN102495838A
Application granted
Publication of CN102495838B
Legal status: Active


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a data processing method and device. The method comprises the following steps. First, the data information of multiple pending data items stored in nodes of a first data tree is written into memory. Next, the pending data items are divided into at least two groups, and a state value for each group is added to management data to indicate that the group is pending. Each group is then processed in memory, and after a group has been processed, its state value in the management data is updated to indicate that the group has been processed. Finally, the nodes of the first data tree that store the data information of the pending data items are deleted according to the updated state values. In the embodiments, only one disk I/O (input/output) is needed to update the state value after each group is processed. After all the pending data have been processed, only one more disk I/O is needed to delete all of the nodes. This saves a large amount of disk I/O.

Description

Data processing method and device
Technical field
The embodiments of the present invention relate to the field of data processing, and in particular to a data processing method and device.
Background
In practice, the data stored in a file system are highly random and lack any necessary relationship to one another. Storage, deletion, and modification are therefore normally performed on one datum at a time. Each operation requires at least one disk read or write, also called a disk input/output (IO).
Fig. 1 is a schematic diagram of one application of disk data operations in the prior art. As shown in Fig. 1, at moment 1 the data tree contains the data information of pending data a, b, c, d, e, f, g, h, i, j, and k, denoted a through k respectively. At moment 2, after data a has been processed, its data information is deleted from the data tree, which requires at least one disk I/O. At moment 3 (not shown), after data f has been processed, its data information is likewise deleted from the data tree, requiring at least one more disk I/O. At moment 4, new data z needs to be processed, so its data information is inserted into the data tree. Thus every item processed requires at least one disk I/O to delete its data information from the data tree.
In the course of realizing the present invention, the inventors found at least the following problem in the prior art. Disk operations are relatively slow, so performing them per datum lowers the overall performance of the on-disk data tree.
Summary of the invention
The embodiments of the present invention provide a data processing method and device. They solve the prior-art problem that relatively slow disk operations lower the overall performance of the on-disk data tree.
In one aspect, an embodiment of the present invention provides a data processing method, comprising:
writing the root node address of the first data tree into the management data;
reading the management data and, if it contains the root node address of the first data tree, locating the first data tree by that address;
writing into memory the data information of multiple pending data items stored in nodes of the first data tree, the data information uniquely identifying each pending data item;
if the management data lacks the state value of at least one group of pending data, adding the state value of that at least one group to the management data to indicate that the group is pending;
processing each group of pending data according to its state value in the management data and, after each group has been processed, updating that group's state value in the management data in real time to indicate that the group has been processed;
according to the updated state values of the groups in the management data, deleting in a single disk I/O operation the nodes of the first data tree that store the data information of the pending data items;
deleting the root node address of the first data tree and the state values of the groups from the management data.
In another aspect, an embodiment of the present invention provides a data processing device, comprising:
a read module, configured to: before the data information of the multiple pending data items stored in nodes of the first data tree is written into memory, write the root node address of the first data tree into the management data; read the management data and, if it contains the root node address of the first data tree, locate the first data tree by that address; and write into memory the data information of the multiple pending data items stored in nodes of the first data tree, the data information uniquely identifying each pending data item;
a grouping module, configured to: if the management data lacks the state value of at least one group, add the state value of that at least one group to the management data to indicate that the pending data of that group are pending;
a processing module, configured to process each group of pending data according to its state value in the management data and, after each group has been processed, update that group's state value in the management data in real time to indicate that the group has been processed;
a removing module, configured to: according to the updated state values of the groups in the management data, delete in a single disk I/O operation the nodes of the first data tree that store the data information of the pending data items; and delete the root node address of the first data tree and the state values of the groups from the management data.
At least one of the above technical solutions has the following advantages or beneficial effects:
The embodiments of the present invention group the multiple pending data items in memory. After each group has been processed, its state value on disk is updated to indicate that the group has been processed. Once the updated state values show that every group has been processed, the nodes storing the data information of the pending data items are deleted from the on-disk first data tree. This overcomes the prior-art problem that performing disk I/O for each single datum greatly reduces the overall performance of the on-disk data tree. After each group is processed, only one disk I/O is needed to update its state value. After all pending data have been processed, only one disk I/O is needed to delete all the nodes storing their data information. A large amount of disk I/O is thus avoided and the performance of the on-disk data tree is improved.
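The saving can be made concrete by counting only the disk I/Os named above. The sketch below is an illustrative cost model, not part of the patent. It assumes one deletion I/O per datum for the prior art. For the grouped scheme, it assumes one state-value update per group plus one batched deletion. It ignores the writes that create the management data or record the root node address. The function names are hypothetical.

```python
import math


def prior_art_ios(num_items: int) -> int:
    """Prior art (Fig. 1): at least one disk I/O to delete each processed datum."""
    return num_items


def grouped_ios(num_items: int, group_size: int) -> int:
    """Grouped scheme: one I/O per group to mark it processed, plus one I/O
    to delete all tree nodes once every group is done."""
    num_groups = math.ceil(num_items / group_size)
    return num_groups + 1


if __name__ == "__main__":
    for n in (11, 1000, 1_000_000):
        print(n, prior_art_ios(n), grouped_ios(n, group_size=100))
```

The description later uses an example of 11 items in groups of 3; for that case the count falls from 11 to 5. Larger groups save more I/O. However, any group interrupted by a crash must be processed again, because its state value was never updated.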
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of one application of disk data operations in the prior art;
Fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 3 is another schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a B+ tree of order 3;
Fig. 5 is a schematic diagram of a new data structure according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present invention. As shown in Fig. 2, the method comprises:
Step 101: Write into memory the data information of multiple pending data items stored in nodes of a first data tree, the data information uniquely identifying each pending data item.
For example, the data processing device writes into memory the data information of the multiple pending data items stored in nodes of the first data tree. The first data tree is built in advance and may be a balanced tree (B-tree). A B-tree is a balanced multiway tree and one of the data structures best suited to disk storage. Its stable, fast performance has made it widely used in file systems.
The data information may include the object number of a pending data item. Each pending data item has a unique object number, which can be stored in a node of the first data tree as that node's key.
Step 102: Divide the multiple pending data items into at least two groups. Then add a state value for each group to management data generated in advance, to indicate that each group is pending.
Specifically, the pending data items may be divided into at least two groups according to a preset grouping rule. For example, the rule may take a fixed number of items at a time in a certain order, such as every 3 items in ascending order of object number. Any remaining items fewer than 3 form the last group. Suppose the pending data are a, b, c, d, e, f, g, h, i, j, and k (11 items in total), with object numbers in ascending order. They can then be divided into 4 groups: the first group contains a, b, c; the second d, e, f; the third g, h, i; and the fourth j, k. At least one group contains at least two pending data items.
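The preset grouping rule in this example can be sketched as follows. This minimal illustration assumes that object numbers are sortable and that the group size is 3, as in the text. The letters stand in for object numbers, and the function name `group_pending` is hypothetical.

```python
def group_pending(object_numbers, group_size=3):
    """Preset grouping rule from the example: sort by object number and cut
    into consecutive groups of `group_size`; leftovers form the last group."""
    ordered = sorted(object_numbers)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


groups = group_pending(list("kjihgfedcba"))
print(groups)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k']]
```

The rule depends only on the sorted object numbers, so running it again on the same data always yields the same groups. This determinism is what lets the restart procedure described for Fig. 3 trust group identifiers that were stored before a crash.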
Specifically, each group has its own state value. For example, a state value of 1 means pending and a state value of 2 means processed. Optionally, each group also has its own group identifier, for example 1 for the first group, 2 for the second, and so on. In step 102, each group identifier may be added to the management data together with the corresponding group's state value.
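The management data entries described here can be pictured as a map from group identifier to state value. The sketch assumes the example encoding (1 = pending, 2 = processed). It uses a plain dictionary as a hypothetical in-memory stand-in for management data that, in the patent, lives on disk.

```python
PENDING = 1    # example state value meaning "pending"
PROCESSED = 2  # example state value meaning "processed"


def add_group_states(management: dict, num_groups: int) -> dict:
    """Record group identifier -> PENDING for groups 1..num_groups.

    `management` is a hypothetical in-memory stand-in for the on-disk
    management data.
    """
    states = management.setdefault("group_states", {})
    for group_id in range(1, num_groups + 1):
        states[group_id] = PENDING
    return management


management = add_group_states({}, num_groups=4)
print(management)  # {'group_states': {1: 1, 2: 1, 3: 1, 4: 1}}
```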
Step 103: Process each group of pending data in memory. After each group has been processed, update that group's state value in the management data in real time to indicate that the group has been processed.
Specifically, the pending data may be processed in various ways, such as deletion or modification. After each group has been processed, its state value in the management data is updated through a disk I/O.
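The per-group update can be sketched as a loop that processes a group entirely in memory and then writes exactly one state change. `ManagementData` is a hypothetical stand-in that counts each state write as one disk I/O. The `handle` callback stands for whatever processing (deletion, modification, and so on) applies to a datum.

```python
PENDING, PROCESSED = 1, 2


class ManagementData:
    """Hypothetical on-disk management data; each state write counts as one disk I/O."""

    def __init__(self, group_states):
        self.group_states = dict(group_states)
        self.disk_ios = 0

    def set_state(self, group_id, state):
        self.group_states[group_id] = state
        self.disk_ios += 1


def process_groups(groups, management, handle):
    """Process each pending group in memory, then persist its new state at once."""
    for group_id, members in enumerate(groups, start=1):
        if management.group_states.get(group_id) != PENDING:
            continue  # already processed (or not registered): skip it
        for item in members:
            handle(item)  # in-memory processing, e.g. delete or modify the datum
        management.set_state(group_id, PROCESSED)  # one disk I/O per group


done = []
mgmt = ManagementData({1: PENDING, 2: PENDING})
process_groups([["a", "b", "c"], ["d", "e"]], mgmt, done.append)
print(done, mgmt.group_states, mgmt.disk_ios)  # ['a', 'b', 'c', 'd', 'e'] {1: 2, 2: 2} 2
```

Two groups cost two state-value writes, however many items each group contains.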
Step 104: According to the updated state values of the groups in the management data, delete the nodes of the first data tree that store the data information of the multiple pending data items.
Specifically, when the state value of every group in the management data indicates that the group has been processed, these nodes are deleted through a single disk I/O. A node may store the data information of one or more pending data items, so there may be several such nodes.
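The final deletion can be sketched as a single batched operation that is allowed only once every group is marked processed. The sketch rests on three simplifications:

- The tree is a hypothetical map from node identifier to the object numbers the node stores.
- A node is assumed to be removable when all of its object numbers have been processed.
- The one-I/O figure is simply returned rather than measured.

```python
PROCESSED = 2


def delete_processed_nodes(tree, group_states, processed_keys):
    """Delete, as one batch, every node whose object numbers are all processed.

    `tree` maps node id -> list of object numbers (a simplified stand-in for
    the on-disk first data tree). Returns (remaining_tree, disk_ios).
    """
    if any(state != PROCESSED for state in group_states.values()):
        return tree, 0  # some group is still pending: leave the tree alone
    remaining = {
        node_id: keys
        for node_id, keys in tree.items()
        if not set(keys) <= processed_keys
    }
    return remaining, 1  # all deletions written back in a single disk I/O


tree = {"n1": ["a", "b"], "n2": ["c"], "n3": ["z"]}
remaining, ios = delete_processed_nodes(tree, {1: PROCESSED, 2: PROCESSED}, {"a", "b", "c"})
print(remaining, ios)  # {'n3': ['z']} 1
```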
This embodiment groups the pending data in memory and marks each group processed on disk as soon as it is done. It deletes the first data tree's nodes only after the updated state values show that every group has been processed. Per-datum disk I/O is thereby replaced with one disk I/O per group for the state update, plus a single disk I/O to delete all the nodes storing the data information of the pending data. A large amount of disk I/O is thus avoided and the performance of the on-disk data tree is improved.
Fig. 3 is another schematic flowchart of a data processing method according to an embodiment of the present invention. It extends the flow shown in Fig. 2. As shown in Fig. 3, the method comprises:
Step 201: Build the first data tree on disk and generate the management data.
For example, the data processing device builds the first data tree on disk and generates the management data. The management data can be stored on disk in any data structure, such as a B-tree or a linked list.
Step 202: Determine multiple pending data items and store their data information in nodes of the first data tree.
Usually, only a root node is created when the first data tree is built. Child nodes are then added under this root as the data information of data to be processed needs to be stored. Each time a pending data item is determined, its data information is stored in a node of the first data tree. Specifically, the pending data item corresponding to a received data processing request can be determined from that request. Further, if the first data tree is a B+ tree, the data information is stored in its leaf nodes. As the object numbers of pending data are inserted one by one, the first data tree splits into more nodes according to the B+ tree splitting rules.
Fig. 4 is a schematic diagram of a B+ tree of order 3. As shown in Fig. 4, the values in a leaf node are the data that the leaf stores, which in this embodiment are object numbers of pending data. Q in a leaf node is that leaf's pointer; the leaf pointers link together to form a data index in the form of a linked list. The values in other nodes are indexes of the data stored in each of their child nodes. P1, P2, and P3 in these nodes are pointers to their respective child nodes. The splitting rules of the B+ tree shown in Fig. 4 are:
1) When a node is full, allocate a new node, copy half of the data in the original node to the new node, and finally add a pointer to the new node in the parent node.
2) To keep the tree's starting offset on disk unchanged, a root node is normally split as follows: create two new nodes, divide the original root's content between them, empty the original root, and make the two new nodes its children.
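The two splitting rules can be illustrated with a small in-memory B+ tree. This sketch makes several assumptions that are not in the text:

- A node counts as full when an insertion pushes it past `capacity` keys.
- Rule 1 is read as copying the upper half into the new node and truncating the original.
- The leaf chain pointers (Q in Fig. 4) are omitted.
- Only trees of height two or less are handled.

Keeping the same `root` object across a root split stands in for keeping the root's disk offset unchanged.

```python
from bisect import bisect_right, insort


class Node:
    """Minimal B+ tree node: leaves hold object numbers, inner nodes hold separators."""

    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []

    def is_leaf(self):
        return not self.children


def split_child(parent, i):
    """Rule 1: move the upper half of overfull child i into a newly allocated
    node, then add a pointer (and separator key) for it in the parent."""
    child = parent.children[i]
    mid = len(child.keys) // 2
    new_node = Node(child.keys[mid:])
    child.keys = child.keys[:mid]
    parent.children.insert(i + 1, new_node)
    parent.keys.insert(i, new_node.keys[0])


def split_root_leaf(root):
    """Rule 2: the root object stays where it is (same disk offset); its content
    is divided between two new nodes, which become its children."""
    mid = len(root.keys) // 2
    left, right = Node(root.keys[:mid]), Node(root.keys[mid:])
    root.keys, root.children = [right.keys[0]], [left, right]


def insert(root, key, capacity=3):
    """Insert into a tree of height <= 2; deeper splits are not handled."""
    if root.is_leaf():
        insort(root.keys, key)
        if len(root.keys) > capacity:
            split_root_leaf(root)
        return
    i = bisect_right(root.keys, key)
    insort(root.children[i].keys, key)
    if len(root.children[i].keys) > capacity:
        split_child(root, i)


root = Node()
for object_number in (5, 1, 9, 3, 7, 8):
    insert(root, object_number)
print(root.keys, [leaf.keys for leaf in root.children])  # [5, 8] [[1, 3], [5, 7], [8, 9]]
```

Inserting 5, 1, 9, and 3 overflows the root leaf. Rule 2 then makes the root the parent of [1, 3] and [5, 9], while the root object itself stays in place. Inserting 7 and then 8 overflows the right leaf. Rule 1 splits it into [5, 7] and [8, 9] and adds the separator 8 to the root.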
Step 203: Write the root node address of the first data tree into the management data.
Note that step 203 starts the processing flow for the pending data. It may be triggered by time or by an event, for example executing step 203 at 12:00 noon every day, or on an administrator's instruction. In effect, a time or event trigger takes a static point of continuously changing data, similar to a snapshot, which is then processed. The root node address of the first data tree is written into the management data for recovery purposes. If a hardware fault causes the in-memory data to be lost, the first data tree can be found from this address and the memory mapping restored.
Step 204: Write into memory the data information of the multiple pending data items stored in nodes of the first data tree.
Step 205: Divide the multiple pending data items into the at least two groups according to a preset grouping rule. Add the state value of each group to the management data of the first data tree to indicate that each group is pending.
Step 206: Process each group in memory. After each group has been processed, update that group's state value in the management data in real time to indicate that the group has been processed.
Step 207: According to the updated state values of the groups in the management data, delete the nodes of the first data tree that store the data information of the pending data items.
Specifically, steps 204 to 207 can be performed as described for steps 101 to 104.
Step 208: Delete the root node address of the first data tree and the state values of the groups from the management data.
Optionally, in step 208, the root node address of the first data tree in the management data may instead be changed to an invalid value.
Suppose a hardware fault causes in-memory data to be lost while steps 204 to 208 are being executed. After a restart, the data processing flow can continue from the root node address of the first data tree and the group state values in the management data, without reprocessing data. Specifically, after the restart the following steps can be performed:
Read the management data from disk. If it contains the root node address of the first data tree, locate the first data tree by that address and perform step 204.
Correspondingly, step 205 specifically comprises:
if the management data lacks the state value of at least one group, adding the state value of that at least one group to the management data to indicate that the group is pending.
In step 206, processing each group of pending data in memory specifically comprises:
processing each group of pending data according to its state value in the management data.
Specifically, if the state value of a group among the at least two groups indicates that the group is pending, the group is processed. If the state value indicates that the group has already been processed, the group is skipped, and the other groups whose state values indicate pending are processed.
For example, suppose that under normal operation the pending data were divided into 4 groups in step 205. The first and second groups were processed in step 206, and their state values were updated in the management data. Then a hardware fault caused a power-down restart, and the in-memory data was lost. After power-on, the management data is read first. It now contains the root node address of the first data tree and the state values of the first and second groups, both indicating that those groups have been processed. Step 204 is then performed; the data information written into memory still includes that of the already-processed pending data. In step 205, only the state values of the third and fourth groups are added to the management data, to indicate that these groups are pending. The third and fourth groups are then processed in turn, and their state values in the management data are updated after processing. Finally, step 207 is performed. Note that the same grouping rule is applied to the pending data in memory under normal operation and after data loss.
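The restart sequence in this example can be sketched as follows. The sketch makes these assumptions, and all names in it are illustrative:

- The management data is a dictionary holding the root node address and the group states.
- `load_tree_keys` is a hypothetical callable that returns the object numbers stored in the tree at a given address (step 204).
- The grouping rule is the same 3-per-group rule used under normal operation.

```python
PENDING, PROCESSED = 1, 2


def group_pending(object_numbers, group_size=3):
    """Same preset rule as in normal operation, so group membership is stable."""
    ordered = sorted(object_numbers)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


def resume_after_restart(management, load_tree_keys, handle, group_size=3):
    """Continue an interrupted run from the state values kept in management data.

    management: hypothetical dict stand-in for the on-disk management data,
        with "root" (root node address of the first data tree) and "states".
    load_tree_keys: hypothetical callable mapping a root address to the
        object numbers stored in that tree (step 204).
    """
    root = management.get("root")
    if root is None:
        return  # no interrupted run recorded
    groups = group_pending(load_tree_keys(root), group_size)
    states = management.setdefault("states", {})
    for group_id in range(1, len(groups) + 1):
        states.setdefault(group_id, PENDING)  # step 205: add only missing states
    for group_id, members in enumerate(groups, start=1):
        if states[group_id] != PENDING:
            continue  # processed before the crash
        for item in members:
            handle(item)
        states[group_id] = PROCESSED  # written to disk right away in the real scheme


trees = {0x1000: list("abcdefghijk")}
management = {"root": 0x1000, "states": {1: PROCESSED, 2: PROCESSED}}
redone = []
resume_after_restart(management, trees.__getitem__, redone.append)
print(redone)  # ['g', 'h', 'i', 'j', 'k']
```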
This embodiment has many application scenarios. For example, a distributed file system writes data in duplicate. Sometimes only one copy is written, that is, one copy is written successfully and the other fails. In that case the failed or the successful write can be recorded, for example by recording the object number of the failed or successfully written data. A background task then mirrors the failed data from the successfully written data. Once mirroring succeeds, the recorded object number is deleted. In this scenario, determining the multiple pending data items in step 202 specifically comprises:
determining multiple items of data whose writes failed in the file system as the multiple pending data items.
In step 206, processing each group of pending data in memory specifically comprises:
mirroring the failed-write data of each group from the corresponding successfully written data.
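For the dual-write scenario, the per-datum handling in step 206 could look like the sketch below. The replicas are modelled as hypothetical dictionaries keyed by object number. The handler does not delete the failed-write records itself, because in this scheme those records are removed in bulk in step 207.

```python
def mirror_group(group, good_replica, failed_replica):
    """Step 206 handler for the dual-write scenario: overwrite each failed
    replica with the successfully written one. The object-number records in
    the first data tree are NOT removed here; step 207 removes them in bulk."""
    for object_number in group:
        failed_replica[object_number] = good_replica[object_number]


good = {7: b"payload-7", 9: b"payload-9"}
failed = {7: None, 9: b"stale"}
mirror_group([7, 9], good, failed)
print(failed == good)  # True
```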
Further, the data information of all pending data determined before step 203 is executed can be stored on the first data tree. After step 203 has started, however, it is inconvenient to store the data information of new pending data on the first data tree.
In this scenario, the data information of new pending data can be stored on another data tree. In another optional embodiment of the present invention, the following may be performed before step 203:
building a second data tree on disk.
After step 203, the following may be performed:
changing the root node address of the first data tree stored in an index to the root node address of the second data tree; and, if there are new pending data, storing their data information in nodes of the second data tree.
Step 207 specifically comprises:
deleting the first data tree according to the root node address of the first data tree stored in the management data and the updated state values of the groups.
Specifically, under normal operation, each time pending data are determined, the data tree is located by the address stored in the index, and the data information is stored in a node of that tree. When the processing flow is triggered, step 203 likewise locates the data tree by the address stored in the index. After step 203, the index holds the root node address of the second data tree. The next time the processing flow starts, that address can therefore be written into the management data, so that the new pending data are processed.
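The index switch between the two trees can be sketched as follows. `Index`, the tree dictionary, and the function names are hypothetical stand-ins. Trees are keyed by made-up root node addresses, and `finish_run` assumes that every group of the first tree has already been marked processed.

```python
class Index:
    """Hypothetical index: holds the root address of the tree that receives new data."""

    def __init__(self, root_addr):
        self.root_addr = root_addr


def start_run(index, management, trees, second_root):
    """Step 203 plus the switch-over: record the first tree's root in the
    management data, then point the index at the second tree."""
    management["root"] = index.root_addr
    trees.setdefault(second_root, [])
    index.root_addr = second_root


def record_pending(index, trees, object_number):
    """New pending data always go to whichever tree the index points at."""
    trees[index.root_addr].append(object_number)


def finish_run(management, trees):
    """Steps 207/208 (assuming all groups are processed): drop the whole
    first tree, then clear its entries from the management data."""
    del trees[management.pop("root")]
    management.pop("states", None)


trees = {0xA000: ["a", "b"]}
index, management = Index(0xA000), {}
start_run(index, management, trees, second_root=0xB000)
record_pending(index, trees, "z")
finish_run(management, trees)
print(trees, management)  # {45056: ['z']} {}
```

After the run, only the second tree remains (root address 0xB000, printed as 45056). It holds the datum that arrived mid-run, and the next processing run would write its address into the management data.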
Preferably, the second data tree can be built at the same time as the first data tree. In addition, from some point in time before step 203 onward, the data information of new pending data may be stored in the second data tree instead of the first data tree. This can be configured by an administrator.
In another optional embodiment of the present invention, the second data tree can instead be built when new pending data appear. Correspondingly, after step 203 the method further comprises:
deleting the root node address of the first data tree stored in the index;
if there are new pending data, building the second data tree, saving its root node address in the index, and storing the data information of the new pending data in nodes of the second data tree.
Step 207 specifically comprises:
deleting the first data tree according to the root node address of the first data tree stored in the management data and the updated state values of the groups.
To avoid maintaining multiple data trees, another optional embodiment of the present invention builds a new data structure on the basis of the first data tree. This structure distinguishes the pending data being processed in steps 203 to 208 from new pending data. Fig. 5 is a schematic diagram of this new data structure. In addition to the ordinary B+ tree splitting behaviour, it supports a special root split:
1) Each split changes the node state once; in the figure, node state is represented by fill stripes.
2) The split generates a new node that serves as the new root node, and the original root node is placed under it as a child node. The key of the new root node is a fixed, unused value, such as the maximum or minimum key value, e.g. 0.
3) All nodes of the original tree have a state different from that of the new root node.
4) Newly inserted nodes have the same state as the new root node.
5) When searching for new pending data, only nodes whose state matches the new root node are examined.
Specifically, before step 203 the method may further comprise:
setting the state of all nodes on the first data tree to a first state, performing a root split on the first data tree, and setting the state of the resulting new root node and its new child node to a second state.
After step 203, the method may further comprise:
if there are new pending data, storing their data information in nodes of the new tree formed by the new root node and the new child node, and setting the state of all nodes on the new tree to the second state.
Step 207 may specifically comprise:
deleting the nodes whose state is the first state, according to the root node address of the first data tree stored in the management data and the updated state values of the groups.
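The state-tagged root split can be sketched with nodes that carry a state field. This is a simplified in-memory model under several assumptions, and all names are hypothetical:

- The dictionary key "ROOT" stands in for the fixed root node address.
- New data are routed purely by state; key ordering within the new tree is ignored.
- 0 is never a real object number.
- The initial state 0 of the old tree is arbitrary.

```python
FIRST, SECOND = 1, 2
RESERVED_KEY = 0  # assumption: 0 is never used as a real object number


class Node:
    def __init__(self, state, keys=None, children=None):
        self.state = state
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []


def mark_subtree(node, state):
    """Set the state of every node reachable from `node`."""
    stack = [node]
    while stack:
        current = stack.pop()
        current.state = state
        stack.extend(current.children)


def state_root_split(old_root):
    """Before step 203: tag the first tree FIRST, then create a new root and a
    new child tagged SECOND. The old root becomes a child of the new root and
    a sibling of the new child."""
    mark_subtree(old_root, FIRST)
    new_child = Node(SECOND)
    return Node(SECOND, keys=[RESERVED_KEY], children=[old_root, new_child])


def insert_new(root, object_number):
    """New pending data only go to nodes whose state matches the root's."""
    target = next(c for c in root.children if c.state == root.state)
    target.keys.append(object_number)


def drop_first_state(root):
    """Step 207 (all groups processed): remove every FIRST-state subtree at once."""
    root.children = [c for c in root.children if c.state != FIRST]


disk = {"ROOT": Node(0, [5], [Node(0, [1, 3]), Node(0, [5, 9])])}
disk["ROOT"] = state_root_split(disk["ROOT"])  # same root address, new content
insert_new(disk["ROOT"], 42)
drop_first_state(disk["ROOT"])
print([(c.state, c.keys) for c in disk["ROOT"].children])  # [(2, [42])]
```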
For example, suppose the tree formed by the diagonally striped nodes in Fig. 5 is the first data tree. Before step 203, it undergoes a root split that produces two differently striped nodes. One is the new root node, the parent of the first data tree's root node. The other is the new child node, a sibling of the first data tree's root node. During a root split the root node address does not change, but the node it points to changes from the root of the first data tree to the new root node. The original address of the first data tree's root node stored in the index can therefore be updated. Accordingly, the address written into the management data in step 203, after the root split, is the new address of the first data tree's root node. After all pending data corresponding to the first data tree have been processed, the tree formed by the diagonally striped nodes can be deleted as a whole. This leaves the tree formed by the differently striped nodes, that is, the new tree formed by the new root node and the new child node. In addition, from some point in time before step 203 onward, the data information of new pending data may be stored on the new tree instead of being inserted into the first data tree. This can be configured by an administrator. Further, the first-state nodes of the first data tree have not yet been deleted at this point, and the new tree can itself undergo another root split, expanding it into a higher-level new tree. In that case, the nodes on the higher-level new tree may be set to the first state or to a third state. Suppose they are set to the first state. In step 207, the nodes in the first state are deleted starting from the root node address of the first data tree stored in the management data. That address does not lead to the higher-level new tree, so nodes on the higher-level new tree that are also in the first state are not deleted.
As in the embodiment of Fig. 2, the pending data are grouped in memory. Only one disk I/O is needed to update the state value after each group is processed. Only one disk I/O is needed to delete all nodes storing the data information of the pending data once processing is complete. This avoids a large amount of disk I/O and improves the performance of the on-disk data tree. Further, this embodiment stores the root node address of the data tree in the management data. If a hardware fault causes in-memory data to be lost, the data tree can be found from the management data and the memory mapping restored. Further, this embodiment groups the pending data according to a preset grouping rule. After such a loss, the data can be regrouped by the same rule with unchanged group membership. Processing then resumes from the breakpoint according to the group state values stored in the management data, without reprocessing data, which improves processing efficiency.
Fig. 6 is a schematic structural diagram of the data processing device provided by an embodiment of the present invention. As shown in Fig. 6, the device comprises:
A read module 41, configured to write into memory the data information of multiple pending data saved in the nodes of a first data tree, where the data information of a pending datum uniquely identifies that pending datum;
A grouping module 42, configured to divide the multiple pending data into at least two groups and to add, to management data generated in advance, a state value for each group of pending data that indicates the group is pending;
A processing module 43, configured to process each group of pending data in memory and, after each group has been processed, to update the state value of that group in the management data in real time to indicate that the group has been processed;
A removing module 44, configured to delete the nodes of the first data tree that hold the data information of the multiple pending data, according to the updated state values of the groups in the management data.
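The four modules can be sketched together as one Python function over a toy `Disk` object that counts I/O operations. Everything here is hypothetical: the `Disk` class, the integer state values `PENDING`/`DONE`, sorting by data information as the grouping rule, and the `handle` callback that stands in for the actual processing. The sketch only shows the I/O pattern the text describes. There is one batched write to add the group state values, one write per finished group, and one write to delete all nodes.

```python
from dataclasses import dataclass, field

PENDING, DONE = 0, 1  # hypothetical encoding of a group's state value

@dataclass
class Disk:
    """Toy stand-in for on-disk storage that counts disk I/O operations."""
    tree_nodes: dict                                 # node address -> data information
    management: dict = field(default_factory=dict)  # group index -> state value
    io_count: int = 0

    def write_states(self, updates: dict) -> None:
        self.management.update(updates)
        self.io_count += 1  # one I/O for one batch of state-value updates

    def delete_nodes(self, addrs: list) -> None:
        for a in addrs:
            del self.tree_nodes[a]
        self.io_count += 1  # one I/O for the whole batch deletion

def run(disk: Disk, group_size: int, handle) -> None:
    # read module: copy the data information of every node into memory
    memory = sorted(disk.tree_nodes.items(), key=lambda kv: kv[1])
    # grouping module: split into groups and mark every group pending
    groups = [memory[i:i + group_size] for i in range(0, len(memory), group_size)]
    disk.write_states({g: PENDING for g in range(len(groups))})
    # processing module: process a group in memory, then persist its state value
    for g, members in enumerate(groups):
        for _, info in members:
            handle(info)
        disk.write_states({g: DONE})
    # removing module: once every group is DONE, delete all nodes in one I/O
    if all(s == DONE for s in disk.management.values()):
        disk.delete_nodes([addr for addr, _ in memory])
```

With five data and a group size of two, the sketch forms three groups and performs 1 + 3 + 1 = 5 disk I/Os.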
In an optional embodiment of the present invention, the device further comprises:
An establishing module, configured to establish the first data tree on disk and to generate the management data;
A determination module, configured to determine the multiple pending data and to save the data information of each of them in a node of the first data tree.
In another optional embodiment of the present invention, during normal operation, read module 41 is further configured to write the root node address of the first data tree into the management data before it writes into memory the data information of the multiple pending data saved in the nodes of the first data tree;
Removing module 44 is further configured to:
Delete the root node address of the first data tree and the state values of the groups of pending data from the management data, after deleting the nodes of the first data tree that hold the data information of the multiple pending data according to the updated state values in the management data.
In another optional embodiment of the present invention, after a power failure and restart, read module 41 is further configured to:
Before writing into memory the data information of the multiple pending data saved in the nodes of the first data tree, read the management data and, if the management data contains the root node address of the first data tree, locate the first data tree by that root node address;
Grouping module 42 is specifically configured to:
If the management data lacks the state value of at least one group, add the state value of that group or groups to the management data to indicate that their pending data are pending;
Processing module 43 is specifically configured to:
Process each group of pending data according to that group's state value in the management data.
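The restart path, together with the normal-operation rule that the root node address is written first and cleared last, can be sketched as follows. This is a hypothetical Python illustration, not the patent's code. `ManagementData`, the `trees` dict (root address → list of data information) and the sort-then-chunk grouping rule are all assumptions. The sketch shows four things. A deterministic rule reproduces the same group members after a power failure. Missing state values are added as pending. Groups already marked done are skipped. The management data is cleared only after the first data tree has been deleted.

```python
from dataclasses import dataclass, field
from typing import Optional

PENDING, DONE = "pending", "done"  # hypothetical state values

@dataclass
class ManagementData:
    root_addr: Optional[int] = None              # root node address of the first data tree
    states: dict = field(default_factory=dict)   # group index -> state value

def make_groups(infos, group_size):
    # assumed preset grouping rule: sort, then chunk, so membership is reproducible
    ordered = sorted(infos)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]

def resume(mgmt, trees, group_size, handle):
    """Process, or resume after a power failure, the first data tree recorded in mgmt."""
    if mgmt.root_addr is None:
        return False                            # no first data tree recorded: nothing to do
    infos = trees[mgmt.root_addr]               # locate the first tree via its root address
    groups = make_groups(infos, group_size)
    for g in range(len(groups)):
        mgmt.states.setdefault(g, PENDING)      # add any missing state value as pending
    for g, members in enumerate(groups):
        if mgmt.states[g] == DONE:
            continue                            # finished before the failure: skip it
        for info in members:
            handle(info)
        mgmt.states[g] = DONE                   # a real system would persist this per group
    del trees[mgmt.root_addr]                   # delete the first data tree
    mgmt.root_addr, mgmt.states = None, {}      # then clear the management data
    return True
```

Suppose power failed after group 0 was marked done but before the other state values were written. Resuming with five data and a group size of two then processes only the data of groups 1 and 2.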
In another optional embodiment of the present invention, the establishing module is further configured to:
After read module 41 writes the root node address of the first data tree into the management data, delete the root node address of the first data tree saved in an index;
The determination module is further configured to:
After read module 41 writes the root node address of the first data tree into the management data, if new pending data arrive, establish a second data tree, save the root node address of the second data tree in the index, and save the data information of the new pending data in the nodes of the second data tree;
Removing module 44 is specifically configured to:
Delete the first data tree according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
In another optional embodiment of the present invention, the establishing module is further configured to:
Establish a second data tree on disk;
The determination module is further configured to:
After read module 41 writes the root node address of the first data tree into the management data, change the root node address of the first data tree saved in the index to the root node address of the second data tree;
If new pending data arrive, save the data information of the new pending data in the nodes of the second data tree;
Removing module 44 is specifically configured to:
Delete the first data tree according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
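The two index-handling embodiments just described can be contrasted in a short hypothetical sketch. In one, the second data tree is created lazily. In the other, it is created in advance and the index is switched to it. The `Index` class, the `trees` dict, the `mgmt` key `"root_addr"` and the integer addresses are illustrative assumptions. In both variants, the first data tree's root node address survives only in the management data. New pending data therefore never land in the tree that is being processed and deleted.

```python
from typing import Optional

class Index:
    """Hypothetical index entry holding the root address of the active data tree."""
    def __init__(self, root_addr: Optional[int]):
        self.root_addr = root_addr

def detach_lazy(index, mgmt):
    # first variant: record the first tree, then delete its root address from the index
    mgmt["root_addr"] = index.root_addr
    index.root_addr = None

def detach_eager(index, mgmt, trees, second_addr):
    # second variant: the second tree exists first, then the index is repointed to it
    trees.setdefault(second_addr, [])
    mgmt["root_addr"] = index.root_addr
    index.root_addr = second_addr

def add_new_pending(index, trees, next_addr, info):
    # new pending data go to the tree the index currently points to;
    # in the lazy variant that tree is created on first use
    if index.root_addr is None:
        index.root_addr = next_addr
        trees[next_addr] = []
    trees[index.root_addr].append(info)
```

In either variant, deleting the first tree afterwards amounts to removing `trees[mgmt["root_addr"]]`, which cannot touch the second tree.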
In another optional embodiment of the present invention, the establishing module is further configured to:
Before read module 41 writes the root node address of the first data tree into the management data, set the state of all nodes of the first data tree to a first state, perform a root split on the first data tree, and set the state of the resulting new root node and of its new child node to a second state;
The determination module is further configured to:
After read module 41 writes the root node address of the first data tree into the management data, if new pending data arrive, save their data information in the nodes of the new tree formed by the new root node and the new child node, and set the state of all nodes of the new tree to the second state;
Removing module 44 is specifically configured to:
Delete the nodes whose node state is the first state, according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
In another optional embodiment of the present invention, the determination module is specifically configured to:
Determine multiple data whose writes failed in the file system as the multiple pending data;
Processing module 43 is specifically configured to:
Mirror the write-failed data of each group so that it becomes the corresponding successfully written data.
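The write-failure embodiment can be illustrated with a hypothetical mirrored layout. The sketch assumes several things that the source does not specify. The two mirror copies are equally sized byte arrays. Blocks have a fixed size and are identified by index. Repairing a failed write means copying the block from the copy where the write succeeded. The generator yields each group's index after that group is repaired, which is the point where a real implementation would persist the group's state value.

```python
def repair_groups(primary: bytearray, mirror: bytearray, failed_blocks,
                  block_size: int, group_size: int):
    """Hypothetical: copy blocks whose write to `mirror` failed from `primary`, group by group."""
    blocks = sorted(failed_blocks)  # assumed grouping rule: sorted block indices, chunked
    for start_idx in range(0, len(blocks), group_size):
        for b in blocks[start_idx:start_idx + group_size]:
            lo = b * block_size
            # overwrite the failed block with the successfully written copy
            mirror[lo:lo + block_size] = primary[lo:lo + block_size]
        # this group is repaired; a real system would update its state value here
        yield start_idx // group_size
```

For example, with 4-byte blocks and failed blocks 1 and 3 processed one per group, the generator yields `0` and then `1`, after which both copies are identical.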
For the specific implementation of the data processing device above, refer to the data processing method provided by the embodiment of the present invention shown in Fig. 1 or Fig. 2. In this embodiment, the multiple pending data are grouped in memory. After each group of pending data has been processed, the state value of that group on disk is updated to indicate that the group has been processed. Once the updated state values show that every group has been processed, the nodes of the first data tree on disk that hold the data information of the multiple pending data are deleted. This overcomes a problem in the prior art, where disk I/O is performed per individual datum and the overall performance of an on-disk data tree is greatly degraded. Each processed group requires only one disk I/O to modify its state value. After all the pending data have been processed, only one disk I/O is needed to delete all the nodes holding their data information. A large amount of disk I/O is thereby avoided, and the operating performance of the on-disk data tree is improved. Further, this embodiment saves the root node address of the data tree in the management data. If a hardware fault causes in-memory data to be lost, the data tree can therefore be located from the management data and the in-memory mapping restored. Further, this embodiment groups the pending data according to a preset grouping rule. After such a loss, the data can be regrouped by the same rule with unchanged group membership. Processing then resumes from the breakpoint according to the state values of the groups saved in the management data.
A person of ordinary skill in the art will understand that all or some of the steps of the method embodiments above may be implemented by program instructions directing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments above. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments above are intended only to illustrate, not to limit, the technical solutions of the present invention. The present invention has been described in detail with reference to the foregoing embodiments. Nevertheless, a person of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A data processing method, characterized by comprising:
Writing the root node address of a first data tree into management data;
Reading the management data and, if the management data contains the root node address of the first data tree, locating the first data tree by that root node address;
Writing into memory the data information of multiple pending data saved in the nodes of the first data tree, the data information of a pending datum uniquely identifying that pending datum;
Dividing the multiple pending data into at least two groups, and adding to the management data a state value for each group of pending data to indicate that the group is pending;
If the management data lacks the state value of at least one group of pending data, adding the state value of that group or groups to the management data to indicate that they are pending;
Processing each group of pending data according to its state value in the management data and, after each group has been processed, updating the state value of that group in the management data to indicate that it has been processed;
Deleting, according to the updated state values of the groups in the management data and by means of a single disk I/O operation, the nodes of the first data tree that hold the data information of the multiple pending data;
Deleting the root node address of the first data tree and the state values of the groups of pending data from the management data.
2. The method according to claim 1, characterized in that, after the root node address of the first data tree is written into the management data, the method further comprises:
Deleting the root node address of the first data tree saved in an index;
If new pending data arrive, establishing a second data tree, saving the root node address of the second data tree in the index, and saving the data information of the new pending data in the nodes of the second data tree;
And deleting, according to the updated state values of the groups in the management data, the nodes of the first data tree that hold the data information of the multiple pending data specifically comprises:
Deleting the first data tree according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
3. The method according to claim 1, characterized in that, before the root node address of the first data tree is written into the management data, the method further comprises:
Establishing a second data tree on disk;
After the root node address of the first data tree is written into the management data, the method further comprises:
Changing the root node address of the first data tree saved in an index to the root node address of the second data tree;
If new pending data arrive, saving the data information of the new pending data in the nodes of the second data tree;
And deleting, according to the updated state values of the groups in the management data, the nodes of the first data tree that hold the data information of the multiple pending data specifically comprises:
Deleting the first data tree according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
4. The method according to claim 1, characterized in that, before the root node address of the first data tree is written into the management data, the method further comprises:
Setting the state of all nodes of the first data tree to a first state, performing a root split on the first data tree, and setting the state of the resulting new root node and of its new child node to a second state;
After the root node address of the first data tree is written into the management data, the method further comprises:
If new pending data arrive, saving their data information in the nodes of the new tree formed by the new root node and the new child node, and setting the state of all nodes of the new tree to the second state;
And deleting, according to the updated state values of the groups in the management data, the nodes of the first data tree that hold the data information of the multiple pending data specifically comprises:
Deleting the nodes whose node state is the first state, according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
5. A data processing device, characterized by comprising:
A read module, configured to write the root node address of a first data tree into management data before writing into memory the data information of multiple pending data saved in the nodes of the first data tree; to read the management data and, if it contains the root node address of the first data tree, locate the first data tree by that root node address; and to write into memory the data information of the multiple pending data saved in the nodes of the first data tree, the data information of a pending datum uniquely identifying that pending datum;
A grouping module, configured to divide the multiple pending data into at least two groups and add to the management data a state value for each group of pending data to indicate that the group is pending; and, if the management data lacks the state value of at least one group, to add the state value of that group or groups to indicate that their pending data are pending;
A processing module, configured to process each group of pending data according to its state value in the management data and, after each group has been processed, to update the state value of that group in the management data in real time to indicate that it has been processed;
A removing module, configured to delete, according to the updated state values of the groups in the management data and by means of a single disk I/O operation, the nodes of the first data tree that hold the data information of the multiple pending data, and to delete the root node address of the first data tree and the state values of the groups of pending data from the management data.
6. The device according to claim 5, characterized by further comprising:
An establishing module, configured to establish the first data tree on disk and generate the management data;
A determination module, configured to determine multiple pending data and save the data information of each of them in a node of the first data tree;
The establishing module is further configured to:
After the read module writes the root node address of the first data tree into the management data, delete the root node address of the first data tree saved in an index;
The determination module is further configured to:
After the read module writes the root node address of the first data tree into the management data, if new pending data arrive, establish a second data tree, save the root node address of the second data tree in the index, and save the data information of the new pending data in the nodes of the second data tree;
The removing module is specifically configured to:
Delete the first data tree according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
7. The device according to claim 5, characterized by further comprising:
An establishing module, configured to establish the first data tree on disk and generate the management data;
A determination module, configured to determine multiple pending data and save the data information of each of them in a node of the first data tree;
The establishing module is further configured to:
Establish a second data tree on disk;
The determination module is further configured to:
After the read module writes the root node address of the first data tree into the management data, change the root node address of the first data tree saved in an index to the root node address of the second data tree;
If new pending data arrive, save the data information of the new pending data in the nodes of the second data tree;
The removing module is specifically configured to:
Delete the first data tree according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
8. The device according to claim 5, characterized by further comprising:
An establishing module, configured to establish the first data tree on disk and generate the management data;
A determination module, configured to determine multiple pending data and save the data information of each of them in a node of the first data tree;
The establishing module is further configured to:
Before the read module writes the root node address of the first data tree into the management data, set the state of all nodes of the first data tree to a first state, perform a root split on the first data tree, and set the state of the resulting new root node and of its new child node to a second state;
The determination module is further configured to:
After the read module writes the root node address of the first data tree into the management data, if new pending data arrive, save their data information in the nodes of the new tree formed by the new root node and the new child node, and set the state of all nodes of the new tree to the second state;
The removing module is specifically configured to:
Delete the nodes whose node state is the first state, according to the root node address of the first data tree saved in the management data and the updated state values of the groups of pending data.
CN201110343278.5A 2011-11-03 2011-11-03 Data processing method and data processing device Active CN102495838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110343278.5A CN102495838B (en) 2011-11-03 2011-11-03 Data processing method and data processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110343278.5A CN102495838B (en) 2011-11-03 2011-11-03 Data processing method and data processing device

Publications (2)

Publication Number Publication Date
CN102495838A CN102495838A (en) 2012-06-13
CN102495838B true CN102495838B (en) 2014-09-17

Family

ID=46187663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110343278.5A Active CN102495838B (en) 2011-11-03 2011-11-03 Data processing method and data processing device

Country Status (1)

Country Link
CN (1) CN102495838B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729304B (en) * 2012-10-11 2017-03-15 腾讯科技(深圳)有限公司 Data processing method and device
CN103853766B (en) * 2012-12-03 2017-04-05 中国科学院计算技术研究所 A kind of on-line processing method and system towards stream data
WO2014146240A1 (en) * 2013-03-19 2014-09-25 华为技术有限公司 Data update method and server for distributed storage system
CN103258016B (en) * 2013-04-24 2016-05-18 山东中创软件工程股份有限公司 Data transmission method and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN1859355A (en) * 2005-12-28 2006-11-08 华为技术有限公司 Method for processing batch service
CN101223507A (en) * 2005-05-20 2008-07-16 集团建模控股有限公司 Data processing network
CN102012840A (en) * 2010-12-23 2011-04-13 中国农业银行股份有限公司 Batch data scheduling method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8055609B2 (en) * 2008-01-22 2011-11-08 International Business Machines Corporation Efficient update methods for large volume data updates in data warehouses


Also Published As

Publication number Publication date
CN102495838A (en) 2012-06-13

Similar Documents

Publication Publication Date Title
US11314701B2 (en) Resharding method and system for a distributed storage system
CN107391628B (en) Data synchronization method and device
TWI515561B (en) Data tree storage methods, systems and computer program products using page structure of flash memory
US8108446B1 (en) Methods and systems for managing deduplicated data using unilateral referencing
US8626717B2 (en) Database backup and restore with integrated index reorganization
CN111444196B (en) Method, device and equipment for generating Hash of global state in block chain type account book
CN102707990A (en) Container based processing method, device and system
CN107665219B (en) Log management method and device
CN111444192B (en) Method, device and equipment for generating Hash of global state in block chain type account book
US10509780B2 (en) Maintaining I/O transaction metadata in log-with-index structure
EP2003579B1 (en) Method and system for data processing with database update for the same
CN102495838B (en) Data processing method and data processing device
CN103559139A (en) Data storage method and device
CN103473298A (en) Data archiving method and device and storage system
CN103246549A (en) Method and system for data transfer
WO2023277819A3 (en) Data processing method, system, device, computer program product, and storage function
CN102096613A (en) Method and device for generating snapshot
WO2016175880A1 (en) Merging incoming data in a database
CN110515897B (en) Method and system for optimizing reading performance of LSM storage system
CN114327292B (en) File management method, system, electronic device and storage medium
CN115858471A (en) Service data change recording method, device, computer equipment and medium
CN112965939A (en) File merging method, device and equipment
CN114297196A (en) Metadata storage method and device, electronic equipment and storage medium
US9128823B1 (en) Synthetic data generation for backups of block-based storage
CN113806803A (en) Data storage method, system, terminal equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB02 Change of applicant information

Address after: 611731 Chengdu high tech Zone, Sichuan, West Park, Qingshui River

Applicant after: HUAWEI DIGITAL TECHNOLOGIES (CHENG DU) Co.,Ltd.

Address before: 611731 Chengdu high tech Zone, Sichuan, West Park, Qingshui River

Applicant before: CHENGDU HUAWEI SYMANTEC TECHNOLOGIES Co.,Ltd.

COR Change of bibliographic data

Free format text: CORRECT: APPLICANT; FROM: CHENGDU HUAWEI SYMANTEC TECHNOLOGIES CO., LTD. TO: HUAWEI DIGITAL TECHNOLOGY (CHENGDU) CO., LTD.

C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220921

Address after: No. 1899 Xiyuan Avenue, high tech Zone (West District), Chengdu, Sichuan 610041

Patentee after: Chengdu Huawei Technologies Co.,Ltd.

Address before: 611731 Qingshui River District, Chengdu hi tech Zone, Sichuan, China

Patentee before: HUAWEI DIGITAL TECHNOLOGIES (CHENG DU) Co.,Ltd.

TR01 Transfer of patent right