CN104881466A

CN104881466A - Method and device for processing data fragments and deleting garbage files

Info

Publication number: CN104881466A
Application number: CN201510271710.2A
Authority: CN
Inventors: 徐佩林; 颜世光; 覃安; 李康; 梁栋
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2015-05-25
Filing date: 2015-05-25
Publication date: 2015-09-02
Anticipated expiration: 2035-05-25
Also published as: CN104881466B

Abstract

The embodiments of the invention disclose a method and device for processing data fragments and for deleting garbage files. The method for processing data fragments comprises: while a distributed total order storage system generates total order data fragments, obtaining at least one piece of attribute description information corresponding to each data fragment, wherein the attribute description information comprises data iterative information; writing the attribute description information into the file meta information corresponding to the data fragment; and, when an instruction to process at least one target data fragment is received, processing the data iterative information in the file meta information corresponding to the target data fragment so as to process the target data fragment. With this technical scheme, a target data fragment can be processed without moving or modifying any data file, which optimizes the data fragment processing mechanism of existing distributed total order storage systems and meets the growing demand for convenient and efficient data fragment processing.

Description

Method and device for processing data fragments and deleting garbage files

Technical field

The embodiments of the present invention relate to computer technology, and in particular to a method and device for processing data fragments and a method and device for deleting garbage files.

Background art

In general, a database stores data mainly as Key-Value pairs. Each key (Key) is associated with a value (Value); the value can be found through its key, and data operations can then be performed on that value. In addition, to support fast reads and writes, the data stored in a database is generally total order data.

Logically, total order data is one very large data set sorted by key, with more than a trillion rows. Because of its volume, such a data set cannot be stored in full on only one or a few servers. In existing distributed total order storage systems, the massive total order data is therefore spread across the data fragments of a server cluster. Different data fragments are stored on one or more fragment servers, and the data range stored in each fragment is recorded centrally in the fragment meta information of a management server. In this way, a single management server schedules and configures multiple fragment servers in a unified manner, so that all kinds of operations on the total order data can be carried out.

The total order data stored in a database changes dynamically. As data is continually inserted and deleted, fragment sizes change, so larger fragments need to be split and smaller fragments need to be merged. How to split and merge the data fragments that store total order data reasonably and efficiently is currently a very important research topic.

Existing fragment splitting/merging is implemented mainly in the following two ways:

1. Offline splitting/merging. This scheme stops the service while a fragment is split or merged. The old data in the fragment is written offline into the new fragment, and the fragment meta information is then modified and takes effect. This approach is very inefficient: it requires double the bandwidth and computing resources and a long service outage, which is unacceptable in scenarios with high real-time requirements.

2. Splitting based on file links. In this scheme, the physical storage of a fragment corresponds to a file system directory; to generate a new fragment, it is only necessary to create links to the old files under the new fragment's directory. This scheme needs no data movement and can split online without stopping the service. However, it depends on the link function of the file system and cannot effectively implement fragment merging.

Summary of the invention

In view of this, the embodiments of the present invention provide a method and device for processing data fragments and for deleting garbage files, so as to optimize the data fragment processing mechanism of existing distributed total order storage systems and meet the growing demand for convenient and efficient data fragment processing.

In a first aspect, an embodiment of the present invention provides a method for processing data fragments in a distributed total order storage system, comprising:

while total order data fragments are generated in the distributed total order storage system, obtaining at least one piece of attribute description information corresponding to the data fragment, wherein the attribute description information comprises data iterative information;

writing the attribute description information into the file meta information corresponding to the data fragment;

when a processing instruction for at least one target data fragment is received, processing the data iterative information in the file meta information corresponding to the target data fragment, so as to process the target data fragment.

In a second aspect, an embodiment of the present invention provides a method for deleting garbage files, comprising:

querying the file meta information corresponding to each data fragment in a distributed total order storage system to obtain a first file list, wherein the file meta information stores attribute description information corresponding to the data fragment, and the attribute description information comprises data iterative information;

scanning the file system corresponding to the distributed total order storage system to obtain a second file list;

calculating the difference set of the first file list and the second file list as a to-be-deleted file list;

deleting the data files in the distributed total order storage system that match the to-be-deleted file list.
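The four steps of the second aspect can be sketched as a set difference. This is a minimal illustration, not the patented implementation: the function name `files_to_delete` and the file names are hypothetical, and the direction of the difference (files found on disk but referenced by no file meta information) is an assumption, since the text only says "difference set".

```python
from typing import Iterable, Set


def files_to_delete(referenced: Iterable[str], on_disk: Iterable[str]) -> Set[str]:
    """Return files found by the scan that no file meta information references.

    Assumption: garbage = second file list minus first file list.
    """
    return set(on_disk) - set(referenced)


# First file list: files referenced by the file meta information (hypothetical names).
first_file_list = ["frag1/0001.dat", "frag1/0002.dat"]
# Second file list: everything found by scanning the file system.
second_file_list = ["frag1/0001.dat", "frag1/0002.dat", "frag1/0003.dat"]

garbage = files_to_delete(first_file_list, second_file_list)
print(sorted(garbage))
```

Because splitting and merging only rewrite meta information, a data file becomes garbage only once no file meta information refers to it, which is why the first list is built from the meta information rather than from the fragments themselves.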

In a third aspect, an embodiment of the present invention provides a device for processing data fragments in a distributed total order storage system, comprising:

an attribute description information acquisition module, configured to obtain, while total order data fragments are generated in the distributed total order storage system, at least one piece of attribute description information corresponding to the data fragment, wherein the attribute description information comprises data iterative information;

an attribute description information writing module, configured to write the attribute description information into the file meta information corresponding to the data fragment;

a file meta information processing module, configured to process, when a processing instruction for at least one target data fragment is received, the data iterative information in the file meta information corresponding to the target data fragment, so as to process the target data fragment.

In a fourth aspect, an embodiment of the present invention provides a device for deleting garbage files, comprising:

a first file list acquisition unit, configured to query the file meta information corresponding to each data fragment in a distributed total order storage system to obtain a first file list, wherein the file meta information stores attribute description information corresponding to the data fragment, and the attribute description information comprises data iterative information;

a second file list acquisition unit, configured to scan the file system corresponding to the distributed total order storage system to obtain a second file list;

a to-be-deleted file list calculation unit, configured to calculate the difference set of the first file list and the second file list as a to-be-deleted file list;

a data file deletion unit, configured to delete the data files in the distributed total order storage system that match the to-be-deleted file list;

wherein the first file list acquisition unit is specifically configured to:

obtain the file meta information corresponding to each data fragment in the distributed total order storage system as to-be-processed file meta information;

obtain, according to the data iterative information and the file storage location information in each piece of to-be-processed file meta information, the split file meta information and/or merged file meta information contained in the to-be-processed file meta information, wherein the split file meta information is file meta information produced by split processing, and the merged file meta information is file meta information produced by merge processing;

generate the first file list according to the split file meta information and/or the merged file meta information.

By writing the data iterative information of a data fragment into the file meta information corresponding to that data fragment, the embodiments of the present invention make it possible, when a processing instruction for a target data fragment is received, to process the target data fragment merely by processing its corresponding file meta information, without directly processing the data files stored in the target data fragment. No data file needs to be moved or modified. This optimizes the data fragment processing mechanism of existing distributed total order storage systems and meets the growing demand for convenient and efficient data fragment processing.

Brief description of the drawings

Fig. 1 is a flowchart of a method for processing data fragments in a distributed total order storage system according to the first embodiment of the present invention;

Fig. 2 is a flowchart of a method for processing data fragments in a distributed total order storage system according to the second embodiment of the present invention;

Fig. 3 is a flowchart of a method for processing data fragments in a distributed total order storage system according to the third embodiment of the present invention;

Fig. 4 is a schematic diagram of the relationship among fragment meta information, file meta information and data fragments as applied in an embodiment of the present invention;

Fig. 5 is a schematic diagram of the information interaction between the management server and the fragment server in an embodiment of the present invention;

Fig. 6 is a schematic diagram of the actual time consumed by splitting and merging data fragments in an embodiment of the present invention;

Fig. 7 is a flowchart of a method for deleting garbage files according to the fourth embodiment of the present invention;

Fig. 8 is a flowchart of a specific garbage file deletion process according to the fourth embodiment of the present invention;

Fig. 9 is a structural diagram of a device for processing data fragments in a distributed total order storage system according to the fifth embodiment of the present invention;

Fig. 10 is a structural diagram of a device for deleting garbage files according to the sixth embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the whole. Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations can be rearranged. A process can be terminated when its operations are completed, but it can also have additional steps not included in the drawings. A process can correspond to a method, function, procedure, subroutine, subprogram and so on.

First embodiment

Fig. 1 is a flowchart of a method for processing data fragments in a distributed total order storage system according to the first embodiment of the present invention. The method of this embodiment can be performed by a device for processing data fragments in a distributed total order storage system. The device can be implemented in hardware and/or software, can generally be integrated into a fragment server of the distributed total order storage system, and is used together with the management server of that system. The method of this embodiment specifically comprises:

110. While total order data fragments are generated in the distributed total order storage system, obtain at least one piece of attribute description information corresponding to the data fragment, wherein the attribute description information comprises data iterative information.

As mentioned above, total order data of PB (petabyte, one thousand terabytes) scale or more is difficult to store in full on a single server. A distributed total order storage system is therefore used to divide the total order data into data fragments, which are stored on different fragment servers.

In this embodiment, while total order data fragments are generated in the distributed total order storage system, attribute description information that includes data iterative information is obtained for each data fragment.

The data iterative information identifies the range of total order data contained in the corresponding data fragment. Accordingly, it may comprise the start data identifier (typically the Key of a Key-Value pair) and the end data identifier of the total order data stored in the data fragment; or the start and end serial numbers of that data; or the start data identifier together with the number of items of total order data contained. This embodiment does not limit this.

For example, suppose the total order data is sorted by Key from A to Z and the distributed total order storage system divides it into two data fragments. For the three cases above: the data iterative information of the first data fragment may be (A->M) and that of the second (N->Z); or the first may be (1->13) and the second (14->26); or the first may be (A, 13) and the second (N, 13).
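The three encodings in this example can be reproduced with a short sketch. The helper names `by_key`, `by_serial` and `by_start_and_count` are hypothetical and serve only to show that each form describes the same two fragments.

```python
import string

# Total order data keyed A..Z, divided into two fragments of 13 keys each.
keys = list(string.ascii_uppercase)
first, second = keys[:13], keys[13:]


def by_key(fragment):
    """(start data identifier, end data identifier)."""
    return (fragment[0], fragment[-1])


def by_serial(fragment):
    """(start serial number, end serial number), counting from 1."""
    return (keys.index(fragment[0]) + 1, keys.index(fragment[-1]) + 1)


def by_start_and_count(fragment):
    """(start data identifier, number of items)."""
    return (fragment[0], len(fragment))


for encode in (by_key, by_serial, by_start_and_count):
    print(encode.__name__, encode(first), encode(second))
```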

The attribute description information describes the basic attributes of the corresponding data fragment. Besides the data iterative information, it may also include the physical address at which the data fragment is stored, the size of the files stored in the data fragment, other flag information and so on. This embodiment does not limit this.

120. Write the attribute description information into the file meta information corresponding to the data fragment.

In this embodiment, the file meta information stores the attribute description information of the data fragment, and is generally stored on the fragment server together with the corresponding data fragment.

130. When a processing instruction for at least one target data fragment is received, process the data iterative information in the file meta information corresponding to the target data fragment, so as to process the target data fragment.

As mentioned above, the data in the total order data files stored in the distributed total order storage system changes dynamically. As data is continually inserted and deleted, the size of each data fragment changes, so larger data fragments need to be split and smaller data fragments need to be merged.

Accordingly, in this embodiment, the management server can, based on the size of each data fragment, periodically send a corresponding processing instruction to the fragment server hosting a data fragment that needs to be split or merged, so that the fragment server completes the split or merge operation. Here, a target data fragment is a data fragment that needs to be split or merged.

In this embodiment, the file meta information stores data iterative information, which identifies the range of total order data contained in the corresponding data fragment. The fragment server therefore does not need to operate on the target data fragment itself in order to split or merge it; it only needs to split or merge the data iterative information in the file meta information corresponding to the target data fragment.

Typically, when a split instruction for a target data fragment is received, the data iterative information in the file meta information corresponding to the target data fragment is split, generating at least two pieces of file meta information corresponding to the target data fragment; this accomplishes the split of the target data fragment. When a merge instruction for at least two target data fragments is received, the data iterative information in the at least two pieces of file meta information corresponding to those fragments is merged, generating file meta information corresponding to the at least two target data fragments; this accomplishes the merge.

On the basis of the above embodiment, the attribute description information may further comprise effectiveness identification information, which identifies whether the data iterative information is in effect. Whether it is in effect determines which data on the corresponding data fragment can be accessed. If the effectiveness identification information is set to "not in effect", all data on the data fragment can be accessed; if it is set to "in effect", only part of the data on the data fragment can be accessed.

For example, after a data fragment is generated, the first file meta information corresponding to it contains two pieces of attribute description information: the data iterative information and the effectiveness identification information. The data iterative information is (A->G) and the effectiveness identification information is set to "not in effect". Accordingly, all data in the data fragment (the data corresponding to A~G) can be accessed through the first file meta information.

After a split instruction for this data fragment is received, the fragment server can split the first file meta information into two pieces of file meta information (second file meta information and third file meta information). The data iterative information in the second file meta information is (A->C), and its effectiveness identification information is set to "in effect". Accordingly, only part of the data in the data fragment (the data corresponding to A~C) can be accessed through the second file meta information.
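The effect of the effectiveness identification information on reads can be sketched as follows. The `FileMeta` class, its field names and the `visible_keys` helper are hypothetical; the sketch assumes that an identifier marked "in effect" restricts reads to the key range in the data iterative information, and that one marked "not in effect" skips the range check entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileMeta:
    start_key: str   # start data identifier of the data iterative information
    end_key: str     # end data identifier of the data iterative information
    effective: bool  # effectiveness identification information


def visible_keys(meta: FileMeta, data_file: Dict[str, str]) -> List[str]:
    """Keys of the (unchanged) data file that are readable through this meta."""
    if not meta.effective:
        # Range not in effect: no per-key filtering, the whole file is readable.
        return sorted(data_file)
    # Range in effect: only keys inside [start_key, end_key] are readable.
    return sorted(k for k in data_file if meta.start_key <= k <= meta.end_key)


data_file = {k: k.lower() for k in "ABCDEFG"}             # one physical data file
first_meta = FileMeta("A", "G", effective=False)          # before the split
second_meta = FileMeta("A", "C", effective=True)          # after the split

print(visible_keys(first_meta, data_file))
print(visible_keys(second_meta, data_file))
```

Skipping the range check when the flag is off is where the CPU saving described in the text would come from in this sketch: an unsplit fragment never pays for per-key filtering.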

The reason for this design is that, on a fragment server, putting data iterative information into effect increases CPU (Central Processing Unit) consumption. Without the effectiveness identification information, the data iterative information would have to be treated as in effect in every case, which greatly increases CPU consumption. By introducing the effectiveness identification information into the attribute description information, whether the data iterative information needs to be in effect can be chosen according to the actual situation, which significantly reduces CPU consumption and improves CPU processing efficiency.

Second embodiment

Fig. 2 is a flowchart of a method for processing data fragments in a distributed total order storage system according to the second embodiment of the present invention. This embodiment refines the above embodiment. In this embodiment, the processing instruction is specifically a split instruction for a data fragment. Accordingly, the step of processing, when a processing instruction for at least one target data fragment is received, the data iterative information in the file meta information corresponding to the target data fragment so as to process the target data fragment is refined as: receiving a split instruction for a target data fragment sent by the management server; obtaining, according to the split instruction, the target file meta information corresponding to the target data fragment; splitting the data iterative information in the target file meta information to generate at least two pieces of split file meta information; and returning the split result to the management server, to instruct the management server to modify the corresponding fragment meta information according to the split result.

Meanwhile, the step of splitting the data iterative information in the target file meta information to generate at least two pieces of split file meta information is refined as: obtaining at least one node data identifier in the total order data corresponding to the target data fragment, wherein the node data is data located between the start point data and the end point data of the total order data; generating at least two pieces of split data iterative information according to the start point data identifier and end point data identifier contained in the data iterative information of the target file meta information and the at least one node data identifier; and generating at least two pieces of split file meta information according to the at least two pieces of split data iterative information.

Accordingly, the method for the present embodiment specifically comprises:

210. While total order data fragments are generated in the distributed total order storage system, obtain at least one piece of attribute description information corresponding to the data fragment.

220. Write the attribute description information into the file meta information corresponding to the data fragment.

230. Receive a split instruction for a target data fragment sent by the management server.

In this embodiment, the split instruction sent by the management server contains information about the target data fragment to be split, and instructs the fragment server to split the target data fragment into at least two data fragments.

240. According to the split instruction, obtain the target file meta information corresponding to the target data fragment.

In this embodiment, the fragment server can obtain the target file meta information corresponding to the target data fragment from the information about the target data fragment contained in the split instruction.

250. In the total order data corresponding to the target data fragment, obtain at least one node data identifier, wherein the node data is data located between the start point data and the end point data of the total order data.

In this embodiment, to split the target data fragment, it is only necessary to turn the target file meta information corresponding to the target data fragment into at least two pieces of file meta information (hereinafter, split file meta information). That is, the data iterative information contained in the target file meta information is turned into at least two pieces of data iterative information (hereinafter, split data iterative information).

For example, if the data iterative information contained in the target file meta information is (A->Z), then from this data iterative information and the corresponding total order data (that is, the total order data with Keys from A to Z), the two pieces of split data iterative information (A->M) and (N->Z) can be generated. Accordingly, two pieces of split file meta information can be generated, containing (A->M) and (N->Z) respectively.

To achieve this, at least one node data identifier first needs to be obtained in the total order data corresponding to the target data fragment. The node data identifiers obtained, together with the start point data identifier and end point data identifier contained in the data iterative information of the target file meta information, form the start point and end point data identifiers of the new split data iterative information.

Continuing the previous example, in the total order data with Keys from A to Z corresponding to the target data fragment, two node data identifiers M and N are obtained, where the node data corresponding to M and N lies between the start point data and the end point data of the total order data.

260. According to the start point data identifier and end point data identifier contained in the data iterative information of the target file meta information and the at least one node data identifier, generate at least two pieces of split data iterative information.

Continuing the previous example, from the start point data identifier A and the end point data identifier Z contained in the data iterative information of the target file meta information, and the two node data identifiers M and N, the two pieces of split data iterative information (A->M) and (N->Z) can be generated.
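Steps 250 and 260 can be sketched as a pure operation on key ranges. The function name `split_range` and the convention of passing node identifiers as (last key of the left part, first key of the right part) pairs are assumptions made for illustration; no data file is read or written.

```python
from typing import List, Tuple

Range = Tuple[str, str]  # (start point data identifier, end point data identifier)


def split_range(rng: Range, nodes: List[Tuple[str, str]]) -> List[Range]:
    """Split one key range into len(nodes) + 1 ranges.

    Each node pair is (last key of the left part, first key of the right part)
    and the pairs must be given in key order.
    """
    start, end = rng
    result: List[Range] = []
    for left_end, right_start in nodes:
        if not (start <= left_end < right_start <= end):
            raise ValueError("node identifiers must lie inside the range, in order")
        result.append((start, left_end))
        start = right_start
    result.append((start, end))
    return result


print(split_range(("A", "Z"), [("M", "N")]))
```

Each resulting range would then be written into its own piece of split file meta information (step 270), all of which still refer to the same underlying data file.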

270. According to the at least two pieces of split data iterative information, generate at least two pieces of split file meta information.

280. Return the split result to the management server, to instruct the management server to modify the corresponding fragment meta information according to the split result.

In general, the management server stores and maintains the fragment meta information, which mainly records which fragment server hosts each data fragment of the total order data. In this embodiment, data fragments are split and merged by processing file meta information rather than by operating on the data fragments directly. The existing fragment meta information therefore also needs to record information about the corresponding file meta information. Then, when a data fragment needs to be accessed, the fragment meta information is looked up first, the corresponding file meta information is found from the matching fragment meta information, and finally the matching data fragment is found from that file meta information and accessed.
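The lookup chain described here (fragment meta information, then file meta information, then data fragment) can be sketched as below. The classes `FragmentMeta` and `FileMeta`, their fields, and the server and file names are all hypothetical; the source does not specify the layout of either kind of meta information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileMeta:
    start_key: str
    end_key: str
    data_file: str  # location of the physical data file (hypothetical field)


@dataclass(frozen=True)
class FragmentMeta:
    server: str          # fragment server hosting the fragment
    file_meta: FileMeta  # reference to the corresponding file meta information


def locate(fragment_metas: List[FragmentMeta], key: str) -> Optional[FragmentMeta]:
    """Find the fragment meta information whose file meta covers the key."""
    for fm in fragment_metas:
        if fm.file_meta.start_key <= key <= fm.file_meta.end_key:
            return fm
    return None


# After a split, two pieces of file meta information share one data file.
table = [
    FragmentMeta("fragment-server-1", FileMeta("A", "M", "frag-0001.dat")),
    FragmentMeta("fragment-server-1", FileMeta("N", "Z", "frag-0001.dat")),
]

hit = locate(table, "T")
print(hit.server, hit.file_meta.data_file)
```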

As described above, after a piece of file meta information is split into at least two pieces of split file meta information, the split result needs to be returned to the management server, to instruct the management server to modify the corresponding fragment meta information according to the split result.

When a split instruction for a target data fragment is received, the method of this embodiment does not need to split the data files stored in the target data fragment directly. Splitting the file meta information corresponding to the target data fragment is sufficient to split the target data fragment, and no data file needs to be moved or modified. This optimizes the data fragment processing mechanism of existing distributed total order storage systems and meets the growing demand for convenient and efficient data fragment processing.

Third embodiment

Fig. 3 is a flowchart of a method for processing data fragments in a distributed total order storage system according to the third embodiment of the present invention. This embodiment refines the above embodiment. In this embodiment, the processing instruction is specifically a merge instruction for data fragments. Accordingly, the step of processing, when a processing instruction for at least one target data fragment is received, the data iterative information in the file meta information corresponding to the target data fragment so as to process the target data fragment is refined as: receiving a merge instruction for at least two target data fragments sent by the management server; obtaining, according to the merge instruction, at least two pieces of target file meta information corresponding to the at least two target data fragments; merging the data iterative information in the at least two pieces of target file meta information to generate merged file meta information; and returning the merge result to the management server, to instruct the management server to modify the corresponding fragment meta information according to the merge result.

Meanwhile, the step of merging the data iterative information in the at least two pieces of target file meta information to generate merged file meta information is refined as: obtaining the data iterative information contained in the at least two pieces of target file meta information; determining a merge start point data identifier and a merge end point data identifier according to the start point data identifiers and end point data identifiers contained in the at least two pieces of data iterative information and the ordering of the total order data corresponding to the target data fragments; generating merged data iterative information according to the merge start point data identifier and the merge end point data identifier; and generating merged file meta information according to the merged data iterative information.

Accordingly, the method for the present embodiment specifically comprises:

310, generate in distributed total order storage system in the process of total order data fragmentation, obtain at least one attribute description information corresponding with described data fragmentation.

320, by file meta-information corresponding with described data fragmentation for described attribute description information write.

330. Receive a merge instruction for at least two target data fragments sent by the management server.

In this embodiment, the merge instruction sent by the management server contains information on the at least two target data fragments that need to be merged, and the merge instruction instructs the fragment server to merge the at least two target data fragments into one data fragment.

340. According to the merge instruction, obtain at least two pieces of target file meta information corresponding to the at least two target data fragments.

In this embodiment, the fragment server can obtain the at least two pieces of target file meta information corresponding to the target data fragments according to the information on the at least two target data fragments contained in the merge instruction.

350. Obtain the data iterative information contained in the at least two pieces of target file meta information.

360. Determine a merge start-point data identifier and a merge end-point data identifier according to the start-point data identifiers and end-point data identifiers contained in the at least two pieces of data iterative information and the ordering of the total order data corresponding to the target data fragments.

370. Generate merged data iterative information according to the merge start-point data identifier and the merge end-point data identifier.

In a specific example, the data iterative information contained in the at least two pieces of target file meta information obtained is (A->M) and (N->Z) respectively, so the data identifiers contained in these two pieces of data iterative information are A, M, N and Z. The total order data corresponding to the data fragments is arranged in order of Key value from A to Z. It can therefore be determined that the merge start-point data identifier is A and the merge end-point data identifier is Z, and correspondingly the merged data iterative information generated is (A->Z).

380. Generate merged file meta information according to the merged data iterative information.

390. Return the merge result to the management server, so as to instruct the management server to modify the corresponding fragment meta information according to the merge result.

When a merge instruction for at least two target data fragments is received, the method of this embodiment does not need to directly merge the data files stored in the at least two target data fragments. The target data fragments are merged merely by merging the file meta information corresponding to them, without moving or modifying any data file. This optimizes the data fragment processing mechanism of existing distributed total order storage systems and meets the growing demand for convenient and efficient processing of data fragments.
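As an illustration of steps 350 to 380, the following is a minimal Python sketch. It assumes that data identifiers are string Key values compared lexicographically. The `FileMeta` class, its field names, the file paths and `merge_file_metas` are hypothetical names introduced here for illustration and are not part of the disclosed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical file meta information: a key range over one data file."""
    data_file: str   # file storage location (never moved or rewritten)
    start_key: str   # start-point data identifier
    end_key: str     # end-point data identifier


def merge_file_metas(metas):
    """Merge the key ranges of at least two file metas.

    Returns the merge start-point and end-point identifiers together with
    the source metas, so the data files are only referenced, not copied.
    """
    if len(metas) < 2:
        raise ValueError("a merge needs at least two target file metas")
    merged_start = min(m.start_key for m in metas)  # smallest start point
    merged_end = max(m.end_key for m in metas)      # largest end point
    return merged_start, merged_end, tuple(metas)


left = FileMeta("/data/shard_1.dat", "A", "M")
right = FileMeta("/data/shard_2.dat", "N", "Z")
start, end, sources = merge_file_metas([left, right])
print(f"({start}->{end})")
```

In this sketch the merged result keeps references to both source entries, which reflects the point made above that the merge touches only meta information and leaves the data files in place.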

As can be clearly seen from the second and third embodiments, the method of this embodiment in effect introduces a file meta information layer between the fragment meta information and the data fragments. Because the file meta information stored in this layer contains only the attribute information of the data fragments, it is physically implemented as a very small file, typically only a few megabytes in size. After the file meta information layer is introduced, split and merge operations on data fragments are isolated from the actual data fragments and become operations purely on file meta information. Since the file meta information is a very small file, it can be processed very quickly.

Fig. 4 is a schematic diagram of the relationship among fragment meta information, file meta information and data fragments. As shown in Fig. 4, the fragment meta information is stored in a fragment meta information layer, the file meta information is stored in a file meta information layer, and the data fragments are stored in a file system layer. The fragment meta information layer is generally located in the management server, while the file meta information layer and the file system layer are generally located in the fragment server. The fragment meta information stores information on one or more pieces of file meta information, and the file meta information stores information on the data fragments. File meta information and data fragments are not in one-to-one correspondence: one data fragment may correspond to one or more pieces of file meta information (for example, when a data fragment has been split several times). With this structure, splitting a data fragment is converted into splitting file meta information, and merging data fragments is converted into merging file meta information. Splitting and merging thus truly become a pair of reversible operations that are independent of the actual data fragments. Throughout the entire operation, the data fragments are neither moved nor modified.

Fig. 5 is a schematic diagram of the information interaction between the management server and the fragment server in the embodiments of the present invention. As shown in Fig. 5, the management server sends a split or merge instruction to the fragment server. The fragment server updates the corresponding file meta information according to the instruction received and, after the update succeeds, returns a success result to the management server. The management server then modifies the corresponding fragment meta information according to this result.
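The interaction of Fig. 5 can be sketched as follows for the merge case. This is only a simplified, single-process illustration: the class names, method names, the dictionary layout of the result and the fragment identifiers are all assumptions made for the example, and network transport and error recovery are not modelled.

```python
class FragmentServer:
    """Hypothetical fragment server: holds file meta as key ranges."""

    def __init__(self):
        # fragment id -> (start_key, end_key); data files are not modelled
        self.file_meta = {}

    def handle_merge(self, ids, new_id):
        """Update file meta only, then report the result."""
        if len(ids) < 2 or any(i not in self.file_meta for i in ids):
            return {"ok": False, "removed": [], "added": {}}
        ranges = [self.file_meta.pop(i) for i in ids]
        merged = (min(r[0] for r in ranges), max(r[1] for r in ranges))
        self.file_meta[new_id] = merged
        return {"ok": True, "removed": list(ids), "added": {new_id: merged}}


class ManagementServer:
    """Hypothetical management server: holds fragment meta information."""

    def __init__(self, server):
        self.server = server
        self.fragment_meta = {}  # fragment id -> key range

    def merge(self, ids, new_id):
        # 1) send the instruction, 2) wait for the result,
        # 3) modify fragment meta only if the fragment server succeeded
        result = self.server.handle_merge(ids, new_id)
        if result["ok"]:
            for i in result["removed"]:
                self.fragment_meta.pop(i, None)
            self.fragment_meta.update(result["added"])
        return result["ok"]


fs = FragmentServer()
fs.file_meta = {"t1": ("A", "M"), "t2": ("N", "Z")}
ms = ManagementServer(fs)
ms.fragment_meta = dict(fs.file_meta)
merged_ok = ms.merge(["t1", "t2"], "t3")
print(merged_ok, ms.fragment_meta)
```

The management server changes its own view only after the fragment server reports success, which mirrors the ordering described for Fig. 5.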

A distributed total order storage system that applies this technical solution can split and merge fragments automatically and online. The main beneficial effects obtained are as follows:

1) System availability: all operations are completed fully online, and each operation takes less than 0.5 seconds, so that the service is essentially unaware of it. This is well suited to systems with high availability requirements. Fig. 6 is a schematic diagram of the actual time consumed by splitting and merging data fragments in the embodiments of the present invention.

2) Data elasticity: no requirement at all is imposed on the distribution of the input data; the system adapts fully on its own by adjusting fragments through splitting and merging.

3) Resource consumption: split and merge operations place no additional bandwidth, computing or storage pressure on the server cluster, and resource consumption is approximately zero. This is extremely important for scenarios with large data volumes and large clusters.

4) System load balancing and hot capacity expansion: splitting and merging keep all fragments of the entire system at their designed size, which lays a very good foundation for load balancing of the system. Hot capacity expansion, realized by combining the method of this embodiment with fragment migration, makes expanding the system simple and fast, and the service is unaware of it.

Based on the above beneficial effects, the method of this embodiment can be applied to the link library and web page library of a crawling service, the entity library of a knowledge graph, and the like.

4th embodiment

Fig. 7 is a flowchart of a method for deleting garbage files according to the fourth embodiment of the present invention. The method of this embodiment can be performed by a device for deleting garbage files. The device can be implemented in hardware and/or software, can generally be integrated in the management server of a distributed total order storage system, and is used in cooperation with the fragment servers of the distributed total order storage system. The method of this embodiment specifically comprises:

710. Query the file meta information corresponding to each data fragment in the distributed total order storage system to obtain a first file list, wherein the file meta information stores attribute description information corresponding to the data fragment, and the attribute description information contains data iterative information.

In general, in a scenario with multiple fragment servers, garbage files are generated very quickly, so whether garbage files can be collected properly while splitting and merging are supported is also critical. As can be seen from the first to third embodiments, the file meta information layer isolates the fragment meta information layer from the file system layer, one data fragment may correspond to multiple pieces of file meta information, and different pieces of file meta information may reside on different fragment servers. A file that one fragment server determines to be garbage may therefore still be in use by other fragment servers. If a single fragment server deleted garbage files directly on its own, useful files would very likely be deleted by mistake. The management server therefore needs to schedule all fragment servers in a unified manner to complete the deletion of garbage files.

In this embodiment, querying the file meta information corresponding to each data fragment in the distributed total order storage system to obtain the first file list may comprise: querying all file meta information stored in each fragment server to obtain the corresponding first file list.

Preferably, the first file list may be generated according to the file storage location information contained in the file meta information.

However, the volume of file meta information corresponding to the data fragments is very large, so obtaining the first file list from all file meta information consumes considerable CPU resources and is slow. Accordingly, in a preferred implementation of this embodiment, querying the file meta information corresponding to each data fragment in the distributed total order storage system to obtain the first file list may alternatively comprise:

obtaining the file meta information corresponding to each data fragment in the distributed total order storage system as file meta information to be processed; obtaining, according to the data iterative information and the file storage location information in each piece of file meta information to be processed, the split file meta information and/or merged file meta information contained in the file meta information to be processed, wherein the split file meta information is file meta information that has undergone split processing and the merged file meta information is file meta information that has undergone merge processing; and generating the first file list according to the split file meta information and/or the merged file meta information.

The reason for this arrangement is as follows. As can be seen from the first to third embodiments, if a piece of file meta information has not undergone a split or merge operation, its corresponding data fragment can only be stored in one fragment server, and deleting the data fragment corresponding to that file meta information cannot cause a data fragment to be deleted by mistake. Only after file meta information has undergone a split or merge operation may its corresponding data fragment be stored in different fragment servers. It is therefore sufficient to obtain the split file meta information and merged file meta information contained in the file meta information in order to generate the first file list. This arrangement greatly reduces CPU consumption and increases processing speed.

720. Scan the file system corresponding to the distributed total order storage system to obtain a second file list.

730. Calculate the difference set of the first file list and the second file list, that is, the files that appear in the second file list but not in the first file list, as a list of files to be deleted.

740. Delete the data files in the distributed total order storage system that match the list of files to be deleted.

In the method of this embodiment, the file meta information corresponding to each data fragment in the distributed total order storage system is queried to obtain a first file list; the file system corresponding to the distributed total order storage system is scanned to obtain a second file list; the difference set of the first file list and the second file list is calculated as a list of files to be deleted; and the data files in the distributed total order storage system that match the list of files to be deleted are deleted. With these technical means, data files are prevented from being deleted by mistake once the file meta information layer has been introduced. In addition, obtaining the first file list from the split file meta information and the merged file meta information greatly reduces the time needed to delete garbage files.

Fig. 8 is a flowchart of a specific garbage file deletion process. As shown in Fig. 8, the method comprises:

810. Query the fragment servers for the set a of files in use.

820. Scan the file system to obtain the set b of all files in the current file system.

830. Wait until all file data has been received.

840. Determine whether the result of verifying the file information is that the files are complete: if so, perform 850; otherwise, return to 810 and 820.

850. Compute the difference of set b and set a (the files in b that are not in a) to obtain the set of garbage files.

860. Delete the garbage files from the file system; this round of garbage collection is complete.

870. After the timer for the next garbage collection is triggered, return to 810 and 820.
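Steps 840 and 850 can be illustrated with the following minimal Python sketch, which works on in-memory sets of file paths and deletes nothing. The function name `find_garbage_files`, the `scan_complete` flag and the example paths are hypothetical, introduced only for illustration.

```python
def find_garbage_files(used_files, all_files, scan_complete):
    """Return the garbage file set, or None if this round must be retried."""
    if not scan_complete:
        # File information is incomplete: go back to steps 810 and 820.
        return None
    # Set b (everything in the file system) minus set a (everything in use).
    return set(all_files) - set(used_files)


# Set a: files reported as in use by the fragment servers (step 810).
set_a = {"/data/f1.dat", "/data/f2.dat"}
# Set b: files found by scanning the file system (step 820).
set_b = {"/data/f1.dat", "/data/f2.dat", "/data/old_f0.dat"}

garbage = find_garbage_files(set_a, set_b, scan_complete=True)
print(sorted(garbage))
```

Returning `None` when the file information is incomplete keeps an incomplete set a from being used, since a missing in-use entry would otherwise cause a useful file to be treated as garbage.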

5th embodiment

Fig. 9 shows a device for processing data fragments in a distributed total order storage system according to the fifth embodiment of the present invention. As shown in Fig. 9, the device comprises:

an attribute description information obtaining module 91, configured to obtain, in the process of generating total order data fragments in the distributed total order storage system, at least one piece of attribute description information corresponding to the data fragments, wherein the attribute description information comprises data iterative information;

an attribute description information writing module 92, configured to write the attribute description information into the file meta information corresponding to the data fragments; and

a file meta information processing module 93, configured to process, when a processing instruction for at least one target data fragment is received, the data iterative information in the file meta information corresponding to the target data fragment, so as to process the target data fragment.

On the basis of the above embodiments, the data iterative information may comprise: a start data identifier and an end data identifier of the total order data stored in the data fragment.

On the basis of the above embodiments, the processing instruction comprises a split instruction for a data fragment.

Correspondingly, the file meta information processing module may specifically comprise:

a split instruction receiving unit, configured to receive a split instruction for a target data fragment sent by the management server;

a target file meta information obtaining unit, configured to obtain, according to the split instruction, the target file meta information corresponding to the target data fragment;

a split processing unit, configured to split the data iterative information in the target file meta information to generate at least two pieces of split file meta information; and

a split result returning unit, configured to return the split result to the management server, so as to instruct the management server to modify the corresponding fragment meta information according to the split result.

On the basis of the above embodiments, the split processing unit may further be specifically configured to:

obtain at least one node data identifier in the total order data corresponding to the target data fragment, wherein the node data is data located between the start-point data and the end-point data in the total order data;

generate at least two pieces of split data iterative information according to the start-point data identifier and the end-point data identifier contained in the data iterative information in the target file meta information and the at least one node data identifier; and

generate at least two pieces of split file meta information according to the at least two pieces of split data iterative information.
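The work of the split processing unit can be sketched in Python as follows. The sketch assumes string Key values compared lexicographically and, purely as a simplification, lets adjacent split ranges share the node identifier as their common boundary; how the boundary key is assigned to one side or the other is not specified in this passage. The function name `split_range` is hypothetical.

```python
def split_range(start_key, end_key, node_keys):
    """Split (start_key -> end_key) at one or more node data identifiers.

    Returns a list of at least two (start, end) pairs covering the
    original range. Only identifiers are produced; no data file is touched.
    """
    nodes = sorted(set(node_keys))
    if not nodes:
        raise ValueError("at least one node data identifier is required")
    if any(not (start_key < k < end_key) for k in nodes):
        raise ValueError("node data must lie between start and end points")
    bounds = [start_key] + nodes + [end_key]
    # Pair each boundary with the next one to form consecutive ranges.
    return list(zip(bounds, bounds[1:]))


pieces = split_range("A", "Z", ["M"])
print(pieces)
```

Each resulting pair would then be written into its own split file meta information, all of which still refer to the same underlying data file.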

On the basis of the above embodiments, the processing instruction comprises a merge instruction for data fragments;

a merge instruction receiving unit, configured to receive a merge instruction for at least two target data fragments sent by the management server;

a target file meta information obtaining unit, configured to obtain, according to the merge instruction, at least two pieces of target file meta information corresponding to the at least two target data fragments;

a merge processing unit, configured to merge the data iterative information in the at least two pieces of target file meta information to generate merged file meta information; and

a merge result returning unit, configured to return the merge result to the management server, so as to instruct the management server to modify the corresponding fragment meta information according to the merge result.

On the basis of the above embodiments, the merge processing unit may further be specifically configured to:

obtain the data iterative information contained in the at least two pieces of target file meta information;

determine a merge start-point data identifier and a merge end-point data identifier according to the start-point data identifiers and end-point data identifiers contained in the at least two pieces of data iterative information and the ordering of the total order data corresponding to the target data fragments;

generate merged data iterative information according to the merge start-point data identifier and the merge end-point data identifier; and

generate merged file meta information according to the merged data iterative information.

The device for processing data fragments in a distributed total order storage system provided by the embodiments of the present invention can be used to perform the method for processing data fragments in a distributed total order storage system provided by any embodiment of the present invention, has the corresponding functional modules, and achieves the same beneficial effects.

6th embodiment

Fig. 10 shows a device for deleting garbage files according to the sixth embodiment of the present invention. As shown in Fig. 10, the device comprises:

a first file list obtaining unit 101, configured to query the file meta information corresponding to each data fragment in the distributed total order storage system to obtain a first file list, wherein the file meta information stores attribute description information corresponding to the data fragment, and the attribute description information contains data iterative information;

a second file list obtaining unit 102, configured to scan the file system corresponding to the distributed total order storage system to obtain a second file list;

a to-be-deleted file list calculating unit 103, configured to calculate the difference set of the first file list and the second file list as a list of files to be deleted; and

a data file deleting unit 104, configured to delete the data files in the distributed total order storage system that match the list of files to be deleted.

The first file list obtaining unit 101 may be specifically configured to:

In this embodiment, the file meta information corresponding to each data fragment in the distributed total order storage system is queried to obtain a first file list; the file system corresponding to the distributed total order storage system is scanned to obtain a second file list; the difference set of the first file list and the second file list is calculated as a list of files to be deleted; and the data files in the distributed total order storage system that match the list of files to be deleted are deleted. With these technical means, data files are prevented from being deleted by mistake once the file meta information layer has been introduced. In addition, obtaining the first file list from the split file meta information and the merged file meta information greatly reduces the time needed to delete garbage files.

The device for deleting garbage files provided by the embodiments of the present invention can be used to perform the method for deleting garbage files provided by any embodiment of the present invention, has the corresponding functional modules, and achieves the same beneficial effects.

Obviously, those skilled in the art will understand that each module or step of the present invention described above can be implemented by the fragment server and the management server described above. Optionally, the embodiments of the present invention can be implemented as a program executable by a computer device, so that the program can be stored in a storage device and executed by a processor. The program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc or the like. Alternatively, the modules or steps can each be made into an individual integrated circuit module, or several of them can be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may be subject to various changes and variations. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for processing data fragments in a distributed total order storage system, characterized by comprising:

2. The method according to claim 1, characterized in that the data iterative information comprises: a start data identifier and an end data identifier of the total order data stored in the data fragment.

3. The method according to claim 2, characterized in that the processing instruction comprises a split instruction for a data fragment;

correspondingly, processing, when a processing instruction for at least one target data fragment is received, the data iterative information in the file meta information corresponding to the target data fragment so as to process the target data fragment comprises:

receiving a split instruction for a target data fragment sent by a management server;

obtaining, according to the split instruction, the target file meta information corresponding to the target data fragment;

splitting the data iterative information in the target file meta information to generate at least two pieces of split file meta information; and

returning the split result to the management server, so as to instruct the management server to modify the corresponding fragment meta information according to the split result.

4. The method according to claim 3, characterized in that splitting the data iterative information in the target file meta information to generate at least two pieces of split file meta information comprises:

5. The method according to claim 2, characterized in that the processing instruction comprises a merge instruction for data fragments;

receiving a merge instruction for at least two target data fragments sent by a management server;

obtaining, according to the merge instruction, at least two pieces of target file meta information corresponding to the at least two target data fragments;

merging the data iterative information in the at least two pieces of target file meta information to generate merged file meta information; and

returning the merge result to the management server, so as to instruct the management server to modify the corresponding fragment meta information according to the merge result.

6. The method according to claim 5, characterized in that merging the data iterative information in the at least two pieces of target file meta information to generate merged file meta information comprises:

7. The method according to claim 1, characterized in that the attribute description information further comprises: validity identification information, wherein the validity identification information is used to identify whether the data iterative information is in effect.

8. A method for deleting garbage files, characterized by comprising:

9. The method according to claim 8, characterized in that querying the file meta information corresponding to each data fragment in the distributed total order storage system to obtain the first file list comprises:

10. A device for processing data fragments in a distributed total order storage system, characterized by comprising:

11. The device according to claim 10, characterized in that the data iterative information comprises: a start data identifier and an end data identifier of the total order data stored in the data fragment.

12. The device according to claim 11, characterized in that the processing instruction comprises a split instruction for a data fragment;

correspondingly, the file meta information processing module specifically comprises:

13. The device according to claim 12, characterized in that the split processing unit is further specifically configured to:

14. The device according to claim 11, characterized in that the processing instruction comprises a merge instruction for data fragments;

15. The device according to claim 14, characterized in that the merge processing unit is further specifically configured to:

16. A device for deleting garbage files, characterized by comprising:

The first file list obtaining unit is specifically configured to: