CN110008178A - Method and device for organizing metadata of a distributed file system - Google Patents

Method and device for organizing metadata of a distributed file system

Info

Publication number
CN110008178A
CN110008178A CN201910008102.0A CN201910008102A CN110008178A CN 110008178 A CN110008178 A CN 110008178A CN 201910008102 A CN201910008102 A CN 201910008102A CN 110008178 A CN110008178 A CN 110008178A
Authority
CN
China
Prior art keywords
blocks
text block
file
files
combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910008102.0A
Other languages
Chinese (zh)
Other versions
CN110008178B (en
Inventor
陈骁杰
刘俊峰
姚文辉
沈健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910008102.0A
Publication of CN110008178A
Application granted
Publication of CN110008178B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the computer field and discloses a method for organizing metadata of a distributed file system. The method includes: establishing, according to the content of a file to be created, a default chunk list index table (ChunkList Table) that records the mapping between the file's chunk list (ChunkList) and all the chunks (Chunk) that make up that chunk list; if the file to be created includes another existing file or an existing chunk list, replacing the references to each individual chunk of that existing file or chunk list with a single reference to the existing file's chunk list or to the existing chunk list, and updating the chunk list index table accordingly; and generating and maintaining the file to be created according to the updated chunk list index table. The method does not need to track changes in the underlying chunk structure. It organizes data more conveniently and efficiently and uses and organizes chunks more sensibly, which raises system resource utilization.

Description

Method and device for organizing metadata of a distributed file system
Technical field
This application relates to the field of computer technology, and in particular to techniques for organizing data.
Background
Computers manage and store data through file systems. In an age of information explosion, the data available to people grows exponentially, and simply adding hard disks to expand the storage capacity of a computer's file system gives unsatisfactory results in capacity, capacity growth rate, data backup, data security, and similar respects.
A distributed file system can effectively solve this data storage and management problem.
Specifically, a distributed file system extends a single file system fixed at a single location to any number of locations and file systems, with many nodes forming a file system network. The nodes may be located in different places, and they communicate and transfer data over the network.
As a result, users of a distributed file system need not care which node stores the data or which node it is retrieved from. They manage and store data in the file system just as they would with a local file system.
In the existing technology, however, once a file has been created in a distributed file system, it is bound to its underlying chunks (Chunk). This causes problems. For example, to replace or delete several segments of a file using content from other files (as in file merging or garbage collection), a copy of the data must be made from the other file, which hurts performance.
Specifically, once a chunk (Chunk) has been created it can belong directly to only one file, so composing a new file from different chunks requires a complete copy. Files and chunks cannot be nested either: after a chunk has been created, its position cannot be taken by other content. In addition, the life cycle of a chunk is tied to that of its file: when the file is deleted, its chunks disappear with it.
Thus, the existing techniques for organizing distributed file system metadata must track changes in the underlying chunk (Chunk) structure. Their organization is inconvenient and inefficient, chunks are not used and organized sensibly, and system resource utilization is consequently low.
Summary of the invention
The purpose of this application is to provide a method and device for organizing metadata of a distributed file system that need not track changes in the underlying chunk (Chunk) structure, organize data more conveniently and efficiently, and use and organize chunks more sensibly, so that system resource utilization is higher.
To solve the above problems, this application discloses a method for organizing metadata of a distributed file system, comprising:
establishing, according to the content of a file to be created, a default chunk list index table that records the mapping between the chunk list of the file to be created and all the chunks that make up that chunk list;
if the file to be created includes another existing file or an existing chunk list, replacing the references in the file to be created to each chunk of that existing file or existing chunk list with a reference to the chunk list of the existing file or to the existing chunk list, and updating the chunk list index table accordingly; and
generating and maintaining the file to be created according to the updated chunk list index table.
In a preferred embodiment, the step of replacing, if the file to be created includes another existing file or an existing chunk list, the references to each chunk of that existing file or chunk list with a reference to the existing file's chunk list or to the existing chunk list, and updating the chunk list index table accordingly, further comprises:
the user creating a new chunk list from multiple chunks in the file to be created, replacing the references to each of those chunks with a reference to the new chunk list, and updating the chunk list index table accordingly.
In a preferred embodiment, the method further comprises:
establishing a chunk index table that maps the identifier of each chunk to its storage location; and
on receiving a request to create a file, determining from the content of the file to be created the chunks it contains, and determining the storage location of each chunk from the established chunk index table.
In a preferred embodiment, the method further comprises:
establishing, according to the chunk list index table, a chunk reference index table that records the mapping between each chunk list or chunk in the file to be created and the upper-level chunk list that references it.
In a preferred embodiment, after the step of establishing the chunk reference index table according to the chunk list index table, the method further comprises:
reference-counting each chunk according to the chunk reference index table and, when the reference count of a chunk reaches 0, physically deleting the chunk from its storage location.
This application also discloses a device for organizing metadata of a distributed file system, comprising:
a default chunk list index table module, configured to establish a default chunk list index table according to the content of a file to be created, the default chunk list index table recording the mapping between the chunk list of the file to be created and all the chunks that make up that chunk list;
a chunk list reference and chunk list index table update module, configured to, if the file to be created includes another existing file or an existing chunk list, replace the references in the file to be created to each chunk of that existing file or existing chunk list with a reference to the chunk list of the existing file or to the existing chunk list, and update the chunk list index table accordingly; and
a file generation and maintenance module, configured to generate and maintain the file to be created according to the updated chunk list index table.
In a preferred embodiment, the chunk list reference and chunk list index table update module is further configured to let the user create a new chunk list from multiple chunks in the file to be created, replace the references to each of those chunks with a reference to the new chunk list, and update the chunk list index table accordingly.
In a preferred embodiment, the device further comprises:
a chunk index table establishing module, configured to establish a chunk index table that maps the identifier of each chunk to its storage location; and
a chunk storage location determining module, configured to, on receiving a request to create a file, determine from the content of the file to be created the chunks it contains, and determine the storage location of each chunk from the established chunk index table.
In a preferred embodiment, the device further comprises: a chunk reference index table module, configured to establish, according to the chunk list index table, a chunk reference index table that records the mapping between each chunk list or chunk in the file to be created and the upper-level chunk list that references it; and
a chunk reference counting and deletion module, configured to reference-count each chunk according to the chunk reference index table and, when the reference count of a chunk reaches 0, physically delete the chunk from its storage location.
This application also discloses an apparatus for organizing metadata of a distributed file system, comprising:
a memory for storing computer-executable instructions; and
a processor configured to implement the steps of the method described above when executing the computer-executable instructions.
This application also discloses a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method described above.
This application adds a chunk list index table (ChunkList Table) and a chunk reference index table (Reference), and the way data is organized changes accordingly. Previously a chunk (Chunk) was referenced by only one file, so there were no reference relationships to manage; that is, the reference count was always 1. Under the present invention a chunk can be referenced by multiple files at the same time. Each time a reference is added or removed, the chunk reference index table (Reference) must be updated, and when the reference count reaches 0 the chunk is physically deleted from its storage location.
Furthermore, according to the present invention, if the file to be created contains the chunk list of an existing file or an existing chunk list, and the user wants to reference that chunk list directly, the corresponding chunk list (ChunkList) is established first, and then the reference from the file to be created to the existing file's chunk list or to the existing chunk list is established. This adds one intermediate level of mapping.
As a result, first, replacing the individual chunks that belong to an existing file or an existing chunk list (ChunkList) inside the file to be created with that existing file or chunk list itself has the effect of directly replacing part of the data in the file to be created without any dedicated copy, which makes the organizing method more convenient and more efficient.
Second, when the content of another existing file or an existing chunk list (ChunkList) is needed as part of the content of the file to be created, the existing file's chunk list or the existing chunk list only has to be attached under the file to be created as if it were a single chunk (Chunk), which again makes the organizing method more convenient and more efficient.
Third, after a file is deleted, a chunk of that file is not deleted if other existing files or existing chunk lists (ChunkList) still reference it. Only when no references remain, that is, when the reference count of the Chunk is 0, is the chunk physically deleted from its storage location. This makes data management and organization more sensible and safer.
Fourth, different files may share part of their underlying content (that is, chunks of an existing file or of an existing chunk list). Sharing reduces the system resources occupied by identical chunks and effectively improves system resource utilization.
This specification describes a large number of technical features, distributed across the various technical solutions. Listing every possible combination of these features (that is, every technical solution) would make the specification excessively long. To avoid this, the technical features disclosed in the summary above, in the embodiments and examples below, and in the drawings may be freely combined with one another to form new technical solutions (all of which are deemed to be recorded in this specification), unless such a combination is technically infeasible. For example, suppose one example discloses features A+B+C and another discloses features A+B+D+E, where C and D are equivalent technical means that serve the same purpose, so that only one of them can be used at a time, and feature E is technically compatible with feature C. Then the solution A+B+C+D should not be regarded as recorded, because it is technically infeasible, whereas the solution A+B+C+E should be regarded as recorded.
Brief description of the drawings
Fig. 1 is a flowchart of the method for organizing metadata of a distributed file system according to the first embodiment of this application;
Fig. 2 is a schematic diagram of the file organization structure in the method for organizing metadata of a distributed file system according to the first embodiment of this application;
Fig. 3 is a schematic structural diagram of the device for organizing metadata of a distributed file system according to the second embodiment of this application.
Detailed description
The following description sets out many technical details to help the reader better understand this application. Those of ordinary skill in the art will appreciate, however, that the technical solutions claimed in this application can be implemented even without these details and with various changes and modifications based on the following embodiments.
Explanation of some concepts:
Distributed file system: in contrast to a local file system, a distributed file system, or network file system, is a file system that allows files to be shared across multiple hosts over a network, so that multiple users on multiple machines can share files and storage space. In such a file system the client does not access the underlying data storage blocks directly. Instead it communicates with a server over the network using a specific communication protocol.
Metadata: metadata is information about the organization of data, data fields, and their relationships. It mainly describes the properties of data and supports functions such as indicating storage location, recording history, resource lookup, and file records. Metadata is a kind of electronic catalog: to serve as a catalog it must describe and collect the content or characteristics of the data, and thereby assist data retrieval. By application, metadata can be divided into the following kinds: data structure: the names, relationships, fields, and constraints of data sets; data deployment: the physical location of data sets; data flow: the process dependencies between data sets (not reference dependencies), including the rules from one data set to another; quality metrics: the metrics that can be computed on a data set; metric logic relationships: the logical operation relationships between data set metrics; ETL process: the order in which processes run, in parallel or serially; data set snapshot: the distribution of data across all data sets at a point in time; star schema metadata: fact tables, dimensions, attributes, hierarchies, and so on; report semantic layer: the rules for report indicators, filter conditions, and the correspondence between physical names and business names; data access log: which data was accessed by whom and when; quality audit log: when and how metrics were audited, and with what result; data load log: which data was loaded by whom and when. For example, in some embodiments of this application the metadata may be a FileNode or the like in the file system, which represents a file and records where the file's content is actually stored.
Data block: a data block is a group of one or more records arranged contiguously in order. It is the unit of data transferred between main memory and input/output devices or external storage. The size of a data block may be fixed or variable, and there are gaps between blocks. The choice of data block size is influenced by many factors, including input/output efficiency, storage space cost, and the characteristics of the computer application.
Chunk (text block): denoted Chunk below. In this application a chunk has the same meaning as the data block described above, and the explanation is not repeated here.
Chunk list index table (ChunkList Table): contains the mapping between each chunk list (ChunkList) in a file and its next-level chunk lists (ChunkList) or chunks (Chunk). (Example table not reproduced.)
Chunk index table (Chunk Table): contains the mapping from ChunkId (the chunk identifier) to the attributes of the chunk, where the attributes include, for example, storage location, length, and permissions. (Example table not reproduced.)
Chunk location index table (Location Table): contains the mapping from the ChunkId of each chunk (Chunk) to the multiple physical machine locations that store the chunk. (Example table not reproduced.)
Chunk reference index table (Reference): contains the mapping between each chunk list (ChunkList) or chunk (Chunk) in a file and the upper-level ChunkList that references it. (Example table not reproduced.)
Parent Chunk and child Chunk: chunks (Chunk) can be regarded as nested. Nesting amounts to replacing one or more Chunks attached under one ChunkList with another ChunkList, so that the content of that part of the file becomes the content of all the Chunks under the new ChunkList.
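As a rough illustration of how the four index tables defined above might relate to one another, the following Python sketch models each as a dictionary. The schema is an assumption: the identifiers (`cl_file_a`, `c1`, `node-1:/data/c1`), the attribute names, and the helper `flatten` are hypothetical and do not come from the original tables.

```python
# Hypothetical in-memory versions of the four index tables.
# All IDs, attribute names, and addresses are illustrative assumptions.

# ChunkList Table: chunk list -> ordered children (chunks or nested chunk lists)
chunklist_table = {
    "cl_file_a": ["c1", "c2"],
    "cl_file_b": ["c3", "cl_file_a"],  # file B nests file A's chunk list
}

# Chunk Table: ChunkID -> attributes of the chunk
chunk_table = {
    "c1": {"location": "loc1", "length": 64},
    "c2": {"location": "loc2", "length": 32},
    "c3": {"location": "loc3", "length": 16},
}

# Location Table: ChunkID -> physical machine locations holding the chunk
location_table = {
    "c1": ["node-1:/data/c1", "node-2:/data/c1"],
    "c2": ["node-2:/data/c2"],
    "c3": ["node-3:/data/c3"],
}

# Reference table: chunk or chunk list -> upper-level chunk lists referencing it
reference_table = {
    "c1": ["cl_file_a"],
    "c2": ["cl_file_a"],
    "c3": ["cl_file_b"],
    "cl_file_a": ["cl_file_b"],
}


def flatten(chunklist_id):
    """Expand a chunk list into the ordered ChunkIDs it ultimately contains."""
    result = []
    for child in chunklist_table[chunklist_id]:
        if child in chunklist_table:   # nested chunk list (parent/child nesting)
            result.extend(flatten(child))
        else:                          # plain chunk
            result.append(child)
    return result
```

In this sketch, reading file B means walking `flatten("cl_file_b")`, so the nested chunk list of file A contributes its chunks without any data being copied.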
Some of the innovations of this application are summarized below:
To make the objectives, technical solutions, and advantages of this application clearer, embodiments of this application are described in further detail below with reference to the drawings.
Note that, for ease of reading, the text below uses "Chunk" for a chunk (text block), "ChunkList" for a chunk list (text block combination), and "ChunkID" for the identifier of a chunk.
The first embodiment of this application relates to a method for organizing metadata of a distributed file system. Its flow is shown in Fig. 1, and the corresponding file structure is shown in Fig. 2. The method includes the following steps:
Step 110: establish a chunk index table that maps the identifier of each chunk to its storage location.
Specifically, in this step a chunk index table (Chunk Table) is established, in which each Chunk corresponds to one ChunkID and has specific attributes. Note that the attributes include at least the storage location of each Chunk, and that the Chunk Table maps at least each ChunkID to its attributes (for example, its storage address), such as the correspondence between each ChunkID and its storage location. (Example table not reproduced.)
Note that a large chunk may be stored in multiple storage locations.
Note that in other embodiments of the invention the attributes are not limited to the storage location and may also include length, permissions, and so on.
Specifically, to write a file, a Chunk is first created and assigned a storage location (Location), which is returned to the user. The user writes directly to the corresponding machine according to that storage location. After filling a Chunk, the user submits its length to the master (Master) and then requests a new Chunk to continue writing, and so on.
Specifically, the address corresponding to each storage location (Location) is recorded in the chunk location index table (Location Table). (Example table not reproduced.)
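The write path described in step 110 can be sketched roughly as follows. This is a minimal model under stated assumptions, not the patented implementation: the class `Master`, its methods `create_chunk` and `commit_length`, the round-robin placement, and the in-memory `storage` dictionary standing in for the data machines are all hypothetical.

```python
import itertools


class Master:
    """Toy metadata master; names and fields are assumptions for illustration."""

    def __init__(self, machines):
        self.machines = machines
        self.chunk_table = {}     # ChunkID -> {"location": ..., "length": ...}
        self.location_table = {}  # ChunkID -> list of machine addresses
        self._ids = itertools.count(1)

    def create_chunk(self):
        """Create a Chunk, assign it a storage location, and return both."""
        chunk_id = f"c{next(self._ids)}"
        # Round-robin placement stands in for whatever policy a real system uses.
        machine = self.machines[len(self.chunk_table) % len(self.machines)]
        self.chunk_table[chunk_id] = {"location": machine, "length": None}
        self.location_table[chunk_id] = [machine]
        return chunk_id, machine

    def commit_length(self, chunk_id, length):
        """Record the length the client reports after filling the Chunk."""
        self.chunk_table[chunk_id]["length"] = length


def write_file(master, data, chunk_size):
    """Split data into Chunks, write each one, and report its length."""
    storage = {}    # stands in for the data machines: (machine, ChunkID) -> bytes
    chunk_ids = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        chunk_id, machine = master.create_chunk()
        storage[(machine, chunk_id)] = piece        # client writes directly to the machine
        master.commit_length(chunk_id, len(piece))  # then submits the length to the master
        chunk_ids.append(chunk_id)
    return chunk_ids, storage
```

For example, `write_file(Master(["node-1", "node-2"]), b"abcdefghij", 4)` creates three Chunks in turn, each requested only after the previous one has been filled and its length committed.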
Step 120: determine the storage location and corresponding address of each Chunk in the file to be created.
Specifically, in this step, on receiving a request to create a file, the chunks contained in the file to be created are determined from its content, and the storage location of each chunk is determined from the established chunk index table.
Further, in this step, all the Chunks contained in the file to be created are determined from its content, and the storage location and specific address of each Chunk are determined from the established chunk index table (Chunk Table) and chunk location index table (Location Table).
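The lookup in step 120 amounts to joining the two tables on ChunkID. A minimal sketch, assuming both tables are dictionaries keyed by ChunkID (the function name `locate_chunks` and the `location` attribute are hypothetical):

```python
def locate_chunks(chunk_ids, chunk_table, location_table):
    """Return, per Chunk, its storage location and the machine addresses behind it."""
    resolved = {}
    for chunk_id in chunk_ids:
        attrs = chunk_table[chunk_id]  # raises KeyError for an unknown ChunkID
        resolved[chunk_id] = {
            "location": attrs["location"],
            # A Chunk may be stored on several machines; copy the list of addresses.
            "addresses": list(location_table.get(chunk_id, [])),
        }
    return resolved
```

Keeping locations out of the Chunk Table itself means a Chunk's attributes and its physical placement can be updated independently.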
Step 130: according to the content of the file to be created, establish a default chunk list index table that records the mapping between the chunk list of the file to be created and all the chunks that make up that chunk list.
Specifically, in this step a default chunk list index table (ChunkList Table) is first established for the file to be created. It records the mapping between the file's chunk list (ChunkList) and all of its chunks.
Note that in this step, by default, the chunk list (ChunkList) has attached under it all the chunks that the file to be created contains.
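A default chunk list of this kind could be registered as in the sketch below. The naming scheme `cl:<file name>` and the function `create_default_chunklist` are assumptions made for illustration; the source only states that the default ChunkList holds every chunk of the file.

```python
def create_default_chunklist(file_name, chunk_ids, chunklist_table):
    """Register the default chunk list for a new file: it holds every chunk of the file."""
    chunklist_id = f"cl:{file_name}"  # hypothetical naming scheme
    if chunklist_id in chunklist_table:
        raise ValueError(f"chunk list {chunklist_id!r} already exists")
    chunklist_table[chunklist_id] = list(chunk_ids)  # keep order, copy the input
    return chunklist_id
```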
Step 140: if the file to be created includes another existing file or an existing chunk list, replace the references in the file to be created to each chunk of that existing file or existing chunk list with a reference to the chunk list of the existing file or to the existing chunk list, and update the chunk list index table accordingly.
Specifically, in this step, if the file to be created includes another existing file or an existing chunk list (ChunkList), and the user chooses to reference that existing file or chunk list directly, then the affected chunks in the chunk list index table (ChunkList Table) of the file to be created, namely the chunks contained in the existing file or existing chunk list, are replaced with the chunk list of the existing file or with the existing chunk list.
In other words, referencing a chunk list takes the place of referencing its chunks one by one.
The chunk list index table (ChunkList Table) of the file to be created is then updated accordingly.
Steps 130-140 can be illustrated with an example table. (Table not reproduced.)
Specifically, each file has one corresponding chunk list (ChunkList), and the Chunks a user creates are attached directly under that chunk list. When the user needs to incorporate another file or chunk list by reference, the other file's chunk list or the other chunk list is added under this file's chunk list, in place of separate references to each Chunk of that file or chunk list.
Note that the default chunk list index table, which records the mapping between the chunk list of the file to be created and all of its chunks, is not created by the user but generated automatically when the file is created. The automatically generated chunk list (ChunkList) of a file consists of all the Chunks the file contains.
As described above, when the file to be created contains another existing file or an existing chunk list and the user chooses to reference it directly, the file to be created references the chunk list of the existing file or the existing chunk list directly, and the chunk list index table (ChunkList Table) of the file to be created is updated accordingly.
It should be understood that a chunk list (ChunkList) is not required to have other chunk lists attached under it. Whether it does can be decided flexibly according to the existing files or chunk lists involved and to whether the user chooses to replace the references.
The advantage is that the data organizing method becomes more convenient and more efficient, and that referencing existing files or existing chunk lists (ChunkList) reduces the system resources the file to be created occupies: multiple files can share the chunks they reference, which raises system resource utilization.
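The replacement in step 140 can be sketched as a list rewrite on the ChunkList Table. The source does not say how the included content is located; this example assumes it appears as a contiguous run of ChunkIDs equal to the children of the existing chunk list. The function name `replace_with_reference` is hypothetical.

```python
def replace_with_reference(chunklist_table, target_id, existing_id):
    """Replace the run of chunks in `target_id` that equals the children of
    `existing_id` with a single reference to `existing_id`.

    Returns True if a replacement was made, False otherwise.
    """
    target = chunklist_table[target_id]
    run = chunklist_table[existing_id]
    n = len(run)
    if n == 0 or target_id == existing_id:
        return False
    for start in range(len(target) - n + 1):
        if target[start:start + n] == run:
            # One reference to the chunk list stands in for n chunk references.
            chunklist_table[target_id] = (
                target[:start] + [existing_id] + target[start + n:]
            )
            return True
    return False
```

Only the index table changes here. No chunk data is copied, which is the point of referencing the existing chunk list rather than its chunks one by one.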
Further, this step may also include the following: as needed, the user creates a new chunk list (ChunkList) from multiple chunks in the file to be created, the references to each of those chunks are replaced with a reference to the new chunk list, and the chunk list index table is updated accordingly.
The advantage is that, according to the user's specific needs, chunk lists can be adjusted more flexibly, beyond what existing files or existing chunk lists (ChunkList) allow. This meets users' individual needs for managing and organizing data, and further improves efficiency and system resource utilization.
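User-driven grouping can be sketched in the same style. The function `group_chunks` and its slice-based selection (`start`, `end`) are assumptions; the source only says that the user selects multiple chunks and that references to them are replaced by a reference to the new chunk list.

```python
def group_chunks(chunklist_table, parent_id, start, end, new_id):
    """Move parent[start:end] into a new chunk list and reference it from the parent."""
    children = chunklist_table[parent_id]
    if not 0 <= start < end <= len(children):
        raise IndexError("empty or out-of-range selection")
    if new_id in chunklist_table:
        raise ValueError(f"chunk list {new_id!r} already exists")
    chunklist_table[new_id] = children[start:end]
    # The parent now holds one reference where it previously held end - start.
    chunklist_table[parent_id] = children[:start] + [new_id] + children[end:]
```

Once grouped, the new chunk list can be referenced from other files as a unit, in the same way as the chunk list of an existing file.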
Step 150: generate and maintain the file to be created according to the updated chunk list index table.
Specifically, in this step, according to the updated chunk list index table (ChunkList Table), the references in the file to be created to some of its chunks have been replaced with references to the chunk lists of existing files or to existing chunk lists (ChunkList).
Step 160: according to the chunk list index table, establish a chunk reference index table that records the mapping between each chunk list or chunk in the file to be created and the upper-level chunk list that references it.
Specifically, in this step a chunk reference index table (Reference Table) is established from the new chunk list index table (ChunkList Table) of the file to be created. It maps each ChunkList or Chunk in the file to be created to the upper-level ChunkList that references it.
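Since the Reference Table records, for each child, the chunk lists that reference it, one way to derive it is to invert the ChunkList Table. The sketch below assumes dictionary-based tables; the function name `build_reference_table` is hypothetical.

```python
from collections import defaultdict


def build_reference_table(chunklist_table):
    """Invert the ChunkList Table: child ID -> list of parent chunk lists."""
    refs = defaultdict(list)
    for parent_id, children in chunklist_table.items():
        for child in children:
            refs[child].append(parent_id)
    return dict(refs)
```

The number of parents recorded for a child in this table is the reference count used in the next step.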
Step 170: According to the file block reference index table, perform reference counting on each text block, and, when the reference count of a text block is 0, physically delete the text block at its storage location.
Specifically, in this step, reference counting is performed on each Chunk of the file to be created according to the file block reference index table (Reference Table).
Note that the reference count of a Chunk is the total number of references to that Chunk from the file to be created, other existing files, and existing text block combinations (ChunkList).
Steps 160-170 can be illustrated as follows:
An example of the file block reference index table (Reference Table) is as follows:
Note that the table above includes the reference relationships that arise when Chunks are nested.
Specifically, nesting between Chunks means that one or more Chunks under a text block combination (ChunkList) are replaced with another text block combination (ChunkList); that is, a parent Chunk contains child Chunks. The content of that part of the file then becomes the content of all Chunks under the new text block combination (ChunkList).
On this basis, reference counting is performed on each Chunk according to the table above.
For example, when a Chunk is referenced by a text block combination (ChunkList), its reference count increases by 1; when that text block combination (ChunkList) changes its references and no longer references the Chunk, the Chunk's reference count decreases by 1.
Note that when a parent Chunk contains a child Chunk, the reference count of the child Chunk also increases by 1.
Note that a Chunk can be reclaimed when its reference count is 0.
Specifically, reclaiming a Chunk means that the Chunk is physically deleted; that is, both the data stored at its storage location (Location) and its entry in the Chunk Table are removed.
More specifically, a reference count of 0 means the Chunk is no longer referenced, which is equivalent to the file content corresponding to the Chunk no longer being needed; the Chunk can therefore be deleted from disk to free space.
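The counting rule above (add 1 when a ChunkList starts referencing a Chunk, subtract 1 when it stops, reclaim at 0) can be sketched as below. The in-memory `Counter`, the function names, and the id `c2` are assumptions made for illustration; the patent does not describe how the counts are stored.

```python
from collections import Counter

# Hypothetical in-memory reference counts: Chunk id -> number of referrers.
ref_counts: Counter = Counter()


def add_reference(chunk_id: str) -> int:
    """A ChunkList starts referencing chunk_id; the count goes up by 1."""
    ref_counts[chunk_id] += 1
    return ref_counts[chunk_id]


def drop_reference(chunk_id: str) -> bool:
    """A ChunkList stops referencing chunk_id; True means it can be reclaimed."""
    if ref_counts[chunk_id] <= 0:
        raise ValueError(f"{chunk_id} has no references to drop")
    ref_counts[chunk_id] -= 1
    return ref_counts[chunk_id] == 0


add_reference("c2")  # referenced by file A's ChunkList
add_reference("c2")  # also referenced by file B's ChunkList
first_drop_reclaimable = drop_reference("c2")   # count is now 1: keep
second_drop_reclaimable = drop_reference("c2")  # count is now 0: reclaim
```

Only the second call reports the Chunk as reclaimable, which matches the rule that a Chunk is removed only once nothing references it.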
Note that in practical applications a Chunk is cleaned up in real time only in the file block index table (Chunk Table); cleanup of the Chunk in the file block distribution machine index table (Location Table) is deferred to a background task.
Note further that, because cleaning up Chunks in real time is relatively time-consuming, cleanup is generally performed by a background task that scans periodically: if a Chunk is present in the file block distribution machine index table (Location Table) but absent from the file block index table (Chunk Table), it is cleaned up. This process has low real-time requirements.
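The deferred cleanup can be pictured as a periodic sweep that compares the two tables. The sketch below assumes both tables are dictionaries keyed by Chunk id; the function name `sweep_orphans`, the table contents, and the node names are hypothetical.

```python
from typing import Dict, List


def sweep_orphans(chunk_table: Dict[str, dict],
                  location_table: Dict[str, List[str]]) -> List[str]:
    """Remove Location Table entries whose Chunk is gone from the Chunk Table."""
    orphans = sorted(cid for cid in location_table if cid not in chunk_table)
    for cid in orphans:
        # A real system would also delete the data on each listed machine here.
        del location_table[cid]
    return orphans


# c2 was already removed from the Chunk Table in real time.
chunk_table = {"c1": {"size": 64}, "c3": {"size": 64}}
location_table = {"c1": ["node1"], "c2": ["node2"], "c3": ["node1", "node3"]}
removed = sweep_orphans(chunk_table, location_table)
```

Because the sweep only looks for entries that exist in one table and not the other, it can run at any time without coordinating with the real-time path.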
The following analyzes the main differences between the embodiments of the present application and existing data organization methods.
Existing data organization methods usually use two tables: the file block index table (Chunk Table) and the file block distribution machine index table (Location Table). According to the present invention, text blocks (Chunks) can be combined freely so that they can be shared by multiple files at the same time; to this end, the file block combination index table (ChunkList Table) and the file block reference index table (Reference Table) are added.
Correspondingly, the specific way data is organized also changes. Previously, a Chunk was referenced by only one file, so no reference relationship needed to be handled; that is, the reference count was always 1. According to the present invention, a Chunk can be referenced by multiple files at the same time, and each time a reference is added or deleted, the file block reference index table (Reference Table) must be updated.
Also, in the past a file was composed directly of a group of Chunks. According to the present invention, if the file to be created contains the text block combination (ChunkList) of an existing file or an existing text block combination (ChunkList), and the user needs to reference it directly, a corresponding text block combination (ChunkList) must first be established, and then the reference relationship between the file to be created and the text block combination (ChunkList) of the existing file or the existing text block combination (ChunkList) is established. There is therefore one more layer of correspondence in between.
In other words, in the embodiments of the present application, the traditional structure of file and Chunk is further refined and divided into four levels: file, text block combination (ChunkList), text block (Chunk), and storage location (Location).
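The four levels can be illustrated by resolving a file name down to storage locations. The sketch below is an assumption-laden illustration: the three dictionaries, the function name `resolve_file`, the cycle check, and the location strings are hypothetical and only show how nested ChunkLists might be expanded in content order.

```python
from typing import Dict, FrozenSet, List, Tuple


def resolve_file(file_name: str,
                 file_table: Dict[str, str],
                 chunklist_table: Dict[str, List[str]],
                 location_table: Dict[str, str]) -> List[Tuple[str, str]]:
    """Walk file -> ChunkList -> Chunk -> Location, in content order."""

    def expand(member_id: str, seen: FrozenSet[str]) -> List[Tuple[str, str]]:
        if member_id in chunklist_table:  # nested ChunkList
            if member_id in seen:
                raise ValueError(f"cycle through {member_id}")
            result: List[Tuple[str, str]] = []
            for child in chunklist_table[member_id]:
                result.extend(expand(child, seen | {member_id}))
            return result
        return [(member_id, location_table[member_id])]  # leaf Chunk

    return expand(file_table[file_name], frozenset())


file_table = {"a.log": "cl_a"}
chunklist_table = {"cl_a": ["c1", "cl_shared"], "cl_shared": ["c2", "c3"]}
location_table = {"c1": "node1:/data/c1", "c2": "node2:/data/c2",
                  "c3": "node3:/data/c3"}
layout = resolve_file("a.log", file_table, chunklist_table, location_table)
```

The caller sees a flat, ordered list of Chunks and their locations, regardless of how many ChunkList layers sit between the file and its Chunks.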
Further, the following analyzes the technical effects of the embodiments of the present application.
First, within the text block combination (ChunkList) of the file to be created, the text blocks belonging to an existing file or an existing text block combination (ChunkList) are replaced directly by the corresponding text block combination (ChunkList). Part of the data in the file to be created can thus be substituted directly without any separate copying, which makes the organization method more convenient and more efficient.
Second, when the content of another existing file or of an existing text block combination (ChunkList) is needed as part of the content of the file to be created, the text block combination (ChunkList) of the existing file or the existing text block combination (ChunkList) only needs to be treated as a single Chunk and attached under the file to be created, which makes the organization method more convenient and more efficient.
Third, after a file is deleted, if other existing files or existing Chunk combinations still reference a Chunk in that file, the Chunk is not deleted; if no reference to the Chunk remains, that is, the Chunk's reference count is 0, the text block is physically deleted at its storage location. This makes the data organization method more reasonable and safer.
Fourth, different files may share part of their content at the bottom layer; in that case, identical Chunks occupy fewer system resources, which effectively improves the utilization of system resources.
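The third and fourth effects can be illustrated with two files that share a nested ChunkList. The sketch below assumes that releasing a ChunkList whose count reaches 0 also releases the members it referenced; this cascade is an interpretation rather than something the text states explicitly, and the function name `delete_file` and all ids are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def delete_file(root: str, chunklist_table: Dict[str, List[str]],
                counts: Counter) -> List[str]:
    """Release root's references; return the ids whose count fell to 0."""
    reclaimed: List[str] = []
    stack = [root]
    while stack:
        parent = stack.pop()
        # A leaf Chunk has no entry in the ChunkList Table, so pop yields [].
        for member in chunklist_table.pop(parent, []):
            counts[member] -= 1
            if counts[member] == 0:
                reclaimed.append(member)
                stack.append(member)  # assumed cascade into nested ChunkLists
    return reclaimed


chunklist_table = {
    "cl_file_A": ["c1", "cl_shared"],
    "cl_file_B": ["cl_shared", "c4"],
    "cl_shared": ["c2", "c3"],
}
counts = Counter({"c1": 1, "cl_shared": 2, "c4": 1, "c2": 1, "c3": 1})
after_a = delete_file("cl_file_A", chunklist_table, counts)
after_b = delete_file("cl_file_B", chunklist_table, counts)
```

Deleting file A reclaims only its private Chunk `c1`, because file B still references `cl_shared`. The shared Chunks `c2` and `c3` are stored once and are reclaimed only after file B is deleted as well.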
The second embodiment of the present application relates to an organization device for distributed file system metadata, whose structure is shown in Fig. 3. The organization device for distributed file system metadata includes:
A file block index table establishing module, configured to establish a file block index table, which maps the identifier of each text block to its storage location.
A text block storage location determining module, configured to, upon receiving a request to create a file, determine the text blocks the file contains according to the content of the file to be created, and determine the storage location corresponding to each text block according to the established file block index table.
A default file block combination index table module, configured to establish a default file block combination index table according to the content of the file to be created; the default file block combination index table reflects the mapping between the text block combination of the file to be created and all text blocks that constitute that text block combination.
A referenced text block combination update and file block combination index table update module, configured to, if the file to be created contains other existing files or existing text block combinations, replace the reference in the file to be created to each text block of the existing file or existing text block combination with a reference to the text block combination of the existing file or to the existing text block combination, and update the file block combination index table accordingly. Further, this module may also be used for the user to create a new text block combination from multiple text blocks in the file to be created, replace the reference to each of those text blocks with a reference to the new text block combination, and update the file block combination index table accordingly.
A file generation and maintenance module, configured to generate and maintain the file to be created according to the updated file block combination index table.
A file block reference index table module, configured to establish the file block reference index table according to the file block combination index table, reflecting the mapping between each text block or text block combination in the file to be created and the upper-layer text block combination that references it.
A text block reference counting and deletion module, configured to perform reference counting on each text block according to the file block reference index table and, when the reference count of a text block is 0, physically delete the text block at its storage location.
The first embodiment is the method embodiment corresponding to this embodiment; the technical details of the first embodiment apply to this embodiment, and the technical details of this embodiment also apply to the first embodiment.
It should be noted that those skilled in the art will understand that the functions of the modules shown in the above embodiment of the organization device for distributed file system metadata can be understood with reference to the foregoing description of the organization method for distributed file system metadata. The functions of these modules can be implemented by a program (executable instructions) running on a processor, or by specific logic circuits. If the organization device of the embodiments of the present application is implemented in the form of software function modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read Only Memory), a magnetic disk, or an optical disc. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, the embodiments of the present application also provide a computer storage medium storing computer-executable instructions which, when executed by a processor, implement the method embodiments of the present application.
In addition, the embodiments of the present application also provide an organization apparatus for distributed file system metadata, including a memory for storing computer-executable instructions and a processor; the processor implements the steps of the above method embodiments when executing the computer-executable instructions in the memory. The processor may be a central processing unit (Central Processing Unit, "CPU" for short), or another general-purpose processor, a digital signal processor (Digital Signal Processor, "DSP" for short), an application-specific integrated circuit (Application Specific Integrated Circuit, "ASIC" for short), or the like. The aforementioned memory may be a read-only memory ("ROM" for short), a random access memory ("RAM" for short), a flash memory (Flash), a hard disk, a solid-state drive, or the like. The steps of the methods disclosed in the embodiments of the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
It should be noted that, in the application documents of this patent, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include" and "comprise", and any variants of them, are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a" does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. In the application documents of this patent, a statement that an action is performed according to an element means that the action is performed at least according to that element, which covers two cases: the action is performed according to that element only, and the action is performed according to that element and other elements. Expressions such as "multiple", "multiple times", and "multiple kinds" cover 2, 2 times, and 2 kinds, as well as more than 2, more than 2 times, and more than 2 kinds.
All documents mentioned in the present application are considered to be incorporated in their entirety into the disclosure of the present application, so that they may serve as a basis for amendment where necessary. In addition, it should be understood that after reading the above disclosure, those skilled in the art may make various changes or modifications to the present application, and such equivalent forms likewise fall within the scope claimed by the present application.

Claims (11)

1. A method for organizing distributed file system metadata, characterized by comprising:
establishing a default file block combination index table according to the content of a file to be created, the table reflecting the mapping between the text block combination of the file to be created and all text blocks that constitute the text block combination;
if the file to be created contains other existing files or existing text block combinations, replacing the reference in the file to be created to each text block of the existing file or existing text block combination with a reference to the text block combination of the existing file or to the existing text block combination, and updating the file block combination index table accordingly; and
generating and maintaining the file to be created according to the updated file block combination index table.
2. The method according to claim 1, characterized in that the step of, if the file to be created contains other existing files or existing text block combinations, replacing the reference in the file to be created to each text block of the existing file or existing text block combination with a reference to the text block combination of the existing file or to the existing text block combination, and updating the file block combination index table accordingly, further comprises:
creating, by a user, a new text block combination from multiple text blocks in the file to be created, replacing the reference to each of the multiple text blocks with a reference to the new text block combination, and updating the file block combination index table accordingly.
3. The method according to claim 1, characterized by further comprising:
establishing a file block index table for mapping the identifier of each text block to its storage location; and
upon receiving a request to create a file, determining the text blocks the file contains according to the content of the file to be created, and determining the storage location corresponding to each text block according to the established file block index table.
4. The method according to claim 3, characterized by further comprising:
establishing a file block reference index table according to the file block combination index table, the file block reference index table reflecting the mapping between each text block or text block combination in the file to be created and the upper-layer text block combination that references it.
5. The method according to claim 4, characterized in that, after the step of establishing the file block reference index table according to the file block combination index table, the file block reference index table reflecting the mapping between each text block or text block combination in the file to be created and the upper-layer text block combination that references it, the method further comprises:
performing reference counting on each text block according to the file block reference index table, and, when the reference count of a text block is 0, physically deleting the text block at its storage location.
6. An organization device for distributed file system metadata, characterized by comprising:
a default file block combination index table module, configured to establish a default file block combination index table according to the content of a file to be created, the default file block combination index table reflecting the mapping between the text block combination of the file to be created and all text blocks that constitute the text block combination;
a referenced text block combination update and file block combination index table update module, configured to, if the file to be created contains other existing files or existing text block combinations, replace the reference in the file to be created to each text block of the existing file or existing text block combination with a reference to the text block combination of the existing file or to the existing text block combination, and update the file block combination index table accordingly; and
a file generation and maintenance module, configured to generate and maintain the file to be created according to the updated file block combination index table.
7. The device according to claim 6, characterized in that
the referenced text block combination update and file block combination index table update module is further configured for a user to create a new text block combination from multiple text blocks in the file to be created, replace the reference to each of the multiple text blocks with a reference to the new text block combination, and update the file block combination index table accordingly.
8. The device according to claim 6, characterized by further comprising:
a file block index table establishing module, configured to establish a file block index table, which maps the identifier of each text block to its storage location; and
a text block storage location determining module, configured to, upon receiving a request to create a file, determine the text blocks the file contains according to the content of the file to be created, and determine the storage location corresponding to each text block according to the established file block index table.
9. The device according to claim 8, characterized by further comprising:
a file block reference index table module, configured to establish a file block reference index table according to the file block combination index table, the file block reference index table reflecting the mapping between each text block or text block combination in the file to be created and the upper-layer text block combination that references it; and
a text block reference counting and deletion module, configured to perform reference counting on each text block according to the file block reference index table and, when the reference count of a text block is 0, physically delete the text block at its storage location.
10. An organization apparatus for distributed file system metadata, characterized by comprising:
a memory, configured to store computer-executable instructions; and
a processor, configured to implement the steps of the method according to any one of claims 1 to 5 when executing the computer-executable instructions.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 5.
CN201910008102.0A 2019-01-04 2019-01-04 Distributed file system metadata organization method and device Active CN110008178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910008102.0A CN110008178B (en) 2019-01-04 2019-01-04 Distributed file system metadata organization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910008102.0A CN110008178B (en) 2019-01-04 2019-01-04 Distributed file system metadata organization method and device

Publications (2)

Publication Number Publication Date
CN110008178A true CN110008178A (en) 2019-07-12
CN110008178B CN110008178B (en) 2023-04-07

Family

ID=67165335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910008102.0A Active CN110008178B (en) 2019-01-04 2019-01-04 Distributed file system metadata organization method and device

Country Status (1)

Country Link
CN (1) CN110008178B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591434A (en) * 2021-08-05 2021-11-02 江西金格科技股份有限公司 Method for merging OFD (office automation device) documents carrying semantic indexing information

Also Published As

Publication number Publication date
CN110008178B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

GR01 Patent grant