CN108984686A - Distributed file system indexing method and device based on log merging - Google Patents

Info

Publication number
CN108984686A
Authority
CN
China
Prior art keywords
file
index
log
type
delete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810718623.0A
Other languages
Chinese (zh)
Other versions
CN108984686B (en)
Inventor
张晓宇
雷达
吴晓晨
李昀
郑寄平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 52 Research Institute
Original Assignee
CETC 52 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 52 Research Institute
Priority to CN201810718623.0A
Publication of CN108984686A
Application granted
Publication of CN108984686B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a distributed file system indexing method and device based on log merging. When logs are merged, the metadata server of the distributed file system constructs file operation metadata information and writes it to a storage unit; the operation metadata information in the storage unit is then read and parsed; finally, the file index operations are executed to build the corresponding index, and the processed objects are deleted. The invention addresses missed index entries, poor client compatibility, the inability to build file indexes incrementally, and time-consuming, inefficient index construction.

Description

Distributed file system indexing method and device based on log merging
Technical field
The invention belongs to the technical field of file storage and processing, and in particular relates to a distributed file system indexing method and device based on log merging.
Background
With the rapid development of the Internet, cloud computing, big data, and artificial intelligence, the market research institute Internet Data Center (IDC) predicts that the global data volume will grow by about 50% per year and reach 40 ZB by 2020 (1 ZB = 1 billion TB). Only about 15% of this data is accessed frequently; most data gradually turns cold after it is created. Although the access rate of this "cold data" is very low, it still has to be retained, and enterprises additionally have large volumes of data that must be stored and retrieved.
A journaling file system is a file system with fault recovery capability. It uses a log to record modifications that have not yet been committed to the file system, which prevents metadata from being corrupted. Compared with a non-journaling file system, it greatly improves the stability of the file system, increases reliability when the system crashes or loses power, shortens recovery time, and guarantees the atomicity of file operations.
At present, retrieval indexes for files are built either on the client or on the server. Building the index on the client requires every type of client to be taken into account, so compatibility is poor. On the server, the index is mainly built by one of the following methods:
Monitoring the operations on the folder on which the file system is mounted, and building the corresponding index;
Traversing the files under a specified mounted folder, and building the file index.
The problem with these methods is as follows. In the first method, some operations, such as the move operation (mv), cannot be monitored, so the indexes of some files are missed. The second method has to traverse all subfolders and files under the folder and cannot build the index incrementally; when the number of files is very large, the traversal takes a long time and efficiency is low.
In addition, the log of a journaling file system records every file-related operation. If the index is built on a file system that has no log merging function, every operation has to be parsed, which also makes index construction inefficient.
Summary of the invention
The purpose of the invention is to provide a distributed file system indexing method and device based on log merging, in order to solve the problems of missed index entries, poor client compatibility, the inability to build file indexes incrementally, and time-consuming, inefficient index construction.
To achieve this purpose, the invention adopts the following technical solution:
A distributed file system indexing method based on log merging, comprising:
Step 1: when a file operation occurs, recording file operation information and writing it to the log, where the file operation information includes the type of the file operation and the time at which the file operation occurred; and, when the type of the file operation is a move operation, constructing the file operation metadata information immediately after the file operation information has been recorded and written to the log;
Step 2: when a trigger condition is met, executing a log merge operation;
Step 3: for each file modified within the log merge operation, when the type of the file operation is a create or delete operation, constructing the file operation metadata information and writing it to the information storage unit; when the type of the file operation is a move operation, writing the already constructed file operation metadata information to the information storage unit;
Step 4: reading the file operation metadata information from the information storage unit;
Step 5: parsing the file operation metadata information that was read, and executing the corresponding file index operation according to the parsed type of the file operation;
Step 6: after all file index operations have been executed, deleting the processed objects from the information storage unit.
Further, recording the file operation information comprises:
Adding fields to the directory entry structure of the operated file to record, respectively, the type of the file operation, the time at which the file operation occurred, and, for a delete operation, the index node of the deleted file;
Adding a field to the directory structure of the operated file to record the name of the deleted file.
Further, executing the log merge operation when the trigger condition is met comprises:
When the number of logs exceeds a set threshold or a log merge command is received, performing the log merge operation in units of file directories.
Further, constructing the file operation metadata information comprises:
When the type of the file operation is a create operation, obtaining the type of the file operation, the time at which it occurred, the file name, the file size, the file path, the file modification time, whether the index of a file with the same path needs to be deleted, and whether the object is a file or a folder, and constructing an operation information string;
When the type of the file operation is a delete operation, obtaining the type of the file operation, the time at which it occurred, the file path, and whether the object is a file or a folder, and constructing an operation information string;
When the type of the file operation is a move operation, obtaining the source path, the destination path, the file size, the file name, the file modification time, and whether the object is a file or a folder, and constructing an operation information string.
Further, parsing the file operation metadata information that was read, and executing the corresponding file index operation according to the parsed type of the file operation, comprises:
When the type of the file operation is a create operation, first determining whether the index of a file with the same path needs to be deleted; if so, first deleting the file index with the same path from the index set, and otherwise doing nothing; then determining whether the object is a file; if it is a file, constructing and executing a file index creation operation, and if it is not a file, ending;
When the type of the file operation is a delete operation, first determining whether the deleted object is a file; if it is a file, constructing and executing a file index delete operation; if it is a folder, constructing and executing a delete operation on the file indexes under that folder;
When the type of the file operation is a move operation, first determining whether the moved object is a file; if it is a file, first constructing and executing a file index delete operation under the source path, and then constructing and executing a file index creation operation under the destination path; if it is a folder, first retrieving the file indexes under the folder path and taking the values of the file path, modification time, file size, and file name from each source file index, then updating the file path from the source path to the destination path while leaving the file name, modification time, and file size unchanged, constructing and executing a file index creation operation under the destination path, and then constructing and executing a file index delete operation under the source path.
The invention also provides a distributed file system indexing device based on log merging, comprising an update-build-write unit module, an information storage unit module, and a parse-build-execute unit module, wherein:
The update-build-write unit module is configured to record file operation information and write it to the log when a file operation occurs, where the file operation information includes the type of the file operation and the time at which the file operation occurred, and, when the type of the file operation is a move operation, to construct the file operation metadata information immediately after the file operation information has been recorded and written to the log; to execute the log merge operation when the trigger condition is met; and, for each file modified within the log merge operation, to construct the file operation metadata information and write it to the information storage unit module when the type of the file operation is a create or delete operation, and to write the already constructed file operation metadata information to the information storage unit module when the type of the file operation is a move operation;
The information storage unit module is configured to store the file operation metadata information constructed by the update-build-write unit module;
The parse-build-execute unit module is configured to read the file operation metadata information from the information storage unit module; to parse the file operation metadata information that was read and execute the corresponding file index operation according to the parsed type of the file operation; and, after all file index operations have been executed, to delete the processed objects from the information storage unit module.
Further, recording the file operation information when a file operation occurs comprises the following operations:
The update-build-write unit module adds fields to the directory entry structure of the operated file to record the type of the file operation, the time at which the file operation occurred, and, for a delete operation, the index node of the deleted file; it adds a field to the directory structure of the operated file to record the name of the operated file.
Further, executing the log merge operation when the trigger condition is met comprises the following operation:
When the number of logs exceeds a set threshold or a log merge command is received, the update-build-write unit module performs the log merge operation in units of file directories.
Further, constructing the file operation metadata information comprises the following operations:
When the type of the file operation is a create operation, the update-build-write unit module obtains the type of the file operation, the time at which it occurred, the file name, the file size, the file path, the file modification time, whether the index of a file with the same path needs to be deleted, and whether the object is a file or a folder, and constructs an operation information string;
When the type of the file operation is a delete operation, the update-build-write unit module obtains the type of the file operation, the time at which it occurred, the file path, and whether the object is a file or a folder, and constructs an operation information string;
When the type of the file operation is a move operation, the update-build-write unit module obtains the source path, the destination path, the file size, the file name, the file modification time, and whether the object is a file or a folder, and constructs an operation information string.
Further, parsing the file operation metadata information that was read, and executing the corresponding file index operation according to the parsed type of the file operation, comprises the following operations:
When the type of the file operation is a create operation, the parse-build-execute unit module first determines whether the index of a file with the same path needs to be deleted; if so, it first deletes the file index with the same path from the index set, and otherwise does nothing; it then determines whether the object is a file; if it is a file, it constructs and executes a file index creation operation, and if it is not a file, it returns;
When the type of the file operation is a delete operation, the parse-build-execute unit module first determines whether the deleted object is a file; if it is a file, it constructs and executes a file index delete operation; if it is a folder, it constructs and executes a delete operation on the file indexes under that folder;
When the type of the file operation is a move operation, the parse-build-execute unit module first determines whether the moved object is a file; if it is a file, it first constructs and executes a file index delete operation under the source path, and then constructs and executes a file index creation operation under the destination path; if it is a folder, it first retrieves the file indexes under the folder path and takes the values of the file path, modification time, file size, and file name from each source file index, then updates the file path from the source path to the destination path while leaving the file name, modification time, and file size unchanged, constructs and executes a file index creation operation under the destination path, and then constructs and executes a file index delete operation under the source path.
In the distributed file system indexing method and device based on log merging provided by the invention, the metadata server of the distributed file system constructs file operation metadata information when logs are merged and writes it to a storage unit; the operation metadata information in the storage unit is then read and parsed; finally, the file index operations are executed to build the corresponding index, and the processed objects are deleted. Because the operation information is obtained on the metadata server, no file operation is lost and the type of client does not need to be considered, which solves the problems of missed index entries and poor client compatibility. In addition, log merging yields the file operation information for a period of time incrementally and reduces the number of index builds, which solves the problems that file indexes cannot be built incrementally and that index construction is time-consuming and inefficient.
Description of the drawings
Fig. 1 is a flow diagram of an embodiment of the distributed file system indexing method based on log merging according to the invention;
Fig. 2 is a flow diagram of recording file operation information for a create operation in an embodiment of the method;
Fig. 3 is a flow diagram of recording file operation information for a delete operation in an embodiment of the method;
Fig. 4 is a flow diagram of recording file operation information for a move operation in an embodiment of the method;
Fig. 5 is a flow diagram of constructing file operation metadata information for a create operation in an embodiment of the method;
Fig. 6 is a flow diagram of constructing file operation metadata information for a delete operation in an embodiment of the method;
Fig. 7 is a flow diagram of constructing file operation metadata information for a move operation in an embodiment of the method;
Fig. 8 is a flow diagram of the file index operations executed for a create operation in an embodiment of the method;
Fig. 9 is a flow diagram of the file index operations executed for a delete operation in an embodiment of the method;
Fig. 10 is a flow diagram of the file index operations executed for a move operation in an embodiment of the method;
Fig. 11 is a structural diagram of an embodiment of the distributed file system indexing device based on log merging according to the invention;
Fig. 12 is a functional diagram of an embodiment of the update-build-write unit module of the device.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments. The following embodiments do not limit the invention.
This embodiment provides a distributed file system indexing method based on log merging. As shown in Fig. 1, the method comprises:
S101: when a file operation occurs, record file operation information and write it to the log, where the file operation information includes the type of the file operation and the time at which the file operation occurred; when the type of the file operation is a move operation, create the file operation metadata information immediately after the file operation information has been recorded and written to the log.
Unless otherwise specified, every operated object in this embodiment may be a file or a folder; for example, "file operation" does not refer only to operations on files but also covers operations on folders. The types of file operation are the create operation, the delete operation, and the move operation, where the create operation covers both creation and modification, and the move operation covers both moving and renaming.
File operation information is recorded mainly by adding the operation, etime, and preInode fields to the directory entry (dentry) structure of the operated file, which record, respectively, the type of the file operation, the time at which the file operation occurred, and, for a delete operation, the index node of the deleted file; and by adding the deleted_files field to the directory (dir) structure of the operated file, which records the name of the deleted file.
Specifically, as shown in Fig. 2, when the type of the file operation is a create operation, recording the file operation information comprises: adding the operation and etime fields to the directory entry structure of the created object, setting the operation field to create, and setting the etime field to the current time.
As shown in Fig. 3, when the type of the file operation is a delete operation, recording the file operation information comprises: adding the operation, etime, and preInode fields to the directory entry structure of the deleted object, setting the operation field to unlink (delete), and setting the etime field to the current time; then determining whether the deleted object is a folder. If it is a folder, the preInode field in the directory entry structure is set to 0. If it is a file, the preInode field in the directory entry structure is set to the index node (inode) of the deleted file, and a deleted_files field added to the directory structure of the deleted object records the name of the deleted object.
As shown in Fig. 4, when the type of the file operation is a move operation, file operation information has to be recorded in both the source directory entry (src dentry) and the destination directory entry (dest dentry). Recording the file operation information comprises: first adding the operation, etime, and preInode fields to the source directory entry structure of the moved object, setting the operation field to move, setting the etime field to the current time, and setting the preInode field to the source index node; then adding the operation and etime fields to the destination directory entry structure of the moved object, setting the operation field to move, and setting the etime field to the current time.
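The recording rules for the three operation types can be sketched as follows. This is only an illustrative model, not the patented implementation: the `Dentry` and `Dir` classes, the `pre_inode` spelling of preInode, the function names, and the optional `now` argument are assumptions made for the example; only the field names operation, etime, preInode, and deleted_files come from the text. The folder/file branch of the delete case follows the reading given above.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dentry:
    """Hypothetical directory entry carrying the three extra fields."""
    name: str
    inode: Optional[int] = None
    is_dir: bool = False
    operation: Optional[str] = None   # "create", "unlink" or "move"
    etime: Optional[float] = None     # time at which the operation occurred
    pre_inode: Optional[int] = None   # the text's preInode field


@dataclass
class Dir:
    """Hypothetical directory structure with the extra deleted_files field."""
    path: str
    deleted_files: List[str] = field(default_factory=list)


def record_create(dentry: Dentry, now: Optional[float] = None) -> None:
    dentry.operation = "create"
    dentry.etime = time.time() if now is None else now


def record_unlink(parent: Dir, dentry: Dentry, now: Optional[float] = None) -> None:
    dentry.operation = "unlink"
    dentry.etime = time.time() if now is None else now
    if dentry.is_dir:
        dentry.pre_inode = 0               # folders store 0
    else:
        dentry.pre_inode = dentry.inode    # files keep the deleted inode
        parent.deleted_files.append(dentry.name)


def record_move(src: Dentry, dest: Dentry, now: Optional[float] = None) -> None:
    stamp = time.time() if now is None else now
    # The source entry also remembers the source inode.
    src.operation, src.etime, src.pre_inode = "move", stamp, src.inode
    dest.operation, dest.etime = "move", stamp
```

Passing `now` explicitly keeps the sketch deterministic in tests; a real metadata server would take the timestamp from its own clock.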
For a move operation, the file operation metadata information is created immediately after the file operation information has been recorded. It contains the following information: the source path, the destination path, the file size, the file name, the file modification time, and whether the object is a file or a folder.
Specifically, as shown in Fig. 7, for a move operation, constructing the file operation metadata information comprises: first obtaining the source path of the moved object; then obtaining the index node of the destination object according to the source path, obtaining the destination path through the index node of the destination object, and obtaining the file size, file name, file modification time, and whether the object is a file or a folder from the directory entry of the object at the destination path; and finally constructing the move operation information string from the information obtained and, by means of an index_move structure added to the file system cache, recording the constructed move operation information string in the index_move structure for later use.
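A minimal sketch of building the move operation information string is shown below. The text does not specify the string format, so JSON is an assumption here, as are the function name, the key names, and the use of a plain list as a stand-in for the index_move structure. The extra "operation" key is also an addition for illustration, so that a later parser can tell the record types apart.

```python
import json
from typing import List

# Stand-in for the index_move structure kept in the file system cache.
index_move: List[str] = []


def build_move_info(src_path: str, dest_path: str, size: int,
                    name: str, mtime: float, is_dir: bool) -> str:
    info = {
        "operation": "move",      # illustrative addition, see lead-in
        "src_path": src_path,
        "dest_path": dest_path,
        "size": size,
        "name": name,
        "mtime": mtime,
        "is_dir": is_dir,
    }
    text = json.dumps(info, sort_keys=True)
    index_move.append(text)       # held in cache until the next log merge
    return text
```

Building the string at move time, rather than at merge time, matches the text: the move record is prepared immediately and only written out during the merge.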
Because the file operation information is recorded by adding fields to the directory and directory entry structures, in this embodiment it is recorded in the cache. The distributed file system itself keeps a system log, so this information is also written to the log together with the file operation events. After the metadata server of the distributed file system restarts, it rebuilds its cache by replaying the log; during replay, the operation metadata information recorded in the file directory entries is also loaded into the cache, so no file operation information is missed or lost, which ensures the integrity of the file index.
S102: when a trigger condition is met, execute the log merge operation.
In a log merge, the metadata server (Metadata Server, MDS) merges all operations within a certain period, persists the file information, and deletes the logs that recorded the operation events. When the log merge operation starts, the MDS first reads the dirty data in the cache and adds the related directories to a list. It then traverses this list and persists the dirty files (or folders) in units of directories. Finally, it deletes the logs corresponding to this merge.
Log merging yields the file operation information for a period of time incrementally and reduces the number of index builds, which effectively solves the problems that file indexes cannot be built incrementally and that index construction is time-consuming and inefficient. A log merge operation merges the logs accumulated between the previous log merge and the current one. The log merge operation can be triggered automatically or manually: with automatic triggering, the metadata server executes the log merge operation automatically when the number of logs exceeds a set threshold; with manual triggering, the metadata server executes the log merge operation when it receives a file operation log merge command. In this embodiment, the log merge operation is performed in units of file directories.
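The two trigger conditions and the per-directory grouping can be sketched as follows. The function names and parameters are hypothetical; the text only states that a merge runs when the number of logs exceeds a set threshold or a merge command is received, and that work is organized by directory.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def should_merge(log_count: int, threshold: int, merge_requested: bool) -> bool:
    """Manual trigger (command received) or automatic trigger (threshold exceeded)."""
    return merge_requested or log_count > threshold


def group_by_directory(dirty_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group dirty file paths by parent directory, so each directory is handled as one unit."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in dirty_paths:
        groups[posixpath.dirname(path)].append(posixpath.basename(path))
    return dict(groups)
```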
During the log merge operation, the dirty directory fragments (dirfrag), directory entries, index nodes, and other dirty structures in the cache are traversed first, and the directory (dir) associated with each of them is recorded. The recorded directories are then traversed to obtain the names of the recently modified files recorded in each directory, so that file operation metadata information can subsequently be constructed for these modified files.
The metadata server determines whether a file was deleted or created by checking the state of the index node linked to the file's directory entry: if the index node linked to the directory entry is null, the operation is a delete operation; otherwise it is a create operation.
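The null check described above amounts to a one-line decision. In this sketch, `None` stands in for a null linked index node, and the function name is hypothetical.

```python
from typing import Optional


def classify(linked_inode: Optional[int]) -> str:
    """Return "unlink" when the dentry has no linked inode, otherwise "create"."""
    return "unlink" if linked_inode is None else "create"
```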
S103: for each file modified within the log merge operation, when the type of the file operation is a create or delete operation, construct the file operation metadata information and write it to the information storage unit; when the type of the file operation is a move operation, write the already constructed file operation metadata information to the information storage unit.
The modified files referred to above are the files that have undergone a create, delete, or move operation. Because the log records these modified files, the directories of the files modified within the log merge operation can be obtained during the merge, and the file operation metadata information can be constructed.
In this embodiment, each piece of file operation metadata information is first stored in the cache after it has been constructed; once all file operation metadata information has been constructed, it is written to the information storage unit in a single pass.
When the type of the file operation is a create operation, the type of the file operation, the time at which it occurred, the file name, the file size, the file path, the file modification time, whether the index of a file with the same path needs to be deleted, and whether the object is a file or a folder are obtained, and an operation information string is constructed.
Specifically, as shown in Fig. 5, for a create operation, constructing the file operation metadata information comprises: first obtaining the directory entry of the created file or folder, obtaining the index node of the file or folder through the directory entry, and obtaining the type of the file operation, the file modification time, the file size, and whether the object is a file or a folder through the index node; then determining whether the type of the file operation is a create operation. If it is a create operation, the time at which the file operation occurred, the file name, and the file path are obtained through the directory entry. To decide whether the index of a file with the same path needs to be deleted, the deleted_files field in the directory structure is checked for the name of the created object: if the name is present, the delete-same-path-file-index flag is set to true; otherwise it is set to false. Finally, the create operation information string is constructed from the information obtained and, by means of an index_create structure added to the file system cache, recorded in the index_create structure for later use. If the operation is not a create operation, the procedure ends directly.
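The create case, including the deleted_files lookup that sets the same-path flag, might look like the sketch below. The dict-based dentry, the key names, the function name, and the JSON encoding are all assumptions made for illustration; the text defines only which pieces of information go into the string.

```python
import json
from typing import List, Optional


def build_create_info(dentry: dict, dir_path: str,
                      deleted_files: List[str]) -> Optional[str]:
    """Return the create operation information string, or None for other operations."""
    if dentry["operation"] != "create":
        return None
    info = {
        "operation": "create",
        "etime": dentry["etime"],
        "name": dentry["name"],
        "size": dentry["size"],
        "path": dir_path.rstrip("/") + "/" + dentry["name"],
        "mtime": dentry["mtime"],
        # True when a file of the same name was deleted earlier in this
        # directory, so its stale index entry must be removed first.
        "delete_same_path": dentry["name"] in deleted_files,
        "is_dir": dentry["is_dir"],
    }
    return json.dumps(info, sort_keys=True)
```

The flag covers the delete-then-recreate case: without it, an index entry for the old file at the same path could survive next to the new one.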
When the type of the file operation is a delete operation, the type of the file operation, the time at which it occurred, the file path, and whether the object is a file or a folder are obtained, and an operation information string is constructed.
Specifically, as shown in Fig. 6, for a delete operation, constructing the file operation metadata information comprises: first obtaining the directory entry of the deleted object, obtaining the type of the file operation from the directory entry, and determining whether the type of the file operation is a delete operation. If it is a delete operation, the file path of the deleted object, the time at which the file operation occurred, and whether the object is a file or a folder are obtained from the directory entry; the delete operation information string is constructed from the information obtained and, by means of an index_unlink structure added to the file system cache, recorded in the index_unlink structure for later use. If the operation is not a delete operation, the procedure ends directly.
After all file operation metadata information has been constructed, the file operation metadata information in the index_move, index_create, and index_unlink structures in the file system cache is written to the information storage unit. Because the log merge operation is performed in units of directories, the write to the information storage unit is also performed in units of directories: the file operation metadata information under each directory is written to one object. After the write completes, the index_move, index_create, and index_unlink structures that recorded the file operation metadata information are cleared.
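The flush step can be sketched with a dictionary standing in for the information storage unit. The object naming scheme (`merge_id:directory`), the newline-separated layout, and the `Cache` type are assumptions for the example; the text states only that each directory's records go into one object and that the three cache structures are cleared afterwards.

```python
from typing import Dict, List, Tuple

# structure name -> list of (directory, operation information string)
Cache = Dict[str, List[Tuple[str, str]]]


def flush_to_store(store: Dict[str, str], cache: Cache, merge_id: int) -> int:
    """Write one object per directory, then clear the cache structures."""
    per_dir: Dict[str, List[str]] = {}
    for entries in cache.values():
        for directory, info in entries:
            per_dir.setdefault(directory, []).append(info)
    for directory, infos in per_dir.items():
        store[f"{merge_id}:{directory}"] = "\n".join(infos)
    for entries in cache.values():
        entries.clear()
    return len(per_dir)   # number of objects written
```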
S104: read the file operation metadata information from the information storage unit.
The contents of all objects in the index data pool of the information storage unit are read and sorted by the time at which the file operation occurred, as recorded in the file operation metadata information. Specifically, the information storage unit is polled periodically for objects. If objects exist, the contents of all objects are read; once all contents have been read, they are first converted into a readable format, and the events are then sorted by the time at which the file operation occurred.
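Reading and ordering the records might be sketched as follows. The sketch assumes that each object holds one JSON record per line and that every record carries an "etime" value; the text lists the operation time for create and delete records, so its presence on move records is an assumption of this example. A dictionary again stands in for the index data pool.

```python
import json
from typing import Dict, List


def read_sorted(store: Dict[str, str]) -> List[dict]:
    """Parse every record in every object and order them by operation time."""
    records: List[dict] = []
    for content in store.values():
        for line in content.splitlines():
            if line.strip():
                records.append(json.loads(line))
    records.sort(key=lambda rec: rec["etime"])
    return records
```

Sorting across objects matters because records are stored per directory, while the index must be updated in the order in which the operations actually happened.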
S105, the file operation metadata information read is parsed, according to file operation resulting after parsing Type executes corresponding file index operation.
Specifically, as shown in figure 8, when the type of file operation is creation operation, corresponding file index operation is executed, It include: first to judge whether to need to delete with path file index, if necessary to delete, then the path first deleted in indexed set is identical File index, it is on the contrary then without operation;Then file is judged whether it is again, if it is file, is then constructed and is executed file index Creation operation, if not file, then terminates.
As shown in Fig. 9, when the type of file operation is a delete operation, executing the corresponding file index operation includes: first judging whether the deleted object is a file; if it is a file, constructing and executing a file index delete operation; if it is a folder, constructing and executing a delete operation on the file indexes under that folder.
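The delete branch of Fig. 9 can be sketched as follows, again with a dictionary standing in for the index set. Matching "under the folder" by path prefix is an assumption about how the folder's file indexes are located.

```python
from typing import Dict

def apply_delete(index: Dict[str, dict], op: dict) -> None:
    """Apply a delete record to `index` (path -> index entry)."""
    if op["is_file"]:
        # A file: remove its own index entry, if present.
        index.pop(op["path"], None)
        return
    # A folder: remove every file index entry below it. The trailing slash
    # keeps "/a" from matching a sibling such as "/ab".
    prefix = op["path"].rstrip("/") + "/"
    for path in [p for p in index if p.startswith(prefix)]:
        del index[path]
```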
As shown in Fig. 10, when the type of file operation is a move operation, executing the corresponding file index operation includes: first judging whether the moved object is a file. If it is a file, a file index delete operation under the source path is constructed and executed first, then a file index create operation under the destination path is constructed and executed, and the procedure ends. If it is a folder, the file indexes under the folder path are retrieved first, and the values of the file path, modification time, file size and file name are taken from each source file index; the file path is then updated from the source path to the destination path while the file name, modification time and file size remain unchanged; a file index create operation under the destination path is constructed and executed, then a file index delete operation under the source path is constructed and executed, and the procedure ends.
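The move branch of Fig. 10 can be sketched as below. The dictionary index, the record field names and the prefix-based lookup of a folder's file indexes are assumptions for illustration; the ordering of delete and create in each branch follows the description above.

```python
from typing import Dict

def apply_move(index: Dict[str, dict], op: dict) -> None:
    """Apply a move record to `index` (path -> index entry)."""
    src, dst = op["src"], op["dst"]
    if op["is_file"]:
        # File: delete the index under the source path, then create it
        # under the destination path from the values carried in the record.
        index.pop(src, None)
        index[dst] = {"name": op["name"], "size": op["size"], "mtime": op["mtime"]}
        return
    # Folder: look up every file index under the source folder, re-create it
    # under the destination with only the path changed, then delete the source.
    prefix = src.rstrip("/") + "/"
    for old_path in [p for p in index if p.startswith(prefix)]:
        new_path = dst.rstrip("/") + "/" + old_path[len(prefix):]
        index[new_path] = dict(index[old_path])  # name, mtime, size unchanged
        del index[old_path]
```

The folder branch needs the existing index entries because the move record describes only the folder itself, not the files inside it.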
In this embodiment, the conversion to a readable format in step S104 corresponds to the parsing of the data format in step S105. Both use the prior art and are not described further here.
S106, after all file index operations have been executed, deleting the processed objects in the information storage unit.
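Steps S104 to S106 can be tied together as in the sketch below, which deletes objects only after every index operation in the batch has run. The `pool` dictionary, the `op` and `time` field names and the handler table are hypothetical; taking a snapshot of object names first is a design assumption so that objects written during processing are left for the next round.

```python
from typing import Callable, Dict, List

def process_batch(pool: Dict[str, List[dict]],
                  handlers: Dict[str, Callable[[dict], None]]) -> int:
    """Run one read, execute, delete cycle; return the number of operations applied.

    `pool` maps object names to already parsed operation records;
    `handlers` maps an operation type to the function that applies it.
    """
    names = list(pool)                       # snapshot of objects to process
    ops = [op for name in names for op in pool[name]]
    ops.sort(key=lambda op: op["time"])      # replay in order of occurrence
    for op in ops:
        handlers[op["op"]](op)               # raises if the type is unknown
    # Reached only if every index operation succeeded.
    for name in names:
        del pool[name]
    return len(ops)
```

If a handler raises, the objects stay in the pool and the batch can be retried, which assumes the index operations are safe to repeat.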
This embodiment further includes a distributed file system indexing device based on log merging. The distributed file system is composed of multiple high-performance, large-capacity storage arrays serving as, for example, the monitor, the metadata server and the storage units. A client connects to the distributed file system over a network and sends file operation requests to it; after receiving a request, the metadata server carries out the file operation and finally returns the operation result to the client.
As shown in Fig. 11, the distributed file system indexing device 10 based on log merging includes an update-construction-writing unit module 11, an information storage unit module 12 and a parsing-construction-execution unit module 13, wherein:
The update-construction-writing unit module 11 is described in further detail with reference to Fig. 12. The update-construction-writing unit module 11 is configured to record file operation information and write it to a log when a file operation occurs, the file operation information including the type of the file operation and the time at which the file operation occurred, and, when the type of the file operation is a move operation, to construct file operation metadata information immediately after the file operation information has been recorded and written to the log;
and, when a trigger condition is met, to execute a log merge operation; for a file modified in the log merge operation, when the type of the file operation that occurred is a create or delete operation, to construct file operation metadata information and write it to the information storage unit module 12; and, when the type of the file operation that occurred is a move operation, to write the already constructed file operation metadata information to the information storage unit module 12;
The information storage unit module 12 is configured to store the file operation metadata information constructed by the update-construction-writing unit module 11;
The parsing-construction-execution unit module 13 is configured to read the file operation metadata information in the information storage unit module 12; to parse the file operation metadata information that was read and execute the corresponding file index operation according to the type of file operation obtained by parsing; and, after all file index operations have been executed, to delete the processed objects in the information storage unit module 12.
The distributed file system indexing device based on log merging of this embodiment corresponds to the distributed file system indexing method based on log merging described above; the specific operations of each step are not repeated here.
It should be noted that, in one implementation of the present invention, the update-construction-writing unit module 11 is arranged in the metadata server of the distributed file system; the information storage unit module 12 is an independent device with an information storage function; and the parsing-construction-execution unit module 13 is an independent device stored on any storage array of the distributed file system. Other implementations are also possible. For example, the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into one system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments merely illustrate the technical solution of the present invention and do not limit it. Without departing from the spirit and essence of the present invention, those skilled in the art may make various corresponding changes and modifications in accordance with the present invention, but all such changes and modifications shall fall within the scope of protection of the appended claims.

Claims (10)

1. A distributed file system indexing method based on log merging, characterized in that the distributed file system indexing method based on log merging comprises:
Step 1: when a file operation occurs, recording file operation information and writing it to a log, the file operation information including the type of the file operation and the time at which the file operation occurred; and, when the type of the file operation is a move operation, constructing file operation metadata information immediately after the file operation information has been recorded and written to the log;
Step 2: executing a log merge operation when a trigger condition is met;
Step 3: for a file modified in the log merge operation, when the type of the file operation that occurred is a create or delete operation, constructing file operation metadata information and writing it to an information storage unit; when the type of the file operation that occurred is a move operation, writing the already constructed file operation metadata information to the information storage unit;
Step 4: reading the file operation metadata information in the information storage unit;
Step 5: parsing the file operation metadata information that was read, and executing the corresponding file index operation according to the type of file operation obtained by parsing;
Step 6: after all file index operations have been executed, deleting the processed objects in the information storage unit.
2. The distributed file system indexing method based on log merging according to claim 1, characterized in that recording the file operation information comprises:
recording, through fields added to the directory entry structure of the operated file, the type of the file operation, the time at which the file operation occurred and, for a delete operation, the index node of the deleted file, respectively;
recording the name of the deleted file through a field added to the directory structure of the operated file.
3. The distributed file system indexing method based on log merging according to claim 1, characterized in that executing the log merge operation when the trigger condition is met comprises:
when the number of logs exceeds a set threshold or a log merge command is received, performing the log merge operation directory by directory.
4. The distributed file system indexing method based on log merging according to claim 1, characterized in that constructing the file operation metadata information comprises:
when the type of the file operation is a create operation, obtaining the type of the file operation, the time at which the file operation occurred, the file name, the file size, the file path, the file modification time, whether a file index with the same path needs to be deleted, and whether the object is a file, and constructing an operation information string;
when the type of the file operation is a delete operation, obtaining the type of the file operation, the time at which the file operation occurred, the file path, and whether the object is a file, and constructing an operation information string;
when the type of the file operation is a move operation, obtaining the source path, destination path, file size, file name and file modification time of the file operation and whether the object is a file, and constructing an operation information string.
5. The distributed file system indexing method based on log merging according to claim 4, characterized in that parsing the file operation metadata information that was read and executing the corresponding file index operation according to the type of file operation obtained by parsing comprises:
when the type of the file operation is a create operation, first judging whether a file index with the same path needs to be deleted; if so, first deleting the file index with the same path from the index set, and otherwise performing no operation; then judging whether the object is a file; if it is a file, constructing and executing a file index create operation, and if it is not a file, ending;
when the type of the file operation is a delete operation, first judging whether the deleted object is a file; if it is a file, constructing and executing a file index delete operation; if it is a folder, constructing and executing a delete operation on the file indexes under that folder;
when the type of the file operation is a move operation, first judging whether the moved object is a file; if it is a file, first constructing and executing a file index delete operation under the source path, then constructing and executing a file index create operation under the destination path; if it is a folder, first retrieving the file indexes under the folder path and taking the values of the file path, modification time, file size and file name from each source file index, then updating the file path from the source path to the destination path while the file name, modification time and file size remain unchanged, constructing and executing a file index create operation under the destination path, and then constructing and executing a file index delete operation under the source path.
6. A distributed file system indexing device based on log merging, characterized in that the distributed file system indexing device based on log merging comprises an update-construction-writing unit module, an information storage unit module and a parsing-construction-execution unit module, wherein:
the update-construction-writing unit module is configured to record file operation information and write it to a log when a file operation occurs, the file operation information including the type of the file operation and the time at which the file operation occurred, and, when the type of the file operation is a move operation, to construct file operation metadata information immediately after the file operation information has been recorded and written to the log; and, when a trigger condition is met, to execute a log merge operation; for a file modified in the log merge operation, when the type of the file operation that occurred is a create or delete operation, to construct file operation metadata information and write it to the information storage unit module; and, when the type of the file operation that occurred is a move operation, to write the already constructed file operation metadata information to the information storage unit module;
the information storage unit module is configured to store the file operation metadata information constructed by the update-construction-writing unit module;
the parsing-construction-execution unit module is configured to read the file operation metadata information in the information storage unit module; to parse the file operation metadata information that was read and execute the corresponding file index operation according to the type of file operation obtained by parsing; and, after all file index operations have been executed, to delete the processed objects in the information storage unit module.
7. The distributed file system indexing device based on log merging according to claim 6, characterized in that, when a file operation occurs, the update-construction-writing unit module records the file operation information by performing the following operations:
the update-construction-writing unit module records, through fields added to the directory entry structure of the operated file, the type of the file operation, the time at which the file operation occurred and, for a delete operation, the index node of the deleted file; and records the name of the operated file through a field added to the directory structure of the operated file.
8. The distributed file system indexing device based on log merging according to claim 6, characterized in that the update-construction-writing unit module executes the log merge operation when the trigger condition is met by performing the following operation:
when the number of logs exceeds a set threshold or a log merge command is received, the update-construction-writing unit module performs the log merge operation directory by directory.
9. The distributed file system indexing device based on log merging according to claim 6, characterized in that the update-construction-writing unit module constructs the file operation metadata information by performing the following operations:
when the type of the file operation is a create operation, the update-construction-writing unit module obtains the type of the file operation, the time at which the file operation occurred, the file name, the file size, the file path, the file modification time, whether a file index with the same path needs to be deleted, and whether the object is a file, and constructs an operation information string;
when the type of the file operation is a delete operation, the update-construction-writing unit module obtains the type of the file operation, the time at which the file operation occurred, the file path, and whether the object is a file, and constructs an operation information string;
when the type of the file operation is a move operation, the update-construction-writing unit module obtains the source path, destination path, file size, file name and file modification time of the file operation and whether the object is a file, and constructs an operation information string.
10. The distributed file system indexing device based on log merging according to claim 9, characterized in that the parsing-construction-execution unit module parses the file operation metadata information that was read and executes the corresponding file index operation according to the type of file operation obtained by parsing by performing the following operations:
when the type of the file operation is a create operation, the parsing-construction-execution unit module first judges whether a file index with the same path needs to be deleted; if so, it first deletes the file index with the same path from the index set, and otherwise performs no operation; it then judges whether the object is a file; if it is a file, it constructs and executes a file index create operation, and if it is not a file, it returns;
when the type of the file operation is a delete operation, the parsing-construction-execution unit module first judges whether the deleted object is a file; if it is a file, it constructs and executes a file index delete operation; if it is a folder, it constructs and executes a delete operation on the file indexes under that folder;
when the type of the file operation is a move operation, the parsing-construction-execution unit module first judges whether the moved object is a file; if it is a file, it first constructs and executes a file index delete operation under the source path, then constructs and executes a file index create operation under the destination path; if it is a folder, it first retrieves the file indexes under the folder path and takes the values of the file path, modification time, file size and file name from each source file index, then updates the file path from the source path to the destination path while the file name, modification time and file size remain unchanged, constructs and executes a file index create operation under the destination path, and then constructs and executes a file index delete operation under the source path.
CN201810718623.0A 2018-07-02 2018-07-02 Distributed file system indexing method and device based on log merging Active CN108984686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810718623.0A CN108984686B (en) 2018-07-02 2018-07-02 Distributed file system indexing method and device based on log merging

Publications (2)

Publication Number Publication Date
CN108984686A true CN108984686A (en) 2018-12-11
CN108984686B CN108984686B (en) 2021-03-30

Family

ID=64536753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810718623.0A Active CN108984686B (en) 2018-07-02 2018-07-02 Distributed file system indexing method and device based on log merging

Country Status (1)

Country Link
CN (1) CN108984686B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101571827A (en) * 2008-04-30 2009-11-04 国际商业机器公司 Method for saving logs and log system
CN102483755A (en) * 2009-06-26 2012-05-30 森普利维蒂公司 File system
CN102622407A (en) * 2012-01-29 2012-08-01 广州亦云信息技术有限公司 Log file operating system and log file management method
CN102737130A (en) * 2012-06-21 2012-10-17 广州从兴电子开发有限公司 Method and system for processing metadata of hadoop distributed file system (HDFS)
CN104123300A (en) * 2013-04-26 2014-10-29 上海云人信息科技有限公司 Data distributed storage system and method
US9767139B1 (en) * 2014-06-30 2017-09-19 EMC IP Holding Company LLC End-to-end data integrity in parallel storage systems
CN106663056A (en) * 2014-08-28 2017-05-10 华为技术有限公司 Metadata index search in file system
CN104239443A (en) * 2014-09-01 2014-12-24 英方软件(上海)有限公司 Serialization data operation log storage method
US20180181645A1 (en) * 2015-06-09 2018-06-28 Palantir Technologies Inc. Systems and methods for indexing and aggregating data records
CN108153804A (en) * 2017-11-17 2018-06-12 极道科技(北京)有限公司 A kind of metadata daily record update method of symmetric distributed file system
CN108052679A (en) * 2018-01-04 2018-05-18 焦点科技股份有限公司 A kind of Log Analysis System based on HADOOP

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111427989A (en) * 2019-01-10 2020-07-17 北大方正集团有限公司 Index processing method, index processing system and storage medium for full-text retrieval
CN110032496A (en) * 2019-04-19 2019-07-19 杭州玳数科技有限公司 A kind of log collection method and system for supporting diversified log merging
CN110032496B (en) * 2019-04-19 2023-10-13 杭州玳数科技有限公司 Log acquisition method and system supporting diversified log merging
CN111208946A (en) * 2020-01-06 2020-05-29 北京同有飞骥科技股份有限公司 Data persistence method and system supporting KB-level small file concurrent IO
CN113656645A (en) * 2020-05-12 2021-11-16 北京字节跳动网络技术有限公司 Log consumption method and device
CN111984598A (en) * 2020-08-20 2020-11-24 重庆紫光华山智安科技有限公司 High-performance metadata log file management method, system, medium and terminal
CN112860649A (en) * 2021-02-03 2021-05-28 深圳市木浪云数据有限公司 Method, device and system for generating index in increment manner
CN112948327A (en) * 2021-04-01 2021-06-11 北京奇艺世纪科技有限公司 File processing method, system, electronic device and storage medium

Also Published As

Publication number Publication date
CN108984686B (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant