CN101149755A - Distributed file system file writing system and method - Google Patents

Publication number
CN101149755A
Authority
CN
China
Prior art keywords
descriptor
predistribution
file destination
piece
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2007101679005A
Other languages
Chinese (zh)
Other versions
CN100517335C (en)
Inventor
刘岳
李剑宇
唐荣峰
熊劲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Application filed by Institute of Computing Technology of CAS
Priority to CNB2007101679005A (CN100517335C)
Publication of CN101149755A
Application granted
Publication of CN100517335C
Legal status: Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a file writing system and method for a distributed file system. When the storage server needs to allocate data blocks for a target file, it can directly use the block preallocation information held in its cache. When multiple clients allocate data blocks for their own files at the same time, the storage server can exploit the preallocation mechanism to keep the data of each target file as contiguous on disk as possible and avoid interleaving. The invention thereby solves the problem that, when multiple clients simultaneously allocate data blocks for their files, the data blocks the storage server allocates for the target files are interleaved and each file's data blocks are distributed discontinuously on disk.

Description

A file writing system and method for a distributed file system
Technical field
The present invention relates to file writing systems and methods in the field of computer storage, and in particular to a file writing system and method for a distributed file system.
Background art
A cluster is a structure made up of multiple interconnected, independent computers. Each computer may be a uniprocessor or multiprocessor system (PC, workstation, or SMP) and has its own memory, I/O devices, and operating system. To users and applications a cluster appears as a single system, and it provides a high-performance environment and fast, reliable service at low cost. Because of this price/performance advantage, the cluster has become the mainstream architecture for high-performance computing.
A cluster system is usually equipped with large-capacity storage devices, which must be managed while the system runs. The cluster system also needs to provide good file-sharing services to users on different cluster nodes. A distributed file system provides these services: it integrates all the storage devices in the cluster system and establishes a unified name space (the organizational structure of files and directories). Every node sees the same directory structure, and users on different nodes can access the same file transparently. Data in a distributed file system is not necessarily stored on the local node's disk, so dedicated storage servers are usually provided. Taking a write as an example, when an application process writes data through the distributed file system client, the client first sends the data over the network to the storage server, and the storage server then writes the received data to its own node's disk.
In the Linux operating system, a process accesses a file in the following sequence: (1) open, (2) read/write, (3) read/write, (4) read/write, ..., (n) close. That is, before accessing a file the process must first open it with an open operation, which returns a file descriptor fd. Once the file is open, the process can call the read/write functions with the file descriptor fd as a parameter to perform read and write operations. When reading and writing are finished, the process must close the file with a close operation.
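The open, write, ..., close sequence described above can be illustrated with a minimal sketch using Python's thin wrappers around the corresponding system calls. The helper name `write_chunks` and the file name `demo.dat` are assumptions made for illustration only.

```python
import os
import tempfile

def write_chunks(path, chunks):
    """Open once, issue several writes, close once; return the bytes written."""
    # open: obtain a file descriptor fd
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    total = 0
    try:
        for chunk in chunks:
            # write: every call reuses the same fd
            total += os.write(fd, chunk)
    finally:
        # close: release the descriptor when all writes are done
        os.close(fd)
    return total

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.dat")
    written = write_chunks(demo_path, [b"abc", b"defg"])
    size_on_disk = os.path.getsize(demo_path)
```

The point of the sketch is that a local writer keeps one descriptor open across all of its writes, which is what lets the local file system keep per-file state alive for the whole access.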
In a file system, the on-disk layout of files is closely tied to I/O performance. To improve overall I/O performance, existing file systems try, when allocating data blocks, to place the blocks of the same file contiguously on disk. This reduces head movement when file data is written and, when the file is read, lets file readahead work to full effect. However, when several processes in the system need to allocate data blocks for the files they are writing at the same time, they compete for the system's free blocks. A contiguous region of free blocks is then handed out alternately to several files, which reduces how contiguous each file's data blocks are on disk.
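The interleaving effect can be made concrete with a toy model. The sketch below assumes a single shared free region and strictly alternating requests from the writers; the function names `allocate_interleaved` and `count_extents` are hypothetical and do not correspond to any file system API.

```python
def allocate_interleaved(files, blocks_per_file):
    """Serve alternating requests from one shared free region (no reservation)."""
    layout = {name: [] for name in files}
    next_free = 0
    for _ in range(blocks_per_file):
        for name in files:
            # Each file simply takes the next free block.
            layout[name].append(next_free)
            next_free += 1
    return layout

def count_extents(blocks):
    """Count the contiguous runs in an ascending list of block numbers."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)

layout = allocate_interleaved(["A", "B"], 4)
extents_a = count_extents(layout["A"])
```

In this model file A receives blocks 0, 2, 4, 6, so every block is its own extent, whereas a fully contiguous file of the same size would occupy a single extent.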
To address this problem, the ext3 file system introduced a block reservation allocation mechanism that reduces the interference between concurrent processes during block allocation. ext3 maintains preallocation state information for each file that needs data blocks; in effect, it reserves a contiguous region of data blocks for each such file. When several files request data blocks at the same time, each is served from its own reserved region, so the blocks of different files are not interleaved and each file keeps a high degree of contiguity. The preallocation state of each file is also adjusted continually according to how processes access the file. For example, when the system detects that a process keeps allocating data blocks for a file sequentially, it enlarges the file's block reservation window, that is, it reserves more data blocks for the file. Conversely, if the process is not allocating blocks sequentially, the system shrinks the file's reservation window. These adjustments let the block reservation mechanism deliver the greatest benefit.
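The window adjustment described above can be sketched as follows. This is only an illustration of the idea: the class name `ReservationWindow`, the limits `MIN_SIZE` and `MAX_SIZE`, and the doubling and halving rule are assumptions, not the actual ext3 policy or data structure.

```python
class ReservationWindow:
    """Toy per-file reservation window that reacts to the allocation pattern."""
    MIN_SIZE = 8      # assumed lower bound, in blocks
    MAX_SIZE = 1024   # assumed upper bound, in blocks

    def __init__(self):
        self.size = self.MIN_SIZE
        self.last_logical = None  # last logical block allocated for the file

    def record_allocation(self, logical_block):
        """Grow the window on sequential allocation, shrink it otherwise."""
        if self.last_logical is not None:
            if logical_block == self.last_logical + 1:
                self.size = min(self.size * 2, self.MAX_SIZE)
            else:
                self.size = max(self.size // 2, self.MIN_SIZE)
        self.last_logical = logical_block
        return self.size

window = ReservationWindow()
sizes = [window.record_allocation(b) for b in (0, 1, 2, 100)]
```

Here the first allocation leaves the window at its initial size, the two sequential allocations enlarge it, and the jump to block 100 shrinks it again.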
The block reservation state of each file is kept in the ext3 inode structure corresponding to that file. The structure is initialized the first time data blocks are allocated for the file and is destroyed when the last writing process closes the file.
With this access pattern, performing multiple writes directly on a file in the ext3 file system requires only one open and one close operation; the flow is shown in Fig. 1. Throughout the access, the file system maintains the preallocation state information and preallocates data blocks according to it. In a distributed file system, however, the file a client accesses is not necessarily stored on a local disk. It corresponds instead to a target file in the local file system of the storage server; when the storage server is built on an ext3 local file system, the target file is a local file of type ext3.
In a distributed file system, the storage server's service thread has to open and close the target file for every write request sent by a client; the flow is shown in Fig. 2. When the target file is closed, its block preallocation state information is very likely to be destroyed, so the state cannot be carried over from one request to the next. When several client processes allocate data blocks for their files at the same time, the storage server cannot keep block preallocation state for each target file, and the block preallocation mechanism has no effect. The data of the target files ends up interleaved on disk, which degrades the performance of data writes and of subsequent reads.
To overcome these shortcomings, the prior art has adopted an object storage technology. With this technology, a client file, or part of the data of a file, corresponds to a data object on the object storage server. When handling a client's write request, the storage server does not allocate data blocks and write the data immediately. It first caches data destined for the same data object in memory and, once the cached data reaches a certain length, allocates one large contiguous region on disk for the data object and then performs the write. This delayed allocation improves the contiguity of a data object's blocks on disk.
However, system memory is limited. If the number of data objects waiting for block allocation reaches a certain level, data blocks must be allocated more frequently, and each contiguous run of blocks allocated to a data object becomes shorter. With many concurrent loads, therefore, the delayed allocation strategy does not work well. In addition, the storage server in this prior art requires dedicated hardware support, which places higher demands on system equipment and limits its practicality.
Summary of the invention
The object of the present invention is to provide a file writing system and method for a distributed file system that improve the write performance of the distributed file system and solve the problem that, when several client processes allocate data blocks for their files at the same time, the data blocks the storage server allocates for the target files are interleaved and the data blocks of each file are distributed discontinuously on disk.
To achieve the above object, the invention discloses a file writing method for a distributed file system, comprising the following steps:
Step 100: the distributed file system loads the storage server module and performs initialization;
Step 200: in response to a write request for a target file sent by a client of the distributed file system, the storage server uses the block preallocation descriptor cached for the target file to complete the write operation on the target file;
Step 300: a response message is built from the result of the write operation and sent to the client that issued the write request.
Preferably, step 200 comprises the following steps:
Step 210: a client of the distributed file system sends a write request for a target file to the storage server;
Step 220: the storage server obtains the relevant information of the target file from the information contained in the write request;
Step 230: according to the target file information in the client's write request, the storage server initializes a block preallocation descriptor for the target file and caches it in the storage server's memory; the ext3 local file system on the storage server reserves the corresponding data blocks for the target file according to the block preallocation descriptor;
Step 240: the storage server completes the write operation on the target file;
Step 250: the target file is closed.
Preferably, step 220 comprises the following steps:
Step 221: the file name of the target file is obtained by parsing the information contained in the write request;
Step 222: the target file is opened and its inode number (i-number) is obtained.
Preferably, in step 250, after the target file is closed, the block preallocation descriptor of the target file remains cached in the memory.
Preferably, in the above method, the cache of block preallocation descriptors is organized as a hash table.
The hash function of the hash table maps the inode number of the target file to an index into an array of hash buckets.
Each entry of the hash table records the address of the block preallocation descriptor of the target file, the address of the superblock of the file system to which the target file belongs, the reference count of the entry, the time the entry was last used, and the inode number of the target file.
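The entry layout and lookup described above might be modelled as follows. This is a sketch under stated assumptions: the names `HashEntry`, `DescriptorCache`, `bucket_index`, and `NUM_BUCKETS` are hypothetical, the modulo hash is just one possible function of the inode number, and matching on the superblock as well as the inode number is an assumption about how entries from different file systems would be told apart.

```python
import time
from dataclasses import dataclass, field

NUM_BUCKETS = 64  # assumed size of the hash bucket array

@dataclass
class HashEntry:
    descriptor: object   # stands in for the address of the block preallocation descriptor
    superblock: object   # stands in for the address of the file system's superblock
    ino: int             # inode number of the target file
    refcount: int = 0    # number of users of the descriptor
    last_used: float = field(default_factory=time.monotonic)

def bucket_index(ino, num_buckets=NUM_BUCKETS):
    """Map an inode number to an index into the bucket array (assumed: modulo)."""
    return ino % num_buckets

class DescriptorCache:
    def __init__(self, num_buckets=NUM_BUCKETS):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, ino, superblock, descriptor):
        entry = HashEntry(descriptor, superblock, ino)
        self.buckets[bucket_index(ino, len(self.buckets))].append(entry)
        return entry

    def lookup(self, ino, superblock):
        """Return the cached entry for (ino, superblock), or None if absent."""
        for entry in self.buckets[bucket_index(ino, len(self.buckets))]:
            if entry.ino == ino and entry.superblock is superblock:
                entry.last_used = time.monotonic()
                return entry
        return None

cache = DescriptorCache(num_buckets=4)
sb = object()
stored = cache.insert(5, sb, "descriptor-for-inode-5")
```

Because the entry is keyed by inode number rather than by an open file, it can outlive the open/close pair that the storage server performs for each request.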
Preferably, the above method further comprises step 400: a reclaim operation is performed periodically on the block preallocation descriptors, releasing the data blocks reserved for the target file and freeing the storage space of the block preallocation descriptor.
Preferably, in step 400, it is determined whether the reference count of the hash table entry equals 0 and whether the interval between the time of last use and the current time exceeds the maximum lifetime of a block preallocation descriptor; if both conditions hold, the data blocks reserved for the target file are released and the storage space of the block preallocation descriptor is freed.
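The two-part reclaim condition can be sketched in a few lines. The entry representation (plain dictionaries), the function name `reclaim`, the callback `release_blocks`, and the 30-second value of `MAX_LIFE_TIME` are all assumptions made for illustration; the source names a maximum lifetime but gives no value.

```python
MAX_LIFE_TIME = 30.0  # assumed maximum lifetime of an idle descriptor, in seconds

def reclaim(entries, now, release_blocks, max_life=MAX_LIFE_TIME):
    """Return the entries that survive one reclaim pass."""
    kept = []
    for entry in entries:
        idle = now - entry["last_used"]
        if entry["refcount"] == 0 and idle > max_life:
            # Both conditions hold: give back the reserved blocks and drop the entry.
            release_blocks(entry["descriptor"])
        else:
            kept.append(entry)
    return kept

released = []
entries = [
    {"descriptor": "a", "refcount": 0, "last_used": 0.0},   # idle and unused
    {"descriptor": "b", "refcount": 1, "last_used": 0.0},   # idle but still in use
    {"descriptor": "c", "refcount": 0, "last_used": 90.0},  # unused but recent
]
survivors = reclaim(entries, now=100.0, release_blocks=released.append)
```

Requiring both conditions means a descriptor that is still referenced by an in-flight write, or that was used recently, is never reclaimed.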
Preferably, the above method further comprises step 500: the storage server module is unloaded.
Preferably, in step 240, completing the write operation on the target file comprises the following steps:
Step 241: the inode of the target file is locked;
Step 242: the ext3-private inode of the target file in the ext3 local file system is found through the inode of the target file;
Step 243: it is determined whether the block preallocation descriptor pointer of the target file's ext3-private inode is NULL; if it is not NULL, go to step 246; if it is NULL, go to step 244;
Step 244: the key of the block preallocation descriptor in the hash table is generated from the inode number of the target file;
Step 245: the hash table is searched to determine whether a block preallocation descriptor has been cached for the target file; if one is found, it is assigned to the block preallocation descriptor pointer of the target file's ext3-private inode, the reference count of the hash node to which the descriptor belongs is incremented by 1, and the method goes to step 246; if none is found, the method goes directly to step 246;
Step 246: the system function generic_file_aio_write_nolock() is called to complete the write operation; if no cached block preallocation descriptor was found for the target file in step 245, the system automatically allocates one for the target file and assigns it to the block preallocation descriptor pointer in the target file's ext3-private inode structure;
Step 247: it is determined whether a block preallocation descriptor was found for the target file in step 245; if so, the reference count of the hash node to which the descriptor belongs is decremented by 1 and the method goes to step 248; if none was found in step 245 and, after the allocation and assignment in step 246, the block preallocation descriptor pointer in the target file's ext3-private inode structure is not NULL, the descriptor it points to is inserted into the cache and the method goes to step 248;
Step 248: the block preallocation descriptor pointer of the target file's ext3-private inode is set to NULL to prevent the system from releasing the descriptor;
Step 249: the inode of the target file is unlocked.
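Steps 241 to 249 can be traced in a small user-space model. Everything here is a stand-in: `Inode`, `Entry`, `local_write`, and `handle_write` are hypothetical names, a plain dictionary keyed by inode number replaces the hash table, and `local_write` only imitates the behavior the text attributes to generic_file_aio_write_nolock() (allocating a descriptor when the pointer is NULL). It is not kernel code.

```python
import threading

class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.lock = threading.Lock()
        self.rsv = None  # models the block preallocation descriptor pointer

class Entry:
    def __init__(self, descriptor):
        self.descriptor = descriptor
        self.refcount = 0

def local_write(inode, data):
    """Stand-in for the local write; creates a descriptor if none is attached."""
    if inode.rsv is None:
        inode.rsv = {"ino": inode.ino, "written": 0}
    inode.rsv["written"] += len(data)
    return len(data)

def handle_write(cache, inode, data):
    with inode.lock:                          # step 241 (lock) and step 249 (unlock)
        entry = None
        if inode.rsv is None:                 # step 243
            entry = cache.get(inode.ino)      # steps 244 and 245: look up by inode number
            if entry is not None:
                inode.rsv = entry.descriptor  # attach the cached descriptor
                entry.refcount += 1
        written = local_write(inode, data)    # step 246
        if entry is not None:                 # step 247: cached descriptor was used
            entry.refcount -= 1
        elif inode.rsv is not None:           # step 247: cache the newly created one
            cache.setdefault(inode.ino, Entry(inode.rsv))
        inode.rsv = None                      # step 248: detach so close cannot destroy it
    return written

cache = {}
target = Inode(7)
first = handle_write(cache, target, b"abcd")
first_descriptor = cache[7].descriptor
second = handle_write(cache, target, b"xy")
```

The second call finds and reuses the descriptor created by the first, which is the state that would otherwise be lost when the target file is closed between requests.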
To achieve the above object, the invention also discloses a distributed file system comprising a storage server and a client;
The storage server includes an ext3 local file system, a block preallocation descriptor management module, and a write request processing module;
The block preallocation descriptor management module is used, when a write request for a target file is received from a client, to initialize a block preallocation descriptor for the target file, and to manage and reclaim block preallocation descriptors;
The write request processing module is used to obtain the relevant information of the target file according to a write request for the target file sent by the client and, according to the block preallocation descriptor of the target file managed by the block preallocation descriptor management module, to allocate data blocks for the write request and perform the write operation;
The client is used to send write requests to the storage server.
Preferably, the write request processing module is also used, after the write operation is completed, to build a response message from the result of the write operation and send it to the client that issued the write request;
The client is also used to receive the response message sent by the write request processing module.
Preferably, the block preallocation descriptors are organized in a hash table;
The hash function of the hash table maps the inode number of the target file to an index into an array of hash buckets;
Each entry of the hash table records the address of the block preallocation descriptor of the target file, the address of the superblock of the file system to which the target file belongs, the reference count of the entry, the time the entry was last used, and the inode number of the target file.
Preferably, in the above system, the relevant information of the target file is the inode number of the target file.
Preferably, in the above system, the write request processing module is also used to release the data blocks reserved for the target file when the reference count of the hash table entry equals 0 and the interval between the time of last use and the current time exceeds the maximum lifetime of a block preallocation descriptor.
Preferably, in the above system, the block preallocation descriptor management module is also used to free the storage space of the block preallocation descriptor when the reference count of the hash table entry equals 0 and the interval between the time of last use and the current time exceeds the maximum lifetime of a block preallocation descriptor.
Preferably, in the above system, the write request processing module is also used, when performing a write operation on the target file, to carry out the following steps:
Step 001: the inode of the target file is locked;
Step 002: the ext3-private inode of the target file in the ext3 local file system is found through the inode of the target file;
Step 003: it is determined whether the block preallocation descriptor pointer of the target file's ext3-private inode is NULL; if it is not NULL, go to step 006; if it is NULL, go to step 004;
Step 004: the key of the block preallocation descriptor in the hash table is generated from the inode number of the target file;
Step 005: the hash table is searched to determine whether a block preallocation descriptor has been cached for the target file; if one is found, it is assigned to the block preallocation descriptor pointer of the target file's ext3-private inode, the reference count of the hash node to which the descriptor belongs is incremented by 1, and the module goes to step 006; if none is found, the module goes directly to step 006;
Step 006: the system function generic_file_aio_write_nolock() is called to complete the write operation; if no cached block preallocation descriptor was found for the target file in step 005, the system automatically allocates one for the target file and assigns it to the block preallocation descriptor pointer in the target file's ext3-private inode structure;
Step 007: it is determined whether a block preallocation descriptor was found for the target file in step 005; if so, the reference count of the hash node to which the descriptor belongs is decremented by 1 and the module goes to step 008; if none was found in step 005 and, after the allocation and assignment in step 006, the block preallocation descriptor pointer in the target file's ext3-private inode structure is not NULL, the descriptor it points to is inserted into the cache and the module goes to step 008;
Step 008: the block preallocation descriptor pointer of the target file's ext3-private inode is set to NULL to prevent the system from releasing the descriptor;
Step 009: the inode of the target file is unlocked.
The file writing system and method for a distributed file system of the present invention have the following beneficial effect: when multiple client loads need to allocate data blocks for their files at the same time, the storage server can still benefit from the block reservation allocation strategy, so the data blocks of each target file are stored as contiguously as possible on disk. This contiguous layout improves performance when data is written back and also improves sequential read performance.
The invention is described in detail below with reference to the drawings and specific embodiments, which are not to be taken as limiting the invention.
Description of drawings
Fig. 1 is a schematic diagram of a write operation in a prior-art local file system;
Fig. 2 is a schematic diagram of a write operation in a prior-art distributed file system;
Fig. 3 is a flowchart of the file writing method of the distributed file system of the invention;
Fig. 4 is a diagram of the cache organization of the block preallocation descriptors of the invention;
Fig. 5 is a structural diagram of the distributed file system of the invention.
Embodiment
To make the object, technical solution, and advantages of the invention clearer, the file writing system and method for a distributed file system are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
The file writing system and method of the invention target distributed file systems whose storage server uses ext3 as its local file system; the distributed file system may be any existing distributed file system that meets this requirement. The invention mainly uses the block reservation allocation mechanism of the ext3 file system to improve the write performance of the distributed file system.
Referring to Fig. 3, which is a flowchart of the writing method of the distributed file system of the invention, the method comprises the following steps:
Step S100: the distributed file system loads the storage server module and performs initialization.
Step S200: a client of the distributed file system sends a write request for a target file to the storage server. The write request contains the identifier of the target file to be accessed, the starting position of the write relative to the target file, the length of the write, and similar information.
Step S300: the storage server obtains the file name of the target file from the information contained in the write request, opens the target file, and obtains its inode number (i-number). The inode number is the number of the file's inode and uniquely identifies the file within the file system.
Step S400: after the ext3 local file system on the storage server initializes a block preallocation descriptor for the target file, the storage server caches the descriptor in its memory and reserves the corresponding data blocks for the target file according to the descriptor. After the storage server closes the target file, the block preallocation descriptor of the target file remains cached in memory.
The organization of the block preallocation descriptor cache is shown in Fig. 4. In one embodiment of the invention, block preallocation descriptors are cached in a hash table, but this is not a limitation of the invention.
In the embodiment, the hash table that caches all block preallocation descriptors has a fixed number of hash buckets, each bucket can link at most a fixed number of hash table entries, and the entries within each bucket are linked in an LRU list. The fixed counts and the LRU list are given only as examples and do not limit the invention; in practice, the counts need not be fixed and other list structures may be used. The hash function maps the inode number of the target file to an index into the array of hash buckets. Each hash table entry records the address of the block preallocation descriptor of the target file, the address of the superblock of the file system to which the target file belongs, the reference count of the entry, the time the entry was last used, and the inode number of the target file.
Step S500: the storage server retrieves from memory the block preallocation descriptor cached for the target file and allocates data blocks for the target file according to it, thereby completing the write operation on the target file.
Step S600: a response message is built from the result of the write operation and sent to the client that issued the write request.
Step S700: block preallocation descriptors are reclaimed periodically. During a reclaim operation, the reclaim thread traverses every hash table entry and determines whether the entry's reference count equals 0 and whether the interval between the time of last use and the current time exceeds the maximum lifetime MAX_LIFE_TIME of a block preallocation descriptor. If both conditions hold, all data blocks reserved for the target file are released and the storage space of the hash table entry is freed. The reference count indicates the number of processes currently using the block preallocation descriptor.
Step S800: the distributed file system unloads the storage server module.
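The bucket organization used in the embodiment (a bounded number of entries per bucket, linked in LRU order) can be approximated with an ordered dictionary. The class name `LruBucket` is hypothetical, and evicting the least recently used entry when a bucket is full is an assumption: the text only says that the maximum entry count is fixed and that entries are linked in an LRU list. A fuller model would also have to refuse to evict entries whose reference count is not 0.

```python
from collections import OrderedDict

class LruBucket:
    """One hash bucket holding at most max_entries descriptors in LRU order."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # inode number -> descriptor, oldest first

    def get(self, ino):
        if ino not in self.entries:
            return None
        self.entries.move_to_end(ino)  # mark as most recently used
        return self.entries[ino]

    def put(self, ino, descriptor):
        """Insert or refresh an entry; return the evicted (ino, descriptor) or None."""
        evicted = None
        if ino not in self.entries and len(self.entries) >= self.max_entries:
            evicted = self.entries.popitem(last=False)  # drop the least recently used
        self.entries[ino] = descriptor
        self.entries.move_to_end(ino)
        return evicted

bucket = LruBucket(max_entries=2)
bucket.put(1, "desc-1")
bucket.put(2, "desc-2")
hit = bucket.get(1)               # inode 1 becomes the most recently used entry
evicted = bucket.put(3, "desc-3")
```

Bounding each bucket keeps the memory used by cached descriptors, and the disk space held by their reservations, under control without waiting for the periodic reclaim pass.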
Wherein, among the described step S500, further may further comprise the steps:
Step S501 locks to the index node inode of described file destination.
Step S502, the index node inode by described file destination finds the privately owned index node inode of described file destination in the ext3 local file system.
Step S503 judges whether the piece predistribution descriptor pointer in the privately owned index node inode of the ext3 structure of file destination is NULL, if be not NULL, enters step S506, if be NULL, then enters step S504.
Step S504: generate, from the i-number of the target file, the key of the block preallocation descriptor in the hash table.
Step S505: search the hash table to determine whether a block preallocation descriptor has been cached for the target file. If one is found, assign it to the block preallocation descriptor pointer in the ext3 private inode structure of the target file, increment by 1 the reference count of the hash node to which the descriptor belongs, and proceed to step S506. If no block preallocation descriptor cached for the target file is found, proceed directly to step S506.
Step S506: call the system function generic_file_aio_write_nolock() to complete the write operation. During the write, if no block preallocation descriptor cached for the target file was found in step S505, the system automatically allocates a block preallocation descriptor for the target file and assigns it to the block preallocation descriptor pointer in the ext3 private inode structure of the target file.
Step S507: determine whether a block preallocation descriptor was found for the target file in step S505. If so, decrement by 1 the reference count of the hash node to which the descriptor belongs, and proceed to step S508. If no block preallocation descriptor was found for the target file in step S505, and, as a result of the allocation and assignment in step S506, the block preallocation descriptor pointer in the ext3 private inode structure of the target file is not NULL, insert the descriptor that this pointer refers to into the cache, and proceed to step S508.
Step S508: set the block preallocation descriptor pointer in the ext3 private inode structure of the target file to NULL, to prevent the system from releasing the descriptor.
Step S509: unlock the inode of the target file.
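The write path in steps S504 to S509 can be sketched in user-space Python as follows. This is only an illustrative model, not the kernel implementation: the names StorageServer, HashNode, Inode, PreallocDescriptor and WINDOW are hypothetical, a plain dictionary stands in for the hash table, and _do_write() is merely a stand-in for generic_file_aio_write_nolock().

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PreallocDescriptor:
    # Hypothetical stand-in for a block preallocation descriptor:
    # a reserved window of `length` blocks starting at block `start`.
    start: int
    length: int


@dataclass
class HashNode:
    # Cache node holding one descriptor and its reference count.
    descriptor: PreallocDescriptor
    refcount: int = 0


@dataclass
class Inode:
    i_number: int
    lock: threading.Lock = field(default_factory=threading.Lock)
    # Models the descriptor pointer in the ext3 private inode structure.
    prealloc: Optional[PreallocDescriptor] = None


class StorageServer:
    WINDOW = 8  # assumed reservation size, in blocks

    def __init__(self) -> None:
        self.cache: Dict[int, HashNode] = {}  # keyed by i-number
        self.next_free = 0

    def _do_write(self, inode: Inode) -> int:
        # Stand-in for generic_file_aio_write_nolock(): if no descriptor
        # is attached, a new one is allocated automatically.
        if inode.prealloc is None:
            inode.prealloc = PreallocDescriptor(self.next_free, self.WINDOW)
            self.next_free += self.WINDOW
        return inode.prealloc.start

    def write(self, inode: Inode) -> int:
        with inode.lock:                            # inode locked for the whole write
            node = None
            if inode.prealloc is None:
                key = inode.i_number                # S504: derive the cache key
                node = self.cache.get(key)          # S505: look up the cache
                if node is not None:
                    inode.prealloc = node.descriptor
                    node.refcount += 1
            start = self._do_write(inode)           # S506: perform the write
            if node is not None:                    # S507: cached descriptor was used
                node.refcount -= 1
            elif inode.prealloc is not None:        # S507: cache the new descriptor
                self.cache[inode.i_number] = HashNode(inode.prealloc)
            inode.prealloc = None                   # S508: detach so it is not freed
            return start                            # S509: lock released on exit
```

In this model, two successive writes to the same file reuse the same reserved window even though the descriptor pointer is cleared after each request, which is the behaviour the steps are meant to achieve.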
Figure 5 is a structural diagram of a distributed file system provided by the present invention. As shown in Figure 5, the distributed file system 1 of the present invention comprises a storage server 10 and a client 20, and the local file system in the storage server 10 is an ext3 local file system 100. The storage server 10 includes a preallocation descriptor management module 110 and a write request processing module 120.
The preallocation descriptor management module 110 is used to initialize, manage, and reclaim preallocation descriptors.
The write request processing module 120 is used to handle write requests sent by the client 20 of the distributed file system 1, performing operations such as parsing the request, writing the data, and replying with a response message.
When the distributed file system 1 is running, the client 20 sends a write request for a target file to the storage server 10. The write request processing module 120 obtains the file name of the target file from the information contained in the write request, opens the target file, and obtains the i-number of the target file. The preallocation descriptor management module 110 initializes a preallocation descriptor for the target file and caches it in the memory of the storage server 10. The ext3 local file system 100 reserves the corresponding data blocks for the target file according to the preallocation descriptor. After the storage server 10 closes the target file at the end of the write, the preallocation descriptor of the target file remains cached in memory.
The cache organization of the preallocation descriptors is shown in Figure 4. In one embodiment of the present invention, the preallocation descriptors are cached in a hash table, although this is not a limitation of the invention. In this embodiment, the hash table that caches all block preallocation descriptors has a fixed number of hash buckets, and the maximum number of hash entries that each bucket can hold is also fixed; the hash entries within each bucket are linked by an LRU list. The fixed counts and the LRU list are examples only and do not limit the invention; in practice, variable counts and other list structures may also be used. The hash function of the hash table maps the i-number of the target file to an index into the hash bucket array. Each hash entry records the address of the block preallocation descriptor of the target file, the address of the superblock of the file system to which the target file belongs, the reference count of the hash entry, the time at which the hash entry was last used, and the i-number of the target file.
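A minimal sketch of such a cache is given below, assuming a modulo hash function and one LRU-ordered bucket per hash slot. The names DescriptorCache, HashEntry, NUM_BUCKETS and MAX_PER_BUCKET are hypothetical, as is the choice of keying entries by the pair (superblock, i-number). The text does not say what happens when a bucket is full; evicting the least recently used unreferenced entry is an assumption made here for illustration.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass

NUM_BUCKETS = 8      # assumed fixed number of hash buckets
MAX_PER_BUCKET = 4   # assumed fixed maximum number of entries per bucket


@dataclass
class HashEntry:
    # Fields recorded by each hash entry, as listed in the description.
    i_number: int
    descriptor: object   # stands in for the descriptor's address
    superblock: object   # stands in for the superblock's address
    refcount: int = 0
    last_used: float = 0.0


class DescriptorCache:
    def __init__(self, num_buckets=NUM_BUCKETS, max_per_bucket=MAX_PER_BUCKET):
        # Each bucket is an OrderedDict kept in LRU order (oldest first).
        self.buckets = [OrderedDict() for _ in range(num_buckets)]
        self.max_per_bucket = max_per_bucket

    def _bucket(self, i_number):
        # Hash function: i-number -> index into the bucket array.
        return self.buckets[i_number % len(self.buckets)]

    def lookup(self, i_number, superblock, now=None):
        bucket = self._bucket(i_number)
        key = (superblock, i_number)
        entry = bucket.get(key)
        if entry is not None:
            entry.last_used = time.time() if now is None else now
            bucket.move_to_end(key)  # mark as most recently used
        return entry

    def insert(self, i_number, superblock, descriptor, now=None):
        bucket = self._bucket(i_number)
        key = (superblock, i_number)
        if key not in bucket and len(bucket) >= self.max_per_bucket:
            # Assumed policy: evict the least recently used entry
            # that no process is currently referencing.
            victim = next((k for k, e in bucket.items() if e.refcount == 0), None)
            if victim is None:
                return False  # every entry is in use; do not cache
            del bucket[victim]
        stamp = time.time() if now is None else now
        bucket[key] = HashEntry(i_number, descriptor, superblock, 0, stamp)
        bucket.move_to_end(key)
        return True
```

Storing the superblock alongside the i-number lets the cache distinguish files that share an i-number on different file systems, which is presumably why the entry records it.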
After the write operation completes, the write request processing module 120 encapsulates a response message according to the result of the write operation and sends the response message to the client that issued the write request.
Finally, because data blocks reserved for a target file cannot be allocated to other files, unused preallocation descriptors must be released periodically so that the data blocks reserved for the target file are freed. The preallocation descriptor management module 110 is therefore also used to periodically reclaim block preallocation descriptors. To perform reclamation, the preallocation descriptor management module 110 starts a reclaim thread that traverses every hash entry and checks whether the reference count of the entry equals 0 and whether the interval between its last-use time and the current time is greater than the maximum lifetime MAX_LIFE_TIME of a block preallocation descriptor. If both conditions are satisfied, the preallocation descriptor management module 110 releases all data blocks reserved for the target file and frees the storage space of the hash entry. The reference count indicates the number of processes currently using the preallocation descriptor.
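One pass of the reclaim thread can be sketched as follows. The value of MAX_LIFE_TIME, the HashEntry layout and the free_blocks set are assumptions for illustration; the description only fixes the two conditions (reference count equal to 0, idle time greater than MAX_LIFE_TIME) and the two actions taken when both hold.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

MAX_LIFE_TIME = 30.0  # assumed value, in seconds


@dataclass
class HashEntry:
    i_number: int
    reserved_blocks: List[int]  # blocks reserved through the descriptor
    refcount: int               # processes currently using the descriptor
    last_used: float            # time of last use


def reclaim(entries: Dict[int, HashEntry], free_blocks: Set[int],
            now: float, max_life_time: float = MAX_LIFE_TIME) -> List[int]:
    """Run one reclaim pass and return the i-numbers that were reclaimed."""
    reclaimed = []
    for i_number in list(entries):  # copy keys: entries are deleted while looping
        entry = entries[i_number]
        idle = now - entry.last_used
        if entry.refcount == 0 and idle > max_life_time:
            free_blocks.update(entry.reserved_blocks)  # release reserved blocks
            del entries[i_number]                      # free the hash entry
            reclaimed.append(i_number)
    return reclaimed
```

An entry that is still referenced, or that was used recently, survives the pass, so a descriptor in the middle of a write is never reclaimed.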
While the above system is running, the storage server retrieves from the cache the block preallocation descriptor cached for the target file and allocates data blocks for the target file according to it, thereby completing the write operation on the target file. The operations performed are as described above for the file writing method of the distributed file system and are therefore not repeated here.
With the file writing system and method of the distributed file system of the present invention, when the storage server needs to allocate data blocks for a target file, it can allocate them directly from the preallocation information in the cache. Because the preallocation state of every write request is kept in the cache, that state is carried over from one request to the next. When multiple client processes allocate data blocks for their own files at the same time, the storage server can take full advantage of the block preallocation mechanism, so that the data of each target file is stored as contiguously as possible on disk and interleaving is avoided.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art can make corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the protection scope of the appended claims.
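The effect on disk layout can be illustrated with a toy allocator. This is a simplified model under stated assumptions, not a measurement: each write request allocates one block, requests from two files arrive alternately, and the function names and window size are hypothetical. With window=0 no reservation state survives between requests; with a positive window each file keeps its reserved range across requests, as the cached descriptors allow.

```python
def allocate(requests, window=0):
    """Return {file: [blocks]} for a sequence of one-block write requests."""
    next_free = 0
    reservations = {}  # file -> [next block in window, end of window)
    layout = {}
    for f in requests:
        if window == 0:
            # No retained reservation: take the next free block on disk.
            block = next_free
            next_free += 1
        else:
            res = reservations.get(f)
            if res is None or res[0] >= res[1]:
                # Reserve a fresh window for this file.
                res = [next_free, next_free + window]
                next_free += window
                reservations[f] = res
            block = res[0]
            res[0] += 1
        layout.setdefault(f, []).append(block)
    return layout


def is_contiguous(blocks):
    # True when every block directly follows the previous one.
    return all(b - a == 1 for a, b in zip(blocks, blocks[1:]))
```

For the request sequence A, B, A, B, ... the first mode places the two files on alternating blocks, while the second gives each file its own contiguous run.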

Claims (16)

1. A file writing method for a distributed file system, characterized by comprising the following steps:
Step 100: the distributed file system loads the storage server module and performs initialization;
Step 200: in response to a write request for a target file sent by a client in the distributed file system, the storage server uses the block preallocation descriptor cached for the target file to complete the write operation on the target file;
Step 300: a response message is encapsulated according to the result of the write operation and sent to the client from which the write request originated.
2. The file writing method for a distributed file system according to claim 1, characterized in that step 200 comprises the following steps:
Step 210: a client in the distributed file system sends a write request for a target file to the storage server;
Step 220: the storage server obtains the relevant information of the target file from the information contained in the write request;
Step 230: according to the target file information in the client's write request, the storage server initializes a block preallocation descriptor for the target file and caches the preallocation descriptor in the memory of the storage server, and the ext3 local file system in the storage server reserves the corresponding data blocks for the target file according to the preallocation descriptor;
Step 240: the storage server completes the write operation on the target file;
Step 250: the target file is closed.
3. The file writing method for a distributed file system according to claim 2, characterized in that step 220 comprises the following steps:
Step 221: the file name of the target file is obtained by parsing the information contained in the write request;
Step 222: the target file is opened and the i-number of the target file is obtained.
4. The file writing method for a distributed file system according to claim 2, characterized in that, in step 250, after the target file is closed, the preallocation descriptor of the target file remains cached in the memory.
5. The file writing method for a distributed file system according to claim 2, characterized in that the cache organization of the preallocation descriptors is a hash table;
the hash function of the hash table maps the i-number of the target file to an index into the hash bucket array;
each hash entry of the hash table records the address of the block preallocation descriptor of the target file, the address of the superblock of the file system to which the target file belongs, the reference count of the hash entry, the time at which the hash entry was last used, and the i-number of the target file.
6. The file writing method for a distributed file system according to claim 1, characterized by further comprising step 400: periodically performing a reclaim operation on the preallocation descriptor, releasing the data blocks reserved for the target file, and freeing the storage space of the preallocation descriptor.
7. The file writing method for a distributed file system according to claim 6, characterized in that, in step 400, it is determined whether the reference count of the hash entry equals 0 and whether the interval between its last-use time and the current time is greater than the maximum lifetime of a block preallocation descriptor; if both conditions are satisfied, the data blocks reserved for the target file are released and the storage space of the preallocation descriptor is freed.
8. The file writing method for a distributed file system according to claim 1, characterized by further comprising step 500: unloading the storage server module.
9. The file writing method for a distributed file system according to claim 2, characterized in that, in step 240, completing the write operation on the target file comprises the following steps:
Step 241: lock the inode of the target file;
Step 242: through the inode of the target file, find the private inode of the target file in the ext3 local file system;
Step 243: determine whether the block preallocation descriptor pointer of the ext3 private inode of the target file is NULL; if it is not NULL, proceed to step 246; if it is NULL, proceed to step 244;
Step 244: generate, from the i-number of the target file, the key of the preallocation descriptor in the hash table;
Step 245: search the hash table to determine whether a block preallocation descriptor has been cached for the target file; if one is found, assign it to the block preallocation descriptor pointer of the ext3 private inode of the target file, increment by 1 the reference count of the hash node to which the descriptor belongs, and proceed to step 246; if no block preallocation descriptor cached for the target file is found, proceed directly to step 246;
Step 246: call the system function generic_file_aio_write_nolock() to complete the write operation; if no block preallocation descriptor cached for the target file was found in step 245, the system automatically allocates a block preallocation descriptor for the target file and assigns it to the block preallocation descriptor pointer in the ext3 private inode structure of the target file;
Step 247: determine whether a block preallocation descriptor was found for the target file in step 245; if so, decrement by 1 the reference count of the hash node to which the descriptor belongs and proceed to step 248; if no block preallocation descriptor was found for the target file in step 245, and, as a result of the allocation and assignment in step 246, the block preallocation descriptor pointer in the ext3 private inode structure of the target file is not NULL, insert the descriptor that this pointer refers to into the cache and proceed to step 248;
Step 248: set the block preallocation descriptor pointer of the ext3 private inode of the target file to NULL, to prevent the system from releasing the descriptor;
Step 249: unlock the inode of the target file.
10. A distributed file system, characterized by comprising a storage server and a client;
the storage server includes an ext3 local file system, a preallocation descriptor management module, and a write request processing module;
the preallocation descriptor management module is used, when a write request for a target file is received from the client, to initialize a preallocation descriptor for the target file, and to manage and reclaim the preallocation descriptor;
the write request processing module is used to obtain the relevant information of the target file according to the write request for the target file sent by the client and, according to the block preallocation descriptor of the target file managed by the preallocation descriptor management module, to allocate data blocks for the write request and perform the write operation;
the client is used to send write requests to the storage server.
11. The distributed file system according to claim 10, characterized in that the write request processing module is further used, after the write operation completes, to encapsulate a response message according to the result of the write operation and send the response message to the client from which the write request originated;
the client is further used to receive the response message sent by the write request processing module.
12. The distributed file system according to claim 10, characterized in that the organization of the preallocation descriptors is a hash table;
the hash function of the hash table maps the i-number of the target file to an index into the hash bucket array;
each hash entry of the hash table records the address of the block preallocation descriptor of the target file, the address of the superblock of the file system to which the target file belongs, the reference count of the hash entry, the time at which the hash entry was last used, and the i-number of the target file.
13. The distributed file system according to claim 10, characterized in that the relevant information of the target file is the i-number of the target file.
14. The distributed file system according to claim 12, characterized in that the write request processing module is further used to release the data blocks reserved for the target file when the reference count of the hash entry equals 0 and the interval between its last-use time and the current time is greater than the maximum lifetime of a block preallocation descriptor.
15. The distributed file system according to claim 12, characterized in that the preallocation descriptor management module is further used to free the storage space of the preallocation descriptor when the reference count of the hash entry equals 0 and the interval between its last-use time and the current time is greater than the maximum lifetime of a block preallocation descriptor.
16. The distributed file system according to claim 10, characterized in that the write request processing module is further used, when performing the write operation on the target file, to carry out the following steps:
Step 001: lock the inode of the target file;
Step 002: through the inode of the target file, find the private inode of the target file in the ext3 local file system;
Step 003: determine whether the block preallocation descriptor pointer of the ext3 private inode of the target file is NULL; if it is not NULL, proceed to step 006; if it is NULL, proceed to step 004;
Step 004: generate, from the i-number of the target file, the key of the preallocation descriptor in the hash table;
Step 005: search the hash table to determine whether a block preallocation descriptor has been cached for the target file; if one is found, assign it to the block preallocation descriptor pointer of the ext3 private inode of the target file, increment by 1 the reference count of the hash node to which the descriptor belongs, and proceed to step 006; if no block preallocation descriptor cached for the target file is found, proceed directly to step 006;
Step 006: call the system function generic_file_aio_write_nolock() to complete the write operation; if no block preallocation descriptor cached for the target file was found in step 005, the system automatically allocates a block preallocation descriptor for the target file and assigns it to the block preallocation descriptor pointer in the ext3 private inode structure of the target file;
Step 007: determine whether a block preallocation descriptor was found for the target file in step 005; if so, decrement by 1 the reference count of the hash node to which the descriptor belongs and proceed to step 008; if no block preallocation descriptor was found for the target file in step 005, and, as a result of the allocation and assignment in step 006, the block preallocation descriptor pointer in the ext3 private inode structure of the target file is not NULL, insert the descriptor that this pointer refers to into the cache and proceed to step 008;
Step 008: set the block preallocation descriptor pointer of the ext3 private inode of the target file to NULL, to prevent the system from releasing the descriptor;
Step 009: unlock the inode of the target file.
CNB2007101679005A 2007-10-25 2007-10-25 Distributed file system file writing system and method Expired - Fee Related CN100517335C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101679005A CN100517335C (en) 2007-10-25 2007-10-25 Distributed file system file writing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101679005A CN100517335C (en) 2007-10-25 2007-10-25 Distributed file system file writing system and method

Publications (2)

Publication Number Publication Date
CN101149755A true CN101149755A (en) 2008-03-26
CN100517335C CN100517335C (en) 2009-07-22

Family

ID=39250281

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101679005A Expired - Fee Related CN100517335C (en) 2007-10-25 2007-10-25 Distributed file system file writing system and method

Country Status (1)

Country Link
CN (1) CN100517335C (en)

Also Published As

Publication number Publication date
CN100517335C (en) 2009-07-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090722

Termination date: 20191025
