CN102147802A - Pseudo-random type NFS application acceleration system - Google Patents

Pseudo-random type NFS application acceleration system

Info

Publication number
CN102147802A
CN102147802A
Authority
CN
China
Prior art keywords
nfs
module
client
pseudorandom
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010106117218A
Other languages
Chinese (zh)
Other versions
CN102147802B (en)
Inventor
骆志军
许建卫
袁清波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd filed Critical Dawning Information Industry Beijing Co Ltd
Priority to CN 201010611721 priority Critical patent/CN102147802B/en
Publication of CN102147802A publication Critical patent/CN102147802A/en
Application granted granted Critical
Publication of CN102147802B publication Critical patent/CN102147802B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a pseudo-random network file system (NFS) application acceleration system comprising two kernel modules: pfnfsd (prefetch network file system daemon) and dmcache (device mapper cache). pfnfsd is a rewrite of the kernel module nfsd; it mainly comprises an NFS client request positioning module and a prefetch module, and is used to capture and locate NFS requests from clients and to perform prefetch operations. dmcache is a target driver implemented on top of the Linux kernel device mapper mechanism; it mainly comprises modules for managing the high-speed storage device, mapping addresses, and reclaiming resources, and it stores the prefetched data on the high-performance storage device. The system is transparent to applications, requires only small changes to the kernel, and can attach additional high-speed storage devices.

Description

Pseudo-random NFS application acceleration system
Technical field
The present invention relates to the field of storage system I/O performance optimization, and in particular to a pseudo-random NFS application acceleration system.
Background technology
The Network File System (NFS) is often the system of choice for cluster network storage because of its stability, ease of use, and free, open-source nature. Although many other network storage products exist today, they either cannot match NFS in stability and ease of use, or are expensive commercial products ill-suited to small and medium-scale cluster systems; for most cluster systems, therefore, NFS remains irreplaceable.
As the NFS framework has matured, NFS has spread into many application areas, such as high-performance computing, IPTV (Internet Protocol Television), sensor information processing, and petroleum exploration. However, with the rapid growth of processor speed, the I/O bandwidth that NFS provides has gradually become a bottleneck of cluster systems.
Research on the I/O performance of the NFS file system has made effective progress, particularly for applications with regular I/O access patterns, such as IPTV, whose access patterns tend to be sequential and predictable. By extracting the access patterns of such applications and applying techniques such as caching, prefetching, or I/O parallelization, the I/O performance of NFS can be improved effectively.
However, for applications with random access patterns, there is very little research on improving the I/O performance of the NFS server. In applications such as petroleum exploration, both the computation and the data volume are very large, and the data operations are mainly random reads; the random-read access sequence, however, can be obtained in advance.
Device Mapper is a mapping framework from logical devices to physical devices introduced in the Linux 2.6 kernel; it performs work such as I/O request redirection through modular target driver plug-ins. By means of the Device Mapper mechanism, a low-speed storage device such as a hard disk and high-speed storage devices such as a ramdisk or an SSD can be mapped into a single logical block device.
Summary of the invention
For applications with the above random access pattern, the present invention proposes a pseudo-random NFS application acceleration system.
A pseudo-random NFS application acceleration system comprises two kernel modules, pfnfsd and dmcache;
wherein pfnfsd is a rewrite of the kernel module nfsd, mainly comprising an NFS client request positioning module and a prefetch module, and is used to capture and locate NFS requests from clients and to perform prefetch operations;
and dmcache is a target driver implemented on top of the Linux kernel device mapper mechanism, mainly comprising modules for managing the high-speed storage device, mapping addresses, and reclaiming resources, and is responsible for storing the prefetched data on the high-performance storage device.
In a first preferred technical scheme of the present invention, the pfnfsd module builds an index table in memory for each client according to that client's index file and maintains a pointer to the index position corresponding to the client's last actual request; it then looks up the index table for each incoming NFS client read request to determine the position of the client's current request in the index table.
In a further preferred technical scheme, after determining the client's read position, the pfnfsd module can synchronize with the prefetch thread and can also determine which cache blocks of the dmcache module can be reclaimed.
In a second preferred technical scheme, the prefetch module of pfnfsd uses an in-kernel prefetch thread to prefetch data according to the index table.
In a third preferred technical scheme, the system further comprises a prefetch reorder window, and the prefetch module directly reads the sorted index table.
In a fourth preferred technical scheme, the dmcache module maps the disk and the high-speed storage device into one block device, wherein the high-speed storage device is used to store the prefetched data; its capacity is hidden from the outside, and it acts as a cache of the disk.
In a fifth preferred technical scheme, dmcache performs address mapping and I/O redirection between the storage devices and the logical device, and manages the cache device.
In a sixth preferred technical scheme, when the prefetched data and the data read by the NFS client are in the same order, data blocks are reclaimed with a recovery algorithm.
In a further preferred technical scheme, the recovery algorithm is: each time the NFS client reads through the data of one I/O reorder window, the data blocks of one window are reclaimed.
The beneficial effects of the present invention are as follows:
Transparency: because NPRP is a kernel module loaded on the NFS server side, no changes to client applications or client kernels are required, so the system is transparent to applications;
Small changes to the Linux kernel: NPRP inserts a single function into the NFS kernel module to capture the I/O requests of NFS clients; all other modules are standalone modules that can be dynamically loaded and unloaded, so the changes to the kernel are very small;
Additional high-speed storage devices can be attached: NPRP can use host memory (a ramdisk) as the storage medium for prefetched data, or it can attach additional high-speed storage devices such as an MCard8; adding extra high-speed storage is equivalent to expanding the memory of the I/O server, giving the I/O server stronger caching capability.
Description of drawings
Fig. 1 shows the architecture of the pseudo-random NFS application acceleration technique.
Fig. 2 shows how the NFS client's read position is located.
Fig. 3 shows the dmcache architecture.
Embodiment
The pseudo-random NFS application acceleration technique mainly consists of two kernel modules, pfnfsd (prefetch network file system daemon) and dmcache (device mapper cache). pfnfsd is a rewrite of the kernel module nfsd; it mainly comprises an NFS client request positioning module and a prefetch module, and is used to capture and locate NFS requests and to carry out the principal work such as prefetching. dmcache is a target driver implemented on top of the Linux kernel Device Mapper mechanism; it mainly comprises modules for managing the high-speed storage device, mapping addresses, and reclaiming resources, and it places the prefetched data on a high-performance storage device such as a ramdisk or a high-performance SSD.
In Fig. 1, the request handling process of nfs is as follows: according to the request information, the pfnfsd module locates the client's position, then prefetches, synchronizes, and reclaims resources according to that position; the dmcache module then performs the I/O redirection.
1. The pfnfsd module
pfnfsd first obtains the index file of each client, builds an index table in memory for each client, and maintains a pointer for each index table that points to the index position corresponding to the last actual request.
pfnfsd then continuously receives NFS client read requests; from each request and the index table it determines the position of the client's current request in the index table. A simplified flow chart is shown in Fig. 2.
As long as an NFS client read request intersects some entry of the index table, we can conclude that the client has issued that request of the index file. Suppose the entry that intersects the client's request is the Nth entry of the index table; because the client issues its read requests in the order of the index table, the client's current read position can be represented by the position of the current request in the index table, namely N.
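The patent itself contains no code; the positioning logic above can be sketched as a userspace Python simulation (the function names, the `(offset, length)` entry format, and the forward scan from the last matched position are illustrative assumptions, not taken from the patent):

```python
# Hypothetical sketch: locate an NFS client's current position N in its
# per-client index table, resuming the scan from the last matched entry.

def ranges_overlap(a_start, a_len, b_start, b_len):
    """True if the byte ranges (a_start, a_len) and (b_start, b_len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len

def locate_request(index_table, last_pos, req_offset, req_len):
    """Scan forward from last_pos (the pointer to the last actual request);
    return the index N of the first entry intersecting the client's read
    request, or last_pos unchanged if no entry matches."""
    for n in range(last_pos, len(index_table)):
        off, length = index_table[n]
        if ranges_overlap(off, length, req_offset, req_len):
            return n
    return last_pos
```

Because the client reads in index-table order, a single forward pointer per table suffices; each lookup resumes where the previous one stopped instead of rescanning the whole table.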
Determining the client's read position serves two purposes. On the one hand, it allows synchronization with the prefetch thread, preventing the prefetch thread from lagging behind the client application; on the other hand, it determines which cache blocks of the dmcache module can be reclaimed, which is described in detail in the cache block reclamation module.
pfnfsd also contains a prefetch module that uses a single kernel thread, called the prefetch thread, to prefetch data according to the index table. A single thread is used rather than several because our tests showed that, for the same file, single-threaded reads are often faster than several threads simultaneously reading different parts of the file.
To improve the sequentiality of disk access, I/O reordering is performed. I/O reordering first defines a sort window whose size is measured by the I/O capacity of the requests it contains; for example, if the requests in every 100 MB of the index table are sorted together, the sort window is 100 MB. The index table can be sorted window by window in advance, outside the prefetch module, in which case the prefetch module directly reads the sorted index table, or the sorting can be done inside the prefetch module before prefetching. While prefetching, the client's read request position must be observed: prefetching may proceed only while the prefetch position is ahead of the client's read request position; otherwise the client's read position has caught up with or overtaken the prefetch position, and the client must be made to wait.
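The reorder step above can be illustrated with a small userspace sketch (the list-of-`(offset, length)` representation and the function name are assumptions for illustration; the patent specifies only the windowed-sort behavior):

```python
# Illustrative sketch of the I/O reorder window: sort index-table entries
# by disk offset within fixed-capacity windows, leaving window boundaries
# intact so the window-based reclamation rule still applies.

def reorder_in_windows(index_table, window_bytes):
    """index_table: list of (disk_offset, length) in client read order.
    Returns a new table where each window of ~window_bytes of requested
    capacity is sorted by offset, improving disk access sequentiality."""
    out, window, acc = [], [], 0
    for off, length in index_table:
        window.append((off, length))
        acc += length
        if acc >= window_bytes:          # window capacity reached:
            out.extend(sorted(window))   # sort this window by offset
            window, acc = [], 0
    out.extend(sorted(window))           # flush the final partial window
    return out
```

Sorting only within a window trades some sequentiality for bounded reordering distance, which is what makes the later "reclaim one window at a time" rule safe.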
The prefetched data is not kept in memory by this module; instead, the dmcache module stores it on a cache device, which can be a ramdisk, an SSD, and so on.
2. The dmcache module
The dmcache module mainly caches the prefetched data. It is implemented as a target driver of Device Mapper.
Device Mapper is a mapping framework from logical devices to physical devices introduced in the Linux 2.6 kernel; it performs work such as I/O request redirection through modular target driver plug-ins. By means of the Device Mapper mechanism, a low-speed storage device such as a hard disk and high-speed storage devices such as a ramdisk or an SSD can be mapped into a single logical block device.
The flow of the dmcache module is shown in Fig. 3.
The dmcache module sits below the Linux file system layer and above the device driver layer, so the requests seen in Fig. 3 are bios.
In Fig. 3, the dmcache module maps the disk and the high-speed storage device into one block device; the high-speed storage device is used to store the prefetched data, its capacity is hidden from the outside, and it acts as a cache of the disk.
On the one hand, dmcache must map addresses of the exported logical device onto the disk or the high-speed storage device; this is the job of the address mapping and I/O redirection module. On the other hand, it must manage the high-speed storage device; this is the job of the cache block (region) management module.
Bio requests coming down from the file system layer fall into two classes:
One class is prefetch requests issued by the prefetch thread: if the requested data is not yet on the high-speed storage device, it is read from the hard disk and copied to the high-speed storage device; otherwise it is read directly from the high-speed storage device and returned.
The other class is requests coming from NFS clients: if the requested data is on the high-speed storage device, it is fetched from there; otherwise it is fetched directly from the hard disk.
Both kinds of bio must pass through the hash lookup and I/O redirection before they reach the real physical device.
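The two-class dispatch above can be sketched in userspace Python (the function name, the string-valued `kind` argument, and the injected `lookup`/`copy_to_cache` callbacks are hypothetical; the kernel code would operate on bio structures instead):

```python
# Illustrative sketch of how the two bio classes are dispatched.
# lookup(disk_addr) returns a cache address or None (not yet prefetched);
# copy_to_cache(disk_addr) models "read from disk and copy to fast device".

def handle_bio(kind, disk_addr, lookup, copy_to_cache):
    """kind: 'prefetch' (from the prefetch thread) or 'client' (from NFS).
    Returns which device the data is served from."""
    cached = lookup(disk_addr)
    if kind == "prefetch":
        if cached is None:
            copy_to_cache(disk_addr)   # miss: read disk, fill fast device
            return "disk"
        return "cache"                 # already prefetched: nothing to do
    # client request: serve from the fast device on hit, else from disk
    return "cache" if cached is not None else "disk"
```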
The high-speed storage device is divided into cache blocks called regions and is managed at region granularity; a region is the data structure responsible for mapping from the disk's address space into the address space of the high-speed cache device. Thus, to check whether the data a bio reads is on the high-speed storage device, it suffices to find the region corresponding to the bio's disk address.
To find the region corresponding to a disk address, a hash table implements this mapping. All regions that have been filled with disk data are kept in the hash table; regions not yet filled with data are kept in a linked list. In Fig. 3, therefore, deciding whether a bio's data has been prefetched is in fact a lookup of this hash table with the bio's corresponding disk address: if a region is found, the data has been prefetched, and the region yields the address of that data on the high-speed storage device, from which the data is read.
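A minimal userspace model of the region table follows (the class names, the 64 KiB region granularity, and the use of a Python dict in place of the kernel hash table are all assumptions for illustration):

```python
# Hypothetical sketch of dmcache's region mapping: a hash table maps a
# disk address, aligned to the region size, to a region descriptor that
# records where the data lives on the fast device.

REGION_SIZE = 64 * 1024  # assumed region granularity (not from the patent)

class Region:
    def __init__(self, disk_addr, cache_addr):
        self.disk_addr = disk_addr    # region start address on the disk
        self.cache_addr = cache_addr  # region start address on the fast device

class RegionTable:
    def __init__(self):
        self.table = {}   # disk region index -> Region (models the hash table)
        self.free = []    # regions not yet filled (the patent uses a list)

    def insert(self, region):
        self.table[region.disk_addr // REGION_SIZE] = region

    def lookup(self, disk_addr):
        """Return the Region holding disk_addr, or None if not prefetched."""
        return self.table.get(disk_addr // REGION_SIZE)

    def redirect(self, disk_addr):
        """I/O redirection: translate a disk address to a fast-device
        address when cached, else leave the request aimed at the disk."""
        r = self.lookup(disk_addr)
        if r is None:
            return ("disk", disk_addr)
        return ("cache", r.cache_addr + (disk_addr - r.disk_addr))
```

The `redirect` method corresponds to changing the bio's target device and in-device offset described next.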
The target device and the in-device offset of the bio are then changed using the region that was found; this is, in effect, the I/O redirection.
Just as the operating system's pages must be reclaimed, regions must be replaced and reclaimed once the free regions are exhausted.
Because the high-speed storage device holds prefetched data, if the prefetched data and the data read by the NFS client were in the same order, a region could be reclaimed as soon as the NFS client hit it. But the prefetch thread has reordered the I/O of the index file, so the order in which data is prefetched differs from the order in which the NFS client actually reads it, and the simple "reclaim on hit" method cannot be used. Traditional algorithms such as LRU are also unsuitable, because the NFS client's reads are completely random and discrete: the data being hit now is usually data that will never be read again.
We therefore adopt a region reclamation algorithm suited to these characteristics: each time the NFS client reads through the data of one I/O reorder window, the regions of one window are reclaimed. This algorithm is effective because the NFS client's request sequence and the prefetch thread's request sequence always have a "crossing point" at the beginning and end of each I/O reorder window: once the NFS client's reads have passed a window, all regions of the previous window can certainly be reclaimed.
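The window-based reclamation rule can be sketched as follows (the list-of-window representation, the `reclaimed` bookkeeping set, and the function name are illustrative assumptions; only the "free a whole window once the client has read past it" rule comes from the patent):

```python
# Sketch of window-based region reclamation: once the NFS client has read
# past an entire I/O reorder window, every region of that window is freed.

def reclaim_windows(windows, client_window, reclaimed):
    """windows: list of lists of region ids, one inner list per reorder
    window, in client read order.  client_window: index of the window the
    client is currently reading.  reclaimed: set of window indices already
    freed (mutated in place).  Returns the region ids newly freed."""
    freed = []
    for i in range(client_window):       # every window before the current one
        if i not in reclaimed:
            freed.extend(windows[i])
            reclaimed.add(i)
    return freed
```

Unlike LRU, this never evicts regions the client still needs: within the current window the prefetch order and the read order may differ, but everything before the window boundary is provably done.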

Claims (9)

1. A pseudo-random NFS application acceleration system, characterized in that it comprises two kernel modules, pfnfsd and dmcache;
wherein pfnfsd is a rewrite of the kernel module nfsd, mainly comprising an NFS client request positioning module and a prefetch module, and is used to capture and locate NFS requests from clients and to perform prefetch operations;
and dmcache is a target driver implemented on top of the Linux kernel device mapper mechanism, mainly comprising modules for managing the high-speed storage device, mapping addresses, and reclaiming resources, and is responsible for storing the prefetched data on the high-performance storage device.
2. The pseudo-random NFS application acceleration system according to claim 1, characterized in that the pfnfsd module builds an index table in memory for each client according to that client's index file, maintains a pointer to the index position corresponding to the client's last actual request, and then looks up the index table for each incoming NFS client read request to determine the position of the client's current request in the index table.
3. The pseudo-random NFS application acceleration system according to claim 2, characterized in that, after determining the client's read position, the pfnfsd module can synchronize with the prefetch thread and can also determine which cache blocks of the dmcache module can be reclaimed.
4. The pseudo-random NFS application acceleration system according to claim 1, characterized in that the prefetch module of pfnfsd uses an in-kernel prefetch thread to prefetch data according to the index table.
5. The pseudo-random NFS application acceleration system according to claim 1, characterized in that the system further comprises a prefetch reorder window, and the prefetch module directly reads the sorted index table.
6. The pseudo-random NFS application acceleration system according to claim 1, characterized in that the dmcache module maps the disk and the high-speed storage device into one block device, wherein the high-speed storage device is used to store the prefetched data, its capacity is hidden from the outside, and it acts as a cache of the disk.
7. The pseudo-random NFS application acceleration system according to claim 1, characterized in that dmcache performs address mapping and I/O redirection between the storage devices and the logical device, and manages the cache device.
8. The pseudo-random NFS application acceleration system according to claim 1, characterized in that, when the prefetched data and the data read by the NFS client are in the same order, data blocks are reclaimed with a recovery algorithm.
9. The pseudo-random NFS application acceleration system according to claim 8, characterized in that the recovery algorithm is: each time the NFS client reads through the data of one I/O reorder window, the data blocks of one window are reclaimed.
CN 201010611721 2010-12-17 2010-12-17 Pseudo-random type NFS application acceleration system Active CN102147802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010611721 CN102147802B (en) 2010-12-17 2010-12-17 Pseudo-random type NFS application acceleration system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010611721 CN102147802B (en) 2010-12-17 2010-12-17 Pseudo-random type NFS application acceleration system

Publications (2)

Publication Number Publication Date
CN102147802A (en) 2011-08-10
CN102147802B (en) 2013-02-20

Family

ID=44422068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010611721 Active CN102147802B (en) 2010-12-17 2010-12-17 Pseudo-random type NFS application acceleration system

Country Status (1)

Country Link
CN (1) CN102147802B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317551A (en) * 2014-10-17 2015-01-28 北京德加才科技有限公司 Ultrahigh-safety true random number generation method and ultrahigh-safety true random number generation system
CN104793892A (en) * 2014-01-20 2015-07-22 上海优刻得信息科技有限公司 Method for accelerating random disk input/output (IO) reads and writes
CN106407133A (en) * 2015-07-30 2017-02-15 爱思开海力士有限公司 Memory system and operating method thereof
CN106528001A (en) * 2016-12-05 2017-03-22 北京航空航天大学 Cache system based on nonvolatile memory and software RAID
CN108520050A (en) * 2018-03-30 2018-09-11 北京邮电大学 Merkle tree cache device based on two-dimensional localization and its method of operating on Merkle trees

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211358A (en) * 2006-12-31 2008-07-02 联想(北京)有限公司 Method for accessing dedicated file systems
CN101477486A (en) * 2009-01-22 2009-07-08 中国人民解放军国防科学技术大学 File backup recovery method based on sector recombination

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211358A (en) * 2006-12-31 2008-07-02 联想(北京)有限公司 Method for accessing dedicated file systems
CN101477486A (en) * 2009-01-22 2009-07-08 中国人民解放军国防科学技术大学 File backup recovery method based on sector recombination

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Bin Chen et al., "DPM: A Demand-driven Virtual Disk Prefetch Mechanism for Mobile Personal Computing Environments", 2009 Sixth IFIP International Conference on Network and Parallel Computing, 2009, pp. 59-66. *
Surendra Byna et al., "Parallel I/O Prefetching Using MPI File Caching and I/O Signatures", International Conference for High Performance Computing, Networking, Storage and Analysis, 2008, pp. 1-12. *
翟佳 et al., "面向IO服务器的高性能存储器的实现与优化" ("Implementation and Optimization of High-Performance Storage for I/O Servers"), 计算机工程与科学 (Computer Engineering and Science), vol. 31, no. 11, 2009, pp. 17-20. *
朱正义, "高性能块级CDP系统的研究与设计" ("Research and Design of a High-Performance Block-Level CDP System"), 计算机工程与设计 (Computer Engineering and Design), vol. 31, no. 24, 2010, pp. 5224-5226, 5261. *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104793892A (en) * 2014-01-20 2015-07-22 上海优刻得信息科技有限公司 Method for accelerating random in-out (IO) read-write of disk
CN104793892B (en) * 2014-01-20 2019-04-19 优刻得科技股份有限公司 Method for accelerating random disk input/output (IO) reads and writes
CN104317551A (en) * 2014-10-17 2015-01-28 北京德加才科技有限公司 Ultrahigh-safety true random number generation method and ultrahigh-safety true random number generation system
CN106407133A (en) * 2015-07-30 2017-02-15 爱思开海力士有限公司 Memory system and operating method thereof
CN106407133B (en) * 2015-07-30 2020-10-27 爱思开海力士有限公司 Storage system and operation method thereof
CN106528001A (en) * 2016-12-05 2017-03-22 北京航空航天大学 Cache system based on nonvolatile memory and software RAID
CN106528001B (en) * 2016-12-05 2019-08-23 北京航空航天大学 A kind of caching system based on nonvolatile memory and software RAID
CN108520050A (en) * 2018-03-30 2018-09-11 北京邮电大学 Merkle tree cache device based on two-dimensional localization and its method of operating on Merkle trees

Also Published As

Publication number Publication date
CN102147802B (en) 2013-02-20

Similar Documents

Publication Publication Date Title
US10176057B2 (en) Multi-lock caches
Lu et al. Wisckey: Separating keys from values in ssd-conscious storage
CN112214424B (en) Object memory architecture, processing node, memory object storage and management method
Liao et al. Multi-dimensional index on hadoop distributed file system
Debnath et al. Revisiting hash table design for phase change memory
US6754800B2 (en) Methods and apparatus for implementing host-based object storage schemes
Lu et al. BloomStore: Bloom-filter based memory-efficient key-value store for indexing of data deduplication on flash
CN105302744B (en) The invalid data area of Cache
US9792221B2 (en) System and method for improving performance of read/write operations from a persistent memory device
US20150095346A1 (en) Extent hashing technique for distributed storage architecture
EP2885728A2 (en) Hardware implementation of the aggregation/group by operation: hash-table method
CN102147802B (en) Pseudo-random type NFS application acceleration system
US9940060B1 (en) Memory use and eviction in a deduplication storage system
US9229869B1 (en) Multi-lock caches
US9612975B2 (en) Page cache device and method for efficient mapping
CN106133703A (en) RDMA is used to scan internal memory for deleting repetition
CN109582221A (en) Host computing device, remote-server device, storage system and its method
Hoque et al. Disk layout techniques for online social network data
US20230350810A1 (en) In-memory hash entries and hashes used to improve key search operations for keys of a key value store
Tulkinbekov et al. CaseDB: Lightweight key-value store for edge computing environment
Zhou et al. Hierarchical consistent hashing for heterogeneous object-based storage
US10339052B2 (en) Massive access request for out-of-core textures by a parallel processor with limited memory
US20230350610A1 (en) Prefetching keys for garbage collection
CN1212570C (en) Two-stage CD mirror server/client cache system
US11899642B2 (en) System and method using hash table with a set of frequently-accessed buckets and a set of less frequently-accessed buckets

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220727

Address after: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee after: Dawning Information Industry (Beijing) Co.,Ltd.

Patentee after: DAWNING INFORMATION INDUSTRY Co.,Ltd.

Address before: 100084 Beijing Haidian District City Mill Street No. 64

Patentee before: Dawning Information Industry (Beijing) Co.,Ltd.