WO2012024800A1 - Method and system for extending data storage system functions - Google Patents

Method and system for extending data storage system functions

Info

Publication number
WO2012024800A1
WO2012024800A1 PCT/CA2011/050514 CA2011050514W WO2012024800A1 WO 2012024800 A1 WO2012024800 A1 WO 2012024800A1 CA 2011050514 W CA2011050514 W CA 2011050514W WO 2012024800 A1 WO2012024800 A1 WO 2012024800A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data storage
intercepting
interceptor
filesystem
Prior art date
Application number
PCT/CA2011/050514
Other languages
French (fr)
Inventor
Rayan Zachariassen
Steven Lamb
Laryn-Joe Fernandes
Original Assignee
Rayan Zachariassen
Steven Lamb
Laryn-Joe Fernandes
Application filed by Rayan Zachariassen, Steven Lamb, Laryn-Joe Fernandes
Priority to EP11819254.1A (EP2609528A4)
Priority to AU2011293014A (AU2011293014B2)
Priority to CN2011800509497A (CN103201736A)
Priority to KR20137007468A (KR101510025B1)
Priority to JP2013525098A (JP2013536514A)
Publication of WO2012024800A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/16 Protection against loss of memory contents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/188 Virtual file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring

Definitions

  • the present invention relates generally to data storage systems, and more particularly, to a method and system for extending the functions of a data storage system.
  • a filesystem on a computer enables the operating system of the computer to act as a trusted third party that enforces security and naming protocols between communicating processes, even where such processes are not active at the same time.
  • One of the functions of a filesystem is to catalogue and organize data provided to it so that it can later be retrieved. In order to carry out this function, filesystems must manage a storage resource. Typical storage resources appear to the filesystem as a contiguous byte sequence, a contiguous block sequence, or essentially a key-value store for example in an object storage system.
  • Filesystems are typically part of the operating system on a computer, but may also exist as an extension of the operating system in an application, or as a pure application accessed over a network connection using a client filesystem protocol.
  • Various examples of these are known in the art.
  • a filesystem is designed to use a particular kind of storage resource, for example a disk, and will have constraints related to that storage resource. These constraints could be related to, for example, performance, capacity, parallelism, scalability, physical location, etc.
  • Additional constraints exist within the filesystem itself in terms of its features allowing manipulation of the data it is storing, for example to apply encryption, compression, replication, or other transformations including context sensitive processing (data polymorphism) when the filesystem is not provided with this functionality in advance.
  • context sensitive generally refers to a program feature that changes depending on what you are doing in the program.
  • context sensitive help provides documentation for the particular feature that you are in the process of using, and context sensitive processing of data allows data to be processed differently depending on how or where the data will be used.
  • a method for extending the functionality of a data storage system including a data organization means and a data storage resource, the method including dissociating the data storage functions of the data storage system from other functions of the data storage system and transferring at least a portion of the data storage functions to an intercepting system.
  • the intercepting system is a complementary storage resource.
  • the data organization means may be selected from the group comprising a filesystem, a key-value store and a database.
  • the dissociating step is carried out by providing an interceptor means in communication with the data organization means, the data storage resource and the intercepting system; the interceptor means intercepting a filesystem operation to determine whether a function of the operation should be handled by the data storage resource or by the intercepting system.
  • the interceptor means intercepts a filesystem operation while the operation still has context and before the operation would otherwise be decomposed into independent operations suitable for the data storage resource.
  • At least another portion of the data storage functions are retained with the data storage system. According to another aspect of the invention, all of the data storage functions are transferred to the intercepting system.
  • the data storage system is selected from the group comprising a filesystem, a key-value store, an object store and a network protocol.
  • the data storage system is shared among multiple external interfaces.
  • the interceptor means comprises a user-space application program in cooperation with a facility of an operating system or of a filesystem.
  • the interceptor means comprises a filesystem protocol proxy application performed on a network.
  • the interceptor means comprises a minifilter driver adapted to intercept filesystem operations in an operating system kernel.
  • the intercepting system uses one or more complementary storage resources to carry out its functions, the complementary storage resources being independent from the storage resource.
  • the intercepting system implements capacity expansion of the data storage system.
  • the intercepting system improves the performance of the data storage system by altering one or more characteristics of data on the storage resource.
  • the one or more characteristics are preferably selected from the group comprising a storage format, a storage location and storage order.
  • the method further includes the step of carrying out de-duplication by the intercepting system.
  • the method further includes the step of carrying out data polymorphism by the intercepting system.
  • the method further includes the step of implementing independent access control mechanisms for the data by the intercepting system.
  • the method further includes the step of versioning data by the intercepting system.
  • the method further includes the step of implementing one of single or multi-level caching of the data storage system; wherein the implementing step is carried out by the interceptor means.
  • the method further includes the step of implementing locality optimization by pushing less-used data to remote data storage systems and by pulling more-used data to nearby data storage systems; wherein the implementing step is carried out by the interceptor means.
  • the remote data storage systems include data storage systems that are either physically remote or require more time to access.
  • the method further includes the step of implementing name based virtualization making file names that are not valid in a current data storage system appear to be valid by referring to data on other data storage systems; wherein the implementing step is carried out by the interceptor means.
  • the method further includes the step of implementing data backup and data replication; wherein the implementing step is carried out by the interceptor means.
  • the method further includes the step of implementing data virtualization including allowing data under the same name to be physically located in different data storage systems; wherein the implementing step is carried out by the interceptor means.
  • the method further includes providing distinct capabilities for selected data at the interception system than at the storage resource, wherein the distinct capabilities are selected from the group comprising performance characteristics, de-duplication, data polymorphism, independent access control mechanisms, versioning, caching, locality, replication and data virtualization.
  • the selected data is preferably identified using a selection mechanism employing a metadata pattern matching of one or more selected from the group comprising name, timestamps, size, historical information, physical information and contextual information.
  • a system for extending the functionality of a data storage system including a data organization means and a data storage resource, the system comprising a dissociating means for dissociating data storage functions of the data storage system from other functions of the data storage system and a means for transferring at least a portion of the data storage functions to an intercepting system.
  • the intercepting system may be a complementary storage resource.
  • the dissociating means comprises an interceptor in communication with the data organization means, the data storage resource and the intercepting system; the interceptor adapted to intercept a filesystem operation to determine whether a function of the operation should be handled by the data storage resource or by the intercepting system.
  • the interceptor intercepts a filesystem operation while the operation still has context and before the operation would otherwise be decomposed into independent operations suitable for the data storage resource.
  • at least another portion of the data storage functions are retained with the data storage system. Alternatively, all of the data storage functions are transferred to the intercepting system.
  • the data storage system is selected from the group comprising a filesystem, a key-value store, an object store and a network protocol.
  • the data storage system is shared among multiple external interfaces.
  • the interceptor comprises a user-space application program in cooperation with a facility of an operating system or of a filesystem.
  • the interceptor comprises a filesystem protocol proxy application performed on a network.
  • the interceptor comprises a minifilter driver adapted to intercept filesystem operations in an operating system kernel.
  • the intercepting system uses one or more complementary storage resources to carry out its functions, the complementary storage resources being independent from the storage resource.
  • the intercepting system implements capacity expansion of the data storage system.
  • the intercepting system improves the performance of the data storage system by altering one or more characteristics of data on the storage resource.
  • the one or more characteristics are selected from the group comprising a storage format, a storage location and storage order.
  • the intercepting system is adapted to perform de-duplication of the data.
  • the intercepting system is adapted to perform polymorphism of the data.
  • the intercepting system is adapted to perform independent access control mechanisms for the data.
  • the intercepting system is adapted to perform versioning of the data.
  • the interceptor is adapted to perform one of single or multi-level caching of the data storage system.
  • the interceptor is adapted to perform locality optimization by pushing less-used data to remote data storage systems and by pulling more-used data to nearby data storage systems.
  • the remote data storage systems include data storage systems that are either physically remote or require more time to access.
  • the interceptor is adapted to perform name based virtualization making file names that are not valid in a current data storage system appear to be valid by referring to data on other data storage systems.
  • the interceptor is adapted to perform one of data backup and data replication.
  • the interceptor is adapted to perform data virtualization including allowing data under the same name to be physically located in different data storage systems.
  • the interception system is adapted to perform one or more selected from the group comprising improving performance characteristics, de-duplication, data polymorphism, independent access control mechanisms, versioning, caching, locality, replication and data virtualization.
  • the data organization means is selected from the group comprising a filesystem, a key-value store and a database.
  • Figure 1 shows a high-level architecture of a system according to the invention.
  • Figure 2 shows a computer system in which the invention may be used and/or implemented.
  • Figure 3 shows an embodiment of the method according to the invention.
  • Detailed Description of the Embodiments
  • the invention provides a novel system and method for affecting all data related constraints, including but not limited to those that can be affected using virtual storage resources.
  • the invention is able to provide this functionality for data storage systems other than filesystems that provide higher level abstractions above the actual storage resources as their primary interface, for example databases and particular object databases, key value stores, certain network protocols and certain shared data systems.
  • the invention allows for all the traditional constraints of data storage systems to be changed without the cooperation of, or changes made to, the filesystem itself.
  • the invention does not require a complete overhaul or implementation of a new filesystem and permits for the advantages and extended functions described herein to be implemented on existing systems, in the interfaces between older and newer systems and in developing new systems as well with or without the need for a new filesystem to be developed.
  • the invention provides for an intercepting system to be attached to the filesystem that can selectively intercept filesystem operations while they still include context information before these operations are decomposed into operations suitable for the storage resource. That is, the filesystem includes contextual operations and instructions for carrying them out that are provided by the operating system of the computer.
  • the intercepting system provides the functionality the filesystem requires at an interception point where the intercepting system's operation is transparent to the filesystem.
  • the invention provides for a method and system that extends the functionality of data storage resources and systems by dissociating the storage responsibility of the data storage system from its other functions (such as file naming, locking, sharing, security, etc.), and optionally having the actual storage responsibility being carried out by a separate component.
  • With reference to Figure 1, there is shown one embodiment of the invention comprising a data storage system 100, an interceptor 200 and an intercepting system 300. Details of the preferred embodiments, in which various prior art data storage system functions are dissociated from the data storage system itself and divided up to be carried out by either the data storage system or the intercepting system, are described below.
  • the dissociation and subsequent distribution of functionality between the data storage system 100 and the intercepting system 300 is believed to be novel in the art, particularly with respect to providing such a distribution in a non-cooperative environment.
  • while the preferred embodiment of the invention relates to an add-on mechanism to existing non-cooperative data storage systems, it can also be an architectural feature providing extensibility to a cooperative data storage system to provide extended and enhanced functionality, examples of which will be described in more detail below.
  • With reference to Figure 3, there is shown another embodiment of the invention, where there is provided a method for extending the functionality of a data storage system including the steps of dissociating the data storage functions 310 of a data storage system from other functions of the data storage system by intercepting 315 a filesystem operation to determine whether a function of the operation should be handled by the data storage resource or by an intercepting system, and transferring 320 at least a portion of the data storage functions to the intercepting system.
  • the intercepted data storage functions continue to the data storage resource 325.
  • the invention generally operates within the context of a computer system, and serves to provide an extension of the data storage capabilities associated with a general computer system, an exemplary one of which is shown in Figure 2.
  • the computer system 20 has a number of physical and logical components, including a central processing unit (“CPU") 24, random access memory (“RAM”) 28, an input/output (“I/O") interface 32, a network interface 36, non-volatile storage 40, and a local bus 44 enabling the CPU 24 to communicate with the other components.
  • the CPU 24 executes an operating system and a number of software systems.
  • RAM 28 provides relatively-responsive volatile storage to the CPU 24.
  • the I/O interface 32 allows for input to be received from one or more devices, such as a keyboard, a mouse, etc., and outputs information to output devices, such as a display and/or speakers.
  • the network interface 36 permits communication with other systems.
  • Non-volatile storage 40 stores the operating system and programs. During operation of the computer system 20, the operating system, the programs and the data may be retrieved from the non-volatile storage 40 and placed in RAM 28 to facilitate execution.
  • Data storage system 100 is generally known in the art, and for the purposes of the invention is any machine, device or apparatus and is able to store data using a given identifier (such as a filename), and later retrieve at least a portion of it, on demand by that identifier.
  • Data storage system 100 includes a data organization means 105 and uses an operationally independent, but optionally physically embedded, machine or module referred to herein as storage resource 110 (also shown as non-volatile storage 40 in Figure 2) to store data in a primitive form according to the characteristics of the storage resource, from where the data storage system itself can later retrieve the stored data.
  • Data storage system 100 may be shared among multiple external interfaces.
  • Representative examples of a data organization means 105 include filesystems, key-value stores, databases, and other machines layered on top of these such as web caches and page files, in combination with .
  • Representative examples of storage resources 110 are physical disks, memory such as RAM, RAID arrays, paper tape, documents, etc.
  • the storage system 100 may be a Windows™ NTFS filesystem and the storage resource may be a hard disk drive.
  • Interceptor means 200 is a system for intercepting and processing data operations, such as filesystem operations. According to the invention, the interceptor means 200 intercepts a data operation while the operation still has context available but before the operation is decomposed into the appropriate context free and more specific operations for the storage resource, as is the case for virtually all storage resources. Context may be, for example, application information including the source of the data request, details of the data being requested, details of what the data will be used for and other information requiring knowledge of the current data operation.
  • the interceptor means 200 is a minifilter or legacy filter driver designed and otherwise arranged to intercept filesystem operations at the appropriate level in the Windows™ operating system kernel. Such a driver may be independently implemented or it may be procured commercially.
  • the interception mechanism could be provided by facilities built into other operating environments, for example, but not limited to filesystem stacks, using application level filesystem providers like FUSE (filesystem in user space), STREAMS drivers for network intercepts, or physical transparent proxy machines.
  • the interceptor means may also be a custom designed software module incorporating the functionality herein described.
  • Interceptor means 200, referred to interchangeably as an interceptor, may also be a filesystem protocol proxy application.
  • the interceptor means 200 manages the junction between the data storage system 100 and the intercepting system 300.
  • Intercepting system 300 has an interest in certain operations of the data storage system. In the representative example, this could be all data inputs/outputs in order to redirect the data to another storage device.
  • the intercepting system 300 would configure the intercepting mechanism to pass all data input/output operations laterally to the intercepting system 300, instead of passing them through to the lower layers of the data storage system 100, as is traditionally done.
  • Upon receipt of an intercepted operation, the intercepting system 300 has a number of options: (a) the intercepting system 300 could determine that it does not process this operation and instruct the interceptor means 200 to carry on as if there was no intercept, so that the interceptor means passes the operation on to the lower levels of the data storage system 100; (b) the intercepting system 300 could determine that it needs to modify the operation before passing the operation on to the lower levels of the data storage system 100; or, (c) the intercepting system 300 could determine that it is responsible for handling the operation itself and take over responsibility for it, in which case the interception mechanism 200 must react to the upper and lower levels of the data storage system appropriately for that operation, including all error handling.
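The three options (a) to (c) can be sketched as a small dispatcher. This is a minimal illustration only, not the patent's implementation: the `Verdict`, `Operation`, `Store`, `InterceptingSystem` and `Interceptor` names, the in-memory dictionaries, and the policy keyed on `.iso` and `.log` suffixes are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Verdict(Enum):
    PASS_THROUGH = auto()  # option (a): carry on as if there was no intercept
    MODIFY = auto()        # option (b): alter the operation, then pass it down
    TAKE_OVER = auto()     # option (c): the intercepting system handles it


@dataclass
class Operation:
    kind: str              # "read" or "write"
    path: str
    data: bytes = b""


class Store:
    """In-memory stand-in for anything that can service an operation."""

    def __init__(self):
        self.files = {}

    def handle(self, op):
        if op.kind == "write":
            self.files[op.path] = op.data
            return len(op.data)
        if op.kind == "read":
            return self.files[op.path]  # a missing path raises KeyError
        raise ValueError(f"unsupported operation: {op.kind}")


class InterceptingSystem(Store):
    """Hypothetical policy, chosen only for illustration."""

    def decide(self, op):
        if op.path.endswith(".iso"):
            return Verdict.TAKE_OVER
        if op.kind == "write" and op.path.endswith(".log"):
            return Verdict.MODIFY
        return Verdict.PASS_THROUGH

    def modify(self, op):
        # Example modification: make sure log writes end with one newline.
        return Operation(op.kind, op.path, op.data.rstrip(b"\n") + b"\n")


class Interceptor:
    """Routes each operation to the lower layers or the intercepting system."""

    def __init__(self, lower_layers, intercepting_system):
        self.lower = lower_layers
        self.system = intercepting_system

    def dispatch(self, op):
        verdict = self.system.decide(op)
        if verdict is Verdict.TAKE_OVER:
            return self.system.handle(op)
        if verdict is Verdict.MODIFY:
            op = self.system.modify(op)
        return self.lower.handle(op)
```

In this sketch the caller only ever talks to `Interceptor.dispatch`, so it cannot tell which component serviced the request; errors raised by either store propagate unchanged to the caller.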
  • the intercepting system 300 may be any number of hardware devices, such as a complementary storage resource, a multimedia extender, a multimedia server, a home server, an application specific controller, or any similar system that is capable of performing various tasks, including those that are typically performed by storage resources.
  • the interceptor means 200 is the means by which the intercepting system is able to take responsibility for intercepted operations transparently, that is, in a manner not identifiable by the data storage system.
  • This allows an existing filesystem to be extended with new functionality when the data storage resource is dissociated from the filesystem.
  • prior art intermediary programs that perform some data interception functions, such as virus checkers and encryption shims, use the original data storage system to provide their functionality.
  • the invention further allows an existing data storage system to separate the location of stored data or metadata from the data storage system while retaining the original semantics of that data storage system.
  • the data storage system remains responsible for any existing data or metadata already stored, but the intercepting system can take over responsibility for any new or modified data or metadata according to its own policy as long as the external semantics of the existing data storage system remain unchanged.
  • the original filesystem remains responsible for all metadata, and the intercepting system takes over responsibility for some or all data.
  • the invention allows the NTFS filesystem metadata, including directories, the Master File Table (MFT), and the contents of each MFT entry, to be maintained on the filesystem's storage resources as would normally occur, but the data itself could be manipulated separately and stored independently of the configuration of the original data storage system.
  • the invention provides a number of advantages, including providing a way for a legacy, unmodified NTFS implementation to support modern filesystem features like replication, de-duplication, caching, pooling, as well as features usually provided by storage resources (at RAID levels), and new unique features.
  • the invention makes this possible in a manner that is compatible with existing applications because unmodified semantics are presented at the external application interface of the filesystem. That is, the filesystem remains unchanged and is useable by the operating system without modification.
  • The Microsoft™ Windows Home Server (WHS) project initially contained a function called Drive Extender that extended NTFS on the Windows system by using a file based paradigm, i.e. whole files could be placed on a different destination NTFS filesystem and a reference (a "tombstone") left on the original NTFS filesystem.
  • This approach failed because it was not completely transparent to applications (the reference that was left behaved differently than the original file), and in fact the functionality Drive Extender was trying to provide was initially re-implemented using a virtual block storage resource model, that is by appearing as a storage resource to the filesystem.
  • Redirected data can be stored on a faster storage resource than the original data storage system uses, or it could be stored in a way which leads to faster reading and/or writing of the data, or both mechanisms could be used simultaneously.
  • the metadata of the storage system can be maintained on the original storage resource, but the data itself stored on the faster storage resource.
  • the interceptor means directs the filesystem to retrieve metadata from the original storage resource, but to retrieve data from the faster storage resource.
  • the data appears as it would have if it were retrieved directly from the original storage resource but has been retrieved much faster.
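The split described above, with metadata kept on the original storage resource and data redirected to a faster one, can be sketched as follows. This is an illustrative model under stated assumptions: the `RedirectingInterceptor` class, its three dictionaries and the `redirect` flag are hypothetical names, and plain dictionaries stand in for real storage resources.

```python
class RedirectingInterceptor:
    """Keeps metadata on the original resource and data on a faster one."""

    def __init__(self):
        self.original_meta = {}  # stand-in for metadata on the original resource
        self.original_data = {}  # stand-in for data on the original resource
        self.fast_data = {}      # stand-in for the faster storage resource

    def write(self, name, data, redirect=True):
        # Metadata always goes where the filesystem expects to find it.
        self.original_meta[name] = {"size": len(data)}
        target, other = (
            (self.fast_data, self.original_data)
            if redirect
            else (self.original_data, self.fast_data)
        )
        target[name] = data
        other.pop(name, None)  # never keep a stale copy on the other resource

    def stat(self, name):
        return self.original_meta[name]

    def read(self, name):
        # The caller cannot tell which resource served the bytes.
        if name in self.fast_data:
            return self.fast_data[name]
        return self.original_data[name]
```

A caller sees the same `stat` and `read` results whether or not the data was redirected, which is the transparency property the passage relies on.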
  • Performance may also be improved, for example, by caching data on a secondary storage resource that is possibly faster or has other improved characteristics.
  • the redirected data is stored in sparse files on the faster storage resource.
  • a sparse file is a file that contains holes where data is not allocated on the storage resource but represented virtually.
  • One of the problems in file caching systems is to represent the cached file in a space-efficient format, usually necessitating the creation of a data packing module to store actual data, and to translate between internal file location and where the data for those locations is stored.
  • the data that has not yet been retrieved from the original file should be represented virtually, not physically.
  • Using sparse files allows any filesystem supporting sparse files to provide this functionality, and saves having to develop and provide this functionality within the invention. Applicant believes that the use of sparse files to cache the data of real files is novel, and a person skilled in the art will appreciate the ability to use existing capabilities to provide this internal function of a sub-file cache.
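As a rough sketch of the sub-file cache idea, the cache file below is created at the full logical size of the original file and only the blocks that have actually been fetched are written, leaving the rest as holes on filesystems that support sparse files. The `SparseFileCache` class, the 4096-byte `BLOCK` size, the in-memory `cached_blocks` set and the `fetch_from_origin` callback are assumptions for illustration; whether the unwritten regions are truly unallocated depends on the host filesystem and operating system.

```python
import os
import tempfile

BLOCK = 4096  # assumed cache granularity


class SparseFileCache:
    """Caches blocks of one original file at their native offsets."""

    def __init__(self, cache_path, logical_size):
        self.path = cache_path
        self.cached_blocks = set()  # which blocks hold real data
        with open(cache_path, "wb") as f:
            # Extending an empty file creates a hole where sparse files
            # are supported; elsewhere it is simply zero-filled.
            f.truncate(logical_size)

    def store(self, block_index, data):
        with open(self.path, "r+b") as f:
            f.seek(block_index * BLOCK)  # same offset as in the original
            f.write(data)
        self.cached_blocks.add(block_index)

    def read(self, block_index, fetch_from_origin):
        if block_index not in self.cached_blocks:
            # Cache miss: pull the block from the original file once.
            self.store(block_index, fetch_from_origin(block_index))
        with open(self.path, "rb") as f:
            f.seek(block_index * BLOCK)
            return f.read(BLOCK)


if __name__ == "__main__":
    origin = bytes(range(256)) * 160  # 40960 bytes, i.e. 10 blocks
    with tempfile.TemporaryDirectory() as tmp:
        cache = SparseFileCache(os.path.join(tmp, "cache.bin"), len(origin))
        fetch = lambda i: origin[i * BLOCK:(i + 1) * BLOCK]
        assert cache.read(7, fetch) == fetch(7)
```

Because cached data sits at the same offset it occupies in the original file, no separate translation table between file locations and cache locations is needed; only the set of populated blocks has to be tracked.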
  • the invention may also be used in the data compression context, where data could be compressed in various ways that are not limited by the characteristics of the original data storage system. For example, normal stream compression or de-duplication compression where the data is not stored again if it already exists somewhere in the system. This type of data compression is typically not possible directly at the level of the data storage system using the teachings of the prior art. A person skilled in the art will appreciate the distinct advantages of providing data compression within any filesystem and data storage resource system, irrespective of the filesystem and data storage resource system being used.
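De-duplication compression, where data is not stored again if it already exists somewhere in the system, can be illustrated with a content-addressed chunk store. This is a sketch under assumptions: the `DedupStore` class, fixed-size 4096-byte chunks and SHA-256 digests as chunk keys are choices made for the example, not details taken from the patent.

```python
import hashlib


class DedupStore:
    """Stores each distinct chunk once and files as lists of chunk digests."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes
        self.files = {}   # name -> ordered list of digests

    def write(self, name, data):
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first occurrence of a chunk consumes storage.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])
```

Writing the same content under two names, or a file made of repeated blocks, grows `files` but not `chunks`, while `read` still returns the original bytes.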
  • Another possible application is in data polymorphism, where data can appear different depending on the context in which it is intended to be used. While various data polymorphism applications are known in the art, these are typically performed at the application level and are constrained by the data storage system. According to the method of the invention, data can be manipulated at the interception system, that is at the data storage system level, and be presented to the filesystem in a context-modified manner so that no operating system or application resources or modifications are required.
  • In the context of a document management system, the invention allows for redirected data to be versioned in a manner which retains the data storage system semantics, but provides an extended semantics that allows retrieval of older versions of the data. Current systems require older versions of the data either to be updated to incorporate the extended semantics, or to be retrieved in a manner that differentiates older data from newer data.
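The versioning behaviour, where the ordinary read semantics are retained and older versions are reachable only through an extended interface, might look like the sketch below. The `VersionedStore` class and its `read_version` and `version_count` methods are hypothetical names introduced for illustration.

```python
class VersionedStore:
    """Keeps every write; plain reads behave as if only the latest exists."""

    def __init__(self):
        self._versions = {}  # name -> list of contents, oldest first

    def write(self, name, data):
        self._versions.setdefault(name, []).append(data)

    def read(self, name):
        # Original semantics: the caller sees the most recent data only.
        return self._versions[name][-1]

    def read_version(self, name, index):
        # Extended semantics: index 0 is the oldest version.
        return self._versions[name][index]

    def version_count(self, name):
        return len(self._versions[name])
```

An application that only calls `write` and `read` observes an ordinary store, while a document management tool can additionally call `read_version` to retrieve earlier contents.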
  • Redirected data inputs/outputs could also provide input to a distributed data storage system that controls where the source data is located based on its time related usefulness to the data storage system. Again, in this regard, these features can be implemented at the data storage system level and will not appear any differently to the application accessing the filesystem.
  • data may no longer reside on the original data storage system's storage resource, and could be located somewhere completely different without having to provide adaptations to the filesystem.
  • Data virtualization becomes possible too, where a single piece of named data may no longer reside on the same storage resource, but could be scattered to multiple data storage systems and/or storage resources.
  • redirected data inputs/outputs could drive a replication process to ensure redundancy of data.

Abstract

A method and system for extending the functionality of a data storage system, the data storage system including a data organization means and a data storage resource, the method including the steps of dissociating the data storage functions of the data storage system from other functions of the data storage system and transferring at least a portion of said data storage functions to an intercepting system.

Description

METHOD AND SYSTEM FOR EXTENDING DATA STORAGE SYSTEM FUNCTIONS
[0001] This application claims priority from United States Provisional Application No. 61/376,905, filed on August 25, 2010, the contents of which are incorporated herein in their entirety by reference.
Field of the Invention
[0002] The present invention relates generally to data storage systems, and more particularly, to a method and system for extending the functions of a data storage system.
Background of the Invention
[0003] A filesystem on a computer enables the operating system of the computer to act as a trusted third party that enforces security and naming protocols between communicating processes, even where such processes are not active at the same time. One of the functions of a filesystem is to catalogue and organize data provided to it so that it can later be retrieved. In order to carry out this function, filesystems must manage a storage resource. Typical storage resources appear to the filesystem as a contiguous byte sequence, a contiguous block sequence, or essentially a key-value store, for example in an object storage system.
[0004] Filesystems are typically part of the operating system on a computer, but may also exist as an extension of the operating system in an application, or as a pure application accessed over a network connection using a client filesystem protocol. Various examples of these are known in the art. In all cases a filesystem is designed to use a particular kind of storage resource, for example a disk, and will have constraints related to that storage resource. These constraints could be related to, for example, performance, capacity, parallelism, scalability, physical location, etc. Additional constraints exist within the filesystem itself in terms of its features allowing manipulation of the data it is storing, for example to apply encryption, compression, replication, or other transformations including context sensitive processing (data polymorphism) when the filesystem is not provided with this functionality in advance. In the art, the phrase context sensitive generally refers to a program feature that changes depending on what you are doing in the program. For example, context sensitive help provides documentation for the particular feature that you are in the process of using and context or context sensitive processing of data allows data to be processed differently depending on how or where the data will be used.
[0005] Prior art systems and methods have coped with some of these constraints by using virtual devices that create a virtual storage resource for the filesystem to use. Because the context that exists within the filesystem is not passed on to the storage resource, this approach is unable to provide context sensitive processing of the data and is also unable to remove constraints that are due to a 1:1 mapping of data the filesystem sees to data actually on the storage resource, for example the total data storage capacity.
[0006] It is therefore an object of the invention to provide a novel system and method for extending the functions of a data storage system, for example to extend the functions of a data storage system to permit context sensitive data processing.
Summary of the Invention
[0007] According to one embodiment of the invention, there is provided a method for extending the functionality of a data storage system, the data storage system including a data organization means and a data storage resource, the method including dissociating the data storage functions of the data storage system from other functions of the data storage system and transferring at least a portion of the data storage functions to an intercepting system.
[0008] According to one aspect of the invention, the intercepting system is a complementary storage resource. The data organization means may be selected from the group comprising a filesystem, a key-value store and a database.
[0009] According to another aspect of the invention, the dissociating step is carried out by providing an interceptor means in communication with the data organization means, the data storage resource and the intercepting system; the interceptor means intercepting a filesystem operation to determine whether a function of the operation should be handled by the data storage resource or by the intercepting system.
[0010] According to another aspect of the invention, the interceptor means intercepts a filesystem operation while the operation still has context and before the operation would otherwise be decomposed into independent operations suitable for the data storage resource.
[0011] According to another aspect of the invention, at least another portion of the data storage functions are retained with the data storage system. According to another aspect of the invention, all of the data storage functions are transferred to the intercepting system.
[0012] According to another aspect of the invention, the data storage system is selected from the group comprising a filesystem, a key-value store, an object store and a network protocol.
[0013] According to another aspect of the invention, the data storage system is shared among multiple external interfaces.
[0014] According to another aspect of the invention, the interceptor means comprises a user-space application program in cooperation with a facility of an operating system or of a filesystem. According to another aspect of the invention, the interceptor means comprises a filesystem protocol proxy application performed on a network. According to another aspect of the invention, the interceptor means comprises a minifilter driver adapted to intercept filesystem operations in an operating system kernel.
[0015] According to another aspect of the invention, the intercepting system uses one or more complementary storage resources to carry out its functions, the complementary storage resources being independent from the storage resource.
[0016] According to another aspect of the invention, the intercepting system implements capacity expansion of the data storage system. According to another aspect of the invention, the intercepting system improves the performance of the data storage system by altering one or more characteristics of data on the storage resource. The one or more characteristics are preferably selected from the group comprising a storage format, a storage location and storage order.
[0017] According to another aspect of the invention, the method further includes the step of carrying out de-duplication by the intercepting system. According to another aspect of the invention, the method further includes the step of carrying out data polymorphism by the intercepting system. According to another aspect of the invention, the method further includes the step of implementing independent access control mechanisms for the data by the intercepting system. According to another aspect of the invention, the method further includes the step of versioning data by the intercepting system.
[0018] According to another aspect of the invention, the method further includes the step of implementing one of single or multi-level caching of the data storage system; wherein the implementing step is carried out by the interceptor means. According to another aspect of the invention, the method further includes the step of implementing locality optimization by pushing less-used data to remote data storage systems and by pulling more-used data to nearby data storage systems; wherein the implementing step is carried out by the interceptor means. The remote data storage systems include data storage systems that are either physically remote or require more time to access.
[0019] According to another aspect of the invention, the method further includes the step of implementing name based virtualization making file names that are not valid in a current data storage system appear to be valid by referring to data on other data storage systems; wherein the implementing step is carried out by the interceptor means.
[0020] According to another aspect of the invention, the method further includes the step of implementing data backup and data replication; wherein the implementing step is carried out by the interceptor means.
[0021] According to another aspect of the invention, the method further includes the step of implementing data virtualization including allowing data under the same name to be physically located in different data storage systems; wherein the implementing step is carried out by the interceptor means.
[0022] According to another aspect of the invention, the method further includes providing distinct capabilities for selected data at the interception system than at the storage resource, wherein the distinct capabilities are selected from the group comprising performance characteristics, de-duplication, data polymorphism, independent access control mechanisms, versioning, caching, locality, replication and data virtualization. The selected data is preferably identified using a selection mechanism employing a metadata pattern matching of one or more selected from the group comprising name, timestamps, size, historical information, physical information and contextual information.
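By way of illustration only, the metadata pattern matching described in the preceding paragraph might be sketched as follows. The names `FileMetadata`, `SelectionRule` and `select`, and the particular fields matched (name, size and modification time), are assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class FileMetadata:
    name: str
    size: int      # size in bytes
    mtime: float   # modification time, seconds since the epoch


@dataclass
class SelectionRule:
    """Hypothetical rule: every configured criterion must match."""
    name_pattern: str = "*"                  # shell-style pattern on the name
    min_size: int = 0                        # inclusive lower bound in bytes
    modified_after: Optional[float] = None   # None means "any time"

    def matches(self, meta: FileMetadata) -> bool:
        if not fnmatchcase(meta.name, self.name_pattern):
            return False
        if meta.size < self.min_size:
            return False
        if self.modified_after is not None and meta.mtime <= self.modified_after:
            return False
        return True


def select(rules, meta):
    """Return True if any rule selects this data for the intercepting system."""
    return any(rule.matches(meta) for rule in rules)
```

Under these assumptions, a rule such as `SelectionRule(name_pattern="*.mp4", min_size=1024)` would select large video files for handling by the intercepting system while leaving all other data with the original storage resource.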
[0023] According to another embodiment of the invention, there is provided a system for extending the functionality of a data storage system, the data storage system including a data organization means and a data storage resource, the system comprising a dissociating means for dissociating data storage functions of the data storage system from other functions of the data storage system and a means for transferring at least a portion of the data storage functions to an intercepting system.
[0024] According to one aspect of this embodiment, the intercepting system may be a complementary storage resource.
[0025] According to another aspect of this embodiment, the dissociating means comprises an interceptor in communication with the data organization means, the data storage resource and the intercepting system; the interceptor adapted to intercept a filesystem operation to determine whether a function of the operation should be handled by the data storage resource or by the intercepting system.
[0026] According to another aspect of this embodiment, the interceptor intercepts a filesystem operation while the operation still has context and before the operation would otherwise be decomposed into independent operations suitable for the data storage resource.
[0027] According to another aspect of this embodiment, at least another portion of the data storage functions are retained with the data storage system. Alternatively, all of the data storage functions are transferred to the intercepting system.
[0028] According to another aspect of this embodiment, the data storage system is selected from the group comprising a filesystem, a key-value store, an object store and a network protocol.
[0029] According to another aspect of this embodiment, the data storage system is shared among multiple external interfaces.
[0030] According to another aspect of this embodiment, the interceptor comprises a user-space application program in cooperation with a facility of an operating system or of a filesystem.
[0031] According to another aspect of this embodiment, the interceptor comprises a filesystem protocol proxy application performed on a network.
[0032] According to another aspect of this embodiment, the interceptor comprises a minifilter driver adapted to intercept filesystem operations in an operating system kernel.
[0033] According to another aspect of this embodiment, the intercepting system uses one or more complementary storage resources to carry out its functions, the complementary storage resources being independent from the storage resource.
[0034] According to another aspect of this embodiment, the intercepting system implements capacity expansion of the data storage system.
[0035] According to another aspect of this embodiment, the intercepting system improves the performance of the data storage system by altering one or more characteristics of data on the storage resource.
[0036] According to another aspect of this embodiment, the one or more characteristics are selected from the group comprising a storage format, a storage location and storage order.
[0037] According to another aspect of this embodiment, the intercepting system is adapted to perform de-duplication of the data.
[0038] According to another aspect of this embodiment, the intercepting system is adapted to perform polymorphism of the data.
[0039] According to another aspect of this embodiment, the intercepting system is adapted to perform independent access control mechanisms for the data.
[0040] According to another aspect of this embodiment, the intercepting system is adapted to perform versioning of the data.
[0041] According to another aspect of this embodiment, the interceptor is adapted to perform one of single or multi-level caching of the data storage system.
[0042] According to another aspect of this embodiment, the interceptor is adapted to perform locality optimization by pushing less-used data to remote data storage systems and by pulling more-used data to nearby data storage systems.
[0043] According to another aspect of this embodiment, the remote data storage systems include data storage systems that are either physically remote or require more time to access.
[0044] According to another aspect of this embodiment, the interceptor is adapted to perform name based virtualization making file names that are not valid in a current data storage system appear to be valid by referring to data on other data storage systems.
[0045] According to another aspect of this embodiment, the interceptor is adapted to perform one of data backup and data replication.
[0046] According to another aspect of this embodiment, the interceptor is adapted to perform data virtualization including allowing data under the same name to be physically located in different data storage systems.
[0047] According to another aspect of this embodiment, the interception system is adapted to perform one or more selected from the group comprising improving performance characteristics, de-duplication, data polymorphism, independent access control mechanisms, versioning, caching, locality, replication and data virtualization.
[0048] According to another aspect of this embodiment, the data organization means is selected from the group comprising a filesystem, a key-value store and a database.
Brief Description of the Drawings
[0049] Embodiments will now be described, by way of example only, with reference to the attached Figures, wherein:
[0050] Figure 1 shows a high-level architecture of a system according to the invention.
[0051] Figure 2 shows a computer system in which the invention may be used and/or implemented.
[0052] Figure 3 shows an embodiment of the method according to the invention.
Detailed Description of the Embodiments
[0053] The invention provides a novel system and method for affecting all data related constraints, including but not limited to those that can be affected using virtual storage resources. The invention is able to provide this functionality for data storage systems other than filesystems that provide higher level abstractions above the actual storage resources as their primary interface, for example databases and particular object databases, key value stores, certain network protocols and certain shared data systems.
[0054] Furthermore, the invention allows for all the traditional constraints of data storage systems to be changed without the cooperation of, or changes made to, the filesystem itself. Thus, the invention does not require a complete overhaul or implementation of a new filesystem and permits the advantages and extended functions described herein to be implemented on existing systems, in the interfaces between older and newer systems and in developing new systems as well with or without the need for a new filesystem to be developed. Broadly, the invention provides for an intercepting system to be attached to the filesystem that can selectively intercept filesystem operations while they still include context information before these operations are decomposed into operations suitable for the storage resource. That is, the filesystem includes contextual operations and instructions for carrying them out that are provided by the operating system of the computer. When the filesystem interacts with the storage resource, these contextual operations and instructions are lost as the filesystem-storage resource interaction is only concerned with the retrieval, storage, and cataloguing of data. The intercepting system according to the invention provides the functionality the filesystem requires at an interception point where the intercepting system's operation is transparent to the filesystem. Thus, the invention provides for a method and system that extends the functionality of data storage resources and systems by dissociating the storage responsibility of the data storage system from its other functions (such as file naming, locking, sharing, security, etc.), and optionally having the actual storage responsibility being carried out by a separate component.
[0055] With reference to Figure 1, there is shown one embodiment of the invention in which there is shown a data storage system 100, an interceptor 200 and an intercepting system 300. Details of the preferred embodiments in which various distributions of prior art data storage system functions are dissociated from the data storage system itself and divided up to be carried out by either the data storage system or the intercepting system are described below. The dissociation and subsequent distribution of functionality between the data storage system 100 and the intercepting system 300 is believed to be novel in the art, particularly with respect to providing a distribution of functionality between the data storage system 100 and the intercepting system 300 in a non-cooperative environment. However, although the preferred embodiment of the invention relates to an add-on mechanism to existing non-cooperative data storage systems, it can also be an architectural feature providing extensibility to a cooperative data storage system to provide extended and enhanced functionality, examples of which will be described in more detail below.
[0056] With reference to Figure 3, there is shown another embodiment of the invention, where there is provided a method for extending the functionality of a data storage system including the steps of dissociating the data storage functions 310 of a data storage system from other functions of the data storage system by intercepting 315 a filesystem operation to determine whether a function of the operation should be handled by the data storage resource or by an intercepting system and transferring 320 at least a portion of the data storage functions to the intercepting system. Alternatively, the intercepted data storage functions continue to the data storage resource 325.
[0057] The invention generally operates within the context of a computer system, and serves to provide an extension of the data storage capabilities associated with a general computer system, an exemplary one of which is shown in Figure 2. As shown, the computer system 20 has a number of physical and logical components, including a central processing unit ("CPU") 24, random access memory ("RAM") 28, an input/output ("I/O") interface 32, a network interface 36, non-volatile storage 40, and a local bus 44 enabling the CPU 24 to communicate with the other components. The CPU 24 executes an operating system and a number of software systems. RAM 28 provides relatively-responsive volatile storage to the CPU 24. The I/O interface 32 allows for input to be received from one or more devices, such as a keyboard, a mouse, etc., and outputs information to output devices, such as a display and/or speakers. The network interface 36 permits communication with other systems. Non-volatile storage 40 stores the operating system and programs. During operation of the computer system 20, the operating system, the programs and the data may be retrieved from the non-volatile storage 40 and placed in RAM 28 to facilitate execution.
[0058] Data storage system 100 is generally known in the art, and for the purposes of the invention is any machine, device or apparatus that is able to store data using a given identifier (such as a filename), and later retrieve at least a portion of it, on demand by that identifier. Data storage system 100 includes a data organization means 105 and uses an operationally independent, but optionally physically embedded, machine or module referred to herein as storage resource 110 (also shown as non-volatile storage 40 in Figure 2) to store data in a primitive form according to the characteristics of the storage resource, from where the data storage system itself can later retrieve the stored data. Data storage system 100 may be shared among multiple external interfaces.
[0059] Representative examples of a data organization means 105 include filesystems, key-value stores, databases, and other machines layered on top of these such as web caches and page files, in combination with . Representative examples of storage resources 110 are physical disks, memory such as RAM, RAID arrays, paper tape, documents, etc. In a representative example of the preferred embodiment, the storage system 100 may be a Windows™ NTFS filesystem and the storage resource may be a hard disk drive.
[0060] Interceptor means 200 is a system for intercepting and processing data operations, such as filesystem operations. According to the invention, the interceptor means 200 intercepts a data operation while the operation still has context available but before the operation is decomposed into the appropriate context free and more specific operations for the storage resource, as is the case for virtually all storage resources. Context may be, for example, application information including the source of the data request, details of the data being requested, details of what the data will be used for and other information requiring knowledge of the current data operation.
[0061] According to the representative example discussed above, the interceptor means 200 is a minifilter or legacy filter driver designed and otherwise arranged to intercept filesystem operations at the appropriate level in the Windows™ operating system kernel. Such a driver may be independently implemented or it may be procured commercially. In other examples, the interception mechanism could be provided by facilities built into other operating environments, for example, but not limited to filesystem stacks, using application level filesystem providers like FUSE (filesystem in user space), STREAMS drivers for network intercepts, or physical transparent proxy machines. The interceptor means may also be a custom designed software module incorporating the functionality herein described. Interceptor means 200, referred to interchangeably as an interceptor, may also be a filesystem protocol proxy application.
[0062] The interceptor means 200 manages the junction between the data storage system 100 and the intercepting system 300. Intercepting system 300 has an interest in certain operations of the data storage system. In the representative example, this could be all data inputs/outputs in order to redirect the data to another storage device. The intercepting system 300 would configure the intercepting mechanism to pass all data input/output operations laterally to the intercepting system 300, instead of passing them through to the lower layers of the data storage system 100, as is traditionally done.
[0063] Upon receipt of an intercepted operation, the intercepting system 300 has a number of options: (a) the intercepting system 300 could determine that it does not process this operation and instruct the interceptor means 200 to carry on as if there was no intercept, so that the interceptor means passes the operation on to the lower levels of the data storage system 100; (b) the intercepting system 300 could determine that it needs to modify the operation before passing the operation on to the lower levels of the data storage system 100; or, (c) the intercepting system 300 could determine that it is responsible for handling the operation itself and take over responsibility for it, in which case the interception mechanism 200 must react to the upper and lower levels of the data storage system appropriately for that operation, including all error handling.
[0064] The intercepting system 300 may be any number of hardware devices, such as a complementary storage resource, a multimedia extender, a multimedia server, a home server, an application specific controller, or any similar system that is capable of performing various tasks, including those that are typically performed by storage resources.
[0065] In accordance with the invention, there is provided the interceptor means 200, by which the intercepting system is able to transparently, that is in a manner not identifiable by the data storage system, take responsibility for intercepted operations. This allows an existing filesystem to be extended with new functionality when the data storage resource is dissociated from the filesystem. In contrast, prior art intermediary programs that perform some data interception functions, such as virus checkers and encryption shims, use the original data storage system to provide their functionality.
[0066] The invention further allows an existing data storage system to separate the location of stored data or metadata from the data storage system while retaining the original semantics of that data storage system. Thus, the data storage system remains responsible for any existing data or metadata already stored, but the intercepting system can take over responsibility for any new or modified data or metadata according to its own policy as long as the external semantics of the existing data storage system remain unchanged.
[0067] In the representative example described throughout, the original filesystem remains responsible for all metadata, and the intercepting system takes over responsibility for some or all data. Specifically, the invention allows the NTFS filesystem metadata, including directories, the Master File Table (MFT), and the contents of each MFT entry, to be maintained on the filesystem's storage resources as would normally occur, but the data itself could be manipulated separately and stored independently of the configuration of the original data storage system.
[0068] The invention provides a number of advantages, including providing a way for a legacy, unmodified NTFS implementation to support modern filesystem features like replication, de-duplication, caching, pooling, as well as features usually provided by storage resources (at RAID levels), and new unique features. The invention makes this possible in a manner that is compatible with existing applications because unmodified semantics are presented at the external application interface of the filesystem. That is, the filesystem remains unchanged and is useable by the operating system without modification.
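The three options (a), (b) and (c) set out in the preceding paragraph may be easier to follow as a small dispatch sketch. The following is a user-space illustration only, not the kernel driver of the representative example; the classes `Operation`, `InterceptingSystem` and `Interceptor`, the path prefixes `fast/` and `old/`, and the in-memory dictionary standing in for a complementary storage resource are all assumptions introduced here.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class Decision(Enum):
    PASS_THROUGH = auto()   # option (a): carry on as if there was no intercept
    MODIFY = auto()         # option (b): change the operation, then pass it on
    TAKE_OVER = auto()      # option (c): intercepting system handles it itself


@dataclass(frozen=True)
class Operation:
    kind: str               # e.g. "read", "write", "rename"
    path: str
    payload: bytes = b""


class InterceptingSystem:
    """Hypothetical policy: own data I/O under fast/, rewrite old/ to new/."""

    def __init__(self):
        self.store = {}     # stands in for a complementary storage resource

    def decide(self, op):
        if op.kind in ("read", "write") and op.path.startswith("fast/"):
            return Decision.TAKE_OVER, op
        if op.path.startswith("old/"):
            return Decision.MODIFY, replace(op, path="new/" + op.path[len("old/"):])
        return Decision.PASS_THROUGH, op

    def handle(self, op):
        if op.kind == "write":
            self.store[op.path] = op.payload
            return b""
        return self.store[op.path]   # raises KeyError if never written


class Interceptor:
    def __init__(self, system, lower_layer):
        self.system = system
        self.lower_layer = lower_layer   # callable taking an Operation

    def dispatch(self, op):
        decision, new_op = self.system.decide(op)
        if decision is Decision.TAKE_OVER:
            try:
                return self.system.handle(new_op)
            except KeyError:
                # Report the failure in the form the upper layers expect.
                raise FileNotFoundError(new_op.path) from None
        # Options (a) and (b): continue to the lower levels of the storage system.
        return self.lower_layer(new_op)
```

In this sketch the error translation in `dispatch` stands in for the requirement that, under option (c), the interception mechanism must respond appropriately to the upper levels, including all error handling.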
[0069] One prior art example of attempting to solve a similar problem as solved by the invention, is the Microsoft™ Windows Home Server project (WHS), a data storage system with simple expandability intended for home users. WHS initially contained a function called Drive Extender that extended NTFS on the Windows system by using a file based paradigm, i.e. whole files could be placed on a different destination NTFS filesystem and a reference (a "tombstone") left on the original NTFS filesystem. This approach failed because it was not completely transparent to applications (the reference that was left behaved differently than the original file), and in fact the functionality Drive Extender was trying to provide was initially re-implemented using a virtual block storage resource model, that is by appearing as a storage resource to the filesystem. Subsequently the functionality was completely dropped from WHS. Other examples of the original Drive Extender approach are apparent in the art, for example in Hierarchical Storage Management systems. The present invention, on the other hand, allows the filesystem semantics and metadata to remain with the original storage resource, and be referred to as needed, but the data itself is manipulated and stored independently without the constraints of the configuration of the original data storage system.
[0070] While the example of the preceding paragraph has been provided with respect to disk drive storage systems, the approach of the invention may also be used with respect to extending a non-cooperative filesystem to other non-cooperative data storage systems, such as key-value stores, object stores, various kinds of caches, databases, etc. Given the ability to locate data separately, differently, or even just with additional processing compared to the design of the original data storage system, it is now possible to provide a number of new features and corresponding functionality to applications using an existing data storage system. Applicant believes that the prior art has not disclosed a method and system for dissociating the data storage responsibility from the other functions of a data storage system to thereby extend the functionality of the data storage system beyond the constraints of the filesystem and the data storage system itself.
[0071] One of the advantages of incorporating the system and method according to the invention is a possible increase in performance. Redirected data can be stored on a faster storage resource than the original data storage system uses, or it could be stored in a way which leads to faster reading and/or writing of the data, or both mechanisms could be used simultaneously. For example, the metadata of the storage system can be maintained on the original storage resource, but the data itself stored on the faster storage resource. In this example, the interceptor means directs the filesystem to retrieve metadata from the original storage resource, but to retrieve data from the faster storage resource. When the information and data are then passed on to the filesystem for processing by the operating system, the data appears as it would have if it were retrieved directly from the original storage resource but has been retrieved much faster. Performance may also be improved, for example, by caching data on a secondary storage resource that is possibly faster or has other improved characteristics.
[0072] In one embodiment of the invention the redirected data is stored in sparse files on the faster storage resource. A sparse file is a file that contains holes where data is not allocated on the storage resource but represented virtually. One of the problems in file caching systems is to represent the cached file in a space-efficient format, usually necessitating the creation of a data packing module to store actual data, and to translate between internal file location and where the data for those locations is stored. In order to cache data at a sub-file level as needed, the data that has not yet been retrieved from the original file should be represented virtually, not physically. Using sparse files allows any filesystem supporting sparse files to provide this functionality, and saves having to develop and provide this functionality within the invention. Applicant believes that the use of sparse files to cache the data of real files is novel, and a person skilled in the art will appreciate the ability to use existing capabilities to provide this internal function of a sub-file cache.
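A minimal sketch of such a sub-file cache is given below, assuming a filesystem that supports sparse files. The class name `SparseFileCache`, the block size, and the `fetch_block` callback are assumptions made for illustration; tracking the cached blocks in an in-memory set is likewise an assumption, not something the specification prescribes.

```python
BLOCK = 4096  # hypothetical cache granularity, in bytes


class SparseFileCache:
    """Caches blocks of an original file at the same offsets in a cache file."""

    def __init__(self, cache_path, file_size, fetch_block):
        self.fetch_block = fetch_block  # callable(index) -> bytes of the original file
        self.present = set()            # indices of blocks already cached
        self.f = open(cache_path, "w+b")
        # Set the logical size up front. On a filesystem that supports sparse
        # files, the unwritten ranges are holes and occupy no storage.
        self.f.truncate(file_size)

    def read_block(self, index):
        if index not in self.present:
            data = self.fetch_block(index)
            # Store the block at the same offset it has in the original file,
            # so no separate packing or offset translation is needed.
            self.f.seek(index * BLOCK)
            self.f.write(data)
            self.present.add(index)
        self.f.seek(index * BLOCK)
        return self.f.read(BLOCK)

    def close(self):
        self.f.close()
```

Because each cached block sits at its original offset, the cache file needs no packing module; the filesystem's own sparse-file support represents the unretrieved ranges virtually.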
[0073] The invention may also be used in the data compression context, where data could be compressed in various ways that are not limited by the characteristics of the original data storage system. Examples include normal stream compression, and de-duplication compression in which data is not stored again if it already exists somewhere in the system. This type of data compression is typically not possible directly at the level of the data storage system using the teachings of the prior art. A person skilled in the art will appreciate the distinct advantages of providing data compression within any filesystem and data storage resource system, irrespective of the filesystem and data storage resource system being used.
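One way the two forms of compression mentioned above could be combined at an intercepting layer is sketched below. The class `DedupStore`, the fixed chunk size, and the use of SHA-256 digests as chunk identifiers are all assumptions chosen for illustration; the specification does not name a particular algorithm.

```python
import hashlib
import zlib


class DedupStore:
    """Hypothetical store that keeps each distinct chunk once, compressed."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> compressed chunk bytes, stored once
        self.files = {}   # name -> ordered list of chunk digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # De-duplication: a chunk already present is not stored again.
            if digest not in self.chunks:
                # Stream compression applied to each newly stored chunk.
                self.chunks[digest] = zlib.compress(chunk)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Reassemble the original bytes from the referenced chunks.
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.files[name])
```

The reader of a file receives the original bytes regardless of how many files share the underlying chunks.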
[0074] Another possible application is in data polymorphism, where data can appear different depending on the context in which it is intended to be used. While various data polymorphism applications are known in the art, these are typically performed at the application level and are constrained by the data storage system. According to the method of the invention, data can be manipulated at the interception system, that is, at the data storage system level, and be presented to the filesystem in a context-modified manner, so that operating system and application resources or modifications are not required.
[0075] In the context of a document management system, the invention allows for redirected data to be versioned in a manner which retains the data storage system semantics, but provides an extended semantics that allows retrieval of older versions of the data. Current systems require older versions of the data either to be updated to incorporate the extended semantics, or to be retrieved in a manner that differentiates older data from newer data.
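The versioning behaviour described above may be pictured with the following sketch, in which an ordinary read keeps its usual meaning and an optional argument supplies the extended semantics. The class `VersionedStore` and the zero-based `version` parameter are hypothetical.

```python
class VersionedStore:
    """Hypothetical versioning layer held by the interception system."""

    def __init__(self):
        self.history = {}  # name -> list of versions, oldest first

    def write(self, name, data):
        # Every write is retained rather than overwriting the previous one.
        self.history.setdefault(name, []).append(data)

    def read(self, name, version=None):
        versions = self.history[name]
        if version is None:
            # Original semantics: a plain read returns the latest data.
            return versions[-1]
        # Extended semantics: retrieve an older version by index (0 = oldest).
        return versions[version]
```

An application unaware of versioning calls `read(name)` and sees only the current data, while a version-aware caller can request any earlier state.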
[0076] Redirected data inputs/outputs could also provide input to a distributed data storage system that controls where the source data is located based on its time related usefulness to the data storage system. Again, in this regard, these features can be implemented at the data storage system level and will not appear any differently to the application accessing the filesystem.
[0077] In the name virtualization context, data may no longer reside on the original data storage system's storage resource, and could be located somewhere completely different without having to provide adaptations to the filesystem. Data virtualization becomes possible too, where a single piece of named data may no longer reside on the same storage resource, but could be scattered to multiple data storage systems and/or storage resources. In the data replication context, redirected data inputs/outputs could drive a replication process to ensure redundancy of data.
[0078] While data and operations on data as described above have known implementations, mainly at the application level, such as by using a virtual machine implementation, the disadvantages of these implementations are described in the background of the invention section. The invention allows for these features and functionality to be provided at the data storage system level, and therefore makes them independent of the filesystem or the operating system.
[0079] Various adaptations and implementations of the invention may be made without departing from the spirit of the claims that follow, including implementing, by the interceptor, one of single or multi-level caching of the data storage system; wherein the implementing step is carried out by the interceptor means and implementing locality optimization by pushing less-used data to remote data storage systems and by pulling more-used data to nearby data storage systems. The remote data storage systems include data storage systems that are either physically remote or require more time to access. Furthermore, it is possible to implement, by the interceptor, name based virtualization making file names that are not valid in a current data storage system appear to be valid by referring to data on other data storage systems. Data backup and replication is also possible by having the interceptor communicate directly with the filesystem. Data virtualization is also made possible at the data storage system level, including allowing data under the same name to be physically located in different data storage systems; wherein the implementing step is carried out by the interceptor means.
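The locality optimization mentioned above, in which less-used data is pushed to remote data storage systems and more-used data is pulled nearby, could be approximated as in the sketch below. The class `LocalityManager`, the access-count ranking, and the fixed `near_capacity` are illustrative assumptions only; the specification does not prescribe a placement policy.

```python
from collections import Counter


class LocalityManager:
    """Hypothetical placement policy driven by access counts."""

    def __init__(self, near_capacity):
        self.near_capacity = near_capacity  # number of items kept nearby
        self.near = {}                      # fast or physically close storage
        self.remote = {}                    # slower or physically remote storage
        self.hits = Counter()               # access count per name

    def put(self, name, data):
        # New data starts on the remote resource until it proves popular.
        self.remote[name] = data

    def get(self, name):
        self.hits[name] += 1
        data = self.near[name] if name in self.near else self.remote[name]
        self.rebalance()
        return data

    def rebalance(self):
        names = list(self.near) + list(self.remote)
        # Rank by access count, breaking ties by name for determinism.
        ranked = sorted(names, key=lambda n: (-self.hits[n], n))
        hot = set(ranked[:self.near_capacity])
        for n in names:
            if n in hot and n in self.remote:
                self.near[n] = self.remote.pop(n)   # pull more-used data nearby
            elif n not in hot and n in self.near:
                self.remote[n] = self.near.pop(n)   # push less-used data away
```

Callers of `get` receive the same data wherever it currently resides; only the access time differs.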
[0080] According to another aspect of the invention, the method further includes providing distinct capabilities for selected data at the interception system than at the storage resource, wherein the distinct capabilities are selected from the group comprising performance characteristics, de-duplication, data polymorphism, independent access control mechanisms, versioning, caching, locality, replication and data virtualization. The selected data is preferably identified using a selection mechanism employing a metadata pattern matching of one or more selected from the group comprising name, timestamps, size, historical information, physical information and contextual information.
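A selection mechanism of the kind described above might, purely as an example, match on name, size and timestamp as sketched below. The types `FileMeta` and `Rule`, the glob-style name pattern, and the function `capabilities_for` are hypothetical; the other metadata listed in the paragraph (historical, physical and contextual information) are omitted for brevity.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class FileMeta:
    name: str
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch


@dataclass
class Rule:
    """Hypothetical rule pairing a metadata pattern with a capability."""
    name_glob: str = "*"
    min_size: int = 0
    modified_after: Optional[float] = None
    capability: str = "caching"

    def matches(self, meta):
        if not fnmatchcase(meta.name, self.name_glob):
            return False
        if meta.size < self.min_size:
            return False
        if self.modified_after is not None and meta.mtime <= self.modified_after:
            return False
        return True


def capabilities_for(meta, rules):
    """Return the capabilities whose rules match the given metadata."""
    return [r.capability for r in rules if r.matches(meta)]
```

Data that matches no rule would simply be handled by the original data storage system without any distinct capability applied.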
[0081] The above-described embodiments are intended to be examples of the present invention and alterations and modifications may be effected thereto, by those of skill in the art, without departing from the scope of the invention that is defined solely by the claims appended hereto.

Claims

What is claimed is:
1. A method for extending the functionality of a data storage system, said data storage system including a data organization means and a data storage resource, the method comprising dissociating the data storage functions of the data storage system from other functions of the data storage system and transferring at least a portion of said data storage functions to an intercepting system.
2. A method according to claim 1, wherein said intercepting system comprises a complementary storage resource.
3. A method according to claim 1, wherein said dissociating step is carried out by providing an interceptor means in communication with the data organization means, the data storage resource and the intercepting system; said interceptor means intercepting a filesystem operation to determine whether a function of said operation should be handled by said data storage resource or by said intercepting system.
4. A method according to claim 3, wherein said interceptor means intercepts a filesystem operation while the operation still has context and before the operation would otherwise be decomposed into independent operations suitable for the data storage resource.
5. A method according to claim 1, wherein at least another portion of said data storage functions are retained with said data storage system.
6. A method according to claim 1, wherein all of said data storage functions are transferred to said intercepting system.
7. A method according to claim 1, wherein said data storage system is selected from the group comprising a filesystem, a key-value store, an object store and a network protocol.
8. A method according to claim 1, wherein the data storage system is shared among multiple external interfaces.
9. A method according to claim 3, wherein said interceptor means comprises a user- space application program in cooperation with a facility of an operating system or of a filesystem.
10. A method according to claim 3, wherein said interceptor means comprises a filesystem protocol proxy application performed on a network.
11. A method according to claim 3, wherein said interceptor means comprises a minifilter driver adapted to intercept filesystem operations in an operating system kernel.
12. A method according to claim 3, wherein said intercepting system uses one or more complementary storage resources to carry out its functions, said complementary storage resources being independent from said storage resource.
13. A method according to claim 1, wherein said intercepting system implements capacity expansion of the data storage system.
14. A method according to claim 1, wherein said intercepting system improves the performance of the data storage system by altering one or more characteristics of data on the storage resource or by using a complementary storage resource.
15. A method according to claim 14, wherein said one or more characteristics are selected from the group comprising a storage format, a storage location and storage order.
16. A method according to claim 1, further comprising the step of carrying out de-duplication by said intercepting system.
17. A method according to claim 1, further comprising the step of carrying out data polymorphism by said intercepting system.
18. A method according to claim 1, further comprising the step of implementing independent access control mechanisms for the data by said intercepting system.
19. A method according to claim 1, further comprising the step of versioning data by said intercepting system.
20. A method according to claim 12, further comprising the step of implementing one of single or multi-level caching of the data storage system; wherein said implementing step is carried out by said interceptor means.
21. A method according to claim 12, wherein the complementary storage resource is the operating system memory cache normally used to cache data.
22. A method according to claim 12, further comprising the step of implementing locality optimization by pushing less-used data to remote data storage systems and by pulling more-used data to nearby storage resources or data storage systems; wherein said implementing step is carried out by said interceptor means.
23. A method according to claim 21, wherein said remote data storage systems include data storage systems that are either physically remote or require more time to access.
24. A method according to claim 12, further comprising the step of implementing name based virtualization making file names that are not valid in a current data storage system appear to be valid by referring to data on other data storage systems; wherein said implementing step is carried out by said interceptor means.
25. A method according to claim 12, further comprising the step of implementing one of data backup and data replication; wherein said implementing step is carried out by said interceptor means.
26. A method according to claim 12, further comprising the step of implementing data virtualization including allowing data under the same name to be physically located in different data storage systems; wherein said implementing step is carried out by said interceptor means.
27. A method according to claim 1, further comprising providing distinct capabilities for selected data at said interception system than at said storage resource, wherein said distinct capabilities are selected from the group comprising performance characteristics, de-duplication, data polymorphism, independent access control mechanisms, versioning, caching, locality, replication and data virtualization.
28. A method according to claim 26, wherein said selected data is identified using a selection mechanism employing a metadata pattern matching of one or more selected from the group comprising name, timestamps, size, historical information, physical information and contextual information.
29. A method according to claim 1, wherein said data organization means is selected from the group comprising a filesystem, a key-value store and a database.
30. A method according to claim 12, wherein said complementary storage resource is a sparse file.
31. A system for extending the functionality of a data storage system, said data storage system including a data organization means and a data storage resource, the system comprising a dissociating means for dissociating data storage functions of the data storage system from other functions of the data storage system and a means for transferring at least a portion of said data storage functions to an intercepting system.
32. A system according to claim 31, wherein said intercepting system comprises a complementary storage resource.
33. A system according to claim 31, wherein said dissociating means comprises an interceptor in communication with the data organization means, the data storage resource and the intercepting system; said interceptor adapted to intercept a filesystem operation to determine whether a function of said operation should be handled by said data storage resource or by said intercepting system.
34. A system according to claim 33, wherein said interceptor intercepts a filesystem operation while the operation still has context and before the operation would otherwise be decomposed into independent operations suitable for the data storage resource.
35. A system according to claim 31, wherein at least another portion of said data storage functions are retained with said data storage system.
36. A system according to claim 31, wherein all of said data storage functions are transferred to said intercepting system.
37. A system according to claim 31, wherein said data storage system is selected from the group comprising a filesystem, a key-value store, an object store and a network protocol.
38. A system according to claim 31, wherein the data storage system is shared among multiple external interfaces.
39. A system according to claim 33, wherein said interceptor comprises a user-space application program in cooperation with a facility of an operating system or of a filesystem.
40. A system according to claim 33, wherein said interceptor comprises a filesystem protocol proxy application performed on a network.
41. A system according to claim 33, wherein said interceptor comprises a minifilter or legacy filter driver adapted to intercept filesystem operations in an operating system kernel.
42. A system according to claim 33, wherein said intercepting system uses one or more complementary storage resources to carry out its functions, said complementary storage resources being independent from said storage resource.
43. A system according to claim 33, wherein said intercepting system implements capacity expansion of the data storage system.
44. A system according to claim 33, wherein said intercepting system improves the performance of the data storage system by altering one or more characteristics of data on the storage resource.
45. A system according to claim 33, wherein said one or more characteristics are selected from the group comprising a storage format, a storage location and storage order.
46. A system according to claim 31, wherein said intercepting system is adapted to perform de-duplication of the data.
47. A system according to claim 31, wherein said intercepting system is adapted to perform polymorphism of the data.
48. A system according to claim 31, wherein said intercepting system is adapted to perform independent access control mechanisms for the data.
49. A system according to claim 31, wherein said intercepting system is adapted to perform versioning of the data.
50. A system according to claim 33, wherein said interceptor is adapted to perform one of single or multi-level caching of the data storage system.
51. A system according to claim 33, wherein the complementary storage resource is the operating system memory cache normally used to cache data.
52. A system according to claim 33, wherein said interceptor is adapted to perform locality optimization by pushing less-used data to remote data storage systems and by pulling more-used data to nearby storage resources or data storage systems.
53. A system according to claim 51, wherein said remote data storage systems include data storage systems that are either physically remote or require more time to access.
54. A system according to claim 33, wherein said interceptor is adapted to perform name based virtualization making file names that are not valid in a current data storage system appear to be valid by referring to data on other data storage systems.
55. A system according to claim 33, wherein said interceptor is adapted to perform one of data backup and data replication.
56. A system according to claim 33, wherein said interceptor is adapted to perform data virtualization including allowing data under the same name to be physically located in different data storage systems.
57. A system according to claim 31, wherein said interception system is adapted to perform one or more selected from the group comprising improving performance characteristics, de-duplication, data polymorphism, independent access control mechanisms, versioning, caching, locality, replication and data virtualization.
58. A system according to claim 31, wherein said data organization means is selected from the group comprising a filesystem, a key-value store and a database.
59. A system according to claim 42, wherein said complementary storage resource is a sparse file.
PCT/CA2011/050514 2010-08-25 2011-08-24 Method and system for extending data storage system functions WO2012024800A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP11819254.1A EP2609528A4 (en) 2010-08-25 2011-08-24 Method and system for extending data storage system functions
AU2011293014A AU2011293014B2 (en) 2010-08-25 2011-08-24 Method and system for extending data storage system functions
CN2011800509497A CN103201736A (en) 2010-08-25 2011-08-24 Method and system for extending data storage system functions
KR20137007468A KR101510025B1 (en) 2010-08-25 2011-08-24 Method and system for extending data storage system functions
JP2013525098A JP2013536514A (en) 2010-08-25 2011-08-24 Method and system for extending data storage system functionality

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US37690510P 2010-08-25 2010-08-25
US61/376,905 2010-08-25

Publications (1)

Publication Number Publication Date
WO2012024800A1 true WO2012024800A1 (en) 2012-03-01

Family

ID=45722795

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2011/050514 WO2012024800A1 (en) 2010-08-25 2011-08-24 Method and system for extending data storage system functions

Country Status (6)

Country Link
EP (1) EP2609528A4 (en)
JP (1) JP2013536514A (en)
KR (1) KR101510025B1 (en)
CN (1) CN103201736A (en)
AU (1) AU2011293014B2 (en)
WO (1) WO2012024800A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014107519A3 (en) * 2013-01-02 2014-08-28 Oracle International Corporation Compression and deduplication layered driver
EP3163469A4 (en) * 2014-06-24 2017-05-03 Huawei Technologies Co. Ltd. Method and device for realizing ip disk file storage
US10802928B2 (en) 2015-09-10 2020-10-13 International Business Machines Corporation Backup and restoration of file system
US11287973B2 (en) 2016-02-02 2022-03-29 Samsung Electronics Co., Ltd. Polymorphic storage devices

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423331B2 (en) * 2016-02-02 2019-09-24 Samsung Electronics Co., Ltd. Polymorphic storage devices

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0856803A2 (en) 1997-01-31 1998-08-05 Informix Software, Inc. File system interface to a database
CA2646776A1 (en) * 1999-08-05 2001-02-15 Oracle International Corporation Internet file system
US6269382B1 (en) 1998-08-31 2001-07-31 Microsoft Corporation Systems and methods for migration and recall of data from local and remote storage
WO2010037117A1 (en) * 2008-09-29 2010-04-01 Nirvanix, Inc. Client application program interface for network-attached storage system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7165057B2 (en) * 2001-11-29 2007-01-16 Veritas Operating Corporation Methods and systems to access storage objects
WO2003085526A1 (en) * 2002-04-03 2003-10-16 Powerquest Corporation Using disassociated images for computer and storage resource management
US20050138306A1 (en) * 2003-12-19 2005-06-23 Panchbudhe Ankur P. Performance of operations on selected data in a storage area

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2609528A4 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014107519A3 (en) * 2013-01-02 2014-08-28 Oracle International Corporation Compression and deduplication layered driver
CN105027122A (en) * 2013-01-02 2015-11-04 甲骨文国际公司 Compression and deduplication layered driver
US9424267B2 (en) 2013-01-02 2016-08-23 Oracle International Corporation Compression and deduplication layered driver
US9846700B2 (en) 2013-01-02 2017-12-19 Oracle International Corporation Compression and deduplication layered driver
CN105027122B (en) * 2013-01-02 2019-09-10 甲骨文国际公司 Compression and data de-duplication Layered driver
EP3163469A4 (en) * 2014-06-24 2017-05-03 Huawei Technologies Co. Ltd. Method and device for realizing ip disk file storage
US10437849B2 (en) 2014-06-24 2019-10-08 Huawei Technologies Co., Ltd. Method and apparatus for implementing storage of file in IP disk
US10802928B2 (en) 2015-09-10 2020-10-13 International Business Machines Corporation Backup and restoration of file system
US11287973B2 (en) 2016-02-02 2022-03-29 Samsung Electronics Co., Ltd. Polymorphic storage devices

Also Published As

Publication number Publication date
AU2011293014B2 (en) 2014-08-14
CN103201736A (en) 2013-07-10
AU2011293014A1 (en) 2013-03-21
EP2609528A4 (en) 2017-07-19
EP2609528A1 (en) 2013-07-03
KR20130046441A (en) 2013-05-07
JP2013536514A (en) 2013-09-19
KR101510025B1 (en) 2015-04-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11819254

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
ENP Entry into the national phase

Ref document number: 2013525098

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2011819254

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2011293014

Country of ref document: AU

Date of ref document: 20110824

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20137007468

Country of ref document: KR

Kind code of ref document: A