US20170123675A1 - Method for layered storage of enterprise data - Google Patents

Method for layered storage of enterprise data

Info

Publication number
US20170123675A1
Authority
US
United States
Prior art keywords
data
storage
blocks
mega
fixed length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/407,413
Other versions
US10089009B2 (en)
Inventor
Kurt GLAZEMAKERS
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inuron
Original Assignee
Inuron
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from EP13166044.1A (EP2799973B1)
Application filed by Inuron filed Critical Inuron
Priority to US15/407,413 (US10089009B2)
Assigned to INURON. Assignment of assignors interest (see document for details). Assignors: GLAZEMAKERS, KURT
Publication of US20170123675A1
Application granted
Publication of US10089009B2
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G06F3/0607 Improving or facilitating administration, e.g. storage management, by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45579 I/O management, e.g. providing access to device drivers or storage
    • G06F2009/45591 Monitoring or debugging support
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, for peripheral storage systems, e.g. disk cache
    • G06F2212/222 Non-volatile memory (employing cache memory using specific memory technology)
    • H03M13/154 Error and erasure correction, e.g. by using the error and erasure locator or Forney polynomial
    • H03M13/6569 Implementation on processors, e.g. DSPs, or software implementations
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention generally relates to layered storage of enterprise data, more precisely to a software-controlled method that implements a layered storage architecture enabling enterprises to store data in a virtual machine centric manner.
  • the present invention in particular concerns storage of enterprise data in backend storage systems, like for instance remote cloud storage facilities, with reduced storage latency, reduced storage resource requirements and costs, and reduced dependency on specific storage hardware technology.
  • the present invention aims at turning any existing backend storage capacity into high-speed storage for enterprises that require features like efficient and unlimited zero-copy snapshotting, thin cloning, thin provisioning, replicating, rollbacks, etc. of data at virtual machine level, as well as adequate protection against errors and recovery from outages of diverse storage hardware used in data centres or remote cloud storage facilities.
  • SAN: Storage Area Network
  • Alternatively, Network-Attached Storage or NAS may be deployed to remotely store data using file-based protocols instead of block-based protocols.
  • the data are usually stored in a Redundant Array of Independent Disks (RAID), i.e. multiple disk drives that form part of a single logical unit amongst which the data are distributed depending on the desired redundancy level.
  • RAID: Redundant Array of Independent Disks
  • the SAN/NAS system provides several LUNs to the hypervisor, i.e. a piece of software, firmware or hardware that serves, i.e. creates, runs, monitors and manages the different virtual machines on a host machine, i.e. a server or computer.
  • a virtual machine or guest machine is a software implementation of a machine or computer, typically comprising a single operating system and application programs running on that operating system. Usually, plural virtual machines share the hardware resources of a single host machine.
  • physical servers or computers are easily supporting 10 or more virtual machines.
  • Each computer or server is running a hypervisor to serve the virtual machines it is hosting.
  • the computer-implemented method according to the present invention works in close cooperation with such hypervisor.
  • U.S. Patent Application US 2012/0047339 entitled “Redundant Array of Independent Clouds” describes a mechanism for reliable block based storage in remote cloud storage facilities.
  • US 2012/0047339 recognizes in paragraphs [0004]-[0006] the problem of slow network based storage via the internet and the need for redundancy.
  • US 2012/0047339 consequently proposes to divide data in multiple blocks, and to store the data blocks with different cloud providers. The data can be reconstructed through use of a translation map.
  • a parity block may be generated from the N data blocks.
  • the parity block is stored with yet another cloud provider.
  • United States Patent Application US 2011/0296440 entitled “Accelerator System for Use with Secure Data Storage” describes a system and method for accelerating the processing of and secure cloud storage of data. From paragraphs [0003]-[0007] of US 2011/0296440, it is learned that offloading certain processing from the motherboard, i.e. the secure parsing of data, in order to accelerate storage and processing of data is key to the system described in that patent application.
  • a logical unit is a single storage volume that is identified and addressed through its LUN or Logical Unit Number.
  • features like roll back, snapshotting or replicating a single virtual machine are difficult to implement.
  • the only way to roll back a virtual machine lies in retrieving an older snapshot of that virtual machine, mounting the file system and copying the requested virtual disks back to the primary storage logical unit. This is a complex and slow process.
  • Gateways have been developed to be installed between the storage systems and hypervisors. These gateways however only solve a small part of the problem. They are difficult to manage, require the storage to reside near the hypervisor and remain expensive.
  • United States Patent Application US 2010/0332401 entitled “Performing Data Storage Operations with a Cloud Storage Environment, Including Automatically Selecting Among Multiple Cloud Storage Sites” for instance describes a method for data storage and migration in a cloud environment.
  • a cloud storage gateway introduces local caching and de-duplication.
  • As part of a block based data migration process, data stored in cache (local, primary copies) are moved to cloud storage systems (secondary copies).
  • containerized de-duplication is foreseen to avoid creating unnecessary additional instances of the data within secondary storage, i.e. within the cloud.
  • the above defined objectives are realized through a computer-implemented method for layered storage of enterprise data as defined by claim 1; the method steps are listed in the Summary of the Invention below.
  • the invention concerns a virtual machine centric layered storage method that combines efficient location based storage, i.e. local caching of data blocks, with redundant distributed time based back-end storage of data containers.
  • the time based back-end storage with containerization enables enterprise features like zero copy snapshotting, cloning, replicating, rolling back, etc. at virtual machine level whereas the local caching turns any backend storage virtually into high-speed storage.
  • Thanks to the containerization, the random I/O behavior of the virtual machines turns into sequential write operations. Thereafter, the containers are distributed in a redundant way across available backend storage using erasure encoding.
  • The erasure coding makes it possible to realize a configurable amount of redundancy such that a configurable number of backend disks can get lost or remain non-responsive without loss of data, and this without a need for RAID technology that suffers from lengthy rebuild times, i.e. days or even weeks.
  • the method according to the present invention works with any type of existing backend storage solution, e.g. NAS, SAN, or JBOD (Just a Bunch Of Disks), and is consequently storage hardware independent.
  • the backend storage can be a distributed local object store or it can be a remote object store in the cloud like for instance an Amazon S3 or Openstack Swift implemented object store.
  • the backend store can be distributed over plural hardware units that reside in a single data center, plural hardware units that reside in one or more branch office, or plural hardware units that reside with different cloud storage providers accessible via a public wide area network (WAN) like the internet.
  • WAN: public wide area network
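  • By way of illustration, the following minimal Python sketch traces the write path just described, with assumed names and sizes; a single XOR parity mega-block stands in for a general erasure code, so Y = X + 1 and any one mega-block may be lost without losing data.

        BLOCK_SIZE = 4 * 1024        # fixed-length data blocks, 4 kbyte assumed
        MEGA_BLOCK = 1024 * 1024     # fixed-length mega-blocks, 1 Mbyte assumed

        def build_container(blocks, max_size=4 * MEGA_BLOCK):
            # time-based grouping: append blocks in arrival order until the
            # configurable maximum container size is reached
            container = bytearray()
            for block in blocks:
                if len(container) + len(block) > max_size:
                    break
                container += block
            pad = (-len(container)) % MEGA_BLOCK      # pad to whole mega-blocks
            return bytes(container) + b"\x00" * pad

        def erasure_encode(container):
            # divide the container in X mega-blocks and add one parity mega-block
            x_blocks = [container[i:i + MEGA_BLOCK]
                        for i in range(0, len(container), MEGA_BLOCK)]
            parity = bytearray(MEGA_BLOCK)
            for mb in x_blocks:
                for i, byte in enumerate(mb):
                    parity[i] ^= byte
            return x_blocks + [bytes(parity)]         # Y = X + 1 mega-blocks

        def distribute(y_blocks, backends):
            # spread the Y mega-blocks round-robin over the available backends
            for index, mb in enumerate(y_blocks):
                backends[index % len(backends)].append((index, mb))

        backends = [[], [], []]                       # three hypothetical stores
        blocks = [bytes([n]) * BLOCK_SIZE for n in range(8)]
        distribute(erasure_encode(build_container(blocks)), backends)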
  • the caching of data containers or SCOs prior to distribution in the backend is preferably implemented using fast memory like flash or SSD drives.
  • the local caching accelerates the time to confirm a write operation to the hypervisor, and as such virtually accelerates the backend storage.
  • the response time to a write request will improve when flash or SSD disks are used to implement the two-tier local caching in comparison to alternate storage devices.
  • the data blocks may be de-duplicated per hypervisor serving one or more of the virtual machines and the de-duplicated data blocks may get stored in an additional local de-dup cache memory.
  • For de-duplicated data blocks stored locally in a de-dup cache, the address, a hash value, a container number and an offset number may be stored as metadata.
  • the one or multiple backend storage systems are local or remote cloud storage systems operated by different cloud storage operators.
  • the backend store consists of storage capacity that resides with different cloud storage providers. This way, retrieval of data shall no longer depend on the availability or responsiveness of a single cloud storage provider.
  • Upon a read request, the presence of the requested data in the local cache memory will be verified.
  • If available, the local cache will serve the read request.
  • If not, the data will have to be retrieved from the backend stores.
  • the redundancy introduced through erasure coding makes it possible to rely on the fastest responding cloud storage systems. As soon as X out of Y mega-blocks can be delivered by one or more fast responding cloud storage providers, the data container can be re-established and the read request can be served.
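  • Continuing the illustrative single-parity sketch above (Y = X + 1), the following fragment shows how a read can be served as soon as any X of the Y mega-blocks have arrived from the fastest responding stores; the single missing or slow mega-block, if any, is recovered by XOR.

        def reassemble(received, x):
            # received: dict {index: bytes} holding at least X of the Y = X + 1
            # mega-blocks; index X is the parity mega-block
            if len(received) < x:
                raise ValueError("not enough mega-blocks received yet")
            missing = [i for i in range(x) if i not in received]
            if missing:
                recovered = bytearray(received[x])    # start from the parity block
                for index, mb in received.items():
                    if index != x and index != missing[0]:
                        for j, byte in enumerate(mb):
                            recovered[j] ^= byte
                received[missing[0]] = bytes(recovered)
            return b"".join(received[i] for i in range(x))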
  • the computer-implemented method for layered storage of enterprise data comprises: storing for each data block its address, a container number and an offset number as metadata.
  • the local caching wherein data blocks are sequentially written in containers is preferably implemented as a transaction based cache.
  • the address of the data block, the number of the container wherein it is stored and the offset or pointer to the location in the container where it is stored are logged as metadata.
  • the write operation can be confirmed to the hypervisor.
  • the address will be used for look-up in the metadata.
  • the read request will then be served from the local cache memory when the requested data are still available resulting in very fast read request handling. If not available in the local cache, the metadata will be used to retrieve the appropriate containers and requested data blocks from the backend storage system(s).
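  • A minimal sketch of this transaction based cache and its metadata lookup follows (class and field names are assumed; plain Python dictionaries stand in for the flash/SSD cache and the backend store).

        class StorageAccelerator:
            def __init__(self, container_size, backend):
                self.container_size = container_size
                self.backend = backend           # e.g. dict {sco_number: bytes}
                self.metadata = {}               # block address -> (sco_number, offset)
                self.transaction_log = []        # (address, sco_number, offset) entries
                self.sco_number = 0
                self.current_sco = bytearray()   # container currently being filled
                self.sco_cache = {}              # local (flash/SSD) copies of containers

            def write(self, address, block):
                offset = len(self.current_sco)
                self.current_sco += block                          # sequential append
                self.metadata[address] = (self.sco_number, offset)
                self.transaction_log.append((address, self.sco_number, offset))
                if len(self.current_sco) >= self.container_size:
                    self._flush()
                return "ack"                              # confirmed to the hypervisor

            def _flush(self):
                sco = bytes(self.current_sco)
                self.sco_cache[self.sco_number] = sco     # keep a hot local copy
                self.backend[self.sco_number] = sco       # pushed at a slower pace
                self.sco_number += 1
                self.current_sco = bytearray()

            def read(self, address, block_size):
                sco_number, offset = self.metadata[address]        # metadata lookup
                if sco_number == self.sco_number:                  # still being filled
                    sco = self.current_sco
                else:
                    sco = self.sco_cache.get(sco_number) or self.backend[sco_number]
                return bytes(sco[offset:offset + block_size])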
  • the computer-implemented method for layered storage of enterprise data comprises: storing each data block that is not yet part of a data container that is remotely stored, in a distributed transaction log of another storage accelerator.
  • This mechanism will ensure that each data block of for instance 4 kbyte that is not yet part of a container that is stored in backend storage, is copied and synced onto fast memory, e.g. flash or SSD storage, of another storage accelerator. Storage in a distributed transaction log of another storage accelerator is done before a write request is confirmed to the requestor.
  • fast memory: e.g. flash or SSD storage
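  • Reusing the StorageAccelerator sketch above, the distributed transaction log can be illustrated as follows (assumed names; in practice the sync call would be a copy to flash or SSD storage of another storage accelerator, and trim would be called once the corresponding container has reached the backend).

        class FailOverCache:
            # peer-side copy of blocks that are not yet part of a remotely
            # stored container
            def __init__(self):
                self.entries = {}                     # block address -> block data

            def sync(self, address, block):
                self.entries[address] = block         # copied before the write is acked

            def trim(self, addresses):
                for address in addresses:             # container reached the backend,
                    self.entries.pop(address, None)   # peer copies no longer needed

        def write_with_failover(accelerator, peer_cache, address, block):
            peer_cache.sync(address, block)           # protect the block first ...
            return accelerator.write(address, block)  # ... then confirm to the requestor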
  • the computer-implemented method for layered storage of enterprise data according to the present invention may be implemented with multiple backend storage systems comprising various types of backend storage systems.
  • an advantage of the present invention is that it does not rely on particular backend storage technology, e.g. SAN, NAS or JBOD.
  • The backend storage systems may be deployed locally or they may reside in one or more data centres, one or more branch offices, in the public cloud, or a combination thereof.
  • the computer-implemented method for layered storage of enterprise data according to the present invention may comprise:
  • The second storage layer, i.e. the time based storage that forms part of the present invention, requires maintenance of older containers or snapshots stored in the backend storage system for performance reasons.
  • A scrubber that has access to the transaction logs can, independently from the storage accelerator, delete obsolete containers stored in the backend storage system and update the corresponding metadata and transaction logs.
  • An advantageous aspect of the present invention is that the scrubbing does not impact the performance of read/write operations because most of these read/write operations will be handled by the local caching layer.
  • the present invention also concerns a corresponding data processing system as defined by claim 7 comprising means for carrying out the method.
  • the present invention further also relates to one or more non-transitory computer readable storage mediums as defined by claim 8, having a computer program stored thereon for execution of the method according to the invention.
  • FIG. 1 is a functional block scheme illustrating an embodiment of the method for layered storage of enterprise data according to the present invention.
  • FIG. 2 illustrates a first use case of the method according to the present invention, i.e. protection against disk failures.
  • FIG. 3 illustrates a second use case of the method according to the present invention, i.e. a hybrid cloud storage implementation.
  • FIG. 4 illustrates a third use case of the method according to the present invention, i.e. a virtual desktop infrastructure implementation.
  • FIG. 5 illustrates a fourth use case of the method according to the present invention, i.e. a cloud storage gateway implementation.
  • FIG. 1 shows the functional blocks or components of a cloud storage router or CSR which is an embodiment of the computer program according to the present invention that performs an embodiment of the method according to the present invention when executed by a processor.
  • the cloud storage router illustrated by FIG. 1 is a virtual machine centric layered storage application that combines the function of a SAN, high speed SSD caching, backup and off-site replication, encryption and compression.
  • the cloud storage router is a software implemented storage application that installs on any x86 hardware or can be used as a virtual storage appliance (VSA).
  • the cloud storage router comprises a containerization function 105 that generates containers, the so called Storage Container Objects or SCO's 151 and 152 , from a configurable collection of 4 kbyte data blocks.
  • the SCO's have a configurable maximum length that is supposed to be set to 256 Mbyte in FIG. 1 .
  • the generated SCO's can be stored in backend storage systems, e.g. a distributed local object store running within the same VSA or remote object storage like for instance an Amazon S3 or an Openstack Swift implemented storage.
  • This second storage layer wherein the 4 kbyte long data blocks are containerized in SCO's is a time-based storage implementation because the 4 kbyte long data blocks are sequentially stored in the SCO's.
  • the second, time-based storage layer enables features like unlimited zero-copy snapshotting, cloning, and easy replication.
  • the second layer makes the storage independent of the location.
  • The objects can be stored in a distributed way over different x86 hardware units that reside within one or more data centers, within one or more branch offices, or that reside remotely in the public cloud where they are reachable via a WAN like the internet.
  • the location based storage layer is implemented by a metadata lookup map 104 , a transaction log 103 and an SCO generator 105 . They are jointly called the storage accelerator since the location-based storage layer virtually accelerates storage in any backend storage system.
  • the storage accelerator brings storage closer to the hypervisor in a virtual machine centric way, making use of an intelligent 2-tier caching algorithm. “Virtual machine centric” in this context means that all storage related activities like snapshotting, cloning or replication can be done at the virtual machine level, for instance enabling administrators to easily select only the most important virtual machines for replication.
  • the caching can be done inside the hypervisor by using a Virtual Storage Appliance or VSA to access local flash or SSD drives.
  • the storage accelerator uses SSD or flash storage to serve as a read and write cache, virtually turning any backend storage capacity into high-speed storage.
  • the write caching works as a transaction log based cache on fast flash or SSD redundant storage. Data blocks of 4 kbyte are received by the block device 101 on IO path 108 .
  • the write cache is accomplished by filling up SCO's 151 , 152 with each new 4 k block that is written. As a result, any random write I/O behavior is turned into a sequential write operation.
  • the address 141 of the 4 k block, and the location 142 comprising the SCO number 142A and the offset 142B within that SCO are stored as metadata in the metadata lookup map 104.
  • the address 131 and location 132 are also logged by transaction log 103 .
  • the storage accelerator uses a distributed database to provide redundancy and immediate access to the metadata in case the volume is moved or failed-over to another cloud storage router. Only real data that is written to the cloud storage router will be stored as a result of which thin provisioning becomes possible.
  • the transaction log 103 will immediately confirm the write to the hypervisor hence enabling fast response times while the SCO's 151 , 152 can be pushed at a slower pace to the backend storage system(s).
  • the storage accelerator will look-up the location in the metadata lookup map 104 . If the requested data block exists in the cache, the storage accelerator will serve the read requests directly from the local SSD or flash storage, resulting in very fast read I/O operations.
  • the storage accelerator writes or reads the data using SCO's when it is communicating with the backend storage systems, not shown in FIG. 1 . Because of the transaction log based approach in the storage accelerator, written data resides on redundant flash or SSD storage before it reaches the redundant backend storage.
  • the storage accelerator drawn in FIG. 1 indeed enables a distributed transaction log 107 .
  • the distributed transaction log 107 will make sure that each 4 k block 171 , 172 that is not yet part of an SCO that is stored in the backend storage system is synced to SSD or flash storage on another storage accelerator before it confirms the write to the guest.
  • the size of the distributed transaction log 107 is very limited because there is only a need to protect data blocks 171 , 172 that are not yet stored on the backend storage system.
  • the cloud storage router illustrated by FIG. 1 implements a second, time-based storage layer.
  • the SCO's 151 , 152 and each set of transaction logs 131 , 132 , 133 that are created by the storage accelerator are therein treated as objects.
  • The containers, which consist of a configurable maximum number of 4 k blocks, are called SCO's or Storage Container Objects.
  • the benefit of this time-based storage layer is that it enables enterprise features like zero-copy snapshots and cloning.
  • the time-based storage also requires maintenance in order to compact and clean-up older SCO's and/or deleted snapshots. By having all transaction logs and SCO's stored on backend storage systems, these maintenance tasks can be offloaded from the storage accelerator.
  • a scrubber, not drawn in FIG. 1, does the maintenance of the time-based storage. This scrubber works completely independently from the storage accelerator since it has access to all transaction logs and SCO's stored in the backend storage system. Once the scrubber has finished cleaning-up older SCO's and/or deleted snapshots, it will create an updated set of transaction logs that is being used by the storage accelerator to update the local metadata stored in the metadata lookup map 104. The obsolete SCO's on the backend storage system are deleted.
  • Each scrubber is configurable to work on one or more namespace. Because of the caching in the storage accelerator, the maintenance work does not impact performance. As already mentioned here above, most read and write I/O requests will be served from the SCO cache. In the event of a disaster where the complete storage accelerator is lost, all the logical units can be rebuilt from the backend storage system.
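  • Continuing the earlier sketch, a scrubber could in its simplest form look like this (assumed data layout): it only needs the transaction log and the containers on the backend, deletes containers that are no longer referenced, and returns a compacted transaction log with which the storage accelerator updates its metadata lookup map.

        def scrub(transaction_log, backend):
            live = {}                                  # newest entry per block address
            for address, sco_number, offset in transaction_log:
                live[address] = (sco_number, offset)
            referenced = {sco for sco, _ in live.values()}
            for sco_number in list(backend):           # delete obsolete SCO's
                if sco_number not in referenced:
                    del backend[sco_number]
            # compacted log used by the accelerator to update its metadata lookup map
            return [(address, sco, offset) for address, (sco, offset) in live.items()]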
  • In order to store the objects in a distributed and redundant way in the backend storage system, the cloud storage router has a storage distributor 106 in addition to the storage accelerator components 103, 104, 105.
  • a third party cloud storage backend can be easily added by using the cloud storage initiator. Since the cloud storage initiator only stores, retrieves and changes SCO's, it makes it easy to add new cloud storage backend systems.
  • the storage accelerator can basically use the storage distributor 106 or any cloud storage as a backend storage system to store the SCO's.
  • the storage distributor 106 will distribute written transaction logs 131 , 132 coming from the storage accelerator towards the available resources on the distributed backend storage system.
  • the storage distributor 106 divides each SCO into mega-blocks of typically 1 Mbyte. Erasure coding is then applied to each set of X mega-blocks in order to generate a set of Y mega-blocks containing redundant data, since Y>X. Erasure coding is used to equally spread storage of the Y mega-blocks over the attached distributed backend storage system. Erasure coding is an algorithm that makes it possible to create a configurable amount of redundancy such that a configurable number of disks in the backend storage system can get lost without data loss.
  • the redundancy is created by pure processing power that doesn't require any read access on the storage capacity blocks to calculate parity, resulting in fewer disk spindles, lower power consumption, better performance and the ability to distribute the data over the backend storage system. Furthermore, rebuilding redundancy for erasure encoded mega-blocks has no performance impact compared to for instance RAID technology. RAID has been designed for smaller disks. Today, disks are reaching 3 to 4 Tbyte in capacity, as a result of which RAID rebuild times after a lost disk are in the order of days and/or weeks rather than hours, resulting in a much higher chance of losing data and suffering serious performance issues during rebuilds.
  • Erasure coding allows the number of blocks to be stored to be changed at any time.
  • When new storage capacity is added or old storage capacity is removed, whether it is a single disk, a logical unit or a share, the storage distributor 106 will start spreading the data using all the newly available blocks. This makes it possible to grow or shrink data capacity and change redundancy on demand, independent of any storage technology or hardware that is used in the backend storage system(s).
  • the storage accelerator can be configured to use a cloud storage initiator, not drawn in FIG. 1 , instead of the storage distributor 106 .
  • the backend storage system can even be remote in the public cloud, accessible through a WAN with high latency like the Internet, while still experiencing local SSD/flash storage performance.
  • For a slower backend storage system, it is recommended to use more flash or SSD storage capacity in the caching layer.
  • the cloud storage router can deal with bigger bursts of written data without performance impact for the end user. Keeping more SCO's in cache storage helps to offset some of the penalties that are associated with accessing a slow backend storage system, mainly due to latency.
  • Each created logical unit will be stored in a different namespace or bucket in the case of an Amazon S3 or Openstack Swift backend storage system implementation. Because storage maintenance tasks can be done completely independently on the backend storage system, all SCO's stored in a public cloud system can be accessed from within that same public cloud system without the need of going over a slow internet link. In case of an Amazon S3 backend storage system for instance, the scrubbers can be spun up on EC2, using the local Amazon bandwidth to do the storage maintenance, and shut down once the work is done, resulting in less bandwidth consumption and lower Amazon costs.
  • the storage accelerator will receive the changes on the SCO's by retrieving the scrubber transaction log via an object request to update its local metadata and will send the delete requests for the obsolete SCO's to the Amazon S3 backend. Furthermore, the whole logical unit can be rebuilt if necessary by accessing the namespace/bucket from a different location in case the cloud storage router on the initial location is lost.
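  • As a hedged example of this bucket-per-logical-unit layout on an Amazon S3 backend (the bucket and key names below are invented for the illustration, and valid AWS credentials configured for boto3 are assumed), each SCO is stored, retrieved and deleted as an ordinary object.

        import boto3

        s3 = boto3.client("s3")
        bucket = "csr-logical-unit-0001"    # one namespace/bucket per logical unit

        def put_sco(sco_number, data):
            s3.put_object(Bucket=bucket, Key=f"sco/{sco_number:012d}", Body=data)

        def get_sco(sco_number):
            return s3.get_object(Bucket=bucket,
                                 Key=f"sco/{sco_number:012d}")["Body"].read()

        def delete_obsolete(sco_numbers):
            for n in sco_numbers:           # delete requests sent after scrubbing
                s3.delete_object(Bucket=bucket, Key=f"sco/{n:012d}")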
  • FIG. 2 illustrates a first use case of cloud storage router software program of FIG. 1 , i.e. protection against disk failures.
  • the hypervisors running on three standard x86 servers 201 , 202 and 203 are drawn. Each of these servers, like for instance server 202 , contains a flash drive 225 and six large SATA drives 226 .
  • In this use case, protection is provided against the failure of a full physical server, e.g. server 201, as well as the failure of another drive, e.g. one of the drives 226.
  • When the cloud storage router, CSR or 221, consists of the storage accelerator, i.e. the SCO caching layer 222 with metadata lookup map MD or 224, and the storage distributor 223 applying erasure coding, it can be installed as a local Virtual Storage Appliance or VSA that will work like a converged infrastructure setup. This allows administrators to scale out hypervisor and storage capacity with standard x86 hardware.
  • the cloud storage router 221 is installed as a VSA inside the hypervisor and uses local flash or SSD storage 225 for caching.
  • the SCO cache 222 uses local SAS or SATA drives 226 to store the data from virtual machines 227 and 228 containerized in SCO's which can further be configured for different redundancy levels by using the erasure coding or EC technology in the storage distributor 223 .
  • the converged infrastructure becomes scalable and redundant when multiple converged nodes are added to the network.
  • the storage distributor 223 is able to spread each SCO by using erasure coding to all available SAS/SATA drives in all registered nodes 201 , 202 and 203 .
  • a disk safety policy of 18/7 is used, allowing 7 disks out of 18 available disks—6 SAS/SATA disks are available in each node—to fail.
  • a full node 201 can fail and even an additional disk on another node 202 can fail while still serving the workloads.
  • the volumes that were running on the failed converged node 201 can be restarted on the remaining nodes 202 and 203 thanks to the distributed redundancy introduced by the cloud storage router. It is further noticed that customers can start with a single node and can change the disk safety policy when new nodes are added, allowing to scale.
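  • The 18/7 disk safety policy of this example can be checked with a short sketch (the node numbering and placement below are illustrative; X=11 and Y=18 are assumed so that Y-X=7 disks may fail, with one mega-block per disk): losing the six disks of node 201 plus one extra disk on node 202 still leaves enough mega-blocks to rebuild each SCO.

        X, Y = 11, 18                     # assumed: 18/7 policy, Y - X = 7 may fail
        disks = [(node, disk) for node in (201, 202, 203) for disk in range(6)]
        placement = {disks[i]: f"mega-block-{i}" for i in range(Y)}  # one block per disk

        failed = {(201, d) for d in range(6)} | {(202, 0)}   # full node 201 + one disk
        surviving = [blk for disk, blk in placement.items() if disk not in failed]
        assert len(surviving) >= X        # enough mega-blocks left to rebuild the SCO
        print(f"{len(surviving)} of {Y} mega-blocks survive; {X} are needed")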
  • The read performance of each node 201, 202, 203 in a converged infrastructure like the one illustrated by FIG. 2 is close to the sum of all SSD's used for caching. This means that with only a single SSD, more than 50,000 IOPS can be achieved on random 4 k data block reads.
  • the write performance is increased in a number of ways: by adding a larger SCO cache, adding SSD or flash storage disks, adding nodes, and sequencing random write I/O operations.
  • the cloud storage router 221 is configured to only use a fraction of the SSD storage 225 to act as write cache or SCO cache because spinning disks perform well on sequential 4 k writes.
  • the cloud storage router 221 is also hypervisor agnostic, allowing for instance to use VMware on one site and Hyper-V on another site. Thanks to its virtual machine or VM-centric approach, the cloud storage router 221 can replicate or move workloads between converged nodes 201 , 202 and 203 running a different hypervisor technology without conversion.
  • FIG. 3 illustrates a second use case of the cloud storage router computer program of FIG. 1 , i.e. a hybrid cloud storage implementation.
  • In FIG. 3, a few use cases are drawn that will drive the hybrid cloud model.
  • a customer 301 can use the central data center 300 as a disaster recovery site, independent of the hypervisor or storage technology that is being used on both sides.
  • the central data center 300 can be public cloud infrastructure or enterprise cloud infrastructure running the cloud storage router software program according to the present invention next to its current workloads.
  • Thanks to the hypervisor agnostic approach, only the differences between the selected virtual machines 311 are replicated to the central data center 300, using an intelligent replication algorithm to optimize the bandwidth usage.
  • Thanks to the hypervisor agnostic approach, the local hypervisor can also be different from the one in the data center 300.
  • the cloud storage router also virtualizes the actual storage layer, making it possible to use Just a Bunch Of Disks or JBOD technology in one place and leveraging existing NAS/SAN storage in the data center 300 .
  • unified templates 313 can be created and provisioned to run on different hypervisors, VMWare and Hyper-V for example, allowing administrators to easily push virtual appliances to their end users, independent of which hypervisor they are using.
  • In FIG. 3, the multi-site scenario is also illustrated.
  • the users 304 A and 304 B can replicate or move workloads like virtual machine 314 between different sites, independent of their running hypervisors.
  • FIG. 4 illustrates a third use case of the cloud storage router software program of FIG. 1 , i.e. a virtual desktop infrastructure implementation.
  • the cloud storage router CSR or 420 can spin up thin clones from a master template 431 .
  • CSR 420 thereto has an advanced REST API, allowing multiple virtual machines 411, 412, 413, 414, 415 to be spun up using only a few API calls. This makes CSR 420 ideal to integrate into a virtual desktop infrastructure or VDI solution.
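  • The REST API itself is not documented here, so the base URL, path and payload in the following sketch are purely hypothetical; it merely illustrates that a handful of calls suffice to spin up thin clones VDI1 . . . VDI5 of a master template.

        import requests

        CSR_API = "https://csr.example.local/api"     # assumed, illustrative base URL

        def spin_up_clones(template_id, count):
            clones = []
            for i in range(count):
                # hypothetical endpoint and payload, invented for this sketch
                resp = requests.post(f"{CSR_API}/volumes",
                                     json={"clone_of": template_id,
                                           "name": f"VDI{i + 1}",
                                           "thin": True})
                resp.raise_for_status()
                clones.append(resp.json()["volume_id"])
            return clones

        # e.g. spin_up_clones("master-template-431", 5) -> five thin clones VDI1..VDI5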
  • Each of VDI1, VDI2, VDI3, VDI4 and VDI5 will be a thin clone of the master template 431 on the backend.
  • The VDI solution already becomes advantageous when only a few desktops are needed.
  • A standard server can easily handle a hundred virtual desktops without the need for a small SAN or NAS that typically costs four times as much as the server to run a similar number of virtual desktops.
  • the VDI solution could be running on standard x86 hardware 402 , containing for instance SSD storage and SAS/SATA drives, not drawn in FIG. 4 .
  • the master template would be preloaded in the system, while for each virtual desktop VDI1 . . . VDI5 only the changes would be tracked.
  • the same features that apply to the converged infrastructure use case would also apply to this VDI setup. This would enable customers to scale gradually by adding additional nodes and would allow to use virtual machine centric enterprise storage features like unlimited snapshotting or replication of individual virtual desktops.
  • By using the cloud storage initiator instead of the local erasure coding client and by making use of a larger read/write cache, customers can take advantage of running their virtual desktops 432, 433, 434, 435, 436 on public cloud storage 403.
  • the cloud storage initiator will store the SCO's as objects in the cloud storage system 403, for example Amazon S3 or Openstack Swift implementations or a remote data center.
  • a master bucket/namespace can be used to store the VDI templates.
  • thin cloning can be used to store individual changes on each virtual desktop into a different bucket/namespace to keep track of the individual changes and snapshots.
  • FIG. 5 illustrates a fourth use case of the cloud storage router computer program of FIG. 1 , i.e. a cloud storage gateway implementation.
  • the cloud storage router 501 , 502 can be set up as a single or redundant cloud storage gateway by making use of a redundant storage accelerator implementation that is using the Fail-Over cache mechanism FO CACHE in the cloud storage router 501 , 502 .
  • the cloud storage gateway can act as an iSCSI storage or even as a NAS device that creates a new namespace or bucket, Bucket A or 531 , Bucket B or 532 , Bucket C or 533 , Bucket D or 534 , for each logical unit 541 , 542 , 543 , 544 , that is being created.
  • a similar setup has been explained in the virtual desktop infrastructure use case here above.
  • the cloud storage gateway will be able to leverage all the storage features from the cloud storage router and enable read/write caching from the storage accelerator. This will give customers an experience as if they had their storage on-premise.
  • thin clones or unlimited snapshots can be created to allow administrators to quickly create new writeable copies of their data or quickly roll back to an older snapshot. In case of an entire CSR cluster failure due to a disaster on site, all the logical units from the backend storage system can be rebuilt.
  • the method according to the invention shall typically be computer-implemented on a data processing system or computing device.
  • a data processing system or computing device that is operated according to the present invention can include a workstation, a server, a laptop, a desktop, a hand-held device, a mobile device, a tablet computer, or other computing device, as would be understood by those of skill in the art.
  • the data processing system or computing device can include a bus or network for connectivity between several components, directly or indirectly, a memory or database, one or more processors, input/output ports, a power supply, etc.
  • the bus or network can include one or more busses, such as an address bus, a data bus, or any combination thereof, or can include one or more network links.
  • multiple of these components can be implemented by a single device. Similarly, in some instances, a single component can be implemented by multiple devices.
  • the data processing system or computing device can include or interact with a variety of computer-readable media.
  • computer-readable media can include Random Access Memory (RAM), Read Only Memory (ROM), Electronically Erasable Programmable Read Only Memory (EEPROM), flash storage or other storage technologies, CDROM, digital versatile disks (DVD) or other optical or holographic media, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices that can be used to encode information and can be accessed by the data processing system or computing device.
  • the memory can include computer-storage media in the form of volatile and/or nonvolatile memory.
  • the memory may be removable, non-removable, or any combination thereof.
  • Exemplary hardware devices are devices such as hard drives, solid-state memory, optical-disc drives, or the like.
  • the data processing system or computing device can include one or more processors that read data from components such as the memory, the various I/O components, etc.
  • the I/O ports can allow the data processing system or computing device to be logically coupled to other devices, such as I/O components.
  • I/O components can be built into the computing device. Examples of such I/O components include a microphone, joystick, recording device, game pad, satellite dish, scanner, printer, wireless device, networking device, or the like.
  • The terms “top”, “bottom”, “over”, “under”, and the like are introduced for descriptive purposes and not necessarily to denote relative positions. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that embodiments of the invention are capable of operating according to the present invention in other sequences, or in orientations different from the one(s) described or illustrated above.

Abstract

A computer-implemented method for layered storage of enterprise data comprises receiving from one or more virtual machines data blocks; time-based grouping the data blocks into data containers; dividing each data container in X fixed length mega-blocks; for each data container applying erasure encoding to the X fixed length mega-blocks to thereby generate Y fixed length mega-blocks with redundant data, Y being larger than X; and distributed storing the Y fixed length mega-blocks across one or multiple backend storage systems.

Description

    FIELD OF THE INVENTION
  • The present invention generally relates to layered storage of enterprise data, more precisely to a software-controlled method that implements a layered storage architecture enabling enterprises to store data in a virtual machine centric manner. The present invention in particular concerns storage of enterprise data in backend storage systems, like for instance remote cloud storage facilities, with reduced storage latency, reduced storage resource requirements and costs, and reduced dependency on specific storage hardware technology. The present invention aims at turning any existing backend storage capacity into high-speed storage for enterprises that require features like efficient and unlimited zero-copy snapshotting, thin cloning, thin provisioning, replicating, rollbacks, etc. of data at virtual machine level, as well as adequate protection against errors and recovery from outages of diverse storage hardware used in data centres or remote cloud storage facilities.
  • BACKGROUND OF THE INVENTION
  • Known methods for network-based storage of data are block based and they typically rely on a Storage Area Network or SAN, i.e. a dedicated network that provides access to storage devices such as disks, tapes, optical jukeboxes, etc., and enables block-level operations on the stored data. Alternatively, Network-Attached Storage or NAS may be deployed, to remotely store data using file-based protocols instead of block-based protocols. The data are usually stored in a Redundant Array of Independent Disks (RAID), i.e. multiple disk drives that form part of a single logical unit amongst which the data are distributed depending on the desired redundancy level. Such single logical unit has a logical unit number or LUN. In existing systems, the SAN/NAS system provides several LUNs to the hypervisor, i.e. a piece of software, firmware or hardware that serves, i.e. creates, runs, monitors and manages the different virtual machines on a host machine, i.e. a server or computer. A virtual machine or guest machine is a software implementation of a machine or computer, typically comprising a single operating system and application programs running on that operating system. Usually, plural virtual machines share the hardware resources of a single host machine. At present, physical servers or computers are easily supporting 10 or more virtual machines. Each computer or server is running a hypervisor to serve the virtual machines it is hosting. The computer-implemented method according to the present invention works in close cooperation with such hypervisor.
  • At present, redundant storage of large volumes in cloud storage systems, typically over the internet, is slow. As a consequence, storage of databases or other large volumes is still organized locally, i.e. close to the clients. On the one hand, the internet as medium to connect to remote storage systems is slow. On the other hand, known mechanisms that enable redundancy like erasure coding are object driven or file driven, as a consequence of which they further slow down remote storage when applied to block based storage.
  • The problem of internet latency for cloud storage has been addressed in several prior art documents.
  • U.S. Patent Application US 2012/0047339 entitled “Redundant Array of Independent Clouds” describes a mechanism for reliable block based storage in remote cloud storage facilities. US 2012/0047339 recognizes in paragraphs [0004]-[0006] the problem of slow network based storage via the internet and the need for redundancy. US 2012/0047339 consequently proposes to divide data in multiple blocks, and to store the data blocks with different cloud providers. The data can be reconstructed through use of a translation map. In order to be able to reconstruct erroneous blocks, a parity block may be generated from the N data blocks. The parity block is stored with yet another cloud provider.
  • Although US 2012/0047339 no longer relies on a single cloud storage provider, storage of data remains slow, i.e. limited by the internet speed. In addition, retrieval of data remains dependent on the slowest responding cloud storage provider since the data must be reproduced from data blocks retrieved from the different cloud storage providers. Only when one of the cloud storage providers is not responding at all, the parity block will be retrieved from a third cloud storage provider where it is stored.
  • United States Patent Application US 2011/0296440 entitled “Accelerator System for Use with Secure Data Storage” describes a system and method for accelerating the processing of and secure cloud storage of data. From paragraphs [0003]-[0007] of US 2011/0296440, it is learned that offloading certain processing from the motherboard, i.e. the secure parsing of data, in order to accelerate storage and processing of data is key to the system described in that patent application.
  • In addition to internet latency, it is recognized that computers or servers are at present easily hosting 10 or more virtual machines each of which generates its own input/output (I/O) patterns. These randomized I/O patterns further degrade storage efficiency. A straightforward solution consists in adding storage resources to the backend storage systems but this solution increases the cost for storage.
• Another disadvantage of existing network based storage techniques that rely on SAN (block based) or NAS (file or object based) lies in the fact that they are LUN specific. A logical unit is a single storage volume that is identified and addressed through its LUN or Logical Unit Number. In a virtualized environment where multiple virtual disks are running on a single logical unit, features like rolling back, snapshotting or replicating a single virtual machine are difficult to implement. The only way to roll back a virtual machine lies in retrieving an older snapshot of that virtual machine, mounting the file system and copying the requested virtual disks back to the primary storage logical unit. This is a complex and slow process.
  • Furthermore, existing network based storage techniques are tied to specific storage hardware. As a consequence, replication between different storage providers remains difficult. Gateways have been developed to be installed between the storage systems and hypervisors. These gateways however only solve a small part of the problem. They are difficult to manage, require the storage to reside near the hypervisor and remain expensive.
  • United States Patent Application US 2010/0332401 entitled “Performing Data Storage Operations with a Cloud Storage Environment, Including Automatically Selecting Among Multiple Cloud Storage Sites” for instance describes a method for data storage and migration in a cloud environment. In an attempt to tackle the problem of internet latency and packet loss, a cloud storage gateway introduces local caching and de-duplication. As part of a block based data migration process, data stored in cache (local, primary copies) are moved to cloud storage systems (secondary copies). As illustrated by FIG. 17 and described in paragraphs [0278]-[0286], containerized de-duplication is foreseen to avoid creating unnecessary additional instances of the data within secondary storage, i.e. within the cloud.
  • It is an objective of the present invention to provide a solution to the above identified problems inherent to existing SAN/NAS based network storage techniques. More particularly, it is an objective of the present invention to disclose a method for layered storage of enterprise data that reduces the effect of internet latency, reduces the dependency on particular storage hardware, reduces the storage resource requirements in general while enabling features like zero-copying, snapshotting, cloning, thin provisioning, replicating, rollbacks, etc. of data at virtual machine level.
  • SUMMARY OF THE INVENTION
  • According to the present invention, the above defined objectives are realized through a computer-implemented method for layered storage of enterprise data as defined by claim 1, the method comprising:
  • receiving from one or more virtual machines data blocks having a fixed length block size in the range from 4 kilobyte to 64 kilobyte;
    time-based grouping the data blocks into data containers having a configurable maximum container size typically in the range from 1 megabyte to 256 megabyte;
    dividing each data container in X fixed length mega-blocks, X being a first integer number;
    for each data container applying erasure encoding to the X fixed length mega-blocks to thereby generate Y fixed length mega-blocks with redundant data, Y being a second integer number larger than X; and
    distributed storing the Y fixed length mega-blocks over one or multiple backend storage systems.
• Thus, the invention concerns a virtual machine centric layered storage method that combines efficient location based storage, i.e. local caching of data blocks, with redundant, distributed, time based back-end storage of data containers. The time based back-end storage with containerization enables enterprise features like zero copy snapshotting, cloning, replicating, rolling back, etc. at virtual machine level, whereas the local caching turns any backend storage virtually into high-speed storage. Thanks to the containerization, the random I/O behavior of the virtual machines turns into sequential write operations. Thereafter, the containers are distributed in a redundant way across available backend storage using erasure encoding. The erasure coding enables a configurable amount of redundancy to be realized such that a configurable amount of backend disks can get lost or remain non-responsive without loss of data, and this without a need for RAID technology that suffers from lengthy rebuild times, i.e. days or even weeks. The method according to the present invention works with any type of existing backend storage solution, e.g. NAS, SAN, or JBOD (Just a Bunch Of Disks), and is consequently storage hardware independent. The backend storage can be a distributed local object store or it can be a remote object store in the cloud like for instance an Amazon S3 or Openstack Swift implemented object store. The backend store can be distributed over plural hardware units that reside in a single data center, plural hardware units that reside in one or more branch offices, or plural hardware units that reside with different cloud storage providers accessible via a public wide area network (WAN) like the internet.
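• By way of illustration only, the following minimal Python sketch shows the time-based containerization step described above: fixed length data blocks are appended to the current container in arrival order and a container is sealed once the configurable maximum container size would be exceeded. The class names, the 4 kbyte block size and the 16 Mbyte maximum are assumptions made for this sketch and are not limiting.

```python
# Minimal sketch of time-based containerization (illustrative names only).
BLOCK_SIZE = 4 * 1024                   # fixed length block size, 4 kbyte assumed
MAX_CONTAINER_SIZE = 16 * 1024 * 1024   # configurable maximum, 16 Mbyte assumed


class Container:
    def __init__(self, number):
        self.number = number            # container number used in the metadata
        self.blocks = []                # data blocks kept in write (time) order

    @property
    def size(self):
        return len(self.blocks) * BLOCK_SIZE


class Containerizer:
    def __init__(self):
        self.sealed = []                # containers ready for backend storage
        self.current = Container(number=0)

    def write_block(self, block):
        assert len(block) == BLOCK_SIZE
        if self.current.size + BLOCK_SIZE > MAX_CONTAINER_SIZE:
            self.sealed.append(self.current)
            self.current = Container(self.current.number + 1)
        offset = len(self.current.blocks)
        self.current.blocks.append(block)
        # the (container number, offset) pair is what the metadata records per block
        return self.current.number, offset
```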
• The caching of data containers or SCOs prior to distribution in the backend is preferably implemented using fast memory like flash or SSD drives. The local caching accelerates the time to confirm a write operation to the hypervisor, and as such virtually accelerates the backend storage. The response time to a write request will improve when flash or SSD disks are used to implement the two-tier local caching in comparison to alternative storage devices.
• Optionally, the data blocks may be de-duplicated per hypervisor serving one or more of the virtual machines and the de-duplicated data blocks may get stored in an additional local de-dup cache memory. For each such de-duplicated data block stored locally in the de-dup cache, its address, a hash value, a container number and an offset number may be stored as metadata.
  • According to another optional aspect of the computer-implemented method for layered storage of enterprise data according to the current invention, the one or multiple backend storage systems are local or remote cloud storage systems operated by different cloud storage operators.
  • Indeed, in a preferred embodiment of the invention, the backend store consists of storage capacity that resides with different cloud storage providers. This way, retrieval of data shall no longer depend on the availability or responsiveness of a single cloud storage provider. In case of a read request, presence of the requested data in the local cache memory will be verified. In case the requested data are available in the cache, the local cache will serve the read request. In case the requested data are not available in the local cache, the data will have to be retrieved from the backend stores. In case the data are distributed over backend storage systems from different cloud storage providers, the redundancy introduced through erasure coding makes it possible to rely on the fastest responding cloud storage systems. As soon as X out of Y mega-blocks can be delivered by one or more fast responding cloud storage providers, the data container can be re-established and the read request can be served.
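• A minimal sketch of this read strategy is given below, assuming a hypothetical fetch_mega_block() client per backend and a hypothetical erasure_decode() routine: requests are issued to all Y locations in parallel and reconstruction starts as soon as any X mega-blocks have arrived, so the slowest cloud storage providers are never waited for.

```python
# Sketch only (Python 3.9+): fetch_mega_block() and erasure_decode() are
# placeholders for the actual backend client and erasure decoder.
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_container(locations, x):
    """locations: Y (backend, mega_block_id) pairs; x: fragments needed to decode."""
    pool = ThreadPoolExecutor(max_workers=len(locations))
    futures = {pool.submit(fetch_mega_block, backend, block_id): index
               for index, (backend, block_id) in enumerate(locations)}
    fragments = {}
    for future in as_completed(futures):
        fragments[futures[future]] = future.result()
        if len(fragments) >= x:
            break                       # enough fragments, ignore slower providers
    pool.shutdown(wait=False, cancel_futures=True)
    return erasure_decode(fragments, x)
```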
  • Further optionally, as defined by claim 3, the computer-implemented method for layered storage of enterprise data according to the present invention comprises: storing for each data block its address, a container number and an offset number as metadata.
  • Indeed, the local caching wherein data blocks are sequentially written in containers is preferably implemented as a transaction based cache. During each write operation in a container, the address of the data block, the number of the container wherein it is stored and the offset or pointer to the location in the container where it is stored, are logged as metadata. Once the transaction is logged, the write operation can be confirmed to the hypervisor. In case of a read request, the address will be used for look-up in the metadata. The read request will then be served from the local cache memory when the requested data are still available resulting in very fast read request handling. If not available in the local cache, the metadata will be used to retrieve the appropriate containers and requested data blocks from the backend storage system(s).
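• Purely as an illustration of this transaction based cache, the sketch below keeps, per data block address, the container number and offset as metadata and appends the same tuple to a transaction log; a read first consults the map and falls back to the backend on a cache miss. The class and attribute names are hypothetical.

```python
# Sketch of the per-block metadata described above (illustrative names only).
class MetadataMap:
    def __init__(self):
        self.lookup = {}                # block address -> (container number, offset)
        self.transaction_log = []       # append-only log of write transactions

    def log_write(self, address, container_number, offset):
        self.lookup[address] = (container_number, offset)
        self.transaction_log.append((address, container_number, offset))
        # once the transaction is logged, the write can be confirmed to the hypervisor

    def locate(self, address):
        # returns the (container number, offset) needed to serve a read, whether
        # from the local cache or from the backend storage system(s)
        return self.lookup.get(address)
```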
  • Still optionally, as defined by claim 4, the computer-implemented method for layered storage of enterprise data according to the present invention comprises: storing each data block that is not yet part of a data container that is remotely stored, in a distributed transaction log of another storage accelerator.
  • This mechanism will ensure that each data block of for instance 4 kbyte that is not yet part of a container that is stored in backend storage, is copied and synced onto fast memory, e.g. flash or SSD storage, of another storage accelerator. Storage in a distributed transaction log of another storage accelerator is done before a write request is confirmed to the requestor.
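• The ordering described above could, for instance, look as follows; replicate_to_peer() and ack_write() are hypothetical placeholders, and the containerizer and metadata objects refer to the earlier sketches.

```python
# Sketch of the write ordering: the block is synced to the distributed
# transaction log (fast flash/SSD) of another storage accelerator before the
# write is confirmed to the requestor.
def handle_write(block, address, containerizer, metadata, peer):
    container_number, offset = containerizer.write_block(block)
    metadata.log_write(address, container_number, offset)
    peer.replicate_to_peer(address, block)  # protect the block until its container
                                            # has reached the backend storage
    ack_write(address)                      # only now is the write acknowledged
```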
  • According to claim 5, the computer-implemented method for layered storage of enterprise data according to the present invention may be implemented with multiple backend storage systems comprising various types of backend storage systems.
• Indeed, an advantage of the present invention is that it does not rely on particular backend storage technology, e.g. SAN, NAS or JBOD. The backend storage systems may be deployed locally or they may reside in one or more data centres, one or more branch offices, in the public cloud, or a combination thereof.
  • Further optionally, as defined by claim 6, the computer-implemented method for layered storage of enterprise data according to the present invention may comprise:
  • scrubbing the mega-blocks distributed over multiple backend storage systems.
• The second layer storage, i.e. the time based storage that forms part of the present invention, requires maintenance of older containers or snapshots stored in the backend storage system for performance reasons. Thereto, a scrubber that has access to the transaction logs can, independently from the storage accelerator, delete obsolete containers stored in the backend storage system and update the corresponding metadata and transaction logs. An advantageous aspect of the present invention is that the scrubbing does not impact the performance of read/write operations because most of these read/write operations will be handled by the local caching layer.
  • In addition to the computer-implemented method defined by claim 1, the present invention also concerns a corresponding data processing system as defined by claim 7 comprising means for carrying out the method.
• The present invention further also relates to one or more non-transitory computer readable storage mediums as defined by claim 8, having a computer program stored thereon for execution of the method according to the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a functional block scheme illustrating an embodiment of the method for layered storage of enterprise data according to the present invention;
  • FIG. 2 illustrates a first use case of the method according to the present invention, i.e. protection against disk failures;
  • FIG. 3 illustrates a second use case of the method according to the present invention, i.e. a hybrid cloud storage implementation;
  • FIG. 4 illustrates a third use case of the method according to the present invention, i.e. a virtual desktop infrastructure implementation; and
  • FIG. 5 illustrates a fourth use case of the method according to the present invention, i.e. a cloud storage gateway implementation.
  • DETAILED DESCRIPTION OF EMBODIMENT(S)
• FIG. 1 shows the functional blocks or components of a cloud storage router or CSR which is an embodiment of the computer program according to the present invention that performs an embodiment of the method according to the present invention when executed by a processor. The cloud storage router illustrated by FIG. 1 is a virtual machine centric layered storage application that combines the function of a SAN, high speed SSD caching, backup and off-site replication, encryption and compression. The cloud storage router is a software implemented storage application that installs on any x86 hardware or can be used as a virtual storage appliance (VSA). The cloud storage router comprises a containerization function 105 that generates containers, the so called Storage Container Objects or SCO's 151 and 152, from a configurable collection of 4 kbyte data blocks. The SCO's have a configurable maximum length that is assumed to be set to 256 Mbyte in FIG. 1. The generated SCO's can be stored in backend storage systems, e.g. a distributed local object store running within the same VSA or remote object storage like for instance an Amazon S3 or an Openstack Swift implemented storage. This second storage layer wherein the 4 kbyte long data blocks are containerized in SCO's is a time-based storage implementation because the 4 kbyte long data blocks are sequentially stored in the SCO's. The second, time-based storage layer enables features like unlimited zero-copy snapshotting, cloning, and easy replication. The second layer makes the storage independent of the location. The objects can be stored distributed over different x86 hardware units that reside within one or more data centers, within one or more branch offices, or that reside remotely in the public cloud where they are reachable via a WAN like the internet.
• The location based storage layer is implemented by a metadata lookup map 104, a transaction log 103 and an SCO generator 105. They are jointly called the storage accelerator since the location-based storage layer virtually accelerates storage in any backend storage system. The storage accelerator brings storage closer to the hypervisor in a virtual machine centric way, making use of an intelligent 2-tier caching algorithm. “Virtual machine centric” in this context means that all storage related activities like snapshotting, cloning or replication can be done at the virtual machine level, enabling administrators for instance to easily select only the most important virtual machines for replication. The caching can be done inside the hypervisor by using a Virtual Storage Appliance or VSA to access local flash or SSD drives. The storage accelerator uses SSD or flash storage to serve as a read and write cache, virtually turning any backend storage capacity into high-speed storage.
  • The write caching works as a transaction log based cache on fast flash or SSD redundant storage. Data blocks of 4 kbyte are received by the block device 101 on IO path 108. The write cache is accomplished by filling up SCO's 151, 152 with each new 4 k block that is written. As a result, any random write I/O behavior is turned into a sequential write operation. During each write, the address 141 of the 4 k block, and the location 142 comprising the SCO number 142A and the offset 142B within that SCO are stored as metadata in the metadata lookup map 104. The address 131 and location 132 are also logged by transaction log 103. The storage accelerator uses a distributed database to provide redundancy and immediate access to the metadata in case the volume is moved or failed-over to another cloud storage router. Only real data that is written to the cloud storage router will be stored as a result of which thin provisioning becomes possible. The transaction log 103 will immediately confirm the write to the hypervisor hence enabling fast response times while the SCO's 151, 152 can be pushed at a slower pace to the backend storage system(s).
  • In case of a read request, the storage accelerator will look-up the location in the metadata lookup map 104. If the requested data block exists in the cache, the storage accelerator will serve the read requests directly from the local SSD or flash storage, resulting in very fast read I/O operations.
  • The storage accelerator writes or reads the data using SCO's when it is communicating with the backend storage systems, not shown in FIG. 1. Because of the transaction log based approach in the storage accelerator, written data resides on redundant flash or SSD storage before it reaches the redundant backend storage. The storage accelerator drawn in FIG. 1 indeed enables a distributed transaction log 107. The distributed transaction log 107 will make sure that each 4 k block 171, 172 that is not yet part of an SCO that is stored in the backend storage system is synced to SSD or flash storage on another storage accelerator before it confirms the write to the guest. The size of the distributed transaction log 107 is very limited because there is only a need to protect data blocks 171, 172 that are not yet stored on the backend storage system. This is typically only a few outstanding SCO's per logical unit, which corresponds to Mbyte sizes rather than Gbyte sizes. However, the slower the backend storage or the higher the latency towards the backend storage, the larger the write cache that will be needed to cope with write bursts and the larger the distributed transaction log 107.
• The cloud storage router illustrated by FIG. 1 implements a second, time-based storage layer. The SCO's 151, 152 and each set of transaction logs 131, 132, 133 that are created by the storage accelerator are therein treated as objects. For this reason, the containers that consist of a configurable maximum amount of 4 k blocks are called SCO's or Storage Container Objects. The benefit of this time-based storage layer is that it enables enterprise features like zero-copy snapshots and cloning.
• The time-based storage also requires maintenance in order to compact and clean up older SCO's and/or deleted snapshots. By having all transaction logs and SCO's stored on backend storage systems, these maintenance tasks can be offloaded from the storage accelerator. A scrubber, not drawn in FIG. 1, does the maintenance of the time-based storage. This scrubber works completely independently from the storage accelerator since it has access to all transaction logs and SCO's stored in the backend storage system. Once the scrubber has finished cleaning up older SCO's and/or deleted snapshots, it will create an updated set of transaction logs that is used by the storage accelerator to update the local metadata stored in the metadata lookup map 104. The obsolete SCO's on the backend storage system are deleted. Each scrubber is configurable to work on one or more namespaces. Because of the caching in the storage accelerator, the maintenance work does not impact performance. As already mentioned here above, most read and write I/O requests will be served from the SCO cache. In the event of a disaster where the complete storage accelerator is lost, all the logical units can be rebuilt from the backend storage system.
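• As an illustration of the scrubbing logic, the sketch below replays the transaction logs of a namespace, keeps only the latest (container, offset) entry per block address, deletes containers that are no longer referenced and returns a condensed log; snapshot retention is omitted for brevity and the backend methods are hypothetical.

```python
# Sketch only: list_containers() and delete_container() are placeholder
# methods of a hypothetical backend client.
def scrub(transaction_log_entries, backend):
    live = {}                                    # address -> (container, offset)
    for address, container, offset in transaction_log_entries:
        live[address] = (container, offset)      # later entries win
    referenced = {container for container, _ in live.values()}
    for container in backend.list_containers():
        if container not in referenced:
            backend.delete_container(container)  # obsolete SCO
    # the condensed log is handed back to the storage accelerator so that it can
    # update its local metadata lookup map
    return [(a, c, o) for a, (c, o) in live.items()]
```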
• In order to store the objects in a distributed and redundant way in the backend storage system, a storage distributor 106 is provided in addition to the storage accelerator components 103, 104, 105. Alternatively, a third party cloud storage backend can easily be added by using the cloud storage initiator. Since the cloud storage initiator only stores, retrieves and changes SCO's, it is easy to add new cloud storage backend systems. Thus, the storage accelerator can basically use the storage distributor 106 or any cloud storage as a backend storage system to store the SCO's.
• The storage distributor 106 will distribute written transaction logs 131, 132 coming from the storage accelerator towards the available resources on the distributed backend storage system. In addition, the storage distributor 106 divides each SCO in mega-blocks of typically 1 Mbyte. Erasure coding is then applied to each set of X mega-blocks in order to generate a set of Y mega-blocks containing redundant data since Y>X. Erasure coding is used to equally spread storage of the Y mega-blocks over the attached distributed backend storage system. Erasure coding is an algorithm that allows a configurable amount of redundant data to be created such that a configurable amount of disks in the backend storage system can get lost without data loss. The redundancy is created by pure processing power that does not require any read access on the storage capacity blocks to calculate parity, resulting in fewer disk spindles, lower power consumption, better performance and the ability to distribute the data over the backend storage system. Furthermore, rebuilding redundancy for erasure encoded mega-blocks has no performance impact compared to for instance RAID technology. RAID has been designed for smaller disks. Nowadays, disks are reaching 3 to 4 Tbyte in capacity, as a result of which RAID rebuild times after a lost disk are in the order of days and/or weeks rather than hours, resulting in a much higher chance of losing data and suffering serious performance issues during rebuilds.
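• For illustration, the sketch below divides an SCO into 1 Mbyte mega-blocks and adds a single XOR parity mega-block, i.e. Y = X + 1, before spreading the result round-robin over the available backends; an actual erasure code would generate more redundant mega-blocks so that several disks may be lost, and the put() call on a backend is a hypothetical placeholder.

```python
# Sketch of the storage distributor step (simplified to one parity block).
MEGA_BLOCK = 1024 * 1024                         # 1 Mbyte, typical value


def split_and_encode(sco):
    padded = sco + b"\x00" * (-len(sco) % MEGA_BLOCK)
    data = [padded[i:i + MEGA_BLOCK] for i in range(0, len(padded), MEGA_BLOCK)]
    parity = bytearray(MEGA_BLOCK)
    for block in data:                           # XOR all X data mega-blocks
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return data + [bytes(parity)]                # Y = X + 1 fixed length mega-blocks


def distribute(mega_blocks, backends):
    # spread the Y mega-blocks equally (round-robin) over the attached backends
    for index, block in enumerate(mega_blocks):
        backends[index % len(backends)].put(f"mega-{index}", block)
```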
• Erasure coding allows the number of blocks to be stored to be changed at any time. When new storage capacity is added or old storage capacity is removed, whether it is a single disk, a logical unit or a share, the storage distributor 106 will start spreading the data using all the newly available blocks. This enables growing or shrinking data capacity and changing redundancy on demand, independent of any storage technology or hardware that is used in the backend storage system(s).
  • If SCO's need to be stored in the cloud or on big data object infrastructure, the storage accelerator can be configured to use a cloud storage initiator, not drawn in FIG. 1, instead of the storage distributor 106. Due to the caching layer, i.e. SCO cache 105, the backend storage system can even be remote in the public cloud, accessible through a WAN with high latency like the Internet, while still experiencing local SSD/flash storage performance. When a slower backend storage system is used, it is recommended to use more flash or SSD storage capacity in the caching layer. By adding a larger SCO cache 105 the cloud storage router can deal with bigger bursts of written data without performance impact for the end user. Keeping more SCO's in cache storage helps to offset some of the penalties that are associated with accessing a slow backend storage system, mainly due to latency.
• Each created logical unit will be stored in a different namespace or bucket in the case of an Amazon S3 or Openstack Swift backend storage system implementation. Because storage maintenance tasks can be done completely independently by the backend storage system, all SCO's stored in a public cloud system can be accessed from within that same public cloud system without the need of going over a slow internet link. In case of an Amazon S3 backend storage system for instance, the scrubbers can be spun up on EC2, using the local Amazon bandwidth to do the storage maintenance, and shut down once the work is done, resulting in less bandwidth consumption and lower Amazon costs. The storage accelerator will receive the changes on the SCO's by retrieving the scrubber transaction log via an object request to update its local metadata and will send the delete requests for the obsolete SCO's to the Amazon S3 backend. Furthermore, the whole logical unit can be rebuilt if necessary by accessing the namespace/bucket from a different location in case the cloud storage router on the initial location is lost.
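• A minimal sketch of pushing sealed SCO's and transaction logs to an Amazon S3 compatible backend is given below; boto3 is used purely as an example client and the one-bucket-per-logical-unit naming scheme and object keys are assumptions made for this sketch.

```python
# Sketch only: bucket names and key layout are illustrative assumptions.
import boto3

s3 = boto3.client("s3")


def push_sco(logical_unit_id, sco_number, sco_bytes):
    bucket = f"csr-volume-{logical_unit_id}"     # one namespace/bucket per logical unit
    s3.put_object(Bucket=bucket, Key=f"sco/{sco_number:012d}", Body=sco_bytes)


def push_transaction_log(logical_unit_id, tlog_number, tlog_bytes):
    bucket = f"csr-volume-{logical_unit_id}"
    s3.put_object(Bucket=bucket, Key=f"tlog/{tlog_number:012d}", Body=tlog_bytes)
```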
• FIG. 2 illustrates a first use case of the cloud storage router software program of FIG. 1, i.e. protection against disk failures. In FIG. 2, the hypervisors running on three standard x86 servers 201, 202 and 203 are drawn. Each of these servers, like for instance server 202, contains a flash drive 225 and six large SATA drives 226. By spreading the data across the different servers 201, 202, 203 using erasure coding with a disk safety policy of 18/7, meaning that 7 disks out of 18 can get lost, a full physical server, e.g. server 201, can be lost and even another drive, e.g. one of the drives 226, can be lost without any data loss.
• When the cloud storage router, CSR or 221, consists of the storage accelerator, i.e. the SCO caching layer 222 with metadata lookup map MD or 224, and the storage distributor 223 applying erasure coding, it can be installed as a local Virtual Storage Appliance or VSA that will work like a converged infrastructure setup. This allows administrators to scale out hypervisor and storage capacity with standard x86 hardware. In a converged infrastructure setup, the cloud storage router 221 is installed as a VSA inside the hypervisor and uses local flash or SSD storage 225 for caching. Additionally, the SCO cache 222 uses local SAS or SATA drives 226 to store the data from virtual machines 227 and 228 containerized in SCO's, which can further be configured for different redundancy levels by using the erasure coding or EC technology in the storage distributor 223. The converged infrastructure becomes scalable and redundant when multiple converged nodes are added to the network. When multiple converged nodes like 201, 202 and 203 are registered, the storage distributor 223 is able to spread each SCO by using erasure coding to all available SAS/SATA drives in all registered nodes 201, 202 and 203.
• By implementing the desired disk safety policy in the storage distributor 223, customers can lose a full server 201 and even additional disks without losing data. In the example above, a disk safety policy of 18/7 is used, allowing 7 disks out of the 18 available disks (6 SAS/SATA disks are available in each node) to fail. As a consequence, a full node 201 can fail and even an additional disk on another node 202 can fail while still serving the workloads. The volumes that were running on the failed converged node 201 can be restarted on the remaining nodes 202 and 203 thanks to the distributed redundancy introduced by the cloud storage router. It is further noticed that customers can start with a single node and can change the disk safety policy when new nodes are added, allowing them to scale.
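• The arithmetic behind the 18/7 policy can be checked as follows: three nodes with six SAS/SATA disks each give 18 disks, so encoding each SCO into Y = 18 mega-blocks of which any X = 11 suffice for reconstruction tolerates 7 missing disks, which covers a full node (6 disks) plus one extra disk. The one-mega-block-per-disk mapping is an assumption made for this worked example.

```python
# Worked example of the 18/7 disk safety policy used in FIG. 2.
nodes, disks_per_node = 3, 6
y = nodes * disks_per_node              # 18 mega-blocks, one per disk assumed
tolerated_failures = 7                  # the "7" in the 18/7 policy
x = y - tolerated_failures              # 11 mega-blocks suffice to rebuild an SCO

assert x == 11
assert disks_per_node + 1 <= tolerated_failures  # full node plus one more disk
```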
• The read performance of each node 201, 202, 203 in a converged infrastructure like the one illustrated by FIG. 2 is close to the sum of all SSD's used for caching. This means that with only a single SSD, more than 50,000 IOPS can be achieved on random 4 k data block reads. The write performance is increased in a number of ways: by adding a larger SCO cache, adding SSD or flash storage disks, adding nodes, and sequencing random write I/O operations. In a standard multi-node converged setup the cloud storage router 221 is configured to only use a fraction of the SSD storage 225 to act as write cache or SCO cache because spinning disks perform well on sequential 4 k writes.
  • The cloud storage router 221 is also hypervisor agnostic, allowing for instance to use VMware on one site and Hyper-V on another site. Thanks to its virtual machine or VM-centric approach, the cloud storage router 221 can replicate or move workloads between converged nodes 201, 202 and 203 running a different hypervisor technology without conversion.
  • Using the cloud storage router 221 in a converged infrastructure setup like the one illustrated by FIG. 2, customers don't need to manage a separate SAN or NAS infrastructure.
• This reduces not only the infrastructure cost but also the operational costs. It allows customers to better scale costs to their business needs, reduce storage management and significantly reduce the power consumption of the cloud infrastructure by using a VSA and erasure encoding.
  • FIG. 3 illustrates a second use case of the cloud storage router computer program of FIG. 1, i.e. a hybrid cloud storage implementation.
  • The cloud storage router of FIG. 1 can be installed as a Virtual Storage Appliance or VSA on a running cloud infrastructure, making use of the existing backend storage infrastructure like SAN or NAS. The cloud storage router can leverage existing cloud infrastructures like for instance FlexPod or V-Block. By installing the cloud storage router on existing cloud infrastructure it can act as a replication target for a smaller converged infrastructure running in a branch office. By leveraging virtual machine centric storage workloads, customers can move between cloud infrastructures independent of the hypervisor or the storage technology that is being used. For example, a virtual machine running locally on a converged Hyper-V infrastructure can be moved to a public cloud running VMware on a FlexPod infrastructure. True data mobility is achieved because the cloud storage router is agnostic of the underlying hypervisor or storage backend system.
  • In FIG. 3, a few use cases are drawn that will drive the hybrid cloud model.
  • With customer A or 301, the disaster recovery scenario is illustrated. In this use case, a customer 301 can use the central data center 300 as a disaster recovery site, independent of the hypervisor or storage technology that is being used on both sides. The central data center 300 can be public cloud infrastructure or enterprise cloud infrastructure running the cloud storage router software program according to the present invention next to its current workloads. As a result of the virtual machine centric approach, only the differences between the selected virtual machines 311 are replicated to the central data center 300 using an intelligent replication algorithm to optimize the bandwidth usage. Thanks to the hypervisor agnostic approach, the local hypervisor can be different from the one in the data center 300. The cloud storage router also virtualizes the actual storage layer, making it possible to use Just a Bunch Of Disks or JBOD technology in one place and leveraging existing NAS/SAN storage in the data center 300.
• With customer B or 302, the scenario of moving workloads is illustrated. By using the time based storage model to store the SCO's in backend storage systems, replication comes down to syncing the missing SCO's since the last common snapshot that exists at both locations. When the data are in sync, workloads can be moved from a remote site to the data center 300. Once the virtual machine 312 boots up in the data center 300, it will start creating its own changed SCO's from that common snapshot. Moving the workload back to its original location comes down to keeping track of these generated SCO's and syncing them back to the original location. Moving workloads can be done at the virtual machine level by syncing changes in both directions, independent of which hypervisor or storage hardware is being used at both locations. The cloud storage router software according to the present invention hence is an ideal technology to move workloads.
• With customer C or 303, the use case of unified templates is illustrated. Indeed, unified templates 313 can be created and provisioned to run on different hypervisors, VMware and Hyper-V for example, allowing administrators to easily push virtual appliances to their end users, independent of which hypervisor they are using.
  • With customer D1 or 304A and customer D2 or 304B, the multi-site scenario is illustrated. When running the cloud storage router software from FIG. 1 as converged infrastructure, the users 304A and 304B can replicate or move workloads like virtual machine 314 between different sites, independent of their running hypervisors.
  • FIG. 4 illustrates a third use case of the cloud storage router software program of FIG. 1, i.e. a virtual desktop infrastructure implementation.
• The cloud storage router CSR or 420 can spin up thin clones from a master template 431. CSR 420 thereto has an advanced REST API, allowing multiple virtual machines, 411, 412, 413, 414, 415, to be spun up using only a few API calls. This makes CSR 420 ideal to integrate into a virtual desktop infrastructure or VDI solution. Once the master template 431 is uploaded into CSR 420, each virtual desktop, VDI1, VDI2, VDI3, VDI4 and VDI5, will be a thin clone of the master template 431 on the backend. Only the individual changes of each virtual desktop will be tracked as SCO's in a different namespace, while the hypervisor 401 will see each thin cloned logical unit or vDisk as a full writeable volume. By making use of the cache 421, boot storms are avoided because each virtual machine will read the same data during boot and all these read requests will be served out of flash or SSD storage 424, 425, 426. By using thin cloning, customers can keep track of individual changes in each desktop and provide features like continuous data protection or replication. It is noticed that when CSR 420 is run in a converged setup, one or more servers can be used to serve hundreds or more desktops. By adding more nodes, the storage and hypervisor capacity will scale out, resulting in extra virtual desktop capacity. Because these desktops can run on standard hardware, without the need of an external storage array, a VDI solution already becomes advantageous when only a few desktops are needed. A standard server can easily handle a hundred virtual desktops without the need of a small SAN or NAS that typically costs four times as much as the server to run a similar number of virtual desktops.
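• The REST API referred to above is not detailed in this description, so the sketch below uses invented endpoint paths and payloads purely to illustrate the flow: one master template upload followed by one lightweight clone call per virtual desktop.

```python
# Hypothetical illustration only: the URL, endpoints and JSON fields below are
# assumptions, not the actual CSR API.
import requests

CSR = "https://csr.example.local/api"            # placeholder address


def create_thin_clones(template_volume, count):
    for i in range(1, count + 1):
        response = requests.post(f"{CSR}/volumes/{template_volume}/clones",
                                 json={"name": f"vdi-{i:03d}", "thin": True},
                                 timeout=30)
        response.raise_for_status()              # each clone is a thin copy on the backend
```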
• The VDI solution could be running on standard x86 hardware 402, containing for instance SSD storage and SAS/SATA drives, not drawn in FIG. 4. The master template would be preloaded in the system, while for each virtual desktop VDI1 . . . VDI5 only the changes would be tracked. The same features that apply to the converged infrastructure use case would also apply to this VDI setup. This would enable customers to scale gradually by adding additional nodes and would allow the use of virtual machine centric enterprise storage features like unlimited snapshotting or replication of individual virtual desktops.
• As is illustrated by FIG. 4, by using the cloud storage initiator instead of the local erasure coding client and by making use of a larger read/write cache, customers can take advantage of running their virtual desktops 432, 433, 434, 435, 436 on public cloud storage 403. The cloud storage initiator will store the SCO's as objects in the cloud storage system 403, for example Amazon S3 or Openstack Swift implementations or a remote data center. Also, due to the use of thin cloning and thin provisioning, a master bucket/namespace can be used to store the VDI templates. Additionally, thin cloning can be used to store individual changes on each virtual desktop into a different bucket/namespace to keep track of the individual changes and snapshots.
  • FIG. 5 illustrates a fourth use case of the cloud storage router computer program of FIG. 1, i.e. a cloud storage gateway implementation.
• The cloud storage router 501, 502 can be set up as a single or redundant cloud storage gateway by making use of a redundant storage accelerator implementation that is using the Fail-Over cache mechanism FO CACHE in the cloud storage router 501, 502. The cloud storage gateway can act as iSCSI storage or even as a NAS device that creates a new namespace or bucket, Bucket A or 531, Bucket B or 532, Bucket C or 533, Bucket D or 534, for each logical unit 541, 542, 543, 544, that is being created. A similar setup has been explained in the virtual desktop infrastructure use case here above. It can run as a clustered virtual or physical appliance, allowing customers to store data into cloud storage 503. The cloud storage gateway will be able to leverage all the storage features from the cloud storage router and enable read/write caching from the storage accelerator. This will give customers an experience as if they had their storage on-premise. In addition, thin clones or unlimited snapshots can be created to allow administrators to quickly create new writeable copies of their data or quickly roll back to an older snapshot. In case of an entire CSR cluster failure due to a disaster on site, all the logical units from the backend storage system can be rebuilt.
  • The method according to the invention shall typically be computer-implemented on a data processing system or computing device. A data processing system or computing device that is operated according to the present invention can include a workstation, a server, a laptop, a desktop, a hand-held device, a mobile device, a tablet computer, or other computing device, as would be understood by those of skill in the art.
  • The data processing system or computing device can include a bus or network for connectivity between several components, directly or indirectly, a memory or database, one or more processors, input/output ports, a power supply, etc. One of skill in the art will appreciate that the bus or network can include one or more busses, such as an address bus, a data bus, or any combination thereof, or can include one or more network links. One of skill in the art additionally will appreciate that, depending on the intended applications and uses of a particular embodiment, multiple of these components can be implemented by a single device. Similarly, in some instances, a single component can be implemented by multiple devices.
  • The data processing system or computing device can include or interact with a variety of computer-readable media. For example, computer-readable media can include Random Access Memory (RAM), Read Only Memory (ROM), Electronically Erasable Programmable Read Only Memory (EEPROM), flash storage or other storage technologies, CDROM, digital versatile disks (DVD) or other optical or holographic media, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices that can be used to encode information and can be accessed by the data processing system or computing device.
  • The memory can include computer-storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or any combination thereof. Exemplary hardware devices are devices such as hard drives, solid-state memory, optical-disc drives, or the like. The data processing system or computing device can include one or more processors that read data from components such as the memory, the various I/O components, etc.
  • The I/O ports can allow the data processing system or computing device to be logically coupled to other devices, such as I/O components. Some of the I/O components can be built into the computing device. Examples of such I/O components include a microphone, joystick, recording device, game pad, satellite dish, scanner, printer, wireless device, networking device, or the like.
• Although the present invention has been illustrated by reference to specific embodiments, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied with various changes and modifications without departing from the scope thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. In other words, it is contemplated to cover any and all modifications, variations or equivalents that fall within the scope of the basic underlying principles and whose essential attributes are claimed in this patent application. It will furthermore be understood by the reader of this patent application that the words "comprising" or "comprise" do not exclude other elements or steps, that the words "a" or "an" do not exclude a plurality, and that a single element, such as a computer system, a processor, or another integrated unit may fulfil the functions of several means recited in the claims. Any reference signs in the claims shall not be construed as limiting the respective claims concerned. The terms "first", "second", "third", "a", "b", "c", and the like, when used in the description or in the claims, are introduced to distinguish between similar elements or steps and are not necessarily describing a sequential or chronological order. Similarly, the terms "top", "bottom", "over", "under", and the like are introduced for descriptive purposes and not necessarily to denote relative positions. It is to be understood that the terms so used are interchangeable under appropriate circumstances and embodiments of the invention are capable of operating according to the present invention in other sequences, or in orientations different from the one(s) described or illustrated above.

Claims (8)

1. A computer-implemented method for layered storage of enterprise data, said method comprising:
receiving from one or more virtual machines data blocks having a fixed length block size in the range from 4 kilobyte to 64 kilobyte;
time-based grouping said data blocks into data containers having a configurable maximum container size in the range from 1 megabyte to 256 megabyte;
dividing each data container in X fixed length mega-blocks, X being a first integer number;
for each data container applying erasure encoding to said X fixed length mega-blocks to thereby generate Y fixed length mega-blocks with redundant data, Y being a second integer number larger than X; and
distributed storing said Y fixed length mega-blocks over one or multiple backend storage systems.
2. A computer-implemented method for layered storage of enterprise data according to claim 1, wherein said one or multiple backend storage systems are local and/or remote cloud storage systems operated by different cloud storage operators.
3. A computer-implemented method for layered storage of enterprise data according to claim 1, further comprising:
storing for each data block its address, a container number and an offset number as metadata.
4. A computer-implemented method for layered storage of enterprise data according to claim 1, further comprising:
storing each data block that is not yet part of a data container that is remotely stored, in a distributed transaction log of another storage accelerator.
5. A computer-implemented method for layered storage of enterprise data according to claim 1, wherein said one or multiple backend storage systems comprise various types of backend storage systems.
6. A computer-implemented method for layered storage of enterprise data according to claim 1, further comprising:
scrubbing said mega-blocks distributed over multiple backend storage systems.
7. A data processing system comprising:
one or more processing units; and
one or more computer-readable mediums having stored thereon executable instructions, which, when executed by the one or more processing units, cause the system to perform at least the following for layered storage of enterprise data:
receive from one or more virtual machines data blocks having a fixed length block size in the range from 4 kilobyte to 64 kilobyte;
time-based group said data blocks into data containers having a configurable maximum container size in the range from 1 megabyte to 256 megabyte;
divide each data container in X fixed length mega-blocks, X being a first integer number;
for each data container applying erasure encoding to said X fixed length mega-blocks to thereby generate Y fixed length mega-blocks with redundant data, Y being a second integer number larger than X; and
distributed store said Y fixed length mega-blocks over one or multiple backend storage systems.
8. One or more non-transitory computer readable storage mediums having instructions stored thereon, which, when executed on a computer comprising one or more processing units, cause the one or more processing units to perform a method for layered storage of enterprise data, the method comprising:
receiving from one or more virtual machines data blocks having a fixed length block size in the range from 4 kilobyte to 64 kilobyte;
time-based grouping said data blocks into data containers having a configurable maximum container size in the range from 1 megabyte to 256 megabyte;
dividing each data container in X fixed length mega-blocks, X being a first integer number;
for each data container applying erasure encoding to said X fixed length mega-blocks to thereby generate Y fixed length mega-blocks with redundant data, Y being a second integer number larger than X; and
distributed storing said Y fixed length mega-blocks over one or multiple backend storage systems.
US15/407,413 2013-04-30 2017-01-17 Method for layered storage of enterprise data Expired - Fee Related US10089009B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/407,413 US10089009B2 (en) 2013-04-30 2017-01-17 Method for layered storage of enterprise data

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
EP13166044.1A EP2799973B1 (en) 2013-04-30 2013-04-30 A method for layered storage of enterprise data
EP13166044 2013-04-30
EP13166044.1 2013-04-30
US14/247,362 US9547453B2 (en) 2013-04-30 2014-04-08 Method for layered storage of enterprise data
US15/407,413 US10089009B2 (en) 2013-04-30 2017-01-17 Method for layered storage of enterprise data

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/247,362 Continuation-In-Part US9547453B2 (en) 2013-04-30 2014-04-08 Method for layered storage of enterprise data

Publications (2)

Publication Number Publication Date
US20170123675A1 true US20170123675A1 (en) 2017-05-04
US10089009B2 US10089009B2 (en) 2018-10-02

Family

ID=58638431

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/407,413 Expired - Fee Related US10089009B2 (en) 2013-04-30 2017-01-17 Method for layered storage of enterprise data

Country Status (1)

Country Link
US (1) US10089009B2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190004844A1 (en) * 2017-06-26 2019-01-03 Wangsu Science & Technology Co., Ltd. Cloud platform construction method and cloud platform
US20190173949A1 (en) * 2017-03-22 2019-06-06 Hitachi, Ltd. Data processing system
WO2019209403A1 (en) * 2018-04-24 2019-10-31 Western Digital Technologies, Inc. Reduced storage of metadata in a distributed encoded storage system
US10474368B1 (en) 2018-04-24 2019-11-12 Western Digital Technologies, Inc Fast read operation utilizing reduced storage of metadata in a distributed encoded storage system
US20200068010A1 (en) * 2018-08-25 2020-02-27 Panzura, Inc. Managing a cloud-based distributed computing environment using a distributed database
US10809919B2 (en) * 2014-06-04 2020-10-20 Pure Storage, Inc. Scalable storage capacities
CN112083646A (en) * 2020-08-21 2020-12-15 珠海格力电器股份有限公司 Redundancy control method, device and control system
WO2021012932A1 (en) * 2019-07-22 2021-01-28 中兴通讯股份有限公司 Transaction rollback method and device, database, system, and computer storage medium
US11099948B2 (en) * 2018-09-21 2021-08-24 Microsoft Technology Licensing, Llc Persistent storage segment caching for data recovery
US11467967B2 (en) 2018-08-25 2022-10-11 Panzura, Llc Managing a distributed cache in a cloud-based distributed computing environment
US11567837B2 (en) 2018-08-25 2023-01-31 Panzura, Llc Journaling data received in a cloud-based distributed computing environment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201819616D0 (en) * 2018-11-30 2019-01-16 Graphcore Ltd Virtualised gateways

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140181395A1 (en) * 2012-12-20 2014-06-26 Amazon Technologies, Inc. Virtual tape library system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7926046B2 (en) * 2005-12-13 2011-04-12 Soorgoli Ashok Halambi Compiler method for extracting and accelerator template program
US8578083B2 (en) 2009-03-03 2013-11-05 Vmware, Inc. Block map based I/O optimization for storage virtual appliances
US8112607B2 (en) 2009-05-07 2012-02-07 Sap Ag Method and system for managing large write-once tables in shadow page databases
US20100332401A1 (en) 2009-06-30 2010-12-30 Anand Prahlad Performing data storage operations with a cloud storage environment, including automatically selecting among multiple cloud storage sites
US8312471B2 (en) 2010-04-26 2012-11-13 Vmware, Inc. File system independent content aware cache
US8589650B2 (en) 2010-05-17 2013-11-19 Texas Instruments Incorporated Dynamically configurable memory system
US8824492B2 (en) 2010-05-28 2014-09-02 Drc Computer Corporation Accelerator system for remote data storage
US20120047339A1 (en) 2010-08-20 2012-02-23 Dan Decasper Redundant array of independent clouds
US8504531B2 (en) 2010-11-27 2013-08-06 Netapp, Inc. System and method for application aware de-duplication of data blocks on a virtualized storage array
US20120179874A1 (en) 2011-01-07 2012-07-12 International Business Machines Corporation Scalable cloud storage architecture
US9417899B2 (en) 2013-03-14 2016-08-16 International Business Machines Corporation Memory page de-duplication in a computer system that includes a plurality of virtual machines

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140181395A1 (en) * 2012-12-20 2014-06-26 Amazon Technologies, Inc. Virtual tape library system

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10809919B2 (en) * 2014-06-04 2020-10-20 Pure Storage, Inc. Scalable storage capacities
US11385799B2 (en) 2014-06-04 2022-07-12 Pure Storage, Inc. Storage nodes supporting multiple erasure coding schemes
US11102299B2 (en) * 2017-03-22 2021-08-24 Hitachi, Ltd. Data processing system
US20190173949A1 (en) * 2017-03-22 2019-06-06 Hitachi, Ltd. Data processing system
US20190004844A1 (en) * 2017-06-26 2019-01-03 Wangsu Science & Technology Co., Ltd. Cloud platform construction method and cloud platform
US10761869B2 (en) * 2017-06-26 2020-09-01 Wangsu Science & Technology Co., Ltd. Cloud platform construction method and cloud platform storing image files in storage backend cluster according to image file type
WO2019209403A1 (en) * 2018-04-24 2019-10-31 Western Digital Technologies, Inc. Reduced storage of metadata in a distributed encoded storage system
US10474368B1 (en) 2018-04-24 2019-11-12 Western Digital Technologies, Inc Fast read operation utilizing reduced storage of metadata in a distributed encoded storage system
US10749958B2 (en) 2018-04-24 2020-08-18 Western Digital Technologies, Inc. Reduced storage of metadata in a distributed encoded storage system
US20200068010A1 (en) * 2018-08-25 2020-02-27 Panzura, Inc. Managing a cloud-based distributed computing environment using a distributed database
US11467967B2 (en) 2018-08-25 2022-10-11 Panzura, Llc Managing a distributed cache in a cloud-based distributed computing environment
US11470146B2 (en) * 2018-08-25 2022-10-11 Panzura, Llc Managing a cloud-based distributed computing environment using a distributed database
US11567837B2 (en) 2018-08-25 2023-01-31 Panzura, Llc Journaling data received in a cloud-based distributed computing environment
US11099948B2 (en) * 2018-09-21 2021-08-24 Microsoft Technology Licensing, Llc Persistent storage segment caching for data recovery
WO2021012932A1 (en) * 2019-07-22 2021-01-28 中兴通讯股份有限公司 Transaction rollback method and device, database, system, and computer storage medium
CN112083646A (en) * 2020-08-21 2020-12-15 珠海格力电器股份有限公司 Redundancy control method, device and control system

Also Published As

Publication number Publication date
US10089009B2 (en) 2018-10-02

Similar Documents

Publication Publication Date Title
US9547453B2 (en) Method for layered storage of enterprise data
US10089009B2 (en) Method for layered storage of enterprise data
US11210219B1 (en) Synchronously replicating a dataset across a plurality of storage systems
US11960777B2 (en) Utilizing multiple redundancy schemes within a unified storage element
US11112989B2 (en) Utilizing a decentralized storage network for data storage
US11593036B2 (en) Staging data within a unified storage element
US11093324B2 (en) Dynamic data verification and recovery in a storage system
US10534677B2 (en) Providing high availability for applications executing on a storage system
CN110023896B (en) Block merging in a direct mapped flash memory storage system
US11656804B2 (en) Copy using metadata representation
US20220091771A1 (en) Moving Data Between Tiers In A Multi-Tiered, Cloud-Based Storage System
US20180024964A1 (en) Disaggregated compute resources and storage resources in a storage system
US10942869B2 (en) Efficient coding in a storage system
US20220217049A1 (en) Path Management For Container Clusters That Access Persistent Storage
US20230418716A1 (en) Anti-entropy-based metadata recovery in a strongly consistent distributed data storage system
US11573711B2 (en) Enhanced data encryption in distributed datastores using random tweaks stored in data blocks
US20230229363A1 (en) Tiering Valid Data after a Disaster Recovery Operation
WO2018022779A1 (en) Evacuating blades in a storage array that includes a plurality of blades
US11461273B1 (en) Modifying storage distribution in a storage system that includes one or more storage devices
US11379383B2 (en) Data encryption in a two-tier storage system

Legal Events

Date Code Title Description
AS Assignment

Owner name: INURON, BELGIUM

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GLAZEMAKERS, KURT;REEL/FRAME:041163/0835

Effective date: 20170203

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20221002