US20160080490A1 - Online data movement without compromising data integrity - Google Patents

Online data movement without compromising data integrity

Info

Publication number
US20160080490A1
US20160080490A1 (Application US14/486,198)
Authority
US
United States
Prior art keywords
data
data store
store
allocation
resiliency
Prior art date
Legal status
Abandoned
Application number
US14/486,198
Inventor
Surendra Verma
Emanuel Paleologu
Erik Gregory Hortsch
Karan Mehra
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to US14/486,198 (US20160080490A1)
Assigned to MICROSOFT CORPORATION. Assignors: MEHRA, KARAN; VERMA, SURENDRA; HORTSCH, ERIK GREGORY; PALEOLOGU, EMANUEL
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. Assignor: MICROSOFT CORPORATION
Priority to CN201580049784.XA (CN106687911B)
Priority to EP15775033.2A (EP3195103A1)
Priority to PCT/US2015/049873 (WO2016044111A1)
Publication of US20160080490A1
Priority to US15/645,515 (US10178174B2)

Classifications

    • H04L67/1097 Distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G06F3/0607 Improving or facilitating administration, e.g. storage management, by facilitating the process of upgrading existing storage systems
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/061 Improving I/O performance
    • G06F3/0647 Migration mechanisms
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0671 In-line storage system
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Definitions

  • Computing systems have become ubiquitous, ranging from small embedded devices to phones and tablets to PCs and backend servers. Each of these computing systems includes some type of data storage and typically, many different types of data storage.
  • a computing system may include solid-state storage and a hard drive or set of hard drives.
  • the solid-state storage may be able to handle read and write I/O requests more quickly than the hard drive, but may not have the storage capacity of the hard drive.
  • Other media such as tape drives, DVDs (or other optical media) or other kinds of media may have different advantages and disadvantages when reading, writing and storing data.
  • Embodiments described herein are directed to modifying storage capacity within a data store and to modifying resiliency for at least a portion of a data store.
  • a computer system receives a request to move data.
  • the request to move data may specify a data store to move the data off of, a data store to move the data to, or may allow the computer system to select where the data is moved from and/or moved to.
  • the computer system may determine that data is to be moved from an allocation on one data store to a new allocation on another data store.
  • the computer system may create a new allocation on the other data store, where the new allocation is configured to receive data from the first data store.
  • the computer system then moves the data to the new allocation on the second data store as data I/O requests are received at the first data store.
  • Data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.
  • a computer system modifies resiliency for a data store.
  • the computer system determines that a resiliency scheme for at least part of a data store is to be changed from one resiliency scheme to another resiliency scheme, where the data store is configured to store different portions of data.
  • the computer system determines how the specified portion of data within the data store is to be altered according to the change in resiliency scheme, and modifies the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.
  • FIG. 1 illustrates a computer architecture in which embodiments described herein may operate including modifying storage capacity within a data store.
  • FIG. 2 illustrates a flowchart of an example method for modifying storage capacity within a data store.
  • FIG. 3 illustrates a flowchart of an example method for modifying resiliency for at least a portion of a data store.
  • FIG. 4 illustrates an embodiment in which a resiliency scheme is modified for at least a portion of data.
  • FIG. 5 illustrates an embodiment in which storage capacity is added and data is rebalanced among remaining data storage.
  • FIG. 6 illustrates an embodiment in which storage capacity is removed and data is rebalanced among remaining data storage.
  • Embodiments described herein may implement various types of computing systems. These computing systems are now increasingly taking a wide variety of forms. Computing systems may, for example, be handheld devices such as smartphones or feature phones, appliances, laptop computers, wearable devices, desktop computers, mainframes, distributed computing systems, or even devices that have not conventionally been considered a computing system.
  • the term “computing system” is defined broadly as including any device or system (or combination thereof) that includes at least one physical and tangible processor, and a physical and tangible memory capable of having thereon computer-executable instructions that may be executed by the processor.
  • a computing system may be distributed over a network environment and may include multiple constituent computing systems.
  • a computing system 101 typically includes at least one processing unit 102 and memory 103 .
  • the memory 103 may be physical system memory, which may be volatile, non-volatile, or some combination of the two.
  • the term “memory” may also be used herein to refer to non-volatile mass storage such as physical storage media. If the computing system is distributed, the processing, memory and/or storage capability may be distributed as well.
  • executable module can refer to software objects, routines, or methods that may be executed on the computing system.
  • the different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads).
  • embodiments are described with reference to acts that are performed by one or more computing systems. If such acts are implemented in software, one or more processors of the associated computing system that performs the act direct the operation of the computing system in response to having executed computer-executable instructions.
  • such computer-executable instructions may be embodied on one or more computer-readable media that form a computer program product.
  • An example of such an operation involves the manipulation of data.
  • the computer-executable instructions (and the manipulated data) may be stored in the memory 103 of the computing system 101 .
  • Computing system 101 may also contain communication channels that allow the computing system 101 to communicate with other message processors over a wired or wireless network.
  • Embodiments described herein may comprise or utilize a special-purpose or general-purpose computer system that includes computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below.
  • the system memory may be included within the overall memory 103 .
  • the system memory may also be referred to as “main memory”, and includes memory locations that are addressable by the at least one processing unit 102 over a memory bus in which case the address location is asserted on the memory bus itself.
  • System memory has been traditionally volatile, but the principles described herein also apply in circumstances in which the system memory is partially, or even fully, non-volatile.
  • Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures.
  • Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system.
  • Computer-readable media that store computer-executable instructions and/or data structures are computer storage media.
  • Computer-readable media that carry computer-executable instructions and/or data structures are transmission media.
  • embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage media and transmission media.
  • Computer storage media are physical hardware storage media that store computer-executable instructions and/or data structures.
  • Physical hardware storage media include computer hardware, such as RAM, ROM, EEPROM, solid state drives (“SSDs”), flash memory, phase-change memory (“PCM”), optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage device(s) which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention.
  • Transmission media can include a network and/or data links which can be used to carry program code in the form of computer-executable instructions or data structures, and which can be accessed by a general-purpose or special-purpose computer system.
  • a “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices.
  • program code in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (or vice versa).
  • program code in the form of computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media at a computer system.
  • computer storage media can be included in computer system components that also (or even primarily) utilize transmission media.
  • Computer-executable instructions comprise, for example, instructions and data which, when executed at one or more processors, cause a general-purpose computer system, special-purpose computer system, or special-purpose processing device to perform a certain function or group of functions.
  • Computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
  • a computer system may include a plurality of constituent computer systems.
  • program modules may be located in both local and remote memory storage devices.
  • Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations.
  • cloud computing is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of “cloud computing” is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
  • system architectures described herein can include a plurality of independent components that each contribute to the functionality of the system as a whole.
  • This modularity allows for increased flexibility when approaching issues of platform scalability and, to this end, provides a variety of advantages.
  • System complexity and growth can be managed more easily through the use of smaller-scale parts with limited functional scope.
  • Platform fault tolerance is enhanced through the use of these loosely coupled modules.
  • Individual components can be grown incrementally as business needs dictate. Modular development also translates to decreased time to market for new functionality. New functionality can be added or subtracted without impacting the core system.
  • FIG. 1 illustrates a computer architecture 100 in which at least one embodiment may be employed.
  • Computer architecture 100 includes computer system 101 .
  • Computer system 101 may be any type of local or distributed computer system, including a cloud computing system.
  • the computer system 101 includes modules for performing a variety of different functions.
  • the communications module 104 may be configured to communicate with other computing systems.
  • the communications module 104 may include any wired or wireless communication means that can receive and/or transmit data to or from other computing systems.
  • the communications module 104 may be configured to interact with databases, mobile computing devices (such as mobile phones or tablets), embedded or other types of computing systems.
  • the communications module 104 of computer system 101 may be further configured to receive requests to move data 105 . Such requests may be received from applications, from users or from other computer systems.
  • the request to move data 105 may be generated internally to computer system 101 , or may be received from a source external to computer system 101 .
  • the determining module 106 may determine, based on the received request to move data 105 , that data 113 is to be moved from a first data store 112 to a second data store 115 .
  • the data stores 112 and 115 may be local to or remote to computer system 101 .
  • the data stores may be single storage devices, arrays of storage devices or storage networks such as SANs or the cloud.
  • the data stores may store the data 113 according to resiliency schemes. These resiliency schemes may include data mirroring or parity schemes such as data striping, or any other type of resiliency scheme including the various redundant array of inexpensive disks (RAID) schemes.
  • the allocation creating module 107 of computer system 101 creates a new allocation 116 on the second data store 115 .
  • the data moving module 108 may then move the data 113 to the newly created allocation 116 on the second data store 115 .
  • the data stores 112 and 115 may be online data stores that are exposed to the internet. In such cases, data is moved between online databases or other data stores.
  • any data store access requests (such as a request to move data 105 ) may be synchronized with the data movement by directing the data store access requests to the first data store 112 , to the second data store 115 or to both data stores depending on the type of access request. This process will be described in greater detail below.
  • online data movement represents the process of moving allocations containing data from one data store (e.g. a set of hard drives or tape drives) to another. This migration of data takes place without disrupting the functionality or availability of the data store, and without reducing the number of failures that can be tolerated. Additionally, as part of this process, a new set of drives may be selected to transition the data storage space to a different fault domain (e.g. upgrading from being able to tolerate a single enclosure failure, to being able to tolerate a whole rack failure).
  • a fault domain may refer to a physical grouping of storage hardware that can fail together, such as an enclosure or a whole rack.
  • the new set of drives may also increase the storage efficiency of the storage space (i.e. better utilize the drive's capacity), or improve the performance of the storage space (e.g. spread ‘hot’ data (i.e. data that is accessed frequently) across more drives).
  • Embodiments described herein allow data to be moved between data stores (online or otherwise) based on various criteria including user-defined criteria. Embodiments further provide the ability to selectively move data based on external input or other criteria (such as information about the heat of data), or internal heuristics (such as moving data away from the ends of hard drives to achieve short stroking and thus faster data access times).
  • Embodiments may further include increasing the number of copies in a mirror and converting a parity (RAID5/6) to parity with mirroring (RAID5/6+1) dynamically and sparsely (only on the sections that need to be moved), removing a disk from a RAID array by mirroring its contents across the remaining disks to avoid compromising integrity, moving data across fault domains to increase the resiliency of a RAID array to more than its initial creation (e.g. migrating an array that can lose an enclosure to one that can lose a rack), and converting a mirror space to a parity space in place (or vice-versa) without rewriting the data.
  • data migration is performed by temporarily converting simple and mirror spaces to mirrors with more copies.
  • RAID5+1 will include a standard parity layer, which has read, write, and reconstruct capabilities. Reads and writes to the underlying disks will be redirected through a mirror layer which has its own read, write, and reconstruct capabilities. To avoid unnecessary complexity in the parity layer, the mirroring layer will provide an aggregated view of all the copies holding each individual column.
  • a task may be used to create another allocation as the destination and temporarily increase the data store's number of copies. This allocation will begin life as stale (i.e. it needs to be reconstructed because it does not contain valid data), and will be picked up and transitioned to healthy by a reconstruction task. In this manner, data migration is performed at the granularity of an allocation within a data store (instead of performing it on every allocation in the data store).
  • Such embodiments offer advantages including, but not limited to, the following: 1) When migrating multiple copies of the same column, only one of the copies needs to be read and can be written to both of the destinations. 2) If a read fails during migration, but other copies of data 113 are available, they will be available to reconstruct from. 3) The ability to read from any copy of data to perform the movement will also increase the ability to parallelize migrations, especially when moving mirrors off of a disk.
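  • The per-allocation flow described above can be pictured with the sketch below. This is a minimal illustration under stated assumptions (the Copy and CopyState classes, dict-backed columns and single-threaded copy loop are invented for clarity and are not the patented implementation): a stale destination copy is added, a reconstruction pass reads each column once from any healthy copy and writes it to every stale destination, and the source copy is then retired.

```python
from enum import Enum

class CopyState(Enum):
    HEALTHY = "healthy"
    STALE = "stale"          # needs reconstruction; does not yet hold valid data

class Copy:
    def __init__(self, drive, columns=None, state=CopyState.HEALTHY):
        self.drive = drive
        self.columns = dict(columns or {})   # column index -> data block
        self.state = state

def migrate_allocation(copies, destination_drive):
    """Temporarily raise the allocation's copy count, reconstruct the new copy,
    then retire the source so the original copy count is restored."""
    source = copies[0]
    dest = Copy(destination_drive, state=CopyState.STALE)   # begins life as stale
    copies.append(dest)

    healthy = [c for c in copies if c.state is CopyState.HEALTHY]
    for column in source.columns:
        # Only one healthy copy needs to be read; the same block can be
        # written to every stale destination.
        block = next(c.columns[column] for c in healthy if column in c.columns)
        dest.columns[column] = block
    dest.state = CopyState.HEALTHY

    copies.remove(source)          # the data has been moved off the source drive
    return copies

mirror = [Copy("hdd-0", {0: b"A", 1: b"B"}), Copy("hdd-1", {0: b"A", 1: b"B"})]
mirror = migrate_allocation(mirror, "hdd-2")
print([(c.drive, c.columns) for c in mirror])   # hdd-1 and hdd-2 now hold the data
```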
  • data is migrated between data stores by migrating entire slabs (i.e. collections of allocations that form a resiliency level).
  • This process allocates a whole slab, or set of slabs, at the same offset of a current group of slabs.
  • These new allocations may be marked as a destination in an object pool configuration.
  • the slab size can change, as well as any other resiliency properties. If the source and destination configurations have different slab sizes, then the migration will be performed at the smallest size that is evenly divisible by both slab sizes (i.e. the least common multiple).
  • a mirror object may be placed above the slabs, forwarding writes to both copies while a task (e.g. a reconstruction task) copies data from the old slab(s) to the new destination slab(s).
  • the old slabs will be discarded and the new slabs will come in as a separate storage tier (to represent any changes in resiliency).
  • a second child space may be allocated to replace the old one. This allows migration between any two resiliency configurations (resiliency type, slab size and fault tolerance can all change).
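  • As a rough sketch of this slab-level approach (the SlabMirror class and dict-backed slabs below are assumptions made purely for illustration), the migration unit is the least common multiple of the two slab sizes, and a mirror object forwards writes to both the old and new slabs while a background task copies the existing data across:

```python
import math

def migration_unit(source_slab_size, dest_slab_size):
    # Smallest size evenly divisible by both slab sizes, so each migration
    # step lines up with whole slabs on both the source and the destination.
    return math.lcm(source_slab_size, dest_slab_size)

class SlabMirror:
    """Placed above the old and new slabs during migration: writes fan out to
    both while a background task copies the existing data from old to new."""
    def __init__(self, old_slab, new_slab):
        self.old, self.new = old_slab, new_slab

    def write(self, offset, data):
        self.old[offset] = data
        self.new[offset] = data

    def copy_existing(self):
        for offset, data in list(self.old.items()):
            self.new.setdefault(offset, data)   # never clobber a newer forwarded write

print(migration_unit(256, 1024))   # 256 MiB and 1 GiB slabs migrate in 1024 MiB units
```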
  • whole slabs are migrated with data overlap.
  • This is a variant of the embodiment described above; it migrates at the slab level but does not allow the size of a slab to change.
  • To avoid excessive data movement, only the columns that are moving would be reallocated; the remaining columns would be “ghosted” or “no-oped” on the second (destination) slab. The columns would appear to be there, but writes to them would be blocked. This moves a minimal amount of data and still allows upgrades, including resiliency changes.
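  • A minimal sketch of the ghosted-column idea (the class names below are illustrative assumptions, not the patented implementation): only the moving columns receive real backing storage on the destination slab, while the remaining columns appear to exist but reject writes.

```python
class RealColumn:
    def __init__(self):
        self.blocks = {}
    def write(self, offset, data):
        self.blocks[offset] = data

class GhostColumn:
    """A column that is not actually moving: it appears present on the destination
    slab, but writes to it are blocked (no-oped), so no data is needlessly copied."""
    def write(self, offset, data):
        raise PermissionError("writes to a ghosted column are blocked")

def build_destination_slab(column_count, moving_columns):
    return [RealColumn() if i in moving_columns else GhostColumn()
            for i in range(column_count)]

slab = build_destination_slab(4, moving_columns={1, 3})
slab[1].write(0, b"moved block")      # succeeds: column 1 is really moving
# slab[0].write(0, b"x") would raise: column 0 is ghosted on the destination
```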
  • individual columns may be migrated with RAID level migration.
  • This process may be implemented by two separate mechanisms which work together to provide an end-to-end solution.
  • The first process reallocates individual columns in place using a task (such as a pool transaction). Each source and destination pair is then combined into a mirror, with the destination being marked as ‘Needs Regeneration’ or an equivalent marking.
  • These mirrors are then surfaced to the slab as a single allocation, and the regeneration task copies the data from the source to destination.
  • a task deletes the old allocations and the mirror objects under the slab are replaced by the new allocations.
  • the second mechanism allows conversion between mirror and parity storage spaces.
  • the mirroring is separated from the striping by making a storage space with a mirror in place of each allocation.
  • the parity columns are then tacked onto the end and marked as needing regeneration. When this regeneration completes, a second pool transaction selects one copy from each of the mirrors and surfaces a parity slab.
  • the conversion from mirror to parity results in an enclosure- or rack-aware parity space that has the correct on-disk format.
  • This process can also be reversed to convert back to a mirror and a similar process can convert between storage spaces such as 2-way mirrors and 3-way mirrors.
  • some data columns may need to be moved to guarantee the ability to tolerate higher fault domain failure(s) (as mirror has different allocation requirements than parity).
  • This migration may be performed as an intermediate step (after parity has been regenerated) to avoid placing the data store in a state of reduced resiliency. This allows fine grain control of which allocations move.
  • free space is only required on destination drives, and multiple slabs may be migrated in parallel.
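  • To make the mirror-to-parity conversion concrete, the sketch below uses simple XOR single parity as a stand-in for the RAID5/6 parity schemes named above (the function names and single-parity layout are assumptions for illustration): one copy is selected from each mirror, a parity column is appended and regenerated from those copies, and the parity slab can then be surfaced in place of the mirrored one.

```python
from functools import reduce

def xor_blocks(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def convert_mirror_to_parity(mirrored_columns):
    """mirrored_columns: one list of copies per data column (each copy a bytes block).
    Returns the selected data columns plus a regenerated single-parity column."""
    data_columns = [copies[0] for copies in mirrored_columns]   # one copy per mirror
    parity_column = xor_blocks(data_columns)                    # appended, then regenerated
    # The extra mirror copies can now be released and the parity slab surfaced.
    return data_columns, parity_column

data, parity = convert_mirror_to_parity([[b"\x01\x02", b"\x01\x02"],
                                         [b"\x04\x08", b"\x04\x08"]])
print(parity)   # b'\x05\n' -- the XOR of the two selected data columns
```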
  • FIG. 2 illustrates a flowchart of a method 200 for modifying storage capacity within a data store. The method 200 will now be described with frequent reference to the components and data of environment 100 .
  • Method 200 includes receiving a request to move one or more portions of data ( 210 ).
  • communications module 104 of computer system 101 may receive a request to move data 105 from a request source.
  • the request source may be an application, service, user or other computer system.
  • the request may specify that data 113 is to be moved from one data store 112 to another data store 115 , either or both of which may be online.
  • the data 113 may be individual files, collections of files, blobs of data or other allocations of data such as slabs, metadata or other types of data or collections of data.
  • the request 105 may specify the data store to move data off of (e.g. first data store 112 in FIG. 1 ), the data store to move data to (e.g. second data store 115 in FIG. 1 ), or both.
  • the request may simply indicate that a certain portion of data is to be moved.
  • the computer system 101 may determine which data stores have the specified data and may further determine which data store(s) the data is to be moved to.
  • the request 105 may include information about the data stores to aid the system in making the decision.
  • the request may include multiple data sources and multiple data targets.
  • Method 200 further includes determining that data is to be moved from the first data store to the second data store ( 220 ).
  • the determining module 106 of computer system 101 may determine, based on the request to move data 105 , that data 113 is to be moved from the first data store 112 to the second data store 115 . This determination may include determining which data or data stores are being most heavily utilized.
  • each data store may include a single storage device or multiple storage devices. In cases where a data store is an array of hard drives, some of the hard drives may be used more heavily than others.
  • the determining module 106 may identify which data (among data 113 ) can be moved, which data must move and where the data is to be moved to. In some cases, data cannot be moved and may be labeled “unmovable data.” If the data can move, the determining module 106 may determine the best location for that data.
  • a heat engine may be implemented which tracks all reads/writes to data in a given data store. Other factors may include heuristics (e.g. moving data away from the ends of drives so the drive's read/write head travels shorter distances). Still other factors may include characteristics of the data store, such as favoring larger drives over smaller drives, or favoring the outside of the drive platter, which travels faster and is capable of quicker reads and writes.
  • the determining module 106 may further be configured to identify where data I/O request bottlenecks are occurring.
  • the determining module may determine that existing data on those drives is to be moved to other drives to spread out the I/O requests 111 , or that the incoming I/O requests are to be redirected to other drives within the data store (e.g. by the data redirecting module 109 ) or to a different data store (e.g. the second data store 115 ).
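  • A toy version of such a heat engine is sketched below (the read/write weighting and the tie-breaking rule are assumptions for illustration, not values from the patent): every read and write bumps a per-allocation counter, and a destination drive is chosen by preferring cooler, emptier devices.

```python
from collections import defaultdict

class HeatEngine:
    """Tracks reads/writes per allocation so 'hot' data can be spread across more
    drives and placement decisions can favor less-loaded devices."""
    def __init__(self):
        self.heat = defaultdict(int)

    def record_io(self, allocation_id, is_write):
        self.heat[allocation_id] += 2 if is_write else 1   # weighting is an assumption

    def pick_destination(self, drives):
        """Choose the drive with the least accumulated heat, then the most free space."""
        def drive_heat(d):
            return sum(self.heat[a] for a in d["allocations"])
        return min(drives, key=lambda d: (drive_heat(d), -d["free_bytes"]))

engine = HeatEngine()
engine.record_io("alloc-1", is_write=True)
drives = [{"name": "hdd-0", "free_bytes": 4 << 40, "allocations": ["alloc-1"]},
          {"name": "hdd-1", "free_bytes": 2 << 40, "allocations": []}]
print(engine.pick_destination(drives)["name"])   # hdd-1: cooler, despite less free space
```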
  • Method 200 further includes creating a new allocation on the second data store, the new allocation being configured to receive at least a portion of data from the first data store ( 230 ), and moving the data to the new allocation on the second data store as data I/O requests are received at the first data store, wherein data store access requests are synchronized with data movement by directing the data store access requests to the first data store, the second data store or both data stores depending on the type of access request ( 240 ).
  • the allocation creating module 107 of computer system 101 may create new allocation 116 on the second data store 115 . This new allocation 116 may be configured to receive some or all of the data 113 that is moved from the first data store 112 to the second data store 115 .
  • the second data store 115 may include at least one hard drive.
  • the newly created allocation 116 on the second data store 115 may be located substantially near the beginning of the hard drive (i.e. near the outer edge of the hard drive). In this manner, data may be moved away from the ends of hard drives on the first data store and moved to the beginning of drives on the second data store 115 . This allows the data to be accessed more quickly.
  • Other optimizations may be used for other data storage devices such as tape drives or optical drives.
  • the second data store 115 may be configured to accept new data storage devices and/or new data storage media.
  • the second data store 115 may include data storage media that was added to the second data store.
  • This second data store may be located on a fault domain that is different from the fault domain of the first data store. For instance, if a fault domain is established for a given hardware storage rack (e.g. first data store 112 ), the storage media may be added to the second data store 115 which, at least in some embodiments, is in a different fault domain than the first data store.
  • the existing data may be rebalanced, based on what kind of hardware was added. Indeed, in some cases, entire racks may be added to existing data stores. In such cases, the existing data may be rebalanced among the hardware storage devices of the newly added rack.
  • each hard drive or tape drive or other type of block storage such as solid-state drives (SSDs), non-volatile memory express (NVMe), virtual hard disks (VHDs), etc. may be used to its fullest extent, even when other drives of larger or smaller capacity are present.
  • When data writes are received at the data store, the writes may be sent to both the first and second data stores, and incoming data reads may be sent to the first data store until the data of the first data store is copied to the new allocation on the second data store. In this manner, consistency is maintained at the data stores, such that incoming writes reach both data stores, while data reads are served from the older data until the data is fully copied over to the other (second) data store.
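  • The read/write routing just described can be sketched as follows (a simplified, dict-backed model; the MigrationRouter class is an assumption introduced for illustration): writes fan out to both stores, reads are served by the first store until the background copy finishes, and reads then switch to the second store.

```python
class MigrationRouter:
    """Synchronizes access requests with the data movement: writes fan out to both
    data stores, reads are served from the first (source) store until the copy
    to the new allocation on the second store has completed."""
    def __init__(self, first_store, second_store):
        self.first, self.second = first_store, second_store
        self.copy_complete = False

    def write(self, key, value):
        self.first[key] = value
        self.second[key] = value          # destination stays consistent during the move

    def read(self, key):
        source = self.second if self.copy_complete else self.first
        return source[key]

    def finish_copy(self):
        # Background copy task calls this once all existing data has been moved;
        # reads switch over and the old allocation can then be deleted.
        for key, value in self.first.items():
            self.second.setdefault(key, value)
        self.copy_complete = True

store_a, store_b = {"x": 1}, {}
router = MigrationRouter(store_a, store_b)
router.write("y", 2)
print(router.read("x"))    # 1, served from the first store during migration
router.finish_copy()
print(router.read("x"))    # 1, now served from the second store
```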
  • a disk array 501 is shown having two hard drives: HD 502 A and HD 502 B.
  • a new hard drive 502 C may be added to the disk array 501 during operations.
  • the data of the disk array is rebalanced using the new disk and any existing disks. The rebalancing may be performed without compromising any existing resiliency implementations on the disk array. For instance, if data mirroring has been implemented, the data in HD 502 A may be mirrored between previous disk 502 B and newly added disk 502 C.
  • the data may be distributed evenly among the disks of the array, or may be distributed in another manner, such as based on the heat of the data or the overall heat of the disk.
  • the disk array 501 may include substantially any number of disks, tape drives or other storage devices.
  • While a mirroring resiliency scheme is shown in FIGS. 5 and 6, it should be noted that any RAID or other type of mirroring or parity resiliency scheme may be used.
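  • One way to picture the rebalancing of FIG. 5 is the sketch below (the target-load heuristic and the two-way-mirror placement map are assumptions for illustration): some mirror copies are shifted onto the newly added drive to even out capacity use, while the two copies of every extent remain on distinct drives so the existing resiliency is not compromised.

```python
def rebalance_after_add(placement, new_drive):
    """placement: {extent: [drive, drive]} for a two-way mirror.
    Moves one copy of some extents onto the newly added drive so its capacity is
    used, while keeping the two copies of each extent on distinct drives."""
    placement = {e: list(copies) for e, copies in placement.items()}
    drives = {d for copies in placement.values() for d in copies} | {new_drive}
    target = 2 * len(placement) / len(drives)        # average copies per drive

    def load(d):
        return sum(copies.count(d) for copies in placement.values())

    for extent, copies in placement.items():
        if load(new_drive) >= target:
            break
        # Take a copy off the busiest drive, unless the other copy is already
        # on the new drive (mirror copies must stay on distinct drives).
        if new_drive not in copies:
            busiest = max(copies, key=load)
            copies[copies.index(busiest)] = new_drive
    return placement

before = {"e1": ["hd-a", "hd-b"], "e2": ["hd-a", "hd-b"], "e3": ["hd-b", "hd-a"]}
print(rebalance_after_add(before, "hd-c"))
# One copy of e1 and e2 now lives on hd-c; every extent still spans two distinct drives.
```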
  • FIG. 6 illustrates an embodiment where at least one hard disk is removed from a disk array 601 .
  • the disk array 601 may include hard drives HD 602 A, HD 602 B, HD 602 C and HD 602 D.
  • Hard drive 602 C may be removed due to failure of the drive or for some other reason.
  • the disk array 601 now includes 602 A, 602 B and 602 D.
  • the data that was on drive 602 C is rebalanced among the remaining hard drives. As with the embodiment above where a hard drive was added to the disk array, the data may be rebalanced according to a variety of different factors, and does not need to be rebalanced evenly over the remaining hard drives.
  • disks may be removed from the array 601 without compromising existing resiliency implementations such as mirroring.
  • the data may be automatically and dynamically distributed among the remaining drives in a manner that does not degrade the resiliency of the disk array 601 .
  • the data may be rebalanced according to hot or cold data, such that the hot and cold data are distributed evenly among the remaining drives, or may be rebalanced to the beginning of each disk. Additionally or alternatively, data may be rebalanced according to the assigned importance of the data (i.e. the importance of the data may dictate the order in which the data is rebalanced).
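  • A matching sketch for the removal case of FIG. 6 (the heat-ordered re-placement below is an illustrative assumption): only the copies that lived on the removed drive are re-placed, hotter or more important extents first, and each mirror's copies still land on distinct drives so resiliency is not degraded.

```python
def rebalance_after_remove(placement, removed_drive, heat):
    """Re-place only the copies that lived on the removed drive; data already on the
    remaining drives is left where it is. Hotter (or more important) extents are
    re-placed first, and a mirror's two copies always end up on distinct drives."""
    remaining = {d for copies in placement.values() for d in copies if d != removed_drive}

    def load(d):
        return sum(copies.count(d) for copies in placement.values())

    affected = [e for e, copies in placement.items() if removed_drive in copies]
    for extent in sorted(affected, key=lambda e: heat.get(e, 0), reverse=True):
        surviving = [d for d in placement[extent] if d != removed_drive]
        candidates = remaining - set(surviving)
        new_home = min(candidates, key=load)          # least-loaded eligible drive
        placement[extent] = surviving + [new_home]
    return placement

placement = {"e1": ["hd-a", "hd-c"], "e2": ["hd-b", "hd-c"], "e3": ["hd-a", "hd-b"]}
print(rebalance_after_remove(placement, "hd-c", heat={"e1": 10, "e2": 3}))
# e1 (hottest) and e2 are re-placed onto the remaining drives; e3 is untouched.
```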
  • data I/O collisions may be prevented during transition of the data 113 to the new allocation 116 by allowing a first user's data writes to take priority over a second user's data writes, or by allowing a user's data writes to take priority over a computing system's data writes, or vice versa.
  • the writes may be prioritized based on user or applications and processed in order of priority, such that I/O collisions are avoided.
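  • As a small illustration of that prioritization (the priority values, heap-based queue and the rule of skipping already-covered blocks are assumptions, not the patented mechanism): colliding writes are ordered by priority, and a lower-priority write to a block that a higher-priority write has already covered is dropped rather than allowed to collide.

```python
import heapq
import itertools

USER_PRIORITY, SYSTEM_PRIORITY = 0, 1     # lower value is served first (an assumption)

class PrioritizedWriteQueue:
    """Orders colliding writes so that, e.g., a user's writes take priority over the
    computing system's own migration writes instead of racing for the same blocks."""
    def __init__(self):
        self._heap, self._counter = [], itertools.count()

    def submit(self, priority, offset, data):
        # The counter keeps ordering stable for writes of equal priority.
        heapq.heappush(self._heap, (priority, next(self._counter), offset, data))

    def drain(self, apply_write):
        written = set()
        while self._heap:
            _, _, offset, data = heapq.heappop(self._heap)
            if offset in written:
                continue            # a higher-priority write already covered this block
            written.add(offset)
            apply_write(offset, data)

queue = PrioritizedWriteQueue()
queue.submit(SYSTEM_PRIORITY, 0, b"migration copy")
queue.submit(USER_PRIORITY, 0, b"user update")
queue.drain(lambda off, data: print(off, data))   # only the user's write is applied
```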
  • any previously used allocations on the first data store may be deleted.
  • the allocations (whether existing or newly added) are implemented within the data store to logically define specified areas of storage. Each allocation identifies where the allocation is located within the data store, what data it contains and where its data is stored on different data storage devices.
  • the allocations may be stored in a mapping table. Whenever storage devices are added to a data store (such as disk array 501 / 601 above) or removed from a data store, the computing system 101 may access the mapping table to determine which allocations were stored on the added/removed storage devices. Then, the data stored on the added/removed drives is rebalanced to one or more other storage devices of the data store.
  • previously used allocations may include a pointer to the newly created allocation on the data store to which the data is being moved (i.e. the second data store 115 ). In this manner, if data is deleted during transition of the data from the first data store to the second data store, the newly created allocation is notified of the deletion, and resiliency is guaranteed throughout the transition.
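  • The mapping-table and forwarding-pointer behavior can be sketched as follows (the AllocationTable class and its dict layout are assumptions made for illustration): the table records which device holds each allocation, a move installs a pointer from the old allocation to the new one so deletes issued mid-transition also reach the new copy, and the same table answers which allocations lived on an added or removed device.

```python
class AllocationTable:
    """Mapping table of allocations: where each allocation lives and, during a move,
    a pointer from the old allocation to its newly created destination."""
    def __init__(self):
        self.allocations = {}        # allocation_id -> {"device": ..., "forward_to": ...}

    def add(self, allocation_id, device):
        self.allocations[allocation_id] = {"device": device, "forward_to": None}

    def begin_move(self, old_id, new_id, new_device):
        self.add(new_id, new_device)
        self.allocations[old_id]["forward_to"] = new_id

    def delete_data(self, allocation_id, data_key, stores):
        stores[allocation_id].pop(data_key, None)
        forward = self.allocations[allocation_id]["forward_to"]
        if forward is not None:
            stores[forward].pop(data_key, None)   # notify the new allocation of the delete

    def allocations_on(self, device):
        # Used when a device is added or removed to find which allocations to rebalance.
        return [a for a, info in self.allocations.items() if info["device"] == device]

table = AllocationTable()
table.add("alloc-old", "hdd-0")
table.begin_move("alloc-old", "alloc-new", "hdd-1")
stores = {"alloc-old": {"k": b"v"}, "alloc-new": {"k": b"v"}}
table.delete_data("alloc-old", "k", stores)
print(stores)   # the delete reached both the old and the new allocation
```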
  • FIG. 3 a flowchart is illustrated of a method 300 for modifying resiliency for at least a portion of a data store. The method 300 will now be described with frequent reference to the components and data of environment 100 .
  • Method 300 includes determining that a resiliency scheme for at least a specified portion of a data store is to be changed from a first resiliency scheme to a second, different resiliency scheme, the data store including one or more portions of data ( 310 ).
  • the determining module 106 of computer system 101 may determine that resiliency scheme 114 A for at least some data 113 on the first data store 112 is to be changed to a second resiliency scheme 114 B.
  • the resiliency schemes may include mirroring, parity or combinations thereof (including the various RAID implementations) or other resiliency schemes.
  • Method 300 next includes determining how the data within the specified portion of the data store is to be altered according to the change in resiliency scheme (act 320 ) and modifying the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed ( 330 ).
  • the determining module 106 of computer system 101 may thus determine how the data 113 is to be altered according to the change in resiliency scheme (e.g. from mirroring to parity or from parity to mirror).
  • the modifying module 110 of computer system 101 may then modify the resiliency scheme for a certain portion of data, while leaving other portions of data untouched.
  • data store 401 has multiple different data portions ( 402 A, 402 B and 402 C). These data portions may each be different storage devices (e.g. hard disks) or may be logical portions of the same hard disk, or a combination of physical and logical data portions. Each data portion within the data store may have its own resiliency scheme: scheme 403 A for data portion 402 A, scheme 403 B for data portion 402 B, and scheme 403 C for data portion 402 C.
  • Embodiments herein may modify a portion of a data store (e.g. 402 B) and its resiliency scheme without modifying other portions of the data store or their resiliency schemes.
  • a new resiliency scheme 403 D may be implemented for that data portion without affecting any other data portions.
  • a storage device may be added to a data store. At least one portion of that data store may be implementing an N-way mirror resiliency scheme. When the new device is added, an N+1-way mirroring scheme may be implemented for the data store, such that the data store data is split between two storage devices. The split need not be even, and may be balanced according to heuristics such as relative heat level. Still further, in some cases, a storage device may be removed from a data store. The data that was stored on the removed data storage device may be rebalanced among the remaining storage devices, without rebalancing existing data on the remaining storage devices.
  • the granularity of the data store portions that are to be converted from one resiliency scheme to another may be set to an arbitrary value (e.g. 1 GB) or may be substantially any size. In this manner, whole volumes or arrays need not be converted to change a resiliency scheme. Rather, embodiments herein may convert one section of an array or volume from mirroring to parity or vice versa, while leaving the rest of the volume or array alone. Then, if a user wants to remove one drive, the system can merely rebalance or realign the data on that drive or that portion of the data store.
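  • The per-portion granularity can be pictured with the sketch below (the 1 GB slab size comes from the text above; the DataStore class and scheme labels are illustrative assumptions): each portion carries its own resiliency scheme, so converting one portion from mirroring to parity leaves every other portion, and its scheme, untouched.

```python
SLAB_BYTES = 1 << 30          # an arbitrary 1 GB conversion granularity, per the text

class DataStore:
    """Each 1 GB portion of the store carries its own resiliency scheme, so one
    portion can be converted (e.g. mirror -> parity) without touching the rest."""
    def __init__(self, size_bytes, default_scheme="2-way-mirror"):
        count = size_bytes // SLAB_BYTES
        self.portions = [{"index": i, "scheme": default_scheme} for i in range(count)]

    def change_resiliency(self, portion_index, new_scheme):
        portion = self.portions[portion_index]
        old_scheme = portion["scheme"]
        # Only this portion's layout is rewritten; every other portion keeps its scheme.
        portion["scheme"] = new_scheme
        return old_scheme, new_scheme

store = DataStore(size_bytes=4 * SLAB_BYTES)
store.change_resiliency(2, "parity")
print([p["scheme"] for p in store.portions])
# ['2-way-mirror', '2-way-mirror', 'parity', '2-way-mirror']
```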
  • Accordingly, methods, systems and computer program products are provided which modify storage capacity within a data store. Moreover, methods, systems and computer program products are provided which modify resiliency for at least a portion of a data store.
  • In one embodiment, a computer system is provided which includes at least one processor and which performs a computer-implemented method for modifying storage capacity within a data store. The method includes receiving a request 105 to move one or more portions of data, determining that data 113 is to be moved from an allocation on a first data store 112 to a new allocation 116 on the second data store 115 , the first and second data stores being configured to store allocations of data, creating the new allocation 116 on the second data store 115 , the new allocation being configured to receive at least a portion of data 113 from the first data store 112 , and moving the data 113 to the new allocation 116 on the second data store 115 as data I/O requests 111 are received at the first data store, wherein data store access requests are synchronized with the data movement by directing the data store access requests to the first data store 112 , to the second data store 115 or to both data stores depending on the type of access request.
  • determining that data is to be moved from the first data store to the second data store comprises determining which data or data stores are being most heavily utilized.
  • the second data store comprises at least one hard drive, and wherein the new allocation on the second data store is located nearer to the beginning of the second data store than the allocation on the first data store.
  • the second data store comprises a data storage media that was added to the computing system, the second data store being located on a fault domain that is different from the fault domain of the first data store.
  • the fault domain comprises a hardware storage rack, such that the second data store comprises data storage media that was added to a hardware storage rack that is different from the hardware storage rack of the first data store.
  • In another embodiment, a computer system is provided which includes at least one processor and which performs a computer-implemented method for modifying resiliency for at least a portion of a data store. The method includes determining that a resiliency scheme 114 A for at least a specified portion of a data store 112 is to be changed from a first resiliency scheme 114 A to a second, different resiliency scheme 114 B, the data store including one or more portions of data 113 , determining how the data 113 within the specified portion of the data store 112 is to be altered according to the change in resiliency scheme, and modifying the resiliency scheme 114 A of the specified portion of the data store 112 , such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.
  • Some embodiments further include adding a storage device to the data store, wherein the specified portion of the data store is implementing an N-way mirror resiliency scheme and implementing an N+1-way mirroring scheme for the data store, wherein the data store data is split between two storage devices.
  • Other embodiments further include removing a storage device from the data store and rebalancing the data that was stored on the removed data storage device among the remaining storage devices, without rebalancing existing data on the remaining storage devices.
  • a computer system comprising the following: one or more processors, a receiver 104 for receiving a request 105 to move one or more portions of data off of a first data store 112 and on to a second data store 115 , a determining module 106 for identifying which data 113 is to be moved from the first data store to the second data store, an allocation creating module 107 for creating a new allocation 116 on the second data store 115 , the new allocation being configured to receive at least a portion of data 113 from the first data store 112 and a data moving module 108 for moving the data 113 to the new allocation 116 on the second data store 115 as data I/O requests 111 are received at the first data store, such that data writes are sent to both the first and second data stores, and data reads are sent to the first data store 112 until the data 113 of the first data store is copied to the new allocation 116 on the second data store 115 .
  • Some embodiments further include removing at least one storage device from the data store, accessing the mapping table to determine which allocations were stored on the removed storage devices and rebalancing the data of the allocations stored on the removed drive to one or more other storage devices of the data store.
  • the second data store comprises a plurality of block storage devices, at least two of which are of different capacity.
  • Some embodiments further include adding at least one hard disk to the plurality of block storage devices in the second data store and rebalancing at least a portion of data stored on the first data store among the newly added hard drive and at least one of the existing plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store.
  • Some embodiments further include removing at least one hard disk from the plurality of hard disks in the first data store and rebalancing at least a portion of data stored on the first data store among the remaining hard disks of the plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store.
  • data I/O collisions are prevented during transition of the data to the new allocation by allowing a user's data writes to take priority over the computing system's data writes.
  • the previously used allocation includes a pointer to the newly created allocation on the second data store, such that if data is deleted during transition of the data from the first data store to the second data store, the newly created allocation is notified of the deletion.

Abstract

Embodiments are directed to modifying storage capacity within a data store and to modifying resiliency for a data store. In one scenario, a computer system receives a request to move data. The computer system may determine that data is to be moved from an allocation on one data store to a new allocation on another data store. The computer system may create a new allocation on the other data store, where the new allocation is configured to receive data from the first data store. The computer system then moves the data to the new allocation on the second data store as data I/O requests are received at the first data store. Data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.

Description

    BACKGROUND
  • Computing systems have become ubiquitous, ranging from small embedded devices to phones and tablets to PCs and backend servers. Each of these computing systems includes some type of data storage and typically, many different types of data storage. For example, a computing system may include solid-state storage and a hard drive or set of hard drives. The solid-state storage may be able to handle read and write I/O requests more quickly than the hard drive, but may not have the storage capacity of the hard drive. Other media such as tape drives, DVDs (or other optical media) or other kinds of media may have different advantages and disadvantages when reading, writing and storing data.
  • BRIEF SUMMARY
  • Embodiments described herein are directed to modifying storage capacity within a data store and to modifying resiliency for at least a portion of a data store. In one embodiment, a computer system receives a request to move data. The request to move data may specify a data store to move the data off of, a data store to move the data to, or may allow the computer system to select where the data is moved from and/or moved to. The computer system may determine that data is to be moved from an allocation on one data store to a new allocation on another data store. The computer system may create a new allocation on the other data store, where the new allocation is configured to receive data from the first data store. The computer system then moves the data to the new allocation on the second data store as data I/O requests are received at the first data store. Data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.
  • In another embodiment, a computer system modifies resiliency for a data store. The computer system determines that a resiliency scheme for at least part of a data store is to be changed from one resiliency scheme to another resiliency scheme, where the data store is configured to store different portions of data. The computer system determines how the specified portion of data within the data store is to be altered according to the change in resiliency scheme, and modifies the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.
  • This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
  • Additional features and advantages will be set forth in the description which follows, and in part will be apparent to one of ordinary skill in the art from the description, or may be learned by the practice of the teachings herein. Features and advantages of embodiments described herein may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the embodiments described herein will become more fully apparent from the following description and appended claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • To further clarify the above and other features of the embodiments described herein, a more particular description will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only examples of the embodiments described herein and are therefore not to be considered limiting of its scope. The embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
  • FIG. 1 illustrates a computer architecture in which embodiments described herein may operate including modifying storage capacity within a data store.
  • FIG. 2 illustrates a flowchart of an example method for modifying storage capacity within a data store.
  • FIG. 3 illustrates a flowchart of an example method for modifying resiliency for at least a portion of a data store.
  • FIG. 4 illustrates an embodiment in which a resiliency scheme is modified for at least a portion of data.
  • FIG. 5 illustrates an embodiment in which storage capacity is added and data is rebalanced among remaining data storage.
  • FIG. 6 illustrates an embodiment in which storage capacity is removed and data is rebalanced among remaining data storage.
  • DETAILED DESCRIPTION
  • Embodiments described herein are directed to modifying storage capacity within a data store and to modifying resiliency for at least a portion of a data store. In one embodiment, a computer system receives a request to move data. The request to move data may specify a data store to move the data off of, a data store to move the data to, or may allow the computer system to select where the data is moved from and/or moved to. The computer system may determine that data is to be moved from an allocation on one data store to a new allocation on another data store. The computer system may create a new allocation on the other data store, where the new allocation is configured to receive data from the first data store. The computer system then moves the data to the new allocation on the second data store as data I/O requests are received at the first data store. Data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.
  • In another embodiment, a computer system modifies resiliency for a data store. The computer system determines that a resiliency scheme for at least part of a data store is to be changed from one resiliency scheme to another resiliency scheme, where the data store is configured to store different portions of data. The computer system determines how the specified portion of data within the data store is to be altered according to the change in resiliency scheme, and modifies the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.
  • The following discussion now refers to a number of methods and method acts that may be performed. It should be noted that, although the method acts may be discussed in a certain order or illustrated in a flow chart as occurring in a particular order, no particular ordering is necessarily required unless specifically stated, or required because an act is dependent on another act being completed prior to the act being performed.
  • Embodiments described herein may implement various types of computing systems. These computing systems are now increasingly taking a wide variety of forms. Computing systems may, for example, be handheld devices such as smartphones or feature phones, appliances, laptop computers, wearable devices, desktop computers, mainframes, distributed computing systems, or even devices that have not conventionally been considered a computing system. In this description and in the claims, the term “computing system” is defined broadly as including any device or system (or combination thereof) that includes at least one physical and tangible processor, and a physical and tangible memory capable of having thereon computer-executable instructions that may be executed by the processor. A computing system may be distributed over a network environment and may include multiple constituent computing systems.
  • As illustrated in FIG. 1, a computing system 101 typically includes at least one processing unit 102 and memory 103. The memory 103 may be physical system memory, which may be volatile, non-volatile, or some combination of the two. The term “memory” may also be used herein to refer to non-volatile mass storage such as physical storage media. If the computing system is distributed, the processing, memory and/or storage capability may be distributed as well.
  • As used herein, the term “executable module” or “executable component” can refer to software objects, routines, or methods that may be executed on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads).
  • In the description that follows, embodiments are described with reference to acts that are performed by one or more computing systems. If such acts are implemented in software, one or more processors of the associated computing system that performs the act direct the operation of the computing system in response to having executed computer-executable instructions. For example, such computer-executable instructions may be embodied on one or more computer-readable media that form a computer program product. An example of such an operation involves the manipulation of data. The computer-executable instructions (and the manipulated data) may be stored in the memory 103 of the computing system 101. Computing system 101 may also contain communication channels that allow the computing system 101 to communicate with other message processors over a wired or wireless network.
  • Embodiments described herein may comprise or utilize a special-purpose or general-purpose computer system that includes computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. The system memory may be included within the overall memory 103. The system memory may also be referred to as “main memory”, and includes memory locations that are addressable by the at least one processing unit 102 over a memory bus in which case the address location is asserted on the memory bus itself. System memory has been traditionally volatile, but the principles described herein also apply in circumstances in which the system memory is partially, or even fully, non-volatile.
  • Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media that store computer-executable instructions and/or data structures are computer storage media. Computer-readable media that carry computer-executable instructions and/or data structures are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage media and transmission media.
  • Computer storage media are physical hardware storage media that store computer-executable instructions and/or data structures. Physical hardware storage media include computer hardware, such as RAM, ROM, EEPROM, solid state drives (“SSDs”), flash memory, phase-change memory (“PCM”), optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage device(s) which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention.
  • Transmission media can include a network and/or data links which can be used to carry program code in the form of computer-executable instructions or data structures, and which can be accessed by a general-purpose or special-purpose computer system. A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer system, the computer system may view the connection as transmission media. Combinations of the above should also be included within the scope of computer-readable media.
  • Further, upon reaching various computer system components, program code in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media at a computer system. Thus, it should be understood that computer storage media can be included in computer system components that also (or even primarily) utilize transmission media.
  • Computer-executable instructions comprise, for example, instructions and data which, when executed at one or more processors, cause a general-purpose computer system, special-purpose computer system, or special-purpose processing device to perform a certain function or group of functions. Computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
  • Those skilled in the art will appreciate that the principles described herein may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. As such, in a distributed system environment, a computer system may include a plurality of constituent computer systems. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
  • Those skilled in the art will also appreciate that the invention may be practiced in a cloud computing environment. Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations. In this description and the following claims, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of “cloud computing” is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
  • Still further, system architectures described herein can include a plurality of independent components that each contribute to the functionality of the system as a whole. This modularity allows for increased flexibility when approaching issues of platform scalability and, to this end, provides a variety of advantages. System complexity and growth can be managed more easily through the use of smaller-scale parts with limited functional scope. Platform fault tolerance is enhanced through the use of these loosely coupled modules. Individual components can be grown incrementally as business needs dictate. Modular development also translates to decreased time to market for new functionality. New functionality can be added or subtracted without impacting the core system.
  • FIG. 1 illustrates a computer architecture 100 in which at least one embodiment may be employed. Computer architecture 100 includes computer system 101. Computer system 101 may be any type of local or distributed computer system, including a cloud computing system. The computer system 101 includes modules for performing a variety of different functions. For instance, the communications module 104 may be configured to communicate with other computing systems. The communications module 104 may include any wired or wireless communication means that can receive and/or transmit data to or from other computing systems. The communications module 104 may be configured to interact with databases, mobile computing devices (such as mobile phones or tablets), embedded or other types of computing systems.
  • The communications module 104 of computer system 101 may be further configured to receive requests to move data 105. Such requests may be received from applications, from users or from other computer systems. The request to move data 105 may be generated internally to computer system 101, or may be received from a source external to computer system 101. The determining module 106 may determine, based on the received request to move data 105, that data 113 is to be moved from a first data store 112 to a second data store 115. The data stores 112 and 115 may be local to or remote to computer system 101. The data stores may be single storage devices, arrays of storage devices or storage networks such as SANs or the cloud. The data stores may store the data 113 according to resiliency schemes. These resiliency schemes may include data mirroring or parity schemes such as data striping, or any other type of resiliency scheme including the various redundant array of inexpensive disks (RAID) schemes.
  • In response to the determination that data 113 is to be moved from the first data store 112 to the second data store 115, the allocation creating module 107 of computer system 101 creates a new allocation 116 on the second data store 115. The data moving module 108 may then move the data 113 to the newly created allocation 116 on the second data store 115. In some embodiments, the data stores 112 and 115 may be online data stores that are exposed to the internet. In such cases, data is moved between online databases or other data stores. During this process, any data store access requests (such as a request to move data 105) may be synchronized with the data movement by directing the data store access requests to the first data store 112, to the second data store 115 or to both data stores depending on the type of access request. This process will be described in greater detail below.
  • As the term is used herein, “online data movement” represents the process of moving allocations containing data from one data store (e.g. a set of hard drives or tape drives) to another. This migration of data takes place without disrupting the functionality or availability of the data store, and without reducing the number of failures that can be tolerated. Additionally, as part of this process, a new set of drives may be selected to transition the data storage space to a different fault domain (e.g. upgrading from being able to tolerate a single enclosure failure, to being able to tolerate a whole rack failure). As used herein, the term “fault domain” may refer to an enclosure (e.g. just a bunch of disks or JBOD), a computer (node), a collection of nodes grouped by a common physical element (e.g. all the blade servers in an enclosure, all the nodes in a rack, or all the nodes behind a specific network switch), or a collection of nodes grouped by a logical element (e.g. an upgrade domain which includes nodes that will be brought down together for servicing). The new set of drives may also increase the storage efficiency of the storage space (i.e. better utilize the drive's capacity), or improve the performance of the storage space (e.g. spread ‘hot’ data (i.e. data that is accessed frequently) across more drives).
  • Large scale deployments frequently add and remove hardware as requirements grow and old hardware goes out of warranty. Moreover, workloads may grow and change over time, requiring storage that can adapt to these changes by allowing data to migrate away from drives that have reached their end of life, migrate onto new hardware, and shift around to better utilize the available bandwidth and capacity based on the workload. This is done in real time without compromising the integrity or resiliency of data.
  • In traditional scenarios, data can be shifted as drives are added or removed; however, the data is typically required to be spread across all drives in the system equally. For example, many RAID cards support increasing the number of drives in an array by increasing the columns of the RAID volume. Also, previous solutions would compromise the integrity of the data in order to perform movement (e.g. treating a disk whose data is to be removed as failed).
  • Embodiments described herein allow data to be moved between data stores (online or otherwise) based on various criteria including user-defined criteria. Embodiments further provide the ability to selectively move data based on external input or other criteria (such as information about the heat of data), or internal heuristics (such as moving data away from the ends of hard drives to achieve short stroking and thus faster data access times). Embodiments may further include increasing the number of copies in a mirror and converting a parity (RAID5/6) to parity with mirroring (RAID5/6+1) dynamically and sparsely (only on the sections that need to be moved), removing a disk from a RAID array by mirroring its contents across the remaining disks to avoid compromising integrity, moving data across fault domains to increase the resiliency of a RAID array to more than its initial creation (e.g. migrating an array that can lose an enclosure to one that can lose a rack), and converting a mirror space to a parity space in place (or vice-versa) without rewriting the data.
  • In some embodiments, data migration is performed by temporarily converting simple and mirror spaces to mirrors with more copies. For this approach to work on parity, the concept of a RAID5+1 will be described. As the term is used herein, RAID5+1 will include a standard parity layer, which has read, write, and reconstruct capabilities. Reads and writes to the underlying disks will be redirected through a mirror layer which has its own read, write, and reconstruct capabilities. To avoid unnecessary complexity in the parity layer, the mirroring layer will provide an aggregated view of all the copies holding each individual column.
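  • By way of illustration only, the layering just described may be sketched as follows. This is a simplified model under assumptions of the editor, not the claimed implementation; the Copy, MirrorLayer and ParityLayer names are hypothetical, and per-stripe rotation of the parity column is omitted for brevity. The point of the sketch is that the parity layer addresses each column as a single device, while every read or write to a column is redirected through a mirror layer that aggregates all copies of that column.

    # Simplified sketch of a RAID5+1-style layering (illustrative names only).

    class Copy:
        """One physical copy of a column, backed by an in-memory buffer."""
        def __init__(self, size):
            self.buf = bytearray(size)
            self.healthy = True
        def read(self, offset, length):
            return bytes(self.buf[offset:offset + length])
        def write(self, offset, data):
            self.buf[offset:offset + len(data)] = data

    class MirrorLayer:
        """Aggregated view of all copies holding one parity column."""
        def __init__(self, copies):
            self.copies = copies
        def read(self, offset, length):
            # Any healthy copy can satisfy the read.
            for c in self.copies:
                if c.healthy:
                    return c.read(offset, length)
            raise IOError("no healthy copy of this column")
        def write(self, offset, data):
            # Writes fan out to every copy so all stay in sync.
            for c in self.copies:
                c.write(offset, data)

    class ParityLayer:
        """Parity (RAID5-style) layer; the last column holds XOR parity."""
        def __init__(self, columns, block_size):
            self.columns = columns          # one MirrorLayer per column
            self.block_size = block_size
        def write_stripe(self, stripe_no, blocks):
            parity = bytearray(self.block_size)
            for column, block in zip(self.columns[:-1], blocks):
                column.write(stripe_no * self.block_size, block)
                parity = bytearray(a ^ b for a, b in zip(parity, block))
            self.columns[-1].write(stripe_no * self.block_size, bytes(parity))

    # Usage: two data columns and one parity column, each 2-way mirrored.
    cols = [MirrorLayer([Copy(4096), Copy(4096)]) for _ in range(3)]
    space = ParityLayer(cols, block_size=512)
    space.write_stripe(0, [b"\x01" * 512, b"\x02" * 512])
    assert cols[0].read(0, 512) == cols[0].copies[1].read(0, 512)   # both copies updated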
  • When a data migration is to be performed, a task may be used to create another allocation as the destination and temporarily increase the data store's number of copies. This allocation will begin life as stale (i.e. it needs to be reconstructed because it does not contain valid data), and will be picked up and transitioned to healthy by a reconstruction task. In this manner, data migration is performed at the granularity of an allocation within a data store (instead of performing it on every allocation in the data store). Such embodiments offer advantages including, but not limited to, the following: 1) When migrating multiple copies of the same column, only one of the copies needs to be read, and the data can be written to both of the destinations. 2) If a read fails during migration, but other copies of data 113 are available, they will be available to reconstruct from. 3) The ability to read from any copy of data to perform the movement will also increase the ability to parallelize migrations, especially when moving mirrors off of a disk.
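  • A minimal sketch of this stale-to-healthy flow is shown below, assuming hypothetical Allocation objects and string state markers; it is not the claimed implementation, only an illustration of the ordering of the steps.

    # Sketch: migrate one allocation by temporarily raising its copy count.

    STALE, HEALTHY = "stale", "healthy"

    class Allocation:
        """One allocation within a data store (illustrative)."""
        def __init__(self, store, size):
            self.store = store
            self.data = bytearray(size)
            self.state = HEALTHY

    def migrate_allocation(source, dest_store):
        # 1. A task creates the destination allocation, which begins life as stale,
        #    temporarily raising the number of copies for this allocation.
        dest = Allocation(dest_store, len(source.data))
        dest.state = STALE

        # 2. The reconstruction task copies valid data from any healthy copy
        #    (here just the source); writes arriving meanwhile would be applied
        #    to both copies, matching the I/O synchronization described later.
        dest.data[:] = source.data
        dest.state = HEALTHY

        # 3. With the destination healthy, the source allocation can be deleted
        #    without ever reducing the number of tolerable failures.
        return dest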
  • In another embodiment, data is migrated between data stores by migrating entire slabs (i.e. collections of allocations that form a resiliency level). This process allocates a whole slab, or set of slabs, at the same offset as a current group of slabs. These new allocations may be marked as a destination in an object pool configuration. By allowing sets of slabs to be migrated, the slab size can change, as well as any other resiliency properties. If the source and destination configurations have different slab sizes, then the migration will be performed at the smallest size that is evenly divisible by both slab sizes (i.e. their least common multiple).
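  • The migration granularity for differing slab sizes is simply their least common multiple, as the following small sketch (using only the Python standard library) illustrates; the function name is hypothetical.

    from math import gcd

    def migration_unit(source_slab_bytes: int, dest_slab_bytes: int) -> int:
        """Smallest size divisible by both slab sizes (their least common multiple)."""
        return source_slab_bytes // gcd(source_slab_bytes, dest_slab_bytes) * dest_slab_bytes

    # Example: migrating from 256 MiB slabs to 1 GiB slabs proceeds in 1 GiB units,
    # while 256 MiB and 384 MiB slabs would migrate in 768 MiB units.
    MiB = 2**20
    assert migration_unit(256 * MiB, 1024 * MiB) == 1024 * MiB
    assert migration_unit(256 * MiB, 384 * MiB) == 768 * MiB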
  • Following the reallocation, a mirror object may be placed above the slabs, forwarding writes to both copies while a task (e.g. a reconstruction task) copies data from the old slab(s) to the new destination slab(s). When this task completes, the old slabs will be discarded and the new slabs will come in as a separate storage tier (to represent any changes in resiliency). If the resiliency type of the destination implements a write-back cache, then a second child space may be allocated to replace the old one. This allows migration between any two resiliency configurations (resiliency type, slab size and fault tolerance can all change).
  • In another embodiment, whole slabs are migrated with data overlap. This is a variant of the embodiment described above, and would migrate at the slab level, but would not allow the size of a slab to change. To avoid excessive movement of data, only columns which are moving would be reallocated; the remaining columns would be “ghosted” or “no-oped” on the second (destination) slab. The columns would appear to be there, but writes to them would be blocked. This moves a minimal amount of data while still allowing upgrades, including resiliency changes.
  • In yet another embodiment, individual columns may be migrated with RAID level migration. This process may be implemented by two separate mechanisms which work together to provide an end-to-end solution. The first process reallocates individual columns in place. First, a task (such as a pool transaction) creates new allocations and pairs them with sources that are to be moved. Each source and destination are then combined into a mirror, with the destination being marked as ‘Needs Regeneration’ or an equivalent marking. These mirrors are then surfaced to the slab as a single allocation, and the regeneration task copies the data from the source to destination. Upon completion, a task deletes the old allocations and the mirror objects under the slab are replaced by the new allocations. The second mechanism allows conversion between mirror and parity storage spaces. First, the mirroring is separated from the striping by making a storage space with a mirror in place of each allocation. The parity columns are then tacked onto the end and marked as needing regeneration. When this regeneration completes, a second pool transaction selects one copy from each of the mirrors and surfaces a parity slab.
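  • By way of illustration only, the mirror-to-parity conversion portion of the second mechanism can be reduced to the sketch below. It assumes in-memory byte strings standing in for columns and omits the pool transactions, the on-disk format, and the “Needs Regeneration” bookkeeping; the function names are hypothetical and the sketch only shows how a parity column is regenerated from one copy of each mirrored data column before the redundant copies are released.

    # Sketch: convert a set of 2-way mirrored columns into a parity slab.

    def xor_blocks(blocks):
        out = bytearray(len(blocks[0]))
        for b in blocks:
            out = bytearray(x ^ y for x, y in zip(out, b))
        return bytes(out)

    def mirror_to_parity(mirrored_columns):
        """mirrored_columns: list of (copy_a, copy_b) byte strings per data column."""
        # 1. Mirroring is already separated from striping: each column is a mirror pair.
        # 2. Tack a parity column onto the end and regenerate it from the data.
        data_columns = [pair[0] for pair in mirrored_columns]     # read any one copy
        parity_column = xor_blocks(data_columns)
        # 3. A final transaction selects one copy from each mirror and surfaces
        #    the result as a parity slab; the redundant copies can then be freed.
        return data_columns + [parity_column]

    # Usage: two 4-byte data columns, each 2-way mirrored.
    slab = mirror_to_parity([(b"\x01\x02\x03\x04", b"\x01\x02\x03\x04"),
                             (b"\x10\x20\x30\x40", b"\x10\x20\x30\x40")])
    assert slab[-1] == b"\x11\x22\x33\x44"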
  • The conversion from mirror to parity results in an enclosure- or rack-aware parity space having the correct on-disk format. This process can also be reversed to convert back to a mirror, and a similar process can convert between storage spaces such as 2-way mirrors and 3-way mirrors. During this conversion, some data columns may need to be moved to guarantee the ability to tolerate higher fault domain failure(s) (as mirror has different allocation requirements than parity). This migration may be performed as an intermediate step (after parity has been regenerated) to avoid placing the data store in a state of reduced resiliency. This allows fine-grained control over which allocations move. Moreover, free space is only required on destination drives, and multiple slabs may be migrated in parallel. These concepts will be explained further below with regard to methods 200 and 300 of FIGS. 2 and 3, respectively.
  • In view of the systems and architectures described above, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of FIGS. 2 and 3. For purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks. However, it should be understood and appreciated that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methodologies described hereinafter.
  • FIG. 2 illustrates a flowchart of a method 200 for modifying storage capacity within a data store. The method 200 will now be described with frequent reference to the components and data of environment 100.
  • Method 200 includes receiving a request to move one or more portions of data (210). For example, communications module 104 of computer system 101 may receive a request to move data 105 from a request source. The request source may be an application, service, user or other computer system. The request may specify that data 113 is to be moved from one data store 112 to another data store 115, either or both of which may be online. The data 113 may be individual files, collections of files, blobs of data or other allocations of data such as slabs, metadata or other types of data or collections of data. The request 105 may specify the data store to move data off of (e.g. first data store 112 in FIG. 1), the data store to move data to (e.g. second data store 115 in FIG. 1), or neither (i.e. the request may simply indicate that a certain portion of data is to be moved). If no data store is specified, the computer system 101 may determine which data stores have the specified data and may further determine which data store(s) the data is to be moved to. In such cases, the request 105 may include information about the data stores to aid the system in making the decision. The request may include multiple data sources and multiple data targets.
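  • For illustration only, such a request might be modeled as a record in which the source and destination are optional; the field names below are hypothetical and do not correspond to the claims.

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class MoveRequest:
        data_ids: List[str]                       # files, blobs, slabs, etc. to be moved
        source_store: Optional[str] = None        # None: the system picks where to move from
        destination_store: Optional[str] = None   # None: the system picks where to move to
        hints: Dict[str, float] = field(default_factory=dict)   # e.g. heat data from the requester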
  • Method 200 further includes determining that data is to be moved from the first data store to the second data store (220). The determining module 106 of computer system 101 may determine, based on the request to move data 105, that data 113 is to be moved from the first data store 112 to the second data store 115. This determination may include determining which data or data stores are being most heavily utilized. As mentioned above, each data store may include a single storage device or multiple storage devices. In cases where a data store is an array of hard drives, some of the hard drives may be used more heavily than others. Drives that are constantly written to may be said to be “hot” or to include “hot data”, whereas drives that are not written to as often are “cold” or include a greater portion of “cold data.” The determining module 106 may identify which data (among data 113) can be moved, which data must move and where the data is to be moved to. In some cases, data cannot be moved and may be labeled “unmovable data.” If the data can move, the determining module 106 may determine the best location for that data.
  • These determinations may be made based on various factors, including external component input. For example, a heat engine may be implemented which tracks all reads/writes to data in a given data store. Other factors may include heuristics (e.g. moving data away from the ends of drives to shorten the travel of the hard drive's read/write head). Still other factors may include characteristics of the data store, such as favoring larger drives over smaller drives, or favoring the outside of the drive platter, as it is traveling faster and is capable of quicker reads and writes. The determining module 106 may further be configured to identify where data I/O request bottlenecks are occurring. For example, if multiple applications are trying to write data to a single hard drive or a set of hard drives within the first data store, and the high volume of data writes to those drives is causing an I/O bottleneck, the determining module may determine that existing data on those drives is to be moved to other drives to spread out the I/O requests 111, or that the incoming I/O requests are to be redirected to other drives within the data store (e.g. by the data redirecting module 109) or to a different data store (e.g. the second data store 115).
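  • One possible selection heuristic (an illustration only, not the claimed method) is to rank extents on the most heavily written drive by their tracked heat and pick the hottest movable extents as migration candidates; the heat values below are invented for the example.

    # Sketch: pick the hottest movable extents on the busiest drive as move candidates.

    def pick_move_candidates(extents, max_candidates=2):
        """extents: list of dicts with 'drive', 'heat' (writes/sec) and 'movable' keys."""
        by_drive = {}
        for e in extents:
            by_drive.setdefault(e["drive"], []).append(e)

        # Busiest drive = highest total heat across its extents.
        busiest = max(by_drive, key=lambda d: sum(e["heat"] for e in by_drive[d]))

        movable = [e for e in by_drive[busiest] if e["movable"]]
        movable.sort(key=lambda e: e["heat"], reverse=True)
        return movable[:max_candidates]

    extents = [
        {"id": "a", "drive": "hd1", "heat": 900, "movable": True},
        {"id": "b", "drive": "hd1", "heat": 450, "movable": False},   # unmovable data stays put
        {"id": "c", "drive": "hd1", "heat": 300, "movable": True},
        {"id": "d", "drive": "hd2", "heat": 100, "movable": True},
    ]
    print([e["id"] for e in pick_move_candidates(extents)])   # ['a', 'c']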
  • Method 200 further includes creating a new allocation on the second data store, the new allocation being configured to receive at least a portion of data from the first data store (230), and moving the data to the new allocation on the second data store as data I/O requests are received at the first data store, wherein data store access requests are synchronized with data movement by directing the data store access requests to the first data store, the second data store or both data stores depending on the type of access request (240). The allocation creating module 107 of computer system 101 may create new allocation 116 on the second data store 115. This new allocation 116 may be configured to receive some or all of the data 113 that is moved from the first data store 112 to the second data store 115.
  • In some embodiments, the second data store 115 may include at least one hard drive. In such cases, the newly created allocation 116 on the second data store 115 may be located substantially near the beginning of the hard drive (i.e. near the outer edge of the hard drive). In this manner, data may be moved away from the ends of hard drives on the first data store and moved to the beginning of drives on the second data store 115. This allows the data to be accessed more quickly. Other optimizations may be used for other data storage devices such as tape drives or optical drives.
  • The second data store 115 may be configured to accept new data storage devices and/or new data storage media. In some embodiments, the second data store 115 may include data storage media that was added to the second data store. This second data store may be located on a fault domain that is different from the fault domain of the first data store. For instance, if a fault domain is established for a given hardware storage rack (e.g. first data store 112), the storage media may be added to the second data store 115 which, at least in some embodiments, is in a different fault domain than the first data store. When new media is added, the existing data may be rebalanced, based on what kind of hardware was added. Indeed, in some cases, entire racks may be added to existing data stores. In such cases, the existing data may be rebalanced among the hardware storage devices of the newly added rack.
  • When the rebalancing occurs, the data is not necessarily distributed evenly among the different drives. For instance, when hard drives are added to a data store, some of those hard drives may be different capacity drives. In such cases, the full capacity of each hard disk may be assigned to and be accessible by the second data store. Accordingly, each hard drive or tape drive or other type of block storage such as solid-state drives (SSDs), non-volatile memory express (NVMe), virtual hard disks (VHDs), etc. may be used to its fullest extent, even when other drives of larger or smaller capacity are present. When data writes are received at the data store, the data writes may be sent to both the first and second data stores, and incoming data reads may be sent to the first data store until the data of the first data store is copied to the new allocation on the second data store. In this manner, consistency is maintained at the data stores, such that incoming writes can be sent to either data store, while data reads are sent to the older data until the data is fully copied over to the other (second) data store.
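  • The write-to-both, read-from-source rule can be sketched as a small router, as shown below. This is an illustration of the synchronization described above and not the claimed implementation; the class names and the in-memory DictStore are hypothetical.

    # Sketch: route I/O during migration so writes hit both stores and reads
    # are served from the source until the copy has completed.

    class MigrationRouter:
        def __init__(self, source, destination):
            self.source = source            # objects exposing read(key) / write(key, value)
            self.destination = destination
            self.copy_complete = False

        def write(self, key, value):
            # Writes go to both stores so neither falls behind during the move.
            self.source.write(key, value)
            self.destination.write(key, value)

        def read(self, key):
            # Reads stay on the source until all existing data has been copied over.
            store = self.destination if self.copy_complete else self.source
            return store.read(key)

    class DictStore:
        def __init__(self):
            self._d = {}
        def read(self, key):
            return self._d[key]
        def write(self, key, value):
            self._d[key] = value

    # Usage
    src, dst = DictStore(), DictStore()
    src.write("x", b"old")
    router = MigrationRouter(src, dst)
    router.write("y", b"new")             # lands in both stores
    assert router.read("x") == b"old"     # still served by the source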
  • In FIG. 5, a disk array 501 is shown having two hard drives: HD 502A and HD 502B. A new hard drive 502C may be added to the disk array 501 during operations. When the new disk 502C is added, the data of the disk array is rebalanced using the new disk and any existing disks. The rebalancing may be performed without compromising any existing resiliency implementations on the disk array. For instance, if data mirroring has been implemented, the data in HD 502A may be mirrored between previous disk 502B and newly added disk 502C. The data may be distributed evenly among the disks of the array, or may be distributed in another manner, such as based on the heat of the data or the overall heat of the disk. Here, it should be noted that while two or three disks are shown in FIG. 5, the disk array 501, or either of the data stores in FIG. 1 (112 and 115), may include substantially any number of disks, tape drives or other storage devices. Moreover, while a mirroring resiliency scheme is implemented in FIGS. 5 and 6, it should be noted that any RAID or other type of mirroring or parity resiliency scheme may be used.
  • FIG. 6 illustrates an embodiment in which at least one hard disk is removed from a disk array 601. The disk array 601 may include hard drives HD 602A, HD 602B, HD 602C and HD 602D. Hard drive 602C may be removed due to failure of the drive or for some other reason. The disk array 601 then includes 602A, 602B and 602D. The data that was on drive 602C is rebalanced among the remaining hard drives. As with the embodiment above where a hard drive was added to the disk array, the data may be rebalanced according to a variety of different factors, and does not need to be rebalanced evenly over the remaining hard drives. Furthermore, as with the above example, disks may be removed from the array 601 without compromising existing resiliency implementations such as mirroring. The data may be automatically and dynamically distributed among the remaining drives in a manner that does not degrade the resiliency of the disk array 601. The data may be rebalanced according to hot or cold data, such that the hot and cold data are distributed evenly among the remaining drives, or may be rebalanced to the beginning of each disk. Additionally or alternatively, data may be rebalanced according to the assigned importance of the data (i.e. the importance of the data may dictate the order in which the data is rebalanced).
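  • A simplified rebalance after a drive removal might look like the sketch below: each extent that lived on the removed drive is re-homed to the remaining drive that currently holds the least data and that does not already hold another copy of the same extent, so mirroring is never silently collapsed. This is an illustration under those assumptions, not the patented algorithm, and the drive and extent names are invented.

    # Sketch: rebalance extents from a removed drive onto the remaining drives
    # without placing two copies of the same extent on one drive.

    def rebalance_after_removal(placement, removed_drive, remaining_drives):
        """placement: dict extent_id -> set of drives holding a copy of that extent."""
        for extent, drives in placement.items():
            if removed_drive not in drives:
                continue
            drives.discard(removed_drive)
            # Candidate targets: remaining drives that do not already hold this extent.
            candidates = [d for d in remaining_drives if d not in drives]
            if not candidates:
                raise RuntimeError(f"cannot restore resiliency for extent {extent}")
            # The least-loaded candidate receives the new copy (heat or capacity could
            # also be weighed here, as discussed above).
            load = {d: sum(d in p for p in placement.values()) for d in remaining_drives}
            drives.add(min(candidates, key=lambda d: load[d]))
        return placement

    placement = {
        "e1": {"hd602a", "hd602c"},   # one copy was on the removed drive
        "e2": {"hd602b", "hd602d"},
    }
    rebalance_after_removal(placement, "hd602c", ["hd602a", "hd602b", "hd602d"])
    print(placement["e1"])            # two copies again, on two distinct drives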
  • Returning to FIG. 1, in some embodiments, data I/O collisions may be prevented during transition of the data 113 to the new allocation 116 by allowing a first user's data writes to take priority over a second user's data writes, or by allowing a user's data writes to take priority over a computing system's data writes, or vice versa. As such, when writes are coming in from multiple different users or applications, the writes may be prioritized based on user or application and processed in order of priority, such that I/O collisions are avoided. When data has been successfully moved to a new data store (or to a new allocation), any previously used allocations on the first data store may be deleted.
  • The allocations (whether existing or newly added) are implemented within the data store to logically define specified areas of storage. Each allocation identifies where the allocation is located within the data store, what data it contains and where its data is stored on different data storage devices. The allocations may be stored in a mapping table. Whenever storage devices are added to a data store (such as disk array 501/601 above) or removed from a data store, the computing system 101 may access the mapping table to determine which allocations were stored on the added/removed storage devices. Then, the data stored on the added/removed drives is rebalanced to one or more other storage devices of the data store. In some cases, previously used allocations may include a pointer to the newly created allocation on the data store to which the data is being moved (i.e. the second data store 115). In this manner, if data is deleted during transition of the data from the first data store to the second data store, the newly created allocation is notified of the deletion, and resiliency is guaranteed throughout the transition.
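  • The mapping-table bookkeeping can be sketched as follows; the AllocationRecord fields and method names are hypothetical. The forwarding pointer from an old allocation to its replacement is what allows a delete that arrives mid-move to be propagated to the new allocation.

    # Sketch: allocation mapping table with a forwarding pointer used during moves.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class AllocationRecord:
        alloc_id: str
        store: str                       # which data store holds the allocation
        devices: List[str]               # storage devices its data lives on
        moved_to: Optional[str] = None   # forwarding pointer set while data is in transit

    class MappingTable:
        def __init__(self):
            self.records = {}

        def allocations_on_device(self, device):
            # Consulted when a storage device is added to or removed from the data store.
            return [r for r in self.records.values() if device in r.devices]

        def delete(self, alloc_id):
            record = self.records.pop(alloc_id)
            # If the allocation is mid-move, propagate the delete to its destination
            # so the newly created allocation learns of the deletion as well.
            if record.moved_to is not None:
                self.records.pop(record.moved_to, None)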
  • Turning now to FIG. 3, a flowchart is illustrated of a method 300 for modifying resiliency for at least a portion of a data store. The method 300 will now be described with frequent reference to the components and data of environment 100.
  • Method 300 includes determining that a resiliency scheme for at least a specified portion of a data store is to be changed from a first resiliency scheme to a second, different resiliency scheme, the data store including one or more portions of data (310). For example, the determining module 106 of computer system 101 may determine that resiliency scheme 114A for at least some data 113 on the first data store 112 is to be changed to a second resiliency scheme 114B. As mentioned above, the resiliency schemes may include mirroring, parity or combinations thereof (including the various RAID implementations) or other resiliency schemes.
  • Method 300 next includes determining how the data within the specified portion of the data store is to be altered according to the change in resiliency scheme (320) and modifying the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed (330). The determining module 106 of computer system 101 may thus determine how the data 113 is to be altered according to the change in resiliency scheme (e.g. from mirroring to parity or from parity to mirror). The modifying module 110 of computer system 101 may then modify the resiliency scheme for a certain portion of data, while leaving other portions of data untouched.
  • Thus, for example, as shown in FIG. 4, data store 401 has multiple different data portions (402A, 402B and 402C). These data portions may each be different storage devices (e.g. hard disks) or may be logical portions of the same hard disk, or a combination of physical and logical data portions. Each data portion within the data store may have its own resiliency scheme: scheme 403A for data portion 402A, scheme 403B for data portion 402B, and scheme 403C for data portion 402C. Embodiments herein may modify a portion of a data store (e.g. 402B) and its resiliency scheme without modifying other portions of the data store or their resiliency schemes. Thus, when modifications 404 are made to the data store portion 402B, a new resiliency scheme 403D may be implemented for that data portion without affecting any other data portions.
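  • Conceptually, the per-portion bookkeeping amounts to a map from data-store portions to resiliency schemes in which only the targeted entry changes; the sketch below uses the FIG. 4 reference numerals purely as illustrative keys and is not the claimed implementation.

    # Sketch: change the resiliency scheme of one portion of a data store
    # without touching the schemes of the other portions.

    schemes = {
        "402A": "2-way mirror",
        "402B": "2-way mirror",
        "402C": "parity",
    }

    def modify_resiliency(schemes, portion, new_scheme):
        # Only the named portion is re-laid-out; every other entry is left untouched.
        schemes[portion] = new_scheme
        return schemes

    modify_resiliency(schemes, "402B", "parity")
    print(schemes)   # {'402A': '2-way mirror', '402B': 'parity', '402C': 'parity'}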
  • In some cases, a storage device may be added to a data store. At least one portion of that storage device may be implementing an N-way mirror resiliency scheme. When the new device is added, an N+1-way mirroring scheme may be implemented for the data store, such that the data store data is split between two storage devices. The split need not be even, and may be balanced according to heuristics such as relative heat level. Still further, in some cases, a storage device may be removed from a data store. The data that was stored on the removed data storage device may be rebalanced among the remaining storage devices, without rebalancing existing data on the remaining storage devices. The granularity of the data store portions that are to be converted from one resiliency scheme to another may be set to an arbitrary value (e.g. 1 GB) or may be substantially any size. In this manner, whole volumes or arrays need not be converted to change a resiliency scheme. Rather, embodiments herein may convert one section of an array or volume from mirroring to parity or vice versa, while leaving the rest of the volume or array alone. Then, if a user wants to remove one drive, the system can merely rebalance/realign the data on that drive or that portion of the data store.
  • Accordingly, methods, systems and computer program products are provided which modify storage capacity within a data store. Moreover, methods, systems and computer program products are provided which modify resiliency for at least a portion of a data store.
  • Claim Support
  • A computer system is provided including at least one processor. At the computer system, a computer-implemented method is provided for modifying storage capacity within a data store. The method includes receiving a request 105 to move one or more portions of data, determining that data 113 is to be moved from an allocation on a first data store 112 to a new allocation 116 on the second data store 115, the first and second data stores being configured to store allocations of data, creating the new allocation 116 on the second data store 115, the new allocation being configured to receive at least a portion of data 113 from the first data store 112, and moving the data 113 to the new allocation 116 on the second data store 115 as data I/O requests 111 are received at the first data store, wherein data store access requests are synchronized with the data movement by directing the data store access requests to the first data store 112, to the second data store 115 or to both data stores depending on the type of access request.
  • In some embodiments, determining that data is to be moved from the first data store to the second data store comprises determining which data or data stores are being most heavily utilized. In some embodiments, the second data store comprises at least one hard drive, and the new allocation on the second data store is located nearer to the beginning of the second data store than the allocation on the first data store. In some embodiments, the second data store comprises a data storage media that was added to the computing system, the second data store being located on a fault domain that is different from the fault domain of the first data store. In some embodiments, the fault domain comprises a hardware storage rack, such that the second data store comprises data storage media that was added to a hardware storage rack that is different from the hardware storage rack of the first data store.
  • A computer system is provided including at least one processor. At the computer system, a computer-implemented method is provided for modifying resiliency for at least a portion of a data store. The method includes determining that a resiliency scheme 114A for at least a specified portion of a data store 112 is to be changed from a first resiliency scheme 114A to a second, different resiliency scheme 114B, the data store including one or more portions of data 113, determining how the data 113 within the specified portion of the data store 112 is to be altered according to the change in resiliency scheme, and modifying the resiliency scheme 114A of the specified portion of the data store 112, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.
  • Some embodiments further include adding a storage device to the data store, wherein the specified portion of the data store is implementing an N-way mirror resiliency scheme and implementing an N+1-way mirroring scheme for the data store, wherein the data store data is split between two storage devices. Other embodiments further include removing a storage device from the data store and rebalancing the data that was stored on the removed data storage device among the remaining storage devices, without rebalancing existing data on the remaining storage devices.
  • A computer system comprising the following: one or more processors, a receiver 104 for receiving a request 105 to move one or more portions of data off of a first data store 112 and on to a second data store 115, a determining module 106 for identifying which data 113 is to be moved from the first data store to the second data store, an allocation creating module 107 for creating a new allocation 116 on the second data store 115, the new allocation being configured to receive at least a portion of data 113 from the first data store 112 and a data moving module 108 for moving the data 113 to the new allocation 116 on the second data store 115 as data I/O requests 111 are received at the first data store, such that data writes are sent to both the first and second data stores, and data reads are sent to the first data store 112 until the data 113 of the first data store is copied to the new allocation 116 on the second data store 115.
  • Some embodiments further include removing at least one storage device from the data store, accessing the mapping table to determine which allocations were stored on the removed storage devices and rebalancing the data of the allocations stored on the removed drive to one or more other storage devices of the data store. In some embodiments, the second data store comprises a plurality of block storage devices, at least two of which are of different capacity. Some embodiments further include adding at least one hard disk to the plurality of block storage devices in the second data store and rebalancing at least a portion of data stored on the first data store among the newly added hard drive and at least one of the existing plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store.
  • Some embodiments further include removing at least one hard disk from the plurality of hard disks in the first data store and rebalancing at least a portion of data stored on the first data store among the remaining hard disks of the plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store. In some embodiments, data I/O collisions are prevented during transition of the data to the new allocation by allowing a user's data writes to take priority over the computing system's data writes. In some embodiments, the previously used allocation includes a pointer to the newly created allocation on the second data store, such that if data is deleted during transition of the data from the first data store to the second data store, the newly created allocation is notified of the deletion.
  • The concepts and features described herein may be embodied in other specific forms without departing from their spirit or descriptive characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the disclosure is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (20)

We claim:
1. At a computer system including at least one processor, a computer-implemented method for modifying storage capacity within a data store, the method comprising:
receiving a request to move one or more portions of data;
determining that data is to be moved from an allocation on a first data store to a new allocation on a second data store, the first and second data stores being configured to store allocations of data;
creating the new allocation on the second data store, the new allocation being configured to receive at least a portion of data from the first data store; and
moving the data to the new allocation on the second data store as data I/O requests are received at the first data store, wherein data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.
2. The method of claim 1, wherein determining that data is to be moved from the first data store to the second data store comprises determining which data or data stores are being most heavily utilized.
3. The method of claim 1, wherein determining that data is to be moved from the first data store to the second data store further comprises determining which data among the stored data is moveable.
4. The method of claim 1, wherein the second data store comprises at least one hard drive, and wherein the new allocation on the second data store is located nearer to the beginning of the second data store than the allocation on the first data store.
5. The method of claim 1, wherein the second data store comprises a data storage media that was added to the computing system, the second data store being located on a fault domain that is different from the fault domain of the first data store.
6. The method of claim 5, wherein the fault domain comprises a hardware storage rack, such that the second data store comprises data storage media that was added to a hardware storage rack that is different from the hardware storage rack of the first data store.
7. The method of claim 1, wherein the second data store comprises a plurality of block storage devices, at least two of which are of different capacity.
8. The method of claim 7, further comprising:
adding at least one hard disk to the plurality of block storage devices in the second data store; and
rebalancing at least a portion of data stored on the first data store among the newly added hard drive and at least one of the existing plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store.
9. The method of claim 7, further comprising:
removing at least one hard disk from the plurality of hard disks in the first data store; and
rebalancing at least a portion of data stored on the first data store among the remaining hard disks of the plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store.
10. The method of claim 1, wherein data I/O collisions are prevented during transition of the data to the new allocation by allowing a user's data writes to take priority over the computing system's data writes.
11. The method of claim 1, further comprising deleting one or more previously used allocations on the first data store upon determining that the data contained in the allocation has been moved to the second data store.
12. The method of claim 11, wherein the previously used allocation includes a pointer to the newly created allocation on the second data store, such that if data is deleted during transition of the data from the first data store to the second data store, the newly created allocation is notified of the deletion.
13. At a computer system including at least one processor, a computer-implemented method for modifying resiliency for at least a portion of a data store, the method comprising:
determining that a resiliency scheme for at least a specified portion of a data store is to be changed from a first resiliency scheme to a second, different resiliency scheme, the data store including one or more portions of data;
determining how the data within the specified portion of the data store is to be altered according to the change in resiliency scheme; and
modifying the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.
14. The method of claim 13, wherein the resiliency scheme for the specified portion of the data store is changed from mirror to parity or from parity to mirror.
15. The method of claim 13, further comprising:
adding a storage device to the data store, wherein the specified portion of the data store is implementing an N-way mirror resiliency scheme; and
implementing an N+1-way mirroring scheme for the data store, wherein the data store data is split between two storage devices.
16. The method of claim 13, further comprising:
removing a storage device from the data store; and
rebalancing the data that was stored on the removed data storage device among the remaining storage devices, without rebalancing existing data on the remaining storage devices.
17. The method of claim 16, wherein allocations are implemented within the data store to logically define specified areas of storage, each allocation identifying where the allocation is located within the data store, what data it contains and where its data is stored on one or more different data storage devices.
18. A computer system comprising the following:
one or more processors;
one or more computer-readable storage media having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to perform a method for modifying storage capacity within a data store, the method comprising the following:
receiving a request to move one or more portions of data off of a first data store and on to a second data store;
identifying which data is to be moved from the first data store to the second data store;
creating a new allocation on the second data store, the new allocation being configured to receive at least a portion of data from the first data store; and
moving the data to the new allocation on the second data store as data I/O requests are received at the first data store, such that data writes are sent to both the first and second data stores, and data reads are sent to the first data store until the data of the first data store is copied to the new allocation on the second data store.
19. The computer system of claim 18, wherein allocations are implemented within the data store to logically define specified areas of storage, each allocation identifying where the allocation is located within the data store, what data it contains and where its data is stored on one or more different data storage devices, the allocations being stored in a mapping table.
20. The computer system of claim 19, further comprising:
removing at least one storage device from the data store;
accessing the mapping table to determine which allocations were stored on the removed storage devices; and
rebalancing the data of the allocations stored on the removed drive to one or more other storage devices of the data store.
US14/486,198 2014-09-15 2014-09-15 Online data movement without compromising data integrity Abandoned US20160080490A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US14/486,198 US20160080490A1 (en) 2014-09-15 2014-09-15 Online data movement without compromising data integrity
CN201580049784.XA CN106687911B (en) 2014-09-15 2015-09-14 Online data movement without compromising data integrity
EP15775033.2A EP3195103A1 (en) 2014-09-15 2015-09-14 Online data movement without compromising data integrity
PCT/US2015/049873 WO2016044111A1 (en) 2014-09-15 2015-09-14 Online data movement without compromising data integrity
US15/645,515 US10178174B2 (en) 2014-09-15 2017-07-10 Migrating data in response to changes in hardware or workloads at a data store

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/486,198 US20160080490A1 (en) 2014-09-15 2014-09-15 Online data movement without compromising data integrity

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/645,515 Continuation US10178174B2 (en) 2014-09-15 2017-07-10 Migrating data in response to changes in hardware or workloads at a data store

Publications (1)

Publication Number Publication Date
US20160080490A1 true US20160080490A1 (en) 2016-03-17

Family

ID=54251728

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/486,198 Abandoned US20160080490A1 (en) 2014-09-15 2014-09-15 Online data movement without compromising data integrity
US15/645,515 Active 2034-09-17 US10178174B2 (en) 2014-09-15 2017-07-10 Migrating data in response to changes in hardware or workloads at a data store

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/645,515 Active 2034-09-17 US10178174B2 (en) 2014-09-15 2017-07-10 Migrating data in response to changes in hardware or workloads at a data store

Country Status (4)

Country Link
US (2) US20160080490A1 (en)
EP (1) EP3195103A1 (en)
CN (1) CN106687911B (en)
WO (1) WO2016044111A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11385792B2 (en) * 2018-04-27 2022-07-12 Pure Storage, Inc. High availability controller pair transitioning
CN111381770B (en) * 2018-12-30 2021-07-06 浙江宇视科技有限公司 Data storage switching method, device, equipment and storage medium
CN110334079B (en) * 2019-06-21 2024-03-15 腾讯科技(深圳)有限公司 Data migration method and device
CN112835533B (en) * 2021-02-25 2023-02-17 上海交通大学 Cloud storage array expansion method and device based on rack level

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5392244A (en) 1993-08-19 1995-02-21 Hewlett-Packard Company Memory systems with data storage redundancy management
US5542065A (en) 1995-02-10 1996-07-30 Hewlett-Packard Company Methods for using non-contiguously reserved storage space for data migration in a redundant hierarchic data storage system
US5875456A (en) 1995-08-17 1999-02-23 Nstor Corporation Storage device array and methods for striping and unstriping data and for adding and removing disks online to/from a raid storage array
US5680640A (en) 1995-09-01 1997-10-21 Emc Corporation System for migrating data by selecting a first or second transfer means based on the status of a data element map initialized to a predetermined state
US6640291B2 (en) 2001-08-10 2003-10-28 Hitachi, Ltd. Apparatus and method for online data migration with remote copy
US20060059306A1 (en) 2004-09-14 2006-03-16 Charlie Tseng Apparatus, system, and method for integrity-assured online raid set expansion
JP2006164162A (en) * 2004-12-10 2006-06-22 Fujitsu Ltd Copy control device and method
US7343467B2 (en) 2004-12-20 2008-03-11 Emc Corporation Method to perform parallel data migration in a clustered storage environment
US7281104B1 (en) 2005-03-21 2007-10-09 Acronis Inc. System and method for online data migration
US7958303B2 (en) * 2007-04-27 2011-06-07 Gary Stephen Shuster Flexible data storage system
US8341459B2 (en) * 2007-08-01 2012-12-25 Brocade Communications Systems, Inc. Data migration without interrupting host access and with data lock for write access requests such that held write access requests do not expire
TWI346944B (en) 2007-12-31 2011-08-11 Qnap Systems Inc Method of raid level migration and system for the same
US8409031B2 (en) * 2008-01-17 2013-04-02 Nike, Inc. Golf clubs and golf club heads with adjustable center of gravity and moment of inertia characteristics
CN102761566B (en) * 2011-04-26 2015-09-23 国际商业机器公司 The method and apparatus of migration virtual machine
CN103473335B (en) * 2013-09-18 2016-08-17 浪潮(北京)电子信息产业有限公司 A kind of hot spot data detection method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050091078A1 (en) * 2000-10-24 2005-04-28 Microsoft Corporation System and method for distributed management of shared computers
US20030115439A1 (en) * 2001-12-19 2003-06-19 Hewlett Packard Company Updating references to a migrated object in a partition-based distributed file system
US20080109601A1 (en) * 2006-05-24 2008-05-08 Klemm Michael J System and method for raid management, reallocation, and restriping
US20090222631A1 (en) * 2008-02-29 2009-09-03 Hitachi, Ltd. Storage system and data migration method
US20110191537A1 (en) * 2009-10-09 2011-08-04 Hitachi, Ltd. Storage controller and virtual volume control method
US20110246922A1 (en) * 2010-03-31 2011-10-06 Microsoft Corporation Enhanced Virtualization System

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160086654A1 (en) * 2014-09-21 2016-03-24 Advanced Micro Devices, Inc. Thermal aware data placement and compute dispatch in a memory system
US9947386B2 (en) * 2014-09-21 2018-04-17 Advanced Micro Devices, Inc. Thermal aware data placement and compute dispatch in a memory system
US9948580B2 (en) * 2015-06-19 2018-04-17 Whatsapp Inc. Techniques to replicate data using uploads from messaging clients
US20160373382A1 (en) * 2015-06-19 2016-12-22 Whatsapp Inc. Techniques to replicate data using uploads from messaging clients
US20170123718A1 (en) * 2015-11-03 2017-05-04 Samsung Electronics Co., Ltd. Coordinated garbage collection of flash devices in a distributed storage system
US11544187B2 (en) 2015-11-03 2023-01-03 Samsung Electronics Co., Ltd. IO redirection methods with cost estimation
US10254998B2 (en) * 2015-11-03 2019-04-09 Samsung Electronics Co., Ltd. Coordinated garbage collection of flash devices in a distributed storage system
CN106027653A (en) * 2016-05-23 2016-10-12 华中科技大学 Multi-cloud storage system expansion method based on RAID4 (Redundant Array of Independent Disks)
WO2018001200A1 (en) * 2016-06-29 2018-01-04 中兴通讯股份有限公司 Data processing method, cluster manager, resource manager and data processing system
CN107547606A (en) * 2016-06-29 2018-01-05 中兴通讯股份有限公司 Data processing method, cluster manager dual system, explorer, data handling system
WO2018038811A1 (en) * 2016-08-25 2018-03-01 Pure Storage, Inc. Migrating data in a storage array that includes a plurality of storage devices
US11630585B1 (en) 2016-08-25 2023-04-18 Pure Storage, Inc. Processing evacuation events in a storage array that includes a plurality of storage devices
US10862952B1 (en) * 2017-03-30 2020-12-08 Amazon Technologies, Inc. Migration of operational computing hardware to a data center
US11593036B2 (en) * 2017-06-12 2023-02-28 Pure Storage, Inc. Staging data within a unified storage element
US11609718B1 (en) 2017-06-12 2023-03-21 Pure Storage, Inc. Identifying valid data after a storage system recovery
US10552090B2 (en) 2017-09-07 2020-02-04 Pure Storage, Inc. Solid state drives with multiple types of addressable memory
US11592991B2 (en) 2017-09-07 2023-02-28 Pure Storage, Inc. Converting raid data between persistent storage types
US11507278B2 (en) * 2018-10-25 2022-11-22 EMC IP Holding Company LLC Proactive copy in a storage environment
US11960777B2 (en) 2023-02-27 2024-04-16 Pure Storage, Inc. Utilizing multiple redundancy schemes within a unified storage element

Also Published As

Publication number Publication date
CN106687911B (en) 2020-04-10
EP3195103A1 (en) 2017-07-26
CN106687911A8 (en) 2017-07-14
CN106687911A (en) 2017-05-17
US10178174B2 (en) 2019-01-08
US20170310757A1 (en) 2017-10-26
WO2016044111A1 (en) 2016-03-24

Similar Documents

Publication Publication Date Title
US10178174B2 (en) Migrating data in response to changes in hardware or workloads at a data store
US11687423B2 (en) Prioritizing highly performant storage systems for servicing a synchronously replicated dataset
US9836419B2 (en) Efficient data movement within file system volumes
US20220035714A1 (en) Managing Disaster Recovery To Cloud Computing Environment
US10915813B2 (en) Search acceleration for artificial intelligence
US20220217049A1 (en) Path Management For Container Clusters That Access Persistent Storage
US10895995B2 (en) Capacity based load balancing in distributed storage systems with deduplication and compression functionalities
US10521151B1 (en) Determining effective space utilization in a storage system
US11579790B1 (en) Servicing input/output (‘I/O’) operations during data migration
US20230080046A1 (en) Online Resize of a Volume of a Distributed Storage System
US10613755B1 (en) Efficient repurposing of application data in storage environments
US20130297871A1 (en) Systems, Methods, And Computer Program Products Providing Read Access In A Storage System
US20240004570A1 (en) Storage cluster data structure expansion
US11947968B2 (en) Efficient use of zone in a storage device
US20220358019A1 (en) Initiating Recovery Actions When A Dataset Ceases To Be Synchronously Replicated Across A Set Of Storage Systems
US11175999B2 (en) Management of backup volume extents via a tiered storage mechanism
Azagury et al. GPFS-based implementation of a hyperconverged system for software defined infrastructure
US10628379B1 (en) Efficient local data protection of application data in storage environments
US11966841B2 (en) Search acceleration for artificial intelligence
Herodotou Towards a distributed multi-tier file system for cluster computing
US20240069729A1 (en) Optimizing Data Deletion in a Storage System
US20230325094A1 (en) Calculating Storage Consumption In A Storage-As-A-Service Model
US20240069781A1 (en) Optimizing Data Deletion Settings in a Storage System
Chandrashekhara et al. Cider: A Case for Block Level Variable Redundancy on a Distributed Flash Array
CN117616378A (en) Efficient writing of data in a partitioned drive storage system

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VERMA, SURENDRA;PALEOLOGU, EMANUEL;HORTSCH, ERIK GREGORY;AND OTHERS;SIGNING DATES FROM 20140912 TO 20140916;REEL/FRAME:033751/0237

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034747/0417

Effective date: 20141014

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:039025/0454

Effective date: 20141014

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE