US6651147B2 - Data placement and allocation using virtual contiguity - Google Patents
- Publication number
- US6651147B2 (application US09/850,824)
- Authority
- US
- United States
- Prior art keywords
- data
- region
- storage medium
- minimum desired
- file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
- G—PHYSICS
  - G06—COMPUTING OR CALCULATING; COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
        - G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
          - G06F3/0601—Interfaces specially adapted for storage systems
            - G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
              - G06F3/061—Improving I/O performance
                - G06F3/0611—Improving I/O performance in relation to response time
              - G06F3/0608—Saving storage space on storage systems
            - G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
              - G06F3/0638—Organizing or formatting or addressing of data
                - G06F3/064—Management of blocks
                - G06F3/0644—Management of space entities, e.g. partitions, extents, pools
            - G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
              - G06F3/0671—In-line storage system
                - G06F3/0673—Single storage device
                  - G06F3/0674—Disk device
                    - G06F3/0676—Magnetic disk device
Definitions
- The present invention relates to allocating electronic storage space for data storage.
- Data is placed on storage media, such as disk drives, in physical regions referred to as “blocks”.
- The storage system, such as a database or file system, essentially divides an object to be stored, such as a file, into block-sized portions and then places them in the block-sized regions of the disk drive. When a portion of a file is smaller than a block, the entire block is nonetheless allocated. The unused space in the block is referred to as “internal fragmentation”.
- The present invention recognizes that a disk drive can perform a single large read or write (say, 256 Kbytes long) in less time than it takes to perform two much smaller operations. This is because the read head, which must move across the disk from one operation to the next, can be physically moved only so fast; even a small movement generally consumes a large amount of time compared with the time required to execute a single long read or write. For this reason, larger block sizes are advantageous. Competing with this, however, is the fact that smaller block sizes result in less internal fragmentation, i.e., less wasted space on the disk drive. That is, as block size increases, so does internal fragmentation.
- The present invention further observes that sequentially allocating free blocks on a disk can result in later versions of file blocks being stored apart from the remainder of the file, particularly in the case of point-in-time snapshots. Moreover, the present invention recognizes that large file blocks are desirable from a performance standpoint and that, apart from block size, it is desirable to minimize I/O operations, i.e., head movement. Accordingly, the present invention recognizes a need for a system in which file blocks associated with a single file are generally stored contiguously with each other, even if they are written to disk at different times, which could otherwise result in data “sparseness”. The present invention also recognizes a need to minimize I/O operations when reading or writing file blocks that might not be exactly contiguous with each other. The solutions below address one or more of these needs.
- A general-purpose computer is programmed according to the inventive steps herein to allocate storage space in a storage system.
- The invention can also be embodied as an article of manufacture (a machine component) that is used by a digital processing apparatus and that tangibly embodies a program of instructions executable by the digital processing apparatus to execute the present logic.
- This invention is realized in a critical machine component that causes a digital processing apparatus to perform the inventive method steps herein.
- The invention can be implemented by a computer system including a data storage system and a processor associated with the data storage system.
- The processor has logic for receiving a data object for storage and randomly determining an original position on the storage medium to which to write the object. Updated blocks of the object are written in the original position, or as close to it as possible, pursuant to, e.g., a copy-on-write operation of a point-in-time snapshot, or when data is written through multiple allocations.
- Chaff (data read from interposed blocks that do not belong to the object being read) can be discarded by a file subsystem or, prior to delivery to that subsystem, by an input/output processor using a bit mask provided by the file subsystem.
- The logic can determine a minimum desired data region density on the storage medium. Data is written into a region at least until the minimum desired data region density for that region is reached.
- The minimum desired data region density can be determined dynamically during system operation.
- In another aspect, a computer program device includes a computer program storage device that is readable by a processor.
- A program is on the program storage device and includes instructions that can be executed by the processor for allocating space in a data storage system.
- The program includes computer readable code means for randomly determining, for each object, a start offset on a storage medium associated with the storage system.
- Computer readable code means write each object starting at its respective randomly determined start offset.
- A computer-implemented method for transferring data objects to and from a storage medium includes writing a first data object to a region of a data storage medium. A portion of a second object can be physically juxtaposed between two portions of the first object. The first object is read along with the interposed portions of the second object, and the read portions of the second object are subsequently discarded.
- FIG. 1 is a schematic diagram showing the system of the present invention.
- FIG. 2 is a schematic diagram illustrating logical-to-physical address mapping.
- FIG. 3 is a flow chart showing the present logic.
- FIG. 4 is a schematic diagram illustrating logical-to-physical address mapping after a copy-on-write for a point-in-time snapshot in accordance with the present invention.
- FIGS. 5-8 are flow charts showing respective file region allocation policies.
- A system 10 stores blocks of data on a data storage medium 12.
- An input/output (I/O) processor 14 is associated with the system 10 to control data flow to the storage medium 12.
- A data storage system processor 16 interacts with the I/O processor (or can be implemented by the I/O processor) to send data to and receive data from the storage medium 12.
- Either or both of the processors 14, 16 can access an allocation module 18 to undertake the logic set forth herein.
- A memory 20 can be provided in the system 10.
- The system 10 can be a file system, database system, or other system that must allocate space for variable-sized data objects.
- The processor or processors (computers) of the present invention may be personal computers made by International Business Machines Corporation (IBM) of Armonk, N.Y., or any other computers, including computers sold under trademarks such as AS400, with accompanying IBM Network Stations.
- The flow charts herein illustrate the structure of the logic embodied by the module 18 and executed by the processor 14 and/or processor 16 as computer program software.
- The flow charts illustrate the structures of logic elements, such as computer program code elements or electronic logic circuits, that function according to this invention.
- The invention is practiced in its essential embodiment by a machine component that renders the logic elements in a form that instructs a digital processing apparatus (that is, a computer) to perform a sequence of function steps corresponding to those shown.
- The flow charts may be embodied in a computer program that is executed by a processor as a series of computer-executable instructions. These instructions may reside, for example, in a program storage device of the system 10.
- The program storage device may be RAM, a magnetic or optical disk or diskette, a DASD array, magnetic tape, electronic read-only memory, or another appropriate data storage device.
- The computer-executable instructions may be lines of compiled C++-compatible code.
- Referring to FIG. 2, the mapping of logical file blocks to physical disk blocks is illustrated.
- A sequence of logical blocks 22 in an object such as a file is mapped to physical locations 24 on a disk.
- As envisioned herein, the order of the logical blocks 22 need not be maintained in physical storage.
- The physical locations 24 are more or less contiguous with each other, with an occasional location not devoted to the file (e.g., locations P1 and P9) interspersed among the locations that are associated with the file. How the present invention handles this is discussed further below.
- FIG. 3 shows the logic of the present invention. While for ease of disclosure the discussion refers to “files” and “file systems”, it is to be understood that the principles herein apply to any objects stored in random access storage systems, including database systems.
- Each file to be stored is written to a random physical location on the disk. More specifically, a start offset on the disk is determined randomly for each file to be written. A forward search from the start offset is then undertaken to find a contiguous set of physical blocks that can store the entire file. In contrast to prior systems that systematically consume contiguous blocks on the disk, this randomness promotes uniform storage density across the disk.
- FIG. 4 illustrates logical-to-physical mapping after a copy-on-write for a point-in-time snapshot.
- In FIG. 4, the logical blocks L1, L2, L3, and L4 of a file to be read have been stored in physical locations P1, P4, P2, and P6, respectively.
- Physical locations P3 and P5 are empty or are dedicated to files other than the one being read. In either case, physical locations P3 and P5 are “chaff”. Nonetheless, to optimize performance by minimizing I/O operations, the entire physical range of the file, from P1 to P6, is read. The chaff data in locations P3 and P5 is later discarded by the I/O processor 14 or the file system processor 16.
- Implementing the discarding of chaff in the I/O subsystem advantageously minimizes the data transferred between the I/O subsystem and the file subsystem.
- In that case, the file subsystem must tell the I/O subsystem which data is chaff.
- One way to accomplish this is for the file system processor 16 to construct a bit mask from the allocated region of the file (its start offset and size), with masked bits representing chaff to be discarded by the I/O processor 14.
- FIGS. 5-8 show various allocation policies that can be determined at block 24 .
- Referring to FIG. 5, multiple blocks of a file can be grouped into larger fixed-size blocks whose sizes are integral multiples of the page size of the system memory 20.
- Each large block's worth of small blocks is then packed into a disk region. This policy has the advantage of maintaining spatial locality.
- In the policy of FIG. 6, a minimum region density is determined, and then at block 38, blocks from multiple successive files being written to disk are packed into regions at least until the minimum density is reached, at which time another region is selected.
- The density can be determined once (statically and a priori) or dynamically, using disk space utilization information or other criteria. This policy essentially varies the “large block size” of the policy shown in FIG. 5.
- FIG. 7 shows yet another policy wherein all blocks of a file are placed in the same region at block 40 .
- In this policy, multiple head movements are used in a way that optimizes the cost of reading data based on disk drive parameters. The policy simply acknowledges that under some circumstances, multiple head movements might be desirable.
- FIG. 8 shows that at block 44 , the blocks of original versions of files are always initially allocated to physically contiguous disk blocks.
- Depending on the outcome of decision diamond 46, the above-described virtual contiguity (writing updated block versions contiguous to, or as close as possible to, the original file) is implemented at block 48.
- The policies above can be used singly or in combination with each other. In one preferred embodiment, the policies shown in FIGS. 6 and 8 are used together.
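The block-size trade-off described above (larger blocks mean fewer head movements but more internal fragmentation) can be checked with simple arithmetic. The Python sketch below is illustrative only: the 10,000-byte file and the candidate block sizes are hypothetical values, not figures from the patent.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    """Whole blocks that must be allocated for file_size bytes (ceiling division)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (file_size + block_size - 1) // block_size


def internal_fragmentation(file_size: int, block_size: int) -> int:
    """Bytes allocated to the file but left unused in its last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


if __name__ == "__main__":
    FILE_SIZE = 10_000  # hypothetical file size in bytes
    for block_size in (512, 4096, 65536, 262144):
        print(f"{block_size:>7}-byte blocks: "
              f"{blocks_needed(FILE_SIZE, block_size):>2} allocated, "
              f"{internal_fragmentation(FILE_SIZE, block_size):>6} bytes wasted")
```

For this hypothetical file, 512-byte blocks leave 240 bytes unused, while a single 256 Kbyte block leaves 252,144 bytes unused, even though it lets the whole file be transferred in one operation.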
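The random-placement step (a randomly chosen start offset followed by a forward search for a contiguous run of free blocks large enough for the whole file) might be sketched as follows. The free-block bitmap held in a Python list, the helper names `find_run` and `allocate_random`, and the choice to wrap around to block 0 when the search reaches the end of the disk are all assumptions for illustration; the patent text does not specify them.

```python
import random
from typing import List, Optional


def find_run(free: List[bool], start: int, n: int) -> Optional[int]:
    """Forward search from `start` for n consecutive free blocks.

    Assumption: if the search reaches the end of the disk it wraps around to
    block 0 once. Returns the first block of the run, or None if none exists.
    """
    size = len(free)
    for k in range(size):
        i = (start + k) % size
        if i + n <= size and all(free[i:i + n]):
            return i
    return None


def allocate_random(free: List[bool], n: int, rng: random.Random) -> Optional[int]:
    """Pick a random start offset, then place an n-block file at the first
    contiguous free run found by the forward search. Marks the run as used."""
    if n <= 0 or not free:
        raise ValueError("need a positive block count and a non-empty disk")
    start = rng.randrange(len(free))
    pos = find_run(free, start, n)
    if pos is not None:
        free[pos:pos + n] = [False] * n
    return pos


if __name__ == "__main__":
    disk = [True] * 64          # hypothetical 64-block disk, all free
    rng = random.Random(42)     # seeded so runs are repeatable
    for size in (8, 5, 12):
        print(size, "blocks at", allocate_random(disk, size, rng))
```

Because each file starts at an independent random offset, successive files tend to spread across the disk instead of filling it from one end, which is the uniform-density effect the text describes.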
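The chaff-handling example of FIG. 4 (logical blocks L1 to L4 stored at P1, P4, P2, and P6) can be sketched in a few functions: compute the physical range to read in one operation, build a bit mask in which a set bit marks a chaff block, discard the masked blocks on the I/O side, and restore logical order. Deriving the mask from a list of physical block numbers (rather than from a start offset and size), the bit convention, and all function names are assumptions made for this sketch.

```python
from typing import List, Tuple


def read_span(physical: List[int]) -> Tuple[int, int]:
    """First and last physical block of the file: the single range read in one I/O."""
    return min(physical), max(physical)


def chaff_mask(physical: List[int]) -> int:
    """Bit i is set when block (first + i) of the read range is chaff,
    i.e. not one of the file's own physical blocks."""
    first, last = read_span(physical)
    owned = set(physical)
    mask = 0
    for i, block in enumerate(range(first, last + 1)):
        if block not in owned:
            mask |= 1 << i
    return mask


def discard_chaff(blocks: List[bytes], mask: int) -> List[bytes]:
    """I/O-side filter: keep only the blocks whose mask bit is clear."""
    return [b for i, b in enumerate(blocks) if not (mask >> i) & 1]


def to_logical(physical: List[int], kept: List[bytes]) -> List[bytes]:
    """Reorder kept blocks (in physical order) into logical order L1..Ln."""
    by_block = dict(zip(sorted(physical), kept))
    return [by_block[p] for p in physical]


if __name__ == "__main__":
    physical = [1, 4, 2, 6]                        # L1..L4 -> P1, P4, P2, P6
    raw = [f"P{p}".encode() for p in range(1, 7)]  # one read of P1..P6
    mask = chaff_mask(physical)
    print(bin(mask))                               # 0b10100 (P3 and P5)
    print(to_logical(physical, discard_chaff(raw, mask)))
    # [b'P1', b'P4', b'P2', b'P6']
```

Running `discard_chaff` on the I/O side means only four of the six blocks read ever cross to the file subsystem, which is the transfer saving the text attributes to discarding chaff in the I/O subsystem.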
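The FIG. 5 policy, in which a file's small blocks are grouped into larger fixed-size blocks whose size is an integral multiple of the memory page size, can be sketched as a simple partitioning step. The 4 Kbyte page, the multiple of 16, and the 16 Kbyte small-block size in the example are hypothetical parameters; each returned group stands for one large block's worth of small blocks that would be packed into a single disk region.

```python
from typing import List, Sequence


def large_block_size(page_size: int, multiple: int) -> int:
    """Large-block size chosen as an integral multiple of the memory page size."""
    if page_size <= 0 or multiple <= 0:
        raise ValueError("page_size and multiple must be positive")
    return page_size * multiple


def group_into_large_blocks(small_blocks: Sequence[int], small_size: int,
                            large_size: int) -> List[List[int]]:
    """Split a file's small blocks into groups, one group per large block.
    Each group is meant to be packed into a single disk region."""
    if small_size <= 0 or large_size <= 0 or large_size % small_size:
        raise ValueError("large_size must be a positive multiple of small_size")
    per_group = large_size // small_size
    return [list(small_blocks[i:i + per_group])
            for i in range(0, len(small_blocks), per_group)]


if __name__ == "__main__":
    large = large_block_size(page_size=4096, multiple=16)  # 64 Kbytes
    print(group_into_large_blocks(list(range(10)), small_size=16384, large_size=large))
    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```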
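The FIG. 6 policy (packing blocks from successive files into a region at least until a minimum density is reached, then selecting another region) could be modeled as below. Measuring density as used blocks divided by region capacity, and selecting the next region simply by taking the next one in a list, are assumptions; `Region` and `pack` are hypothetical names. A real allocator could also recompute `min_density` dynamically from disk space utilization, as the text allows.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    capacity: int                                   # blocks the region can hold
    contents: List[Tuple[str, int]] = field(default_factory=list)

    @property
    def used(self) -> int:
        return len(self.contents)

    def density(self) -> float:
        return self.used / self.capacity


def pack(files: List[Tuple[str, int]], regions: List[Region],
         min_density: float) -> None:
    """Place (file name, block count) pairs block by block. A region keeps
    receiving blocks until its density reaches min_density (or it is full);
    then the next region in the list is selected (an assumed selection rule)."""
    idx = 0
    for name, nblocks in files:
        for block in range(nblocks):
            while (regions[idx].used >= regions[idx].capacity
                   or regions[idx].density() >= min_density):
                idx += 1
                if idx == len(regions):
                    raise RuntimeError("no region left below the target density")
            regions[idx].contents.append((name, block))


if __name__ == "__main__":
    regions = [Region(capacity=8) for _ in range(3)]
    pack([("a", 3), ("b", 4), ("c", 2)], regions, min_density=0.5)
    for r in regions:
        print(r.used, r.contents)
```

With a target of 0.5, each 8-block region receives four blocks before the allocator moves on, so file `b` straddles two regions; raising `min_density` toward 1.0 makes regions fill completely, which corresponds to varying the effective "large block size" of FIG. 5.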
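The reasoning behind reading through chaff (one long read beats several short ones) and the FIG. 7 remark that multiple head movements are sometimes preferable can both be expressed with a simple, hypothetical cost model: each read request costs one seek, plus a fixed transfer time per block read. The greedy rule below, which reads through a gap of chaff blocks only when transferring them costs no more than an extra seek, is an illustrative heuristic and not a method stated in the patent; `seek_ms` and `transfer_ms_per_block` are assumed drive parameters.

```python
from typing import Iterable, List, Tuple


def plan_reads(physical: Iterable[int], seek_ms: float,
               transfer_ms_per_block: float) -> List[Tuple[int, int]]:
    """Greedy read plan: inclusive (first, last) physical ranges.

    A gap of g chaff blocks between two of the file's blocks is read through
    when g * transfer_ms_per_block <= seek_ms; otherwise a new request (one
    more head movement) is started after the gap."""
    blocks = sorted(set(physical))
    if not blocks:
        return []
    ranges: List[Tuple[int, int]] = []
    first = prev = blocks[0]
    for block in blocks[1:]:
        gap = block - prev - 1
        if gap * transfer_ms_per_block > seek_ms:
            ranges.append((first, prev))
            first = block
        prev = block
    ranges.append((first, prev))
    return ranges


def plan_cost(ranges: List[Tuple[int, int]], seek_ms: float,
              transfer_ms_per_block: float) -> float:
    """Modeled time: one seek per request plus transfer of every block read."""
    return sum(seek_ms + (last - first + 1) * transfer_ms_per_block
               for first, last in ranges)


if __name__ == "__main__":
    print(plan_reads([1, 4, 2, 6], seek_ms=8.0, transfer_ms_per_block=0.5))
    # [(1, 6)]
    print(plan_reads([1, 2, 500, 501], seek_ms=8.0, transfer_ms_per_block=0.5))
    # [(1, 2), (500, 501)]
```

Under this additive model the total cost is one seek per request plus the transfer time of every block read, so each gap can be decided independently and the greedy choice is optimal for the model. Real drives have position-dependent seek and rotational costs that this sketch ignores.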
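The virtual-contiguity step of FIG. 8 (writing an updated block version next to, or as close as possible to, the original file during a copy-on-write for a point-in-time snapshot) might look like the sketch below. The outward nearest-free-block search, the tie-break toward the higher address, the dictionary used as a logical-to-physical block map, and the function names are assumptions for illustration.

```python
from typing import Dict, List, Optional


def nearest_free(free: List[bool], target: int) -> Optional[int]:
    """Free block closest to `target`; on a tie the higher address wins (assumption)."""
    for d in range(len(free)):
        for p in (target + d, target - d):
            if 0 <= p < len(free) and free[p]:
                return p
    return None


def copy_on_write(block_map: Dict[int, int], free: List[bool], logical: int) -> int:
    """Write a new version of `logical` as close as possible to its current
    physical block. The old block stays allocated for the snapshot."""
    new = nearest_free(free, block_map[logical])
    if new is None:
        raise RuntimeError("no free block available")
    free[new] = False
    block_map[logical] = new
    return new


if __name__ == "__main__":
    free = [True, False, False, False, False, True, True, True]
    block_map = {0: 1, 1: 2, 2: 3, 3: 4}      # original file contiguous in P1..P4
    print(copy_on_write(block_map, free, 0))  # 0, adjacent to P1
    print(copy_on_write(block_map, free, 3))  # 5, adjacent to P4
    print(block_map)                          # {0: 0, 1: 2, 2: 3, 3: 5}
```

The updated file still occupies the compact range P0 to P5, so the whole current version can be read in one pass with P1 and P4 (the snapshot's old copies) treated as chaff.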
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (21)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/850,824 US6651147B2 (en) | 2001-05-08 | 2001-05-08 | Data placement and allocation using virtual contiguity |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/850,824 US6651147B2 (en) | 2001-05-08 | 2001-05-08 | Data placement and allocation using virtual contiguity |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20020169932A1 US20020169932A1 (en) | 2002-11-14 |
| US6651147B2 true US6651147B2 (en) | 2003-11-18 |
Family ID: 25309201
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/850,824 Expired - Fee Related US6651147B2 (en) | 2001-05-08 | 2001-05-08 | Data placement and allocation using virtual contiguity |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US6651147B2 (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050210218A1 (en) * | 2004-01-22 | 2005-09-22 | Tquist, Llc, | Method and apparatus for improving update performance of non-uniform access time persistent storage media |
| US20080162780A1 (en) * | 2006-12-19 | 2008-07-03 | Nobuaki Kohinata | Information terminal apparatus |
| US8370301B1 (en) * | 2003-03-21 | 2013-02-05 | Netapp, Inc. | System and method for reallocating blocks in checkpointing bitmap-based file systems |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7747660B1 (en) * | 2003-03-24 | 2010-06-29 | Symantec Operating Corporation | Method and system of providing access to a virtual storage device |
| US20060174074A1 (en) * | 2005-02-03 | 2006-08-03 | International Business Machines Corporation | Point-in-time copy operation |
| KR100883651B1 (en) * | 2006-05-18 | 2009-02-18 | 삼성전자주식회사 | Method and device to allocate space on disk to store files |
| KR100794312B1 (en) * | 2006-12-27 | 2008-01-11 | 삼성전자주식회사 | A memory controller including an instruction automatic processing unit and a memory system including the same |
| JP2011515727A (en) * | 2008-02-12 | 2011-05-19 | ネットアップ,インコーポレイテッド | Hybrid media storage system architecture |
| US10013166B2 (en) | 2012-12-20 | 2018-07-03 | Amazon Technologies, Inc. | Virtual tape library system |
| US20140181396A1 (en) * | 2012-12-20 | 2014-06-26 | Amazon Technologies, Inc. | Virtual tape using a logical data container |
| US9354813B1 (en) * | 2012-12-28 | 2016-05-31 | Emc Corporation | Data storage system modeling |
| CN104050200B (en) * | 2013-03-15 | 2017-12-08 | 伊姆西公司 | Method and apparatus for data copy |
| EP3203386A4 (en) | 2014-12-27 | 2017-12-27 | Huawei Technologies Co. Ltd. | Data processing method, apparatus and system |
| US10204044B2 (en) * | 2016-05-18 | 2019-02-12 | Sap Se | Memory management process using data sheet |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5897662A (en) * | 1995-08-18 | 1999-04-27 | International Business Machines Corporation | Pseudo-random address generation mechanism that reduces address translation time |
| US6341341B1 (en) * | 1999-12-16 | 2002-01-22 | Adaptec, Inc. | System and method for disk control with snapshot feature including read-write snapshot half |
| US6381677B1 (en) * | 1998-08-19 | 2002-04-30 | International Business Machines Corporation | Method and system for staging data into cache |
| US6427184B1 (en) * | 1997-06-03 | 2002-07-30 | Nec Corporation | Disk drive with prefetch and writeback algorithm for sequential and nearly sequential input/output streams |
Non-Patent Citations (4)
| Title |
|---|
| Albers et al., "Average-Case Analysis of First Fit and Random Fit Bin Packing", Proc. of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, Jan. 1998. * |
| Insession Technologies, "A Review of Technical Features in AutoDBA", 2001, http://www.insession.com/autoDBA/autodba_wp.pdf. * |
| Maymounkov, "Divergence-proving Techniques for Best Fit Bin Packing and Random Fit", Senior Thesis, Harvard College, May 7, 2001. * |
| Pechura et al., "Estimating file access time of floppy disks", Computing Practices, Communications of the ACM, v. 26, n. 10, Oct. 1983. * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8370301B1 (en) * | 2003-03-21 | 2013-02-05 | Netapp, Inc. | System and method for reallocating blocks in checkpointing bitmap-based file systems |
| US20050210218A1 (en) * | 2004-01-22 | 2005-09-22 | Tquist, Llc, | Method and apparatus for improving update performance of non-uniform access time persistent storage media |
| US7328307B2 (en) * | 2004-01-22 | 2008-02-05 | Tquist, Llc | Method and apparatus for improving update performance of non-uniform access time persistent storage media |
| US20080162780A1 (en) * | 2006-12-19 | 2008-07-03 | Nobuaki Kohinata | Information terminal apparatus |
Also Published As
| Publication number | Publication date |
|---|---|
| US20020169932A1 (en) | 2002-11-14 |
Similar Documents
| Publication | Title |
|---|---|
| US6571326B2 (en) | Space allocation for data in a nonvolatile memory |
| US9442844B2 (en) | Apparatus, system, and method for a storage layer |
| US9983993B2 (en) | Apparatus, system, and method for conditional and atomic storage operations |
| US10635310B2 (en) | Storage device that compresses data received from a host before writing therein |
| US7877569B2 (en) | Reduction of fragmentation in nonvolatile memory using alternate address mapping |
| US7533214B2 (en) | Open architecture flash driver |
| US8966191B2 (en) | Logical interface for contextual storage |
| US7610434B2 (en) | File recording apparatus |
| US9563555B2 (en) | Systems and methods for storage allocation |
| US7010662B2 (en) | Dynamic data structures for tracking file system free space in a flash memory device |
| US7594064B2 (en) | Free sector manager for data stored in flash memory devices |
| US7093101B2 (en) | Dynamic data structures for tracking file system free space in a flash memory device |
| US6621746B1 (en) | Monitoring entropic conditions of a flash memory device as an indicator for invoking erasure operations |
| US6651147B2 (en) | Data placement and allocation using virtual contiguity |
| EP1351151A2 (en) | System and method for achieving uniform wear levels in a flash memory device |
| CN1226687C (en) | Systems and methods for persistent and robust storage management |
| US20030163630A1 (en) | Dynamic data structures for tracking data stored in a flash memory device |
| KR20150105323A (en) | Method and system for data storage |
| JP2004206733A (en) | Permanent and robust memory allocation system and its method |
| US20090094299A1 (en) | Apparatus and method for defragmenting files on a hydrid hard disk |
| US11494303B1 (en) | Data storage system with adaptive, memory-efficient cache flushing structure |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BURNS, RANDAL CHILTON;LONG, DARRELL D.E.;REES, ROBERT MICHAEL;REEL/FRAME:012064/0786;SIGNING DATES FROM 20010316 TO 20010502 |
| FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| FPAY | Fee payment | Year of fee payment: 4 |
| FPAY | Fee payment | Year of fee payment: 8 |
| AS | Assignment | Owner name: GOOGLE INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:026664/0866; Effective date: 20110503 |
| REMI | Maintenance fee reminder mailed | |
| LAPS | Lapse for failure to pay maintenance fees | |
| STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| FP | Lapsed due to failure to pay maintenance fee | Effective date: 20151118 |
| AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA; Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044144/0001; Effective date: 20170929 |
| AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA; Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE THE REMOVAL OF THE INCORRECTLY RECORDED APPLICATION NUMBERS 14/149802 AND 15/419313 PREVIOUSLY RECORDED AT REEL: 44144 FRAME: 1. ASSIGNOR(S) HEREBY CONFIRMS THE CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:068092/0502; Effective date: 20170929 |