WO2008030694A1 - Dynamic fragment mapping - Google Patents

Dynamic fragment mapping

Info

Publication number
WO2008030694A1
Authority
WO
WIPO (PCT)
Prior art keywords
fragment
map
index
actual
data
Prior art date
Application number
PCT/US2007/076257
Other languages
English (en)
French (fr)
Inventor
Robert H. Gerber
Vishal Kathuria
Original Assignee
Microsoft Corporation
Priority date
Filing date
Publication date
Application filed by Microsoft Corporation
Priority to JP2009527480A (patent JP4669067B2, ja)
Priority to EP07814240.3A (patent EP2069979B1, en)
Priority to CN2007800330122A (patent CN101512526B, zh)
Publication of WO2008030694A1 (en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/40 Data acquisition and logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99937 Sorting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99951 File or database maintenance
    • Y10S707/99952 Coherency, e.g. same view to multiple users
    • Y10S707/99953 Recoverability

Definitions

  • Database systems may store a set of tabular data having rows and columns in a variety of ways.
  • A database system may store data in volatile and nonvolatile memory, in a file located in conventional file storage local to the database system, in a file located in conventional file storage attached to one or more storage systems located on a network, or the like.
  • A database system may typically add data to or remove data from a set of data, and the set of data may therefore shrink or grow over time.
  • The set of data may grow too large and may exceed the storage capabilities of the location where it is stored.
  • A data set may be stored as a file on a hard disk drive. If the data set grows larger than the capacity of the hard disk drive, the data set may either be moved as a whole to a hard disk drive with larger capacity or may be divided into one or more pieces and each piece may be moved to one of multiple physical storage locations.
  • The database system may implement fixed functionality to locate and retrieve the data stored in each of the multiple physical storage locations.
  • A database system may include two physical storage locations. The database system may choose to store data associated with the first row of the data set in the first physical storage location. The database system may further choose to store data associated with the second row of the data set in the second physical storage location. The database system may then choose to store data associated with the third row of the data set in the first physical storage location. Such a pattern may be repeated for each row of the dataset. The database system may then reverse such fixed functionality to locate and retrieve the data. For example, the database system may retrieve the data associated with the second row of the dataset by accessing the data stored on the second physical location.
  • If the first or second physical storage location should also become filled to capacity, an additional physical storage location may be added to the database system.
  • Because the database system may utilize the fixed functionality to divide as well as retrieve data, all of the data stored on the first and second physical storage locations may be required to be reshuffled to accommodate the addition of the third physical storage location.
  • The database system may now choose to store data associated with the first row in the first physical storage location.
  • The database system may further choose to store data associated with the second row in the second physical storage location.
  • The database system may then choose to store data associated with the third row in the third physical storage location. Note that the data associated with the third row was previously stored in the first physical storage location.
  • The data associated with the third row may be required to be reshuffled to the third physical storage location.
  • If the database system does not employ fixed functionality to divide and store the dataset, the division and lookup functions may be required to change each time a new physical location is added to or removed from the database system.
  • The database system may include a non-fixed function that selects a physical location for a row in the dataset by performing a mathematical function based on the row number and the number of physical storage locations. Because both the location of each row of data and the mathematical function determining the location of each row are based on the number of physical storage locations, the mathematical function must be recalculated and the rows must be reshuffled each time a physical storage location is added to or removed from the database system.
  • A system that implements a method that allows data to be easily moved from one physical storage location to another without requiring a reshuffling of all data on all physical storage locations each time a physical storage location is either added or removed may be useful.
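  • The reshuffling cost described above can be illustrated with a minimal sketch. It assumes zero-based row numbers and a placement rule of row number modulo the number of storage locations; the function names are hypothetical and do not come from the source.

```python
def fixed_location(row_number: int, location_count: int) -> int:
    """Fixed placement (assumed rule): derive the location from the row number alone."""
    return row_number % location_count


def rows_that_move(row_count: int, old_count: int, new_count: int) -> list[int]:
    """Rows whose location changes when the number of locations changes."""
    return [
        row
        for row in range(row_count)
        if fixed_location(row, old_count) != fixed_location(row, new_count)
    ]


# Growing from two storage locations to three relocates most of the rows.
moved = rows_that_move(12, 2, 3)
print(f"{len(moved)} of 12 rows move: {moved}")
```

  • In this sketch only rows 0, 1, 6, and 7 stay in place, which is the kind of near-total reshuffle the passage describes.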
  • A database object such as a table, a rowset, an index, or a partition of a table or index may be divided into data fragments or may exist as a data fragment.
  • A rowset may be considered to be a set of one or more rows in a database table or may be entries in an index.
  • The terms row and record may be considered substantially identical. Therefore, a rowset may also be equivalent to a recordset.
  • A maximum number of possible data fragments may be chosen. As a dataset is divided into fragments, each piece of the dataset may be assigned a virtual identifier. A physical location that may contain a set of rows may be known as an actual fragment.
  • An array may be created with each element including an identifier token that may be dereferenced to discover a physical location of an actual fragment. Such an array may act as a map for the purpose of discovering the physical location of an actual fragment from the virtual identifier.
  • A mapping function that correlates the virtual identifier to a map index value may be called to discover the value of the element at the associated map index value.
  • Each piece of the dataset may then be physically stored at the location referenced by the map index value.
  • Multiple elements in the map may refer to the same physical location.
  • A new physical location may be added by changing the value at one of the multiple map index elements that point to a pre-existing physical location.
  • The pieces of data that were previously associated with the map element may then be moved from the old physical location to the new physical location.
  • An existing physical location may be deleted by changing the value at a map index element to another pre-existing location and by moving the data from the location to be deleted to said pre-existing location.
  • The set of elements comprising the actual fragment map may be expanded or contracted as necessary. For example, a circumstance may arise in which the map may no longer have enough storage for new physical locations. The number of map elements may then be increased by a chosen factor. As the mapping function may correlate a virtual identifier to map index values, a number of virtual identifier values may now correlate to a new map index value. The value of the map element at the new index may be changed to reference a new physical location. All pieces of data associated with the new map index value may be moved to the new physical location.
  • The number of map index elements may also be reduced. As the mapping function correlates virtual identifiers to map index values, a number of virtual identifiers may now correlate to a new map index value. The value of the map index element may be changed to reference an existing physical location. All pieces of data associated with the new map index value may be moved to the existing physical location.
  • The physical data associated with a virtual identifier may have grown too large for a physical location.
  • Such a condition may be known as data skew.
  • In such a data skew condition, it may not be possible to add a new physical location and assign a new physical address to the virtual identifier, as doing so may move all information from the old physical location to the new location instead of splitting the information into two parts.
  • A flag may be added to the map index entry and the data associated with the map index value may be distributed to all physical locations.
  • Such a data skew flag may indicate that a lookup function may inspect all physical locations to discover the location of the data.
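  • A minimal sketch of the data skew flag follows. The `MapEntry` type, the `skewed` field, the fragment identifiers, and the modulus mapping are all assumptions made for illustration; the source does not specify how the flag is stored.

```python
from dataclasses import dataclass


@dataclass
class MapEntry:
    fragment_id: str      # identifier of the actual fragment for this entry
    skewed: bool = False  # hypothetical data skew flag


ALL_FRAGMENTS = ["AF_0", "AF_1", "AF_2"]

fragment_map = [
    MapEntry("AF_0"),
    MapEntry("AF_1", skewed=True),
    MapEntry("AF_2"),
    MapEntry("AF_1", skewed=True),
]


def candidate_fragments(virtual_id: int) -> list[str]:
    """Fragments a lookup would have to inspect for this virtual identifier."""
    entry = fragment_map[virtual_id % len(fragment_map)]
    if entry.skewed:
        # Skewed data was distributed to all locations, so inspect every one.
        return list(ALL_FRAGMENTS)
    return [entry.fragment_id]


print(candidate_fragments(4))  # entry 0, not skewed
print(candidate_fragments(5))  # entry 1, skewed
```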
  • To locate a fragment, the virtual identifier is passed to the mapping function.
  • The mapping function may then return the map index value.
  • The element at the map index value may include an identifier referencing the physical location at which the fragment is stored.
  • The identifier may then be dereferenced to discover the physical location and the fragment may be retrieved.
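  • The lookup steps above can be sketched as follows. The modulus mapping function, the map contents, and the storage paths are illustrative assumptions, not values from the source.

```python
# Hypothetical actual fragment map: each element holds a fragment identifier.
ACTUAL_FRAGMENT_MAP = ["AF_0", "AF_1", "AF_0", "AF_1"]

# Hypothetical table used to dereference an identifier to a physical location.
PHYSICAL_LOCATIONS = {"AF_0": "/storage/a", "AF_1": "/storage/b"}


def mapping_function(virtual_id: int, map_size: int) -> int:
    """Correlate a virtual identifier to a map index value (assumed: modulus)."""
    return virtual_id % map_size


def lookup_location(virtual_id: int) -> str:
    """Resolve a virtual identifier to the physical location of its fragment."""
    index = mapping_function(virtual_id, len(ACTUAL_FRAGMENT_MAP))
    fragment_id = ACTUAL_FRAGMENT_MAP[index]  # element at the map index value
    return PHYSICAL_LOCATIONS[fragment_id]    # dereference the identifier


print(lookup_location(7))
print(lookup_location(10))
```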
  • FIG. 1 is a block diagram showing a conventional data storage and lookup system.
  • FIG. 2 is a block diagram showing a dynamic fragment system including an example method for creating a fragment map, an example method for splitting an actual fragment, an example method for merging an actual fragment, and an example method for looking up the location of a fragment.
  • FIG. 3 is a block diagram showing a fragment mapping data structure.
  • FIG. 4 is a flow diagram showing an example method for creating a fragment map.
  • FIG. 5 is a flow diagram showing an example method for splitting an actual fragment.
  • FIG. 6 is a block diagram showing state transitions in an example fragment map and example physical locations during actual fragment splitting.
  • FIG. 7 is a flow diagram showing an example method for merging an actual fragment.
  • FIG. 8 is a block diagram showing state transitions in an example fragment map and example physical locations during actual fragment merging.
  • FIG. 9 is a flow diagram showing an example method for looking up an identifier of a physical location of a fragment using a fragment identifier.
  • FIG. 10 is a flow diagram showing an example method for splitting an actual fragment and for marking a fragment as having data skew in a fragment map.
  • FIG. 11 is a block diagram showing state transitions in an example fragment map and example physical locations during actual fragment splitting including data skew.
  • FIG. 12 is a flow diagram showing an example method for looking up an identifier of a physical location of a fragment with data skew using a fragment identifier.
  • FIG. 13 shows an example computer device for implementing the described systems and methods.
  • Like reference numerals are used to designate like parts in the accompanying drawings.

DETAILED DESCRIPTION

  • FIG. 1 is a block diagram showing a conventional data lookup system 100.
  • The conventional data lookup system 100 may be any type of typical database system including a standalone database system such as Microsoft® SQL Server™, a database system incorporated into a managed or interpreted language system such as Microsoft® .Net or Sun Microsystems Java, or the like.
  • The conventional dataset 110 may be coupled to the conventional storage and lookup component 120.
  • The conventional storage and lookup component 120 may be coupled to conventional physical storage 130, conventional physical storage 140, and conventional physical storage 150.
  • Each of conventional physical storage 130, conventional physical storage 140, or conventional physical storage 150 may be a typical form of computer storage such as volatile or non-volatile physical memory, a computer hard disk drive, or the like.
  • The conventional dataset 110 may be any type of data stored in a tabular structure.
  • The conventional dataset 110 may be a conventional database table having conventional rows and conventional columns, extensible markup language (XML), or the like.
  • The conventional storage and lookup component 120 may be a typically constructed software component capable of executing on the conventional data lookup system 100.
  • The conventional storage and lookup component 120 may implement a conventional storage function. Such a conventional storage function may provide functionality to store one or more rows of the conventional dataset 110 in one of conventional physical storage 130, conventional physical storage 140, or conventional physical storage 150.
  • Such a conventional storage function may further implement fixed functionality to determine which of conventional physical storage 130, conventional physical storage 140, or conventional physical storage 150 a row may be stored in.
  • The conventional storage and lookup component 120 may further implement a conventional fixed lookup function.
  • Such a conventional fixed lookup function may provide functionality to discover the location of the one or more rows of the conventional dataset 110 stored in conventional physical storage 130, conventional physical storage 140, or conventional physical storage 150.
  • Such a conventional fixed lookup function may further make use of the fixed functionality used by the conventional storage function to discover the location of one or more rows of the conventional dataset 110.
  • The conventional storage function may utilize the count of physical storage units to determine where to store the one or more rows of the conventional dataset 110.
  • The conventional storage and lookup component 120 may initially be coupled to conventional physical storage 130 and conventional physical storage 140.
  • The conventional storage function may store odd-numbered rows in conventional physical storage 130 and even-numbered rows in conventional physical storage 140.
  • The conventional lookup function may therefore locate the data associated with an odd-numbered row in conventional physical storage 130 and data associated with an even-numbered row in conventional physical storage 140.
  • The conventional storage and lookup component 120 may then add conventional physical storage 150 in response to either of conventional physical storage 130 or conventional physical storage 140 becoming filled to capacity.
  • The conventional storage and lookup component 120 may not simply migrate rows stored at conventional physical storage 130 or conventional physical storage 140 to conventional physical storage 150.
  • The conventional lookup function may expect to find odd-numbered rows in conventional physical storage 130 and even-numbered rows in conventional physical storage 140.
  • The conventional lookup function may extend the fixed functionality such that a first row of the conventional dataset 110 may be stored in conventional physical storage 130, a second row may be stored in conventional physical storage 140, a third row may be stored in conventional physical storage 150, a fourth row may be stored in conventional physical storage 130, and so on, repeating such a pattern.
  • The addition of conventional physical storage 150 may require a complete reshuffling of the data that was previously stored on conventional physical storage 130 and conventional physical storage 140.
  • The conventional storage and lookup component 120 may be replaced with an alternative conventional storage and lookup component including a new conventional storage method and new conventional lookup method.
  • Such a new conventional storage method and new conventional lookup method may include functionality to store and retrieve data from each of conventional physical storage 130, conventional physical storage 140, or conventional physical storage 150.
  • If an additional conventional physical storage location should be introduced, an additional new conventional storage and lookup component may also be required to replace the previous conventional storage and lookup component.
  • A system that implements a method that allows data to be easily moved from one physical storage location to another without requiring a reshuffling of all data on all physical storage locations each time a physical storage location is either added or removed may be useful.
  • FIG. 2 is a block diagram showing a dynamic fragment system 200 including an example lookup component 210 including an example method for creating a fragment map 400, an example method for splitting an actual fragment 500, an example method for merging an actual fragment 700, and an example method for looking up the location of a fragment 900.
  • The dynamic fragment system 200 may be any type of typical database system including a standalone database system such as Microsoft® SQL Server™, a database system incorporated into a managed or interpreted language system such as Microsoft® .Net or Sun Microsystems Java, or the like. However, the dynamic fragment system 200 may be used in any type of computer system to provide the functionality described herein. Those skilled in the art will realize that the dynamic fragment system 200 includes functionality which may be used in a broad range of applications where a first identifier may be mapped to a second identifier. A fragment may also be known as a subset of a dataset.
  • The dataset 110 may be coupled to the storage and lookup component 210.
  • The storage and lookup component 210 may be coupled to physical storage location 130, physical storage location 140, and physical storage location 150.
  • The dataset 110 may be any type of data stored in a tabular structure.
  • The dataset 110 may be a conventional database table having conventional rows and conventional columns.
  • A conventional column of the dataset 110 may be designated to include a unique identifier or unique key.
  • The storage and lookup component 210 may be a typically constructed software component capable of executing on the dynamic fragment system 200.
  • Each of physical storage location 130, physical storage location 140, or physical storage location 150 may be a typical form of computer storage such as volatile or non-volatile physical memory, a computer hard disk drive, or the like.
  • The storage and lookup component 210 may further implement an example method for creating a fragment map 400, an example method for splitting an actual fragment 500, an example method for merging an actual fragment 700, an example method for looking up the location of a fragment 900, and the like.
  • The method for creating a fragment map 400 will be discussed more fully in the description of FIG. 4.
  • The method for splitting an actual fragment 500 will be discussed more fully in the description of FIG. 5.
  • The method for merging an actual fragment 700 will be discussed more fully in the description of FIG. 7.
  • The method for looking up the location of a fragment 900 will be discussed more fully in the description of FIG. 9.
  • FIG. 3 is a block diagram showing a fragment mapping data structure 300.
  • The fragment mapping data structure 300 may be a typical computer data structure in the form of an array, a list, a collection, or any other grouping of homogeneous elements of a specific data type.
  • The fragment mapping data structure 300 may be comprised of a map 330.
  • Each element in the map 330 may be indexed by number, as represented in FIG. 3.
  • Elements in the map 330 that are included in the virtual actual fragment map 320 may have an index value that, when passed through a function, yields an index value falling within the actual fragment map 310. In this manner, index values of elements in virtual actual fragment map 320 are "mapped" using the function to corresponding elements in the actual fragment map 310.
  • Such a function will be described more fully in the description of FIG. 4, FIG. 5, FIG. 7, and FIG. 9.
  • Because the elements in the actual fragment map 310 may include an identifier corresponding to an actual fragment 340, a second actual fragment 350, or the like, the elements in virtual actual fragment map 320 thereby "point" to one of actual fragment 340 or second actual fragment 350. Such a relationship is illustrated by the inclusion of index values 0, 2, 4, 6 through n in actual fragment 340 and index values 1, 3, 5, 7, and n + 1 in second actual fragment 350.
  • The actual fragment map 310 of the fragment mapping data structure 300 may be used to map rows in a dataset to physical storage locations.
  • The actual fragment map 310 of the fragment mapping data structure 300 may be used to map any indexed data to any corresponding fragment type.
  • The actual fragment map 310 may be used to map virtual identifiers to actual identifiers in another fragment map structure.
  • An example method for creating an example embodiment of a fragment map structure may now be discussed.
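  • The relationship between index values and the two actual fragments of FIG. 3 can be reproduced with a short sketch. It assumes a two-element actual fragment map and a modulus function; the identifiers `AF_340` and `AF_350` are hypothetical labels borrowed from the reference numerals.

```python
from collections import defaultdict

# Hypothetical two-element actual fragment map.
actual_fragment_map = ["AF_340", "AF_350"]


def group_by_fragment(index_values, fragment_map):
    """Group index values by the actual fragment they map to (assumed: modulus)."""
    groups = defaultdict(list)
    for index_value in index_values:
        fragment_id = fragment_map[index_value % len(fragment_map)]
        groups[fragment_id].append(index_value)
    return dict(groups)


groups = group_by_fragment(range(8), actual_fragment_map)
print(groups)
```

  • Under these assumptions the even index values fall in the first actual fragment and the odd index values in the second, matching the grouping shown in the figure.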
  • FIG. 4 is a flow diagram showing an example method for creating a fragment map 400.
  • Block 410 may refer to an operation in which an initial number of fragments may be determined. Any method may be used to determine an initial number of fragments. In one embodiment, the initial number of fragments may be determined to be a power of 2. In an alternative example, the initial number of fragments may be determined to be a factor of the size of the actual fragment map. In another alternative example, the initial number of fragments may be related to the size of a dataset represented by the initial number of fragments.
  • Block 420 may refer to an operation in which the size of an actual fragment map may be determined. An actual fragment map may include an identifier, token, or the like representing a physical storage location of an actual fragment, physical fragment, virtual fragment, or the like. Such a determination may be performed using any method.
  • Block 430 may refer to an operation in which an array having index values and elements may be created. An array may be created that is the size of the actual fragment map determined at block 420. In an alternative embodiment, an array may be created that may be equivalent to the size of the initial fragment map created at block 410 and the size of the actual fragment map created at block 420 combined. In an alternative embodiment, an array either greater or lesser in size than the combination of the size of the initial fragment map created at block 410 and the size of the actual fragment map created at block 420 may be created and dynamically adjusted to another size.
  • Block 440 may refer to an operation in which the elements of the actual fragment map as determined in block 420 may be populated with values. Values may be chosen using any method and such values may be identifiers, tokens, or the like. In one embodiment, these values are references to physical locations of actual fragments. Such values may be inserted into the actual fragment map using any method.
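  • One possible reading of blocks 410 through 440 is sketched below. The function name, the `AF_n` identifier scheme, and the round-robin population rule are assumptions; the text allows any method for each step.

```python
def create_fragment_map(initial_fragment_count: int, map_size: int) -> list[str]:
    """Create and populate an actual fragment map.

    initial_fragment_count: result of block 410 (for example, a power of 2).
    map_size: result of block 420.
    """
    if initial_fragment_count < 1 or map_size < initial_fragment_count:
        raise ValueError("map_size must be at least initial_fragment_count >= 1")
    # Hypothetical identifiers for the initial actual fragments.
    fragment_ids = [f"AF_{n}" for n in range(initial_fragment_count)]
    # Blocks 430 and 440: create the array and populate each element.
    # Assumed rule: cycle through the fragment identifiers in order.
    return [fragment_ids[i % initial_fragment_count] for i in range(map_size)]


fragment_map = create_fragment_map(2, 4)
print(fragment_map)
```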
  • FIG. 5 is a flow diagram showing an example method for splitting an actual fragment 500.
  • A fragment map may be an array such as that discussed in FIG. 3.
  • Block 510 may refer to an operation in which a fragment that has grown too large is identified. Such identification may take the form of querying the fragment and making a further determination that the fragment may be full to capacity or larger than the desired size and may not store any additional information.
  • LARGE_AF may refer to the fragment which has grown too large.
  • Block 520 may refer to a decision in which it is determined if LARGE_AF is referenced more than once in the actual fragment map. In response to a negative determination, flow continues to block 530. In response to a positive determination, flow continues to block 540.
  • Block 530 may refer to an operation in which the size of the actual fragment map is doubled. However, as will be appreciated by those skilled in the art, the actual fragment map may be increased in size by any appropriate factor. Subsequent to doubling the size of the actual fragment map, the values of the elements in the original actual fragment map are copied to the newly created elements of the actual fragment map.
  • Block 540 may refer to an operation in which a new actual fragment is created. Such an actual fragment may be "empty", may not contain any information, or the like. Such a new actual fragment may be referred to as NEW_AF for the purposes of discussion.
  • Block 550 may refer to an operation in which a new identifier is selected or created for the actual fragment created in block 540. Such an identifier may be a numeral, a token, any value, or the like. Such a new identifier may be referred to as NEW_AF_ID for the purposes of discussion.
  • Block 560 may refer to an operation in which one of the entries in the actual fragment map that was once populated with the identifier LARGE_AF may be changed to the NEW_AF_ID selected or created at block 550.
  • Block 570 may refer to an operation in which data rows associated with IDs in the virtual actual fragment map of the array which now point to the NEW_AF_ID may be moved from the LARGE_AF fragment to the NEW_AF.
  • Such a moving of information may be accomplished using any means.
  • The information may be copied from one physical storage device to another, may be copied over a network, may be copied from one area of physical memory to another area of physical memory, or the like.
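  • The map changes of blocks 520 through 560 can be sketched as below. The function name is hypothetical, and the choice to re-point the last entry that references the large fragment is an assumption, since the text only says one of the entries is changed. Moving the rows (block 570) is outside this sketch.

```python
def split_fragment(fragment_map: list[str], large_af: str, new_af_id: str) -> list[str]:
    """Return a new fragment map in which one entry for large_af points to new_af_id."""
    result = list(fragment_map)
    if large_af not in result:
        raise ValueError(f"{large_af} is not referenced in the map")
    if result.count(large_af) <= 1:
        # Block 520 negative, block 530: double the map and copy the old entries.
        result = result + result
    # Block 560: re-point one entry (assumed: the last one that references large_af).
    last = len(result) - 1 - result[::-1].index(large_af)
    result[last] = new_af_id
    return result


first_split = split_fragment(["AF_670", "AF_680"], "AF_680", "AF_690")
second_split = split_fragment(first_split, "AF_670", "AF_700")
print(first_split)
print(second_split)
```

  • The second call does not double the map, because the fragment being split is already referenced twice.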
  • The following figure may utilize a series of diagrams to illustrate the method of FIG. 5.
  • FIG. 6 is a block diagram showing state transitions in an example fragment map and example physical locations during actual fragment splitting 600.
  • Block A 610 may refer to a state in which the actual fragment map 640 includes two elements.
  • A first element of the actual fragment map 640 may include an identifier corresponding to a first actual fragment 670.
  • A second element of the actual fragment map 640 may include an identifier corresponding to a second actual fragment 680.
  • The second actual fragment 680 may have become filled to capacity. Note that the information included in first actual fragment 670 may be mapped to the first element of actual fragment map 640 using a mapping function as discussed earlier. Further, information included in the second actual fragment 680 may also be mapped to the second element of the actual fragment map 640 using a mapping function also as discussed earlier. Such a mapping may be represented by the arrows in Block A 610.
  • Block B 620 may refer to a state in which the actual fragment map 640 has been doubled in size to create the expanded actual fragment map 650.
  • The identifiers included in the first and second elements of actual fragment map 640 may have been copied into the newly created elements of expanded actual fragment map 650.
  • The first and third elements of the expanded actual fragment map 650 may correspond to the first actual fragment 670.
  • The second and fourth elements of the expanded actual fragment map 650 may correspond to the second actual fragment 680.
  • The mapping function that maps elements of a virtual actual fragment map onto the elements of the expanded actual fragment map 650, whose first and third elements correspond to the first actual fragment 670 and whose second and fourth elements correspond to the second actual fragment 680, may be a modulus function.
  • The modulus function may return the remainder after dividing the index number of an element, whether within the expanded actual fragment map 650 or in any portion of elements beyond its index values such as a virtual actual fragment map, by the size of the expanded actual fragment map 650.
  • Any type of hashing or mapping function which produces such behavior may be used.
  • Block C 630 may refer to a state in which a third actual fragment 690 may have been added.
  • The fourth entry in the expanded actual fragment map 650 may have been modified to create the final expanded actual fragment map 660.
  • The fourth element of the final expanded actual fragment map 660 may now correspond to the third actual fragment 690. That is, because the mapping function discussed earlier may map elements in the virtual actual fragment map to either of the second or fourth elements in the final expanded actual fragment map 660, some of the elements in the virtual actual fragment map that may have corresponded to the second entry may now correspond to the fourth entry. Thereby, these elements may now correspond to the new fragment, the third actual fragment 690. Finally, any information stored on the second actual fragment 680 that now maps to the fourth element in final expanded actual fragment map 660 is moved to third actual fragment 690.
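  • The state transition from Block A to Block C can be traced with a small sketch that lists which virtual index values change owner. It assumes zero-based index values and a modulus mapping; the fragment labels reuse the reference numerals and are otherwise hypothetical.

```python
map_640 = ["AF_670", "AF_680"]                      # Block A
map_660 = ["AF_670", "AF_680", "AF_670", "AF_690"]  # Block C


def owner(index_value: int, fragment_map: list[str]) -> str:
    """Actual fragment that an index value maps to (assumed: modulus mapping)."""
    return fragment_map[index_value % len(fragment_map)]


def index_values_to_move(index_values, old_map, new_map) -> list[int]:
    """Index values whose owning fragment differs between the two maps."""
    return [v for v in index_values if owner(v, old_map) != owner(v, new_map)]


moved = index_values_to_move(range(12), map_640, map_660)
print(moved)
```

  • Only data that previously mapped to the second actual fragment and now maps to the fourth map element is relocated; everything held by the first actual fragment stays where it is.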
  • FIG. 7 is a flow diagram showing an example method for merging an actual fragment 700.
  • Block 710 may refer to an operation in which an actual fragment is identified as a candidate to be merged. Such identification may take the form of querying the actual fragment and making a further determination that the fragment may have a large amount of free or empty storage space.
  • a candidate actual fragment may be referred to as AF_M for the purposes of discussion.
  • AF_M ID may refer to the identifier associated with the identified fragment.
  • Block 720 may refer to a determination in which the actual fragment map is queried to determine if the actual fragment map includes a potential merge partner for the candidate actual fragment AF_M.
  • elements in the actual fragment map having similar factors like free storage space are queried to determine if one of the actual fragments is a potential merge partner for the candidate actual fragment AF_M.
  • any element within the actual fragment map may be a potential merge partner for the candidate actual fragment AF_M.
  • In response to a negative determination, flow continues to block 770.
  • In response to a positive determination, the potential merge partner actual fragment becomes the merge partner actual fragment and flow continues to block 730.
  • Block 730 may refer to an operation in which any information stored on the actual fragment AF_M is moved to the merge partner actual fragment that may have been identified at block 720.
  • Block 740 may refer to an operation in which the map entry in the actual fragment map that formerly included a reference to the candidate actual fragment AF_M is modified to include a reference to the identifier of the merge partner actual fragment that may have been identified at block 720.
  • Block 750 may refer to a determination as to whether the actual fragment map may be collapsed, or reduced in size.
  • the actual fragment map is inspected and if the elements of the actual fragment map include identifiers forming a regular pattern, the actual fragment map may be collapsed by a factor related to the number of times the pattern may repeat.
  • the actual fragment map may include four elements, with the first and third elements including a first identifier and the second and fourth elements including a second identifier.
  • the actual fragment map may be collapsed because the identifiers follow a repeated pattern of first identifier, second identifier.
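  • In response to a negative determination, flow continues to block 770. In response to a positive determination, flow continues to block 760.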
  • Block 770 may refer to an operation in which the method may be terminated.
  • Block 760 may refer to an operation in which the actual fragment map is collapsed. Such collapsing may be performed using any method. In one embodiment, the actual fragment map may be collapsed by reducing the number of elements included in the actual fragment map. The following figure may utilize a series of diagrams to illustrate the method of FIG. 7.
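The merge and collapse operations of blocks 730 through 760 can be sketched as follows. This is an illustrative sketch only: the selection of the merge partner (block 720) is not modeled and is passed in as a parameter, and the names `find_period`, `merge_fragment`, `storage`, and the fragment identifiers are assumptions introduced here.

```python
def find_period(fragment_map):
    """Return the length of the shortest pattern that tiles the whole map."""
    n = len(fragment_map)
    for period in range(1, n + 1):
        if n % period == 0 and fragment_map == fragment_map[:period] * (n // period):
            return period
    return n


def merge_fragment(fragment_map, storage, candidate_id, partner_id):
    """Move data, repoint map entries, and collapse the map when possible."""
    # Block 730: move information from the candidate to the merge partner.
    storage[partner_id].extend(storage.pop(candidate_id))
    # Block 740: entries that referenced the candidate now reference the partner.
    updated = [partner_id if fid == candidate_id else fid for fid in fragment_map]
    # Blocks 750 and 760: collapse when the identifiers form a repeated pattern.
    return updated[:find_period(updated)]


# Hypothetical example: the third fragment holds little data and is merged away.
fragment_map = ["AF1", "AF2", "AF1", "AF3"]
storage = {"AF1": ["r1"], "AF2": ["r2"], "AF3": ["r3"]}
collapsed = merge_fragment(fragment_map, storage, "AF3", "AF2")
```

Truncating to the shortest repeating pattern keeps modulus lookups consistent, since the pattern length divides the original map size.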
  • FIG. 8 is a block diagram showing state transitions in an example fragment map and example physical locations during actual fragment merging 800.
  • Block A 810 may refer to a state in which the actual fragment map 840 includes four elements.
  • a first element of the actual fragment map 840 may include an identifier corresponding to a first actual fragment 870.
  • a second element of the actual fragment map 840 may include an identifier corresponding to a second actual fragment 880.
  • a third element of the actual fragment map 840 may include an identifier corresponding to the first actual fragment 870.
  • a fourth element of the actual fragment map 840 may include an identifier corresponding to a third actual fragment 890.
  • the third actual fragment 890 may store less than a certain minimum amount of data.
  • the information included in third actual fragment 890 may be referenced by the fourth element of actual fragment map 840.
  • Such references may be represented by the arrows in Block A 810.
  • Elements in an associated virtual actual fragment map are not shown, but such elements in a virtual actual fragment map may be associated with one of the elements in actual fragment map 840 using a mapping function as described earlier.
  • a mapping function may be a modulus function utilizing the size of the actual fragment map 840 to determine a remainder.
  • Block B 820 may refer to a state in which the fourth element of the modified actual fragment map 850 has been changed to include an identifier corresponding to second actual fragment 880. In addition, the information stored on third actual fragment 890 may be moved to second actual fragment 880.
  • Block C 830 may refer to a state in which the modified actual fragment map 850 of Block B 820 may be collapsed to produce the collapsed actual fragment map 860. Note that elements in an associated virtual actual fragment map may have been previously mapped to the third and fourth elements of the actual fragment map using a mapping function. Because such a mapping function may be a modulus function utilizing the size of the collapsed actual fragment map 860, elements in an associated virtual actual fragment map may now map to either the first or second element of collapsed actual fragment map 860.
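A short check, offered as a sketch, shows why collapsing from Block B to Block C does not change which fragment a virtual index resolves to. The identifiers and the helper name `resolve` are hypothetical.

```python
modified_map = ["AF1", "AF2", "AF1", "AF2"]  # Block B: modified map (850)
collapsed_map = modified_map[:2]             # Block C: collapsed map (860)


def resolve(virtual_index, fragment_map):
    """Wrap a virtual index with a modulus and read the fragment identifier."""
    return fragment_map[virtual_index % len(fragment_map)]


# Every virtual index resolves to the same fragment before and after collapsing.
unchanged = all(
    resolve(v, modified_map) == resolve(v, collapsed_map) for v in range(16)
)
```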
  • Block 910 may refer to an operation in which a hashing function may be used to map a row in a data set to an element that may be included in a virtual actual fragment map.
  • the hashing function may take a column value of a row in the data set as a parameter and return an element index value of a virtual actual fragment map.
  • any hash function that may map a piece of information to an index in the virtual actual fragment map may be used.
  • Block 920 may refer to an operation in which a wrapping, or mapping, function may be used to discover the corresponding element in the actual fragment map to which the element index determined at block 910 may be mapped.
  • the wrapping function may be a modulus function that may return the actual fragment map index by dividing the element index determined at block 910 by the size of the actual fragment map and returning the remainder.
  • any wrapping function that maps an element index in the virtual actual fragment map to an element index in the actual fragment map may be used.
  • Block 930 may refer to an operation in which the element index in the actual fragment map determined at block 920 is located.
  • Block 940 may refer to an operation in which an identifier that may be stored at the element index in the actual fragment map that may have been located at block 930 is read. Such a reading operation may be performed using any typical array data reading operation.
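The lookup of blocks 910 through 940 can be sketched in Python. This is a minimal illustration with several assumptions: CRC-32 stands in for the unspecified hashing function, the virtual map size of 1024 is arbitrary, and the function names and fragment identifiers are hypothetical.

```python
import zlib

VIRTUAL_MAP_SIZE = 1024  # assumed size of the virtual actual fragment map


def virtual_index(column_value):
    """Block 910: hash a column value to an index in the virtual map."""
    return zlib.crc32(str(column_value).encode("utf-8")) % VIRTUAL_MAP_SIZE


def lookup_fragment_id(column_value, fragment_map):
    """Blocks 920-940: wrap the index, locate the element, read the identifier."""
    actual_index = virtual_index(column_value) % len(fragment_map)  # block 920
    return fragment_map[actual_index]                               # blocks 930, 940


# Hypothetical four-element actual fragment map.
example_map = ["AF1", "AF2", "AF1", "AF3"]
fragment_id = lookup_fragment_id("customer-42", example_map)
```

The same column value always yields the same identifier for a given map, so rows can be located without storing a per-row location.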
  • FIG. 10 is a flow diagram showing an example method for splitting an actual fragment and for marking a fragment as having data skew in a fragment map 1000.
  • Block 1010 may refer to an operation in which a fragment that has grown too large is identified. Such identification may take the form of querying the fragment and making a further determination that the fragment may be full to capacity or larger than the desired size and may not store any additional information.
  • LARGE_AF may refer to the fragment that has grown too large for the purposes of discussion.
  • Block 1020 may refer to a decision to determine whether the fragment identified at block 1010 is in a condition of data skew.
  • Data skew may refer to a condition in which a number of rows of data exceeding a predetermined threshold may map to a single fragment. Furthermore, the fragment map may be in such a condition that the fragment may no longer be split as described in the discussion of FIG. 5. In response to a negative determination, flow continues on to block 500 of FIG. 5. In response to a positive determination, flow continues on to block 1030.
  • Block 1030 may refer to an operation in which the element index to which the fragment identified at block 1010 is mapped may have a data skew flag added to designate that the fragment is in a condition of data skew.
  • a flag may be any type of data or information.
  • the data skew flag may be an additional portion of data stored with any information already stored at the actual fragment LARGE_AF identified at block 1010.
  • Block 1040 may refer to an operation in which the information or data that was previously stored on the fragment identified at block 1010, LARGE_AF, is distributed to all other fragments. Such a distribution may be performed using any method. For example, the information or data stored on LARGE_AF may be copied to local physical storage represented by all other fragments, may be copied over a network connection to other physical storage represented by all other fragments, or the like.
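Blocks 1030 and 1040 can be sketched as follows. This is an illustration under assumptions, not the described implementation: the flag is kept in a separate list of booleans, rows are distributed round-robin (the text allows any distribution method), and the names `mark_data_skew`, `skew_flags`, and `storage` are hypothetical.

```python
def mark_data_skew(fragment_map, skew_flags, storage, large_af):
    """Flag the map elements that reference LARGE_AF and spread its rows."""
    # Block 1030: add a data skew flag at each element index mapped to LARGE_AF.
    for index, fid in enumerate(fragment_map):
        if fid == large_af:
            skew_flags[index] = True
    # Block 1040: distribute the rows to all other fragments (round-robin assumed).
    others = [fid for fid in storage if fid != large_af]
    if not others:
        return  # no other fragment exists to receive the rows
    rows, storage[large_af] = storage[large_af], []
    for i, row in enumerate(rows):
        storage[others[i % len(others)]].append(row)


# Hypothetical example: the third fragment is in a condition of data skew.
fragment_map = ["AF1", "AF2", "AF1", "AF3"]
skew_flags = [False, False, False, False]
storage = {"AF1": [], "AF2": [], "AF3": ["r1", "r2", "r3"]}
mark_data_skew(fragment_map, skew_flags, storage, "AF3")
```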
  • FIG. 11 is a block diagram showing state transitions in an example fragment map and example physical locations during actual fragment splitting including data skew 1100.
  • Block A 1110 may refer to a state in which the actual fragment map
  • a first element of the actual fragment map 1130 may include an identifier corresponding to a first actual fragment 1150.
  • a second element of the actual fragment map 1130 may include an identifier corresponding to a second actual fragment 1160.
  • a third element of the actual fragment map 1130 may include an identifier corresponding to the first actual fragment 1150.
  • a fourth element of the actual fragment map 1130 may include an identifier corresponding to a third actual fragment 1170.
  • the third actual fragment 1170 may be in a condition of data skew.
  • Block B 1120 may refer to a state in which the fourth element of the actual fragment map 1140 may include a data skew flag represented by the small letter 'x' in the figure. The presence of the data skew flag may indicate that the fourth element of the actual fragment map 1140 may now map to all of first actual fragment 1150, second actual fragment 1160, and third actual fragment 1170. Furthermore, the information previously stored on the third actual fragment 1170 may now have been moved to each of first actual fragment 1150 and second actual fragment 1160.
  • FIG. 12 is a flow diagram showing an example method for looking up an identifier of a physical location of a fragment with data skew using a fragment identifier 1200.
  • Block 900 may refer to the example method for looking up an identifier of a physical location of a fragment using a fragment identifier described in FIG. 9.
  • Block 1210 may refer to a determination as to whether the fragment looked up at block 900 maps to an element of the fragment map that includes a data skew flag. Such a determination may take any form. For example, the element corresponding to the fragment may be read and the information returned may be queried to determine if the information includes a data skew flag. In response to a negative determination, flow continues to block 1220. In response to a positive determination, flow continues to block 1230.
  • Block 1220 may refer to an operation in which information is read from the fragment identified at block 900.
  • Block 1230 may refer to an operation in which information is read from each fragment in the fragment map.
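The skew-aware lookup of blocks 900 through 1230 can be sketched as a function that returns the fragments a reader would need to consult. The representation of the flag as a parallel list of booleans, the function name `fragments_to_read`, and the identifiers are assumptions made for illustration.

```python
def fragments_to_read(virtual_index, fragment_map, skew_flags):
    """Return the identifiers of the fragments that must be read for an index."""
    actual_index = virtual_index % len(fragment_map)  # block 900 (FIG. 9 lookup)
    if skew_flags[actual_index]:                      # block 1210
        # Block 1230: read from each fragment in the map, listed once each
        # in first-seen order.
        return list(dict.fromkeys(fragment_map))
    return [fragment_map[actual_index]]               # block 1220


# Hypothetical map whose fourth element carries a data skew flag.
example_map = ["AF1", "AF2", "AF1", "AF3"]
example_flags = [False, False, False, True]
skewed_read = fragments_to_read(7, example_map, example_flags)
normal_read = fragments_to_read(4, example_map, example_flags)
```

A flagged element trades a single-fragment read for a scan of every fragment, which matches the description of rows being distributed across all fragments when a fragment is in a condition of data skew.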
  • FIG. 13 shows an example computer device 1300 for implementing the described systems and methods.
  • computing device 1300 typically includes at least one central processing unit (CPU) 1305 and memory 1310.
  • memory 1310 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two.
  • computing device 1300 may also have additional features/functionality.
  • computing device 1300 may include multiple CPUs. The described methods may be executed in any manner by any processing unit in computing device 1300. For example, the described process may be executed by multiple CPUs in parallel.
  • Computing device 1300 may also include additional storage
  • Computer storage media includes volatile and nonvolatile, removable and nonremovable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Memory 1310 and storage 1315 are both examples of computer storage media.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 1300. Any such computer storage media may be part of computing device 1300.
  • Computing device 1300 may also contain communications device(s)
  • Communications device(s) 1340 is an example of communication media.
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
  • the term computer-readable media or device-readable media as used herein includes both computer storage media and communication media. The described methods may be encoded in any computer-readable media in any form, such as data, computer-executable instructions, and the like.
  • Computing device 1300 may also have input device(s) 1335 such as keyboard, mouse, pen, voice input device, touch input device, etc.
  • Output device(s) 1330 such as a display, speakers, printer, etc. may also be included. All these devices are well known in the art and need not be discussed at length.
  • a remote computer may store an example of the process described as software.
  • a local or terminal computer may access the remote computer and download a part or all of the software to run the program.
  • the local computer may download pieces of the software as needed, or distributively process by executing some software instructions at the local terminal and some at the remote computer (or computer network).
  • a dedicated circuit such as a DSP, programmable logic array, or the like.

PCT/US2007/076257 2006-09-06 2007-08-18 Dynamic fragment mapping WO2008030694A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2009527480A JP4669067B2 (ja) 2006-09-06 2007-08-18 動的フラグメントマッピング
EP07814240.3A EP2069979B1 (en) 2006-09-06 2007-08-18 Dynamic fragment mapping
CN2007800330122A CN101512526B (zh) 2006-09-06 2007-08-18 动态片段映射

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/470,586 2006-09-06
US11/470,586 US7523288B2 (en) 2006-09-06 2006-09-06 Dynamic fragment mapping

Publications (1)

Publication Number Publication Date
WO2008030694A1 true WO2008030694A1 (en) 2008-03-13

Family

ID=39153420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/076257 WO2008030694A1 (en) 2006-09-06 2007-08-18 Dynamic fragment mapping

Country Status (7)

Country Link
US (1) US7523288B2 (ko)
EP (1) EP2069979B1 (ko)
JP (1) JP4669067B2 (ko)
KR (1) KR101467589B1 (ko)
CN (1) CN101512526B (ko)
TW (1) TWI372981B (ko)
WO (1) WO2008030694A1 (ko)

Also Published As

Publication number Publication date
US20080059749A1 (en) 2008-03-06
JP2010503117A (ja) 2010-01-28
JP4669067B2 (ja) 2011-04-13
CN101512526A (zh) 2009-08-19
EP2069979A4 (en) 2016-08-03
KR20090048624A (ko) 2009-05-14
EP2069979B1 (en) 2018-10-24
TWI372981B (en) 2012-09-21
KR101467589B1 (ko) 2014-12-02
CN101512526B (zh) 2011-12-07
TW200825800A (en) 2008-06-16
US7523288B2 (en) 2009-04-21
EP2069979A1 (en) 2009-06-17

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200780033012.2

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07814240

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 1171/CHENP/2009

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 1020097004700

Country of ref document: KR

ENP Entry into the national phase

Ref document number: 2009527480

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2007814240

Country of ref document: EP