WO2023138788A1 - Method of backing up file-system onto object storage system and data management module - Google Patents

Publication number
WO2023138788A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
directory
sequential
objects
representation
Prior art date
Application number
PCT/EP2022/051475
Other languages
French (fr)
Inventor
Idan Zach
Assaf Natanzon
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2022/051475 priority Critical patent/WO2023138788A1/en
Publication of WO2023138788A1 publication Critical patent/WO2023138788A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/128 Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion

Definitions

  • the present disclosure relates generally to the field of data management systems; and, more specifically, to a computer-implemented method of backing up a file-system onto an object storage system, a data management module and an object-based storage comprising the data management module.
  • a file-system is a computer data storage architecture that manages data as a collection of files and directories.
  • the directories allow a user to group the files into separate collections.
  • the directory structures may be either flat (i.e., linear) or hierarchical (i.e., nonlinear), where directories may contain sub-directories as well.
  • object storage system that manages data as objects
  • storage architectures such as the typical file-system, which manages the data as a file hierarchy.
  • NAS network attached storage
  • a method based on a built-in mechanism of the object storage system allows a user to use prefixes to organize the data that the user stores in the object storage system.
  • a prefix value is similar to a directory name that enables the user to group similar objects together in a bucket.
  • the use of prefix limits the results to only those keys that begin with the specified prefix.
  • the delimiter causes a list operation to roll up all the keys that share a common prefix into a single summary list result.
  • the purpose of the prefix and delimiter parameters is to help the user organize and then browse the keys hierarchically.
  • a list request with the delimiter allows the user to browse the hierarchy at just one level, while skipping over and summarizing the keys (potentially millions of keys) nested at deeper levels.
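The prefix and delimiter behaviour described above can be sketched as follows. This is a minimal in-memory simulation, not a call to any real object storage API; the function name `list_keys` and the sample keys are assumptions made for illustration.

```python
def list_keys(keys, prefix="", delimiter=None):
    """Simulate a prefix/delimiter listing over a flat set of object keys.

    Returns (contents, common_prefixes): keys directly under the prefix, and
    one rolled-up summary entry per deeper "directory".
    """
    contents, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # the prefix limits results to matching keys
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything below the next delimiter is summarized as one entry.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


# Hypothetical bucket contents.
keys = [
    "photos/2021/a.jpg",
    "photos/2021/b.jpg",
    "photos/2022/c.jpg",
    "photos/index.txt",
    "docs/readme.txt",
]
contents, prefixes = list_keys(keys, prefix="photos/", delimiter="/")
```

Here `contents` holds only the key stored directly under `photos/`, while the two year folders are each rolled up into a single common prefix regardless of how many keys they contain.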
  • Another limitation of the method is maintaining a consistent hierarchy when the hierarchical representation changes. For example, if a directory moves to another directory, then all the objects under the moved directory must be renamed, because each object name contains the full path in the hierarchy. Moreover, the number of objects in a sub-tree can be huge (e.g., millions), which makes this method impractical.
  • the object storage providers charge a minimum cost for each object, regardless of the object size; therefore, a large number of objects raises the cost as well.
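To illustrate why a directory move is expensive when object names embed the full path, the sketch below counts the renames needed. The helper `move_directory` and the key layout are hypothetical and only model the prefix-based scheme discussed here.

```python
def move_directory(keys, old_prefix, new_prefix):
    """Return a mapping old_key -> new_key for every object under old_prefix.

    With full-path object names, each object below the moved directory
    needs a new name, so the work grows with the size of the sub-tree.
    """
    return {
        key: new_prefix + key[len(old_prefix):]
        for key in keys
        if key.startswith(old_prefix)
    }


# Hypothetical bucket: 1000 files under one directory plus an unrelated file.
keys = [f"projects/alpha/file{i}.txt" for i in range(1000)]
keys.append("projects/notes.txt")

renamed = move_directory(keys, "projects/alpha/", "archive/alpha/")
```

In this example a single directory move touches 1000 objects; a sub-tree with millions of objects would require millions of renames.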
  • there exists a technical problem of how to efficiently preserve a consistent hierarchical view of the typical file-system inside the object storage system.
  • the present disclosure provides a computer-implemented method of backing up a file-system onto an object storage system, a data management module and an object-based storage comprising the data management module.
  • the present disclosure provides a solution to the existing problem of how to efficiently preserve a consistent hierarchical view of a typical file-system inside an object storage system.
  • An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in prior art, and provide an improved computer-implemented method of backing up a file-system onto an object storage system, an improved data management module and an improved object-based storage comprising the improved data management module, for efficiently preserving the consistent hierarchical view of the typical file-system inside the object-based storage without the overhead of objects that are too small or too large.
  • a computer-implemented method of backing up a file-system onto an object storage system comprises receiving a plurality of files from the file-system and a hierarchical representation for the file-system which includes the plurality of files and a plurality of directories.
  • the computer-implemented method further comprises assigning a sequential ID for each file and directory in the file-system and generating a representation of the file-system including an entry for each file and directory in the file-system, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory.
  • the computer-implemented method further comprises storing one or more file-system, FS, objects in the object storage system, where each FS object contains a plurality of entries of the representation and storing the plurality of files as one or more file objects in the object storage system.
  • the disclosed computer-implemented method efficiently preserves a consistent hierarchical view of the file-system inside the object storage system without an overhead of small size (e.g., too small) objects and large size (e.g., too large) objects.
  • the disclosed computer-implemented method provides an efficient way of backing up the file-system onto the object storage system.
  • the entries are ordered firstly by the sequential ID of the parent directory and secondly by the sequential ID of the file or directory.
  • the ordering (e.g., linear ordering) of the entries enables a fast and an efficient searching of one or more FS objects in the object storage system.
  • the computer-implemented method further comprises, in response to a change to one or more elements of the file-system, searching the object storage system for one or more FS objects storing representations which include the changed elements.
  • the computer-implemented method further comprises, in response to a change to one or more elements of the file-system, identifying the sequential ID of each parent directory listed in each representation and generating one or more replacement representations by reading each identified parent directory in the file-system.
  • the computer-implemented method further comprises, in response to a change to one or more elements of the file-system, storing the replacement representations as replacement FS objects in the object storage system.
  • the disclosed method enables fast and reliable changes to the one or more FS objects in the object storage system.
  • the representation is stored in a plurality of FS objects, where the sequential ID of the parent directory and the sequential ID of the file or directory form a key pair and name of each FS object is based on a key pair for the first entry.
  • the computer-implemented method further comprises dividing the representation stored in one FS object into two or more FS objects if the number of entries in the representation is greater than a predefined upper threshold.
  • the computer-implemented method further comprises combining an FS object with an adjacent FS object if the number of entries in the representation is less than a predefined lower threshold.
  • the combination of the FS object with the adjacent FS object reduces an overhead of small (e.g., too small) size objects.
  • searching the object storage system includes performing a binary search based on the FS object name.
  • the binary search based on the FS object name enables a fast searching of the object storage system.
  • the sequential ID of each parent directory is stored as FS object metadata, and identifying the sequential ID of each parent directory listed in each representation is based on the FS object metadata.
  • the FS object metadata enables a fast and relatively efficient way to re-generate a FS object.
  • the change to one or more elements of the file-system includes deletion or addition of a file or directory.
  • the change to one or more elements of the file-system includes moving a file or directory
  • searching the object storage system includes search for one or more FS objects storing representations which include the original location and the final location of the file or directory.
  • the computer-implemented method further comprises recording a plurality of changes to elements of the file-system in a log of changes and replacing the corresponding FS objects after a predetermined period of time.
  • the present disclosure provides a data management module comprising an input unit configured to receive a plurality of files from the file-system and a hierarchical representation for the file-system which includes the plurality of files and a plurality of directories.
  • the data management module further comprises a processing unit configured to assign a sequential ID for each file and directory in the file-system and generate a representation of the file-system including an entry for each file and directory in the file-system, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory.
  • the data management module further comprises an object generation unit configured to store one or more file-system, FS, objects in the object storage system, where each FS object contains a plurality of entries of the representation and store the plurality of files as one or more file objects in the object storage system.
  • the data management module achieves all the advantages and effects of the method of the present disclosure, after execution of the method.
  • the present disclosure provides an object-based storage comprising the data management module.
  • the object-based storage preserves a consistent hierarchical view of the file-system without an overhead of small size (e.g., too small) objects and large size (e.g., too large) objects.
  • the present disclosure provides a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method.
  • the processor achieves all the advantages and effects of the method after execution of the method.
  • FIG. 1 is a flowchart of a computer-implemented method of backing up a file-system onto an object storage system, in accordance with an embodiment of the present disclosure
  • FIG. 2 is a block diagram that illustrates various exemplary components of a data management module, in accordance with an embodiment of the present disclosure
  • FIG. 3 is a block diagram that illustrates various exemplary components of an object-based storage, in accordance with an embodiment of the present disclosure
  • FIGs. 4A and 4B collectively illustrate an exemplary implementation scenario of backing up a file-system onto an object storage system, in accordance with an embodiment of the present disclosure
  • FIGs. 5A and 5B collectively illustrate an exemplary implementation scenario of deleting a file from a file-system and updating a file-system (FS) object in an object-based storage, in accordance with an embodiment of the present disclosure
  • FIGs. 6A and 6B collectively illustrate an exemplary implementation scenario of adding a new file to a file-system and updating a FS object in an object-based storage, in accordance with an embodiment of the present disclosure
  • FIGs. 7A and 7B collectively illustrate an exemplary implementation scenario of moving a directory from one FS object to another FS object of an object-based storage, in accordance with an embodiment of the present disclosure.
  • FIGs. 8A and 8B collectively illustrate an exemplary implementation scenario to maintain an explicit list of directory IDs in each FS object of an object-based storage, in accordance with an embodiment of the present disclosure.
  • an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent.
  • a non-underlined number relates to an item identified by a line linking the non-underlined number to the item.
  • the non-underlined number is used to identify a general item at which the arrow is pointing.
  • FIG. 1 is a flowchart of a computer-implemented method of backing up a file-system onto an object storage system, in accordance with an embodiment of the present disclosure.
  • a computer-implemented method 100 that includes steps 102-to-110.
  • the computer-implemented method 100 is executed by a data management module, described in detail, for example, in FIG. 2.
  • the computer-implemented method 100 efficiently preserves a consistent hierarchical view of a file-system inside an object storage system without an overhead of small size (e.g., too small) objects and large size (e.g., too large) objects.
  • the computer-implemented method 100 provides an efficient way of backing up a file-system onto an object storage system, as described in the following steps.
  • the computer-implemented method 100 comprises receiving a plurality of files from the file-system and a hierarchical representation for the file-system which includes the plurality of files and a plurality of directories.
  • the plurality of files such as a file 1, file 2, file 3, and the like, are received from the file-system.
  • Each file of the plurality of files is arranged in a hierarchical order, for example, the file 1 may be arranged at an upper level than the file 2 and the file 3, and the file 2 and the file 3 may be arranged at the same level.
  • Such an arrangement of the plurality of files constitutes the hierarchical representation of the file-system.
  • the hierarchical representation of the file-system also includes the plurality of directories, such as a Dir 1, Dir 2, and the like.
  • An exemplary implementation scenario of the hierarchical representation of the file-system is described in detail, for example, in FIG. 4A.
  • the computer-implemented method 100 further comprises assigning a sequential ID for each file and directory in the file-system.
  • the sequential ID (e.g., 1, 2, 3, and the like) is assigned to each file of the plurality of files and each directory of the plurality of directories in the file-system.
  • the computer-implemented method 100 further comprises generating a representation of the file-system including an entry for each file and directory in the file-system, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory.
  • a linear (e.g., tabular) representation of the file-system is generated that includes the entry for each file and directory in the file-system.
  • each file and directory in the file-system is linearly ordered based on the entry.
  • each entry includes the name of the file or directory, the sequential ID of the file or a sub-directory and the sequential ID of the parent directory that holds the file or the sub-directory.
  • any file/directory attributes can be attached to the entry (e.g., creation time, last-modified time, permissions, etc.).
  • the entries are ordered firstly by the sequential ID of the parent directory and secondly by the sequential ID of the file or directory.
  • Each of the entries generated for each file and directory is linearly ordered.
  • the linear ordering is based on the sequential ID of the parent directory, which is used as most significant bit (MSB) and the sequential ID of the file or directory, which is used as least significant bit (LSB).
  • the linear ordering may also be represented as "<Directory ID (MSB), file ID (LSB)>".
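The ID assignment and the ordered linear representation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the file-system is modelled as a nested dictionary, the root directory is given ID 0, and IDs are handed out in breadth-first order. None of these choices is specified by the source, and the names `Entry` and `build_representation` are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Entry(NamedTuple):
    parent_id: int  # sequential ID of the parent directory (most significant)
    item_id: int    # sequential ID of the file or directory (least significant)
    name: str
    is_dir: bool


def build_representation(tree, root_id=0):
    """Assign sequential IDs and return entries sorted by (parent_id, item_id).

    `tree` maps names to either None (a file) or a nested dict (a directory).
    """
    entries = []
    next_id = root_id + 1
    queue = deque([(root_id, tree)])
    while queue:
        parent_id, children = queue.popleft()
        for name, child in children.items():
            item_id, next_id = next_id, next_id + 1
            is_dir = isinstance(child, dict)
            entries.append(Entry(parent_id, item_id, name, is_dir))
            if is_dir:
                queue.append((item_id, child))
    # Tuples compare field by field, so this orders by parent ID, then own ID.
    return sorted(entries)


# Hypothetical file-system using the names from the text.
tree = {"Dir 1": {"file 1": None, "Dir 2": {"file 2": None}}, "file 3": None}
representation = build_representation(tree)
key_pairs = [(e.parent_id, e.item_id) for e in representation]
```

Additional attributes such as creation time or permissions could be added as further fields of `Entry` without affecting the ordering, since the key pair is unique.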
  • the computer-implemented method 100 further comprises storing one or more file-system, FS, objects in the object storage system, where each FS object contains a plurality of entries of the representation.
  • the plurality of entries are grouped into one or more file-system (FS) objects depending on a pre-defined object size. In this way, one or more FS objects are generated and stored in the object storage system.
  • FS file-system
  • the representation is stored in a plurality of FS objects, where the sequential ID of the parent directory and the sequential ID of the file or directory form a key pair and name of each FS object is based on a key pair for the first entry.
  • the representation including the plurality of entries for each file and directory is stored in the form of one or more FS objects, described in detail, for example, in FIG. 4B.
  • the sequential ID of the parent directory that holds the file or directory and the sequential ID of the file or directory form the key pair.
  • the naming of each of the one or more FS objects is based on the key pair of the first entry in each object, described in detail, for example, in FIG. 4B.
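Grouping the ordered entries into FS objects and naming each object after the key pair of its first entry could look like the sketch below. The entry-count limit `max_entries` stands in for the pre-defined object size, and the zero-padded `fs-<parent>-<id>` name format is an assumption chosen so that lexicographic name order matches key order; the source does not prescribe a name format.

```python
def pack_fs_objects(entries, max_entries):
    """Group sorted (parent_id, item_id, name) entries into named FS objects.

    Each object is named after the key pair of its first entry.
    """
    ordered = sorted(entries)
    objects = {}
    for start in range(0, len(ordered), max_entries):
        chunk = ordered[start:start + max_entries]
        parent_id, item_id = chunk[0][0], chunk[0][1]
        # Hypothetical naming scheme: zero padding keeps names sortable as text.
        objects[f"fs-{parent_id:010d}-{item_id:010d}"] = chunk
    return objects


entries = [
    (0, 1, "Dir 1"),
    (0, 2, "file 3"),
    (1, 3, "file 1"),
    (1, 4, "Dir 2"),
    (4, 5, "file 2"),
]
fs_objects = pack_fs_objects(entries, max_entries=2)
```

With a limit of two entries per object, the five entries are stored in three FS objects whose names sort in the same order as their contents.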
  • the computer-implemented method 100 further comprises storing the plurality of files as one or more file objects in the object storage system.
  • the plurality of files and the plurality of directories are stored in form of the one or more FS objects in the object storage system.
  • the computer-implemented method 100 further comprises in response to a change to one or more elements of the file-system, searching the object storage system for one or more FS objects storing representations which include the changed elements.
  • the computer-implemented method 100 further comprises in response to the change to one or more elements of the file-system, identifying the sequential ID of each parent directory listed in each representation and generating one or more replacement representations by reading each identified parent directory in the file-system.
  • the computer- implemented method 100 further comprises in response to the change to one or more elements of the file-system, storing the replacement representations as replacement FS objects in the object storage system.
  • the one or more FS objects that include the changed elements are searched for.
  • each parent directory that is listed in each representation (the hierarchical as well as the linear representation of the file-system) and that includes the changed elements is identified.
  • the one or more replacement representations are generated by reading each identified parent directory, and the generated one or more replacement representations are stored as replacement FS objects in the object storage system.
  • the change to one or more elements of the file-system is described in detail, for example, in FIGs. 5A-5B, 6A-6B, and 7A-7B.
  • the change to one or more elements of the file-system includes deletion or addition of a file or directory.
  • a file or a directory is either added to or deleted from the file-system.
  • the deletion of the file or the directory from the file-system is described in detail, for example, in FIGs. 5A and 5B.
  • the addition of a new file or directory to the file-system is described in detail, for example, in FIGs. 6A and 6B.
  • the change to one or more elements of the file-system includes moving a file or directory
  • searching the object storage system includes search for one or more FS objects storing representations which include the original location and the final location of the file or directory.
  • the one or more FS objects are searched before and after the move of the file or directory.
  • in the FS object that includes an entry corresponding to the file or directory that is moved to another FS object, that entry is deleted.
  • in the other FS object, to which the file or directory is moved, an entry corresponding to the moved file or directory is generated.
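The effect of a move on the linear representation can be sketched as follows. The sketch assumes that a moved directory keeps its sequential ID and only its parent ID changes; the source does not state this explicitly, and the function name `move_item` is hypothetical.

```python
def move_item(entries, item_id, new_parent_id):
    """Return the re-sorted entries after moving one file or directory.

    Only the moved item's own entry changes: it leaves its position under the
    old parent and reappears under the new parent. Entries of its children
    are untouched because they refer to the directory by ID, not by path.
    """
    updated = []
    for parent_id, entry_id, name in entries:
        if entry_id == item_id:
            parent_id = new_parent_id
        updated.append((parent_id, entry_id, name))
    return sorted(updated)


entries = [
    (0, 1, "Dir 1"),
    (0, 2, "Dir 2"),
    (1, 3, "file 1"),
    (1, 4, "file 2"),
]
# Move "Dir 1" (ID 1) from the root (ID 0) into "Dir 2" (ID 2).
moved = move_item(entries, item_id=1, new_parent_id=2)
changed = [entry for entry in moved if entry not in entries]
```

Only one entry differs after the move, so at most two FS objects need replacing: the one that held the entry at its original location and the one that holds it at its final location.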
  • the change to one or more elements of the file-system includes an update to the file or directory attributes only, i.e., no change to the hierarchical representation
  • searching the object storage system includes search for one or more FS objects storing representations which include the location of the file or directory.
  • searching the object storage system includes performing a binary search based on the FS object name.
  • the searching for the one or more FS objects in the object storage system is performed as the binary search using the key pair of the first entry in the FS object, which enables a fast searching in the object storage system.
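A binary search over the first-entry key pairs could be sketched as below. It assumes the sorted list of first keys has already been obtained (for example, by parsing FS object names); the helper name `find_fs_object` is hypothetical.

```python
from bisect import bisect_right


def find_fs_object(first_keys, key):
    """Return the index of the FS object whose key range contains `key`.

    `first_keys` is the sorted list of (parent_id, item_id) pairs taken from
    the first entry of each FS object. The target is the last object whose
    first key is less than or equal to `key`.
    """
    index = bisect_right(first_keys, key) - 1
    if index < 0:
        raise KeyError(f"{key} sorts before the first FS object")
    return index


first_keys = [(0, 1), (1, 3), (4, 5)]
hit_middle = find_fs_object(first_keys, (1, 4))
hit_first = find_fs_object(first_keys, (0, 1))
hit_last = find_fs_object(first_keys, (9, 9))
```

The lookup takes a number of comparisons logarithmic in the number of FS objects, and no object contents have to be read to locate the right one.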
  • the sequential ID of each parent directory is stored as FS object metadata, and identifying the sequential ID of each parent directory listed in each representation is based on the FS object metadata.
  • An explicit list of the sequential ID of each parent directory is stored as the FS object metadata inside the FS object. For example, if an object must be regenerated, the exact list of the sequential IDs of the parent directories is available in the form of the FS object metadata. Fetching the explicit list from the object storage system does not require reading the entire object but only the metadata, which is a fast and relatively efficient operation.
  • the explicit list of the sequential ID of each parent directory may be stored as a FS object data inside the FS object. The explicit list of the sequential ID of each parent directory is described in detail, for example, in FIGs. 8A and 8B.
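The use of the parent-directory ID list for regenerating an FS object can be sketched as follows. The metadata key `parent-ids`, the callable `read_directory`, and the dictionary standing in for the live file-system are all assumptions for illustration. The sketch also simplifies one point: it assumes each listed directory is held entirely within the one FS object being regenerated.

```python
def parent_ids_metadata(fs_object_entries):
    """Collect the distinct parent directory IDs listed in one FS object."""
    return sorted({parent_id for parent_id, _item_id, _name in fs_object_entries})


def regenerate(parent_ids, read_directory):
    """Rebuild the entries by re-reading each listed parent directory.

    `read_directory(parent_id)` is a hypothetical callable returning
    (item_id, name) pairs for the current contents of that directory.
    """
    entries = []
    for parent_id in parent_ids:
        for item_id, name in read_directory(parent_id):
            entries.append((parent_id, item_id, name))
    return sorted(entries)


fs_object = [(1, 3, "file 1"), (1, 4, "Dir 2"), (4, 5, "file 2")]
# Stored alongside the object so it can be read without fetching the body.
metadata = {"parent-ids": ",".join(str(i) for i in parent_ids_metadata(fs_object))}

# Stand-in for the live file-system after "file 6" was added to directory 4.
live = {1: [(3, "file 1"), (4, "Dir 2")], 4: [(5, "file 2"), (6, "file 6")]}
ids = [int(part) for part in metadata["parent-ids"].split(",")]
replacement = regenerate(ids, lambda parent_id: live.get(parent_id, []))
```

Only the small metadata value is needed to decide which directories to re-read; the old object body is never downloaded.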
  • the computer-implemented method 100 further comprises dividing the representation stored in one FS object into two or more FS objects if the number of entries in the representation is greater than a predefined upper threshold.
  • For example, the linear representation of the file-system is stored in the form of one FS object in the object storage system. If a new file or directory is added to the file-system and the number of entries in the FS object becomes greater than the predefined upper threshold, then the entries in the FS object may be divided and stored in two or more FS objects.
  • the computer-implemented method 100 further comprises combining an FS object with an adjacent FS object if the number of entries in the representation is less than a predefined lower threshold.
  • the threshold may be, for example, a threshold number of entries or a threshold object size in bytes.
  • For example, the linear representation of the file-system is stored in the form of one FS object in the object storage system. If a file or directory is deleted from the file-system and the number of entries in the FS object becomes less than the predefined lower threshold, then the FS object may be combined with the adjacent (e.g., next or previous) FS object. In some examples, the FS object may be combined with multiple consecutive adjacent small FS objects.
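The split and merge rules can be sketched as below, using entry counts as the thresholds (the text also allows a size in bytes). The function names and the choice to split in halves and to prefer the next neighbour when merging are assumptions, not details taken from the source.

```python
def split_if_large(entries, upper):
    """Split a sorted entry list into pieces of at most `upper` entries."""
    if len(entries) <= upper:
        return [entries]
    mid = len(entries) // 2
    return split_if_large(entries[:mid], upper) + split_if_large(entries[mid:], upper)


def merge_if_small(objects, index, lower):
    """Merge objects[index] with an adjacent object if it is below `lower`.

    `objects` is a list of sorted entry lists in key order. The next object
    is preferred; the previous one is used for the last object.
    """
    if len(objects[index]) >= lower or len(objects) == 1:
        return objects
    neighbour = index + 1 if index + 1 < len(objects) else index - 1
    first, second = sorted((index, neighbour))
    merged = objects[first] + objects[second]  # key order is preserved
    return objects[:first] + [merged] + objects[second + 1:]


large = [(1, 3), (1, 4), (1, 5), (1, 6), (1, 7)]
pieces = split_if_large(large, upper=2)

objects = [[(0, 1), (0, 2), (0, 3)], [(1, 4)], [(2, 5), (2, 6), (2, 7)]]
merged = merge_if_small(objects, index=1, lower=2)
```

After a merge, the combined object may itself exceed the upper threshold, in which case `split_if_large` can be applied to it.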
  • the computer-implemented method 100 further comprises recording a plurality of changes to elements of the file-system in a log of changes and replacing the corresponding FS objects after a predetermined period of time.
  • the plurality of changes to the elements (e.g., the plurality of files and the plurality of directories) of the file-system are stored in the log of changes in order to avoid frequent changes to large FS objects.
  • the plurality of changes are stored in the log of changes so that the corresponding FS objects are updated after the predetermined period of time in a fast and an efficient way.
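Batching changes in a log so that each affected FS object is replaced once per period could be sketched as follows. The class `ChangeLog`, its methods, and the idea that a caller invokes `flush()` from a timer are assumptions for illustration; the source only states that changes are logged and applied after a predetermined period.

```python
from bisect import bisect_right


class ChangeLog:
    """Accumulate changes and report which FS objects to replace on flush."""

    def __init__(self, first_keys):
        # Sorted (parent_id, item_id) pairs of the first entry of each FS object.
        self.first_keys = sorted(first_keys)
        self.pending = []

    def record(self, key, change):
        """Log one change (e.g. "add" or "delete") for the entry at `key`."""
        self.pending.append((key, change))

    def flush(self):
        """Return indices of FS objects touched by logged changes; clear the log."""
        dirty = {
            max(0, bisect_right(self.first_keys, key) - 1)
            for key, _change in self.pending
        }
        self.pending.clear()
        return sorted(dirty)


log = ChangeLog([(0, 1), (1, 3), (4, 5)])
log.record((1, 3), "delete")
log.record((1, 4), "add")
log.record((1, 9), "add")
log.record((4, 6), "add")
dirty_objects = log.flush()  # called once the predetermined period elapses
```

Four logged changes result in only two FS object replacements, which is the saving the log of changes is meant to provide for large FS objects.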
  • the computer-implemented method 100 efficiently preserves the consistent hierarchical view of the file-system inside an object storage system without the overhead of small size objects (i.e., too small) and large size objects (i.e., too large).
  • steps 102-to-110 are only illustrative and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
  • FIG. 2 illustrates various exemplary components of a data management module, in accordance with an embodiment of the present disclosure.
  • FIG. 2 is described in conjunction with elements from FIG. 1.
  • There is shown a data management module 200 that includes an input unit 202, a processing unit 204, an object generation unit 206, a memory 208 and a network interface 210.
  • There is further shown a file-system 212.
  • the data management module 200 may include suitable logic, circuitry, interfaces, or code that is configured to efficiently manage the data of the file-system 212. Alternatively stated, the data management module 200 may be configured for efficiently preserving a consistent hierarchical view of the file-system 212 in an object storage system, which has a linear representation. The data management module 200 may be configured to execute the computer-implemented method 100 (of FIG. 1). Additionally, the data management module 200 may include one or more data processing facilities for storing, processing and/or sharing the plurality of files and/or plurality of directories. Furthermore, the data management module 200 may include hardware, software, firmware or a combination of these, suitable for temporarily storing and processing various information and services accessed by the one or more users using the one or more user equipments.
  • the input unit 202 may include suitable logic, circuitry, interfaces, or code that is configured to receive a plurality of files from the file-system 212. Examples of the input unit 202 may include, but are not limited to, a receiver, a receiver unit, and the like.
  • the processing unit 204 may include suitable logic, circuitry, interfaces, or code that is configured to assign a sequential ID for each file and directory in the file-system 212.
  • the processing unit 204 may also be configured to execute the instructions stored in the memory 208.
  • the processing unit 204 may be a general-purpose processor.
  • Examples of the processing unit 204 may include, but are not limited to, a processor, a central processing unit (CPU), a digital signal processor (DSP), a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a very long instruction word (VLIW) processor, a control unit, a state machine, a data processing unit, a graphics processing unit (GPU), and other processors or control circuitry.
  • the processing unit 204 may refer to one or more individual processors, processing devices, or a processing unit that is part of a machine, such as the data management module 200.
  • the object generation unit 206 may include suitable logic, circuitry, interfaces, or code that is configured to store one or more file-system (FS) objects in the object storage system.
  • the memory 208 may include suitable logic, circuitry, interfaces, or code that is configured to store data and the instructions executable by the processing unit 204.
  • the memory 208 may also be configured to comprise the object generation unit 206. Examples of implementation of the memory 208 may include, but are not limited to, an Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, Solid-State Drive (SSD), or CPU cache memory.
  • the memory 208 may store an operating system or other program products (including one or more operation algorithms) to operate the data management module 200.
  • the network interface 210 may include suitable logic, circuitry, interfaces, or code that is communicatively coupled with the input unit 202.
  • Examples of the network interface 210 may include, but are not limited to, a data terminal, a transceiver, a facsimile machine, a virtual server, and the like.
  • the input unit 202 is configured to receive a plurality of files from the file-system 212 and a hierarchical representation for the file-system 212 which includes the plurality of files and a plurality of directories.
  • the file-system 212 may include the plurality of files and the plurality of directories, described in detail, for example, in FIG. 4A and 4B.
  • the plurality of files and the plurality of directories of the file-system 212 are arranged hierarchically. Therefore, the input unit 202 is configured to receive the hierarchical representation of the plurality of files and the plurality of directories of the file-system 212.
  • the processing unit 204 is configured to assign a sequential ID for each file and directory in the file-system 212 and generate a representation of the file-system 212 including an entry for each file and directory in the file-system 212, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory.
  • the sequential ID (e.g., 1, 2, 3, and so on) is assigned to each file and directory in the file-system 212. Thereafter, the entry corresponding to each file and directory is generated and arranged linearly.
  • the entry corresponding to each file and directory includes the name of the file or directory, the sequential ID of the file or directory and the sequential ID of the parent directory that holds the file or directory.
  • the arrangement of the entry corresponding to each file and directory is described in detail, for example, in FIGs. 4A and 4B.
  • the object generation unit 206 is configured to store one or more file-system (FS) objects in the object storage system, where each FS object contains a plurality of entries of the representation and store the plurality of files as one or more file objects in the object storage system.
  • the plurality of entries corresponding to the plurality of files and the plurality of directories of the file-system 212 are stored in form of one or more FS objects in the object storage system.
  • the plurality of files are stored as one or more file objects in the object storage system.
  • a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method.
  • the processor (i.e., the processing unit 204) is configured to execute the computer-implemented method 100 (of FIG. 1).
  • the data management module 200 efficiently preserves the consistent hierarchical view of the file-system 212 inside an object storage system without the overhead of small size objects (i.e., too small) and large size objects (i.e., too large).
  • FIG. 3 illustrates various exemplary components of an object-based storage, in accordance with an embodiment of the present disclosure.
  • FIG. 3 is described in conjunction with elements from FIGs. 1 and 2.
  • an object-based storage 300 that comprises the data management module 200 (of FIG. 2).
  • the object-based storage 300 may include suitable logic, circuitry, interfaces, or code that is configured to store the data in form of one or more objects in contrast to a conventional data storage architecture like typical file-systems which manages the data as a file hierarchy.
  • the object-based storage 300 may also be referred to as an object storage system.
  • an object storage system allows retention of massive amounts of unstructured data and is used for purposes such as storing photos, videos or files. Examples of the object storage system include, but are not limited to, Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage.
  • the object-based storage 300 may also be referred to as the backbone of cloud storage.
  • the object-based storage 300 enables public cloud storage providers to easily scale their infrastructure to exabyte scale, while keeping costs at a minimum.
  • the object-based storage 300 may deliver high availability and extreme durability, with low overhead ratios to match backup value.
  • the object-based storage 300 may also grow cost-effectively to meet new organizational requirements across an enterprise. Since the object-based storage 300 comprises the data management module 200 (of FIG. 2), a hierarchy of the file-system 212 is maintained properly.
  • FIGs. 4A and 4B collectively illustrate an exemplary implementation scenario of backing up a file-system onto an object storage system, in accordance with an embodiment of the present disclosure.
  • FIGs. 4A and 4B are described in conjunction with elements from FIGs. 1, 2, and 3.
  • FIG. 4A there is shown a hierarchical representation 402 of the file-system 212 (of FIG. 2).
  • FIG. 4B there is shown a linear representation 404 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
  • Each of the hierarchical representation 402 and the linear representation 404 of the file-system 212 is represented by a dashed box, which is used for illustration purpose only, and does not form a part of circuitry.
  • the file-system 212 includes a plurality of files, such as a file 1, file 2, file 3, file 4, and file 5.
  • the file-system 212 further includes a plurality of directories, such as a root directory (also represented as root), directory 1 (also represented as Dir 1), directory 2 (also represented as Dir 2), and directory 3 (also represented as Dir 1.1).
  • the hierarchical representation 402 of the file-system 212 represents how the plurality of files and the plurality of directories are arranged hierarchically.
  • a sequential ID is assigned to each file and directory in the file-system 212.
  • the root directory (i.e., root) is assigned a sequential ID of 1.
  • each of the directory 1 (i.e., Dir 1), the file 1 and the directory 2 (i.e., Dir 2) is assigned a sequential ID of 2, 3, and 4, respectively.
  • each of the file 2, the directory 3 (i.e., Dir 1.1), the file 3, the file 4, and the file 5 is assigned a sequential ID of 5, 6, 7, 8, and 9, respectively.
  • the file-system 212 with the hierarchical representation 402 is stored in the object-based storage 300, which has a linear representation, as shown in FIG. 4B.
  • the linear representation 404 of the file-system 212 in the object-based storage 300 preserves the consistent hierarchical view of the file-system 212 without an overhead of small size objects (i.e., too small) and large size objects (i.e., too large).
  • the linear representation 404 of the file-system 212 includes an entry for each of the plurality of files and the plurality of directories.
  • the entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory that holds the file or directory.
  • the entry for the root directory includes the name of the root directory as “Root”, the sequential ID of the root directory as “1” and the sequential ID of the parent directory that holds the root directory as “0”. Since the root directory is the top-level directory and has no parent, the sequential ID of its parent directory is taken as “0”.
  • the entry for the directory 1 includes the name of the directory 1 as “Dir 1”, the sequential ID of the directory 1 as “2” and the sequential ID of the parent directory that holds the directory 1 as “1”.
  • the entry for the file 1 includes the name of the file 1 as “File 1”, the sequential ID of the file
  • the linear representation 404 of the file-system 212 including the entries for each of the plurality of files and plurality of directories is split into one or more file-system (FS) objects, such as an object 1 and object 2.
  • Each of the object 1 and the object 2 includes a plurality of entries, such as the object 1 includes four entries and the object 2 includes five entries of the linear representation 404.
  • the object 1 includes the entries of the root directory (i.e., root), the directory 1 (i.e., Dir 1), the file 1 and the directory 2 (i.e., Dir 2).
  • the object 2 includes the entries of the file 2, the directory 3 (i.e., Dir 1.1), the file 3, the file 4 and the file 5.
  • Each of the object 1 and the object 2 is named using the key pair of the respective first entry.
  • the object 1 is named using the key pair of the root directory as “0.0” because the entry of the root directory is the first entry in the object 1.
  • the plurality of files of the file-system 212 is stored as the one or more file-system objects in the object-based storage 300 while preserving the consistent hierarchical view of the file-system 212.
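The scenario of FIGs. 4A and 4B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the breadth-first ID assignment, the greedy packing of whole directory groups, the limit of five entries per FS object, the placement of the files 3, 4 and 5 under Dir 2 and Dir 1.1, and the names `Entry`, `linearize` and `pack` are all hypothetical choices that happen to reproduce the four-entry and five-entry objects described above. Object names follow the `<parent ID>.<ID>` pattern.

```python
from collections import deque
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Entry:
    name: str
    seq_id: int
    parent_id: int

    @property
    def key(self):
        # Key pair: parent ID first, then the entry's own sequential ID.
        return (self.parent_id, self.seq_id)


def linearize(tree, root="Root"):
    """Assign sequential IDs breadth-first (assumption) and return sorted entries."""
    entries, next_id = [], 1
    queue = deque([(root, 0)])  # (name, parent ID); the root's parent ID is 0
    while queue:
        name, parent_id = queue.popleft()
        entries.append(Entry(name, next_id, parent_id))
        for child in tree.get(name, []):  # names missing from `tree` are files
            queue.append((child, next_id))
        next_id += 1
    return sorted(entries, key=lambda e: e.key)


def pack(entries, max_entries):
    """Greedily pack whole directory groups into FS objects (assumed policy)."""
    objects, current = [], []
    for _, group in groupby(entries, key=lambda e: e.parent_id):
        group = list(group)
        if current and len(current) + len(group) > max_entries:
            objects.append(current)
            current = []
        current.extend(group)
    if current:
        objects.append(current)
    # Each FS object is named after the key pair of its first entry.
    return {"%d.%d" % obj[0].key: obj for obj in objects}


# Hypothetical tree consistent with the sequential IDs given in the text.
TREE = {
    "Root": ["Dir 1", "File 1", "Dir 2"],
    "Dir 1": ["File 2", "Dir 1.1"],
    "Dir 2": ["File 3"],
    "Dir 1.1": ["File 4", "File 5"],
}
entries = linearize(TREE)
fs_objects = pack(entries, max_entries=5)
for name, obj in fs_objects.items():
    print(name, [(e.name, e.seq_id, e.parent_id) for e in obj])
```

Packing whole directory groups keeps every directory's children inside a single FS object in this toy case; a real packing policy may differ.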
  • FIGs. 5A and 5B collectively illustrate an exemplary implementation scenario of deleting a file from a file-system and updating a file-system (FS) object in an object-based storage, in accordance with an embodiment of the present disclosure.
  • FIGs. 5A and 5B are described in conjunction with elements from FIGs. 1, 2, 3, 4A, and 4B.
  • FIG. 5A there is shown a hierarchical representation 502 of the file-system 212 (of FIG. 2).
  • FIG. 5B there is shown a linear representation 504 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
  • the hierarchical representation 502 of the file-system 212 is similar to the hierarchical representation 402 (of FIG. 4A) of the file-system 212 except that the file 3 (with the sequential ID of 7) of the plurality of files is deleted from the file-system 212.
  • the deletion of the file 3 from the file-system 212 and update of the FS object in the object-based storage 300 is performed in three steps.
  • in a first step, the object-based storage 300 is searched for one or more FS objects that include the deleted file (i.e., the file 3) in the linear representation 504.
  • the searching is performed in form of a binary search using the key pair that is comprised in name of the one or more FS objects.
  • the searching is performed either in a local (cache) table or by using the names of the one or more FS objects stored in the object-based storage 300.
  • each of the object 1 and the object 2 is searched with their respective names and the object 2 is found with the deleted file (i.e., the file 3).
  • the FS object (i.e., the object 2) that includes the deleted file (i.e., the file 3) is updated.
  • an object-based storage does not support modification of an FS object; therefore, there are two options for updating the FS object.
  • a first option is to fetch the FS object that includes the deleted file (i.e., the file 3) from the object-based storage. This approach is generally avoided because it has several drawbacks, such as bandwidth consumption, the cost of egress traffic from the object storage, increased latency, and the like.
  • a second option is to re-generate the entire FS object (i.e., the object 2) locally by using the “ReadDir” application programming interfaces (APIs) provided by the file-system 212.
  • One “ReadDir” call is performed for each directory in the object-based storage 300. That means the sequential ID for each parent directory listed in the object-based storage 300 is identified. Limiting the maximum number of directories that can be packed together (e.g., up to 10) will limit the number of required calls. Additionally, an explicit list of directory IDs may also be stored in each FS object (i.e., the object 1 and the object 2).
  • the one or more replacement representations are generated in each identified parent directory and the generated one or more replacement representations are stored as replacement FS objects (or new FS objects) in the object-based storage 300.
  • if the number of entries in the new FS object (i.e., the object 2) is less than the predefined lower threshold, the new FS object (i.e., the object 2) is combined with the adjacent (e.g., next or previous) FS object (i.e., the object 1).
  • the deletion of the file (i.e., the file 3) and update (e.g., merging) of the FS objects is performed in the object-based storage 300.
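The deletion flow of FIGs. 5A and 5B can be sketched as follows. This is a rough illustration that assumes a lower threshold of five entries, represents each entry as a `(parent ID, sequential ID, name)` tuple, and replaces the file-system's “ReadDir” API with a hypothetical in-memory `read_dir` helper; none of these names or values come from the disclosure.

```python
LOWER_THRESHOLD = 5  # assumed value; the text only says "predefined"

# Local file-system after File 3 (ID 7) was deleted:
# directory ID -> [(sequential ID, name), ...]
LOCAL_FS = {
    2: [(5, "File 2"), (6, "Dir 1.1")],
    4: [],
    6: [(8, "File 4"), (9, "File 5")],
}


def read_dir(fs, dir_id):
    """Stand-in for one "ReadDir" call: entries held by one directory."""
    return [(dir_id, seq_id, name) for seq_id, name in sorted(fs[dir_id])]


def regenerate(fs, dir_ids):
    """Rebuild an FS object locally, one read_dir call per listed directory."""
    entries = []
    for dir_id in sorted(dir_ids):
        entries.extend(read_dir(fs, dir_id))
    return entries


def merge_if_small(objects, index, lower=LOWER_THRESHOLD):
    """Merge objects[index] with an adjacent object if it has too few entries."""
    if len(objects) == 1 or len(objects[index]) >= lower:
        return objects
    neighbour = index - 1 if index > 0 else index + 1
    first, second = sorted((index, neighbour))
    merged = objects[first] + objects[second]
    return objects[:first] + [merged] + objects[second + 1:]


object_1 = [(0, 1, "Root"), (1, 2, "Dir 1"), (1, 3, "File 1"), (1, 4, "Dir 2")]
object_2 = regenerate(LOCAL_FS, [2, 4, 6])  # File 3 is no longer listed
fs_objects = merge_if_small([object_1, object_2], index=1)
print(len(fs_objects), [len(obj) for obj in fs_objects])
```

Regenerating locally avoids downloading the old object; only the replacement object has to be uploaded.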
  • FIGs. 6A and 6B collectively illustrate an exemplary implementation scenario of adding a new file to a file-system and updating a FS object in an object-based storage, in accordance with an embodiment of the present disclosure.
  • FIGs. 6A and 6B are described in conjunction with elements from FIGs. 1, 2, 3, 4A-4B, and 5A-5B.
  • FIG. 6A there is shown a hierarchical representation 602 of the file-system 212 (of FIG. 2).
  • FIG. 6B there is shown a linear representation 604 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
  • the hierarchical representation 602 of the file-system 212 is similar to the hierarchical representation 402 (of FIG. 4A) of the file-system 212 except that a new file (e.g., a file 6) is added to the file-system 212.
  • the new file (i.e., the file 6) is assigned a sequential ID of 10 in the hierarchical representation 602.
  • the new file (i.e., the file 6) is added in the hierarchical representation 602 with a parent directory, such as the directory 2 (i.e., Dir 2).
  • the addition of the new file (i.e., the file 6) to the file-system 212 and update of the FS object in the object-based storage 300 is performed in three steps.
  • in a first step, the object-based storage 300 is searched for one or more FS objects that include the new file (i.e., the file 6) in the linear representation 604.
  • the searching is performed in form of a binary search using the key pair that is comprised in name of the one or more FS objects.
  • the searching is performed either in a local (cache) table or by using the names of the one or more FS objects stored in the object-based storage 300.
  • each of the object 1 and the object 2 is searched with their respective names and the object 2 is found with the new file (i.e., the file 6), initially (i.e., before splitting of the object 2).
  • the FS object (i.e., the object 2) that includes the new file (i.e., the file 6) is updated.
  • an object-based storage does not support modification of an FS object; therefore, there are two options for updating the FS object.
  • a first option is to fetch the FS object that includes the new file (i.e., the file 6) from the object-based storage. This approach is generally avoided because it has several drawbacks, such as bandwidth consumption, the cost of egress traffic from the object storage, increased latency, and the like.
  • a second option is to regenerate the entire FS object (i.e., the object 2) locally by using the “ReadDir” application programming interfaces (APIs) provided by the file-system 212.
  • One “ReadDir” call is performed for each directory in the object-based storage 300. That means the sequential ID for each parent directory listed in the object-based storage 300 is identified.
  • an explicit list of directory IDs may also be stored in each FS object (i.e., the object 1 and the object 2). Thereafter, the one or more replacement representations are generated in each identified parent directory and the generated one or more replacement representations are stored as replacement FS objects (or new FS objects) in the object-based storage 300.
  • if the number of entries in an FS object (i.e., the object 2) is greater than the predefined upper threshold, the FS object is divided into two or more FS objects.
  • the FS object (i.e., the object 2) is divided and a new object (e.g., object 3) is generated.
  • the new object (i.e., object 3) includes the new file (i.e., the file 6), and the name of the new object is determined based on the key pair (i.e., 4.10) of its first entry (i.e., the file 6).
  • the addition of the new file (i.e., the file 6) and update (e.g., split) of the FS objects is performed in the object-based storage 300.
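The addition and split of FIGs. 6A and 6B can be sketched as a sorted insert followed by a midpoint split. The upper threshold of five entries, the midpoint split policy, and the helper names are assumptions made for illustration; the text only requires that an FS object be divided when it exceeds a predefined upper threshold. For brevity, the sketch inserts the new entry directly instead of regenerating the object through “ReadDir” calls.

```python
import bisect

UPPER_THRESHOLD = 5  # assumed value; the text only says "predefined"


def object_name(fs_object):
    """Name an FS object after the key pair of its first entry."""
    parent_id, seq_id, _name = fs_object[0]
    return f"{parent_id}.{seq_id}"


def split_if_large(fs_object, upper=UPPER_THRESHOLD):
    """Split an oversized FS object at its midpoint (assumed policy)."""
    if len(fs_object) <= upper:
        return [fs_object]
    mid = len(fs_object) // 2
    return [fs_object[:mid], fs_object[mid:]]


# Object 2 of FIG. 4B as (parent ID, sequential ID, name) tuples.
object_2 = [
    (2, 5, "File 2"), (2, 6, "Dir 1.1"),
    (4, 7, "File 3"),
    (6, 8, "File 4"), (6, 9, "File 5"),
]

# File 6 (ID 10) is added under Dir 2 (ID 4); tuples sort by (parent ID, ID).
bisect.insort(object_2, (4, 10, "File 6"))

parts = split_if_large(object_2)
print([object_name(part) for part in parts])
```

With these assumed values the second part starts with the file 6, so its name is derived from the key pair 4.10, matching the object 3 described above.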
  • FIGs. 7A and 7B collectively illustrate an exemplary implementation scenario of moving a directory from one FS object to another FS object of an object-based storage, in accordance with an embodiment of the present disclosure.
  • FIGs. 7A and 7B are described in conjunction with elements from FIGs. 1, 2, 3, 4A-4B, 5A-5B, and 6A-6B.
  • FIG. 7A there is shown a hierarchical representation 702 of the file-system 212 (of FIG. 2).
  • FIG. 7B there is shown a linear representation 706 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
  • the hierarchical representation 702 of the file-system 212 is similar to the hierarchical representation 402 (of FIG. 4A) of the file-system 212. Furthermore, the hierarchical representation 702 corresponds to a hierarchical representation before the move of a directory, such as the directory 1 (i.e., Dir 1) and its associated plurality of files, such as the file 2, file 4 and file 5 and the directory 3 (i.e., Dir 1.1), represented by a dashed box.
  • the other hierarchical representation 704 of the file-system 212 corresponds to a hierarchical representation after the move of the directory, such as the directory 1 (i.e., Dir 1) under the directory 2 (i.e., Dir 2).
  • due to the move, the sequential ID of the parent directory that holds the directory 1 changes.
  • before the move, the sequential ID of the parent directory that holds the directory 1 is 1, which corresponds to the root directory (i.e., Root).
  • after the move, the sequential ID of the parent directory that holds the directory 1 is 4, which corresponds to the directory 2 (i.e., Dir 2).
  • the moving of the directory from one FS object to the other FS object in the object-based storage 300 is performed in five steps.
  • one or more FS objects are searched in the object-based storage 300 that includes the directory 1 (i.e., Dir 1) before the move using the sequential ID of an old parent directory (i.e., root directory).
  • the searching is performed in form of a binary search using the key pair of the old position, that is “<Old parent ID (MSB), Dir ID (LSB)>”, where the old parent ID forms the most significant bits (MSB) and the directory ID forms the least significant bits (LSB) of the key.
  • the searching is performed either in a local (cache) table or by using the names of the one or more FS objects stored in the object-based storage 300.
  • each of the object 1 and the object 2 is searched with their respective names and the object 1 is found with an entry that corresponds to the directory 1 (i.e., Dir 1) with the sequential ID of “2” and the sequential ID of the old parent directory as “1”, before the move.
  • the entry that corresponds to the directory 1 (i.e., Dir 1) is deleted from the object 1.
  • one or more FS objects are searched in the object-based storage 300 that includes the directory 1 (i.e., Dir 1) after the move using a sequential ID of a new parent directory (i.e., directory 2).
  • a new entry that corresponds to the directory 1 (i.e., Dir 1) along with the sequential ID of the new parent directory (i.e., directory 2) is added in the object 2.
  • each of the object 1 and object 2, which gets modified due to move of the directory 1 (i.e., Dir 1) from the object 1 to the object 2 is uploaded in the object-based storage 300.
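The five steps of the move in FIGs. 7A and 7B can be sketched as below. The sketch assumes that FS objects are kept as sorted lists of `(parent ID, sequential ID, name)` tuples and that no object becomes empty during the move; `find_object` and `move_entry` are hypothetical helper names. Note that only the entry of the moved directory changes: the entries of its children keep the same parent ID and are untouched.

```python
import bisect


def find_object(objects, key):
    """Binary search over the first key pair of each FS object."""
    first_keys = [obj[0][:2] for obj in objects]  # assumes no empty object
    return max(bisect.bisect_right(first_keys, key) - 1, 0)


def move_entry(objects, name, seq_id, old_parent, new_parent):
    """Move one entry; return the indices of the FS objects to re-upload."""
    src = find_object(objects, (old_parent, seq_id))         # step 1: search old position
    objects[src].remove((old_parent, seq_id, name))          # step 2: delete old entry
    dst = find_object(objects, (new_parent, seq_id))         # step 3: search new position
    bisect.insort(objects[dst], (new_parent, seq_id, name))  # step 4: add new entry
    return {src, dst}                                        # step 5: upload these objects


objects = [
    [(0, 1, "Root"), (1, 2, "Dir 1"), (1, 3, "File 1"), (1, 4, "Dir 2")],
    [(2, 5, "File 2"), (2, 6, "Dir 1.1"), (4, 7, "File 3"),
     (6, 8, "File 4"), (6, 9, "File 5")],
]

# Dir 1 (ID 2) moves from Root (ID 1) to Dir 2 (ID 4).
touched = move_entry(objects, "Dir 1", 2, old_parent=1, new_parent=4)
print(sorted(touched), objects[1][:3])
```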
  • FIGs. 8A and 8B collectively illustrate an exemplary implementation scenario to maintain an explicit list of directory IDs in each FS object of an object-based storage, in accordance with an embodiment of the present disclosure.
  • FIGs. 8A and 8B are described in conjunction with elements from FIGs. 1, 2, 3, 4A-4B, 5A-5B, 6A-6B, and 7A-7B.
  • FIG. 8A there is shown a hierarchical representation 802 of the file-system 212 (of FIG. 2).
  • FIG. 8B there is shown a linear representation 804 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
  • the hierarchical representation 802 of the file-system 212 is similar to the hierarchical representation 402 (of FIG. 4A) of the file-system 212.
  • the linear representation 804 of the file-system 212 is similar to the linear representation 404 (of FIG. 4B) except that the linear representation 804 includes an additional information in form of an explicit list of directory IDs stored in each FS object.
  • the one or more FS objects stored in the object-based storage 300 can be regenerated locally after a change (e.g., addition, deletion, or movement of a file or a directory) by using multiple “ReadDir” calls on the local file-system.
  • One “ReadDir” call is used for one directory.
  • most file-systems support “ReadDir” using merely the directory ID (without requiring the full path); however, these file-systems do not support “ReadDir” by a range of directory IDs, that is, for all the directory IDs in a given range.
  • the solution for such file-systems is to store the explicit list of directory IDs inside each FS object in the object-based storage 300, in form of FS object metadata or FS object data, or in a local cache table (if available). In this way, when an object (or an FS object) has to be re-generated, the exact list of directory IDs is available either from the local (cache) table or from the object itself. Fetching the list of directory IDs from the object in the object-based storage 300 does not require reading the object data but only the metadata, which is a fast and relatively efficient operation in contrast to fetching the entire object.
  • a pre-defined limit (e.g., 10) is used to limit the maximum number of different directories that are allowed to be packed together.
  • the object 1 in the linear representation 804 includes an explicit list of directory IDs as X-Dirs: 0,1.
  • object 2 in the linear representation 804 includes an explicit list of directory IDs as X-Dirs: 2, 4, 6.
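The explicit directory list can be derived directly from the entries of each FS object, as the following sketch suggests. The metadata key `X-Dirs` is taken from the label used above; storing it as a comma-separated string, the `(parent ID, sequential ID, name)` tuple layout, and the helper names are assumptions for illustration.

```python
MAX_DIRS_PER_OBJECT = 10  # the example limit mentioned in the text


def directory_ids(fs_object):
    """Distinct parent-directory IDs referenced by an FS object's entries."""
    return sorted({parent_id for parent_id, _seq_id, _name in fs_object})


def object_metadata(fs_object):
    """Build the metadata that would be attached to the FS object on upload."""
    dirs = directory_ids(fs_object)
    if len(dirs) > MAX_DIRS_PER_OBJECT:
        raise ValueError("too many directories packed into one FS object")
    return {"X-Dirs": ",".join(str(d) for d in dirs)}


object_1 = [(0, 1, "Root"), (1, 2, "Dir 1"), (1, 3, "File 1"), (1, 4, "Dir 2")]
object_2 = [(2, 5, "File 2"), (2, 6, "Dir 1.1"), (4, 7, "File 3"),
            (6, 8, "File 4"), (6, 9, "File 5")]

print(object_metadata(object_1))
print(object_metadata(object_2))
```

Each ID in the list corresponds to one “ReadDir” call when the object is regenerated, so the limit also bounds the number of calls per object.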
  • a plurality of changes can be made to the plurality of files and the plurality of directories of the file-system 212 and the plurality of changes are stored in a log of changes in order to avoid frequent changes to large FS objects.
  • the plurality of changes is stored in the log of changes so that the FS objects are updated after a predetermined period of time. Since every FS object (e.g., the object 1 and the object 2 of the linear representation 804 in the object-based storage 300) includes a plurality of entries, a single change in the file-system 212 requires the entire object to be overwritten. Moreover, the object-based storage 300 does not support modifying an object.
  • the log of changes can be used.
  • re-playing the log is just a step-by-step execution of the computer-implemented method 100.
  • an optional preprocessing of the log reduces the number of “ReadDir” calls by sorting the log entries based on the directory ID and then coalescing them into a single “ReadDir” call per directory.
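The optional log preprocessing can be illustrated with a short sketch. The log format used here, a list of `(directory ID, description)` pairs, and the sample changes are hypothetical; the point is only that sorting by directory ID and grouping leaves one “ReadDir” call per affected directory.

```python
from itertools import groupby


def coalesce(log):
    """Group logged changes by directory ID; each group needs one ReadDir call."""
    # sorted() is stable, so changes within one directory keep their log order.
    ordered = sorted(log, key=lambda change: change[0])
    return {
        dir_id: [description for _, description in group]
        for dir_id, group in groupby(ordered, key=lambda change: change[0])
    }


# Hypothetical log of changes: (directory ID, description).
LOG = [
    (4, "add File 6"),
    (2, "delete File 2"),
    (4, "delete File 3"),
    (4, "delete File 6"),
]

plan = coalesce(LOG)
print(f"{len(LOG)} logged changes -> {len(plan)} ReadDir calls")
```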

Abstract

A method of backing up a file-system onto an object storage system. The method includes receiving a hierarchical representation of the file-system which includes a plurality of files and a plurality of directories. The method includes assigning a sequential ID for each file and directory in the file-system and generating a representation of the file-system including an entry for each file and directory in the file-system, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory. The method includes storing one or more file-system objects in the object storage system, and storing the plurality of files as one or more file objects in the object storage system. The method efficiently preserves a consistent hierarchical view of the file-system inside the object storage system.

Description

METHOD OF BACKING UP FILE-SYSTEM ONTO OBJECT STORGAE SYSTEM AND DATA MANAGEMENT MODULE
TECHNICAL FIELD
The present disclosure relates generally to the field of data management systems; and, more specifically, to a computer-implemented method of backing up a file-system onto an object storage system, a data management module and an object-based storage comprising the data management module.
BACKGROUND
Typically, a file-system is a computer data storage architecture that manages data as a collection of files and directories. The directories allow a user to group the files into separate collections. The directory structures may be either flat (i.e., linear) or hierarchical (i.e., nonlinear), where directories may contain sub-directories as well. Similarly, there is another computer data storage architecture, named an object storage system, that manages data as objects, in contrast to other storage architectures, such as the typical file-system, which manages the data as a file hierarchy. Nowadays, the majority of organizations use the object storage system to back up their network attached storage (NAS) systems. However, the underlying architectures of the typical file-system and the object storage system are significantly different; therefore, there persists a technical problem of how to consistently represent one of the main features of the typical file-system, such as a hierarchical representation, in the object storage system, which has a linear representation.
Currently, a few methods have been proposed for the consistent representation of the hierarchical view of the typical file-system inside the object storage system. For example, a method based on a built-in mechanism of the object storage system allows a user to use prefixes to organize the data that the user stores in the object storage system. A prefix value is similar to a directory name and enables the user to group similar objects together in a bucket. When a file is uploaded, the user can use prefixes to organize the data. The use of a prefix limits the results to only those keys that begin with the specified prefix. The delimiter causes a list operation to roll up all the keys that share a common prefix into a single summary list result. The purpose of the prefix and delimiter parameters is to help the user organize and then browse the keys hierarchically. However, there are certain limitations associated with the method. For example, a list request with the delimiter allows the user to browse the hierarchy at just one level, while skipping over and summarizing the keys (potentially millions of keys) nested at deeper levels. Another limitation of the method is the difficulty of maintaining a consistent hierarchy when a change happens in the hierarchical representation. For example, if a directory moves to another directory, all the objects under the moved directory have to be changed, because the object name contains the full path in the hierarchy. Moreover, the number of objects in a sub-tree can be huge (e.g., millions), which makes this method impractical.
There is another method based on maintaining the hierarchy in a separate view, such that for each directory there is an object representing the directory. Additionally, each file or directory has a unique identification (ID) in the typical file-system. An advantage of this method is that the hierarchy is maintained separately, which means that when a directory moves to another directory, the number of objects that have to be changed is minimal. However, a limitation of this method is that it generates many very small objects in the object storage system, because small directories are stored in small objects (e.g., a few KBs or even less). A huge number of objects in the object storage system may cause significant performance degradation due to a large index, as well as high cost. Object storage providers charge a minimum cost for each object, regardless of the object size; therefore, a large number of objects raises the cost as well. Thus, there exists a technical problem of how to efficiently preserve a consistent hierarchical view of the typical file-system inside the object storage system.
Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the conventional methods used for the consistent representation of the hierarchical view of the typical file-system inside the object storage system.
SUMMARY
The present disclosure provides a computer-implemented method of backing up a file-system onto an object storage system, a data management module and an object-based storage comprising the data management module. The present disclosure provides a solution to the existing problem of how to efficiently preserve a consistent hierarchical view of a typical file-system inside an object storage system. An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in prior art, and provide an improved computer-implemented method of backing up a file-system onto an object storage system, an improved data management module and an improved object-based storage comprising the improved data management module, for efficiently preserving the consistent hierarchical view of the typical file-system inside the object-based storage without an overhead of small size objects (too small) and large size objects (too large).
The object of the present disclosure is achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.
According to an aspect of the present disclosure, there is provided a computer-implemented method of backing up a file-system onto an object storage system. The computer-implemented method comprises receiving a plurality of files from the file-system and a hierarchical representation for the file-system which includes the plurality of files and a plurality of directories. The computer-implemented method further comprises assigning a sequential ID for each file and directory in the file-system and generating a representation of the file-system including an entry for each file and directory in the file-system, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory. The computer-implemented method further comprises storing one or more file-system, FS, objects in the object storage system, where each FS object contains a plurality of entries of the representation and storing the plurality of files as one or more file objects in the object storage system.
The disclosed computer-implemented method efficiently preserves a consistent hierarchical view of the file-system inside the object storage system without an overhead of small size (e.g., too small) objects and large size (e.g., too large) objects. Alternatively stated, the disclosed computer-implemented method provides an efficient way of backing up the file-system onto the object storage system.
In an implementation form, the entries are ordered firstly by the sequential ID of the parent directory and secondly by the sequential ID of the file or directory.
The ordering (e.g., linear ordering) of the entries enables a fast and an efficient searching of one or more FS objects in the object storage system.
In a further implementation form, the computer-implemented method further comprises, in response to a change to one or more elements of the file-system, searching the object storage system for one or more FS objects storing representations which include the changed elements. The computer-implemented method further comprises, in response to a change to one or more elements of the file-system, identifying the sequential ID of each parent directory listed in each representation and generating one or more replacement representations by reading each identified parent directory in the file-system. The computer-implemented method further comprises, in response to a change to one or more elements of the file-system, storing the replacement representations as replacement FS objects in the object storage system.
The disclosed method enables fast and reliable changes to the one or more FS objects in the object storage system.
In a further implementation form, the representation is stored in a plurality of FS objects, where the sequential ID of the parent directory and the sequential ID of the file or directory form a key pair and name of each FS object is based on a key pair for the first entry.
In a further implementation form, the computer-implemented method further comprises dividing the representation stored in one FS object into two or more FS objects if the number of entries in the representation is greater than a predefined upper threshold.
The division of the representations stored in one FS object into two or more FS objects reduces an overhead of large (e.g., too large) size objects.
In a further implementation form, the computer-implemented method further comprises combining an FS object with an adjacent FS object if the number of entries in the representation is less than a predefined lower threshold.
The combination of the FS object with the adjacent FS object reduces an overhead of small (e.g., too small) size objects.
In a further implementation form, searching the object storage system includes performing a binary search based on the FS object name.
The binary search based on the FS object name enables a fast searching of the object storage system.
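A binary search over FS object names can be sketched as follows, assuming that names have the form `<parent ID>.<ID>` as in the “4.10” example of the detailed description. The helper names `parse_name` and `locate` are hypothetical. The names have to be compared as pairs of integers rather than as strings, because a plain string comparison would place “4.10” before “4.7”.

```python
import bisect


def parse_name(name):
    """Turn an FS object name such as "4.10" into the key pair (4, 10)."""
    parent_id, seq_id = name.split(".")
    return (int(parent_id), int(seq_id))


def locate(names, key):
    """Return the name of the FS object whose key range covers `key`."""
    keys = sorted(parse_name(name) for name in names)
    index = bisect.bisect_right(keys, key) - 1  # last object starting at or before key
    if index < 0:
        return None
    return "%d.%d" % keys[index]


NAMES = ["0.1", "2.5", "4.10"]  # e.g., from a local cache table or an object listing

print(locate(NAMES, (4, 7)))   # entry of File 3: parent ID 4, ID 7
print(locate(NAMES, (6, 8)))   # entry of File 4: parent ID 6, ID 8
```

Because only names are inspected, the search can run against a local cache table or a plain listing of object names, without downloading any object data.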
In a further implementation form, the sequential ID of each parent directory is stored as FS object metadata, and identifying the sequential ID of each parent directory listed in each representation is based on the FS object metadata.
The FS object metadata enables a fast and relatively efficient way to re-generate a FS object.
In a further implementation form, the change to one or more elements of the file-system includes deletion or addition of a file or directory.
In a further implementation form, the change to one or more elements of the file-system includes moving a file or directory, and searching the object storage system includes search for one or more FS objects storing representations which include the original location and the final location of the file or directory.
When a file or directory is moved from one FS object to another FS object, only a minimal number of changes is required.
In a further implementation form, the computer-implemented method further comprises recording a plurality of changes to elements of the file-system in a log of changes and replacing the corresponding FS objects after a predetermined period of time.
The plurality of changes to elements of the file-system are stored in the log of changes in order to avoid frequent changes of large file-system objects.
In another aspect, the present disclosure provides a data management module comprising an input unit configured to receive a plurality of files from the file-system and a hierarchical representation for the file-system which includes the plurality of files and a plurality of directories. The data management module further comprises a processing unit configured to assign a sequential ID for each file and directory in the file-system and generate a representation of the file-system including an entry for each file and directory in the file-system, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory. The data management module further comprises an object generation unit configured to store one or more file-system, FS, objects in the object storage system, where each FS object contains a plurality of entries of the representation and store the plurality of files as one or more file objects in the object storage system.
The data management module achieves all the advantages and effects of the method of the present disclosure, after execution of the method.
In yet another aspect, the present disclosure provides an object-based storage comprising the data management module.
The object-based storage preserves a consistent hierarchical view of the file-system without an overhead of small size (e.g., too small) objects and large size (e.g., too large) objects.
In yet another aspect, the present disclosure provides a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method.
The processor achieves all the advantages and effects of the method after execution of the method.
It is to be appreciated that all the aforementioned implementation forms can be combined.
It has to be noted that all devices, elements, circuitry, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.
Additional aspects, advantages, features and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
FIG. 1 is a flowchart of a computer-implemented method of backing up a file-system onto an object storage system, in accordance with an embodiment of the present disclosure;
FIG. 2 is a block diagram that illustrates various exemplary components of a data management module, in accordance with an embodiment of the present disclosure;
FIG. 3 is a block diagram that illustrates various exemplary components of an object-based storage, in accordance with an embodiment of the present disclosure;
FIGs. 4A and 4B collectively illustrate an exemplary implementation scenario of backing up a file-system onto an object storage system, in accordance with an embodiment of the present disclosure;
FIGs. 5A and 5B collectively illustrate an exemplary implementation scenario of deleting a file from a file-system and updating a file-system (FS) object in an object-based storage, in accordance with an embodiment of the present disclosure;
FIGs. 6A and 6B collectively illustrate an exemplary implementation scenario of adding a new file to a file-system and updating a FS object in an object-based storage, in accordance with an embodiment of the present disclosure;
FIGs. 7A and 7B collectively illustrate an exemplary implementation scenario of moving a directory from one FS object to another FS object of an object-based storage, in accordance with an embodiment of the present disclosure; and
FIGs. 8A and 8B collectively illustrate an exemplary implementation scenario to maintain an explicit list of directory IDs in each FS object of an object-based storage, in accordance with an embodiment of the present disclosure.
In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.
DETAILED DESCRIPTION OF EMBODIMENTS
The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
FIG. 1 is a flowchart of a computer-implemented method of backing up a file-system onto an object storage system, in accordance with an embodiment of the present disclosure. With reference to FIG. 1, there is shown a computer-implemented method 100 that includes steps 102-to-110. The computer-implemented method 100 is executed by a data management module, described in detail, for example, in FIG. 2.
The computer-implemented method 100 efficiently preserves a consistent hierarchical view of a file-system inside an object storage system without an overhead of small size (e.g., too small) objects and large size (e.g., too large) objects. Alternatively stated, the computer-implemented method 100 provides an efficient way of backing up a file-system onto an object storage system, as described in the following steps.
At step 102, the computer-implemented method 100 comprises receiving a plurality of files from the file-system and a hierarchical representation for the file-system which includes the plurality of files and a plurality of directories. The plurality of files, such as a file 1, file 2, file 3, and the like, are received from the file-system. Each file of the plurality of files is arranged in a hierarchical order, for example, the file 1 may be arranged at a higher level than the file 2 and the file 3, and the file 2 and the file 3 may be arranged at the same level. Such an arrangement of the plurality of files constitutes the hierarchical representation of the file-system. The hierarchical representation of the file-system also includes the plurality of directories, such as a Dir 1, Dir 2, and the like. An exemplary implementation scenario of the hierarchical representation of the file-system is described in detail, for example, in FIG. 4A.
At step 104, the computer-implemented method 100 further comprises assigning a sequential ID for each file and directory in the file-system. The sequential ID (e.g., 1, 2, 3, and the like) is assigned to each file of the plurality of files and each directory of the plurality of directories in the file-system.
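The assignment of sequential IDs can be illustrated with a minimal sketch. The disclosure does not state the traversal order; the breadth-first order used here is an assumption chosen because it reproduces the IDs 1 to 9 of the example of FIG. 4A. The function name `assign_sequential_ids`, the nested-dictionary input format, and the placement of File 3, File 4 and File 5 in the example tree are hypothetical.

```python
from collections import deque


def assign_sequential_ids(root_name, tree):
    """Return {path: sequential ID}, numbering from 1.

    `tree` maps a child name to a nested dict (directory) or None (file).
    Breadth-first numbering is an assumption, not taken from the disclosure.
    """
    ids = {}
    queue = deque([(root_name, tree)])
    while queue:
        path, children = queue.popleft()
        ids[path] = len(ids) + 1  # next unused sequential ID
        for name, child in (children or {}).items():
            queue.append((f"{path}/{name}", child))
    return ids


# Hypothetical layout: where File 3, File 4 and File 5 live is assumed.
example_tree = {
    "Dir 1": {"File 2": None, "Dir 1.1": {"File 4": None, "File 5": None}},
    "File 1": None,
    "Dir 2": {"File 3": None},
}
example_ids = assign_sequential_ids("Root", example_tree)
```

Any other deterministic numbering would serve equally well, provided each file and directory receives a unique ID.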
At step 106, the computer-implemented method 100 further comprises generating a representation of the file-system including an entry for each file and directory in the file-system, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory. For example, a linear (e.g., tabular) representation of the file-system is generated that includes the entry for each file and directory in the file-system. Alternatively stated, each file and directory in the file-system is linearly ordered based on the entry. Moreover, each entry includes the name of the file or directory, the sequential ID of the file or a sub-directory and the sequential ID of the parent directory that holds the file or the sub-directory. In some examples, in addition to the name, any file/directory attributes can be attached to the entry (e.g., creation time, last-modified time, permissions, etc.).
In accordance with an embodiment, the entries are ordered firstly by the sequential ID of the parent directory and secondly by the sequential ID of the file or directory. Each of the entries generated for each file and directory is linearly ordered. The linear ordering is based on a composite key in which the sequential ID of the parent directory is used as the most significant part (MSB) and the sequential ID of the file or directory is used as the least significant part (LSB). The linear ordering may also be represented as “<Directory ID (MSB), file ID (LSB)>”.
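The ordering described above amounts to a lexicographic sort on the pair of IDs. The following is a minimal sketch; the `Entry` type and the `linearize` function are hypothetical names introduced only for illustration, and the sample entries follow the example of FIGs. 4A and 4B.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    parent_id: int  # most significant part of the key
    seq_id: int     # least significant part of the key
    name: str


def linearize(entries):
    """Order entries firstly by parent ID and secondly by their own ID."""
    return sorted(entries, key=lambda entry: (entry.parent_id, entry.seq_id))


unordered = [
    Entry(2, 6, "Dir 1.1"),
    Entry(1, 3, "File 1"),
    Entry(0, 1, "Root"),
    Entry(2, 5, "File 2"),
    Entry(1, 4, "Dir 2"),
    Entry(1, 2, "Dir 1"),
]
ordered_names = [entry.name for entry in linearize(unordered)]
```

With this ordering, all children of one directory are contiguous in the linear representation, so a directory listing maps to a single key range.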
At step 108, the computer-implemented method 100 further comprises storing one or more file-system, FS, objects in the object storage system, where each FS object contains a plurality of entries of the representation. The plurality of entries are grouped into one or more file-system (FS) objects depending on a pre-defined object size. In this way, one or more FS objects are generated and stored in the object storage system.
In accordance with an embodiment, the representation is stored in a plurality of FS objects, where the sequential ID of the parent directory and the sequential ID of the file or directory form a key pair and the name of each FS object is based on the key pair of its first entry. The representation including the plurality of entries for each file and directory is stored in the form of one or more FS objects, described in detail, for example, in FIG. 4B. Moreover, the sequential ID of the parent directory that holds the file or directory and the sequential ID of the file or directory form the key pair. The name of each of the one or more FS objects is based on the key pair of the first entry in that object, described in detail, for example, in FIG. 4B.
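A minimal sketch of grouping ordered entries into FS objects and naming each object after the key pair of its first entry is given below. The function `pack_fs_objects`, the fixed `max_entries` limit, and the `"<parent ID>.<ID>"` name format are assumptions made for illustration. The sketch derives every name mechanically from the first entry, so the first object is named “0.1” here, whereas FIG. 4B names its first object “0.0”.

```python
def pack_fs_objects(sorted_entries, max_entries):
    """Group ordered (parent_id, seq_id, name) entries into named FS objects."""
    fs_objects = {}
    for start in range(0, len(sorted_entries), max_entries):
        chunk = sorted_entries[start:start + max_entries]
        parent_id, seq_id, _name = chunk[0]
        # The object name is the key pair of the first entry it holds.
        fs_objects[f"{parent_id}.{seq_id}"] = chunk
    return fs_objects


entries = [
    (0, 1, "Root"),
    (1, 2, "Dir 1"),
    (1, 3, "File 1"),
    (1, 4, "Dir 2"),
    (2, 5, "File 2"),
    (2, 6, "Dir 1.1"),
]
fs_objects = pack_fs_objects(entries, max_entries=4)
```

A real implementation could equally group by a pre-defined object size in bytes instead of an entry count.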
At step 110, the computer-implemented method 100 further comprises storing the plurality of files as one or more file objects in the object storage system. In this way, the content of the plurality of files is stored in the form of the one or more file objects, while the entries describing the plurality of files and the plurality of directories are stored in the form of the one or more FS objects in the object storage system.
In accordance with an embodiment, the computer-implemented method 100 further comprises, in response to a change to one or more elements of the file-system, searching the object storage system for one or more FS objects storing representations which include the changed elements. The computer-implemented method 100 further comprises, in response to the change to one or more elements of the file-system, identifying the sequential ID of each parent directory listed in each representation and generating one or more replacement representations by reading each identified parent directory in the file-system. The computer-implemented method 100 further comprises, in response to the change to one or more elements of the file-system, storing the replacement representations as replacement FS objects in the object storage system. In other words, when one or more elements of the file-system change, the one or more FS objects that include the changed elements are searched for. Thereafter, the sequential ID of each parent directory listed in each representation that includes the changed elements is identified. Furthermore, the one or more replacement representations are generated by reading each identified parent directory in the file-system, and the generated replacement representations are stored as replacement FS objects in the object storage system. The change to one or more elements of the file-system is described in detail, for example, in FIGs. 5A-5B, 6A-6B, and 7A-7B.
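The generation of a replacement representation can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `read_dir` is a hypothetical callable standing in for a directory-listing API of the file-system, and the `first_key` and `next_first_key` parameters are an assumed way of trimming the result to the key range of the FS object being replaced.

```python
def regenerate_fs_object(parent_ids, read_dir, first_key, next_first_key=None):
    """Rebuild the entries of one FS object by re-reading its parent directories.

    parent_ids:     sequential IDs of the parent directories listed in the object
    read_dir:       hypothetical callable, directory ID -> iterable of (seq_id, name)
    first_key:      key pair of the first entry of the object being replaced
    next_first_key: key pair that starts the following object, if any
    """
    entries = []
    for parent_id in sorted(parent_ids):
        for seq_id, name in read_dir(parent_id):
            key = (parent_id, seq_id)
            if key < first_key:
                continue  # belongs to an earlier FS object
            if next_first_key is not None and key >= next_first_key:
                continue  # belongs to a later FS object
            entries.append((parent_id, seq_id, name))
    entries.sort()
    return entries


# Assumed state of the file-system after File 3 (ID 7) has been deleted.
listing = {
    2: [(5, "File 2"), (6, "Dir 1.1")],
    4: [],
    6: [(8, "File 4"), (9, "File 5")],
}
replacement = regenerate_fs_object([2, 4, 6], lambda d: listing[d], first_key=(2, 5))
```

One directory read is issued per identified parent directory, so the cost of a replacement grows with the number of directories packed into the object rather than with the size of the file-system.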
In accordance with an embodiment, the change to one or more elements of the file-system includes deletion or addition of a file or directory. In case of the change to one or more elements of the file-system, a file or a directory is either added to or deleted from the file-system. The deletion of the file or the directory from the file-system is described in detail, for example, in FIGs. 5A and 5B. The addition of a new file or directory to the file-system is described in detail, for example, in FIGs. 6A and 6B.
In accordance with an embodiment, the change to one or more elements of the file-system includes moving a file or directory, and searching the object storage system includes searching for one or more FS objects storing representations which include the original location and the final location of the file or directory. In case of moving the file or directory, the one or more FS objects covering the location before the move and the location after the move are searched for. The entry corresponding to the moved file or directory is deleted from the FS object that originally included it, and a corresponding entry is generated in the other FS object to which the file or directory is moved. In this way, two FS objects are updated in the object storage system when the file or directory is moved. An exemplary implementation scenario is described in detail, for example, in FIGs. 7A and 7B.
In accordance with an embodiment, the change to one or more elements of the file-system includes an update to the file or directory attributes only, i.e., no change to the hierarchical representation, and searching the object storage system includes searching for one or more FS objects storing representations which include the location of the file or directory. The entry corresponding to the updated file or directory is updated in the FS object that includes it.
In accordance with an embodiment, searching the object storage system includes performing a binary search based on the FS object name. The searching for the one or more FS objects in the object storage system is performed as the binary search using the key pair of the first entry in the FS object, which enables a fast searching in the object storage system.
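Because each FS object is named after the key pair of its first entry, locating the object that covers a given entry reduces to a binary search over the sorted first keys. The sketch below uses Python's standard `bisect` module; the helper names `parse_name` and `find_fs_object` and the `"<parent ID>.<ID>"` name format are assumptions. The object names “0.0” and “2.5” are those of the example of FIG. 4B.

```python
import bisect


def parse_name(name):
    """Turn an FS object name such as '2.5' into the key pair (2, 5)."""
    parent_id, seq_id = name.split(".")
    return int(parent_id), int(seq_id)


def find_fs_object(object_names, parent_id, seq_id):
    """Return the name of the FS object whose key range covers the given key pair."""
    names = sorted(object_names, key=parse_name)
    first_keys = [parse_name(name) for name in names]
    # The covering object is the last one whose first key is <= the searched key.
    index = bisect.bisect_right(first_keys, (parent_id, seq_id)) - 1
    if index < 0:
        raise KeyError((parent_id, seq_id))
    return names[index]


covering_file_1 = find_fs_object(["2.5", "0.0"], 1, 3)
covering_dir_1_1 = find_fs_object(["2.5", "0.0"], 2, 6)
```

The sketch sorts the names on every call for brevity; a local (cache) table of names would normally be kept sorted so that each lookup costs only the logarithmic search.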
In accordance with an embodiment, the sequential ID of each parent directory is stored as FS object metadata, and identifying the sequential ID of each parent directory listed in each representation is based on the FS object metadata. An explicit list of the sequential ID of each parent directory is stored as the FS object metadata inside the FS object. For example, if an object needs to be regenerated, the exact list of the sequential ID of each parent directory is available in the form of the FS object metadata. Retrieving the explicit list from the object in the object storage system does not require reading the entire object but only the metadata, which is a fast and relatively efficient operation. Alternatively, in some examples, the explicit list of the sequential ID of each parent directory may be stored as FS object data inside the FS object. The explicit list of the sequential ID of each parent directory is described in detail, for example, in FIGs. 8A and 8B.
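A minimal sketch of deriving and reading back such an explicit list is given below. The metadata key `parent-directory-ids` and the comma-separated encoding are assumptions made for illustration; the disclosure does not specify a format. The sample entries assume that File 4 resides in Dir 1.1.

```python
METADATA_KEY = "parent-directory-ids"  # hypothetical metadata key


def build_fs_object_metadata(entries):
    """Collect the distinct parent directory IDs of (parent_id, seq_id, name) entries."""
    parent_ids = sorted({parent_id for parent_id, _seq_id, _name in entries})
    return {METADATA_KEY: ",".join(str(parent_id) for parent_id in parent_ids)}


def read_parent_ids(metadata):
    """Recover the list of parent directory IDs without touching the object data."""
    value = metadata.get(METADATA_KEY, "")
    return [int(part) for part in value.split(",")] if value else []


metadata = build_fs_object_metadata(
    [(2, 5, "File 2"), (2, 6, "Dir 1.1"), (6, 8, "File 4")]
)
parent_ids = read_parent_ids(metadata)
```

Storing the list as a short string keeps it within the small user-defined metadata that object storage systems typically attach to an object.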
In accordance with an embodiment, the computer-implemented method 100 further comprises dividing the representation stored in one FS object into two or more FS objects if the number of entries in the representation is greater than a predefined upper threshold. For example, the linear representation of the file-system is stored in the form of one FS object in the object storage system. If a new file or directory is added to the file-system and the number of entries in the FS object consequently becomes greater than the predefined upper threshold, the entries in the FS object may be divided and stored in two or more FS objects.
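The division can be sketched as follows. Splitting into two halves is an assumption; the disclosure only requires that the entries end up in two or more FS objects. The function name `split_if_too_large` is hypothetical.

```python
def split_if_too_large(entries, upper_threshold):
    """Return one list of entries, or two halves if the threshold is exceeded."""
    if len(entries) <= upper_threshold:
        return [entries]
    middle = len(entries) // 2
    # Each half keeps the original order, so each new FS object can be
    # named after the key pair of its own first entry.
    return [entries[:middle], entries[middle:]]


halves = split_if_too_large([(2, 5), (2, 6), (4, 7), (6, 8), (6, 9)], upper_threshold=4)
```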
In accordance with an embodiment, the computer-implemented method 100 further comprises combining an FS object with an adjacent FS object if the number of entries in the representation is less than a predefined lower threshold. The threshold may be, for example, a threshold number of entries or a threshold object size in bytes. For example, the linear representation of the file-system is stored in the form of one FS object in the object storage system. If a file or directory is deleted from the file-system and the number of entries in the FS object consequently becomes less than the predefined lower threshold, the FS object may be combined with the adjacent (e.g., next or previous) FS object. In some examples, the FS object may be combined with multiple consecutive adjacent small FS objects.
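The combination can be sketched as a single pass over the FS objects in key order. Merging a small object into its predecessor is an assumption (the next object would be an equally valid choice), and the function name `merge_small_objects` is hypothetical. The sketch does not re-check an upper threshold after merging, which a complete implementation would likely do.

```python
def merge_small_objects(objects, lower_threshold):
    """Combine each under-sized FS object with its neighbour.

    `objects` is a list of entry lists, already in key order.
    """
    merged = []
    for entries in objects:
        previous_is_small = bool(merged) and len(merged[-1]) < lower_threshold
        current_is_small = len(entries) < lower_threshold
        if merged and (previous_is_small or current_is_small):
            merged[-1] = merged[-1] + list(entries)
        else:
            merged.append(list(entries))
    return merged


combined = merge_small_objects([["a", "b", "c", "d"], ["e"], ["f", "g", "h", "i"]], 2)
```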
In accordance with an embodiment, the computer-implemented method 100 further comprises recording a plurality of changes to elements of the file-system in a log of changes and replacing the corresponding FS objects after a predetermined period of time. The plurality of changes to the elements (e.g., the plurality of files and the plurality of directories) of the file-system are stored in the log of changes in order to avoid frequent changes to large FS objects. Alternatively stated, the plurality of changes are stored in the log of changes so that the corresponding FS objects are updated after the predetermined period of time in a fast and an efficient way.
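A minimal sketch of such a log of changes is shown below. The class name `ChangeLog`, its methods, and the representation of a change as a tuple are hypothetical. The clock is injectable only so that the behaviour can be exercised without waiting.

```python
import time


class ChangeLog:
    """Buffers changes and releases them once a flush interval has elapsed."""

    def __init__(self, flush_interval_seconds, clock=time.monotonic):
        self._interval = flush_interval_seconds
        self._clock = clock
        self._last_flush = clock()
        self._pending = {}  # FS object name -> list of recorded changes

    def record(self, fs_object_name, change):
        """Append a change to the log instead of rewriting the FS object."""
        self._pending.setdefault(fs_object_name, []).append(change)

    def drain_if_due(self):
        """Return pending changes grouped by FS object, or {} if not yet due."""
        if self._clock() - self._last_flush < self._interval:
            return {}
        drained, self._pending = self._pending, {}
        self._last_flush = self._clock()
        return drained


now = [0.0]  # fake clock for the example
log = ChangeLog(60, clock=lambda: now[0])
log.record("2.5", ("delete", 7))
log.record("2.5", ("add", 10))
too_early = log.drain_if_due()
now[0] = 61.0
due = log.drain_if_due()
```

Grouping the drained changes by FS object name means each affected object is replaced once per period, however many changes it accumulated.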
Thus, the computer-implemented method 100 efficiently preserves the consistent hierarchical view of the file-system inside an object storage system without the overhead of small size objects (i.e., too small) and large size objects (i.e., too large).
The steps 102-to-110 are only illustrative and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
FIG. 2 illustrates various exemplary components of a data management module, in accordance with an embodiment of the present disclosure. FIG. 2 is described in conjunction with elements from FIG. 1. With reference to FIG. 2, there is shown a data management module 200 that includes an input unit 202, a processing unit 204, an object generation unit 206, a memory 208 and a network interface 210. There is further shown a file-system 212.
The data management module 200 may include suitable logic, circuitry, interfaces, or code that is configured to efficiently manage the data of the file-system 212. Alternatively stated, the data management module 200 may be configured for efficiently preserving a consistent hierarchical view of the file-system 212 in an object storage system, which has a linear representation. The data management module 200 may be configured to execute the computer-implemented method 100 (of FIG. 1). Additionally, the data management module 200 may include one or more data processing facilities for storing, processing and/or sharing the plurality of files and/or plurality of directories. Furthermore, the data management module 200 may include hardware, software, firmware or a combination of these, suitable for temporarily storing and processing various information and services accessed by one or more users using one or more user equipment.
The input unit 202 may include suitable logic, circuitry, interfaces, or code that is configured to receive a plurality of files from the file-system 212. Examples of the input unit 202 may include, but are not limited to, a receiver, a receiver unit, and the like.
The processing unit 204 may include suitable logic, circuitry, interfaces, or code that is configured to assign a sequential ID for each file and directory in the file-system 212. The processing unit 204 may also be configured to execute the instructions stored in the memory 208. In an example, the processing unit 204 may be a general-purpose processor. Other examples of the processing unit 204 may include, but are not limited to, a central processing unit (CPU), a digital signal processor (DSP), a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a very long instruction word (VLIW) processor, a control unit, a state machine, a data processing unit, a graphics processing unit (GPU), and other processors or control circuitry. Moreover, the processing unit 204 may refer to one or more individual processors, processing devices, or a processing unit that is part of a machine, such as the data management module 200.
The object generation unit 206 may include suitable logic, circuitry, interfaces, or code that is configured to store one or more file-system (FS) objects in the object storage system.
The memory 208 may include suitable logic, circuitry, interfaces, or code that is configured to store data and the instructions executable by the processing unit 204. The memory 208 may also be configured to comprise the object generation unit 206. Examples of implementation of the memory 208 may include, but are not limited to, an Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, Solid-State Drive (SSD), or CPU cache memory. The memory 208 may store an operating system or other program products (including one or more operation algorithms) to operate the data management module 200.
The network interface 210 may include suitable logic, circuitry, interfaces, or code that is communicatively coupled with the input unit 202. Examples of the network interface 210 may include, but are not limited to, a data terminal, a transceiver, a facsimile machine, a virtual server, and the like.
In operation, the input unit 202 is configured to receive a plurality of files from the file-system 212 and a hierarchical representation for the file-system 212 which includes the plurality of files and a plurality of directories. The file-system 212 may include the plurality of files and the plurality of directories, described in detail, for example, in FIGs. 4A and 4B. The plurality of files and the plurality of directories of the file-system 212 are arranged hierarchically. Therefore, the input unit 202 is configured to receive the hierarchical representation of the plurality of files and the plurality of directories of the file-system 212.
The processing unit 204 is configured to assign a sequential ID for each file and directory in the file-system 212 and generate a representation of the file-system 212 including an entry for each file and directory in the file-system 212, where each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory. The sequential ID (e.g., 1, 2, 3, and so on) is assigned to each file and directory in the file-system 212. Thereafter, the entry corresponding to each file and directory is generated and arranged linearly. The entry corresponding to each file and directory includes the name of the file or directory, the sequential ID of the file or directory and the sequential ID of the parent directory that holds the file or directory. The arrangement of the entry corresponding to each file and directory is described in detail, for example, in FIGs. 4A and 4B.
The object generation unit 206 is configured to store one or more file-system (FS) objects in the object storage system, where each FS object contains a plurality of entries of the representation and store the plurality of files as one or more file objects in the object storage system. The plurality of entries corresponding to the plurality of files and the plurality of directories of the file-system 212 are stored in the form of one or more FS objects in the object storage system. In this way, the entries for the plurality of files and directories are stored as one or more FS objects, and the plurality of files themselves are stored as one or more file objects, in the object storage system.
In accordance with an embodiment, a computer readable medium comprises instructions which, when executed by a processor, cause the processor to perform the method. The processor (i.e., the processing unit 204) is configured to execute the computer-implemented method 100 (of FIG. 1).
Thus, the data management module 200 efficiently preserves the consistent hierarchical view of the file-system 212 inside an object storage system without the overhead of small size objects (i.e., too small) and large size objects (i.e., too large).
FIG. 3 illustrates various exemplary components of an object-based storage, in accordance with an embodiment of the present disclosure. FIG. 3 is described in conjunction with elements from FIGs. 1 and 2. With reference to FIG. 3, there is shown an object-based storage 300 that comprises the data management module 200 (of FIG. 2).
The object-based storage 300 may include suitable logic, circuitry, interfaces, or code that is configured to store the data in the form of one or more objects, in contrast to a conventional data storage architecture such as a typical file-system, which manages the data as a file hierarchy. The object-based storage 300 may also be referred to as an object storage system. Generally, an object storage system allows retention of massive amounts of unstructured data and is used for purposes such as storing photos, videos or files. Examples of the object storage system may include, but are not limited to, Amazon S3, Microsoft Azure Blob Storage, Google Cloud Storage, and the like. The object-based storage 300 may also be referred to as the backbone of cloud storage. Moreover, the object-based storage 300 enables public cloud storage providers to easily scale their infrastructure to exabyte scale, while keeping costs at a minimum. In contrast to conventional backup systems, such as network attached storage systems or other backup appliances, the object-based storage 300 may deliver high availability, extreme durability, and low overhead ratios to match backup value. The object-based storage 300 may also grow cost-effectively to meet new organizational requirements across an enterprise. Since the object-based storage 300 comprises the data management module 200 (of FIG. 2), the hierarchy of the file-system 212 is maintained properly.
FIGs. 4A and 4B collectively illustrate an exemplary implementation scenario of backing up a file-system onto an object storage system, in accordance with an embodiment of the present disclosure. FIGs. 4A and 4B are described in conjunction with elements from FIGs. 1, 2, and 3. With reference to FIG. 4A, there is shown a hierarchical representation 402 of the file-system 212 (of FIG. 2). With reference to FIG. 4B, there is shown a linear representation 404 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3). Each of the hierarchical representation 402 and the linear representation 404 of the file-system 212 is represented by a dashed box, which is used for illustration purposes only, and does not form a part of circuitry.
In FIG. 4A, it is shown that the file-system 212 includes a plurality of files, such as a file 1, file 2, file 3, file 4, and file 5. The file-system 212 further includes a plurality of directories, such as a root directory (also represented as root), directory 1 (also represented as Dir 1), directory 2 (also represented as Dir 2), and directory 3 (also represented as Dir 1.1). The hierarchical representation 402 of the file-system 212 represents how the plurality of files and the plurality of directories are arranged hierarchically. Furthermore, a sequential ID is assigned to each file and directory in the file-system 212. For example, the root directory (i.e., root) is assigned a sequential ID of 1. Similarly, the directory 1 (i.e., Dir 1), the file 1 and the directory 2 (i.e., Dir 2) are assigned sequential IDs of 2, 3 and 4, respectively. Similarly, the file 2, the directory 3 (i.e., Dir 1.1), the file 3, the file 4, and the file 5 are assigned sequential IDs of 5, 6, 7, 8, and 9, respectively. The file-system 212 with the hierarchical representation 402 is stored in the object-based storage 300, which has a linear representation, as shown in FIG. 4B. The linear representation 404 of the file-system 212 in the object-based storage 300 preserves the consistent hierarchical view of the file-system 212 without an overhead of small size objects (i.e., too small) and large size objects (i.e., too large).
Now referring to FIG. 4B, the linear representation 404 of the file-system 212 includes an entry for each of the plurality of files and the plurality of directories. The entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory that holds the file or directory. In an example, the entry for the root directory includes the name of the root directory as “Root”, the sequential ID of the root directory as “1” and the sequential ID of the parent directory that holds the root directory as “0”. Since the root directory is itself the topmost parent directory, the sequential ID of its parent directory is considered as “0”. In another example, the entry for the directory 1 includes the name of the directory 1 as “Dir 1”, the sequential ID of the directory 1 as “2” and the sequential ID of the parent directory that holds the directory 1 as “1”. In yet another example, the entry for the file 1 includes the name of the file 1 as “File 1”, the sequential ID of the file 1 as “3” and the sequential ID of the parent directory that holds the file 1 as “1”. Similarly, the entry for each of the plurality of files and plurality of directories is generated. Thereafter, the linear representation 404 of the file-system 212 including the entries for each of the plurality of files and plurality of directories is split into one or more file-system (FS) objects, such as an object 1 and an object 2. Each of the object 1 and the object 2 includes a plurality of entries of the linear representation 404; for example, the object 1 includes four entries and the object 2 includes five entries. The object 1 includes the entries of the root directory (i.e., Root), the directory 1 (i.e., Dir 1), the file 1 and the directory 2 (i.e., Dir 2). Similarly, the object 2 includes the entries of the file 2, the directory 3 (i.e., Dir 1.1), the file 3, the file 4 and the file 5. Each of the object 1 and the object 2 is named using the key pair of its respective first entry. For example, the object 1 is named using the key pair of the root directory as “0.0” because the root directory is the first entry in the object 1. Similarly, the object 2 is named using the key pair of the file 2 as “2.5” because the file 2 is the first entry in the object 2. In this way, the plurality of files of the file-system 212 is stored as the one or more file-system objects in the object-based storage 300 while preserving the consistent hierarchical view of the file-system 212.
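As a rough illustration of the linear representation described above, the following Python sketch builds the entries, orders them by key pair (parent ID first, then own ID) and cuts them into two FS objects. The class and function names are hypothetical, the object sizes are passed in explicitly to mirror the four/five split of the example, and the parent IDs of File 3, File 4 and File 5 are assumptions chosen to be consistent with the rest of the description. The sketch names each object mechanically from its first entry, so it yields “0.1” for the object that starts at the root entry rather than the “0.0” given in the example.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    seq_id: int     # sequential ID of the file or directory
    parent_id: int  # sequential ID of the parent directory

    @property
    def key(self):
        # Key pair: parent ID first, then the entry's own ID.
        return (self.parent_id, self.seq_id)


# Example tree. The parents of File 3, File 4 and File 5 are assumptions.
ENTRIES = [
    Entry("Root", 1, 0),
    Entry("Dir 1", 2, 1),
    Entry("File 1", 3, 1),
    Entry("Dir 2", 4, 1),
    Entry("File 2", 5, 2),
    Entry("Dir 1.1", 6, 2),
    Entry("File 3", 7, 4),
    Entry("File 4", 8, 6),
    Entry("File 5", 9, 6),
]


def split_into_objects(entries, sizes):
    """Sort entries by key pair and cut them into FS objects of the given sizes.

    Returns a dict mapping an object name ("<parent ID>.<ID>" of the first
    entry) to the list of entries held by that object.
    """
    ordered = sorted(entries, key=lambda e: e.key)
    objects, start = {}, 0
    for size in sizes:
        chunk = ordered[start:start + size]
        first = chunk[0]
        objects[f"{first.parent_id}.{first.seq_id}"] = chunk
        start += size
    return objects


fs_objects = split_into_objects(ENTRIES, sizes=[4, 5])
for name, chunk in fs_objects.items():
    print(name, [e.name for e in chunk])
```

Because the object name carries the first key pair, the names alone are enough to tell which object covers a given key range.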
FIGs. 5A and 5B collectively illustrate an exemplary implementation scenario of deleting a file from a file-system and updating a file-system (FS) object in an object-based storage, in accordance with an embodiment of the present disclosure. FIGs. 5A and 5B are described in conjunction with elements from FIGs. 1, 2, 3, 4A, and 4B. With reference to FIG. 5A, there is shown a hierarchical representation 502 of the file-system 212 (of FIG. 2). With reference to FIG. 5B, there is shown a linear representation 504 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
With reference to FIG. 5A, the hierarchical representation 502 of the file-system 212 is similar to the hierarchical representation 402 (of FIG. 4A) of the file-system 212 except that the file 3 (with the sequential ID of 7) of the plurality of files is deleted from the file-system 212.
Now referring to FIG. 5B, the deletion of the file 3 from the file-system 212 and the update of the FS object in the object-based storage 300 are performed in three steps. In a first step, the object-based storage 300 is searched for the one or more FS objects that include the deleted file (i.e., the file 3) in the linear representation 504. The searching is performed in the form of a binary search using the key pair that is comprised in the name of the one or more FS objects. Alternatively stated, the searching is performed either in a local (cache) table or by using the names of the one or more FS objects stored in the object-based storage 300. For example, in the linear representation 504, the object 1 and the object 2 are searched by their respective names and the object 2 is found to include the deleted file (i.e., the file 3). In a second step, the FS object (i.e., the object 2) that includes the deleted file (i.e., the file 3) is updated. Generally, an object-based storage does not support modification of a FS object; therefore, there are two options for updating the FS object. A first option is to fetch the FS object that includes the deleted file (i.e., the file 3) from a conventional object-based storage. Although this approach is used conventionally, several drawbacks, such as bandwidth consumption, cost of egress traffic from the conventional object storage, increased latency, and the like, are associated with it. A second option is to re-generate the entire FS object (i.e., the object 2) locally by using the “ReadDir” application programming interfaces (APIs) provided by the file-system 212. One “ReadDir” call is performed for each directory listed in the FS object. That means the sequential ID of each parent directory listed in the FS object stored in the object-based storage 300 is identified. Limiting the maximum number of directories that can be packed together (e.g., up to 10) limits the number of required calls. Additionally, an explicit list of directory IDs may also be stored in each FS object (i.e., the object 1 and the object 2). Thereafter, the one or more replacement representations are generated by reading each identified parent directory, and the generated one or more replacement representations are stored as replacement FS objects (or new FS objects) in the object-based storage 300. In a third step, if, after the deletion of the file 3, the new FS object (i.e., the object 2) includes a number of entries less than the predefined lower threshold, the new FS object (i.e., the object 2) is combined (or merged) with the adjacent (e.g., next or previous) FS object (i.e., the object 1). In this way, the deletion of the file (i.e., the file 3) and the update (e.g., merging) of the FS objects are performed in the object-based storage 300.
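The three deletion steps could be sketched in Python as follows. This is only an illustration under stated assumptions: each FS object is modelled as a sorted list of (parent ID, ID) key pairs, the function names are hypothetical, the value of the lower threshold is invented for the example, and removing the key in memory stands in for re-generating the object through “ReadDir” calls.

```python
from bisect import bisect_right


def find_object(first_keys, key):
    """Binary search for the FS object whose key range covers `key`.

    `first_keys` is the sorted list of first-entry key pairs, one per FS
    object (the information carried by the object names).
    """
    return max(bisect_right(first_keys, key) - 1, 0)


def delete_entry(objects, key, lower_threshold):
    """Delete `key` and merge the affected object if it becomes too small.

    `objects` is a list of non-empty, sorted lists of key pairs.
    """
    # Step 1: locate the object that holds the deleted entry.
    first_keys = [obj[0] for obj in objects]
    idx = find_object(first_keys, key)
    # Step 2: build the replacement object (stand-in for ReadDir regeneration).
    objects[idx] = [k for k in objects[idx] if k != key]
    # Step 3: merge with an adjacent object when below the lower threshold.
    if len(objects[idx]) < lower_threshold and len(objects) > 1:
        neighbour = idx - 1 if idx > 0 else idx + 1
        lo, hi = sorted((idx, neighbour))
        objects[lo:hi + 1] = [objects[lo] + objects[hi]]
    return objects


# Object 1 and object 2 of the example; parents of IDs 7, 8, 9 are assumptions.
objects = [
    [(0, 1), (1, 2), (1, 3), (1, 4)],
    [(2, 5), (2, 6), (4, 7), (6, 8), (6, 9)],
]
delete_entry(objects, (4, 7), lower_threshold=5)  # threshold is hypothetical
```

With the assumed threshold of 5, object 2 drops to four entries and is merged into object 1, which mirrors the merge described in the third step.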
FIGs. 6A and 6B collectively illustrate an exemplary implementation scenario of adding a new file to a file-system and updating a FS object in an object-based storage, in accordance with an embodiment of the present disclosure. FIGs. 6A and 6B are described in conjunction with elements from FIGs. 1, 2, 3, 4A-4B, and 5A-5B. With reference to FIG. 6A, there is shown a hierarchical representation 602 of the file-system 212 (of FIG. 2). With reference to FIG. 6B, there is shown a linear representation 604 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
With reference to FIG. 6A, the hierarchical representation 602 of the file-system 212 is similar to the hierarchical representation 402 (of FIG. 4A) of the file-system 212 except that a new file (e.g., a file 6) is added to the file-system 212. The new file (i.e., the file 6) is assigned a sequential ID of 10 in the hierarchical representation 602. Moreover, the new file (i.e., the file 6) is added in the hierarchical representation 602 with a parent directory, such as the directory 2 (i.e., Dir 2).
Now referring to FIG. 6B, the addition of the new file (i.e., the file 6) to the file-system 212 and the update of the FS object in the object-based storage 300 are performed in three steps. In a first step, the object-based storage 300 is searched for the one or more FS objects that include the new file (i.e., the file 6) in the linear representation 604. The searching is performed in the form of a binary search using the key pair that is comprised in the name of the one or more FS objects. Alternatively stated, the searching is performed either in a local (cache) table or by using the names of the one or more FS objects stored in the object-based storage 300. If the searching is performed using the names of the one or more FS objects stored in the object-based storage 300, there is no requirement to store a local table. For example, in the linear representation 604, the object 1 and the object 2 are searched by their respective names and the object 2 is initially (i.e., before splitting of the object 2) found to hold the new file (i.e., the file 6). In a second step, the FS object (i.e., the object 2) that includes the new file (i.e., the file 6) is updated. Generally, an object-based storage does not support modification of a FS object; therefore, there are two options for updating the FS object. A first option is to fetch the FS object that includes the new file (i.e., the file 6) from a conventional object-based storage. Although this approach is used conventionally, several drawbacks, such as bandwidth consumption, cost of egress traffic from the conventional object storage, increased latency, and the like, are associated with it. A second option is to re-generate the entire FS object (i.e., the object 2) locally by using the “ReadDir” application programming interfaces (APIs) provided by the file-system 212. One “ReadDir” call is performed for each directory listed in the FS object. That means the sequential ID of each parent directory listed in the FS object stored in the object-based storage 300 is identified. Limiting the maximum number of directories that can be packed together (e.g., up to 10) limits the number of required calls. Additionally, an explicit list of directory IDs may also be stored in each FS object (i.e., the object 1 and the object 2). Thereafter, the one or more replacement representations are generated by reading each identified parent directory, and the generated one or more replacement representations are stored as replacement FS objects (or new FS objects) in the object-based storage 300. In a third step, if, after the addition of the new file (i.e., the file 6), a FS object (i.e., the object 2) includes a number of entries greater than the predefined upper threshold, the FS object (i.e., the object 2) is divided into two or more FS objects. For example, the FS object (i.e., the object 2) is divided into the object 2 and a new object (e.g., an object 3). After the division, the new object (i.e., the object 3) includes the new file (i.e., the file 6) and the name of the new object is determined based on the key pair (i.e., 4.10) of its first entry (i.e., the file 6). In this way, the addition of the new file (i.e., the file 6) and the update (e.g., split) of the FS objects are performed in the object-based storage 300.
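A minimal Python sketch of the addition and split follows, under the same modelling assumptions as before: FS objects are sorted lists of (parent ID, ID) key pairs, the function names are hypothetical, the upper threshold of 5 is invented for the example, and splitting at the midpoint is one possible policy rather than the one prescribed by the disclosure.

```python
from bisect import bisect_right, insort


def add_entry(objects, key, upper_threshold):
    """Insert `key` into the covering FS object and split it if it grows too large."""
    # Step 1: binary search over the first key pair of each object.
    first_keys = [obj[0] for obj in objects]
    idx = max(bisect_right(first_keys, key) - 1, 0)
    # Step 2: build the updated object, keeping entries ordered by key pair.
    insort(objects[idx], key)
    # Step 3: divide the object when it exceeds the upper threshold.
    if len(objects[idx]) > upper_threshold:
        mid = len(objects[idx]) // 2
        objects[idx:idx + 1] = [objects[idx][:mid], objects[idx][mid:]]
    return objects


def object_names(objects):
    """Name each object after the key pair of its first entry."""
    return [f"{obj[0][0]}.{obj[0][1]}" for obj in objects]


# Parents of IDs 7, 8 and 9 are assumptions consistent with the example.
objects = [
    [(0, 1), (1, 2), (1, 3), (1, 4)],
    [(2, 5), (2, 6), (4, 7), (6, 8), (6, 9)],
]
add_entry(objects, (4, 10), upper_threshold=5)  # file 6 under Dir 2
print(object_names(objects))
```

Under these assumptions the second half of the divided object starts at the key pair of the file 6, so the new object is named after “4.10”, as in the example.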
FIGs. 7A and 7B collectively illustrate an exemplary implementation scenario of moving a directory from one FS object to another FS object of an object-based storage, in accordance with an embodiment of the present disclosure. FIGs. 7A and 7B are described in conjunction with elements from FIGs. 1, 2, 3, 4A-4B, 5A-5B, and 6A-6B. With reference to FIG. 7A, there is shown a hierarchical representation 702 of the file-system 212 (of FIG. 2). There is further shown another hierarchical representation 704 of the file-system 212. With reference to FIG. 7B, there is shown a linear representation 706 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
With reference to FIG. 7A, the hierarchical representation 702 of the file-system 212 is similar to the hierarchical representation 402 (of FIG. 4A) of the file-system 212. Furthermore, the hierarchical representation 702 corresponds to a hierarchical representation before the move of a directory, such as the directory 1 (i.e., Dir 1) and its associated plurality of files, such as the file 2, file 4 and file 5 and the directory 3 (i.e., Dir 1.1), represented by a dashed box. The other hierarchical representation 704 of the file-system 212 corresponds to a hierarchical representation after the move of the directory, such as the directory 1 (i.e., Dir 1) under the directory 2 (i.e., Dir 2). After movement of the directory 1 (i.e., Dir 1) from the root directory (i.e., Root) to the directory 2 (i.e., Dir 2), the sequential ID of the parent directory that holds the directory 1 (i.e., Dir 1) gets changed. Initially, before the movement, the sequential ID of the parent directory that holds the directory 1 (i.e., Dir 1) is 1 that corresponds to the root directory (i.e., Root). And after the movement, sequential ID of the parent directory that holds the directory 1 (i.e., Dir 1) is 4 that corresponds to the directory 2 (i.e., Dir 2).
Now referring to FIG. 7B, the moving of the directory from one FS object to the other FS object in the object-based storage 300 is performed in five steps. In a first step, the object-based storage 300 is searched for the one or more FS objects that include the directory 1 (i.e., Dir 1) before the move, using the sequential ID of an old parent directory (i.e., the root directory). The searching is performed in the form of a binary search using the key pair of the old position, that is, “<Old parent ID (MSB), Dir ID (LSB)>”. Alternatively stated, the searching is performed either in a local (cache) table or by using the names of the one or more FS objects stored in the object-based storage 300. For example, in the linear representation 706, the object 1 and the object 2 are searched by their respective names and the object 1 is found to include an entry that corresponds to the directory 1 (i.e., Dir 1) with the sequential ID of “2” and the sequential ID of the old parent directory as “1”, before the move. In a second step, the entry that corresponds to the directory 1 (i.e., Dir 1) is deleted from the object 1. In a third step, the object-based storage 300 is searched for the one or more FS objects that include the directory 1 (i.e., Dir 1) after the move, using a sequential ID of a new parent directory (i.e., the directory 2). In a fourth step, a new entry that corresponds to the directory 1 (i.e., Dir 1) along with the sequential ID of the new parent directory (i.e., the directory 2) is added to the object 2. In a fifth step, each of the object 1 and the object 2, which are modified due to the move of the directory 1 (i.e., Dir 1) from the object 1 to the object 2, is uploaded to the object-based storage 300.
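The five steps of the move could be sketched in Python as below. The sketch is illustrative only: entries are modelled as (parent ID, ID, name) tuples, the function names are hypothetical, both lookups are done before either object is modified for simplicity, and the parents of File 3, File 4 and File 5 are assumptions.

```python
from bisect import bisect_right, insort


def locate(objects, key):
    """Binary search for the FS object covering the (parent ID, ID) key pair."""
    first_keys = [obj[0][:2] for obj in objects]
    return max(bisect_right(first_keys, key) - 1, 0)


def move_entry(objects, seq_id, name, old_parent, new_parent):
    """Move one entry to a new parent; return the indices of the objects to upload."""
    src = locate(objects, (old_parent, seq_id))        # step 1: search old position
    dst = locate(objects, (new_parent, seq_id))        # step 3: search new position
    objects[src].remove((old_parent, seq_id, name))    # step 2: delete old entry
    insort(objects[dst], (new_parent, seq_id, name))   # step 4: add new entry
    return sorted({src, dst})                          # step 5: modified objects


objects = [
    [(0, 1, "Root"), (1, 2, "Dir 1"), (1, 3, "File 1"), (1, 4, "Dir 2")],
    [(2, 5, "File 2"), (2, 6, "Dir 1.1"), (4, 7, "File 3"),
     (6, 8, "File 4"), (6, 9, "File 5")],
]
# Move Dir 1 (ID 2) from Root (ID 1) to Dir 2 (ID 4).
to_upload = move_entry(objects, 2, "Dir 1", old_parent=1, new_parent=4)
```

In this model only the entry of the moved directory changes its key pair, so the sketch touches exactly the two objects named in the example.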
FIGs. 8A and 8B collectively illustrate an exemplary implementation scenario to maintain an explicit list of directory IDs in each FS object of an object-based storage, in accordance with an embodiment of the present disclosure. FIGs. 8A and 8B are described in conjunction with elements from FIGs. 1, 2, 3, 4A-4B, 5A-5B, 6A-6B, and 7A-7B. With reference to FIG. 8A, there is shown a hierarchical representation 802 of the file-system 212 (of FIG. 2). With reference to FIG. 8B, there is shown a linear representation 804 of the file-system 212 (of FIG. 2) in the object-based storage 300 (of FIG. 3).
With reference to FIG. 8A, the hierarchical representation 802 of the file-system 212 is similar to the hierarchical representation 402 (of FIG. 4A) of the file-system 212. Now referring to FIG. 8B, the linear representation 804 of the file-system 212 is similar to the linear representation 404 (of FIG. 4B) except that the linear representation 804 includes additional information in the form of an explicit list of directory IDs stored in each FS object.
The one or more FS objects stored in the object-based storage 300 can be re-generated locally after a change (e.g., addition, deletion or movement of a file or a directory) by using multiple “ReadDir” calls on the local file-system. One “ReadDir” call is used for one directory. Although most file-systems support “ReadDir” using merely the directory ID (without the requirement of providing the full path), these file-systems do not support “ReadDir” by a range of directory IDs, that is, for all the directory IDs in a given range. The solution for such file-systems is to store the explicit list of directory IDs inside each FS object in the object-based storage 300 in the form of FS object metadata or FS object data, or in a local cache table (if available). In this way, if it is required to re-generate an object (or a FS object), the exact list of directory IDs is available from the object or from the local (cache) table. Fetching the list of directory IDs from the object in the object-based storage 300 does not require reading the object data but just the metadata, which is a fast and relatively efficient operation in contrast to fetching an entire object. In order to ensure that the explicit list of directory IDs does not overflow, a pre-defined limit (e.g., 10) is used to limit the maximum number of different directories that are allowed to be packed together. For example, the object 1 in the linear representation 804 includes an explicit list of directory IDs as X-Dirs: 0, 1. Similarly, the object 2 in the linear representation 804 includes an explicit list of directory IDs as X-Dirs: 2, 4, 6.
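The following Python sketch shows, under stated assumptions, how an explicit directory list could be derived from an FS object and then used to re-generate the object with one call per directory. The function names and the `read_dir` callback (which stands in for the file-system “ReadDir” API and returns child sequential IDs) are hypothetical, and the sketch assumes that all children of a listed directory belong to the object being re-generated.

```python
def explicit_dir_list(entries):
    """Sorted, de-duplicated parent directory IDs ("X-Dirs") of one FS object."""
    return sorted({parent_id for parent_id, _seq_id in entries})


def regenerate(x_dirs, read_dir, max_dirs=10):
    """Rebuild the entries of an FS object with one read_dir call per directory."""
    if len(x_dirs) > max_dirs:
        raise ValueError("more directories packed together than the limit allows")
    entries = []
    for dir_id in x_dirs:
        entries.extend((dir_id, child_id) for child_id in read_dir(dir_id))
    return sorted(entries)


# Toy stand-in for the local file-system: directory ID -> child IDs.
# The placement of IDs 7, 8 and 9 is an assumption.
tree = {0: [1], 1: [2, 3, 4], 2: [5, 6], 4: [7], 6: [8, 9]}
object_1 = [(0, 1), (1, 2), (1, 3), (1, 4)]
object_2 = [(2, 5), (2, 6), (4, 7), (6, 8), (6, 9)]

x_dirs_2 = explicit_dir_list(object_2)
rebuilt = regenerate(x_dirs_2, lambda d: tree.get(d, []))
```

Capping `max_dirs` bounds both the size of the stored list and the number of calls needed for one re-generation.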
Additionally, a plurality of changes can be made to the plurality of files and the plurality of directories of the file-system 212, and the plurality of changes is stored in a log of changes in order to avoid frequent changes to large FS objects. Alternatively stated, the plurality of changes is stored in the log of changes so that the FS objects are updated after a predetermined period of time. Since every FS object (e.g., the object 1 and the object 2 of the linear representation 804 in the object-based storage 300) includes a plurality of entries, a single change in the file-system 212 requires overwriting the entire object. Moreover, the object-based storage 300 does not support modification of an object. In order to avoid frequent changes of one or more FS objects that include the plurality of entries (i.e., multiple file-system entries), the log of changes can be used. By using the log of changes, the small change events (i.e., event=<Set/Delete, Directory ID, File ID>) are stored in a log structure, and once in a while the log is re-played and then deleted. In general, re-playing the log is just a step-by-step execution of the computer-implemented method 100. However, an optional pre-processing of the log allows the number of “ReadDir” calls to be reduced by sorting the log based on the directory ID and then coalescing the events into a single “ReadDir” call per directory.
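The optional pre-processing of the log could look like the following Python sketch. The event tuples follow the <Set/Delete, Directory ID, File ID> shape given above; the function name, the `read_dir` callback and the sample events are hypothetical.

```python
def replay_log(log, read_dir):
    """Replay a change log with a single read_dir call per affected directory.

    `log` is an iterable of (operation, directory ID, file ID) events.
    Returns a dict mapping each affected directory ID to its fresh listing.
    """
    # Sort by directory ID and coalesce all events of one directory.
    affected_dirs = sorted({dir_id for _op, dir_id, _file_id in log})
    return {dir_id: read_dir(dir_id) for dir_id in affected_dirs}


# Hypothetical events: four changes that touch only two directories.
log = [
    ("Delete", 4, 7),
    ("Set", 4, 10),
    ("Set", 6, 11),
    ("Delete", 4, 10),
]

calls = []


def fake_read_dir(dir_id):
    # Records each call so the number of reads can be inspected.
    calls.append(dir_id)
    return []


listings = replay_log(log, fake_read_dir)
```

Here four logged events result in two directory reads, since the current state of each directory is read once regardless of how many events refer to it.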
Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the present disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.

Claims

1. A computer-implemented method (100) of backing up a file-system (212) onto an object storage system, comprising: receiving a plurality of files from the file-system (212) and a hierarchical representation for the file-system (212) which includes the plurality of files and a plurality of directories; assigning a sequential ID for each file and directory in the file-system (212); generating a representation of the file-system (212) including an entry for each file and directory in the file-system (212), wherein each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory; storing one or more file-system, FS, objects in the object storage system, wherein each FS object contains a plurality of entries of the representation; and storing the plurality of files as one or more file objects in the object storage system.
2. The computer-implemented method (100) of claim 1, wherein the entries are ordered firstly by the sequential ID of the parent directory and secondly by the sequential ID of the file or directory.
3. The computer-implemented method (100) of claim 1 or claim 2, further comprising, in response to a change to one or more elements of the file-system (212): searching the object storage system for one or more FS objects storing representations which include the changed elements; identifying the sequential ID of each parent directory listed in each representation; generating one or more replacement representations by reading each identified parent directory in the file-system (212); storing the replacement representations as replacement FS objects in the object storage system.
4. The computer-implemented method (100) of claim 3, wherein the representation is stored in a plurality of FS objects, wherein the sequential ID of the parent directory and the sequential ID of the file or directory form a key pair and name of each FS object is based on a key pair for the first entry.
5. The computer-implemented method (100) of claim 4, comprising dividing the representation stored in one FS object into two or more FS objects if the number of entries in the representation is greater than a predefined upper threshold.
6. The computer-implemented method (100) of claim 4 or claim 5, comprising combining an FS object with an adjacent FS object if the number of entries in the representation is less than a predefined lower threshold.
7. The computer-implemented method (100) of any one of claims 4 to 6, wherein searching the object storage system includes performing a binary search based on the FS object name.
8. The computer-implemented method (100) of any one of claims 3 to 7, wherein the sequential ID of each parent directory is stored as FS object metadata, and identifying the sequential ID of each parent directory listed in each representation is based on the FS object metadata.
9. The computer-implemented method (100) of any one of claims 3 to 8, wherein the change to one or more elements of the file-system (212) includes deletion or addition of a file or directory.
10. The computer-implemented method (100) of any one of claims 3 to 8, wherein the change to one or more elements of the file-system (212) includes moving a file or directory, and searching the object storage system includes searching for one or more FS objects storing representations which include the original location and the final location of the file or directory.
11. The computer-implemented method (100) of any one of claims 3 to 10, comprising recording a plurality of changes to elements of the file-system (212) in a log of changes and replacing the corresponding FS objects after a predetermined period of time.
12. A data management module (200) comprising: an input unit (202) configured to receive a plurality of files from the file-system (212) and a hierarchical representation for the file-system (212) which includes the plurality of files and a plurality of directories; a processing unit (204) configured to assign a sequential ID for each file and directory in the file-system (212) and generate a representation of the file-system (212) including an entry for each file and directory in the file-system (212), wherein each entry includes a name of the file or directory, the sequential ID of the file or directory and the sequential ID of a parent directory which holds the file or directory; and an object generation unit (206) configured to store one or more file-system, FS, objects in the object storage system, wherein each FS object contains a plurality of entries of the representation and store the plurality of files as one or more file objects in the object storage system.
13. An object-based storage (300) comprising the data management module (200) of claim 12.
14. A computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method (100) of any preceding claim.
PCT/EP2022/051475 2022-01-24 2022-01-24 Method of backing up file-system onto object storgae system and data management module WO2023138788A1 (en)

Publications (1)

Publication Number Publication Date
WO2023138788A1 true WO2023138788A1 (en) 2023-07-27

Family

ID=80168142

Country Status (1)

Country Link
WO (1) WO2023138788A1 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22702226

Country of ref document: EP

Kind code of ref document: A1