US11556495B2 - Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment - Google Patents
- Publication number
- US11556495B2 (application US15/981,680)
- Authority
- US
- United States
- Prior art keywords
- file
- references
- keywords
- computer
- access
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/185—Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/113—Details of archiving
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/164—File meta data generation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Definitions
- the present invention relates in general to the field of file storage management, and in particular to a method for file handling in a hierarchical storage environment and a corresponding hierarchical storage environment. Still more particularly, the present invention relates to a data processing program and a computer program product for file handling in a hierarchical storage environment.
- the technical area of this invention concerns the overall enhancement of file storage management. Because this is an area of tremendous growth, technologies that increase its effectiveness are very valuable.
- NAS network attached storage
- Tiered data storage is a technique which automatically moves data between high-cost and low-cost storage media based on rules and policies.
- a request to access data that was migrated from a “higher” (faster) to a “lower” (slower) level storage tier typically results in a longer response time, sometimes a significantly longer one. In some tiered storage environments this might involve recalling data from removable media such as tape. In such cases, if all tape drives are currently in use, it usually takes several minutes, and even up to hours, before a user can access the data. For users trying to access this data, such long response times create a bad experience and may simply not be acceptable in certain scenarios.
- One example is a student in a library trying to access, via a console, a series of related articles and/or files that have been moved to a lower level storage tier.
- Retrieving files in tiered storage environments may present several challenges. For instance, users expect immediate file access (and thus a good user experience); however, it can take a long time to actually access the data due to the slower performance of lower level storage tiers or the slow performance associated with the recall process from removable media. Moreover, users often require access to multiple files, which may be related by content independently of their actual storage location or storage tier; thus the overall response time may accumulate over multiple file accesses. Moreover still, while a hierarchical storage environment, or so-called storage tiering, may be used because of its cost effectiveness, it may not be economical to keep all data in the highest level storage tier at all times. Further, conventional solutions for file access in tiered storage environments may offer limited connection between a conventionally specified set (e.g. a prior art set) of metadata associated with a file and the probability that the file will be used again.
- a computer-implemented method for file handling in a hierarchical storage environment includes, in response to receiving a file access notification corresponding to access of a first file, performing a file access notification process for determining files related to the first file based on enhanced metadata and a priority list defining a likelihood of possible access.
- the related files are placed in a highest level storage tier, and the priority list is updated.
- a hierarchical storage environment includes at least two different storage tiers; and a content related tiering engine configured to perform the foregoing method.
- a computer program product includes a computer readable storage medium having program code embodied therewith, the program code executable by a computer to cause the computer to perform the foregoing method.
- FIG. 1 is a schematic block diagram of a network environment comprising network attached storage (NAS) system with a hierarchical storage environment, in accordance with one embodiment.
- FIG. 2 is a schematic diagram of the network attached storage (NAS) system with a hierarchical storage environment of FIG. 1 in greater detail, in accordance with one embodiment.
- FIG. 3 is a simplified database schema for a possible implementation of a content related tiering engine of the hierarchical storage environment of FIGS. 1 and 2 , in accordance with one embodiment.
- FIG. 4 is a schematic flow diagram of a file storage process being part of the method for file handling in a hierarchical storage environment, in accordance with one embodiment.
- FIG. 5 is a schematic flow diagram of a file retrieval process being part of the method for file handling in a hierarchical storage environment, in accordance with one embodiment.
- FIG. 6 is a schematic flow diagram of a file archiving process being part of the method for file handling in a hierarchical storage environment, in accordance with one embodiment.
- aspects of the present invention may be embodied as a system, a method, and/or a computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- tiering may be based on migration and placement policies using a prior art set of metadata.
- a simple example for such a policy would be to automatically move all data and/or files from a higher to a lower level storage tier if there was no access in a given timeframe.
- an example of a prior art set of metadata may comprise the date and time when the rule is evaluated, that is, the current date and time; date and time when the file was last accessed; date and time when the file was last modified; fileset name; file name or extension; file size; user identification and group identification.
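- By way of illustration only, the conventional age-based policy described above may be sketched as follows. The class `FileMeta`, the function `apply_age_policy`, the numeric tier indices and the 30-day window are assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileMeta:
    """Conventional metadata named in the text (field names are illustrative)."""
    name: str
    size: int
    last_accessed: datetime
    last_modified: datetime
    tier: int  # 0 = highest (fastest) tier; larger numbers are slower tiers


def apply_age_policy(files, now, max_idle, lowest_tier=3):
    """Demote by one tier every file that was not accessed within max_idle."""
    moved = []
    for f in files:
        if now - f.last_accessed > max_idle and f.tier < lowest_tier:
            f.tier += 1
            moved.append(f.name)
    return moved


now = datetime(2018, 5, 16, 12, 0)
files = [
    FileMeta("a.pdf", 1000, now - timedelta(days=90), now - timedelta(days=200), 0),
    FileMeta("b.pdf", 2000, now - timedelta(days=2), now - timedelta(days=10), 0),
]
moved = apply_age_policy(files, now, max_idle=timedelta(days=30))
print(moved)
```

- Such a policy considers only when a file was last accessed, not what the file contains, which is the limitation the content related approach addresses.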
- the actual storage tier the files reside on as well as the actual migration process of the data is transparent to the user. The user sees no change in the file directory view. However, in cases where the user wants to access a file being already migrated to tape, it might take up to 10-15 minutes to access it; sometimes even significantly longer.
- semantic storage extension uses extracted features of documents to define relations between them and also offers the possibility to specify additional knowledge by a domain expert.
- some embodiments of the hierarchical storage management systems disclosed herein focus on the content and metadata of the file only.
- various embodiments of the present invention introduce: systems, methods and computer program products for an inventive content related tiering engine in order to improve response time (user experience) in tiered network attached storage (NAS) and/or file storage environments; new file metadata concepts that permit content relationships between files, which are used to automatically migrate data between different storage tiers; and new systems, methods and computer program products to use the new metadata concepts to transparently manage the migration of data between different storage tiers.
- a method for file handling in a hierarchical storage environment comprising means for content scanning, content retrieving and content archiving comprises the steps of: receiving a new document for storage, which triggers a new document notification process that scans the received new document for a set of keywords and references, creates enhanced metadata for the new document including the scan result, stores the metadata in a file system, and evaluates the enhanced metadata by a relationship analyzing process resulting in a priority list which defines a likelihood of possible access; and storing the new document in a storage tier of the hierarchical storage environment based on a result of the priority list.
- the set of keywords may comprise at least one of the following: keywords of a keywords section, authors, publisher, date or origin, and title.
- the set of references may comprise at least one of the following: references of a references section, hyperlinks, and titles of other documents.
- the analyzing process may extend the enhanced metadata with at least one of the following items of information: user file access history and user search history.
- the priority list may comprise at least one of the following sub lists: a first sub list comprising relations and/or references between different objects, a second sub list comprising the most frequently and/or most recently accessed objects, and a third sub list comprising user based relations between different objects.
- accessing a document may trigger a document access notification process that determines related documents based on the enhanced metadata and the priority list, places the related documents in a highest level storage tier, and updates the priority list; the accessed document is also retrieved.
- an archiving process may determine documents that can be archived in a lower level storage tier based on the enhanced metadata and the priority list.
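- As a minimal sketch of the priority list described above, the three sub lists may be held as score tables and combined into a single likelihood that selects a storage tier. The class `PriorityList`, the function `choose_tier`, the score range, the use of the maximum as the combining rule and the thresholds are all assumptions of this sketch; the disclosure does not specify how the sub lists are weighted.

```python
from dataclasses import dataclass, field


@dataclass
class PriorityList:
    """Three sub lists, each mapping a file id to a score in [0, 1]."""
    by_relation: dict = field(default_factory=dict)  # relations/references between objects
    by_access: dict = field(default_factory=dict)    # frequently/recently accessed objects
    by_user: dict = field(default_factory=dict)      # user based relations

    def likelihood(self, file_id):
        # Assumption: the overall likelihood is the strongest single signal.
        return max(
            self.by_relation.get(file_id, 0.0),
            self.by_access.get(file_id, 0.0),
            self.by_user.get(file_id, 0.0),
        )


def choose_tier(likelihood, thresholds=(0.75, 0.5, 0.25)):
    """Map a likelihood to a tier index: 0 is the highest tier, 3 the lowest."""
    for tier, limit in enumerate(thresholds):
        if likelihood >= limit:
            return tier
    return len(thresholds)


plist = PriorityList(by_relation={"doc1": 0.9}, by_access={"doc2": 0.3})
tiers = {fid: choose_tier(plist.likelihood(fid)) for fid in ("doc1", "doc2", "doc3")}
print(tiers)
```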
- a hierarchical storage environment comprises at least two different storage tiers and a file system with a content related tiering engine comprising means for content scanning, content retrieving and content archiving.
- the content related tiering engine triggers a new document notification process in response to a received new document for storage.
- the content scanning means scans the received new document for a set of keywords and references; creates enhanced metadata for the new document including the scan result; and stores the metadata in the content related tiering engine.
- the content related tiering engine evaluates the enhanced metadata by a relationship analyzing process resulting in a priority list which defines a likelihood of possible access; and stores the new document based on a result of the priority list in a storage tier of the hierarchical storage environment.
- the content related tiering engine may be implemented as a relational database management system to manage the enhanced metadata.
- the content retrieving means may trigger a document access notification process that determines related documents based on the enhanced metadata and the priority list; places the related documents in a highest level storage tier; and updates the priority list; and retrieves the accessed document.
- the content archiving means may run an archiving process to determine documents that can be archived in a lower level storage tier based on the enhanced metadata and the priority list.
- a data processing program for execution in a data processing system comprises software code portions for performing a method for file handling in a hierarchical storage environment when the program is run on the data processing system.
- a computer program product stored on a computer-usable medium comprises computer-readable program means for causing a computer to perform a method for file handling in a hierarchical storage environment when the program is run on the computer.
- Various embodiments of the present invention are thus able to enhance the performance of file retrieval processes in network attached storage (NAS) and make the file tiering and restore processes more effective.
- the main idea is to extend enhanced metadata with results of a relationship analyzing process, which utilizes the search and/or data access behavior of users, for example, and links this to the content of the files.
- user search behavior, data access behavior and/or various metadata aspects are combined beyond what is available today. Accordingly, numerous embodiments of the present invention describe how the file retrieval, and in turn the archive process, can be optimized.
- various embodiments of the present invention introduce a content related tiering engine that will manage the tiering more effectively.
- the engine may link the user behavior to the file content.
- the latter may automatically be determined by scanning the actual content of the files (e.g. keywords and references).
- their metadata may be extended based on the outcome of the scan and the related user behavior.
- the content related tiering engine optimizes the management of file storage tiering.
- RDBMS Relational Database Management System
- Additional embodiments of the present invention may also be applied to other application domains, such as WAN (Wide Area Network) caching of data between “home” and “remote” locations of the WAN.
- WAN Wide Area Network
- FIG. 1 shows a network environment 1 comprising a network attached storage (NAS) system 7 with a hierarchical storage environment, in accordance with an embodiment of the present invention.
- FIG. 2 shows the network attached storage (NAS) system with a hierarchical storage environment of FIG. 1 in greater detail, in accordance with an embodiment of the present invention.
- FIG. 3 shows a simplified database schema for a possible implementation of a content related tiering engine 10 of the hierarchical storage environment of FIGS. 1 and 2 , in accordance with an embodiment of the present invention.
- the shown embodiment of the present invention employs a hierarchical storage environment comprising at least two different storage tiers 22 , 24 , 26 , 28 and a file system with a content related tiering engine 10 .
- the content related tiering engine 10 may include a processor and logic integrated with and/or executable by the processor, the logic being configured to perform one or more of the process steps recited herein.
- the processor has logic embedded therewith as hardware logic, such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), etc.
- ASIC application specific integrated circuit
- FPGA field programmable gate array
- the logic is hardware logic; software logic such as firmware, part of an operating system, part of an application program; etc., or some combination of hardware and software logic that is accessible by the processor and configured to cause the processor to perform some functionality upon execution by the processor.
- Software logic may be stored on local and/or remote memory of any memory type, as known in the art. Any processor known in the art may be used, such as a software processor module and/or a hardware processor such as an ASIC, a FPGA, a central processing unit (CPU), an integrated circuit (IC), etc.
- the content related tiering engine 10 comprises means for content scanning 100 , content retrieving 300 and content archiving 400 . See e.g., FIG. 2 .
- the content related tiering engine 10 triggers a new document notification process S 100 , shown in FIG. 4 , in response to a received new document for storage.
- the content scanning means 100 scans the received new document for a set of keywords and references; creates enhanced metadata for the new document including the scan result; and stores the metadata in the content related tiering engine 10 .
- the content related tiering engine 10 evaluates the enhanced metadata by a relationship analyzing process resulting in a priority list which defines a likelihood of possible access; and stores the new document based on a result of the priority list in a storage tier 22 , 24 , 26 , 28 of the hierarchical storage environment.
- the shown embodiment of network environment 1 shows three users 5 connected to a network 3 and the network attached storage (NAS) system 7 with the hierarchical storage environment, in accordance with an embodiment of the present invention, comprising the content related tiering engine 10 and a file storage tiering architecture 20 comprising four storage tiers 22 , 24 , 26 , 28 .
- the files are migrated from the very fast “Gold” disk storage media of a first storage tier A to lower cost media with less performance and/or longer response time: a second storage tier B, which is fast “Silver” disk storage media; a third storage tier C, which is slower “Bronze” disk storage media; and a fourth storage tier D, which is tape.
- the tiering is based on migration and placement policies using the enhanced metadata of the content related tiering engine 10 .
- the enhanced metadata includes, but is not limited to, the following information: keywords of a keywords section of the scanned document, authors, publisher, date or origin, title, references of a references section of the scanned document, hyperlinks such as URL addresses, titles of other documents, user file access history, and user search history.
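- For illustration, the four storage tiers and the enhanced metadata fields listed above may be modeled as simple data types. The names `Tier`, `EnhancedMetadata` and `demote`, the field names and the field types are assumptions of this sketch, not structures defined by the disclosure.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    GOLD = 0    # tier A: very fast disk
    SILVER = 1  # tier B: fast disk
    BRONZE = 2  # tier C: slower disk
    TAPE = 3    # tier D: tape


@dataclass
class EnhancedMetadata:
    title: str = ""
    authors: list = field(default_factory=list)
    publisher: str = ""
    date: str = ""
    keywords: set = field(default_factory=set)
    references: list = field(default_factory=list)
    hyperlinks: list = field(default_factory=list)
    access_history: list = field(default_factory=list)  # (user, timestamp) pairs
    search_history: list = field(default_factory=list)  # (user, query) pairs


def demote(tier):
    """Return the next lower tier, staying on tape once there."""
    return Tier(min(tier + 1, Tier.TAPE))


meta = EnhancedMetadata(title="Tiered storage", keywords={"hsm", "tiering"})
print(demote(Tier.GOLD).name, demote(Tier.TAPE).name)
```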
- the content related tiering engine 10 manages the file storage using the means for content scanning 100 , file retrieval using the means for content retrieving 300 , and content related archiving using the means for content archiving 400 .
- the content retrieving means 300 triggers a document access notification process S 300 , shown in FIG. 5 , which determines related documents based on the enhanced metadata and the priority list; places the related documents in a highest level storage tier 22 ; and updates the priority list; and retrieves the accessed document.
- the content archiving means 400 runs an archiving process S 400 , shown in FIG. 6 , to determine documents that can be archived in a lower level storage tier 24 , 26 , 28 based on the enhanced metadata and the priority list.
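- One possible reading of the archiving process, presented as a sketch only, selects as archive candidates those files whose access likelihood from the priority list falls below a threshold. The function `archive_candidates`, the threshold value and the numeric tier indices are hypothetical and are not specified in the disclosure.

```python
def archive_candidates(current_tier, likelihood, threshold=0.2, lowest_tier=3):
    """List files with a low access likelihood that can still move to a lower tier."""
    return sorted(
        fid for fid, tier in current_tier.items()
        if likelihood.get(fid, 0.0) < threshold and tier < lowest_tier
    )


current_tier = {"f1": 0, "f2": 1, "f3": 3, "f4": 0}
likelihood = {"f1": 0.9, "f2": 0.1, "f3": 0.05}  # f4 has no entry and is treated as 0.0
candidates = archive_candidates(current_tier, likelihood)
print(candidates)
```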
- a simplified example of a possible database model is shown in FIG. 3 , which also includes the users 5 of the system and their access and search history.
- the network attached storage (NAS) system 7 will execute the storage tiering based on the metadata definitions.
- the content related tiering engine 10 is implemented as a relational database management system to manage the enhanced metadata.
- Each file (CRTE.FILES) is identified uniquely by an Identification ID (FILE_ID). This can be done by means of prior art such as generating a unique file hash.
- the list of all files in the system is maintained in a table within the Relational Database Management System (RDBMS).
- the unique identifier is used to associate a file with the additional metadata maintained in additional tables (CRTE.Static_KEYWORDS; CRTE.KEYWORDS; CRTE.REFERENCES; CRTE.ACCESS_HISTORY; and CRTE.SEARCH_HISTORY).
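- A minimal sketch of such a database model, using an in-memory SQLite database, is given below. Only the table names and FILE_ID come from the text; the `CRTE_` prefix replaces the `CRTE.` schema qualifier, the column names, the use of SHA-256 for the file hash and the sample rows are assumptions, and the CRTE.Static_KEYWORDS table is omitted because its contents are not described here. The query illustrates how the access history could be used to find files accessed by users who also accessed a given file.

```python
import hashlib
import sqlite3


def file_id(content: bytes) -> str:
    """Derive a FILE_ID from the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CRTE_FILES          (FILE_ID TEXT PRIMARY KEY, NAME TEXT, TIER INTEGER);
CREATE TABLE CRTE_KEYWORDS       (FILE_ID TEXT, KEYWORD TEXT);
CREATE TABLE CRTE_REFERENCES     (FILE_ID TEXT, CITED_FILE_ID TEXT);
CREATE TABLE CRTE_ACCESS_HISTORY (FILE_ID TEXT, USER_ID TEXT, ACCESSED_AT TEXT);
CREATE TABLE CRTE_SEARCH_HISTORY (USER_ID TEXT, QUERY TEXT, SEARCHED_AT TEXT);
""")

a, b, c = file_id(b"article A"), file_id(b"article B"), file_id(b"article C")
conn.executemany("INSERT INTO CRTE_FILES VALUES (?, ?, ?)",
                 [(a, "a.pdf", 0), (b, "b.pdf", 3), (c, "c.pdf", 3)])
conn.executemany("INSERT INTO CRTE_ACCESS_HISTORY VALUES (?, ?, ?)",
                 [(a, "alice", "2018-05-01"), (b, "alice", "2018-05-02"),
                  (a, "bob", "2018-05-03")])

# Files accessed by any user who also accessed file `a`, excluding `a` itself.
rows = conn.execute("""
    SELECT DISTINCT f.NAME
    FROM CRTE_ACCESS_HISTORY AS seen
    JOIN CRTE_ACCESS_HISTORY AS other ON other.USER_ID = seen.USER_ID
    JOIN CRTE_FILES AS f ON f.FILE_ID = other.FILE_ID
    WHERE seen.FILE_ID = ? AND other.FILE_ID <> ?
    ORDER BY f.NAME
""", (a, a)).fetchall()
co_accessed = [name for (name,) in rows]
print(co_accessed)
```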
- the content related tiering engine 10 uses four key methods.
- the content related tiering engine 10 uses management by keywords. This means that each article or book, typically a file stored on the network attached storage (NAS) system 7 , contains a set of keywords (strings) at the time it is stored on the file storage tiering architecture 20 . These keywords are added to the extended metadata of the file when it is initially stored in the network attached storage (NAS) system 7 .
- all other file objects that contain an identical keyword set are automatically migrated to the highest level or “gold” storage tier A. Thus, they are already available in the faster storage tier 22 when the user 5 wants to access them, and no further migrations or recalls are required.
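- Management by keywords may be sketched as follows, assuming the keyword sets are held in an in-memory index. The function `files_with_same_keywords`, the index layout and the numeric tier values are hypothetical; assigning tier 0 stands in for an actual migration or recall.

```python
def files_with_same_keywords(requested, keyword_index):
    """Return ids of other files whose keyword set equals that of `requested`."""
    wanted = keyword_index[requested]
    return sorted(fid for fid, kws in keyword_index.items()
                  if fid != requested and kws == wanted)


keyword_index = {
    "f1": frozenset({"tape", "tiering"}),
    "f2": frozenset({"tiering", "tape"}),
    "f3": frozenset({"tape"}),
}
tiers = {"f1": 3, "f2": 3, "f3": 3}  # 0 = "gold" tier A, 3 = tape tier D

# Promote the requested file together with every file sharing its keyword set.
for fid in ["f1"] + files_with_same_keywords("f1", keyword_index):
    tiers[fid] = 0

print(tiers)
```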
- the content related tiering engine 10 uses management by references. This means that each article or book contains a list of references. When user 5 requests an article or book, all cited articles are automatically migrated to the highest level or “gold” storage tier A. Thus, they are already available in the faster storage tier 22 when the user 5 wants to access them, and no further migrations or recalls are required.
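- Management by references may be sketched as a lookup in a citation table. The function `cited_files` and the table layout are assumptions of this sketch; the optional `depth` parameter, which follows citations of citations, goes beyond what the text describes and is included only to show how the idea could be extended.

```python
def cited_files(requested, citations, depth=1):
    """Collect files cited by `requested`, following citations up to `depth` levels."""
    found, frontier = set(), {requested}
    for _ in range(depth):
        # Next level: everything cited by the current level that is not yet known.
        frontier = {c for f in frontier for c in citations.get(f, ())} - found - {requested}
        found |= frontier
    return sorted(found)


citations = {"paper": ["ref1", "ref2"], "ref1": ["ref3"]}
direct = cited_files("paper", citations)
two_levels = cited_files("paper", citations, depth=2)
print(direct, two_levels)
```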
- the content related tiering engine 10 uses management by history. This means that the content related tiering engine 10 maintains the file access history for each user 5 . If a user 5 subsequently requests a document, the content related tiering engine 10 will identify users 5 that have accessed the same file and automatically retrieve documents according to the access history. For instance, reference is made to the following example, which is presented for illustration purposes only:
- the content related tiering engine 10 uses management by search. This means that when users 5 perform a document search, each document, or each of the top hit documents, returned by the search will also be automatically migrated to the highest level or “gold” storage tier A. When a user 5 performs a search for a file, the content related tiering engine 10 will identify the top rated search results. The top rated files will be migrated or recalled automatically.
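- Management by search may be sketched as selecting the highest rated results of a search for migration or recall. The function `top_hits`, the (file id, score) result format and the limit of two are assumptions of this sketch; how results are rated is not specified in the text.

```python
def top_hits(results, limit=3):
    """Return the ids of the `limit` highest scoring search results."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    return [fid for fid, _score in ranked[:limit]]


results = [("f1", 0.2), ("f2", 0.9), ("f3", 0.5), ("f4", 0.7)]
to_recall = top_hits(results, limit=2)
print(to_recall)
```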
- the above described content related tiering engine 10 can be used in network attached storage (NAS) products. This seems natural for objects (files, documents, etc.), as network attached storage (NAS) is managed on the file level rather than the block level, which by itself has no notion of files.
- the content related tiering engine 10 may be part of the network attached storage (NAS) system 7 and, as such, transparent to the users 5 .
- the content related tiering engine 10 may be external to the network attached storage (NAS) system 7 .
- the content related tiering engine 10 may be shared between different network attached storage (NAS) systems 7 , which permits even more powerful usages of the enhanced metadata concept and the content tiering methods.
- FIG. 4 shows a file storage process being part of the method for file handling in a hierarchical storage environment, in accordance with an embodiment of the present invention.
- FIG. 5 shows a file retrieval process being part of the method for file handling in a hierarchical storage environment, in accordance with an embodiment of the present invention.
- FIG. 6 shows a schematic flow diagram of a file archiving process being part of the method for file handling in a hierarchical storage environment, in accordance with an embodiment of the present invention.
- FIG. 4 describes the file storage process started with step S 10 .
- the new file or document for storage is received by the network attached storage (NAS) system 7 in step S 20 . The new file or document is stored in step S 30 , and in turn the end of the process is acknowledged.
- a new file or document notification will be sent to the content related tiering engine 10 which performs a new document notification process in step S 100 .
- the content related tiering engine 10 scans the received new file or document for a set of keywords and references using means for content scanning 100 , for example.
- the content related tiering engine 10 performs a relationship analyzing process resulting in a priority list which defines a likelihood of possible access and creates enhanced metadata for the new document including the scan result.
- the metadata is stored in a file system, e.g. in the content related tiering engine 10 .
- the storing of the new document in a storage tier 22 , 24 , 26 , 28 of the hierarchical storage environment 1 is based on a result of the priority list.
- the notification of the content related tiering engine 10 may be created for example by using existing file system change notifications.
- One possible implementation is the “fschange” facility available on Linux operating systems.
- step S 110 may be triggered by a “CREATE <filename>” event. This may trigger a document scan, e.g. for keywords and references, and relations to other documents will be created in step S 120. These relations and metadata information may be stored, for example, in the content related tiering engine 10 .
- the metadata may be extended based on the content of the file.
- the set of keywords comprises keywords of a keywords section of the document, authors, publisher, date or origin, and title, for example.
- the set of references comprises references of a references section of the document, Hyper Links, and other documents titles.
- the analyzing process may extend the enhanced metadata with the user file access history and the user search history. For example, the analyzing process may combine the enhanced metadata with at least one of: user file access history and user search history, such that the enhanced metadata then includes the user file access history and/or the user search history.
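The scan and relationship steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `EnhancedMetadata`, `scan_document` and `related_documents` are hypothetical, the "Keywords:" line convention is assumed, and ranking by the number of shared keywords and references is just one simple way to approximate a likelihood of related access.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EnhancedMetadata:
    """Hypothetical record holding the scan result for one document."""
    filename: str
    keywords: set = field(default_factory=set)
    references: set = field(default_factory=set)


def scan_document(filename, text):
    """Collect keywords and references from plain text (illustrative rules only)."""
    meta = EnhancedMetadata(filename)
    for line in text.splitlines():
        # Assumed convention: a line starting with "Keywords:" lists
        # comma-separated keywords.
        if line.lower().startswith("keywords:"):
            parts = line.split(":", 1)[1].split(",")
            meta.keywords.update(p.strip().lower() for p in parts if p.strip())
    # Hyperlinks are treated as references, as in the description above.
    meta.references.update(re.findall(r"https?://[^\s)>\]]+", text))
    return meta


def related_documents(new_meta, catalog):
    """Rank already-known documents by shared keywords and references."""
    scores = {}
    for other in catalog:
        shared = (len(new_meta.keywords & other.keywords)
                  + len(new_meta.references & other.references))
        if shared:
            scores[other.filename] = shared
    # Highest overlap first; ties are broken by filename for stable output.
    return sorted(scores, key=lambda name: (-scores[name], name))
```

In such a sketch, the ranked result would feed the priority list, which in turn would inform the choice of storage tier for the new document.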
- FIG. 5 describes the file retrieval process started with step S 200 .
- a file or document access is received in step S 210 .
- a corresponding file or document is retrieved in step S 220 and migrated to the highest level tier A.
- a file or document access notification will be sent to the content related tiering engine 10 , which performs a document access notification process in step S 300 .
- the content related tiering engine 10 evaluates related files or documents in step S 310 using the content retrieving means 300 , for example.
- the related files or documents will then be migrated to the highest level tier A in step S 320 .
- the content related tiering engine 10 updates the first sub list 12 or “Relation List”, the second sub list 14 or “Hot List”, and the third sub list 16 or “Also List” of the priority list, respectively.
- FIG. 6 describes the content related archiving process started in step S 400 .
- the content related tiering engine 10 determines files or documents that can be archived in a lower level storage tier B, C, D based on the enhanced metadata and the priority list using the content archiving means 400 .
- the enhanced content related metadata tables (CRTE.Static_KEYWORDS; CRTE.KEYWORDS; CRTE.REFERENCES; CRTE.ACCESS_HISTORY; and CRTE.SEARCH_HISTORY) of the Relational Database Management System (RDBMS) will be used to identify “cold” documents that can be archived in lower level storage tiers B, C, D.
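As a rough illustration of how an access-history table could be queried for “cold” documents, the sketch below uses an in-memory SQLite database. The table name `access_history`, its columns `filename` and `accessed_at`, and the cutoff-date rule are assumptions made for this example; the text names the table CRTE.ACCESS_HISTORY but does not give its schema, and the described process also draws on the other metadata tables and the priority list.

```python
import sqlite3


def find_cold_documents(conn, cutoff):
    """Return filenames whose most recent recorded access is older than `cutoff`.

    Dates are stored as ISO 8601 strings, so string comparison matches
    chronological order.
    """
    rows = conn.execute(
        "SELECT filename FROM access_history "
        "GROUP BY filename HAVING MAX(accessed_at) < ? "
        "ORDER BY filename",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data standing in for the access history table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_history (filename TEXT, accessed_at TEXT)")
conn.executemany(
    "INSERT INTO access_history VALUES (?, ?)",
    [
        ("report.pdf", "2013-01-05"),
        ("report.pdf", "2014-03-01"),
        ("old_notes.txt", "2012-06-30"),
    ],
)

cold = find_cold_documents(conn, "2013-06-01")
```

Note that a document with no access record at all would not appear in this query; a fuller version would have to join against the list of stored files.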
- the concepts and methods described in this invention can also be applied to WAN (Wide Area Network) caching environments.
- files are cached from a central (home) location to a remote (cache) location.
- Typical applications of this technique are branch offices which would comprise such a remote cache for the data.
- the data (files) are copied (synchronized) between the home and cache locations based on certain rules and policies.
- the concept and methods described in the context of storage tiering in network attached storage (NAS) systems can be applied in the same manner to such WAN caching environments.
- the new metadata concepts are used in addition, in order to synchronize data (files) between home and cache locations.
- One example is to also synchronize all referenced objects from home to cache. Because users are likely to access referenced objects, making these objects available in the cache automatically will improve response/access time.
- the feature of File-List based migration/recall can be used to optimize the overall process performance. This feature optimizes the sequence of objects to be restored/migrated based on tapes and the actual position on each tape.
- Embodiments of the present invention propose a new approach for managing file storage tiering based on the actual file content and access patterns. While this is illustrated by the usage in a network attached storage (NAS) environment, the concept can be embodied in other systems and applications as well.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Abstract
Description
- (1) User “Smith” retrieved document “one”, “two” and “three”.
- (2) The content related tiering engine 10 stores this information as part of the access history.
- (3) At a later time user “Brown” or “Miller” retrieves document “one”.
- (4) The content related tiering engine 10 scans the access history for access to document “one” and automatically retrieves “two” and “three” for user “Brown” or “Miller”.
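The four steps above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory history keyed by user; the class `AccessHistory` and its methods are hypothetical names and not part of the described system.

```python
from collections import defaultdict


class AccessHistory:
    """Hypothetical in-memory access history, keyed by user."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def record(self, user, document):
        """Store that `user` retrieved `document` (steps 1 and 2)."""
        if document not in self._by_user[user]:
            self._by_user[user].append(document)

    def related_to(self, document):
        """Documents that any user retrieved together with `document` (step 4)."""
        related = []
        for docs in self._by_user.values():
            if document in docs:
                for other in docs:
                    if other != document and other not in related:
                        related.append(other)
        return related


history = AccessHistory()
for doc in ("one", "two", "three"):
    history.record("Smith", doc)

# Step 3: "Brown" retrieves document "one"; the engine looks up what to prefetch.
history.record("Brown", "one")
prefetch = history.related_to("one")
```

Here `prefetch` holds the documents that would be retrieved automatically for the later user.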
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/981,680 US11556495B2 (en) | 2013-04-09 | 2018-05-16 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1306393.8A GB2512856A (en) | 2013-04-09 | 2013-04-09 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
GB1306393.8 | 2013-04-09 | ||
US14/229,553 US9575989B2 (en) | 2013-04-09 | 2014-03-28 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US15/404,085 US10055416B2 (en) | 2013-04-09 | 2017-01-11 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US15/981,680 US11556495B2 (en) | 2013-04-09 | 2018-05-16 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/404,085 Continuation US10055416B2 (en) | 2013-04-09 | 2017-01-11 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180260399A1 (en) | 2018-09-13 |
US11556495B2 (en) | 2023-01-17 |
Family
ID=48483592
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/229,553 Active 2035-07-30 US9575989B2 (en) | 2013-04-09 | 2014-03-28 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US15/404,085 Active US10055416B2 (en) | 2013-04-09 | 2017-01-11 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US15/981,680 Active 2036-10-08 US11556495B2 (en) | 2013-04-09 | 2018-05-16 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/229,553 Active 2035-07-30 US9575989B2 (en) | 2013-04-09 | 2014-03-28 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US15/404,085 Active US10055416B2 (en) | 2013-04-09 | 2017-01-11 | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
Country Status (3)
Country | Link |
---|---|
US (3) | US9575989B2 (en) |
DE (1) | DE102014104971A1 (en) |
GB (1) | GB2512856A (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2512856A (en) | 2013-04-09 | 2014-10-15 | Ibm | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US10671565B2 (en) * | 2015-04-24 | 2020-06-02 | Quest Software Inc. | Partitioning target data to improve data replication performance |
US9658794B2 (en) | 2015-05-08 | 2017-05-23 | Sonicwall Inc. | Two stage memory allocation using a cache |
US10621102B2 (en) * | 2017-03-23 | 2020-04-14 | International Business Machines Corporation | Managing digital datasets on a multi-tiered storage system based on predictive caching |
US10705767B2 (en) * | 2017-07-20 | 2020-07-07 | International Business Machines Corporation | Optimizing user satisfaction when training a cognitive hierarchical storage-management system |
US11106637B2 (en) * | 2019-05-20 | 2021-08-31 | 5Th Kind, Inc. | Metadata-driven tiered storage |
US11340964B2 (en) * | 2019-05-24 | 2022-05-24 | International Business Machines Corporation | Systems and methods for efficient management of advanced functions in software defined storage systems |
US11573933B2 (en) * | 2019-11-14 | 2023-02-07 | Box, Inc. | Methods and systems for identifying and retrieving hierarchically related files |
US11451615B1 (en) * | 2021-08-23 | 2022-09-20 | Red Hat, Inc. | Probabilistic per-file images preloading |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060004847A1 (en) | 2004-07-01 | 2006-01-05 | Claudatos Christopher H | Content-driven information lifecycle management |
US20060010169A1 (en) | 2004-07-07 | 2006-01-12 | Hitachi, Ltd. | Hierarchical storage management system |
US20070192385A1 (en) * | 2005-11-28 | 2007-08-16 | Anand Prahlad | Systems and methods for using metadata to enhance storage operations |
US7779097B2 (en) | 2000-09-07 | 2010-08-17 | Sonic Solutions | Methods and systems for use in network management of content |
US20120331021A1 (en) * | 2011-06-24 | 2012-12-27 | Quantum Corporation | Synthetic View |
EP2551783A1 (en) | 2011-07-27 | 2013-01-30 | Verint Systems Limited | System and method for information lifecycle management of investigation cases |
US20140304309A1 (en) | 2013-04-09 | 2014-10-09 | International Business Machines Corporation | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US20200137150A1 (en) * | 2007-08-29 | 2020-04-30 | Oracle International Corporation | Method and system for selecting a storage node based on a distance from a requesting device |
-
2013
- 2013-04-09 GB GB1306393.8A patent/GB2512856A/en not_active Withdrawn
-
2014
- 2014-03-28 US US14/229,553 patent/US9575989B2/en active Active
- 2014-04-08 DE DE102014104971.3A patent/DE102014104971A1/en active Pending
-
2017
- 2017-01-11 US US15/404,085 patent/US10055416B2/en active Active
-
2018
- 2018-05-16 US US15/981,680 patent/US11556495B2/en active Active
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7779097B2 (en) | 2000-09-07 | 2010-08-17 | Sonic Solutions | Methods and systems for use in network management of content |
US20060004847A1 (en) | 2004-07-01 | 2006-01-05 | Claudatos Christopher H | Content-driven information lifecycle management |
US20060010169A1 (en) | 2004-07-07 | 2006-01-12 | Hitachi, Ltd. | Hierarchical storage management system |
US20070192385A1 (en) * | 2005-11-28 | 2007-08-16 | Anand Prahlad | Systems and methods for using metadata to enhance storage operations |
US20200137150A1 (en) * | 2007-08-29 | 2020-04-30 | Oracle International Corporation | Method and system for selecting a storage node based on a distance from a requesting device |
US20120331021A1 (en) * | 2011-06-24 | 2012-12-27 | Quantum Corporation | Synthetic View |
EP2551783A1 (en) | 2011-07-27 | 2013-01-30 | Verint Systems Limited | System and method for information lifecycle management of investigation cases |
US20140304309A1 (en) | 2013-04-09 | 2014-10-09 | International Business Machines Corporation | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US9575989B2 (en) | 2013-04-09 | 2017-02-21 | International Business Machines Corporation | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US20170124095A1 (en) | 2013-04-09 | 2017-05-04 | International Business Machines Corporation | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
US10055416B2 (en) | 2013-04-09 | 2018-08-21 | International Business Machines Corporation | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment |
Non-Patent Citations (14)
Title |
---|
Diederich et al., U.S. Appl. No. 14/229,553, filed Mar. 28, 2014. |
Diederich et al., U.S. Appl. No. 15/404,085, dated Jan. 11, 2017. |
Final Office Action from U.S. Appl. No. 15/404,085, dated Dec. 29, 2017. |
Gantz et al., "Extracting Value from Chaos," IDC-IVIEW, Jun. 2011, pp. 1-12. |
Gantz et al., "The Digital Universe Decade—Are You Ready?" IDC-IVIEW, May 2010, pp. 1-16. |
Non-Final Office Action from U.S. Appl. No. 15/404,085, dated Jun. 19, 2017. |
Notice of Allowance from U.S. Appl. No. 14/229,553, dated Oct. 6, 2016. |
Notice of Allowance from U.S. Appl. No. 15/404,085, dated Apr. 17, 2018. |
Office Action from German Patent Application No. 10 2014 104 971.3, dated Jan. 12, 2021. |
Ragnet et al., "Beyond Document 2.0: The Future of Documents," Xerox Global Services, Jan. 2007, pp. 1-11. |
Schroder et al., "A Semantic Extension of a Hierarchical Storage Management System for Small and Medium-sized Enterprises," Proceedings of the 1st International Workshop on Semantic Digital Archives (SDA 2011), pp. 23-36. |
United Kingdom Search Report from Application No. GB1306393.8 dated Oct. 16, 2013. |
Wikipedia, "Hierarchical storage management (tiered storage)," Wikipedia definition, http://en.wikipedia.org/wiki/Tiered_storage, last modified on Mar. 27, 2014, pp. 1-3. |
Wikipedia, "Network-attached storage (NAS)," Wikipedia definition, http://en.wikipedia.org/wiki/Network-attached_storage, last modified on Mar. 25, 2014 pp. 1-6. |
Also Published As
Publication number | Publication date |
---|---|
GB2512856A (en) | 2014-10-15 |
DE102014104971A1 (en) | 2014-10-09 |
GB201306393D0 (en) | 2013-05-22 |
US20170124095A1 (en) | 2017-05-04 |
US20140304309A1 (en) | 2014-10-09 |
US10055416B2 (en) | 2018-08-21 |
US20180260399A1 (en) | 2018-09-13 |
US9575989B2 (en) | 2017-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11556495B2 (en) | Method for file handling in a hierarchical storage environment and corresponding hierarchical storage environment | |
US11093466B2 (en) | Incremental out-of-place updates for index structures | |
US9830342B2 (en) | Optimizing database deduplication | |
US9195667B2 (en) | System for on-line archiving of content in an object store | |
US10089338B2 (en) | Method and apparatus for object storage | |
US9239690B2 (en) | System and method for in-place data migration | |
US10769117B2 (en) | Effective handling of HSM migrated files and snapshots | |
US20160147751A1 (en) | Generating an index for a table in a database background | |
US10417197B2 (en) | Implementing a secondary storage dentry cache | |
US20140358868A1 (en) | Life cycle management of metadata | |
US11392545B1 (en) | Tracking access pattern of inodes and pre-fetching inodes | |
WO2018064319A1 (en) | Tracking access pattern of inodes and pre-fetching inodes | |
US11500835B2 (en) | Cohort management for version updates in data deduplication | |
US9483469B1 (en) | Techniques for optimizing disk access | |
US10235293B2 (en) | Tracking access pattern of inodes and pre-fetching inodes | |
US9430513B1 (en) | Methods and apparatus for archiving system having policy based directory rename recording | |
JP5276391B2 (en) | Intelligent content indexing technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DIEDERICH, MICHAEL;MUEHGE, THORSTEN;RUEGER, ERIK;AND OTHERS;SIGNING DATES FROM 20140325 TO 20140328;REEL/FRAME:046183/0627 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STCV | Information on status: appeal procedure |
Free format text: NOTICE OF APPEAL FILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STCV | Information on status: appeal procedure |
Free format text: NOTICE OF APPEAL FILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |