WO2021034329A1 - Data set signatures for data impact driven storage management - Google Patents

Data set signatures for data impact driven storage management

Info

Publication number
WO2021034329A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data set
business
metadata
operational
Prior art date
Application number
PCT/US2019/047699
Other languages
English (en)
Inventor
David Reiner
Original Assignee
Futurewei Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Futurewei Technologies, Inc.
Priority to CN201980099580.5A (patent CN114586022A)
Priority to PCT/US2019/047699 (patent WO2021034329A1)
Publication of WO2021034329A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation

Definitions

  • the present disclosure pertains to the field of information processing systems.
  • the present disclosure relates to the aggregation of business related metadata and operational metadata describing a data set at an information processing system.
  • a data management system such as an on-premises storage system, a cloud based storage system, an object storage system, or any other data repository, such that business applications can access this data to perform business related tasks or services.
  • the large amounts of data are typically stored at the data management system in a manner that optimizes the use of the storage facilities and hardware available at the data management system.
  • a data management system is configured to store data in a manner that facilitates flexible and efficient utilization of the stored data to support the processing of various tasks that are to be performed on the data.
  • the data management system may store data based on attributes of the data, such as image data or sensor data.
  • a first aspect of the present disclosure relates to a method performed by a data management system.
  • the method comprises: storing, by a data set signature mapper of a data management system, operational metadata describing a data set, the operational metadata and the data set being stored in a memory of the data management system; receiving, by the data set signature mapper of the data management system, business related metadata describing the data set from a business environment, the business environment being external to the data management system; storing, by the data set signature mapper of the data management system, a data set signature comprising a combination of the operational metadata and the business related metadata; and performing, by a policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature.
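The four steps of this method can be illustrated with a minimal sketch. The class names, field names (`size_bytes`, `access_heat`, `business_impact`), and operation names (`replicate`, `archive`) below are hypothetical and chosen only for illustration; the disclosure does not prescribe a concrete data model or rule format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class DataSetSignature:
    """Combined operational and business related metadata for one data set."""
    data_set_id: str
    operational: Dict[str, Any] = field(default_factory=dict)
    business: Dict[str, Any] = field(default_factory=dict)


class DataSetSignatureMapper:
    """Hypothetical mapper that stores operational metadata and builds signatures."""

    def __init__(self) -> None:
        self._operational: Dict[str, Dict[str, Any]] = {}
        self.signatures: Dict[str, DataSetSignature] = {}

    def store_operational(self, data_set_id: str, metadata: Dict[str, Any]) -> None:
        # Step 1: keep operational metadata locally, next to the data set.
        self._operational[data_set_id] = dict(metadata)

    def receive_business(self, data_set_id: str, metadata: Dict[str, Any]) -> DataSetSignature:
        # Steps 2 and 3: accept metadata from the external business environment
        # and store the combination as the data set signature.
        signature = DataSetSignature(
            data_set_id=data_set_id,
            operational=dict(self._operational.get(data_set_id, {})),
            business=dict(metadata),
        )
        self.signatures[data_set_id] = signature
        return signature


# A rule pairs a predicate over a signature with the name of an operation.
Rule = Tuple[Callable[[DataSetSignature], bool], str]


class PolicyManager:
    """Hypothetical policy manager that selects operations from a signature."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules

    def operations_for(self, signature: DataSetSignature) -> List[str]:
        # Step 4: choose operations using both kinds of metadata.
        return [operation for matches, operation in self.rules if matches(signature)]


mapper = DataSetSignatureMapper()
mapper.store_operational("sales_2019", {"size_bytes": 4096, "access_heat": "cold"})
signature = mapper.receive_business("sales_2019", {"business_impact": "high"})

policy = PolicyManager([
    (lambda s: s.business.get("business_impact") == "high"
     and s.operational.get("access_heat") == "cold", "replicate"),
    (lambda s: s.business.get("business_impact") == "low", "archive"),
])
operations = policy.operations_for(signature)
```

The first rule only fires because the signature holds both kinds of metadata: neither the storage-side access heat nor the business-side impact alone would select the operation.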
  • the operational metadata comprises at least one of an operational name, an operational type, an operational structure, an operational size, an operational security class or tag, an operational timestamp, an operational content type, an operational content class or tag, an operational index existence, an operational source, an operational version status, an operational encryption class, an operational encoding class, an operational movement state, an operational access class, an operational access heat class, an operational access skew, an operational access lumpiness, an operational local entropy, an operational global entropy, or an operational relationship.
  • the method further comprises determining, by the data set signature mapper of the data management system, the data set from a plurality of data sets stored locally at the data management system based on an attribute of interest associated with the data set, wherein the attribute of interest associated with the data set comprises at least one of an operational name, a time at which the data set was last accessed, an event associated with the data set, a previous operation performed on the data set, a present operation performed on the data set, or a future operation performed on the data set.
  • the method further comprises determining, by the data set signature mapper of the data management system, a business environment from which to receive the business related metadata based on a business data catalog of the business environment, wherein the business data catalog stores information describing data belonging to the business environment.
  • the method further comprises sending, by the data set signature mapper of the data management system to the business environment, a query requesting the business related metadata describing one or more data sets stored by the business environment, matching, by the data set signature mapper of the data management system, the data set stored locally at the data management system with the business data stored by the business environment, mapping, by the data set signature mapper of the data management system, the business related metadata describing the business data stored by the business environment to the operational metadata stored locally at the data management system, and generating and maintaining, by the data set signature mapper of the data management system, the data set signature describing the data set.
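The matching and mapping steps described above could look roughly like the following sketch. It assumes, purely for illustration, that the catalog response is a list of dictionaries keyed by a `name` field and that a local data set matches a catalog entry when their normalized names are equal; the disclosure does not specify the matching criterion.

```python
from typing import Any, Dict, List


def normalize(name: str) -> str:
    """Normalize a name so that 'Customer Orders' and 'customer_orders' compare equal."""
    return name.strip().lower().replace(" ", "_")


def build_signatures(
    local_operational: Dict[str, Dict[str, Any]],
    catalog_entries: List[Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Match local data sets to catalog entries and combine their metadata."""
    by_name = {normalize(entry["name"]): entry for entry in catalog_entries}
    signatures: Dict[str, Dict[str, Any]] = {}
    for data_set_id, op_meta in local_operational.items():
        entry = by_name.get(normalize(op_meta["operational_name"]))
        if entry is None:
            continue  # no business related metadata is known for this data set
        business = {k: v for k, v in entry.items() if k != "name"}
        signatures[data_set_id] = {"operational": dict(op_meta), "business": business}
    return signatures


# Hypothetical local operational metadata and a hypothetical catalog response.
local = {
    "ds-1": {"operational_name": "Customer Orders", "operational_size": 1024},
    "ds-2": {"operational_name": "tmp_scratch", "operational_size": 10},
}
catalog_response = [
    {"name": "customer_orders", "owner": "sales", "estimated_value_usd": 50000},
]
signatures = build_signatures(local, catalog_response)
```

Here only `ds-1` receives a signature, because the catalog response contains no entry that matches `tmp_scratch`.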
  • the business related metadata is received from a single business data catalog or a plurality of different business data catalogs.
  • the business related metadata from two or more external business data catalogs are combined by one of the two or more external business data catalogs before being combined with the operational metadata to generate the data set signature.
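As a rough illustration of combining business related metadata from several catalogs before it is joined with the operational metadata, the sketch below merges per-catalog views of one data set. The conflict rule (the earliest catalog listed wins) and all field names are assumptions made for the example, not something the disclosure specifies.

```python
from typing import Any, Dict


def combine_catalog_metadata(*catalog_views: Dict[str, Any]) -> Dict[str, Any]:
    """Merge metadata about one data set from several catalogs.

    Assumption: when two catalogs disagree on a field, the catalog
    listed first takes precedence.
    """
    combined: Dict[str, Any] = {}
    for view in catalog_views:
        for key, value in view.items():
            combined.setdefault(key, value)
    return combined


# Hypothetical views of the same data set in two external catalogs.
governance_view = {"owner": "finance", "sensitivity": "restricted"}
analytics_view = {"owner": "analytics-team", "estimated_value_usd": 12000}

business = combine_catalog_metadata(governance_view, analytics_view)
signature = {"operational": {"size_bytes": 2048}, "business": business}
```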
  • the data set signature is generated by at least one of the data management system, cloud management software, management software, or consumer by receiving the business related metadata or storage-oriented operational metadata from a storage manager of the data management system and combining the operational metadata with the business related metadata or storage-oriented operational metadata.
  • the method further comprises transmitting, by a storage manager of the data management system, the data set signature to another storage manager.
  • the method further comprises transmitting, by a storage manager of the data management system, the data set signature to the business environment to store the data set signature in a business data catalog associated with the business environment.
  • the business related metadata is received from a business data catalog associated with the business environment, and wherein the business data catalog describes data stored at a cloud computing environment or a data virtualization environment.
  • the method further comprises transmitting, by a storage manager of the data management system, the data set signature to a cloud based environment or a data virtualization environment.
  • the method further comprises determining, by the data set signature mapper of the data management system, other metadata describing the data set based on the business related metadata and the operational metadata using machine learning and training data.
  • the method further comprises determining, by the data set signature mapper of the data management system based on a data set profile, a system profile for a workload profile, wherein the workload profile describes a time-variant workload requested by a business environment to be executed by a system, wherein the system profile describes at least one of hardware, software, services, configuration, topology, or resource headroom to support and execute a workload described by the workload profile, wherein the data set profile comprises one or more data set signatures based on at least one of hypothesized data set signatures or historical data set signatures of the data sets that are expected to be accessed by the workload described in the workload profile.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on one or more systems or data sets based on metadata included in the data set signature.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on the data set based on metadata included in the data set signature and a current location of the data set.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on one or more data sets based on metadata included in the data set signature and metadata included in a second data set signature describing a second data set that is related to a first data set.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on the data set based on metadata included in the data set signature and an operation performed on the data set.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on one or more systems or logs based on metadata included in the data set signature and a user requesting access to the data set.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on one or more systems and the data set based on metadata included in the data set signature and a workload profile indicating that a workload to be performed by the data management system accesses the data set.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on the data set based on metadata included in the data set signature, wherein the data manipulation operation comprises storing the data set at a specific location or type of location based on the metadata included in the data set signature.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on the data set based on metadata included in the data set signature and one or more related data sets, wherein the data manipulation operation comprises tagging the data set based on the metadata included in the data set signature and the one or more related data sets.
  • performing, by the policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature further comprises determining, by the policy manager of the data management system, a system management policy indicating the data manipulation operation to be performed on the data set based on metadata included in the data set signature and metadata included in a second data set signature describing a second data set that is related to a first data set, wherein the data manipulation operation comprises asserting a relationship between the data set and a second data set based on the data set signature of the data set and a second data set signature of the second data set.
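One of the policy variants above, storing a data set at a type of location chosen from its signature, might be sketched as follows. The tier names, the `access_heat`, `business_impact`, and `security_class` fields, and the decision order are all hypothetical.

```python
from typing import Any, Dict


def choose_tier(signature: Dict[str, Dict[str, Any]]) -> str:
    """Pick a storage location type from a data set signature (illustrative rules only)."""
    operational = signature.get("operational", {})
    business = signature.get("business", {})

    # Business-side security requirements override performance considerations.
    if business.get("security_class") == "restricted":
        return "encrypted-on-premises"
    # Hot or high-impact data goes to the fastest tier.
    if operational.get("access_heat") == "hot" or business.get("business_impact") == "high":
        return "flash"
    if operational.get("access_heat") == "warm":
        return "disk"
    return "archive"


cold_but_valuable = {
    "operational": {"access_heat": "cold"},
    "business": {"business_impact": "high"},
}
cold_and_minor = {
    "operational": {"access_heat": "cold"},
    "business": {"business_impact": "low"},
}
tier_valuable = choose_tier(cold_but_valuable)
tier_minor = choose_tier(cold_and_minor)
```

The two example data sets have identical operational metadata, so a storage system that saw only that metadata would place them identically; the business related metadata in the signature is what separates them.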
  • the data management system is included in an information processing system comprising a plurality of contexts, wherein a first lower context comprises stored data sets, wherein a second context comprises on-demand data virtualization, wherein a third context comprises data operations (DataOps) processes, wherein a fourth higher context comprises Internet of Things (IoT) status characterizations and event recognitions, and wherein the data set signature comprises metadata from at least one of the first lower context, second context, third context, or fourth context.
  • the method further comprises unbundling metadata enrichment of objects based on at least one of observed, inferred, or scraped value representation of the objects at another context.
  • the method further comprises unbundling data catalog mapping and integration capabilities across layers and systems.
  • the method further comprises unbundling a serving of enhanced data profiles to a broader set of management systems.
  • a second aspect of the present disclosure relates to a data management system.
  • the data management system comprises a memory configured to store instructions, and a processor coupled to the memory and configured to execute the instructions, which cause the processor to be configured to store operational metadata describing a data set, the operational metadata and the data set being stored in a memory of the data management system, receive business related metadata describing the data set from a business environment, the business environment being external to the data management system, store a data set signature comprising a combination of the operational metadata and the business related metadata, and perform an operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature.
  • a third aspect of the present disclosure relates to an apparatus.
  • the apparatus comprises a means for storing operational metadata describing a data set, the operational metadata and the data set being stored in the apparatus, a means for receiving business related metadata describing the data set from a business environment, the business environment being external to the apparatus, a means for storing a data set signature comprising a combination of the operational metadata and the business related metadata, and a means for performing an operation on the data set using the combination of the operational metadata and the business related metadata in the data set signature.
  • FIG. 1A is a diagram of an information processing system.
  • FIG. 1B is a diagram of an information processing system configured to implement data set signatures for data impact driven storage management according to various embodiments of the disclosure.
  • FIG. 2 is a schematic diagram of an information system element suitable for data impact driven storage management according to various embodiments of the disclosure.
  • FIGS. 3A-B are diagrams illustrating examples of metadata included in a data set signature associated with a data set according to various embodiments of the disclosure.
  • FIG. 4A is a flowchart illustrating a method of generating a data set signature according to various embodiments of the disclosure.
  • FIG. 4B is a flowchart illustrating another method for generating a data set signature according to various embodiments of the disclosure.
  • FIG. 5 is a diagram of another information processing system configured to implement data set signatures for data impact driven storage management according to various embodiments of the disclosure.
  • FIG. 6 is a diagram of another information processing system configured to implement data set signatures for data impact driven storage management according to various embodiments of the disclosure.
  • FIG. 7 is a diagram of another information processing system configured to implement data set signatures for data impact driven storage management according to various embodiments of the disclosure.
  • FIG. 8 is a diagram illustrating the use of historical or hypothetical data set signatures in designing and/or sizing a system for a projected future workload according to various embodiments of the disclosure.
  • FIG. 9 is a flowchart of a method for determining a system profile for a workload based on one or more data set signatures according to various embodiments of the disclosure.
  • FIG. 10 is a diagram illustrating various examples of system management policies that are executed or orchestrated based on data set signatures according to various embodiments of the disclosure.
  • FIG. 11 is a flowchart of a method to implement data set signatures for data impact driven storage management according to various embodiments of the disclosure.
  • FIG. 12 is a diagram of an apparatus configured to implement data set signatures for data impact driven storage management according to various embodiments of the disclosure.
  • FIG. 1A is a diagram of an information processing system 100.
  • the information processing system 100 of FIG. 1A includes a business environment 103 and a data management system 106 interconnected by one or more links 109.
  • the links 109 may be a wired connection, wireless connection, interface, or any other type of connection between one or more devices or between different software in the business environment 103 and the data management system 106.
  • the business environment 103 is associated with a business entity, organization, enterprise, or company that performs various services or tasks for a user or client using, for example, one or more business applications 117.
  • the business applications 117 may perform transaction management, sales, marketing, accounting, procurement, enterprise resource planning (ERP), customer relationship management (CRM), human resource management, data governance, business intelligence, data visualization, analytics, data mining, business asset management, database management (using a DBMS), etc., singly or in combination.
  • the business environment 103 includes hardware resources, such as servers, processors, memories, routers, switches, virtual private networks (VPNs), gateways etc., that are positioned at or proximate to an office associated with a business entity.
  • the hardware resources within the business environment 103 are configured to store a business data catalog 115.
  • the business entity may store data sets 120 that are used by the business applications 117 internally or externally relative to the business entity.
  • the data sets 120 may be stored at the data management system 106.
  • the data sets 120 contain one or more data items, such as objects, tables, tablespaces, documents, files, directories, file systems, blocks, and/or combinations thereof.
  • the data sets 120 are characterized in a business data catalog 115 by properties, business impact attributes, and relationships to other data.
  • the business data catalog 115 includes information describing data associated with, or belonging to, a business environment 103, such as the data sets 120.
  • the business data catalog 115 consolidates and organizes the information describing the data sets 120 into a catalog service.
  • An example of a business data catalog 115 is a WATERLINE data catalog, which catalogs information describing data stored in database management systems, big data ecosystems, clouds, file systems, etc.
  • the business data catalog 115 permits users and/or business applications within the business environment 103 to perform metadata-based operations such as searches relative to data sets 120.
  • the business data catalog 115 stores metadata describing the data sets 120 associated with the business environment 103.
  • the business data catalog 115 also stores metadata describing the business applications 117, processes, or services used by the business environment 103.
  • the business data catalog 115 stores metadata about the structure and/or content of a relational database and a manner by which business applications 117 access the content.
  • users within the business environment 103 may add metadata such as textual annotations and/or tags about data sets 120 to the business data catalog 115 based on business analyses or transactions.
  • the metadata describing the data sets 120 associated with the business environment 103 may include definitions, tables, synonyms, views, indexes, tags, and annotations used to describe the properties, history, business impact attributes, and relationships to other data sets 120.
  • the metadata describing the data sets 120 associated with a business environment 103 includes business related metadata 123.
  • Business related metadata 123 describes aspects of the data sets 120 that may be considered significant to a business application corresponding to the business environment 103.
  • the business related metadata 123 may include the sources and derivation of a particular data set 120.
  • the business related metadata 123 may be annotations input by an employee of the business entity, in which the annotations describe a significance or business impact that a particular data set 120 has to the business entity.
  • the business related metadata 123 may include an estimated monetary value associated with a data set 120 owned by the business entity. Additional examples of the business related metadata 123 will be further provided below with reference to FIGS. 3A-B.
  • the data management system 106 is a storage system, data store, cloud computing environment, data lake, data fabric, data virtualization engine, etc., or any other data repository configured to store and manage data sets 120 on behalf of one or more business environments 103.
  • the data management system 106 may be positioned proximate to or within the business environment 103.
  • the data management system 106 may be positioned at a remote storage system, such as a cloud computing environment, positioned geographically distant from the business environment 103.
  • the data management system 106 includes a storage manager 126 and a data store 129 configured to store the data sets 120 on behalf of the business environment 103.
  • the storage manager 126 is a device or a software process within the data management system 106 that is configured to manage the data sets 120.
  • the data store 129 may be a storage array or any type of memory large enough to store the data sets 120.
  • the data store 129 stores pointers or references to each of the data sets 120.
  • the storage manager 126 maintains the operational metadata 133 describing the data sets 120 stored at the data management system 106.
  • the operational metadata 133 describes basic features of the data sets 120, such as, for example, a format of a data set 120, a timestamp indicating a date of creation of the data set 120 or a time of last access to the data set 120, a size of the data set 120, etc.
  • the operational metadata 133 includes an access frequency of the data set 120, indicating how often the data set 120 has been accessed over a standard period of time.
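A minimal sketch of how such operational metadata might be assembled, including an access frequency over a fixed period, is shown below. The 30-day window, the per-day unit, and the field names and values are assumptions for illustration; the disclosure only refers to "a standard period of time".

```python
from datetime import datetime, timedelta
from typing import Iterable


def access_frequency(
    access_times: Iterable[datetime],
    now: datetime,
    window: timedelta = timedelta(days=30),
) -> float:
    """Return the average number of accesses per day within the trailing window."""
    start = now - window
    recent = [t for t in access_times if start <= t <= now]
    return len(recent) / (window.total_seconds() / 86400)


now = datetime(2019, 8, 22)
accesses = [
    now - timedelta(days=1),
    now - timedelta(days=10),
    now - timedelta(days=45),  # outside the 30-day window, so not counted
]

# Hypothetical operational metadata record for one data set.
operational_metadata = {
    "format": "csv",
    "created": datetime(2019, 1, 15),
    "last_accessed": max(accesses),
    "size_bytes": 1_048_576,
    "access_frequency_per_day": access_frequency(accesses, now),
}
```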
  • the data management system 106 neither has access to nor stores the business related metadata 123.
  • the business related metadata 123 is available at the higher level business context such that business applications 117 may be executed to perform, schedule, or orchestrate business related tasks or catalog services using the business related metadata 123.
  • the business related metadata 123 is also used amongst individuals involved in the business entity, such as employees, analysts, and data scientists, enabling these individuals to make informed business decisions using the business related metadata 123.
  • the business related metadata 123 is not available at the lower level storage context, where the data sets 120 used by the business environment 103 are actually stored. Disclosed herein are embodiments directed to an enhanced data management system 106 which stores not only the operational metadata 133, but also the business related metadata 123.
  • FIG. 1B is a diagram of an information processing system 150 configured to implement data set signatures for data impact driven storage management according to various embodiments of the disclosure.
  • the information processing system 150 of FIG. 1B is similar to the information processing system 100 of FIG. 1A, except that the data management system 106 of the information processing system 150 additionally stores the business related metadata 123 describing one or more data sets 120.
  • the storage manager 126 is configured to execute a data set signature mapper 136 to receive business related metadata 123 from one or more business data catalogs 115 of one or more business environments 103.
  • the data set signature mapper 136 may send a request to the business environment 103 via link 109 for business related metadata 123 associated with, or describing, one or more data sets 120.
  • the storage manager 126 receives business related metadata 123 from the business data catalog 115 of the business environment 103 and then performs a mapping to determine the data sets 120 that correspond to the received business related metadata 123.
  • the data set signature mapper 136 may send a request to the business environment 103 via link 109 for business related metadata 123 associated with one or more business applications 117 or associated with specified types or aspects of data sets 120.
  • the data set signature mapper 136 retrieves or otherwise references the operational metadata 133 corresponding to the data set 120.
  • the data set signature mapper 136 combines the operational metadata 133 with the business related metadata 123 to generate a data set signature 140 for the data set 120.
  • the data set signature 140 for a data set 120 includes an aggregation or combination of the operational metadata 133 describing the data set 120, which is already stored locally at the data management system 106, and the business related metadata 123, which is received from the business environment 103.
  • the data set signature mapper 136 may first reduce or transform the business related metadata 123 to match the operational namespace of the storage manager 126, to standardize the business related metadata, or to facilitate later policy-based management of data sets 120.
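The reduce-or-transform step can be sketched as a key mapping applied before the business related metadata is merged into a signature. The catalog field names on the left of `KEY_MAP` and the operational-namespace names on the right are invented for this example.

```python
from typing import Any, Dict

# Hypothetical mapping from catalog field names to the storage manager's namespace.
KEY_MAP = {
    "Data Owner": "business_owner",
    "Est. Value (USD)": "business_value_usd",
    "Sensitivity": "business_security_class",
}


def to_operational_namespace(
    business_metadata: Dict[str, Any],
    key_map: Dict[str, str] = KEY_MAP,
) -> Dict[str, Any]:
    """Rename known fields, standardize string values, and drop unmapped fields."""
    transformed: Dict[str, Any] = {}
    for key, value in business_metadata.items():
        target = key_map.get(key)
        if target is None:
            continue  # reduce: fields with no mapping are left out of the signature
        if isinstance(value, str):
            value = value.strip().lower()  # standardize free-form strings
        transformed[target] = value
    return transformed


raw = {
    "Data Owner": " Finance ",
    "Est. Value (USD)": 25000,
    "Free-text note": "reviewed in Q2",
}
standardized = to_operational_namespace(raw)
```

Dropping unmapped fields is one possible reading of "reduce"; an implementation could equally retain them under a separate key.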
  • the data set signatures 140 may be transmitted back up to the business environment 103, to a cloud management environment, to other database management systems responsible for managing a different portion of the data store 129, or a different data store.
  • the data management system 106 may also transmit the data set signatures 140 to one or more business information processes, such as data governance, information lifecycle management, disaster recovery, and/or data brokering.
  • the embodiments disclosed herein are advantageous and offer practical applications to the field of information processing for various reasons.
  • operations can be performed on the data set 120 more efficiently, with more accuracy, with finer granularity, with less latency, more comprehensively, and/or at more opportune moments.
  • the business entity corresponding to the business environment 103 can benefit from integrating the business related metadata 123 at the business context with the operational metadata 133 at the storage context.
  • the embodiments disclosed herein enable the data management system 106 to identify the data sets 120 having the greatest impact for the business entity corresponding to the business environment 103. This information can be used to enhance business tools, processes, and the storage of data sets 120.
  • the data management system 106 performs a data manipulation operation on the data sets 120 based on the data set signature 140 and various system management policies associated with the data set signatures 140.
  • a system management policy indicates a data manipulation operation to be performed on a data set based on a data set signature 140 describing the data set 120.
  • a system management policy may indicate that when a data set 120 is described by a certain type of metadata, as identified in the data set signature 140, the data management system 106 is configured to perform certain data manipulation operations on the data set 120 based on the metadata identified in the data set signature 140.
  • the data management system 106 may implement security mechanisms on the data set 120, enforce a higher cost to access the data set 120, or perform any other operation pertaining to the data set 120.
  • a system management policy may indicate that, when the data set signatures 140 of two data sets 120 indicate that the two data sets 120 are copies of one another, certain data manipulation operations should be performed.
  • the system management policy may indicate that when one operation (e.g., a write operation) is performed on one of the data sets 120, the operation should also be performed on the other copy of the data set 120.
  • the system management policy may indicate that, when the data set signatures 140 of two data sets 120 indicate that the two data sets 120 are copies of one another, one of the data sets 120 should be deleted.
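  • One possible way to represent such system management policies is as a predicate over a data set signature paired with the data manipulation operation to invoke. The sketch below is illustrative only; the Policy type, the signature keys, the "copy-of:" value format, and the operation names are assumptions and are not taken from the disclosure.

```python
from typing import Callable, Dict, List, NamedTuple


class Policy(NamedTuple):
    name: str
    applies: Callable[[Dict], bool]  # predicate evaluated on a signature
    operation: str                   # data manipulation operation to invoke


# Hypothetical policies: secure data tagged as PII, and remove data sets
# whose signature marks them as a copy of another data set.
POLICIES: List[Policy] = [
    Policy("secure-pii",
           lambda s: s.get("business_content_class") == "PII",
           "encrypt"),
    Policy("dedupe-copies",
           lambda s: s.get("operational_relationship", "").startswith("copy-of:"),
           "delete"),
]


def operations_for(signature: Dict) -> List[str]:
    """Return the operations whose policy predicate matches the signature."""
    return [p.operation for p in POLICIES if p.applies(signature)]


print(operations_for({"business_content_class": "PII"}))
```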
  • data manipulation operations can be invoked on the data sets 120 more efficiently, accurately, comprehensively, and/or at more opportune moments by using the combination of the operational metadata 133 and the business related metadata 123 in the data set signature 140.
  • business entities may save costs and resources by enabling the creation and storage of data set signatures 140 at the data management systems 106. For example, when the data set signatures 140 of two separate data sets 120 indicate that both data sets 120 are copies of one another, a system management policy that is preconfigured at the data management system 106 may instruct that operations performed on one of the data sets 120 should also be performed on the other copy of the data set 120.
  • the business entity may save on costs by simply performing the operation on one of the data sets 120, since the data management system 106 is configured to automatically perform the same operation on the other copy of the data set 120.
  • a particular data set 120 is tagged as being of significant value to the business entity, which may be indicated in the data set signature 140 of the data set 120.
  • a system management policy is preconfigured at the data management system 106 to automatically perform a data manipulation operation on the data set 120 being tagged as valuable to the business entity.
  • the data manipulation operation may be directed to further securing the data set 120, for example, by encrypting the data set 120 using a public or private key.
  • the data management system 106 is configured to automatically perform a data manipulation operation on the data set 120 based on the data set signature 140, which is directed to protecting the data sets 120 that are valuable to the business entity.
  • the performance and management of the data management system 106 are greatly enhanced by using the combination of the operational metadata 133 and the business related metadata 123 in the data set signature 140. Due to the additional details stored in the data set signatures 140, data manipulation operations may be performed on the data sets 120 with greater efficiency and accuracy. For example, if a particular read operation is to be performed on all data sets 120 having a particular characteristic, the data management system 106 may search through the data set signatures 140 to quickly find all the data sets 120 having the particular characteristic, instead of having to individually analyze each of the data sets 120. In this way, data set signatures 140 greatly enhance the value of a data management system 106 while also reducing the cost and overhead from the perspective of a business entity.
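  • The search through data set signatures described above might, for example, be sketched as a lookup over a catalog that maps data set identifiers to signatures. The function find_data_sets, the catalog layout, and the metadata keys are assumptions made for illustration.

```python
from typing import Any, Dict, List


def find_data_sets(catalog: Dict[str, Dict[str, Any]], **criteria: Any) -> List[str]:
    """Return ids of data sets whose signature matches every criterion.

    Only the signatures are inspected; the data sets themselves are not read.
    """
    return [
        ds_id
        for ds_id, signature in catalog.items()
        if all(signature.get(key) == value for key, value in criteria.items())
    ]


catalog = {
    "ds-001": {"business_content_class": "PII", "operational_access_heat": "hot"},
    "ds-002": {"business_content_class": "sensor data", "operational_access_heat": "cold"},
    "ds-003": {"business_content_class": "PII", "operational_access_heat": "cold"},
}
print(find_data_sets(catalog, business_content_class="PII"))
```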
  • FIG. 2 is a schematic diagram of information processing element 200 suitable for implementing data set signatures for data impact driven storage management according to various embodiments of the disclosure.
  • the information processing element 200 may be implemented as the data management system 106, the storage manager 126, or the business environment 103.
  • the information processing element 200 comprises ports 220, transceiver units (Tx/Rx) 210, a processor 230, and a memory 233.
  • the processor 230 comprises a data set signature mapper 136 and an optional policy manager 137.
  • Ports 220 are coupled to Tx/Rx 210, which may be transmitters, receivers, or combinations thereof.
  • the Tx/Rx 210 may transmit and receive data via the ports 220.
  • Processor 230 is configured to process data.
  • Memory 233 is configured to store data and instructions for implementing embodiments described herein.
  • the information processing element 200 may also comprise electrical-to-optical (EO) components and optical-to-electrical (OE) components coupled to the ports 220 and Tx/Rx 210 for receiving and transmitting electrical signals and optical signals.
  • the processor 230 may be implemented by hardware and software.
  • the processor 230 may be implemented as one or more central processing unit (CPU) and/or graphics processing unit (GPU) chips, logic units, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs).
  • the processor 230 is in communication with the ports 220, Tx/Rx 210, and memory 233.
  • the storage manager 126, data set signature mapper 136, and policy manager 137 are implemented by the processor 230 to execute the instructions for implementing various embodiments discussed herein.
  • the storage manager 126 is configured to manage the data sets 120 and the storage data catalog 240.
  • the data set signature mapper 136 is configured to determine the business environment 103 from which to receive business related metadata 123, determine the types of business related metadata 123 to be received from the business environment 103, and map the business related metadata 123 received from the business environment 103 to the data sets 120 stored at the data management system 106.
  • the policy manager 137 is configured to apply the system management policies 243 based on the data set signatures 140.
  • the inclusion of the storage manager 126, data set signature mapper 136, and policy manager 137 provides an improvement to the functionality of the information processing element 200.
  • the storage manager 126, data set signature mapper 136, and policy manager 137 also effect a transformation of information processing element 200 to a different state.
  • the storage manager 126, data set signature mapper 136, and policy manager 137 are implemented as instructions stored in the memory 233.
  • the policy manager 137 need not be a distinct component within the information processing element 200. Instead, the functions of the policy manager 137 may be external to the information processing element 200 and even to the data management system 106.
  • the memory 233 comprises one or more of disks, tape drives, or solid-state drives and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • the memory 233 may be volatile and non-volatile and may be read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), and static random-access memory (SRAM).
  • the memory 233 is configured to store the data sets 120, and a storage data catalog 240.
  • the storage data catalog 240 includes multiple data set signatures 140 corresponding to multiple different data sets 120.
  • Each of the data set signatures 140 includes the operational metadata 133 and the business related metadata 123 describing a data set 120.
  • the memory 233 also stores system management policies 243, indicating a manner of managing and performing operations on data sets 120 associated with a particular data set signature 140.
  • the system management policies 243 may also indicate a manner of managing and performing operations on a data set 120 having a particular type of operational metadata 133 and/or a particular type of business related metadata 123.
  • the system management policies 243 are applied to the data sets 120 or to various systems under management based at least in part on the data set signature 140, as will be further described below with reference to FIG. 10.
  • a design that is still subject to frequent change may be preferred to be implemented in software, because re-spinning a hardware implementation is more expensive than re-spinning a software design.
  • a design that is stable that will be produced in large volume may be preferred to be implemented in hardware, for example in an ASIC, because for large production runs the hardware implementation may be less expensive than the software implementation.
  • a design may be developed and tested in a software form and later transformed, by well-known design rules, to an equivalent hardware implementation in an ASIC that hardwires the instructions of the software.
  • just as a machine controlled by a new ASIC is a particular machine or apparatus, a computer that has been programmed and/or loaded with executable instructions may likewise be viewed as a particular machine or apparatus.
  • FIGS. 3A-B are diagrams illustrating examples of metadata included in a data set signature 140 associated with a data set 120 according to various embodiments of the disclosure.
  • the data set signature 140 includes the operational metadata 133 and the business related metadata 123. Examples of the operational metadata 133 are shown in FIG. 3A, and examples of the business related metadata 123 are shown in FIG. 3B.
  • the operational metadata 133 includes basic metadata describing basic features of the data sets 120.
  • the operational metadata 133 may include, for example, at least one of an operational name 307, an operational type 308, an operational structure 309, an operational size 310, an operational security class or tag 311, an operational timestamp 312, an operational content type 313, an operational content class or tag 314, an operational index existence 315, an operational source 316, an operational version status 317, an operational encryption class 318, an operational encoding class 319, an operational movement state 320, an operational access class 321, an operational access heat class 322, an operational access skew 323, an operational access lumpiness 324, an operational local growth rate 325, an operational global growth rate 326, and/or an operational relationship 327.
  • operational metadata 133 may include other metadata or descriptions that are not otherwise shown in FIG. 3A.
  • the examples of operational metadata 133 may be stored at the data management system 106 in the form of fields, key-value pairs, RDF triples or graphs, or any other type of data structure.
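  • To illustrate two of the storage forms mentioned above, the following sketch converts metadata held as key-value pairs into simple (subject, predicate, object) tuples in the style of RDF triples. It uses plain Python tuples rather than an RDF library, and the function name to_triples and the metadata keys are assumptions.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def to_triples(data_set_id: str, metadata: Dict[str, Any]) -> List[Triple]:
    """Express each key-value pair as a (subject, predicate, object) tuple."""
    return [(data_set_id, key, value) for key, value in metadata.items()]


operational = {"operational_size_mb": 512, "operational_movement_state": "at rest"}
for triple in to_triples("ds-001", operational):
    print(triple)
```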
  • the operational metadata 133 is maintained by the storage manager 126.
  • the operational name 307 refers to a name of a data set 120.
  • the operational name 307 may be Sensor1354_09032019_Readings, or a system generated object identifier.
  • the operational type 308 indicates a type of the data set 120, such as, for example, whether the data set 120 is a file, object, storage volume, or storage logical unit number (LUN).
  • the operational structure 309 refers to a structure of the data set 120, such as, for example, whether the data set 120 is unstructured, semi-structured, relational, key-value pairs, or a stored object.
  • the operational size 310 refers to a size of the data set 120, such as, for example, in physical sizes (e.g., megabytes (MB), gigabytes (GB), terabytes (TB), etc.).
  • the operational security class or tag 311 refers to a security tag or class associated with the data set 120, which may be used by data placement policies.
  • the operational timestamp 312 indicates a timestamp of an event occurring at the data set 120, such as, for example, a timestamp indicating when the data set 120 was created, last read, or last modified.
  • the operational content type 313 indicates a content type of the data set 120, such as, for example, a Multipurpose Internet Mail Extensions (MIME) type, or whether the data set 120 is an image or audio file or related to an image or audio file.
  • the operational content class or tag 314 refers to a content class or tag associated with the data set 120.
  • the operational content class or tag 314 may be a custom tag, such as a tag indicating that the data set 120 is “sensor data” or an “X-Ray image.”
  • the operational index existence 315 includes a binary value indicating whether the data set 120 has been indexed.
  • the operational source 316 indicates a source of the data set 120, such as, for example, an identification of a public cloud.
  • the operational version status 317 indicates a version status of the data set 120, such as, for example, an indication as to whether the data set 120 is an active version, deleted version, backup copy, archived version, etc.
  • the operational encryption class 318 indicates whether the data set 120 is encrypted or is to be encrypted.
  • the operational encoding class 319 indicates how the data set 120 is encoded. For example, when the data set 120 is encoded as a text-compressed gzip file, the operational encoding class 319 indicates that the data set 120 is encoded as a text-compressed gzip file.
  • the operational movement state 320 indicates a movement state of the data set 120. For example, the operational movement state 320 for the data set 120 may indicate whether the data set 120 is streaming or at rest.
  • the operational access class 321 may indicate an access class of the data set 120, such as, for example, whether the data set 120 is a read-only/immutable file, a read-and-write file, an append-only file, or a write-once file.
  • the operational access heat class 322 indicates an access frequency of the data set 120, such as, for example, whether the data set 120 is cold (rarely accessed), warm (infrequently accessed), hot (frequently accessed), or very hot (very frequently accessed). In an alternative embodiment, the operational access heat class 322 may be defined using ranges of access frequencies.
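  • As a sketch of the alternative embodiment in which the access heat class is defined using ranges of access frequencies, the example below maps an access rate to one of the four classes named above. The thresholds in HEAT_BOUNDS and the unit (accesses per day) are hypothetical and are not specified by the disclosure.

```python
import bisect

# Hypothetical range boundaries in accesses per day (assumed values):
# below 1 is cold, 1 up to 100 is warm, 100 up to 10,000 is hot,
# and 10,000 or more is very hot.
HEAT_BOUNDS = [1, 100, 10_000]
HEAT_CLASSES = ["cold", "warm", "hot", "very hot"]


def heat_class(accesses_per_day: float) -> str:
    """Map an access frequency to a heat class using the ranges above."""
    return HEAT_CLASSES[bisect.bisect_right(HEAT_BOUNDS, accesses_per_day)]


for rate in (0, 50, 2_500, 50_000):
    print(rate, heat_class(rate))
```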
  • the operational access skew 323 indicates where the data set 120 falls in the overall access skew across data of a scope.
  • a typical example of operational access skew is “90% reads to 10% of the data set.”
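  • A figure such as "90% reads to 10% of the data set" could be derived from per-item read counts. The sketch below computes the share of all reads that go to the most-read fraction of items; the function name, the per-item granularity, and the sample counts are assumptions, and the disclosure does not prescribe how the skew is measured.

```python
import math
from typing import Sequence


def read_share_of_hottest(read_counts: Sequence[int], fraction: float = 0.10) -> float:
    """Share of all reads that hit the most-read `fraction` of items."""
    total = sum(read_counts)
    if total == 0:
        return 0.0
    top_n = max(1, math.ceil(len(read_counts) * fraction))
    hottest = sorted(read_counts, reverse=True)[:top_n]
    return sum(hottest) / total


# Ten items, one of which receives 90 of the 100 reads.
counts = [90, 2, 2, 1, 1, 1, 1, 1, 1, 0]
print(read_share_of_hottest(counts))
```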
  • the operational access lumpiness 324 indicates a uniformity or lack of uniformity in access to the data set 120 over time. For example, perhaps certain data is only accessed at the top of the hour, and never at night or on weekends.
  • the operational local growth rate 325 refers to a rate of change (e.g., growth or shrinkage) for the data set 120 considered in isolation.
  • the operational global growth rate 326 refers to a rate of change for the data set 120 relative to other data in a scope.
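  • The two growth rates might be computed as in the following sketch. The local rate is the relative change in size of the data set alone. The global rate is modeled here, as an assumption, as the difference between the data set's rate and the rate of the enclosing scope; the disclosure states only that it is relative to other data in a scope. Function and parameter names are hypothetical.

```python
def growth_rate(previous_size: float, current_size: float) -> float:
    """Relative change in size over one period (negative means shrinkage)."""
    if previous_size <= 0:
        raise ValueError("previous_size must be positive")
    return (current_size - previous_size) / previous_size


def relative_growth_rate(ds_prev: float, ds_cur: float,
                         scope_prev: float, scope_cur: float) -> float:
    """Growth of a data set minus growth of its scope (assumed definition)."""
    return growth_rate(ds_prev, ds_cur) - growth_rate(scope_prev, scope_cur)


print(growth_rate(100, 150))
print(relative_growth_rate(100, 150, 1000, 1100))
```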
  • the operational relationships 327 refer to an operational relationship of the data set 120 with other data. For example, the operational relationship 327 indicates whether the data set 120 is a copy of, version of, index into, or subset of another data set. As another example, the operational relationship 327 indicates whether the data set 120 and another data set should be stored together or deleted together.
  • FIG. 3B shows the business related metadata 123, which includes metadata describing information related to the business aspects or utility of the data sets 120 with respect to one or more business entities.
  • the business related metadata 123 may include, for example, at least one of a business name 330, a business type 331, a business natural language description 332, a business owner 333, a business steward 334, a business content class/tag 335, a business permission workflow 336, a business organization level 337, a business geography of data 338, a business structure type 339, a business structure 340, a business language 341, a business size 342, a business security class 343, a business timestamp 344, a business source or lineage 345, a business cleansing or curation 346, a business data set profile 347, a business data set virtualization status 348, a business sort order 349, a business retention date 350, a business compliance obligation 351, a business relationship 352, a business usage count 35
  • the business related metadata 123 may include other metadata or descriptions that are not otherwise shown in FIG. 3B.
  • the examples of business related metadata 123 may be stored at the business environment 103 and/or data management system 106 in the form of fields, RDF triples or graphs, key-value pairs, or any other type of data structure.
  • the business related metadata 123 is stored at the business data catalog 115, one or more business applications 117, or business information processes.
  • the business related metadata 123 may be stored anywhere and accessed through an API or as a service.
  • the business name 330 refers to a name of the data set 120, which may be the same as the operational name 307 for a data set 120, or different from the operational name 307 for a data set 120.
  • the business type 331 refers to a type of the data set 120, such as, for example, whether the data set 120 is a file, logical object, or table.
  • the business natural language description 332 refers to a natural language description of the data set 120. For example, the business natural language description 332 may indicate that the data set 120 includes normalized temperature readings from a particular sensor.
  • the business owner 333 indicates an owner of the data set, such as, for example, a sales department of a business entity in the New York state area.
  • the business steward 334 refers to a business custodian or a person who represents the business owner.
  • the business content class or tag 335 indicates a content class or tag of the data set 120.
  • the business content class or tag 335 of the data set 120 may indicate that the data set 120 refers to a customer purchase history, autonomous vehicle sensor readings, personally identifying information (PII), etc.
  • the business permission workflow 336 indicates a process to allow a user or application permission to access the data set 120, as an alternative to granting access to the data set 120 through user roles, access control lists, and/or other mechanisms.
  • the business organization level 337 indicates an organization level of a business owner 333 of the data set 120.
  • the business organization level 337 may indicate that the data set 120 is owned by a department, a division, a line of business, a headquarters, etc.
  • the business geography of data 338 indicates a geographical area associated with the data set 120.
  • the business structure type 339 indicates a structure type of the data set 120, such as, for example, a schema, encoding, markup language, or resource description framework (RDF) triple.
  • the business structure 340 indicates a structure of the data set 120, such as, for example, the actual schema or key space for key-value pairs.
  • the business language 341 indicates a written language used by the business entity or users that access the data set 120, such as, for example, “English” or “French.”
  • the business size 342 indicates a size of the data set 120 in logical units, such as, for example, relational rows or record counts.
  • the business security class 343 indicates a security class of the data set 120 with regard to the business entity, which may be meaningful for data governance and access control.
  • the business timestamp 344 indicates a timestamp of an event occurring at the data set 120.
  • the business source or lineage 345 indicates a source or lineage of the data set 120.
  • the business source or lineage 345 indicates whether the data set 120 is from a data warehouse, or an operational database management system.
  • the business source or lineage 345 may also indicate actions that have been recorded on the data set 120, such as, for example, data set merges, splits, or transformations.
  • the business cleansing or curation 346 indicates whether the data set 120 has been cleansed or curated, for example, by data normalization or by adding defaults for missing fields.
  • the business data set profile 347 may include details associated with one or more fields of the data set 120, such as, for example, value range or minimum or maximum values.
  • the business data set virtualization status 348 indicates whether the data set 120 is a virtual data set assembled on demand from existing data sets.
  • the business sort order 349 includes a sort key and direction of sorting the stored data set 120 (ascending or descending) with respect to the sort key.
  • the business retention date 350 indicates how long to retain the data set 120 before the data set 120 is deleted or scheduled for deletion.
  • the business compliance obligations 351 indicate and/or reference enterprise policies, professional data handling standards, or legal requirements with respect to operations and/or actions on the data set 120.
  • the business relationships 352 include other data sets that are related to the data set 120, such as, for example, data sets that are part of, subset of, derived from, reduced size version of, or friend of the data set 120. The business relationships 352 are useful since actions on data sets 120, such as move, delete, and tag-as, may need to propagate to related data sets 120.
  • Business relationships 352 may indicate a structural composition, such as object1 part-of object2; association, such as copy-of, version-of, index-into, subset-of, derived-from, reduced-size-version-of; grouping, such as store-together, delete-together, friend-of; and/or type level relationships, such as HDFS subtype-of file system, file tagged-as “Contains Personally Identifying Information.”
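  • The propagation of an action to related data sets might be sketched as a traversal over recorded relationships. In the example below, relationships are (source, relation, target) tuples and a delete follows only delete-together links, treated as symmetric. The tuple layout, the function name propagation_targets, and the choice of which relations to follow are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Relationship = Tuple[str, str, str]  # (source, relation, target)


def propagation_targets(start: str,
                        relationships: List[Relationship],
                        follow: Iterable[str] = ("delete-together",)) -> Set[str]:
    """Data sets reachable from `start` through the followed relations."""
    follow = set(follow)
    neighbors: Dict[str, Set[str]] = {}
    for src, rel, dst in relationships:
        if rel in follow:
            # Grouping relations are treated as symmetric (assumption).
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


rels = [("a", "delete-together", "b"),
        ("b", "delete-together", "c"),
        ("a", "copy-of", "d")]
print(sorted(propagation_targets("a", rels)))
```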
  • the business usage counts 353 indicate a number of times, either overall or by user type, that the data set 120 has been used to provide a process or service in a given time period, such as year-to-date, for the business environment 103.
  • the business utility rating 354 refers to a rating provided by one or more users of the business environment 103 that indicates a business utility of the data set 120. For example, a business utility rating 354 of 0 indicates that the data set 120 is least useful to the business environment 103 and a business utility rating 354 of 10 indicates that the data set 120 is most useful to the business environment 103.
  • the business user veracity rating 355 refers to a rating provided by one or more users of the business environment 103 that indicates a truthfulness, accuracy, and/or completeness of the data set 120, for example, with regard to the category or type of data that the data set 120 describes. For example, a business user veracity rating 355 of 0 indicates that the data set 120 is least accurate or least believable by users, and a business user veracity rating 355 of 10 indicates that the data set 120 is most accurate or most believable by the users.
  • the business data scientist utility rating 356 refers to a rating provided by one or more data scientists that indicates a value of the data set 120 with respect to performing data analysis on behalf of the business environment 103.
  • the business data scientist comment 357 includes one or more comments provided by a data scientist regarding, for example, an affinity to use the data set 120 in a particular type of analytic model development.
  • the business decision impact 358 represents the use of the data set 120 to make business decisions. For example, a business decision impact 358 of low indicates that there has historically been low impact of the data set 120 on business decisions, and a business decision impact 358 of high indicates that there has historically been high impact of the data set 120 on business decisions.
  • the business monetary impact 359 represents an estimation of the aggregate or periodic monetary outcome of decisions made through the use of the data set 120.
  • the business monetary impact 359 associated with a data set 120 may be reduced when additional data sets 120 also contributed to aggregate or periodic monetary outcomes.
  • the business monetary impact 359 may also represent a monetary value associated with the data set 120, which may have been determined by a valuation professional.
  • the business aggregate data set impact 360 represents a value or aggregate value of the data set 120 to the business environment 103 or the business entity.
  • the business aggregate data set impact 360 may be an aggregate reflection of a significance of the data set 120 based on the business monetary impact 359, the business decision impact 358, and/or any other metadata shown and described with reference to FIGS. 3A-B.
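  • One hypothetical way to form such an aggregate is a weighted combination of a normalized monetary impact and a scored decision impact. The scale in DECISION_SCORE, the monetary cap, and the weights below are assumptions chosen for illustration; the disclosure does not define a formula.

```python
# Hypothetical numeric scale for the low/medium/high decision impact.
DECISION_SCORE = {"low": 0.0, "medium": 0.5, "high": 1.0}


def aggregate_impact(monetary_impact: float,
                     decision_impact: str,
                     monetary_cap: float = 1_000_000.0,
                     weights=(0.6, 0.4)) -> float:
    """Weighted score in [0, 1] from monetary and decision impact."""
    # Clamp the monetary impact into [0, 1] relative to the assumed cap.
    monetary_score = min(max(monetary_impact, 0.0) / monetary_cap, 1.0)
    w_money, w_decision = weights
    return w_money * monetary_score + w_decision * DECISION_SCORE[decision_impact]


print(aggregate_impact(500_000, "high"))
```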
  • the business average impact of data set’s friends 361 refers to a business impact or value of friends of the data set 120. Friends of the data set 120 refer to other data sets stored at the data management system 106 that are similar or have a correspondence with the primary data set 120.
  • the friends of the data set 120 may include other data sets that have content similar to the primary data set 120, other data sets that describe a similar category or type of content as the primary data set 120, other data sets that have been accessed, updated, or edited by the same users, other data sets that have a similar access frequency, other data sets that have comparable ratings, other data sets that have a similar monetary impact, or any other data set with a similarity to the primary data set 120.
  • the friends of the data set 120 may be tagged manually, inferred by policies, or identified through machine-learned classifications.
  • the data set signature 140 shown in FIGS. 3A-B includes all of the metadata shown in FIGS. 3A-B and described above. However, a data set signature 140 for the data set 120 need not include all of the metadata shown in FIGS. 3A-B and described above. In some cases, a data set signature 140 need only include one or more of the metadata shown in FIGS. 3A-B and described above, or a data set signature 140 includes a subset of the metadata shown in FIGS. 3A-B and described above. There may be multiple instances of one or more of the different types of business related metadata 123. For example, there may be multiple instances of the business compliance obligation 351.
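  • Given a set of friends for a data set and an impact score for each friend, the business average impact of a data set's friends 361 could be computed as a simple mean, as in the sketch below. The dictionaries friends and impact and the function name are assumptions; how friends are identified is outside the scope of this example.

```python
from statistics import mean
from typing import Dict, List, Optional


def average_friend_impact(data_set_id: str,
                          friends: Dict[str, List[str]],
                          impact: Dict[str, float]) -> Optional[float]:
    """Mean impact of the friends of a data set, or None if none are scored."""
    scores = [impact[f] for f in friends.get(data_set_id, ()) if f in impact]
    return mean(scores) if scores else None


friends = {"a": ["b", "c"]}
impact = {"b": 0.2, "c": 0.6}
print(average_friend_impact("a", friends, impact))
```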
  • the business related metadata 123 is originally stored at the business data catalog 115 at the business environment 103.
  • the business related metadata 123 may be stored in the business data catalog 115.
  • a portion of the business related metadata 123 may be stored in the business data catalog 115 as annotations or input provided by users of the business environment 103.
  • the data management system 106 retrieves the business related metadata 123 from the business data catalog 115, and stores the business related metadata 123 of a data set 120 with the operational metadata 133 of the data set 120 in a data set signature 140.
  • the data sets 120 may include an on-demand virtualization object, a data operations (DataOps) process flow, status, and/or result, an Internet of Things (IoT) status characterization, or an IoT event recognition.
  • the business related metadata 123 may be metadata describing the DataOps object, IoT status characterization, or IoT event recognition.
  • FIG. 4A is a flowchart illustrating a method 400 of generating a data set signature 140 according to various embodiments of the disclosure.
  • the method 400 may be implemented by the information processing element 200 or the storage manager 126 in the data management system 106. Within the storage manager 126, all or part of the method 400 may be implemented by the data set signature mapper 136. In an embodiment, the method 400 may be implemented after a data set 120 is stored in a data store 129 of the data management system 106.
  • a data set 120 for which a data set signature 140 should be created is determined.
  • the data set signature mapper 136 determines the data set 120 from multiple data sets 120 stored locally at the data management system 106 based on an attribute of interest associated with the data set 120.
  • the attribute of interest associated with the data set 120 may be at least one of an operational name 307, a time at which the data set 120 was last accessed, an event associated with the data set 120, a previous, present, or future operation performed on the data set 120, and/or any other operational metadata associated with the data set 120.
  • the operational metadata 133 describing the data set 120 is stored locally at the data management system 106.
  • the data set signature mapper 136 retrieves the operational metadata 133 from an external entity and then stores the metadata locally at the storage data catalog 240, which may be stored at the data management system 106.
  • the operational metadata 133 is already stored in the storage data catalog 240, prior to selection of a data set 120.
  • the operational metadata 133 is managed elsewhere by the storage manager 126.
  • the operational metadata 133 may include one or more of the metadata described above with reference to FIG. 3A.
  • the business environment 103 storing business related metadata 123 that may be similar to, related to, or correlated to the data sets 120, is determined.
  • the data set signature mapper 136 may have access to more than one business environment 103, each storing a different business data catalog 115.
  • the data set signature mapper 136 selects one or more of the multiple business environments 103 accessible by the data management system 106 from which to receive business related metadata 123.
  • the business data stored at the business environment 103 is matched with one or more data sets 120 stored at the data management system 106, using corresponding metadata.
  • the business data and the data set 120 are matched based on explicit matching or inferred matching.
  • Explicit matching may be performed when an object in the business data has the same namespace as a data set 120 or an object in a data set 120.
  • the object from the business data can be explicitly matched by name to the object in the data set 120.
  • Inferred matching may be performed when some metadata describing the business data and some metadata describing the data set 120 indicates that the business data and the data set 120 are likely to be identical or nearly identical to one another.
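  • The two matching modes might be sketched as follows: an explicit match on name is tried first, and otherwise a match is inferred when enough of the shared metadata fields agree. The compared keys, the 0.75 threshold, and the dictionary layout are assumptions; the disclosure does not specify how likelihood is measured.

```python
from typing import Any, Dict, List, Optional, Tuple

COMPARE_KEYS = ("size", "content_type", "created")  # hypothetical fields


def match_data_set(business_obj: Dict[str, Any],
                   data_sets: List[Dict[str, Any]],
                   threshold: float = 0.75) -> Tuple[Optional[str], str]:
    """Return (data set id, match kind) for a business data object."""
    # Explicit matching: identical name in a shared namespace.
    for ds in data_sets:
        if ds.get("name") == business_obj.get("name"):
            return ds["id"], "explicit"
    # Inferred matching: fraction of shared metadata fields that agree.
    best_id, best_score = None, 0.0
    for ds in data_sets:
        shared = [k for k in COMPARE_KEYS if k in ds and k in business_obj]
        if not shared:
            continue
        score = sum(ds[k] == business_obj[k] for k in shared) / len(shared)
        if score > best_score:
            best_id, best_score = ds["id"], score
    if best_id is not None and best_score >= threshold:
        return best_id, "inferred"
    return None, "unmatched"


stored = [{"id": "ds1", "name": "readings_a", "size": 10,
           "content_type": "text/csv", "created": "2019-09-03"}]
print(match_data_set({"name": "readings_a"}, stored))
```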
  • the selected business related metadata 123 associated with the matched business data stored at the business environment 103 is mapped to the operational metadata 133 for one or more matched data sets 120 determined in step 409 that are stored at the data management system 106.
  • the business related metadata 123 is received from the business environment 103 without a context as to how the business related metadata 123 relates to a particular data set 120.
  • the data set signature mapper 136 is configured to determine a mapping between the business related metadata 123 and one or more data sets 120 based on a relationship or correlation between the business related metadata 123 and one or more data sets 120.
  • the business related metadata 123 may include a business monetary impact 359 of a data set 120 and identification information of the data set 120.
  • the data set signature mapper 136 is configured to map the identification information of the data set 120 to the locally stored identification information of the data set 120 to map the business monetary impact 359 to the data set 120.
  • the business related metadata 123 is mapped to a data set 120 based on explicitly identifying metadata, inherently identifying metadata, or inferring relationships between metadata.
  • after receiving the business related metadata 123, the data set signature mapper 136 also filters the received business related metadata 123 to determine the business related metadata 123 that is relevant to the data set 120 based on parameters set for the data set 120.
  • the business environment 103 sets parameters defining the type of metadata that should be stored in a data set signature 140 for a data set 120.
  • the data set signature mapper 136 filters the received business related metadata 123 based on the parameters set by the business environment 103.
  • the data set signature mapper 136 may set parameters, or may be directed through an interface to set parameters, defining the type of metadata that should be stored in a data set signature 140 for a data set 120.
  • the business environment 103 sets a parameter indicating that a business monetary impact 359 of a data set 120 should be received from a business data catalog 115 and stored in a data set signature 140 of the data set 120.
  • the data set signature mapper 136 filters the business related metadata 123 received from the business environment 103 to retrieve the business monetary impact 359 of a data set 120.
  • the storage manager 126 creates a data set signature 140 including the business monetary impact 359 of the data set 120.
  • filtering of business related metadata 123 is performed by the business environment before the business related metadata 123 is received by the data set signature mapper 136.
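
The parameter-driven filtering described above might look like the sketch below. The dictionary layout of a signature and the metadata keys (`business_monetary_impact`, `owner`, `glossary_term`, `size_bytes`) are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Set


def filter_business_metadata(received: Dict[str, Any], wanted_types: Set[str]) -> Dict[str, Any]:
    """Keep only the metadata types named by the configured parameters."""
    return {key: value for key, value in received.items() if key in wanted_types}


def build_signature(operational: Dict[str, Any], received: Dict[str, Any],
                    wanted_types: Set[str]) -> Dict[str, Any]:
    # A signature here is simply operational metadata plus the filtered
    # business related metadata, kept in separate sections (an assumed layout).
    return {
        "operational": dict(operational),
        "business": filter_business_metadata(received, wanted_types),
    }


# Hypothetical metadata received from a business data catalog.
received = {"business_monetary_impact": 250000, "owner": "finance", "glossary_term": "invoice"}
signature = build_signature({"size_bytes": 4096}, received, {"business_monetary_impact"})
```

The same filter function could run on either side of the link, which corresponds to the alternative in which the business environment filters before sending.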
  • the selected business related metadata 123 is received from the business environment 103 determined at step 406.
  • the Tx/Rx 210 of the data set signature mapper 136 receives the business related metadata 123 from the business environment 103.
  • one or more data set signatures 140, including both the operational metadata 133 and the selected business related metadata 123 describing the one or more data sets 120, are locally stored at the storage catalog 240.
  • the data set signature 140 includes the business related metadata 123 after mapping, matching, and filtering have been performed on the business related metadata 123.
  • the data set signature 140 is periodically updated and maintained at the storage catalog 240 by retrieving additional or updated business related metadata 123 from the business environment 103.
  • the data set signature mapper 136 periodically requests and retrieves any updates or changes to the locally stored business related metadata 123 from the business data catalog 115 of the business environment 103.
  • the business environment 103 automatically and periodically sends updates or changes to the business related metadata 123 to the data set signature mapper 136 of the data management system 106.
  • the data set signature mapper 136 updates the data set signature 140 for the data set 120 based on the updates received from the business environment 103.
  • an event at the business level or at the storage level may trigger updates and maintenance to the data set signature 140. For example, such an event might be a refresh of business related metadata 123 initiated by a business application 117, or the completion of a security operation that tags operational metadata 133 based on classifying data set content.
  • FIG. 4B is a flowchart illustrating a method 450 of generating a data set signature 140 according to various embodiments of the disclosure.
  • the method 450, which includes explicit mapping scopes, may be implemented by the information processing element 200, the storage manager 126, or the data set signature mapper 136.
  • the method 450 may be implemented after a data set 120 is stored in a data store 129 of the data management system 106.
  • filters are determined for the storage data catalog 240 to identify and select stored data sets 120 of interest; these filters are known as the storage-side mapping scope.
  • a storage-side mapping scope is a specification of one or more conditions, and combinations of conditions, on the operational metadata 133.
  • the storage-side mapping scope applies to the operational metadata 133 of the stored data sets 120 and can be used to select one or more data sets 120, also referred to as “targets.”
  • the data set signature mapper 136 determines the filters or storage-side mapping scope for the storage data catalog 240.
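
A storage-side mapping scope, understood as conditions and combinations of conditions on operational metadata, can be sketched with composable predicates. The catalog contents and the field names (`tier`, `size_gb`, `encrypted`) below are invented for the example; the disclosure does not prescribe a representation.

```python
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]
Condition = Callable[[Metadata], bool]


def all_of(*conditions: Condition) -> Condition:
    """Combine conditions so that every one of them must hold."""
    return lambda md: all(c(md) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Combine conditions so that at least one of them must hold."""
    return lambda md: any(c(md) for c in conditions)


def select_targets(catalog: Dict[str, Metadata], scope: Condition) -> List[str]:
    """Return the names of stored data sets whose operational metadata satisfies the scope."""
    return sorted(name for name, md in catalog.items() if scope(md))


# Hypothetical storage data catalog keyed by data set name.
catalog = {
    "ds_a": {"tier": "ssd", "size_gb": 120, "encrypted": True},
    "ds_b": {"tier": "hdd", "size_gb": 900, "encrypted": False},
    "ds_c": {"tier": "hdd", "size_gb": 40, "encrypted": True},
}

# Scope: large data sets on HDD, or anything on SSD.
scope = any_of(
    all_of(lambda md: md["tier"] == "hdd", lambda md: md["size_gb"] > 500),
    lambda md: md["tier"] == "ssd",
)
targets = select_targets(catalog, scope)
```

With this catalog the scope selects `ds_a` and `ds_b` as targets and leaves `ds_c` out.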
  • the business data catalog 115 is queried, through an API or other mechanism, to determine business data sets within a business-side mapping scope that corresponds or is likely to correspond to the storage-side mapping scope.
  • a corresponding business-side mapping scope is derived to identify business data sets within the business data catalog 115 that may correspond to the stored data sets 120 in the storage-side mapping scope.
  • the business-side mapping scope represents sources for mapping, matching, and combining business related metadata 123 into data set signatures 140.
  • asking for metadata about data sets in a business-side mapping scope is not necessarily the same as asking for metadata associated with, or describing one or more specific data sets 120.
  • the names and granularities of data sets may differ in the business related metadata 123 and the operational metadata 133. In a sense, data sets are potentially different at business and operational levels.
  • Business data sets are in a sense virtual in that business data sets are represented by business related metadata 123 in a business data catalog 115, business application 117, or business environment 103. In some cases, business data sets are actually stored in operational data sets managed by one or more data managers.
  • at step 459, the business data sets determined at step 456 are mapped to corresponding stored data sets 120 through a combination of data set names and other metadata, optionally leveraging prior discovered correspondences from the storage data catalog 240.
  • step 459 is similar to some of the steps described above with reference to step 410 of FIG. 4A, in that the data set signature mapper 136 is configured to map the business data sets to the corresponding stored data sets 120.
  • at step 462, selected metadata about the business data sets within the business-side mapping scope from the business data catalog 115 is matched and propagated to the storage data catalog 240.
  • the selected metadata is associated with corresponding stored data sets 120 to create or modify data set signatures 140 in the storage data catalog 240.
  • step 462 is similar to some of the steps described above with reference to step 410 of FIG. 4A, in that the data set signature mapper 136 is configured to match the business related metadata 123 about business data sets within the business-side mapping scope from the business data catalog 115 to the storage data catalog 240 and then associate the business related metadata 123 with the corresponding stored data sets 120 to create or modify data set signatures 140.
  • the selected metadata may be transformed or modified and then propagated to the storage data catalog.
  • the data set signature mapper 136 may encode the business related metadata 123 in a particular manner to convey more information to the storage manager 126.
  • the data set signature mapper 136 may encode the business name to additionally indicate whether the corresponding business is a national or international entity.
  • the transformation or modification of metadata before propagating to the storage data catalog 240 may standardize the metadata, or may facilitate and/or increase the speed of policy management.
  • at step 465, the data set signatures 140 of corresponding stored data sets 120 are inserted, updated, modified, deleted, or otherwise reconciled in the storage data catalog 240 to reflect the business-level metadata changes since the last update to the data set signatures 140.
  • step 465 is similar to step 418 of FIG. 4A, in that the data set signature mapper 136 is configured to periodically update the data set signature 140 in the storage data catalog 240 for corresponding data sets 120.
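
The reconciliation at step 465 can be illustrated with a small sketch. It assumes signatures are dictionaries with `operational` and `business` sections keyed by stored data set name, and it assumes that business metadata which disappears upstream is cleared rather than the whole signature being deleted; neither choice comes from the disclosure.

```python
from typing import Any, Dict, List, Tuple

Signatures = Dict[str, Dict[str, Any]]


def reconcile(stored: Signatures, incoming: Dict[str, Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Bring the business section of stored signatures in line with incoming metadata.

    Returns the (action, data set name) pairs that were applied.
    """
    actions: List[Tuple[str, str]] = []
    for name, business in incoming.items():
        if name not in stored:
            stored[name] = {"operational": {}, "business": dict(business)}
            actions.append(("insert", name))
        elif stored[name]["business"] != business:
            stored[name]["business"] = dict(business)
            actions.append(("update", name))
    for name in sorted(set(stored) - set(incoming)):
        # Assumed rule: clear stale business metadata but keep the
        # operational section of the signature.
        if stored[name]["business"]:
            stored[name]["business"] = {}
            actions.append(("clear", name))
    return actions
```

Returning the list of applied actions makes it easy to log what changed since the last update, or to skip downstream policy evaluation when nothing did.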
  • FIG. 5 is a diagram of another information processing system 500 configured to implement data set signatures 140 for data impact driven storage management according to various embodiments of the disclosure.
  • the information processing system 500 is similar to the information processing system 150, except that the information processing system 500 includes more than one business data catalog, the primary business data catalog 515 and the secondary business data catalog 516.
  • the information processing system 500 operates similarly to the information processing system 150, except that the storage manager 126 receives business related metadata 123 from a primary business data catalog 515, which includes metadata received from a secondary business data catalog 516.
  • the primary business data catalog 515 is associated with, or belongs to, the business environment 103 and includes metadata describing data objects, business applications 117, processes, and other data stored at the business environment 103 and used by the business entity corresponding to the business environment 103.
  • the secondary business data catalog 516 is associated with the business environment 103 but includes metadata describing data objects, business applications 117, processes, and other data that is not otherwise described in the primary business data catalog 515.
  • the secondary business data catalog 516 is associated with another business environment 103 and includes metadata describing data objects, business applications, processes, and other data stored at the other business environment 103 and used by another business entity corresponding to the other business environment 103.
  • the primary business data catalog 515 receives business related metadata 123 from a secondary business data catalog 516 and orchestrates the integration of this business related metadata 123 with the business related metadata within the primary business data catalog 515.
  • the storage manager 126 requests business related metadata 123 from the primary business data catalog 515 and receives the integrated business related metadata 123 from the business environment 103. In this way, the storage manager 126 does not have to separately request, retrieve, and reconcile business related metadata 123 from two different business data catalogs 515 and 516. Instead, the storage manager 126 receives the business related metadata 123 from both the primary business data catalog 515 and secondary business data catalog 516 in one interaction with the primary business data catalog 515.
  • the advantage of the embodiment shown in FIG. 5 is that the storage manager 126 does not have to perform additional metadata integration because the integration of business related metadata 123 is done in a business context.
  • FIG. 6 is a diagram of another information processing system 600 configured to implement data set signatures 140 for data impact driven storage management according to various embodiments of the disclosure.
  • the information processing system 600 is similar to the information processing system 150, except that the information processing system 600 includes more than one storage manager 126, the primary storage manager 626 and the secondary storage manager 627.
  • the information processing system 600 operates similarly to the information processing system 150, except that the primary storage manager 626 may transmit the data set signatures 140 to other secondary storage managers 627.
  • the primary storage manager 626 is associated with, or belongs to, the data management system 106 and is configured to manage the data sets 120 stored locally at the data store 129 of the data management system 106.
  • the secondary storage manager 627 is associated with the data management system 106 but is configured to manage another subset of the data sets 120 that is not otherwise managed by the primary storage manager 626. For example, when data sets 120 are migrated among storage tiers, the data sets 120 can be managed more efficiently and effectively if accompanied by their data set signatures 140.
  • the secondary storage manager 627 is associated with another data management system 106 and is configured to manage other data sets 120 that are not stored at the data management system 106.
  • the secondary storage manager 627 is associated with a cloud manager environment or a data virtualization environment.
  • the cloud manager environment may be dealing with a different view or level of the same underlying data sets 120.
  • a data virtualization environment may be managing virtual data sets that actually map to some of the stored data sets 120 managed by the primary storage manager 626.
  • the primary storage manager 626 transmits one or more data set signatures 140 to the secondary storage manager 627.
  • the secondary storage manager 627 may store the data set signatures 140 and use the data set signatures 140 to perform operations on behalf of one or more business environments 103.
  • FIG. 7 is a diagram of another information processing system 700 configured to implement data set signatures 140 for data impact driven storage management according to various embodiments of the disclosure.
  • the information processing system 700 is similar to the information processing system 150, except that the storage manager 126 in the information processing system 700 transmits the data set signatures 140 back to the business environment 103 via link 109.
  • a business data catalog 115 in business environment 103 may integrate the received data set signatures 140 into its business related metadata 123, thus providing operational insights about data sets 120 to users of the business environment 103.
  • business environment 103 may send a request for one or more data set signatures 140 to the storage manager 126. The storage manager 126 transmits the requested data set signatures 140 back to the business environment 103.
  • FIG. 8 is a diagram 800 illustrating the use of historical or hypothetical data set signatures 140 in designing and/or sizing a system for a workload according to various embodiments of the disclosure.
  • the system is designed and/or sized with a system profile, or profile of system resources, configuration, and topology, to provide sufficient capacity, response time, throughput, and management services to meet the needs of an expected workload.
  • the system may be one or more of a data management system, a storage system, a cloud system, a hyper-converged infrastructure, a composite infrastructure with independently scalable blocks, etc.
  • a workload refers to a series of processes that uses memory and/or processing power of the system over a given period of time or at a specific instance of time.
  • a workload profile 803 describes a time-variant workload that a business environment 103 desires, or expects, to be executed by a new or modified system.
  • a workload profile 803 includes specifications of desired throughputs, response times, and other key performance indicators related to workload execution.
  • a system profile 806 describes the hardware (e.g., memory, processor, etc.), software, services, configuration, topology, resource headroom, etc. that can potentially support and execute the workload described by the workload profile 803. For example, processes that are to be executed for a workload are described by the workload profile 803, and the processes require system capabilities, which are described by the system profile 806.
  • a data set profile 840 refers to the data set signatures 140 describing the one or more data sets 120 that are created and/or accessed during the hypothesized, simulated, or historical execution of the workload.
  • the workload profile 803 includes references to the data sets 120 used by the processes of the workload. Instead of creating a system profile 806 based only on a workload profile 803, in an embodiment, the system profile 806 is created based on both the workload profile 803 and the data set profile 840, or the collection of data set signatures 140 describing data sets 120 hypothetically used by the workload corresponding to the workload profile 803. Determining the system profile 806 using the data set signatures 140 enables the system designer or system sizer to more accurately and specifically size and configure the system for the workload corresponding to the workload profile 803.
  • the data set signatures 140 provide a more detailed and fine-grained description of the data sets 120 that are used by the processes of the workload, which can be used to more precisely configure the necessary system hardware, software, and other resources to meet the requirements of the workload.
  • the use of data set signatures 140 within a data set profile 840 as an intermediate representation in addition to workload profile 803 in creating system profiles 806 for a workload results in more accurately designed and/or sized systems.
  • the system being designed and sized may be optimized to minimize acquisition and operational costs while still delivering the desired workload throughput and response times.
  • multiple workload profiles 803 and/or data set profiles 840 and/or system profiles 806 may be considered for sizing and optimization purposes.
  • FIG. 9 is a flowchart of a method 900 of determining a system profile 806 for a workload based on one or more data set signatures 140 according to various embodiments of the disclosure.
  • the method 900 may be implemented using historically captured, hypothesized, or simulated data set signatures 140 for one or more data sets 120.
  • a workload profile 803 for the system that is to be sized, specified, configured, or provisioned is determined.
  • the workload profile 803 describes the series of processes executed for the corresponding workload and one or more data sets 120 used by the workload during execution.
  • a workload profile 803 includes specifications of desired throughputs, response times, and other key performance indicators related to future workload execution.
  • one or more system profiles 806 that support the workload profile 803 are determined.
  • a system profile 806 describes the hardware, software, services, configuration, topology, resource headroom, and other capabilities within the system that may be used to support and execute the workload processes described in the workload profile 803.
  • the data set profile 840, including the one or more data set signatures 140, is determined for the workload profile 803 based on data sets 120 that are identified in the workload profile 803 as being accessed or used during hypothetical, simulated, or historical execution of the workload profile 803.
  • the data set signatures 140 included in a data set profile 840 are determined based on prior experience with previous workloads or workload profiles 803 having similar processes.
  • one or more of the system profiles 806 are evaluated with respect to the data set profile 840 determined for the workload profile 803 to eliminate, modify, or improve the one or more system profiles 806 based on a business attribute.
  • the business attribute refers to performance, cost, flexibility, or any other objective defined by the business environment 103 according to which the workload should be optimized.
  • the detail included in the data set signatures 140 of the data set profile 840 may indicate that certain aspects of the system profile 806 may not be needed to perform a particular process using a data set 120. In that case, the system profile 806 can be modified or reduced to consume fewer resources than previously indicated.
  • one of the system profiles 806 of the one or more system profiles 806 is selected based on business requirements, business attributes, and a tradeoff analysis.
  • the data management system 106 may in the future be sized, configured and provisioned to execute the workload corresponding to the workload profile 803.
  • step 906 may be omitted, and one or more system profiles may be determined at the beginning of step 912, based on the data set profile 840 and the workload profile 803.
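
The evaluation and selection steps of method 900 can be sketched as a feasibility filter followed by a choice on one business attribute. In this illustration the business attribute is cost, and the profile fields (`capacity_tb`, `iops`, `cost`, `size_tb`, `peak_iops`), the 20% headroom, and the assumption that peak loads coincide are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SystemProfile:
    name: str
    capacity_tb: float
    iops: int
    cost: float


@dataclass(frozen=True)
class DataSetSignature:
    name: str
    size_tb: float
    peak_iops: int


def select_profile(candidates: List[SystemProfile],
                   data_set_profile: List[DataSetSignature],
                   headroom: float = 0.2) -> Optional[SystemProfile]:
    """Drop candidates that cannot hold or serve the data sets, then pick the cheapest."""
    need_tb = sum(s.size_tb for s in data_set_profile) * (1 + headroom)
    need_iops = sum(s.peak_iops for s in data_set_profile)  # assumes peaks coincide
    feasible = [p for p in candidates
                if p.capacity_tb >= need_tb and p.iops >= need_iops]
    return min(feasible, key=lambda p: p.cost, default=None)
```

A real tradeoff analysis would weigh several attributes at once; minimizing a single cost figure is used here only to keep the example short.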
  • FIG. 10 is a diagram illustrating various examples of system management policies 243 that are invoked, executed, or orchestrated based on data set signatures 140 according to various embodiments of the disclosure.
  • the system management policies 243 may be invoked, executed, or orchestrated by the policy manager 137. While some of the system management policies 243 invoke and execute operations on data sets 120, other policies apply to storage management, cloud management, and system management.
  • data set signatures 140 include detailed metadata, both operational metadata 133 and business related metadata 123, regarding the data sets 120 stored at the data management system 106.
  • the policy manager 137 can utilize the system management policies 243 to invoke fine grained operations on the data set 120 more efficiently, with more accuracy, with less latency, with finer granularity, more comprehensively, and/or at more opportune moments.
  • system management policies 243 can improve the matching and/or mapping steps in the creation of data set signatures 140 for the data sets 120 stored at the data management system 106, and/or the recognition of relationships among the data sets 120.
  • system management policies 243 enable management software to improve the efficiency, performance, automation, correctness, and/or outcomes of processes for system management, workload management, and/or event management.
  • a system management policy 243 is implemented based on certain parameters that are associated with a data set 120.
  • a system management policy 243 is applied to a data set 120 based on metadata included within a data set signature 140 describing the data set 120.
  • the system management policy 243 may be applied to a data set 120 based on metadata included within a data set signature 140 describing the data set 120 and a current location of the data set 120.
  • the system management policy 243 may be applied to a data set 120 based on metadata included within a data set signature 140 describing the data set 120 and metadata included in a second data set signature 140 describing a second data set 120 that is related to the data set 120.
  • the system management policy 243 may be applied to a data set 120 based on metadata included within a data set signature 140 describing the data set 120 and an operation performed on the data set 120.
  • the system management policy 243 may be applied to a data set 120 based on metadata included within a data set signature 140 describing the data set 120 and a user requesting access to the data set 120.
  • the system management policy 243 may be applied to a data set 120 based on metadata included within a data set signature 140 describing the data set 120 and a workload profile indicating that a workload to be performed by the data management system accesses the data set 120.
  • a system management policy 243 may be scheduled for application, or proposed to the policy manager 137 for application, instead of being immediately applied as described above.
  • the system management policy 243 may be applied to data management, storage management, system management, cloud management, workload management, and/or event management.
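
The idea that a policy applies based on signature metadata combined with some context (current location, requested operation, requesting user, and so on) can be sketched as below. The `Policy` structure, the signature keys, and the operation labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Signature = Dict[str, Any]
Context = Dict[str, Any]


@dataclass(frozen=True)
class Policy:
    name: str
    applies: Callable[[Signature, Context], bool]  # condition on signature plus context
    operation: str                                  # label of the operation to invoke


def applicable_operations(policies: List[Policy], signature: Signature,
                          context: Context) -> List[str]:
    """Return the operations of every policy whose condition holds, in policy order."""
    return [p.operation for p in policies if p.applies(signature, context)]


policies = [
    Policy("compress-cold", lambda s, c: s["access_frequency"] == "low", "compress"),
    Policy("relocate-high-impact",
           lambda s, c: s["business_monetary_impact"] > 100000 and c["location"] != "ssd",
           "move_to_ssd"),
]

signature = {"access_frequency": "low", "business_monetary_impact": 250000}
operations = applicable_operations(policies, signature, {"location": "hdd"})
```

Because the function only returns operation labels, the caller can apply them immediately, schedule them, or propose them for approval.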
  • FIG. 10 lists examples of system management policies 243 that indicate operations that may be enforced, executed, or orchestrated by the policy manager 137 more efficiently, with more accuracy, with less latency, more comprehensively, and/or at more opportune moments using the data set signatures 140.
  • the various operations indicated by the system management policies 243 described herein may be collectively referred to herein as data manipulation operations.
  • a preferred location policy 1001 indicates that a data set 120 described by a certain type of metadata, as indicated by the data set signature 140, should be stored at a particular type of location or at a specific location within the data management system 106 or external to the data management system 106.
  • the preferred location policy 1001 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to store the data set 120 at a type of location or at a specific location within the data management system 106 or external to the data management system 106 based on the data set signature 140 of the data set 120.
  • the location specified for the data set 120 may be determined to improve performance for accessing important data sets 120, improve security for important data sets 120, and balance the load of data sets 120 stored at the data management system 106.
  • a data mobility policy 1002 indicates that a data set 120 described by a certain type of metadata, as identified in the data set signature 140, and stored at a particular location should be relocated to another type of location or to a specific location within the data management system 106 or external to the data management system 106.
  • the data mobility policy 1002 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to change a location of the data set 120 to another location based on the data set signature 140 of the data set 120 and a current location of the data set 120.
  • the instructions may instruct the policy manager 137 to change a location of the data set 120 periodically or based on a triggering event in response to the data set 120 being associated with the data set signature 140.
  • the data mobility policy 1002 may be used to move data sets 120 or copy data sets 120 to another location periodically or based on triggered events.
  • a data deduplication policy 1003 indicates that a data set 120 described by a certain type of metadata, as identified in the data set signature 140, should be deduplicated, such that any duplicates of the data set 120 are removed from storage to save space.
  • the data deduplication policy 1003 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to delete duplicates of the data set 120 in response to the data set 120 being associated with the data set signature 140.
  • a data deduplication policy 1003 indicates when deduplication should be performed, and/or which of the duplicates discovered should be the one retained (e.g., retain the most recent copy).
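
A retention rule such as "retain the most recent copy" can be sketched as follows. Identifying duplicates by a content hash and the `Copy` record layout are assumptions made for the example; the disclosure does not say how duplicates are discovered.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Copy(NamedTuple):
    path: str
    content_hash: str
    modified: int  # e.g. a Unix timestamp


def plan_deduplication(copies: List[Copy]) -> Dict[str, List[str]]:
    """Map each retained path to the duplicate paths that could be removed."""
    groups: Dict[str, List[Copy]] = defaultdict(list)
    for copy in copies:
        groups[copy.content_hash].append(copy)
    plan: Dict[str, List[str]] = {}
    for group in groups.values():
        keep = max(group, key=lambda c: c.modified)  # retain the most recent copy
        plan[keep.path] = sorted(c.path for c in group if c is not keep)
    return plan
```

The function only produces a plan and deletes nothing, so the removal itself can be deferred to a time the policy considers appropriate.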
  • a data compression policy 1004 indicates that a data set 120 described by a certain type of metadata, as identified in the data set signature 140, should be compressed to reduce the size of the data set 120.
  • the data compression policy 1004 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to compress the data set 120 in response to the data set 120 being associated with the data set signature 140.
  • a data compression policy 1004 is triggered when a data set 120 is to be stored for the first time at the data management system 106.
  • the namespace mapping policy 1005 indicates that a namespace mapping exists between a data set 120 described by business-related metadata 123, and one or more other data sets 120 described by operational metadata 133.
  • one data set 120 described by operational metadata 133 may have a namespace mapping to one or more data sets 120 described by business-related metadata.
  • Such mappings, which may be 1:1, 1:N, M:1, or rarely M:N, precede the combination of metadata to create or update a data set signature 140 as described in FIG. 4A.
  • the namespace mapping policy 1005 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the data set signature mapper 136 to assert a namespace mapping between two or more data sets 120.
  • Such a policy connects a storage view of the data sets 120 to a business view of the data sets 120 to recognize and/or monetize high business impact data sets 120.
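
The mapping cardinalities mentioned above can be made concrete with a small classifier. It assumes the mapping is given as (business data set name, stored data set name) pairs and that the left side of the ratio refers to business data sets; both are conventions adopted for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def mapping_cardinality(pairs: Iterable[Tuple[str, str]]) -> str:
    """Classify (business name, stored name) pairs as 1:1, 1:N, M:1, or M:N."""
    by_business: Dict[str, Set[str]] = defaultdict(set)
    by_stored: Dict[str, Set[str]] = defaultdict(set)
    for business, stored in pairs:
        by_business[business].add(stored)
        by_stored[stored].add(business)
    fan_out = any(len(v) > 1 for v in by_business.values())  # one business set, many stored
    fan_in = any(len(v) > 1 for v in by_stored.values())     # many business sets, one stored
    if fan_out and fan_in:
        return "M:N"
    if fan_out:
        return "1:N"
    if fan_in:
        return "M:1"
    return "1:1"
```

Knowing the cardinality before combining metadata helps decide whether business metadata can be copied directly (1:1) or needs to be split or aggregated across data sets.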
  • the tagging policy 1006 indicates that a data set 120 described by a certain type of metadata, as identified in the data set signature 140, should be tagged or identified as a certain type of data set 120.
  • the tagging policy 1006 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to tag the data set 120 in response to the data set signature 140 being associated with the data set 120.
  • the data management system 106 and the business environment 103 may treat the data set 120 differently from other data sets 120, for example, with respect to its access, location, copies, and/or updates.
  • a tagging policy 1006 that tags data sets 120 based on business importance, as identified or otherwise reflected in the data set signature 140, allows high-impact data to be optimized for the business.
  • the relationship identification policy 1007 indicates that a relationship exists between a data set 120 described by a certain type of metadata, as identified in the data set signature 140, and one or more other data sets 120 also being described by a certain type of metadata identified in the data set signature 140.
  • the relationship identification policy 1007 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to assert a relationship between one or more data sets 120 based on the data set signature 140 associated with each of the data sets 120.
  • the instructions may instruct the policy manager 137 to first determine whether a relationship exists between a first data set 120 and a second data set 120 based on a data set signature 140 of the first data set 120 and a data set signature 140 of the second data set 120. Then, the instructions may instruct the policy manager 137 to update the operational metadata 133 and/or business metadata 123 within the data set signature 140 of the first data set 120 and/or the data set signature 140 of the second data set 120 in response to the relationship existing between the first data set 120 and the second data set 120.
  • the relationship existing between data sets 120 may be that two data sets 120 are copies of each other, that one data set 120 is a different version of another data set 120, one data set 120 is a subset of another data set 120, one data set 120 is a transformation of another data set 120, etc. Detecting and maintaining a record in metadata of relationships among data sets 120 may then advantageously trigger additional policy invocations or operations on one or more of the data sets 120.
  • a triggered operations policy 1008 indicates that when an operation is performed on a data set 120 described by a certain type of metadata, as identified in the data set signature 140, other operations may also need to be performed on the data set 120. For example, when a data set 120 having a data set signature 140 is accessed (a first operation), the data set 120 may also be copied to the cache (a second operation). In this way, the triggered operations policy 1008 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to perform one or more additional operations on a data set 120 in response to the data set 120 being associated with a data set signature 140 and a first operation being performed on a data set 120.
  • a propagated operations policy 1009 indicates that when an operation is performed on a first data set 120 described by a certain type of metadata, as identified in the data set signature 140, other operations may also need to be performed on other data sets 120 that are related to the first data set 120. For example, when a first data set 120 is deleted, other related data sets 120 (such as copies, subsets, transformations, etc.) of the first data set 120 may also need to be deleted.
  • the propagated operations policy 1009 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to perform one or more additional operations on other related data sets 120 in response to a first data set 120 being associated with a data set signature 140 and a first operation being performed on the first data set 120.
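As an illustration of a propagated operation, the deletion example above can be sketched as a traversal over recorded relationships. The `store` and `relationships` dictionaries are hypothetical stand-ins for the data store and the relationship metadata; the disclosure does not prescribe this representation.

```python
def propagate_delete(store, relationships, target):
    """Delete `target` and every data set transitively related to it.

    `store` maps data set names to contents; `relationships` maps a data set
    name to the names of its copies, subsets, or transformations.
    """
    deleted = []
    pending = [target]
    seen = set()
    while pending:
        name = pending.pop()
        if name in seen:
            continue  # guards against cycles such as copy <-> original
        seen.add(name)
        if name in store:
            del store[name]
            deleted.append(name)
        pending.extend(relationships.get(name, ()))
    return deleted
```

Tracking visited names matters because relationships such as "copy of" are naturally symmetric and would otherwise cause an endless loop.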
  • an access governance policy 1010 indicates whether to permit or deny access to a data set 120 having certain metadata, as identified by the data set signature 140, stored at a certain location based on the requested operation and the user, whether human or not, making the request.
  • the access governance policy 1010 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to determine whether to permit or deny access to a data set 120 based on the data set signature 140, the location of the data set 120, an operation requested on the data set 120, and/or a user requesting the operation on the data set 120.
  • the access governance policy 1010 may help enforce access policies more selectively based on the metadata in the data set signature 140, and may also help enforce access policies consistently across heterogeneous data stores.
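One way to picture an access governance decision is a first-match rule table evaluated against the signature and the request. Everything named below (`AccessRequest`, `ACCESS_RULES`, the `sensitivity` field, the role names, and the deny-by-default fallback) is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user_role: str
    operation: str
    location: str


# Hypothetical rule table; "*" matches any value. The first matching rule wins.
ACCESS_RULES = [
    {"sensitivity": "restricted", "location": "*", "operation": "read",
     "user_role": "auditor", "effect": "permit"},
    {"sensitivity": "restricted", "location": "*", "operation": "*",
     "user_role": "*", "effect": "deny"},
    {"sensitivity": "public", "location": "*", "operation": "read",
     "user_role": "*", "effect": "permit"},
]


def is_access_permitted(signature, request, rules=ACCESS_RULES):
    """Permit or deny based on signature metadata, location, operation, and user."""
    for rule in rules:
        if (rule["sensitivity"] == signature.get("sensitivity")
                and rule["location"] in (request.location, "*")
                and rule["operation"] in (request.operation, "*")
                and rule["user_role"] in (request.user_role, "*")):
            return rule["effect"] == "permit"
    return False  # deny by default when no rule matches (an assumption)
```

Because the rules key on signature metadata rather than on a particular data store, the same table could in principle be applied across heterogeneous stores.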
  • an access enablement policy 1011 indicates how to permit access to a data set 120 having certain metadata, as identified by the data set signature 140, stored at a certain location based on the requested operation and the user making the request.
  • the access enablement policy 1011 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to permit restricted access to the data set 120 or determine whether to transform, mask, or allow access to a data set 120 based on the data set signature 140, the location of the data set 120, a requested operation on the data set 120, and/or a user requesting the access operation to the data set 120.
  • the access enablement policy 1011 may help sanitize or anonymize requested data sets 120, especially in cases where sensitive personal identifying information, medical details, financial details, historical details, or similar information is contained in requested data sets 120.
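The masking behavior of an access enablement policy might look like the following sketch. The `sensitive_fields` signature entry, the `compliance_officer` role, and the `"***"` mask value are illustrative assumptions rather than details from the disclosure.

```python
def enable_access(record, signature, user_role):
    """Return the record unchanged for a privileged role, otherwise mask sensitive fields."""
    sensitive = set(signature.get("sensitive_fields", ()))
    if user_role == "compliance_officer" or not sensitive:
        return dict(record)
    return {key: ("***" if key in sensitive else value)
            for key, value in record.items()}
```

A copy is always returned so the stored record is never modified by the masking step.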
  • the configuration policy 1012 indicates a configuration of an existing or future system or its workload parameters based partly on one or more data sets 120 having certain metadata, as identified by the data set signature 140, stored at a certain location, and optionally associated with a certain workload profile 803.
  • the configuration policy 1012 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106 or externally to the data management system 106, which instructs management tools to alter the configuration or workload parameters of the existing system.
  • the configuration policy 1012 optimizes storage and cloud configurations for anticipated future workloads.
  • the configuration of a future system or its workload parameters may describe intended placement or treatment of data sets 120, or modify configurations of the system profile 806, and/or describe where and how workloads should be placed.
  • the provisioning policy 1013 indicates how to provision storage tiers of various types and capacities for the storage of one or more data sets 120 having certain metadata, as identified by the data set signature 140, for a workload within the data management system 106.
  • the provisioning policy 1013 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to provision storage tiers as described in the provisioning policy 1013.
  • the provisioning policy 1013 provisions storage for various types of data sets 120, possibly at more than one location, based on user profiles, anticipated data volumes and types, and predicted workloads.
  • the provisioning policy 1013 orchestrates the provisioning of system resources such as computation and/or networking, based on certain metadata, as identified by the data set signature 140.
  • a workload migration policy 1014 indicates that, given one or more data sets 120 described by a certain type of metadata, as identified in the data set signature 140, and associated with a particular workload, all or part of the workload should be migrated to another location, such as a cloud server. This is different from migrating any or all of the data sets 120 accessed by the particular workload.
  • the workload migration policy 1014 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106 or externally to the data management system 106, which instructs system management and/or workload management tools to migrate the workload.
  • a workload migration policy may orchestrate workload migration once, several times, periodically, or under certain circumstances.
  • the workload migration policy 1014 provides decisions and instructions to management tools to migrate workloads when deemed appropriate to better access data, improve performance, and balance a load of the data management system 106 or other system.
  • an event response policy 1015 indicates that when an event occurs at a data set 120 described by a certain type of metadata, as identified in the data set signature 140, applicable policies may be executed or orchestrated for the data set 120 and/or for other systems or objects.
  • the event response policy 1015 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to execute policies for the data set 120 when an event occurs at the data set 120 based on the data set signature 140, a workload that accesses the data set 120, a location of the data set 120, and/or a user accessing the data set 120.
  • the event response policy 1015 provides instructions to respond automatically to an event at a data set 120 for which policies apply.
  • policies may be executed or orchestrated for the data set 120 and/or for other systems or objects. For example, during the time that a prior state of a data management system is being restored (the event), access may not be permitted to data sets 120 associated with a particular business organization.
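The restore example above can be sketched as a lookup from an event to the actions that apply to matching data sets. The event name `restore_in_progress`, the `owner` metadata key, and the policy dictionary layout are hypothetical.

```python
def respond_to_event(event, signatures, policies):
    """Return {data set name: [actions]} for data sets whose signature matches a policy."""
    actions = {}
    for name, signature in signatures.items():
        for policy in policies:
            matches = all(signature.get(key) == value
                          for key, value in policy["match"].items())
            if policy["event"] == event and matches:
                actions.setdefault(name, []).append(policy["action"])
    return actions
```

For instance, a policy of `{"event": "restore_in_progress", "match": {"owner": "hr"}, "action": "deny_access"}` would block access only to data sets whose signature names that organization as owner while the restore is under way.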
  • an event recognition policy 1016 indicates that when one or more events occur at a data set 120 described by a certain type of metadata, as identified in the data set signature 140, another event may be detected and identified as the root cause of the one or more events.
  • the event recognition policy 1016 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to recognize an event occurring at a data set 120 as a root cause or failure resulting in other events or failures occurring at the data management system 106.
  • the event may be recognized based on a data set signature 140 and/or a workload that accesses the data set 120.
  • the event recognition policy 1016 is used to debug or determine fingerprints of a system bottleneck that may occur at the data management system 106. Complex events can be difficult to recognize and to trace to the root cause, and considering one or more data set signatures 140 can provide valuable insight into such recognition and tracing. For example, an unauthorized attempt to access one or more data sets 120 owned by a particular organization within the business environment 103 may be more quickly recognized by considering the data set signatures 140 of those data sets 120.
  • a situational policy 1017 indicates that certain policies or operations may be executed or orchestrated for a data set 120 described by a certain type of metadata, as identified in the data set signature 140, to control performance and operations of the data management system 106 based on machine learning or trained data relative to the best .
  • the situational policy 1017 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to orchestrate or execute one or more policies based on a data set signature 140, a workload to be performed using the data set 120, an operation to be performed on the data set 120 or already performed on the data set 120, a location of the data set 120, and/or a user requesting access to the data set 120.
  • system management policies 243 may be organized in a multi-dimensional table that covers all or most situations.
  • situational policies 1017 applicable to the current situation may be executed or orchestrated for the data set 120 and/or for other systems or objects.
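A multi-dimensional policy table of the kind mentioned above could be approximated by a dictionary keyed on several dimensions of the situation. The three dimensions chosen here (a metadata classification, an operation, and a location), the wildcard `"*"`, and the policy names are assumptions for illustration.

```python
# Hypothetical table: (classification, operation, location) -> policies to execute
SITUATIONAL_TABLE = {
    ("pii", "read", "cloud"): ["mask_fields"],
    ("pii", "copy", "cloud"): ["deny"],
    ("public", "read", "*"): ["allow"],
}


def policies_for(classification, operation, location, table=SITUATIONAL_TABLE):
    """Look up the policies for a situation, falling back to a location wildcard."""
    exact = table.get((classification, operation, location))
    if exact is not None:
        return exact
    return table.get((classification, operation, "*"), [])
```

An empty list means the table does not cover the situation, which is consistent with a table that covers most but not all situations.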
  • a tradeoff policy 1018 indicates preferences and/or weightings for certain business attributes, based at least partly on data sets 120 described by a certain type of metadata, as identified in the data set signature 140, stored at a certain location, and accessed by a workload.
  • the tradeoff policy 1018 allows the data management system 106 and/or other management systems to consider data set signatures 140 in setting preference levels for optimization of impact, performance, cost, or other business attributes. Once preference levels are set, management policies that access and consider these preference levels may be activated, and may in turn orchestrate operations on the data management system or on other systems. In short, the improved knowledge about data sets 120 that is captured in data set signatures 140 can be leveraged in determining tradeoffs in optimization preferences.
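To make the notion of preference weightings concrete, the sketch below derives weights from a signature and uses them to score candidate storage tiers. The `business_impact` field, the weight values, the tier names, and the attribute scores (each on a 0 to 1 scale where higher is better) are all invented for the example.

```python
def weights_for(signature):
    """Derive hypothetical preference weights from business metadata in a signature."""
    if signature.get("business_impact") == "high":
        return {"performance": 0.8, "cost_efficiency": 0.2}
    return {"performance": 0.2, "cost_efficiency": 0.8}


def choose_tier(weights, tiers):
    """Pick the tier with the highest weighted score across the weighted attributes."""
    def score(attributes):
        return sum(weights.get(name, 0.0) * value
                   for name, value in attributes.items())
    return max(tiers, key=lambda tier_name: score(tiers[tier_name]))


TIERS = {
    "ssd": {"performance": 0.9, "cost_efficiency": 0.2},
    "hdd": {"performance": 0.4, "cost_efficiency": 0.8},
}
```

With these numbers, a high-impact data set favors the faster tier and any other data set favors the cheaper one; a real system would presumably tune the weights rather than hard-code them.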
  • a policy creation policy 1019 enables the creation of additional system management policies 243 for data sets 120 described by a certain type of metadata, as identified in the data set signature 140.
  • the policy creation policy 1019 may indicate a data manipulation operation, or include a set of instructions stored at the data management system 106, which instructs the policy manager 137 to assert and activate new system management policies 243 for data sets 120 based on the data set signature 140, an operation performed or to be performed on the data set 120, the location of the data set 120, and/or a workload accessing the data set 120.
  • the policy creation policy 1019 may recognize, through machine learning, common actions or orchestrations that system management tools within a data management system 106 are asked to perform and that are directly or indirectly related to management of data sets 120, and may then create an automated system management policy to initiate these actions or orchestrations under similar circumstances.
  • the actions or orchestrations may be taken on other systems, and/or may involve cloud management, workload management, and/or system configuration.
  • FIG. 11 is a flowchart of a method 1100 to implement data set signatures 140 for data impact driven storage management according to various embodiments of the disclosure.
  • the method 1100 may be implemented by the information processing element 200 or the storage manager 126 in the data management system 106.
  • the method 1100 may be implemented after a data set 120 is stored in a data store 129 of the data management system 106.
  • operational metadata 133 describing a data set 120 is stored in a memory 233 of the data management system 106, for example, in a data catalog 240.
  • the data catalog 240 may be stored at a memory 233 of an information processing element 200, which may be implemented as a storage manager 126.
  • the data set signature mapper 136 stores the operational metadata 133 describing the data set 120 in the memory 233 of the information processing element 200 or the data management system 106.
  • business related metadata 123 describing business impact features of the data set 120 is received from one or more business environments 103.
  • the business environment 103 is external to the data management system 106.
  • Tx/Rx 210 of the information processing element 200 implemented as a storage manager 126 receives the business related metadata 123 from one or more business data catalogs 115 associated with one or more business environments 103.
  • the data set signature mapper 136 receives the business related metadata 123 from the business environment 103.
  • a data set signature 140 comprising a combination of the operational metadata 133 and the business related metadata 123 and describing the data set 120 is generated and stored at the data management system 106.
  • the data set signature mapper 136 when executed by the processor 230 of the information processing element 200 implemented as a storage manager 126, generates the data set signature 140 comprising the operational metadata 133 and the business related metadata 123.
  • the data set signatures 140 may also be stored in the memory 233 of the information processing element 200 implemented as a storage manager 126.
  • a data manipulation operation is performed on the data set 120 using the combination of the operational metadata 133 and the business related metadata 123 in the data set signature 140.
  • the policy manager 137 when executed by the processor 230 of the information processing element 200, performs or requests an operation on the data set based on the data set signature 140.
  • the operation performed on the data set 120 may be a read, write, copy, move, transform, or any other type of operation performed by a data management system 106 on the data set 120.
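The steps of method 1100 can be sketched end to end as follows. The class name `StorageManagerSketch`, the `reads_per_day` and `impact` metadata fields, the threshold of 1000, and the returned operation names are assumptions chosen for illustration; the disclosure leaves the concrete operation open.

```python
class StorageManagerSketch:
    """Toy stand-in for a storage manager that builds and uses data set signatures."""

    def __init__(self):
        self.data_catalog = {}  # operational metadata by data set name
        self.signatures = {}    # combined signatures by data set name

    def store_operational_metadata(self, name, operational):
        # Step 1: store operational metadata describing the data set.
        self.data_catalog[name] = dict(operational)

    def build_signature(self, name, business):
        # Steps 2 and 3: receive business metadata, then generate and store
        # a signature combining both kinds of metadata.
        signature = {
            "operational": dict(self.data_catalog[name]),
            "business": dict(business),
        }
        self.signatures[name] = signature
        return signature

    def plan_operation(self, name):
        # Step 4: choose a data manipulation operation using both metadata kinds.
        signature = self.signatures[name]
        hot = signature["operational"].get("reads_per_day", 0) >= 1000
        critical = signature["business"].get("impact") == "high"
        if hot and critical:
            return "move_to_fast_tier"
        if not hot and not critical:
            return "move_to_archive_tier"
        return "keep_in_place"
```

The point of the sketch is that neither kind of metadata alone decides the outcome: a frequently read but low-impact data set, or a rarely read but high-impact one, stays in place.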
  • a lower context may refer to the stored data sets 120.
  • a second context above the stored data sets 120 may include on-demand data virtualization.
  • a third context above the stored data sets 120, transformation capabilities, and data copy or move operations, may include a layer of DataOps processes and process flows.
  • a higher context, above the stored data sets 120, stream processing capabilities, and on-the-fly analytic model evaluations, may include high-level Internet of Things (IoT) status characterization and event recognition.
  • the embodiments described herein also group together capabilities in a process. However, the separate steps in the process may be unbundled to apply in different environments in which not all the steps are needed. For example, using the embodiments disclosed herein, the metadata enrichment of objects may be unbundled based on observed, inferred, or scraped value representations of these objects at the various contexts described above. The embodiments disclosed herein may also be applied to unbundle the catalog mapping and integration capabilities across the contexts and systems described above. In addition, the embodiments disclosed herein may also be applied to unbundle the service of enhanced data profiles to a broader set of management systems.
  • FIG. 12 is a diagram of an apparatus 1200 configured to implement data set signatures 140 for data impact driven storage management according to various embodiments of the disclosure.
  • the apparatus 1200 comprises a means for storing 1203, a means for receiving 1206, and a means for performing 1209.
  • the means for storing 1203 includes a means for storing operational metadata 133 describing a data set 120 in a memory 233 of the apparatus 1200.
  • the means for receiving 1206 includes a means for receiving business related metadata 123 describing business impact features of the data set 120 from one or more business environments 103 external to the apparatus 1200.
  • the means for storing 1203 further includes a means for storing a data set signature 140 comprising a combination of the operational metadata 133 and the business related metadata 123 and describing the data set 120.
  • the means for performing 1209 includes a means for performing a data manipulation operation on the data set 120 using the combination of the operational metadata 133 and the business related metadata 123 in the data set signature 140.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method implemented by a data management system, comprising: storing, by a data set signature mapper of the data management system, operational metadata describing a data set, the operational metadata and the data set being stored in a memory of the data management system; receiving, by the data set signature mapper of the data management system, business metadata describing the data set from a business environment, the business environment being external to the data management system; storing, by the data set signature mapper of the data management system, a data set signature comprising a combination of the operational metadata and the business metadata; and performing, by a policy manager of the data management system, a data manipulation operation on the data set using the combination of the operational metadata and the business metadata in the data set signature.
PCT/US2019/047699 2019-08-22 2019-08-22 Signatures d'ensemble de données pour gestion de stockage entraînée par impact de données WO2021034329A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980099580.5A CN114586022A (zh) 2019-08-22 2019-08-22 用于数据影响驱动存储管理的数据集签名
PCT/US2019/047699 WO2021034329A1 (fr) 2019-08-22 2019-08-22 Signatures d'ensemble de données pour gestion de stockage entraînée par impact de données

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2019/047699 WO2021034329A1 (fr) 2019-08-22 2019-08-22 Signatures d'ensemble de données pour gestion de stockage entraînée par impact de données

Publications (1)

Publication Number Publication Date
WO2021034329A1 true WO2021034329A1 (fr) 2021-02-25

Family

ID=67874541

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/047699 WO2021034329A1 (fr) 2019-08-22 2019-08-22 Signatures d'ensemble de données pour gestion de stockage entraînée par impact de données

Country Status (2)

Country Link
CN (1) CN114586022A (fr)
WO (1) WO2021034329A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040215629A1 (en) * 2003-04-24 2004-10-28 International Business Machines Corporation Data abstraction model driven physical layout
US7010523B2 (en) * 2000-08-01 2006-03-07 Oracle International Corporation System and method for online analytical processing
US20130151571A1 (en) * 2011-12-07 2013-06-13 Sap Ag Interface defined virtual data fields
US20180357444A1 (en) * 2016-02-19 2018-12-13 Huawei Technologies Co.,Ltd. System, method, and device for unified access control on federated database

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115065171A (zh) * 2022-08-08 2022-09-16 国网浙江省电力有限公司 DataOps平台联控感知终端系统的远程控制方法
CN115065171B (zh) * 2022-08-08 2022-11-11 国网浙江省电力有限公司 DataOps平台联控感知终端系统的远程控制方法

Also Published As

Publication number Publication date
CN114586022A (zh) 2022-06-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19765387

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19765387

Country of ref document: EP

Kind code of ref document: A1