WO2018102691A1 - Generating, accessing, and displaying lineage metadata - Google Patents

Generating, accessing, and displaying lineage metadata

Info

Publication number
WO2018102691A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
metadata
lineage
data structure
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2017/064227
Other languages
English (en)
French (fr)
Inventor
David Clemens
Dusan Radivojevic
Neil Galarneau
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ab Initio Technology LLC
Original Assignee
Ab Initio Technology LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN202410857505.3A priority Critical patent/CN118916521A/zh
Priority to CA3045810A priority patent/CA3045810A1/en
Priority to EP17851913.8A priority patent/EP3549036B1/en
Priority to AU2017367772A priority patent/AU2017367772B2/en
Priority to CN201780074708.3A priority patent/CN110023925B/zh
Priority to JP2019525760A priority patent/JP7170638B2/ja
Application filed by Ab Initio Technology LLC filed Critical Ab Initio Technology LLC
Priority to DE112017006106.7T priority patent/DE112017006106T5/de
Publication of WO2018102691A1 publication Critical patent/WO2018102691A1/en
Anticipated expiration legal-status Critical
Priority to AU2020203027A priority patent/AU2020203027B2/en
Priority to JP2021191498A priority patent/JP7410919B2/ja
Priority to JP2023216403A priority patent/JP7712998B2/ja
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/835 Query processing
    • G06F16/8365 Query optimisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/832 Query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes

Definitions

  • This application relates to data structures and methods for generating, accessing, and displaying lineage metadata, e.g. lineage of an element of data stored in a data storage system.
  • Enterprises use data processing systems, such as data warehousing, customer relationship management, and data mining, to manage data.
  • data are pulled from many different data sources, such as database files, operational systems, flat files, the Internet, and other sources into a central repository.
  • data are transformed before being loaded in the data system. Transformation may include cleansing, integration, and extraction.
  • Metadata (sometimes called "data about data”) are data that describe other data's attributes, format, origins, histories, inter-relationships, etc. Metadata management can play a central role in complex data processing systems.
  • a user may want to investigate how certain data are derived from different data sources. For example, a user may want to know how a dataset or data object was generated or from which source a dataset or data object was imported. Tracing a dataset back to sources from which it is derived is called data lineage tracing (or "upstream data lineage tracing"). Sometimes a user may want to investigate how certain datasets have been used (called "downstream data lineage tracing" or "impact analysis"), for example, which application has read a given dataset. A user may also be interested in knowing how a dataset is related to other datasets. For example, a user may want to know which tables will be affected if a dataset is modified.
  • Lineage, which is a kind of metadata, enables a user to obtain answers to questions about data lineage (e.g., "Where did a given value come from?" "How was the output value computed?" "Which applications produce and depend on this data?").
  • a user can understand the consequences of proposed modifications (e.g., "If this piece changes, what else will be affected?" "If this source format changes, which applications will be affected?").
  • a user can also obtain answers to questions involving both technical metadata and business metadata (e.g., "Which groups are responsible for producing and using this data?" "Who changed this application last?" "What changes did they make?").
  • any number of prior inputs or data processing steps might be responsible for this unexpected value.
  • lineage is sometimes presented to a user in the form of a diagram that includes a visual element representing an element of data of interest, as well as visual elements representing other elements of data that affect or are affected by the element of data of interest.
  • a user can view this diagram and visually identify other elements of data and/or transformations that affect the element of data of interest.
  • the user can see whether any of the elements of data and/or transformations may be a source of unexpected values, and correct (or flag for correction) any of the underlying data processing steps if problems are discovered.
  • the user can identify any elements of data or transformations that may be essential to a portion of the system (e.g., such that the element of data of interest would be affected by their removal from the system), and/or elements of data or transformations that may not be essential to a portion of the system (e.g., such that the element of data of interest would not be affected by their removal from the system).
  • a method performed by a data processing apparatus including receiving a portion of metadata from a data source, the portion of metadata describing nodes and edges, at least some of the edges each representing an effect of one node upon another node, each edge having a single direction; generating instances of a data structure representing the portion of metadata, at least one instance of the data structure including an identification value that identifies a corresponding node, one or more property values representing respective properties of the corresponding node, and one or more pointers to respective identification values, each pointer representing an edge associated with a node identified by the corresponding respective identification value; storing the instances of the data structure in random access memory; receiving a query that includes an identification of at least one particular element of data; and using at least one instance of the data structure to cause a display of a computer system to display a representation of lineage of the particular element of data.
  • Lineage metadata can be stored using a special-purpose data structure designed for speed and efficiency when responding to queries for lineage metadata.
  • Lineage metadata can be stored in memory, such that a computer system storing the lineage metadata can respond to queries for lineage metadata more quickly than if the lineage metadata were not stored in memory (e.g., if the lineage metadata were stored in and accessed from a hard disk or another kind of storage technique).
  • lineage data can be retrieved much faster than other techniques, e.g., 500 times faster.
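The data structure summarized above (an identification value, property values, and pointers to other identification values, all held in memory) can be pictured with a minimal Python sketch. The names `LineageNode` and `build_store`, the use of integer property codes, and the choice to keep pointers in both directions are illustrative assumptions, not details taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence, Tuple


@dataclass
class LineageNode:
    """One in-memory instance per node (hypothetical layout)."""
    node_id: int                          # identification value of the node
    properties: Tuple[int, ...] = ()      # compact property values (assumed integer codes)
    upstream: List[int] = field(default_factory=list)    # ids of nodes that affect this node
    downstream: List[int] = field(default_factory=list)  # ids of nodes this node affects


def build_store(
    nodes: Iterable[Tuple[int, Sequence[int]]],
    edges: Iterable[Tuple[int, int]],
) -> Dict[int, LineageNode]:
    """Build a dict keyed by identification value from (id, properties) pairs
    and directed (source_id, target_id) edges."""
    store = {nid: LineageNode(nid, tuple(props)) for nid, props in nodes}
    for src, dst in edges:
        # Each directed edge is recorded as a pointer to an identification value.
        store[src].downstream.append(dst)
        store[dst].upstream.append(src)
    return store


store = build_store([(1, [7]), (2, [8]), (3, [9])], [(1, 2), (2, 3)])
```

Keying the store by identification value means following a pointer is a single dictionary lookup, which is one plausible way to get the fast in-memory access the text describes.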
  • FIGS. 1A-1D show a metadata processing environment.
  • FIGS. 2A-2E show examples of information displayed in a metadata viewing environment.
  • FIG. 3 shows an example data structure.
  • FIG. 4 shows an example walk plan.
  • FIG. 5A shows a flowchart representing a procedure for storing lineage metadata in a form defined by a special-purpose data structure.
  • FIG. 5B shows a flowchart representing a procedure for causing lineage metadata to be displayed.
  • FIG. 6 shows a flowchart representing a procedure for traversing lineage metadata stored in the form of special-purpose data structures.
  • a system that manages access to metadata can receive a query from a user requesting lineage of a particular element of data and, in response, deliver a diagram representing lineage of the element of data. If the element of data belongs to a data storage system that stores a relatively large amount of data, the system that manages access to the metadata may need to expend a large amount of processing time in order to process the lineage of the element of data and generate the corresponding diagram.
  • processing can be sped up and made more efficient by introducing a system that is dedicated to processing lineage metadata and optimized for this kind of processing. Accordingly, this specification describes a technique by which a specialized system is used for the purpose of processing and storing lineage metadata in a manner that is typically faster and more efficient than if the specialized system were not used.
  • FIG. 1A shows a metadata processing environment 100 that includes a lineage server 102 which stores and provides lineage metadata to other systems in the
  • the metadata processing environment 100 also includes a metadata server 104 which typically responds to requests for metadata.
  • the metadata server 104 has access to metadata 106 stored in a metadata database 108.
  • the metadata 106 comes from data sources 110A-C which contribute metadata 112A-C to the metadata database 108 on an ongoing basis.
  • the data sources 110A-C may be any combination of relational databases, flat files, network sources, and so on.
  • the metadata server 104 responds to a query 114 received from a user terminal 116 operated by a user 118.
  • the user terminal 116 could be a computing device such as a personal computer, laptop computer, tablet device, smartphone, etc.
  • the user terminal 116 operates a network-based user application such as a web browser, e.g., if the metadata server 104 is configured to provide access to data over a network and includes, or communicates with, a web server that can interface with the web browser.
  • many of the interactions between computer systems described herein may take place using the Internet or a similar communications network using communications protocols commonly used on these kinds of networks.
  • the metadata server 104 is configured to respond to queries for multiple kinds of metadata.
  • the metadata server 104 can process a query 114 that requests lineage of a particular element of data, e.g., an element of data 120 stored by one of the data sources 110A-C, by accessing metadata 106 stored in the metadata database 108 describing the lineage of the particular element of data.
  • the metadata server 104 can then provide lineage metadata 122 to the user terminal 116, e.g., lineage metadata in the form of a lineage diagram (described below with respect to FIGS. 2A-2E).
  • processing a query 114 related to lineage metadata is a task that takes a relatively large amount of processing time and/or uses a relatively large amount of processing resources of the metadata server 104.
  • the metadata server 104 may need to access metadata 106 stored in the metadata database 108.
  • the metadata server 104 would need to spend processing resources generating queries to the metadata database 108 in order to access all of the required metadata.
  • the process of transmitting a query to the metadata database 108 and waiting for a response introduces latency, e.g.,
  • the metadata server 104 would need to process the metadata received from the metadata database 108 in order to extract metadata needed to create a lineage diagram.
  • the metadata received from the metadata database 108 may include information not directly pertinent to lineage of an element of data of interest, since the metadata database 108 stores a variety of kinds of metadata beyond lineage metadata, and so additional processing time is used to identify and remove the information not pertinent to lineage.
  • the lineage server 102 is used to provide lineage metadata to the metadata server 104, e.g., to improve performance of the metadata processing environment 100.
  • the lineage server 102 is a specialized system that stores lineage metadata 124 in a form that can be typically accessed faster and more efficiently than techniques that do not use a lineage server.
  • the lineage server 102 stores lineage metadata 124 using a special-purpose data structure designed for speed and efficiency when responding to queries for lineage metadata.
  • a data structure defines an arrangement of data, such that all data stored using a particular data structure is arranged in the same manner. The data structure technique is described in further detail below with respect to FIG. 3.
  • the lineage server 102 transmits queries 126 to the metadata database 108 in order to retrieve lineage metadata 128.
  • the lineage server 102 ideally stores a comprehensive body of lineage metadata, e.g., for most or all of the data elements 120 stored by the data sources 110A-C. In this way, the lineage server 102 can respond to queries for lineage for most of the data elements 120 for which a query might be made.
  • When the lineage server 102 receives lineage metadata 128 from the metadata database 108, the lineage server 102 updates its data structures containing stored lineage metadata 124.
  • the lineage server 102 sends new queries 126 to the metadata database 108 at regular intervals, e.g., every hour or every day or another interval, in order to store a relatively up-to-date body of lineage metadata.
  • the intervals can be scheduled intervals, e.g., corresponding to schedule data maintained by the lineage server 102.
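The interval-based refresh described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `LineageCache`, the `fetch` callable standing in for queries to the metadata database, and the injectable `clock` are hypothetical and do not come from the application.

```python
import time
from typing import Callable, Dict, List, Optional


class LineageCache:
    """Holds lineage edges in memory and re-fetches them once an interval has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], Dict[int, List[int]]],   # stand-in for querying the metadata database
        interval_seconds: float,                     # e.g. 3600 for hourly refresh
        clock: Callable[[], float] = time.monotonic, # injectable so the schedule can be tested
    ) -> None:
        self._fetch = fetch
        self._interval = interval_seconds
        self._clock = clock
        self._last_refresh: Optional[float] = None
        self.edges: Dict[int, List[int]] = {}

    def refresh_if_due(self) -> bool:
        """Re-fetch lineage metadata if never loaded or if the interval has passed.
        Returns True when a refresh actually happened."""
        now = self._clock()
        if self._last_refresh is not None and now - self._last_refresh < self._interval:
            return False
        self.edges = self._fetch()
        self._last_refresh = now
        return True
```

A caller would invoke `refresh_if_due()` from whatever scheduler it already runs; the sketch deliberately leaves out threading and partial updates.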
  • When the metadata server 104 receives a query 114 for lineage metadata (e.g., a query for lineage metadata that can be used to display a lineage diagram on the user terminal 116), the metadata server 104 can provide the query 114 to the lineage server 102. The lineage server can then return lineage metadata 122 responsive to the query 114.
  • the metadata server 104 need not expend as much processing time and use as many processing resources preparing the received lineage metadata 122 before providing it to the user terminal 116, compared to lineage metadata retrieved using other techniques such as retrieving the lineage metadata from the metadata database 108.
  • FIGS. 1C and 1D show elements of the lineage server 102 and metadata server 104 and the way in which they interact.
  • the metadata server 104 receives a query 114 that requests lineage of a particular element of data.
  • the query 114 identifies the particular data element (e.g., one of the data elements 120 of FIG. 1A) for which lineage is requested.
  • the metadata server 104 uses the identity of the data element to select a walk plan 130 from a set of walk plans 132 that can be used to gather lineage metadata related to the data element.
  • a walk plan 130 is a data structure (e.g., a structured document containing tagged portions, such as an XML document) which describes how to traverse ("walk") a set of lineage metadata in a particular manner.
  • a walk plan can be selected based on a data type of the data element. For example, a particular data type can be associated with a particular one of the walk plans 132 (e.g., an association stored in an index of associations accessible to the metadata server 104).
  • the walk plans 132 are described in detail below with respect to FIG. 4.
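Selecting a walk plan by data type, as described above, amounts to a lookup in an index of associations. The sketch below assumes the walk plans are small XML documents; the tag and attribute names (`walk-plan`, `step`, `follow`) and the data type keys are invented for illustration and are not taken from FIG. 4.

```python
import xml.etree.ElementTree as ET

# Hypothetical index of associations: data type -> walk plan document.
WALK_PLANS = {
    "column": (
        "<walk-plan name='column-lineage'>"
        "<step follow='upstream'/><step follow='downstream'/>"
        "</walk-plan>"
    ),
    "dataset": (
        "<walk-plan name='dataset-lineage'>"
        "<step follow='upstream'/>"
        "</walk-plan>"
    ),
}


def select_walk_plan(data_type: str) -> ET.Element:
    """Return the parsed walk plan associated with a data type."""
    try:
        return ET.fromstring(WALK_PLANS[data_type])
    except KeyError:
        raise LookupError(f"no walk plan registered for data type {data_type!r}") from None


plan = select_walk_plan("column")
directions = [step.get("follow") for step in plan.findall("step")]
```

Here `directions` lists the traversal directions named by the selected plan, which a traversal routine could then follow in order.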
  • the metadata server 104 transmits the query 114 and the walk plan 130 to the lineage server 102.
  • the lineage server 102 identifies lineage metadata relevant to the query 114 among its data structures 134 of lineage metadata.
  • the data structures 134 are a representation of lineage metadata arranged in a way that minimizes the amount of storage space needed to contain them, without omitting any data that is needed to respond to a query for lineage metadata.
  • the lineage server 102 can typically use the data stored in its data structures 134 to provide the metadata server 104 all of the lineage metadata that the metadata server 104 would need to respond to a query 114.
  • the data structures 134 are described in detail below with respect to FIG. 3.
  • the data structures 134 are loaded in memory 135 of the lineage server 102 for fast access (e.g., fast reading and writing of data).
  • One kind of memory is random access memory. Random access memory stores items of data in a manner such that each item of data can be accessed in substantially the same amount of time as any other item of the same size (e.g., a byte or a word).
  • other types of data storage, such as magnetic disks, have physical constraints that cause some items of data to take longer to access than other items of data, depending on the current physical state of the disk (e.g., the position of a magnetic read/write head). Items of data stored in random access memory are typically stored at an address unique to that item of data or shared among a small number of items of data.
  • Random access memory is typically volatile, such that data stored in random access memory is lost if the random access memory is disconnected from an active power source (e.g., a computer system loses power).
  • magnetic disks and some other kinds of data storage are non-volatile and retain data absent an active power source.
  • Because the lineage server 102 stores the data structures 134 in memory 135, the lineage server 102 can read and write lineage metadata faster than techniques that do not store the data structures in memory 135.
  • the data structures 134 are arranged in a way that minimizes the amount of data used.
  • the data structures 134 may omit data such as text strings present in the original lineage metadata obtained from the metadata server 104.
  • all of the data structures 134, e.g., all of the data representing lineage metadata, can be stored in memory 135 while the lineage server 102 is in use.
  • Computer systems typically have constraints on the amount of random access memory that can be used at a given time (e.g., due to addressing limitations).
  • random access memory tends to be more expensive on a per-byte basis than other types of data storage (e.g., magnetic disk).
  • the data structures 134 may have an upper limit to their combined size on a particular computer system. Accordingly, the techniques described herein (e.g., the techniques described below with respect to FIG. 3) minimize their size but retain information with respect to lineage that may be requested by a query 114.
  • the metadata server 104 also stores lineage metadata 137 (e.g., lineage metadata received from the metadata database 108 shown in FIG. 1A). However, the metadata server 104 does not store most of its stored lineage metadata 137 in random access memory, e.g., because the metadata server 104 does not use the data structures of the lineage server 102. Thus, even if the metadata server 104 also stores some lineage metadata 137, the metadata server 104 can access the lineage metadata 124 of the lineage server 102 to obtain any metadata not stored locally at the metadata server 104. If the lineage server 102 were not used, a metadata server 104 storing some lineage metadata 137 typically would access lineage metadata stored in the metadata database 108 (FIG. 1A), which, as described above, may have performance disadvantages compared to using the lineage server 102.
  • Although random access memory is used here as the primary example, other types of memory can be used with the lineage server 102.
  • another kind of memory is flash memory.
  • flash memory is non-volatile.
  • flash memory typically has constraints on accessing items of data.
  • Some types of flash memory are configured in a way that a collection of data items (e.g., blocks of data items) is the smallest unit of data that can be accessed at a time, as opposed to individually accessible data items. For example, in order to delete an item of data on some types of flash memory, an entire block must be deleted. The remaining items of data can be re-written to the flash memory to preserve them.
  • the lineage server 102 uses the walk plan 130 to traverse the data structures 134 and collect lineage metadata stored in the data structures that is responsive to the query 114. As shown in FIG. 1D, the lineage server 102 then sends a response 138 containing the lineage metadata 139 back to the metadata server 104.
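One way to picture the traversal step is a breadth-first walk over in-memory adjacency maps, run once per direction named by a walk plan. This is a sketch under assumptions: the function names, the use of plain dictionaries of node ids, and the two direction labels are hypothetical, and the application's own traversal procedure (FIG. 6) may differ.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def walk(edges: Dict[int, List[int]], start: int) -> Set[int]:
    """Breadth-first walk: return every node id reachable from `start`
    by following `edges`, excluding `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:   # guards against cycles and repeated visits
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def collect_lineage(
    upstream: Dict[int, List[int]],
    downstream: Dict[int, List[int]],
    start: int,
    directions: Iterable[str],
) -> Dict[str, Set[int]]:
    """Collect the node ids found in each requested direction."""
    maps = {"upstream": upstream, "downstream": downstream}
    result: Dict[str, Set[int]] = {}
    for direction in directions:
        if direction not in maps:
            raise ValueError(f"unknown direction: {direction!r}")
        result[direction] = walk(maps[direction], start)
    return result


# Chain 1 -> 2 -> 3 -> 4, queried at node 3.
downstream = {1: [2], 2: [3], 3: [4]}
upstream = {2: [1], 3: [2], 4: [3]}
lineage = collect_lineage(upstream, downstream, 3, ["upstream", "downstream"])
```

The upstream set corresponds to the sources of the queried element and the downstream set to the elements it affects (impact analysis).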
  • the metadata server 104 can use the lineage metadata 139 to generate its own response 140 to the query 114.
  • the response 140 could take one of several forms.
  • the response 140 contains the same lineage metadata 139 received from the lineage server 102, e.g., in a form with minimal post-processing.
  • the metadata server 104 performs post-processing on the lineage metadata 139.
  • the metadata server 104 may change the form of the lineage metadata 139 to a human-readable form, e.g., if the lineage metadata 139 is received in an encoded format that is not human-readable.
  • the metadata server 104 generates a lineage diagram based on the lineage metadata 139 and incorporates data representing the lineage diagram into the response 140.
  • the response 140 is transmitted to the user terminal 116 (FIG. 1A), e.g., if the response 140 is a lineage diagram (as described in detail below with respect to FIGS. 2A-2E).
  • the response 140 is transmitted to an intermediate system before it is transmitted to the user terminal and/or processed into a form suitable for transmission to the user terminal.
  • FIG. 2A shows an example of information displayed in a metadata viewing environment.
  • the metadata viewing environment is an interface that executes on a user terminal, e.g., the user terminal 116 shown in FIG. 1A.
  • the metadata viewing environment displays information related to a data lineage diagram 200A.
  • One example of a metadata viewing environment is a web-based application that allows a user (e.g., the user 118 shown in FIG. 1A) to visualize and edit metadata.
  • a user can explore, analyze, and manage metadata using a standard Web browser from anywhere within an enterprise.
  • Each type of metadata object has one or more views or visual representations.
  • the metadata viewing environment of FIG. 2A illustrates a lineage diagram for target element 206A.
  • the lineage diagram displays the end-to-end lineage for the data and/or processing nodes that represent the metadata objects stored in the metadata server 104 (FIG. 1A); that is, the objects a given starting object depends on (its sources) and the objects that a given starting object affects (its targets).
  • connections are shown between data elements 202A and transformations 204A, two examples of metadata objects.
  • the metadata objects are represented by nodes in the diagram.
  • Data elements 202A can represent datasets, tables within datasets, columns in tables, and fields in files, messages, and reports, for example.
  • An example of a transformation 204A is an element of an executable that describes how a single output of a data element is produced.
  • the connections between the nodes are based on relationships among the metadata objects.
  • FIG. 2B illustrates a corresponding lineage diagram 200B for the same target element 206A shown in FIG. 2A, except that each element 202B is shown in a group based on a context.
  • data elements 202B are grouped in datasets 208B (e.g., tables, files, messages, and reports), applications 210B (that contain executables such as graphs and plans and programs, plus the datasets that they operate on), and systems 212B.
  • Systems 212B are functional groupings of data and the applications that process the data; systems consist of applications and data groups (e.g., databases, file groups, messaging systems, and groups of datasets).
  • Transformations 204B are grouped in executables 214B, applications 210B, and systems 212B. Executables, such as graphs, plans, or programs, read and write datasets. Parameters can set which groups are expanded and which groups are collapsed by default. This allows users to see the details for only the groups that are important to them by removing unnecessary levels of detail.
  • Using the metadata viewing environment to perform data lineage calculations is useful for a number of reasons. For example, calculating and illustrating relationships between data elements and transformations can help a user determine how a reported value was computed for a given field report.
  • a user may also view which datasets store a particular type of data, and which executables read and write to that dataset.
  • the data lineage diagram may illustrate which data elements (e.g., columns and fields) are associated with certain business terms (e.g., definitions in an enterprise).
  • Data lineage diagrams shown within the metadata viewing environment can also aid a user in impact analysis. Specifically, a user may want to know which downstream executables are affected if a column or field is added to a dataset, and who needs to be notified.
  • Impact analysis may determine where a given data element is used, and can also determine the ramifications of changing that data element. Similarly, a user may view what datasets are affected by a change in an executable, or whether it is safe to remove a certain database table from production.
  • Using the metadata viewing environment to perform data lineage calculations for generating data lineage diagrams is useful for business term management. For instance, it is often desirable for employees within an enterprise to agree on the meanings of business terms across that enterprise, the relationships between those terms, and the data to which the terms refer. The consistent use of business terms may enhance the transparency of enterprise data and facilitate communication of business requirements. Thus, it is important to know where the physical data underlying a business term can be found, and what business logic is used in computations.
  • Viewing relationships between data nodes can also be helpful in managing and maintaining metadata. For instance, a user may wish to know who changed a piece of metadata, what the source (or "source of record") is for a piece of metadata, or what changes were made when loading or reloading metadata from an external source. In maintaining metadata, it may be desirable to allow designated users to be able to create metadata objects (such as business terms), edit properties of metadata objects (such as descriptions and relationships of objects to other objects), or delete obsolete metadata objects.
  • the metadata viewing environment provides a number of graphical views of objects, allowing a user to explore and analyze metadata. For example, a user may view the contents of systems and applications and explore the details of any object, and can also view relationships between objects using the data lineage views, which allow a user to easily perform various types of dependency analysis such as the data lineage analysis and impact analysis described above. Hierarchies of objects can also be viewed, and the hierarchies can be searched for specific objects. Once an object is found, a bookmark can be created for it, allowing a user to easily return to it. With the proper permissions, a user can edit the metadata in the metadata viewing environment. For example, a user can update descriptions of objects, create business terms, define relationships between objects (such as linking a business term to a field in a report or column in a table), move objects (for instance, moving a dataset from one application to another), or delete objects.
  • a corresponding lineage diagram 200C for target element 206A is shown, but the level of resolution is set to applications that are participating in the calculation for the target data element 206A. Specifically, applications 202C, 204C, 206C, 208C, and 210C are shown, as only those applications directly participate in the calculation for the target data element 206A. If a user wishes to view any part of the lineage diagram in a different level of resolution (e.g., to display more or less detail in the diagram), the user may activate the corresponding expand/collapse button 212C.
  • FIG. 2D shows a corresponding lineage diagram 200D at a different level of resolution.
  • an expand/collapse button 212C has been activated by a user, and the metadata viewing environment now displays the same lineage diagram, but application 202C has been expanded to show the datasets 214D and executables 216D within application 202C.
  • FIG. 2E shows a corresponding lineage diagram 200E at a different level of resolution.
  • a user has selected to show everything expanded by a custom expansion. Any field or column which is an ultimate source of data (e.g., it has no upstream systems) is expanded.
  • fields that have a specific flag set are also expanded. In this example, the specific flags are set on datasets and fields at a key intermediate point in the lineage, and one column is the column for which the lineage is being shown.
  • Viewing elements and relationships in the metadata viewing environment can be made more useful by adding information relevant to each of the nodes that represent them.
  • One exemplary way to add relevant information to the nodes is to graphically overlay information on top of certain nodes. These graphics may show some value or characteristic of the data represented by the node, and can be any property in the metadata database.
  • This approach has the advantage of combining two or more normally disparate pieces of information (relationships between nodes of data and characteristics of the data represented by the nodes) and endeavors to put useful information "in context." For example, characteristics such as metadata quality, metadata freshness, or source of record information can be displayed in conjunction with a visual representation of relationships between data nodes.
  • a user can select which characteristic of the data will be shown on top of the data element and/or transformation nodes within the metadata viewing environment. Which characteristic is shown can also be set according to default system settings.
  • the lineage server 102 uses data structures 134 to store lineage metadata in memory (e.g., random access memory).
  • FIG. 3 shows an example data structure 300.
  • the lineage server 102 contains many instances of the data structure 300.
  • An instance of a data structure is a collection of data (e.g., collection of bits) formatted in a manner defined by the data structure.
  • An instance of the data structure 300 described here is sometimes referred to as a "node."
  • Each instance of the data structure 300 represents a metadata object, e.g., one of the data elements 202A or transformations 204A shown in FIG. 2A.
  • each instance of the data structure 300 represents a node that may be shown in a lineage diagram, e.g., the diagrams 200A-200E shown in FIGS. 2A-2E.
  • the lineage server 102 stores each data structure 300 at a memory location 302 specific to the data structure.
  • Each data structure 300 typically points to memory locations of other data structures.
  • the data structure 300 is made up of several fields.
  • a field is a collection of data, e.g., a subset of the bits that make up an instance of the data structure 300.
  • An identifier field 310 includes data representing a unique identifier for an instance of the data structure 300.
  • a type field 312 includes data representing a type of a metadata object represented by the corresponding instance of the data structure 300. In some examples, the type could be "data element," "transformation," and so on. In some examples, the type field 312 also indicates how many forward and backward edges are included in the instance of the data structure 300.
  • Properties fields 314 each represent different characteristics of the metadata object represented by the corresponding instance of the data structure 300.
  • Examples of the properties fields 314 can include a "name" field that includes a text label identifying the metadata object, and a "subtype" field that indicates a subtype of the metadata object, e.g., whether the metadata object represents a file object, an executable object, a database object, or another subtype. Other types of properties can be used. In general, the type field 312 and properties fields 314 can be customized for a particular instance of the lineage server 102, and are not confined to the examples listed here.
  • the data structure also includes fields that represent forward edges 316A-C and backward edges 316D-F.
  • the edge fields 316A-F enable the lineage server 102 to "walk" from data structure to data structure and collect the data of the data structure when gathering lineage metadata.
  • By collecting a portion of data, we mean identifying the portion of data as pertinent to a future action (e.g., transmitting the collected data). Collecting a portion of data sometimes includes copying the data, e.g., copying the data to a buffer or queue to be used in the future action.
  • Each edge field 316A-F includes a pointer field 320A-B.
  • the pointer field 320A-B stores an address of a respective memory location 322A-B.
  • a memory location 322A-B referenced by a pointer field 320A-B refers to a portion of memory that stores another instance of the data structure 300.
  • one instance of a data structure representing a metadata object is "linked" to one or more other instances of data structures representing other metadata objects.
  • the edges 316A-D can correspond to, e.g., the relationships among the metadata objects shown in the lineage diagram examples 200A-E of FIGS. 2A-2E.
  • a forward edge 316A represents an effect that a metadata object (e.g., the metadata object represented by this instance of the data structure 300) has on another metadata object (e.g., the metadata object represented by the instance of the data structure at the memory location 322A).
  • a backward edge 316D represents an effect that another metadata object (e.g., the metadata object represented by the instance of the data structure at the memory location 322B) has on the metadata object of this instance of the data structure 300.
  • Each edge field 316A-F also includes one or more flags 324.
  • the flags 324 are indicators of information about their associated edge. In some examples, one of the flags 324 may indicate a type of the associated edge, selected from multiple possible types. Many types of edges are possible. For example, some types of edges are input/output edges (representing output from one object and input to another object), element/dataset edges (representing an association between an element and the dataset to which the element belongs), and application/parent edges (representing an association between an executable application and a container, such as a container that also contains datasets associated with the application).
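  • The fields described above can be sketched in Python as follows. This is a minimal illustration, not the implementation described here: the class names, the numeric codes, and the `link` helper are assumptions, and Python object references stand in for the memory addresses held by the pointer fields.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class NodeType(IntEnum):
    # Hypothetical codes; the text names "data element" and "transformation".
    DATA_ELEMENT = 0
    TRANSFORMATION = 1


class EdgeType(IntEnum):
    # Edge types named in the text; the numeric codes are assumptions.
    INPUT_OUTPUT = 0
    ELEMENT_DATASET = 1
    APPLICATION_PARENT = 2


@dataclass(eq=False)
class Edge:
    target: "Node"        # stands in for a pointer field (a memory address)
    edge_type: EdgeType   # stands in for one of the edge flags


@dataclass(eq=False)
class Node:
    identifier: int                                            # identifier field
    node_type: NodeType                                        # type field
    properties: Dict[str, str] = field(default_factory=dict)   # properties fields
    forward: List[Edge] = field(default_factory=list)          # forward edges
    backward: List[Edge] = field(default_factory=list)         # backward edges


def link(source: Node, target: Node, edge_type: EdgeType) -> None:
    """Record one relationship as a forward edge on source and a backward edge on target."""
    source.forward.append(Edge(target, edge_type))
    target.backward.append(Edge(source, edge_type))


customer_file = Node(1, NodeType.DATA_ELEMENT, {"name": "customers", "subtype": "file"})
dedupe = Node(2, NodeType.TRANSFORMATION, {"name": "dedupe", "subtype": "executable"})
link(customer_file, dedupe, EdgeType.INPUT_OUTPUT)
```

  • In this sketch every relationship is stored twice, once in each direction, so that a walk can proceed either upstream or downstream from any node without a lookup table.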
  • the data associated with the identifier field 310, type field 312, and properties fields 314 together may only be a few bytes, e.g., 32 bytes. These fields encode commonly used information within as little as a few bits; for example, if there are only eight possible types for a node, the type field 312 can be as little as three bits long. More complex data, such as strings of text representing the node types, need not be used.
  • the data associated with the memory location 322A-C is typically the same amount of data as the length of a memory address associated with the type of computer system executing software that instantiates the data structure 300. Thus, most or all instances of a data structure 300 may use a relatively small amount of data in total, compared to the data used by other techniques for storing lineage metadata.
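  • The size argument can be illustrated with a short bit-packing sketch. The layout below (a three-bit type code followed by two six-bit edge counts) is purely an assumption chosen for illustration; the text only states that eight types fit in three bits and that the type field may indicate how many forward and backward edges are present.

```python
import struct

TYPE_BITS = 3                       # enough for eight node types (2**3 == 8)
TYPE_MASK = (1 << TYPE_BITS) - 1
COUNT_MASK = 0x3F                   # hypothetical six-bit edge counts


def pack_header(type_code: int, forward_count: int, backward_count: int) -> int:
    """Pack a 3-bit type code and two 6-bit edge counts into one small integer."""
    if not 0 <= type_code <= TYPE_MASK:
        raise ValueError("type code does not fit in three bits")
    if not (0 <= forward_count <= COUNT_MASK and 0 <= backward_count <= COUNT_MASK):
        raise ValueError("edge count does not fit in six bits")
    return type_code | (forward_count << 3) | (backward_count << 9)


def unpack_header(header: int) -> tuple:
    """Recover (type code, forward count, backward count) from a packed header."""
    return header & TYPE_MASK, (header >> 3) & COUNT_MASK, (header >> 9) & COUNT_MASK


header = pack_header(type_code=5, forward_count=3, backward_count=3)
pointer_size = struct.calcsize("P")   # bytes in a native pointer on this machine
```

  • Under these assumptions the header occupies 15 bits, and each edge adds one pointer-sized address (`pointer_size` bytes) plus its flags.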
  • FIG. 4 shows an example of a walk plan 400.
  • walk plans 400 are typically stored by the metadata server 104.
  • the metadata server 104 provides a walk plan to the lineage server 102 when requesting lineage metadata.
  • a walk plan 400 describes information used by the lineage server 102 when traversing its stored data structures 134.
  • a query for lineage metadata e.g., lineage metadata pertinent to a particular metadata object
  • not all types of lineage metadata need to be returned in response.
  • lineage metadata associated with some types of edges may not need to be returned because it is not responsive to the query.
  • the walk plan 400 includes records 402A-C for each edge type that may be among the types of edges represented by the lineage metadata stored by the lineage server 102.
  • a record 402A includes an edge type field 404 that includes data indicating the type of edge corresponding to the record 402A.
  • a record 402A also includes a follow flag 406, a collect node flag 408, and a collect edge flag 409 for the forward direction 410, and a follow flag 412, a collect node flag 414, and a collect edge flag 415 for the backward direction 416.
  • a follow flag 406, 412 indicates whether or not the lineage server 102 should follow an edge of this edge type when traversing its data structures 134. Put another way, a follow flag 406 for the forward direction 410 indicates whether or not the lineage server 102, referring to FIG. 3, should access the memory location 322A identified by a pointer field 320A of a forward edge field 316A of an instance of the data structure 300.
  • a follow flag 412 for the backward direction 416 indicates whether or not the lineage server 102, referring to FIG. 3, should access the memory location 322B identified by a pointer field 320B of a backward edge field 316D of an instance of the data structure 300.
  • a collect node flag 408, 414 indicates whether or not the lineage server 102 should collect an instance of the data structure 300 (FIG. 3), sometimes referred to as a "node," pointed to by this edge type when traversing its data structures 134.
  • the data of the instances (or nodes) is added to the data that will be returned in response to a query being processed by the lineage server 102 (FIG. 1A).
  • data associated with the metadata object represented by the instance of the data structure 300 will be among the lineage metadata returned by the lineage server 102.
  • a collect edge flag 409, 415 indicates whether or not the lineage server 102 should collect the edge (e.g., corresponding to the pointer field 320A of an instance of the data structure 300). If an edge is collected, data representing the edge will be among the lineage metadata returned by the lineage server 102. In some implementations, an edge may not be collected if the edge does not represent a flow of data between the nodes. For example, the edge may represent the association between a data object (represented by one node) and a container of the data object (represented by another node).
  • nodes can be associated with each other in a variety of ways that may or may not be collected for inclusion in lineage metadata, and nodes can represent a variety of data that may or may not be collected for inclusion in lineage metadata.
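  • A walk plan record of this kind can be sketched as a small lookup table. The class names, the string edge-type labels, the `rule_for` helper, and the particular flag values are assumptions made for illustration; only the three flags per direction come from the description.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DirectionRule:
    follow: bool         # follow flag
    collect_node: bool   # collect node flag
    collect_edge: bool   # collect edge flag


@dataclass(frozen=True)
class WalkPlanRecord:
    edge_type: str            # edge type field
    forward: DirectionRule    # flags for the forward direction
    backward: DirectionRule   # flags for the backward direction


SKIP = DirectionRule(False, False, False)

# Hypothetical plan for an upstream ("where did this come from?") query.
walk_plan: Dict[str, WalkPlanRecord] = {
    record.edge_type: record
    for record in [
        WalkPlanRecord("input/output", forward=SKIP,
                       backward=DirectionRule(True, True, True)),
        WalkPlanRecord("element/dataset",
                       forward=DirectionRule(True, True, False), backward=SKIP),
        WalkPlanRecord("application/parent", forward=SKIP, backward=SKIP),
    ]
}


def rule_for(plan: Dict[str, WalkPlanRecord], edge_type: str, direction: str) -> DirectionRule:
    """Look up the flags for one edge type and direction; unknown types are skipped."""
    record = plan.get(edge_type)
    if record is None:
        return SKIP
    return record.forward if direction == "forward" else record.backward
```

  • In this example the element/dataset edge is followed and its node collected, but the edge itself is not, which mirrors the case of a containment relationship that does not represent a flow of data.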
  • a walk plan 400 can be represented in the form of one or more XML (Extensible Markup Language) documents.
  • An XML document is a collection of portions separated by "tags."
  • a tag typically contains a label (e.g., a label identifying the type of tag) and may also include one or more attributes.
  • Tags sometimes come in the form of a start tag and an end tag, such that a start tag is paired with a corresponding end tag. In this way, tags can be hierarchical, such that tags are "nested" within other tags, e.g., by placing a tag between another tag's start tag and end tag pair.
  • the "useEdge" tag specifies information for a given type of edge.
  • Each "useEdge" tag can correspond to a record (e.g., the records 402A-C of the walk plan 400).
  • the "name" attribute specifies the type of edge (e.g., the edge type 404), the "direction" attribute specifies the direction (e.g., forward direction 410 or backward direction 416), and the "collectEdge" attribute specifies whether to collect the edge (e.g., the collect edge flags 409, 415).
  • Other tags can be used.
  • the "condition special" tag shown in the example above is used to specify custom rules that are carried out when an edge of the specified edge type is followed. In some examples, the custom rules may specify conditions to determine if the edge should be followed and/or collected.
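  • The tags and attributes described above can be read with a standard XML parser. The following is a minimal sketch in Python, assuming a hypothetical root tag `walkPlan` and hypothetical `follow` and `collectNode` attributes; only `useEdge`, `name`, `direction`, and `collectEdge` are named in the description, and the XML text here is illustrative rather than the original listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the root tag and the "follow" and "collectNode"
# attributes are assumptions; "useEdge", "name", "direction", and
# "collectEdge" are the names used in the description.
WALK_PLAN_XML = """
<walkPlan>
  <useEdge name="input/output" direction="backward"
           follow="true" collectNode="true" collectEdge="true"/>
  <useEdge name="element/dataset" direction="forward"
           follow="true" collectNode="true" collectEdge="false"/>
</walkPlan>
"""


def parse_walk_plan(xml_text: str) -> dict:
    """Return {(edge type, direction): {flag name: bool}} for each useEdge tag."""
    plan = {}
    for tag in ET.fromstring(xml_text).iter("useEdge"):
        key = (tag.get("name"), tag.get("direction"))
        plan[key] = {
            flag: tag.get(flag, "false") == "true"
            for flag in ("follow", "collectNode", "collectEdge")
        }
    return plan


plan = parse_walk_plan(WALK_PLAN_XML)
```

  • Keying the parsed plan by edge type and direction lets a traversal look up the applicable flags for each edge in a single dictionary access.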
  • FIG. 5A shows a flowchart representing a procedure 500 for storing lineage metadata in a form defined by a special-purpose data structure, e.g., the data structure 300 shown in FIG. 3.
  • the procedure 500 can be carried out, for example, by components of the lineage server 102 shown in FIG. 1A.
  • the procedure requests 502 lineage metadata from a metadata source.
  • the metadata source could be the metadata database 108 shown in FIG. 1A.
  • the request could be a request made on regular or semi-regular intervals, for example, every hour, every ten minutes, every minute, or any other interval.
  • the request could be made in response to an event, e.g., an event such as a notification that new metadata is available at the metadata source.
  • Lineage metadata typically describes nodes and edges, such that each node represents a metadata object, and the edges each represent a one-way effect of one node upon another node, e.g., such that each edge has a single direction.
  • the request is a request for all lineage metadata stored by the data source.
  • the request is a request for lineage metadata that has been added or changed since the last request.
  • the procedure receives 504 data, e.g., lineage metadata, from the metadata source.
  • the lineage metadata can be data representing metadata objects and relationships between the metadata objects.
  • the procedure generates 506 data structures, e.g., instances of the data structure 300 shown in FIG. 3.
  • the data structures can contain information corresponding to the data received from the metadata source.
  • each instance of the data structure corresponds to a respective node received from the metadata source.
  • the data structure can include a field for identification values, e.g., an
  • the data structure can also include property fields that represent properties of a node corresponding to an instance of the data structure.
  • the data structure can also include pointers to identification values of other nodes, such that the pointers represent edges to the nodes corresponding to the respective identification values.
  • the procedure stores 508 the data structures.
  • the data structures can be stored in memory, e.g., the memory 135 shown in FIG. 1C.
  • the data structures are stored in random access memory. Because the data structures are used to store lineage metadata, any data not relevant to lineage (e.g., other types of metadata stored at the metadata source) can be omitted, reducing the amount of data needed to store the data structures.
  • the procedure returns to requesting 502 lineage metadata from the metadata source, e.g., on the next regularly scheduled interval.
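  • The request, receive, generate, and store steps of procedure 500 can be sketched as follows. The tuple shapes for raw nodes and edges, the function names, and the `fake_metadata_source` stand-in for the metadata source are all assumptions; plain dictionaries stand in for instances of the data structure.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A raw node is (identifier, type, properties); a raw edge is
# (source identifier, target identifier, edge type). Both shapes are assumptions.
RawNode = Tuple[int, str, Dict[str, str]]
RawEdge = Tuple[int, int, str]


def build_store(nodes: Iterable[RawNode], edges: Iterable[RawEdge]) -> Dict[int, dict]:
    """Generate one in-memory structure per node, then link them (steps 506 and 508)."""
    store = {
        ident: {"id": ident, "type": kind, "props": dict(props),
                "forward": [], "backward": []}
        for ident, kind, props in nodes
    }
    for source, target, edge_type in edges:
        # Each edge is one-way, so it is a forward edge of its source
        # and a backward edge of its target.
        store[source]["forward"].append((store[target], edge_type))
        store[target]["backward"].append((store[source], edge_type))
    return store


def refresh(fetch: Callable[[], Tuple[List[RawNode], List[RawEdge]]]) -> Dict[int, dict]:
    """Request and receive lineage metadata (steps 502 and 504), then rebuild the store."""
    nodes, edges = fetch()
    return build_store(nodes, edges)


def fake_metadata_source():
    # Stand-in for the metadata source; returns two nodes and one edge.
    nodes = [(1, "data element", {"name": "orders"}),
             (2, "transformation", {"name": "aggregate"})]
    edges = [(1, 2, "input/output")]
    return nodes, edges


store = refresh(fake_metadata_source)
```

  • A scheduler or an event handler could call `refresh` again at the next interval or notification; this sketch rebuilds the whole store each time and does not attempt the incremental case in which only added or changed metadata is requested.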
  • FIG. 5B shows a flowchart representing a procedure 520 for causing lineage metadata to be displayed.
  • the procedure 520 can be carried out, for example, by components of the lineage server 102 shown in FIG. 1A.
  • a lineage server is configured to return a response to a query that includes metadata describing lineage of a particular element of data, e.g., a metadata object.
  • the metadata describes a sequence of nodes and edges, wherein one of the nodes of the sequence represents the particular element of data.
  • the procedure 520 is used to access lineage metadata stored by the procedure 500 described above with respect to FIG. 5A.
  • the procedure receives 522 a query, e.g., a query for lineage metadata.
  • the query identifies a metadata object for which lineage metadata is requested.
  • the query includes an identification of a type of lineage and a walk plan that identifies which types of edges are relevant to the identified type of lineage.
  • the walk plan includes conditions for following or collecting an edge based on one or more property values representing respective properties of a corresponding node. An example of a walk plan 400 is shown in FIG. 4.
  • the procedure gathers 524 lineage metadata. For example, a node representing the metadata object of the received query can be accessed and collected, and edges (e.g., pointers to memory locations) can be traversed to collect other nodes.
  • the procedure transmits 526 the gathered lineage metadata.
  • the gathered lineage metadata can be transmitted to a computer system that issued the query.
  • the gathered lineage metadata may be caused to be displayed 528 on a computer system, e.g., the user terminal 116 shown in FIG. 1A.
  • the lineage metadata may be displayed in the form of a lineage diagram such as the lineage diagrams 200A-200E shown in FIGS. 2A-2E.
  • FIG. 6 shows a flowchart representing a procedure 600 for traversing lineage metadata stored in the form of special-purpose data structures, e.g., instances of the data structure 300 shown in FIG. 3.
  • the procedure 600 can be carried out, for example, by components of the lineage server 102 shown in FIG. 1A.
  • the procedure receives 602 a query and walk plan, e.g., the query 114 and walk plan 130 shown in FIG. 1C.
  • the procedure accesses 604 an initial node (e.g., instance of the data structure 300 shown in FIG. 3) representing a metadata object referenced by the query 114.
  • the initial node may be identified by an identifier field 310 (FIG. 3) storing data that is associated with the metadata object.
  • the initial node is then used as the "current" node, and a recursive portion of the process begins in which the current node is selected from a queue and operations are applied to the current node. Put another way, the initial node is placed in a queue as the first node of the queue, and other nodes are subsequently added to the queue as the procedure is carried out.
  • the procedure determines 606 if there are remaining forward edge pointers in the current node (e.g., forward edge pointers that have not yet been accessed). If so, the procedure accesses 608 the next pointer that has yet to be accessed, e.g., accesses the memory location of the pointer to retrieve data stored at that memory location.
  • the procedure determines 610 whether to "walk" (e.g., process) the node at that pointer, e.g., according to the edge type associated with the pointer, based on the walk plan (as described above with respect to FIG. 4). If not, the procedure accesses 608 another pointer. If so, the procedure determines whether to collect 611 the node at that pointer. If so, the procedure stores 612 the data of the node to be returned in response to the query, and then places 614 the node in the queue so that its pointers can be accessed. If not, the procedure only puts the node in the queue.
  • the procedure determines 616 if there are remaining backward edge pointers in the current node. If so, the procedure accesses 608 the next backward edge pointer.
  • the procedure determines 618 if any nodes remain in the queue. If so, the procedure accesses 620 the next node in the queue, and carries out operations described above using the next node in the queue as the current node. If no nodes remain, the procedure prepares 622 the collected data for transmission to another system.
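  • The queue-driven traversal described above can be sketched as follows. Dictionaries stand in for nodes, and the plan maps an (edge type, direction) pair to a (follow, collect node, collect edge) triple; these shapes, the helper names, and the `visited` set (added here so that a node is not queued twice) are assumptions of this sketch rather than details taken from the description.

```python
from collections import deque


def make_node(ident, name):
    return {"id": ident, "name": name, "forward": [], "backward": []}


def link(source, target, edge_type):
    source["forward"].append((target, edge_type))
    target["backward"].append((source, edge_type))


def walk(start, plan):
    """Traverse from start, following and collecting according to the plan."""
    collected_nodes, collected_edges = [start["id"]], []
    queue, visited = deque([start]), {start["id"]}
    while queue:                                     # any nodes left in the queue?
        current = queue.popleft()                    # next node becomes the current node
        for direction in ("forward", "backward"):    # forward pointers, then backward
            for target, edge_type in current[direction]:
                follow, collect_node, collect_edge = plan.get(
                    (edge_type, direction), (False, False, False))
                if not follow:                       # walk plan says not to walk this edge
                    continue
                if collect_edge:
                    collected_edges.append((current["id"], target["id"], direction))
                if target["id"] in visited:          # already queued once
                    continue
                visited.add(target["id"])
                if collect_node:                     # store the node's data for the response
                    collected_nodes.append(target["id"])
                queue.append(target)                 # its own pointers are examined later
    return collected_nodes, collected_edges


raw, clean, report, folder = (make_node(i, n) for i, n in
                              enumerate(["raw", "clean", "report", "folder"], 1))
link(raw, clean, "input/output")
link(clean, report, "input/output")
link(report, folder, "element/dataset")

upstream_plan = {("input/output", "backward"): (True, True, True)}
nodes, edges = walk(report, upstream_plan)
```

  • With this plan, a walk starting at the `report` node collects `report`, `clean`, and `raw` and ignores the containment edge to `folder`, because only backward input/output edges are followed.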
  • the collected data may be arranged in a particular format before it is transmitted.
  • encoded data in the collected data may be decoded. For example, data fields containing an encoded value can be converted to a text string corresponding to the value.
  • the data can be transmitted, e.g., as described in FIG. 5B with respect to transmission 526 of data.
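  • Preparing collected data for transmission can be sketched as a decoding step followed by serialization. The code tables, the field names, and the choice of JSON as the output format are assumptions; the description states only that encoded values can be converted to corresponding text strings and that the data may be arranged in a particular format.

```python
import json

# Hypothetical code tables mapping compact codes to text strings.
NODE_TYPE_NAMES = {0: "data element", 1: "transformation"}
SUBTYPE_NAMES = {0: "file", 1: "executable", 2: "database"}


def prepare_for_transmission(collected):
    """Decode compact codes into text and arrange the result as JSON (an assumed format)."""
    decoded = [
        {
            "id": node["id"],
            "type": NODE_TYPE_NAMES[node["type"]],
            "subtype": SUBTYPE_NAMES[node["subtype"]],
            "name": node["name"],
        }
        for node in collected
    ]
    return json.dumps({"nodes": decoded}, sort_keys=True)


payload = prepare_for_transmission([
    {"id": 7, "type": 0, "subtype": 2, "name": "customer_table"},
])
```

  • Deferring the conversion to text until this point keeps the in-memory nodes small while still giving the receiving system readable values.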
  • the systems and techniques described herein can be implemented, for example, using a programmable computing system executing suitable software instructions or it can be implemented in suitable hardware such as a field-programmable gate array (FPGA) or in some hybrid form.
  • the software may include procedures in one or more computer programs that execute on one or more programmed or programmable computing systems (which may be of various architectures such as distributed, client/server, or grid), each including at least one processor, at least one data storage system (including volatile and/or non-volatile memory and/or storage elements), and at least one user interface (for receiving input using at least one input device or port, and for providing output using at least one output device or port).
  • the software may include one or more modules of a larger program, for example, that provides services related to the design, configuration, and execution of dataflow graphs.
  • the modules of the program e.g., elements of a dataflow graph
  • the software may be stored in non-transitory form, such as being embodied in a volatile or non-volatile storage medium, or any other non-transitory medium, using a physical property of the medium (e.g., surface pits and lands, magnetic domains, or electrical charge) for a period of time (e.g., the time between refresh periods of a dynamic memory device such as a dynamic RAM).
  • the software may be provided on a tangible, non-transitory medium, such as a CD-ROM or other computer-readable medium (e.g., readable by a general or special purpose computing system or device), or may be delivered (e.g., encoded in a propagated signal) over a communication medium of a network to a tangible, non-transitory medium of a computing system where it is executed.
  • a special purpose computer or using special-purpose hardware, such as coprocessors or field-programmable gate arrays (FPGAs) or dedicated, application-specific integrated circuits (ASICs).
  • the processing may be implemented in a distributed manner in which different parts of the computation specified by the software are performed by different computing elements.
  • Each such computer program is preferably stored on or downloaded to a computer-readable storage medium (e.g., solid state memory or media, or magnetic or optical media) of a storage device accessible by a general or special purpose programmable computer, for configuring and operating the computer when the storage device medium is read by the computer to perform the processing described herein.
  • the inventive system may also be considered to be implemented as a tangible, non-transitory medium, configured with a computer program, where the medium so configured causes a computer to operate in a specific and predefined manner to perform one or more of the processing steps described herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/US2017/064227 2016-12-01 2017-12-01 Generating, accessing, and displaying lineage metadata Ceased WO2018102691A1 (en)

Priority Applications (10)

Application Number Priority Date Filing Date Title
DE112017006106.7T DE112017006106T5 (de) 2016-12-01 2017-12-01 Erzeugen von, Zugreifen auf und Anzeigen von Abstammungsmetadaten
CA3045810A CA3045810A1 (en) 2016-12-01 2017-12-01 Generating, accessing, and displaying lineage metadata
EP17851913.8A EP3549036B1 (en) 2016-12-01 2017-12-01 Generating, accessing, and displaying lineage metadata
AU2017367772A AU2017367772B2 (en) 2016-12-01 2017-12-01 Generating, accessing, and displaying lineage metadata
CN201780074708.3A CN110023925B (zh) 2016-12-01 2017-12-01 生成、访问和显示沿袭元数据
CN202410857505.3A CN118916521A (zh) 2016-12-01 2017-12-01 用于更新数据结构的方法、系统和介质
JP2019525760A JP7170638B2 (ja) 2016-12-01 2017-12-01 系統メタデータの生成、アクセス、及び表示
AU2020203027A AU2020203027B2 (en) 2016-12-01 2020-05-07 Generating, accessing, and displaying lineage metadata
JP2021191498A JP7410919B2 (ja) 2016-12-01 2021-11-25 系統メタデータの生成、アクセス、及び表示
JP2023216403A JP7712998B2 (ja) 2016-12-01 2023-12-22 系統メタデータの生成、アクセス、及び表示

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662428860P 2016-12-01 2016-12-01
US62/428,860 2016-12-01

Publications (1)

Publication Number Publication Date
WO2018102691A1 (en) 2018-06-07

Family

ID=61656328

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/064227 Ceased WO2018102691A1 (en) 2016-12-01 2017-12-01 Generating, accessing, and displaying lineage metadata

Country Status (8)

Country Link
US (2) US11741091B2 (en)
EP (1) EP3549036B1 (en)
JP (3) JP7170638B2 (en)
CN (2) CN118916521A (en)
AU (2) AU2017367772B2 (en)
CA (1) CA3045810A1 (en)
DE (1) DE112017006106T5 (en)
WO (1) WO2018102691A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442604A (zh) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 数据流向查询方法、抽取方法、处理方法及相关装置

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5847585B2 (ja) 2008-12-02 2016-01-27 アビニシオ テクノロジー エルエルシー データ要素とデータ要素の属性のグラフ表現との関係の視覚化
US10452625B2 (en) * 2016-06-30 2019-10-22 Global Ids, Inc. Data lineage analysis
US11741091B2 (en) 2016-12-01 2023-08-29 Ab Initio Technology Llc Generating, accessing, and displaying lineage metadata
US10331660B1 (en) * 2017-12-22 2019-06-25 Capital One Services, Llc Generating a data lineage record to facilitate source system and destination system mapping
US11315075B2 (en) * 2018-12-27 2022-04-26 Target Brands, Inc. Computer storage system
US11256701B2 (en) 2019-01-02 2022-02-22 Bank Of America Corporation Interactive lineage mapping system
US11194845B2 (en) 2019-04-19 2021-12-07 Tableau Software, LLC Interactive lineage analyzer for data assets
US12264301B2 (en) 2019-11-08 2025-04-01 Coors Brewing Company Method of brewing non-alcoholic beer
US11537579B2 (en) * 2020-03-12 2022-12-27 Oracle International Corporation Fast in-memory technique to build a reverse CSR graph index in an RDBMS
US11681721B2 (en) * 2020-05-08 2023-06-20 Jpmorgan Chase Bank, N.A. Systems and methods for spark lineage data capture
WO2022160335A1 (en) 2021-02-01 2022-08-04 Paypal, Inc. Graphical user interface to depict data lineage information in levels
JP2022140929A (ja) * 2021-03-15 2022-09-29 富士通株式会社 情報処理プログラム、情報処理方法、および情報処理装置
US12229145B2 (en) 2021-06-01 2025-02-18 Tableau Software, LLC Metadata inheritance for data assets
US12423333B2 (en) 2021-07-08 2025-09-23 Tableau Software, LLC Data processing for visualizing hierarchical data
US12105742B2 (en) 2021-08-31 2024-10-01 Tableau Software, LLC Providing data flow directions for data objects
CN113868253B (zh) * 2021-09-28 2024-04-23 中通服创立信息科技有限责任公司 一种数据关系捕获及大数据关系树构建方法
US12393903B2 (en) 2023-01-27 2025-08-19 Tableau Software, LLC Determining shortcut relationships in data models
KR20250096634A (ko) * 2023-12-20 2025-06-27 씨티뱅크, 엔.에이. 인공 지능 모델의 효율적인 배포 자동화

Family Cites Families (86)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3239170B2 (ja) 1994-07-25 2001-12-17 アタカ工業株式会社 攪拌曝気装置における軸流インペラ
US6003040A (en) 1998-01-23 1999-12-14 Mital; Vijay Apparatus and method for storing, navigating among and adding links between data items in computer databases
US7725433B1 (en) 1998-01-26 2010-05-25 International Business Machines Corporation Data navigation system and method employing data transformation lineage model
JPH11307412A (ja) 1998-04-20 1999-11-05 Matsushita Electron Corp 半導体製造データ処理方法
US6742003B2 (en) 2001-04-30 2004-05-25 Microsoft Corporation Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
US6725227B1 (en) 1998-10-02 2004-04-20 Nec Corporation Advanced web bookmark database system
US7055130B2 (en) 1999-10-05 2006-05-30 Borland Software Corporation Methods and systems for identifying dependencies between object-oriented elements
EP1292887A1 (en) 2000-04-21 2003-03-19 Togethersoft Corporation Methods and systems for generating source code for object-oriented elements
US7117219B1 (en) 2000-05-05 2006-10-03 Group 1 Software, Inc. Method and apparatus for creating a lineage of a data field in a data flow system
US6859217B2 (en) 2000-07-19 2005-02-22 Microsoft Corporation System and method to display and manage data within hierarchies and polyarchies of information
JP2002288403A (ja) 2001-03-27 2002-10-04 Ntt Comware Corp プロジェクト管理システム、プロジェクト管理方法、及びプロジェクト管理プログラム
EP1258814A1 (en) 2001-05-17 2002-11-20 Requisite Technology Inc. Method and apparatus for analyzing the quality of the content of a database
JP3761156B2 (ja) 2001-07-27 2006-03-29 三菱電機株式会社 接続図面の編集表示装置、その動作方法およびその方法をコンピュータに実行させるプログラム
US7970240B1 (en) 2001-12-17 2011-06-28 Google Inc. Method and apparatus for archiving and visualizing digital images
US7401064B1 (en) 2002-11-07 2008-07-15 Data Advantage Group, Inc. Method and apparatus for obtaining metadata from multiple information sources within an organization in real time
US7546226B1 (en) 2003-03-12 2009-06-09 Microsoft Corporation Architecture for automating analytical view of business applications
US7120619B2 (en) 2003-04-22 2006-10-10 Microsoft Corporation Relationship view
US20040255239A1 (en) 2003-06-13 2004-12-16 Ankur Bhatt Generating electronic reports of data displayed in a computer user interface list view
EP1510937A1 (en) 2003-08-29 2005-03-02 Sap Ag A method of providing a visualisation graph on a computer and a computer for providing a visualisation graph
EP1510938B1 (en) 2003-08-29 2014-06-18 Sap Ag A method of providing a visualisation graph on a computer and a computer for providing a visualisation graph
CN102982065B (zh) 2003-09-15 2016-09-21 起元科技有限公司 数据处理方法、数据处理装置及计算机可读存储介质
US7698348B2 (en) 2003-12-19 2010-04-13 Kinaxis Holdings Inc. Extended database engine providing versioning and embedded analytics
US7197502B2 (en) 2004-02-18 2007-03-27 Friendly Polynomials, Inc. Machine-implemented activity management system using asynchronously shared activity data objects and journal data items
US7594227B2 (en) 2004-03-08 2009-09-22 Ab Initio Technology Llc Dependency graph parameter scoping
US7496583B2 (en) 2004-04-30 2009-02-24 Microsoft Corporation Property tree for metadata navigation and assignment
US7672950B2 (en) 2004-05-04 2010-03-02 The Boston Consulting Group, Inc. Method and apparatus for selecting, analyzing, and visualizing related database records as a network
US7177883B2 (en) 2004-07-15 2007-02-13 Hitachi, Ltd. Method and apparatus for hierarchical storage management based on data value and user interest
US7456840B2 (en) 2004-08-31 2008-11-25 Oracle International Corporation Displaying information using nodes in a graph
US7844582B1 (en) 2004-10-28 2010-11-30 Stored IQ System and method for involving users in object management
US7899833B2 (en) 2004-11-02 2011-03-01 Ab Initio Technology Llc Managing related data objects
US7650349B2 (en) * 2005-01-05 2010-01-19 Microsoft Corporation Prescribed navigation using topology metadata and navigation path
US7363315B2 (en) 2005-02-22 2008-04-22 Sap Ag Creating, editing, and displaying hierarchical data structures associated with data in a data source
US8176002B2 (en) 2005-03-24 2012-05-08 Microsoft Corporation Method and system for user alteration of the configuration of a data warehouse
US7734619B2 (en) 2005-05-27 2010-06-08 International Business Machines Corporation Method of presenting lineage diagrams representing query plans
US7877350B2 (en) 2005-06-27 2011-01-25 Ab Initio Technology Llc Managing metadata for graph-based computations
US20070061287A1 (en) 2005-09-09 2007-03-15 Jian Le Method, apparatus and program storage device for optimizing a data warehouse model and operation
US7493570B2 (en) 2005-09-12 2009-02-17 International Business Machines Corporation User interface options of a data lineage tool
US8577852B2 (en) 2006-03-23 2013-11-05 Infaxiom Group, Llc Automated records inventory and retention schedule generation system
US20070255741A1 (en) 2006-04-28 2007-11-01 Business Objects, S.A. Apparatus and method for merging metadata within a repository
GB0608926D0 (en) 2006-05-05 2006-06-14 Ibm An assessment method and apparatus for matching vendor offerings to service provider requirements
US8654125B2 (en) 2006-06-22 2014-02-18 International Business Machines Corporation System and method of chart data layout
EP1883020B1 (en) 2006-07-28 2013-05-22 Dassault Systèmes Method and system for navigating in a database of a computer system
US20080040388A1 (en) * 2006-08-04 2008-02-14 Jonah Petri Methods and systems for tracking document lineage
JP2008134705A (ja) 2006-11-27 2008-06-12 Hitachi Ltd Data processing method and data analysis apparatus
US7590672B2 (en) 2006-12-11 2009-09-15 Bycast Inc. Identification of fixed content objects in a distributed fixed content storage system
JP4398455B2 (ja) * 2006-12-22 2010-01-13 International Business Machines Corporation Route search method, program, and system
US8640086B2 (en) 2006-12-29 2014-01-28 Sap Ag Graphical user interface system and method for presenting objects
US20080172629A1 (en) 2007-01-17 2008-07-17 Microsoft Corporation Geometric Performance Metric Data Rendering
US7849050B2 (en) 2007-01-29 2010-12-07 Business Objects Data Integration, Inc. Apparatus and method for analyzing impact and lineage of multiple source data objects
US8756673B2 (en) * 2007-03-30 2014-06-17 Ricoh Company, Ltd. Techniques for sharing data
CA2593233A1 (en) 2007-07-06 2009-01-06 Cognos Incorporated System and method for federated member-based data integration and reporting
US8266122B1 (en) 2007-12-19 2012-09-11 Amazon Technologies, Inc. System and method for versioning data in a distributed data store
US8332782B1 (en) 2008-02-22 2012-12-11 Adobe Systems Incorporated Network visualization and navigation
EP2260404A4 (en) * 2008-02-26 2016-03-30 Ab Initio Technology Llc GRAPHICAL PRESENTATIONS OF DATA RELATIONSHIPS
US8797178B2 (en) * 2008-03-10 2014-08-05 Microsoft Corporation Efficient stream sharing for multi-user sensor data collection
JP5847585B2 (ja) 2008-12-02 2016-01-27 Ab Initio Technology LLC Visualizing relationships between data elements and graphical representations of data element attributes
US8515911B1 (en) * 2009-01-06 2013-08-20 Emc Corporation Methods and apparatus for managing multiple point in time copies in a file system
US8972899B2 (en) 2009-02-10 2015-03-03 Ayasdi, Inc. Systems and methods for visualization of data analysis
JP2010244157A (ja) 2009-04-02 2010-10-28 Toshiba Corp Functional block diagram processing apparatus, functional block diagram processing method, and program
US8819010B2 (en) 2010-06-28 2014-08-26 International Business Machines Corporation Efficient representation of data lineage information
US9128998B2 (en) 2010-09-03 2015-09-08 Robert Lewis Jackson, JR. Presentation of data object hierarchies
US9824091B2 (en) * 2010-12-03 2017-11-21 Microsoft Technology Licensing, Llc File system backup using change journal
US9256350B2 (en) 2011-03-30 2016-02-09 Nexsan Technologies Incorporated System for displaying hierarchical information
US9342579B2 (en) 2011-05-31 2016-05-17 International Business Machines Corporation Visual analysis of multidimensional clusters
US20120310875A1 (en) 2011-06-03 2012-12-06 Prashanth Prahlad Method and system of generating a data lineage repository with lineage visibility, snapshot comparison and version control in a cloud-computing platform
JP5772458B2 (ja) 2011-09-29 2015-09-02 Fujitsu Ltd Data management program, node, and distributed database system
US9659042B2 (en) 2012-06-12 2017-05-23 Accenture Global Services Limited Data lineage tracking
US10089335B2 (en) 2012-07-10 2018-10-02 Microsoft Technology Licensing, Llc Data lineage across multiple marketplaces
US20160063106A1 (en) 2012-08-08 2016-03-03 Google Inc. Related Entity Search
SG11201408814XA (en) 2012-08-14 2015-01-29 Amadeus Sas Updating cached database query results
CN102890720A (zh) 2012-10-16 2013-01-23 南京通达海信息技术有限公司 Database inspection and maintenance method
US9075860B2 (en) 2012-10-18 2015-07-07 Oracle International Corporation Data lineage system
US9928287B2 (en) * 2013-02-24 2018-03-27 Technion Research & Development Foundation Limited Processing query to graph database
EP3493050B1 (en) 2013-03-15 2025-06-04 AB Initio Technology LLC System for metadata management
US20150012477A1 (en) * 2013-07-02 2015-01-08 Bank Of America Corporation Data lineage notification tools
US9524331B2 (en) * 2013-11-18 2016-12-20 Nuwafin Holdings Ltd Method and system for representing OLAP queries using directed acyclic graph structures in a datagrid to support real-time analytical operations
US20150271267A1 (en) * 2014-03-24 2015-09-24 Palo Alto Research Center Incorporated Content-oriented federated object store
US10705877B2 (en) 2014-05-29 2020-07-07 Ab Initio Technology Llc Workload automation and data lineage analysis
KR102279859B1 (ko) 2014-07-18 2021-07-20 Ab Initio Technology LLC Managing parameter sets
US10110415B2 (en) 2014-07-24 2018-10-23 Ab Initio Technology Llc Data lineage summarization
US10521459B2 (en) 2015-02-11 2019-12-31 Ab Initio Technology Llc Filtering data lineage diagrams
SG11201706228UA (en) 2015-02-11 2017-08-30 Ab Initio Technology Llc Filtering data lineage diagrams
US10793895B2 (en) * 2015-08-24 2020-10-06 Seven Bridges Genomics Inc. Systems and methods for epigenetic analysis
US10120923B2 (en) * 2015-11-30 2018-11-06 Bank Of America Corporation Data discovery and analysis tool
US10268753B2 (en) * 2015-12-22 2019-04-23 Opera Solutions Usa, Llc System and method for optimized query execution in computerized data modeling and analysis
US11741091B2 (en) 2016-12-01 2023-08-29 Ab Initio Technology Llc Generating, accessing, and displaying lineage metadata

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LUCAS ZAMBOULIS ED - HOWARD WILLIAMS ET AL: "XML Data Integration by Graph Restructuring", 19 June 2004, KEY TECHNOLOGIES FOR DATA MANAGEMENT; [LECTURE NOTES IN COMPUTER SCIENCE;;LNCS], SPRINGER-VERLAG, BERLIN/HEIDELBERG, PAGE(S) 57 - 71, ISBN: 978-3-540-22382-5, XP019008451 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
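CN110442604A (zh) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 Data flow direction query method, extraction method, processing method, and related apparatus
CN110442604B (zh) * 2019-07-11 2022-03-11 新华三大数据技术有限公司 Data flow direction query method, extraction method, processing method, and related apparatus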

Also Published As

Publication number Publication date
US12405949B2 (en) 2025-09-02
CN110023925A (zh) 2019-07-16
JP7712998B2 (ja) 2025-07-24
DE112017006106T5 (de) 2019-09-19
AU2020203027B2 (en) 2021-07-29
US20180157702A1 (en) 2018-06-07
EP3549036B1 (en) 2025-09-03
AU2020203027A1 (en) 2020-05-28
JP2020501235A (ja) 2020-01-16
CA3045810A1 (en) 2018-06-07
EP3549036A1 (en) 2019-10-09
CN118916521A (zh) 2024-11-08
US20240078229A1 (en) 2024-03-07
AU2017367772B2 (en) 2020-05-14
JP2022033825A (ja) 2022-03-02
US11741091B2 (en) 2023-08-29
AU2017367772A1 (en) 2019-07-04
JP7170638B2 (ja) 2022-11-14
JP2024038033A (ja) 2024-03-19
JP7410919B2 (ja) 2024-01-10
CN110023925B (zh) 2024-07-12

Similar Documents

Publication Publication Date Title
AU2020203027B2 (en) Generating, accessing, and displaying lineage metadata
CA2824319C (en) Column smart mechanism for column based database
AU2009258015B2 (en) Paging hierarchical data
US20110313969A1 (en) Updating historic data and real-time data in reports
US8626795B2 (en) Dynamic data association
US10810226B2 (en) Shared comments for visualized data
US20130346426A1 (en) Tracking an ancestry of metadata
US10394844B2 (en) Integrating co-deployed databases for data analytics
US20140130008A1 (en) Generating information models
US10311049B2 (en) Pattern-based query result enhancement
HK40009080A (en) Generating, accessing, and displaying lineage metadata
HK40009080B (en) Generating, accessing, and displaying lineage metadata
US10606502B2 (en) Data aging infrastructure for automatically determining aging temperature
US10417185B2 (en) Gesture based semantic enrichment
US20120089593A1 (en) Query optimization based on reporting specifications
US12001710B2 (en) Dynamic update of consolidated data based on granular data values

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17851913; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2019525760; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 3045810; Country of ref document: CA)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2017367772; Country of ref document: AU; Date of ref document: 20171201; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2017851913; Country of ref document: EP; Effective date: 20190701)
WWG Wipo information: grant in national office (Ref document number: 11201903995V; Country of ref document: SG)
WWP Wipo information: published in national office (Ref document number: 11201903995V; Country of ref document: SG)
WWG Wipo information: grant in national office (Ref document number: 2017851913; Country of ref document: EP)