WO2023226311A1 - Data asset graph management method and related device - Google Patents

Data asset graph management method and related device

Info

Publication number
WO2023226311A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
recommended
exploration
data asset
node
Application number
PCT/CN2022/130509
Other languages
English (en)
French (fr)
Inventor
陈运鹏
赵颖
林晖煜
张江
季振峰
Original Assignee
华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Application filed by 华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Publication of WO2023226311A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Definitions

  • The present application relates to the field of computers, and in particular to a data asset graph management method, system, computing device cluster, computer-readable storage medium, and computer program product.
  • Data assets are data resources that are owned or controlled by an entity (such as an enterprise or organization), can bring future benefits, and are recorded physically or electronically.
  • A data asset may be, for example, a document or electronic data.
  • Taking asset entities as nodes and the relationships between asset entities as edges, the resulting graph structure is called a data asset graph.
  • The data asset graph can be displayed visually and lets users explore and analyze data assets based on the relationships between them.
  • In a data asset graph, asset entities and the relationships between them are the main objects of analysis. Using the graph's interactive functions, users can continually switch perspective (such as the analysis dimension) and focus (such as the analysis object) to analyze in depth the data assets they care about and how their relationships evolve.
  • This application provides a management method for data asset graphs. The method determines recommended exploration information and presents it to users to guide their exploration of the data asset graph. This avoids blind click interactions, reduces both the time spent exploring and the redundant information generated during exploration, and improves the efficiency of exploration and analysis.
  • This application also provides a management system, a computing device cluster, a computer-readable storage medium and a computer program product corresponding to the method.
  • In a first aspect, this application provides a management method for data asset graphs.
  • The method may be executed by a management system for the data asset graph.
  • The management system may be a software system deployed in a computing device cluster.
  • The computing device cluster performs the data asset graph management method in the embodiments of this application by executing the program code of the software system.
  • The management system may also be a hardware system with a data asset graph management function; when running, the hardware system performs the data asset graph management method in the embodiments of this application.
  • The management system may be a computing device cluster with a data asset graph management function.
  • The management system obtains the data asset graph, determines recommended exploration information for it based on the graph (for example, one or more of a recommended exploration starting node, a recommended exploration edge, and a recommended exploration target node), and then presents the recommended exploration information to the user.
  • The recommended exploration information is used to guide the user in exploring the data asset graph.
  • Guided by the recommended exploration information, users can select appropriate nodes or edges (for example, nodes corresponding to important data assets or edges corresponding to important relationships) for exploration and analysis. This reduces the time spent exploring and the redundant information generated during exploration, improving the efficiency of exploration and analysis.
  • The method also avoids repeating exploration steps the user has already been through, reducing repeated or invalid interactions and further improving analysis efficiency.
  • The recommended exploration information includes any one or more of the following: a recommended exploration starting node, a recommended exploration edge, and a recommended exploration target node.
  • By presenting recommended exploration starting nodes to users, the management system solves the "cold start" problem of manual exploration: when users face a "point-edge dual heterogeneous network", which contains multiple types of nodes and multiple types of edges, it is hard to decide which category and attribute to start analyzing from, which leads to blind screening and filtering.
  • By presenting recommended exploration edges to users, the management system solves the "cold expansion" problem of manual exploration: once the analysis is under way and users find a node of interest, they need to expand along some of its relationships, but faced with complex relationships and no guiding prompts, they fall back on blind click interactions.
  • By presenting recommended exploration target nodes to users, the management system removes the need to manually pick an exploration starting point in a large-scale data asset graph and then reach a target node that meets expectations through a large number of interactive operations, greatly shortening exploration time and improving exploration efficiency.
  • The management system can display the recommended exploration starting nodes on the graph display interface.
  • The management system may display to the user the recommended edges related to a first node selected by the user.
  • The management system may display the recommended exploration target nodes to the user.
  • The recommended exploration target nodes are determined based on the scores of the nodes along paths that start from a second node selected by the user.
  • The management system can provide multiple atomic interactive navigation functions, and the user can select one or more of them to suit the needs of different business scenarios.
  • The management system may also display to the user the path from the second node to a third node.
  • The third node is a node selected by the user from the recommended exploration target nodes. This not only helps users quickly find data assets that meet their expectations (the data assets corresponding to the nodes they select from the recommended exploration target nodes), but also helps them quickly obtain the associations between these data assets and other data assets (such as the data asset corresponding to the exploration starting node).
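The target-node recommendation and path display described above can be sketched as a bounded breadth-first search. This is a minimal illustration, not the patented algorithm: the function name `recommend_targets`, the `max_hops` bound, and the rule that a path's score is the sum of the scores of the nodes it passes through are all assumptions, since the text only says that targets are determined "based on the scores of nodes passed by the path".

```python
from collections import deque


def recommend_targets(adj, node_score, start, max_hops=3, top_k=3):
    """Rank nodes reachable from `start` as candidate exploration targets.

    adj:        node id -> list of neighbour ids (directed edges)
    node_score: node id -> importance score (e.g. an impact-factor score)
    Assumption: a path's score is the sum of the scores of the nodes it
    passes through after `start`; each candidate keeps its best path.
    Returns [(target, (path_score, path_tuple)), ...], best first.
    """
    best = {}
    queue = deque([(start, (start,), 0.0)])
    while queue:
        node, path, score = queue.popleft()
        if len(path) - 1 >= max_hops:  # path already has max_hops edges
            continue
        for nxt in adj.get(node, ()):
            if nxt in path:  # simple paths only, so cycles are never followed
                continue
            new_score = score + node_score.get(nxt, 0.0)
            new_path = path + (nxt,)
            if nxt not in best or new_score > best[nxt][0]:
                best[nxt] = (new_score, new_path)
            queue.append((nxt, new_path, new_score))
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return ranked[:top_k]
```

Once the user picks a third node from the returned targets, `dict(result)[third][1]` is the path to display from the second node. Summing scores favours longer paths; a real system might average or discount by path length instead.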
  • The management system can obtain the impact factors of the data asset graph and determine recommended exploration information for the graph based on these impact factors.
  • Impact factors can be used to evaluate the importance of nodes or edges, so recommended exploration information determined from them is reliable and has high reference value.
  • The impact factors include one or more of structural features, business features, or the user's historical experience with the data asset graph.
  • The method thus evaluates the importance of nodes and edges comprehensively along the structural, business, and user-experience dimensions, which yields high accuracy.
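One way to combine the three impact-factor dimensions into a single importance score is a normalised weighted sum. The sketch below assumes min-max normalisation and equal default weights; neither the normalisation nor the weights are specified in the text, and the names `min_max` and `impact_scores` are hypothetical.

```python
def min_max(values):
    """Scale a {key: value} mapping to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def impact_scores(structural, business, experience, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three factor dimensions into one score per node or edge.

    Assumption: each dimension is min-max normalised, then linearly
    weighted; an item missing from a dimension contributes 0 for it.
    """
    dims = [min_max(d) if d else {} for d in (structural, business, experience)]
    keys = set().union(*dims)  # every node/edge id seen in any dimension
    return {k: sum(w * d.get(k, 0.0) for w, d in zip(weights, dims)) for k in keys}
```

The weights are natural candidates for the "recommendation parameters" that user feedback later updates.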
  • The structural features include centrality.
  • Centrality may include, for example, one or more of degree centrality, shortest-path betweenness centrality, random-walk betweenness centrality, PageRank, closeness centrality, harmonic centrality, and eigenvector centrality.
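Two of the centrality measures listed above can be computed with the standard library alone, which shows how a structural score per node might be derived. This is an illustrative sketch on a plain adjacency-list dictionary (an assumed representation); a production system would more likely call a graph library such as NetworkX, whose `degree_centrality`, `pagerank`, `betweenness_centrality`, `closeness_centrality`, `harmonic_centrality`, and `eigenvector_centrality` functions cover the same measures.

```python
def degree_centrality(adj):
    """(In + out) degree of each node divided by n - 1."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    deg = dict.fromkeys(nodes, 0)
    for u, targets in adj.items():
        for v in targets:
            deg[u] += 1  # out-degree of u
            deg[v] += 1  # in-degree of v
    n = len(nodes)
    return {v: d / (n - 1) if n > 1 else 0.0 for v, d in deg.items()}


def pagerank(adj, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a directed adjacency list.

    Dangling nodes (no out-edges) spread their rank uniformly over all
    nodes, so the ranks always sum to 1.
    """
    nodes = sorted(set(adj) | {v for t in adj.values() for v in t})
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not adj.get(u))
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for u in nodes:
            out = adj.get(u, ())
            for v in out:
                new[v] += damping * rank[u] / len(out)
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

For example, with three tables all feeding one job node, the job node gets the highest score under both measures.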
  • Business features include one or more of business weights or semantic features. Business weights are quantitative features, and semantic features are qualitative features. Because most data asset graphs are typical point-edge dual heterogeneous networks, each type of data asset or association relationship usually carries a different business weight; for example, data tables and jobs are relatively important and have larger business weights. Data asset graphs are also very large and contain many data assets of the same type, yet the semantic information of each asset (represented by its semantic features) gives them different degrees of importance. For example, a data asset graph may contain thousands of data tables of the same type and business weight; because different tables hold data with different semantics and represent different business activities, they have different business value.
  • Each data asset and association relationship in the data asset graph naturally carries business attributes.
  • The management system ranks nodes or edges by business importance and recommends the top-ranked ones, so that users explore nodes or edges of high business importance first and avoid blind interactions.
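The interplay of a quantitative business weight and a qualitative semantic score can be illustrated as follows. The weight table, the multiplicative combination, and the names `BUSINESS_WEIGHT` and `business_rank` are hypothetical: the text only states that data tables and jobs are "relatively more important" and that semantics separate assets of the same type.

```python
# Hypothetical per-type business weights; the source gives no actual values.
BUSINESS_WEIGHT = {"data_table": 1.0, "job": 1.0, "field": 0.4, "catalog": 0.3}


def business_rank(nodes, semantic_score, top_k=2, default_weight=0.1):
    """Rank nodes by business weight (quantitative) x semantic score (qualitative).

    nodes:          node id -> entity type
    semantic_score: node id -> relevance in [0, 1], e.g. from matching a
                    table's description against the user's task (assumed)
    """
    def score(node_id):
        weight = BUSINESS_WEIGHT.get(nodes[node_id], default_weight)
        return weight * semantic_score.get(node_id, 0.0)

    return sorted(nodes, key=score, reverse=True)[:top_k]
```

Two data tables share a weight of 1.0, so their order is decided entirely by their semantic scores, which mirrors the "thousands of data tables" example above.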
  • Historical experience refers to the user's interactive exploration of the data asset graph in historical time periods. Users explore data asset graphs based on their own domain knowledge and accumulated experience, so their exploration shows strong tendencies and subjectivity.
  • The management system can obtain statistical indicators of users' interactive exploration of the data asset graph in historical time periods and use these indicators as the users' historical experience with the graph.
  • The statistical indicators may include one or more of click frequency or conditional probability. The click frequency of a data asset or association relationship reflects how often users interact with it and is positively correlated with its experience importance.
  • When exploring data asset graphs, users usually select data assets and associations deliberately and in sequence. The conditional probability captures this order (for example, the probability that a user selects one data asset or association after another); the higher the conditional probability, the higher the experience importance of the data asset or association.
  • The management system ranks nodes or edges by experience importance and recommends the top-ranked ones, so that users explore nodes or edges of high experience importance first and avoid blind interactions.
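Both statistical indicators can be estimated from logged click sequences. The sketch assumes each past exploration session is stored as an ordered list of clicked asset or relation ids and models conditional probability as a first-order transition probability P(next = b | current = a); the log format and the names `experience_stats` and `recommend_next` are hypothetical.

```python
from collections import Counter


def experience_stats(sessions):
    """Derive click frequency and first-order conditional probabilities.

    sessions: list of click sequences recorded in past exploration sessions.
    Returns (freq, cond) where
      freq[x]      = share of all clicks that hit x
      cond[(a, b)] = P(next click is b | current click is a)
    """
    clicks = Counter(x for s in sessions for x in s)
    total = sum(clicks.values())
    freq = {x: c / total for x, c in clicks.items()} if total else {}
    pairs = Counter((a, b) for s in sessions for a, b in zip(s, s[1:]))
    from_counts = Counter()
    for (a, _), c in pairs.items():
        from_counts[a] += c  # how often `a` was followed by any click
    cond = {(a, b): c / from_counts[a] for (a, b), c in pairs.items()}
    return freq, cond


def recommend_next(cond, current, top_k=3):
    """Top-k likely next clicks after `current`, by conditional probability."""
    cands = [(b, p) for (a, b), p in cond.items() if a == current]
    return [b for b, _ in sorted(cands, key=lambda bp: bp[1], reverse=True)[:top_k]]
```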
  • The management system may also receive the user's feedback on the recommended exploration information and update recommendation parameters based on that feedback.
  • By iteratively updating the recommendation parameters based on user feedback, the method keeps the recommendation accuracy of the management system high and provides users with more accurate recommended exploration information.
  • The user's feedback on the recommended exploration information includes the user's selection or rejection of it.
  • Selection means that the user selects one of several recommended exploration starting nodes, one of several recommended exploration edges, or one of several recommended exploration target nodes.
  • Rejection means that the user selects a node other than the recommended exploration starting nodes, an edge other than the recommended exploration edges, or a node other than the recommended exploration target nodes.
  • Positive samples can be obtained from the user's selections of recommended exploration information.
  • Negative samples can be obtained from the user's rejections of recommended exploration information.
  • Updating the recommendation parameters with both positive and negative samples reduces overfitting and yields suitable recommendation parameters.
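A simple way to update recommendation parameters from positive and negative samples is logistic regression trained by stochastic gradient descent. This is a swapped-in technique chosen for illustration, not the method confirmed by the source: it assumes each recommended item is described by its impact-factor scores, treats selections as label 1 and rejections as label 0, and adds an L2 penalty (also an assumption) to curb overfitting. The name `update_params` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def update_params(weights, samples, lr=0.1, epochs=50, l2=0.01):
    """Logistic-regression style update of recommendation parameters.

    samples: list of (features, label) pairs. features are the
    impact-factor scores of a recommended item, e.g.
    [structural, business, experience]; label is 1 for a positive sample
    (user selected it) and 0 for a negative sample (user rejected it).
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # gradient of log loss is (p - y) * x_i; l2 * w_i shrinks weights
            w = [wi - lr * ((p - y) * xi + l2 * wi) for wi, xi in zip(w, x)]
    return w
```

In the example below, the user keeps choosing items with high business scores and rejecting structurally central ones, so the business weight ends up larger than the structural weight.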
  • The management system can receive keywords input by the user, obtain an intended asset list based on the keywords, and display the list to the user. Then, in response to the user's selection of an intended asset in the list, the management system generates a data asset graph corresponding to that intended asset.
  • This supports one-click generation of data asset graphs without complex interactive operations, reducing the difficulty of data asset management and improving user experience.
  • The management system can receive a user-defined extended edge and then update the data asset graph according to it.
  • This provides a channel for manually updating the data asset graph: users can define extended edges to update the associations between data assets in the graph, which gives the method high availability.
  • The management system can present extended relationship types to the user.
  • The extended relationship types include one or more of parent-child relationships, primary key-foreign key relationships, logical-physical relationships, and data flow relationships.
  • The management system determines the user-defined extended edge based on the target relationship type that the user selects from the presented extended relationship types.
  • The extended edge is an edge that has the user-selected node as an endpoint and whose relationship type is the target relationship type.
  • By offering a variety of extended relationship types, the method lets users flexibly select one or more of them and define corresponding extended edges, which meets business needs. Users can define an extended edge with a simple click operation, which is highly user-friendly.
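The user-defined extended edge can be modelled as a validated insert into an edge set, restricted to the four extended relationship types. The enum values reuse the labels from the Figure 1 description (parent_child, PK_FK, logical_physical, data_flow); the set-based storage and the name `add_extended_edge` are hypothetical stand-ins for whatever graph store the system actually uses.

```python
from enum import Enum


class ExtendedRelation(Enum):
    # The four extended relationship types, labelled as in Figure 1.
    PARENT_CHILD = "parent_child"
    PK_FK = "PK_FK"
    LOGICAL_PHYSICAL = "logical_physical"
    DATA_FLOW = "data_flow"


def add_extended_edge(edges, nodes, src, dst, relation):
    """Append a user-defined edge src -> dst of the chosen relation type.

    edges: set of (src, dst, relation value) triples (hypothetical store)
    nodes: set of node ids currently in the graph
    """
    if src not in nodes or dst not in nodes:
        raise KeyError("extended edges must connect nodes already in the graph")
    rel = ExtendedRelation(relation)  # raises ValueError for unsupported types
    edge = (src, dst, rel.value)
    edges.add(edge)
    return edge
```

Looking the type up through the enum means the UI's list of selectable types and the validation rule share one definition.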
  • The management system can identify the type of the intended asset and obtain first associated assets, which are of the same type as the intended asset, and second associated assets, which are of a different type. It then generates the data asset graph from the intended asset, the first associated assets, the second associated assets, the association relationships between the intended asset and the first associated assets, and the association relationships between the intended asset and the second associated assets. The resulting graph is a point-edge dual heterogeneous network graph.
  • Because the method considers both node types and edge types, it can explore and analyze not only homogeneous networks but also point-edge dual heterogeneous network graphs. It is therefore accurate and applicable to a wide range of business scenarios.
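The keyword search and the graph generation around an intended asset can be sketched together. The sketch assumes asset names double as ids, uses case-insensitive substring matching for the intended asset list, and reads relationships from a flat list of (src, dst, relation) triples; all three are assumptions, as are the names `search_assets` and `build_asset_graph`.

```python
def search_assets(keyword, asset_type):
    """Intended asset list: ids whose names contain the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sorted(a for a in asset_type if kw in a.lower())


def build_asset_graph(intended, asset_type, relations):
    """Generate the graph centred on the intended asset.

    asset_type: asset id -> type
    relations:  list of (src, dst, relation) triples from the catalogue
    Same-type neighbours are the "first associated assets"; neighbours of
    a different type are the "second associated assets".
    """
    center_type = asset_type[intended]
    first, second, edges = set(), set(), []
    for src, dst, rel in relations:
        if intended not in (src, dst) or src == dst:
            continue  # keep only edges that touch the intended asset
        other = dst if src == intended else src
        if asset_type[other] == center_type:
            first.add(other)
        else:
            second.add(other)
        edges.append((src, dst, rel))
    node_kinds = {asset_type[n] for n in first | second | {intended}}
    edge_kinds = {rel for _, _, rel in edges}
    return {
        "root": intended, "first": first, "second": second, "edges": edges,
        "dual_heterogeneous": len(node_kinds) > 1 and len(edge_kinds) > 1,
    }
```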
  • This application further provides a management system for data asset graphs. The system includes:
  • an acquisition module, configured to obtain the data asset graph;
  • a recommendation module, configured to determine recommended exploration information for the data asset graph according to the data asset graph; and
  • an interaction module, configured to present the recommended exploration information to the user, where the recommended exploration information is used to guide the user in exploring the data asset graph.
  • The recommended exploration information includes any one or more of the following: a recommended exploration starting node, a recommended exploration edge, and a recommended exploration target node.
  • The interaction module is specifically configured to display the recommended exploration target nodes to the user, where the recommended exploration target nodes are determined based on the scores of the nodes along paths that start from a second node selected by the user.
  • The interaction module is further configured to display to the user a path from the second node to a third node, where the third node is a node selected by the user from the recommended exploration target nodes.
  • The recommendation module is specifically configured to obtain the impact factors of the data asset graph and determine recommended exploration information for the data asset graph based on the impact factors.
  • The impact factors include one or more of structural features, business features, or the user's historical experience with the data asset graph.
  • The structural features include centrality.
  • The business features include one or more of business weights or semantic features.
  • The historical experience includes one or more of click frequency or conditional probability.
  • The interaction module is further configured to receive the user's feedback on the recommended exploration information.
  • The system further includes an update module, configured to update the recommendation parameters according to the user's feedback on the recommended exploration information.
  • The user's feedback on the recommended exploration information includes the user's selection or rejection of the recommended exploration information.
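  • The interaction module is further configured to receive a keyword input by the user.
  • The acquisition module is specifically configured to obtain an intended asset list based on the keyword.
  • The interaction module is specifically configured to display the intended asset list to the user.
  • The acquisition module is specifically configured to generate, in response to the user's selection of an intended asset in the intended asset list, a data asset graph corresponding to the intended asset.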
  • The interaction module is further configured to receive a user-defined extended edge.
  • The system further includes an update module, configured to update the data asset graph according to the extended edge.
  • This application further provides a computing device cluster.
  • The computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory.
  • The at least one processor and the at least one memory communicate with each other.
  • The at least one processor is configured to execute instructions stored in the at least one memory, so that the computing device or the computing device cluster performs the data asset graph management method according to the first aspect or any implementation of the first aspect.
  • This application further provides a computer-readable storage medium storing instructions that instruct a computing device or a computing device cluster to perform the method according to the first aspect or any implementation of the first aspect.
  • This application further provides a computer program product containing instructions that, when run on a computing device or a computing device cluster, cause the computing device or the computing device cluster to perform the method according to the first aspect or any implementation of the first aspect.
  • Figure 1 is an abstract model diagram of a data asset graph provided by an embodiment of the present application.
  • Figure 2 is a schematic architectural diagram of a data asset map management system provided by an embodiment of the present application.
  • Figure 3 is a flow chart of a data asset map management method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a map management interface provided by an embodiment of the present application.
  • Figure 5 is a flow chart of a data asset map management method provided by an embodiment of the present application.
  • FIGS. 6A to 6K are schematic diagrams of a map management interface provided by embodiments of the present application.
  • Figure 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • The terms "first" and "second" in the embodiments of this application are used only for descriptive purposes and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features.
  • Data assets can include different asset entities, and different asset entities can have different types of association relationships between them.
  • Taking the asset entities of data assets as nodes and the relationships between asset entities as edges, a data asset graph can be constructed. Refer to the abstract model diagram of the data asset graph shown in Figure 1.
  • The data asset graph defines 10 types of entities and 4 types of relationships; the 4 relationship types can be further refined into 18 types of relationships.
  • In Figure 1, nodes represent asset entities, and unidirectional edges represent unidirectional relationships.
  • Asset entities can include databases, data tables, catalogs, jobs, nodes, logical entities, business attributes, fields, column lineage, and insights.
  • Column lineage refers to column-level data lineage. Data lineage, also called data provenance or data pedigree, is typically defined as the lifecycle of data, mainly covering where the data originates and where it moves over time. Association relationships include four types: parent-child relationships (denoted parent_child), data flow relationships (denoted data_flow), primary key-foreign key relationships (denoted PK_FK), and logical-physical relationships (denoted logical_physical).
  • When the data asset graph includes multiple types of nodes and multiple types of edges, it is a point-edge dual heterogeneous network graph.
  • The data asset graph shown in Figure 1 is a typical point-edge dual heterogeneous network graph.
  • If all nodes of the data asset graph are of the same type, or all of its edges are of the same type, the data asset graph is a homogeneous network graph.
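The graph model and the heterogeneous/homogeneous distinction above can be expressed as a small typed-graph structure. This is a minimal sketch assuming an in-memory representation with the four association types from Figure 1; the class name `AssetGraph` and its methods are hypothetical, and the entity type strings in the usage are illustrative.

```python
from dataclasses import dataclass, field

# The four association types named above (Figure 1 labels).
RELATION_TYPES = {"parent_child", "data_flow", "PK_FK", "logical_physical"}


@dataclass
class AssetGraph:
    node_types: dict = field(default_factory=dict)  # node id -> entity type
    edges: list = field(default_factory=list)       # (src, dst, relation type)

    def add_node(self, node_id: str, entity_type: str) -> None:
        self.node_types[node_id] = entity_type

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        if src not in self.node_types or dst not in self.node_types:
            raise KeyError("both endpoints must be added as nodes first")
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        self.edges.append((src, dst, relation))  # unidirectional: src -> dst

    def is_dual_heterogeneous(self) -> bool:
        # More than one node type AND more than one edge type; otherwise the
        # graph counts as homogeneous under the definition above.
        node_kinds = set(self.node_types.values())
        edge_kinds = {rel for _, _, rel in self.edges}
        return len(node_kinds) > 1 and len(edge_kinds) > 1
```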
  • The data asset graph can be displayed visually and lets users explore and analyze data assets based on the relationships between them.
  • Exploration of data asset graphs is mainly done manually.
  • The user determines the starting node for exploration. For example, the user can search for keywords, select a data asset from the resulting intended asset list, and use the node corresponding to that data asset's asset entity as the exploration starting node.
  • The data asset graph display interface can then display nodes corresponding to asset entities of all types, or of other types, associated with that asset entity. Users continue exploring through interactions such as selection to complete the intended task.
  • Embodiments of this application provide a method for managing data asset graphs.
  • For example, the manager of a company's data asset graph receives a task request from business staff: "Search the data asset graph and, based on the company's product sales, find reference data to support the design direction of the company's new products."
  • In this case, this application can use nodes that already exist in the current data asset graph as recommended exploration starting nodes, recommend target nodes corresponding to high-value data assets that do not yet appear in the current graph, and display the paths between these nodes, helping users save exploration time.
  • Referring to Figure 2, the management system 200 includes an acquisition module 202, a recommendation module 204, and an interaction module 206.
  • The acquisition module 202 and the recommendation module 204 are each connected to the interaction module 206.
  • Each module is introduced below.
  • The acquisition module 202 is configured to acquire the data asset graph.
  • For example, the acquisition module 202 can generate a data asset graph corresponding to an asset entity the user selects from the entity list.
  • The data asset graph corresponding to an asset entity is the data asset graph centered on that asset entity (for example, with the asset entity as the root node).
  • The recommendation module 204 is configured to determine recommended exploration information for the data asset graph according to the data asset graph.
  • The recommendation module 204 may have a built-in recommendation algorithm.
  • The recommendation module 204 determines the recommended exploration information by applying the recommendation algorithm to the data asset graph.
  • The interaction module 206 is configured to present the recommended exploration information to the user.
  • The recommended exploration information is used to guide the user in exploring the data asset graph.
  • The interaction module 206 can provide an interactive interface, also called a user interface (UI), through which it presents the recommended exploration information to the user.
  • The interactive interface may include a graphical user interface (GUI) or a command user interface (CUI).
  • The interaction module 206 can provide a variety of UIs.
  • For example, the interaction module 206 can provide a graph display interface.
  • The interaction module 206 can present the data asset graph to the user through the graph display interface.
  • The interaction module 206 can also present the recommended exploration information directly through the graph display interface.
  • For example, the interaction module 206 can overlay the recommended exploration information on the displayed data asset graph.
  • The UIs provided by the interaction module 206 may also include a search interface.
  • The search interface may include a search box.
  • The user enters a keyword in the search box to trigger a search operation.
  • The acquisition module 202 then acquires the data assets that match the keyword entered by the user.
  • Such a data asset is also called an intended asset.
  • The interaction module 206 can display the retrieved data assets in a list on the search interface. Because this list displays intended assets, it is also called an intended asset list.
  • The user can select a data asset (such as a target data asset) from the intended asset list, and the acquisition module 202 can generate a data asset graph centered on the selected data asset. The interaction module 206 can then present this data asset graph to the user through the graph display interface.
  • The graph display interface and the search interface can also be integrated into one interface.
  • For example, the search interface can be integrated into the graph display interface.
  • In this case, the right side of the graph display interface displays the data asset graph, and the left side displays the search box and the intended asset list.
  • the management system 200 may also include an update module 208.
  • the update module 208 is used to update the data asset map or update the recommendation parameters of the recommendation algorithm.
  • the interaction module 206 is also configured to receive user feedback on the recommended exploration information, where the user's feedback on the recommended exploration information may include the user's selection or rejection of the recommended exploration information.
  • the update module 208 is configured to update the data asset graph according to the user's feedback on the recommended exploration information, or the update module 208 is configured to update the recommendation parameters according to the user's feedback on the recommended exploration information.
  • the management system 200 shown in Figure 2 is only a schematic division from the perspective of functional modularization.
  • the management system 200 may also include other functional modules, or the functional modules in Figure 2 may be replaced by other functional modules; the embodiments of the present application do not limit this.
  • the method includes:
  • S302 The management system 200 receives the keyword input by the user.
  • the management system 200 provides a search function for data assets. Specifically, the management system 200 can provide a search interface that includes a search box in which the user enters keywords. In some embodiments, the management system 200 does not provide a separate search interface but instead integrates a search box into the graph display interface, and the user enters keywords in that search box.
  • the keywords can be input by the user according to the business requirements, that is, the keywords can be related to the business.
  • for example, suppose a sports brand manufacturer produces sportswear, sports shoes, sports equipment, and other products, and the management system 200 manages the brand's data assets. A user can enter the keyword "sneakers" to trigger a search for data assets matching "sneakers".
  • S304 The management system 200 obtains the intended asset list according to the keywords.
  • specifically, the management system 200 searches the data assets it manages for those that match the keyword and generates the intended asset list from the matching data assets.
  • the intended asset list includes the unique identifier of the data asset (that is, the intended asset) that matches the keyword.
  • the unique identifier may be an asset name or ID.
  • the intended asset list may also include meta-information of data assets matching the above keywords.
  • the meta-information may include one or more of the creator, creation time, or business category, and gives users a reference when selecting data assets from the intended asset list.
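The search step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a keyword matches an asset when the keyword appears, ignoring case, in the asset name. The `DataAsset` fields and the `build_intended_asset_list` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    # Hypothetical record: a unique identifier plus the meta-information named above.
    asset_id: str
    name: str
    creator: str
    created: str   # creation time, ISO date string
    category: str  # business category


def build_intended_asset_list(assets, keyword):
    """Return intended-asset rows (ID plus meta-information) for every asset
    whose name contains the keyword, ignoring case (an assumed matching rule)."""
    kw = keyword.lower()
    return [
        {
            "id": a.asset_id,
            "name": a.name,
            "creator": a.creator,
            "created": a.created,
            "category": a.category,
        }
        for a in assets
        if kw in a.name.lower()
    ]


catalog = [
    DataAsset("t1", "sneakers_sales", "alice", "2022-01-05", "sales"),
    DataAsset("t2", "Sneakers_inventory", "bob", "2022-02-11", "supply"),
    DataAsset("t3", "apparel_sales", "carol", "2022-03-20", "sales"),
]
intended = build_intended_asset_list(catalog, "sneakers")
```

A real system would more likely query a search index than scan a list; the linear scan only keeps the example self-contained.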
  • S306 The management system 200 presents the intended asset list to the user.
  • the management system 200 may present the intended asset list to the user through the search interface.
  • the management system 200 can also present the intended asset list to the user through a graph display interface. For example, when the management system 200 integrates a search box in the map display interface, the management system 200 can present the intended asset list to the user in a local area of the map display interface, such as the left area.
  • the map display interface 400 includes a search box 402 and a search control 404.
  • the user can enter keywords in the search box 402 according to business needs, and then click or touch the search control 404 to trigger a search operation.
  • the management system 200 displays the data assets that match the keywords, that is, the intended assets, in the form of a list on the left side of the graph display interface 400.
  • the graph display interface 400 includes the intended asset list 406.
  • the user can browse the intended asset list to select the asset of interest.
  • after the user selects an intended asset, the management system 200 can generate the data asset map corresponding to that intended asset based on the intended asset, the associations between the intended asset and other data assets, and the other data assets associated with the intended asset.
  • the data asset map corresponding to the intended asset may be a data asset map centered on the intended asset.
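One way to build a graph centered on the intended asset is a breadth-first search that keeps every asset within a fixed number of hops and then takes the induced subgraph. The sketch below assumes relationships are stored as `(source, target, relation_type)` tuples and treats them as undirected when collecting neighbours. The `radius` parameter (default 1) is a hypothetical choice, because the text does not say how far the centered graph extends.

```python
from collections import deque


def build_centered_graph(edges, center, radius=1):
    """Return (nodes, edges) of the subgraph induced by all nodes within
    `radius` hops of `center`; edges are traversed in both directions."""
    neighbours = {}
    for src, dst, _ in edges:
        neighbours.setdefault(src, []).append(dst)
        neighbours.setdefault(dst, []).append(src)
    hops = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if hops[node] == radius:
            continue  # do not expand beyond the requested radius
        for nb in neighbours.get(node, []):
            if nb not in hops:
                hops[nb] = hops[node] + 1
                queue.append(nb)
    nodes = set(hops)
    sub_edges = [e for e in edges if e[0] in nodes and e[1] in nodes]
    return nodes, sub_edges


edges = [
    ("sneakers_sales", "store_dim", "references"),
    ("sneakers_sales", "sales_report", "lineage"),
    ("store_dim", "region_dim", "references"),
]
nodes, sub_edges = build_centered_graph(edges, "sneakers_sales")
```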
  • the management system 200 can present the data asset graph 410 corresponding to the intended asset 408 to the user on the graph display interface 400.
  • the management system 200 can also provide a filtering function to support conditional filtering of the intended asset list, thereby narrowing the scope and helping users select intended assets faster.
  • the user can perform conditional filtering on the intended asset list according to one or more of asset type, storage time, and update time, which helps the user accurately select the intended asset 408.
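Conditional filtering of the intended asset list might look like the sketch below. It assumes each row carries hypothetical `type` and `updated` fields, with the update time as a `datetime.date`. The criteria mirror the asset type and update time mentioned above, and any criterion left as `None` is ignored.

```python
from datetime import date


def filter_intended_assets(rows, asset_type=None, updated_after=None):
    """Keep rows that satisfy every criterion that is set (None means 'no filter')."""
    kept = []
    for row in rows:
        if asset_type is not None and row["type"] != asset_type:
            continue
        if updated_after is not None and row["updated"] < updated_after:
            continue
        kept.append(row)
    return kept


rows = [
    {"id": "t1", "type": "table", "updated": date(2022, 5, 1)},
    {"id": "t2", "type": "table", "updated": date(2021, 12, 1)},
    {"id": "r1", "type": "report", "updated": date(2022, 6, 1)},
]
recent_tables = filter_intended_assets(rows, asset_type="table", updated_after=date(2022, 1, 1))
```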
  • when the user selects the intended asset 408 in the intended asset list (for example, through a single-choice selection), the management system 200 can automatically draw the corresponding data asset graph 410 on the right side of the interface.
  • the management system 200 can also display the meta-information 411 of the data asset graph in the graph display interface 400.
  • the meta-information 411 of the data asset graph includes one or more of the node count and the edge count of the data asset graph.
  • the management system 200 can also display the meta-information 412 of the root node of the data asset graph in the information column of the graph display interface 400.
  • the meta-information 412 of the root node may include the node name, node type, or node ID of the root node.
  • S302 to S308 are one specific implementation by which the management system 200 obtains the data asset map in the embodiments of the present application.
  • when executing the data asset map management method of the embodiments of the present application, the management system 200 may also obtain the data asset map in other ways.
  • the management system 200 can store the data asset map when generating it for the first time. When the user subsequently inputs a keyword, the management system 200 can directly obtain the data asset map corresponding to the keyword from the stored data asset map.
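The store-once, reuse-later behaviour can be sketched as a small memoizing cache keyed by keyword. `GraphCache` and the `build_graph` callback are hypothetical. A production system would also need an invalidation policy for when the underlying assets change, which this sketch omits.

```python
class GraphCache:
    """Builds a data asset graph on first request and returns the stored copy afterwards."""

    def __init__(self, build_graph):
        self._build_graph = build_graph  # hypothetical callback: keyword -> graph
        self._store = {}

    def get(self, keyword):
        if keyword not in self._store:
            self._store[keyword] = self._build_graph(keyword)  # first request: generate and store
        return self._store[keyword]  # later requests: reuse the stored graph


build_calls = []


def build_graph(keyword):
    build_calls.append(keyword)  # record how often the expensive build actually runs
    return {"center": keyword}


cache = GraphCache(build_graph)
first = cache.get("sneakers")
second = cache.get("sneakers")
```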
  • S310 The management system 200 obtains the structural characteristics of the data asset map.
  • Structural characteristics are properties of a geometric figure or space that remain unchanged when its shape is continuously deformed. Structural features can be used as influencing factors to determine the structural importance of nodes and edges in graphs or networks (such as data asset graphs), thereby realizing intelligent interactive navigation of data asset graphs.
  • centrality can usually be used to represent the structural characteristics of the graph/network.
  • Centrality refers to the degree to which a node/edge plays a central role in the network. The greater the centrality, the more important the node/edge is.
  • commonly used centrality measures include degree centrality, betweenness centrality, and closeness centrality, any of which can be used to measure the structural characteristics of the data asset graph and thereby realize intelligent interactive navigation of the data asset graph. Taking degree centrality as an example, the details are as follows:
  • degree centrality determines the importance of a node in the network by measuring its degree. The higher a node's degree, the more nodes it can directly influence, the greater its influence, and the higher its structural importance in the data asset graph.
  • the management system 200 can sort the nodes in the current data asset graph based on degree centrality as a score, and recommend the top n1 nodes to the user.
  • the first n1 nodes recommended to users are also called recommended exploration starting nodes.
  • n1 is a custom parameter that can be configured by the user.
  • the management system 200 can traverse all potential extended edges of the node selected by the user, use degree centrality to calculate the importance of each potential extended edge of the corresponding data asset, rank the potential extended edges using the importance as a score, and recommend the top n2 potential extended edges to the user. The top n2 edges recommended to the user are also called recommended edges. Similarly, n2 is a custom parameter that can be configured by the user.
  • the management system 200 can also use degree centrality to calculate the importance of all nodes on the paths starting from the node selected by the user (which represents a data asset), rank these nodes using the importance as a score, and recommend the top n3 nodes to the user. The top n3 nodes recommended to the user are also called recommended exploration target nodes. n3 is a custom parameter that can be configured by the user.
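The three degree-centrality recommendations can be sketched as below, under stated assumptions. Degree centrality is normalized as deg(v)/(n-1) over an undirected view of the graph. A potential extended edge is scored by the centrality of the node it leads to, which is one plausible reading of scoring an edge by degree centrality. "Nodes on the paths starting from the selected node" is taken to mean nodes reachable along directed edges. Ties are broken by name. All function names are hypothetical.

```python
from collections import Counter, deque


def degree_centrality(edges):
    """Normalized degree centrality deg(v) / (n - 1), treating edges as undirected."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) if n > 1 else 0.0 for node, d in degree.items()}


def top_k(scores, k):
    """Keys of the k highest scores; ties broken by key for a deterministic order."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [key for key, _ in ranked[:k]]


def recommend_start_nodes(edges, n1):
    """Recommended exploration starting nodes: the top n1 by degree centrality."""
    return top_k(degree_centrality(edges), n1)


def recommend_edges(full_edges, shown_edges, selected, n2):
    """Recommended edges: potential extended edges of `selected` (in the full
    catalog graph but not yet shown), scored by the centrality of the other endpoint."""
    centrality = degree_centrality(full_edges)
    shown = set(shown_edges)
    candidates = {}
    for u, v in full_edges:
        if (u, v) in shown or selected not in (u, v):
            continue
        other = v if u == selected else u
        candidates[(u, v)] = centrality[other]
    return top_k(candidates, n2)


def recommend_targets(edges, selected, n3):
    """Recommended exploration target nodes: nodes reachable from `selected`
    along directed edges, ranked by degree centrality."""
    centrality = degree_centrality(edges)
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    reached, queue = {selected}, deque([selected])
    while queue:
        for nxt in successors.get(queue.popleft(), []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    reached.discard(selected)
    return top_k({node: centrality[node] for node in reached}, n3)
```

Libraries such as NetworkX provide the same normalized measure; the hand-rolled version keeps the sketch dependency-free.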
  • S312 The management system 200 obtains the business characteristics of the data asset map.
  • the data asset map records the data assets included in an enterprise or organization and the relationships between data assets.
  • the data asset graph represents business activities in an enterprise or organization through nodes (data assets) and edges (relationships between data assets), and therefore reflects strong business relevance. Business characteristics can thus be used as influencing factors to determine the business importance of nodes and edges in the data asset graph, thereby realizing intelligent interactive exploration of the data asset graph.
  • Business features may include one or more of business weights or semantic features.
  • business weights are quantitative features
  • semantic features are qualitative features.
  • for example, a data asset map may contain thousands of data tables that are all of the same type and have the same business weight. However, because different data tables contain data with different semantic characteristics and represent different business activities, they have different business value.
  • the data asset map is closely related to the enterprise's business. Each data asset and association relationship in the data asset map naturally has business attributes. The larger the business weight value of a data asset/association relationship, the higher its business importance in the data asset map.
  • the management system 200 can comprehensively measure the business weight of the data asset represented by each node in the data asset map together with the business weights of the data assets represented by its associated nodes, rank the nodes in the data asset map using the result as a score, and recommend the top n1 nodes to the user. The top n1 nodes recommended to the user are the recommended exploration starting nodes.
  • the management system 200 can traverse the potential expansion edges of the node selected by the user, set an initial business weight for each edge according to the edge type, and then combine it with the business weight of the data asset at the node connected by the edge to calculate the final business weight of each edge.
  • the management system 200 can rank the potential expansion edges of the node selected by the user by final business weight and recommend the top n2 potential expansion edges (which represent explorable directions) to the user. The top n2 potential expansion edges recommended to the user are the recommended edges.
  • the management system 200 can also use the business weights to calculate the business importance of all nodes on the paths starting from the selected node (which represents a data asset), rank these nodes using the business importance as a score, and recommend the top n3 nodes to the user. The top n3 nodes recommended to the user are the recommended exploration target nodes.
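Under stated assumptions, the business-weight scoring can be sketched as follows. The text says only that a node's score combines its own business weight with those of its associated nodes, and that an edge's final weight combines an initial weight set by edge type with the business weight of the connected node. The specific formulas here (own weight plus `alpha` times the neighbours' sum for nodes, and a product for edges) and all names are hypothetical choices.

```python
def node_business_scores(edges, weights, alpha=0.5):
    """Score = own business weight + alpha * sum of the associated nodes' weights.
    edges: (source, target, relation_type) tuples, treated as undirected."""
    neighbours = {node: set() for node in weights}
    for u, v, _ in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {
        node: weights[node] + alpha * sum(weights[m] for m in neighbours[node])
        for node in weights
    }


def edge_business_scores(candidate_edges, selected, weights, type_weight):
    """Final edge weight = initial weight of the edge's type * business weight
    of the node at the other end of the edge from `selected`."""
    scores = {}
    for u, v, rel in candidate_edges:
        other = v if u == selected else u
        scores[(u, v, rel)] = type_weight[rel] * weights[other]
    return scores


def rank(scores, k):
    """Top-k keys by descending score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```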
  • S314 The management system 200 obtains the user's historical experience with the data asset map.
  • the historical experience of the data asset map refers to users' interactive exploration experience with the data asset map in historical time periods. Users of data asset maps usually have strong domain background knowledge, so their interactive exploration experience offers useful reference and guidance. Therefore, historical experience can be used as an influencing factor to judge the empirical importance of nodes and edges in the data asset graph, thereby realizing intelligent interactive exploration of the data asset graph.
  • the management system 200 can obtain statistical indicators of users' interactive exploration of the data asset map in historical time periods, and these statistical indicators can serve as the users' historical experience of the data asset map.
  • the statistical indicator may include one or more of click frequency or conditional probability.
  • the click frequency of data assets and association relationships reflects how often users interact with them, and it is positively correlated with their experience importance. For example, if many users access a certain data asset so that its click frequency is high, the experience importance of that data asset is high.
  • Conditional probability refers to the probability of an event occurring given that another event has already occurred. When exploring data asset graphs, users usually select data assets and associations consciously and sequentially. The higher the conditional probability, the higher the experience importance of the data asset/association.
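Conditional probabilities of this kind can be estimated from ordered exploration sessions, as in the sketch below. It assumes each session is a list of the asset IDs a user visited in order, and estimates P(next = b | current = a) as the fraction of moves out of a that went to b. This maximum-likelihood estimate and the session format are assumptions, not details given in the text.

```python
from collections import Counter


def transition_probabilities(sessions):
    """Estimate P(next = b | current = a) from ordered exploration sessions."""
    pair_counts = Counter()
    out_counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):  # consecutive selections
            pair_counts[(a, b)] += 1
            out_counts[a] += 1
    return {(a, b): n / out_counts[a] for (a, b), n in pair_counts.items()}


probs = transition_probabilities([
    ["orders", "sneakers", "inventory"],
    ["orders", "sneakers"],
    ["orders", "logs"],
])
```

Here two of the three moves out of `orders` went to `sneakers`, so that transition gets probability 2/3 and would rank above `logs`.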
  • data asset graph exploration requires users to have certain professional domain knowledge, which means that each user's exploration experience is valuable and can provide suggestions for latecomers.
  • the management system 200 can calculate the click frequency of nodes (represented data assets) in the current data asset map, sort the nodes in the current data asset map using the click frequency as a score, and recommend the top n1 nodes to the user.
  • the first n1 nodes recommended to the user are the recommended exploration starting nodes.
  • the management system 200 can also traverse the potential extended edges of the node selected by the user, comprehensively calculate the click frequency of each extended relationship and of its associated node, rank the potential extended edges of the selected node using the click frequency as a score, and recommend the top n2 potential extended edges to the user. The top n2 potential extended edges recommended to the user are the recommended edges.
  • the management system 200 can also calculate the click frequency of all nodes (each representing a data asset) on the paths starting from the selected node, rank these nodes using the click frequency as a score, and recommend the top n3 nodes to the user. These top n3 nodes are the recommended exploration target nodes.
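A minimal click-frequency sketch follows. It assumes a hypothetical click log that records node IDs as strings and edges as `(source, target)` tuples, and assumes an edge's score is simply the sum of its own clicks and its endpoint's clicks; the text says only that both are combined. Function names are hypothetical.

```python
from collections import Counter


def click_frequencies(click_log):
    """Count clicks; items are node IDs (str) or edges as (source, target) tuples."""
    return Counter(click_log)


def top_clicked_nodes(clicks, n1):
    """Recommended exploration starting nodes: the n1 most-clicked nodes (ties by name)."""
    nodes = {item: c for item, c in clicks.items() if isinstance(item, str)}
    return sorted(nodes, key=lambda x: (-nodes[x], x))[:n1]


def score_extension_edges(candidate_edges, selected, clicks):
    """Edge score = clicks on the relationship + clicks on the node it leads to.
    Counter returns 0 for items that were never clicked."""
    scores = {}
    for u, v in candidate_edges:
        other = v if u == selected else u
        scores[(u, v)] = clicks[(u, v)] + clicks[other]
    return scores
```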
  • the management system 200 determines the recommended exploration information based on one or more of the structural characteristics, business characteristics, or historical experience of the data asset map.
  • the above-mentioned structural characteristics, business characteristics or historical experience respectively measure the importance of nodes or edges in the data asset graph from different dimensions.
  • the management system 200 can comprehensively combine the features of these different dimensions to perform node recommendation and/or edge recommendation, thereby determining the recommended exploration information.
  • the management system 200 may also perform node recommendation and/or edge recommendation based on the characteristics of a single dimension, thereby determining recommended exploration information.
  • node recommendation may include starting node recommendation and target node recommendation.
  • the recommended exploration information may include one or more of recommended exploration starting nodes, recommended edges (which may also be called recommended exploration directions), and recommended exploration destination nodes.
  • for example, the management system 200 can set recommendation weights for the features of different dimensions, obtain the recommendation score of each node or edge through a weighted operation, and perform node recommendation or edge recommendation based on these recommendation scores.
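The weighted combination can be sketched as below. Because structural, business, and experience scores live on different scales, this sketch min-max normalizes each factor to [0, 1] before weighting; that normalization, the factor names, and the weights are assumptions for illustration. Passing a single factor with weight 1.0 reproduces the single-dimension case.

```python
def min_max(scores):
    """Rescale raw scores to [0, 1]; a constant factor contributes 0 everywhere."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (s - lo) / (hi - lo) for k, s in scores.items()}


def combined_scores(factor_scores, factor_weights):
    """factor_scores: {factor: {item: raw score}}; factor_weights: {factor: weight}.
    Returns the weighted sum of the normalized scores for each item."""
    total = {}
    for factor, weight in factor_weights.items():
        for item, s in min_max(factor_scores[factor]).items():
            total[item] = total.get(item, 0.0) + weight * s
    return total


factors = {
    "structural": {"A": 3, "B": 1, "C": 2},
    "business": {"A": 0.1, "B": 0.9, "C": 0.5},
}
total = combined_scores(factors, {"structural": 0.7, "business": 0.3})
```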
  • S318 The management system 200 presents recommended exploration information to the user.
  • the management system 200 can present the recommended exploration information to the user through the graph display interface 400.
  • the management system 200 can superimpose the recommended exploration information on the data asset graph, and then present the data asset graph and the recommended exploration information to the user through the graph display interface.
  • the management system 200 of this application can realize intelligent interactive navigation for data asset maps.
  • the management system 200 provides three types of atomic interactive navigation (N1, N2, N3) and three types of impact factors (A1, A2, A3).
  • atomic interactive navigation refers to indivisible interactive navigation.
  • N1 represents important asset navigation (implemented by presenting recommended exploration starting nodes)
  • N2 represents important exploration direction navigation (implemented by presenting recommended edges)
  • N3 represents important target asset navigation (implemented by presenting recommended exploration target nodes).
  • A1 represents structural characteristics
  • A2 represents business characteristics
  • A3 represents historical experience.
  • the atomic interactive navigations can be flexibly combined into 7 forms (N1, N2, N3, N1N2, N2N3, N1N3, N1N2N3), and the impact factors can likewise be combined into 7 forms (A1, A2, A3, A1A2, A2A3, A1A3, A1A2A3). In this way, the management system 200 in the embodiments of the present application can cover 7 × 7 = 49 intelligent interactive navigation forms for the data asset map.
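The 7 × 7 count follows from taking every non-empty subset of the three navigations and of the three impact factors, since a set of three items has 2^3 - 1 = 7 non-empty subsets. The short enumeration below checks this; the labels are the ones defined above, and the helper name is hypothetical.

```python
from itertools import combinations, product

NAVIGATIONS = ("N1", "N2", "N3")  # starting-node, edge, and target-node navigation
FACTORS = ("A1", "A2", "A3")      # structural, business, historical experience


def non_empty_subsets(items):
    """All non-empty combinations, smallest first: 2**len(items) - 1 of them."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]


navigation_forms = [
    ("".join(nav), "".join(fac))
    for nav, fac in product(non_empty_subsets(NAVIGATIONS), non_empty_subsets(FACTORS))
]
```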
  • the data asset map management method of the embodiments of the present application obtains recommended exploration information based on the data asset map and uses it to guide users in interactively exploring the data asset map, which reduces blind interaction and improves the analysis efficiency and exploration experience of the data asset map. Furthermore, the method provides a variety of atomic interactive navigations that cover users' basic core needs in data asset exploration. It also defines a variety of influencing factors that evaluate the importance of nodes or edges from different dimensions, making the navigation (recommendation) results more reasonable. Moreover, the method supports flexible combinations of atomic interactive navigations and influencing factors, yielding a rich set of data asset map management modes that can adapt to changing data asset map exploration scenarios.
  • the management system 200 also supports the user to choose whether to enable the intelligent interactive navigation function.
  • the management system 200 can execute the management method of the data asset map as shown in FIG. 3.
  • the intelligent interactive navigation function includes a variety of atomic interactive navigations.
  • the management system 200 can support users to flexibly configure atomic interactive navigations.
  • the user can use any one or any combination of the atomic interactive navigations. For example, a user can combine important asset navigation and exploration direction navigation to explore and analyze the data asset map over multiple interactions; as another example, a user can combine important asset navigation and important target asset navigation to explore and analyze the data asset map with only a few interactions.
  • the method includes:
  • Step 1 The management system 200 receives keywords input by the user, obtains a list of intended assets based on the keywords, and generates a data asset map corresponding to the intended assets based on the intended assets selected by the user from the intended asset list.
  • the management system 200 may present the intended asset list to the user. Users can click the intended asset with the left mouse button to trigger the one-click generation of a data asset map.
  • the data asset map is centered on the intended asset selected by the user, including the intended asset, the association between the intended asset and other data assets, and other data assets associated with the intended asset.
  • Step 2 The management system 200 presents the data asset graph to the user through the graph display interface.
  • the recommended exploration starting node is displayed on the graph display interface.
  • the map display interface 400 carries a recommendation control 413.
  • when the recommendation control 413 is triggered, the management system 200 can start the important asset navigation function. Specifically, the management system 200 evaluates the data assets represented by the nodes in the data asset map 410, identifies the nodes corresponding to important data assets, and displays these nodes in the map display interface 400 as recommended exploration starting nodes.
  • the management system 200 can add a recommendation mark 414 to the node corresponding to the important data asset in the graph display interface 400, marking the node as the recommended exploration starting node to remind the user that such data assets can be explored with priority.
  • the recommendation mark 414 may be a "red exclamation point", and the management system 200 may uniformly add the recommendation mark 414 in the upper right corner of the node corresponding to the important data asset.
  • the management system 200 can also set the default state of the recommendation control 413.
  • the default state can be set to the triggered state.
  • in this case, the management system 200 can directly present the data asset graph 410 to the user in the map display interface together with the recommendation marks 414 of the recommended exploration starting nodes, thereby displaying the recommended exploration starting nodes to the user.
  • Step 3 The user selects the first node corresponding to an important data asset to open the menu bar, selects edge expansion from the menu bar, and then selects "Get recommended edges" from the secondary menu to trigger the edge recommendation operation; the management system 200 displays to the user the recommended edges related to the selected first node.
  • guided by the recommended exploration information on the graph display interface 400 (such as the recommended exploration starting nodes identified by the recommendation mark 414), the user can select a node corresponding to a data asset (referred to in this embodiment as the first node for ease of description) and proceed to the next step of exploration.
  • the user can right-click the first node corresponding to the data asset, and the management system 200 pops up the menu bar 416.
  • a data asset's relationships can be extended in many ways. If the user does not want to expand blindly by hand, the user can select "Get recommended edges" in the menu bar. When this item is clicked, the management system 200 automatically obtains the potential extended relationships of the data asset corresponding to the first node and scores each of them. The management system 200 can sort the potential extended relationships by score and display them on the map display interface 400 in descending order, as shown in FIG. 6C.
  • based on these scores, the management system 200 can determine the edges corresponding to the potential extended relationships that meet the conditions as recommended edges and add a recommendation icon 418 to them, thereby displaying the recommended edges to the user on the graph display interface 400.
  • the potential expansion relationship that meets the conditions can be that the score is greater than the preset score, or the score ranks among the top m, and m is a positive integer.
  • the preset score or m can be set based on experience value, for example, the preset score can be set to 85.
  • all relationships of the data asset are scored, sorted in descending order, and displayed on the interface. Based on the scores and relationship types in the list, the user can select highly rated relationships (for example, those scoring above 85, which the system can automatically mark with the "recommended" icon) for expansion. Because important exploration direction navigation is used, the data asset map does not accumulate large amounts of redundant information, and the user can efficiently find directions worth exploring from the starting data asset.
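The ranking-and-marking rule described above might be sketched as follows: relations are sorted by score in descending order, and a relation is flagged for the "recommended" icon when its score exceeds a preset score or when it ranks in the top m. The function name and the relation labels in the example are hypothetical; 85 is the example threshold given in the text.

```python
def mark_recommended(scored_relations, preset_score=None, top_m=None):
    """Return (relation, score, recommended) rows sorted by descending score.
    A relation is recommended if score > preset_score or its rank is within top_m."""
    ranked = sorted(scored_relations.items(), key=lambda kv: -kv[1])
    rows = []
    for rank, (relation, score) in enumerate(ranked):
        recommended = (preset_score is not None and score > preset_score) or (
            top_m is not None and rank < top_m
        )
        rows.append((relation, score, recommended))
    return rows


scores = {"lineage": 92, "parent-child": 87, "references": 60}
rows = mark_recommended(scores, preset_score=85)
```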
  • Step 4 The management system 200 updates the data asset map 410 according to the recommended edges selected by the user.
  • specifically, the management system 200 draws, in the data asset map 410, the edges representing the relationships selected by the user and the nodes connected by those edges, thereby updating the data asset map 410.
  • Figure 6D shows the updated data asset graph 410.
  • the updated data asset graph 410 includes the extended recommended edge selected by the user and the nodes connected by the edge.
  • Figure 6D illustrates an example in which the nodes connected by the recommended edges are nodes that do not appear in the data asset graph 410.
  • in other possible implementations, the nodes connected by the recommended edges can also be nodes that already appear in the data asset graph 410; the embodiments of the present application do not limit this.
  • the recommended links selected by the user are a kind of feedback from the user on the recommended exploration information.
  • the management system 200 updating the data asset map 410 based on the recommended edges selected by the user is one implementation of updating the data asset graph 410 based on the user's feedback on the recommended exploration information. In other possible implementations of the embodiments of the present application, the management system 200 can also update the data asset graph 410 based on other feedback.
  • for example, the recommended edges displayed by the management system 200 may not include the edge that the user wants to expand.
  • in that case, the user can customize extended edges to manually expand the associations of the data asset. Specifically, the user can right-click to select the first node corresponding to the data asset, and the management system 200 displays the menu bar 416 again. Referring to Figure 6E, the user can select "Customized Extended Edges" in the menu bar; when this item is clicked, the management system 200 pops up its sub-menu.
  • the sub-menu lists different relationship types, and the user can select one relationship type, or select all types, to customize the extended edges.
  • the graph display interface 400 displays at least one connected edge 420 with the first node selected by the user as the starting node and the relationship type being "parent-child relationship". It can be seen that the management system 200 can also update the data asset map 410 according to the user's rejection of the recommended exploration information.
  • Step 5 The user selects the second node corresponding to the important data asset to open the menu bar, and then selects "Recommended Exploration Target" from the menu bar to trigger the target recommendation operation, and the management system 200 displays the recommended exploration target node to the user.
  • the user selects a node corresponding to an important data asset (referred to in this embodiment as the second node for ease of description) to open the menu bar, and then, as shown in Figure 6G, selects "Recommended Exploration Target". The management system 200 can then obtain the importance of the nodes on the paths that start from the second node, and use the importance as a score to determine recommended exploration target nodes from among those nodes.
  • the management system 200 can present these nodes to the user in descending order of score and determine the nodes whose scores meet the conditions as recommended exploration target nodes. As shown in Figure 6H, the management system 200 can add a recommendation mark 422 to these nodes on the graph display interface 400, thereby displaying the recommended exploration target nodes to the user.
  • Step 6 The user selects the third node from the recommended exploration target node, and the management system 200 displays the path from the second node to the third node to the user.
  • the user can select a node from the recommended exploration target nodes (referred to in this embodiment as the third node for ease of description), and the management system 200 can then automatically draw the exploration path that starts from the second node and ends at the third node, as shown in FIG. 6I.
  • the management system 200 can display the exploration path 424 on the graph display interface 400.
  • the management system 200 may display the exploration path 424 in a highlighted manner to guide the user to quickly obtain the desired target asset.
  • alternatively, the user can select a node other than the recommended exploration target nodes as the target node for exploration.
  • in that case, the management system 200 automatically draws an exploration path from the second node to the target node selected by the user and then, referring to Figure 6K, displays the exploration path 424 on the graph display interface 400.
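The text does not say which route is drawn when several connect the two nodes. A reasonable assumption, used in this sketch, is the path with the fewest hops, found by breadth-first search over directed `(source, target)` edges. The function name is hypothetical.

```python
from collections import deque


def exploration_path(edges, start, target):
    """Fewest-hop path from start to target along directed edges, or None if unreachable."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the start via parent links
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

The returned node list is what the interface would highlight as the exploration path.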
  • the user's selection or rejection of the recommended exploration target node also belongs to the user's feedback on the recommended exploration information.
  • the management system 200 can update the data asset map according to the user's selection or rejection of the recommended exploration target node.
  • the user's feedback on the recommended exploration information can be used to update the recommendation parameters of the recommendation algorithm, so that the management system 200 can determine recommended exploration information that better matches the user's intention and improve recommendation accuracy. For example, when the user rejects a recommended edge or a recommended exploration target node, selects a custom extended edge, or selects another exploration target, the management system 200 can adjust the recommendation weights of influencing factors such as structural characteristics, business characteristics, or historical experience to improve recommendation accuracy.
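One simple way to turn this feedback into updated recommendation weights is a multiplicative update, sketched below. This is a swapped-in illustrative rule, not a method stated in the text. Each factor's weight is scaled by exp(±lr·s), where s is that factor's normalized score for the item the user accepted (+) or rejected (-), and the weights are then renormalized to sum to 1. The function name and the learning rate `lr` are hypothetical.

```python
import math


def update_factor_weights(weights, factor_scores, accepted, lr=0.5):
    """Boost factors that scored an accepted item highly, damp factors that
    scored a rejected item highly, then renormalize so the weights sum to 1.
    factor_scores: {factor: normalized score in [0, 1] that factor gave the item}."""
    sign = 1.0 if accepted else -1.0
    scaled = {
        f: w * math.exp(sign * lr * factor_scores.get(f, 0.0))
        for f, w in weights.items()
    }
    total = sum(scaled.values())
    return {f: w / total for f, w in scaled.items()}


weights = {"structural": 1 / 3, "business": 1 / 3, "experience": 1 / 3}
# The user rejected an edge that the structural factor had ranked highly.
after = update_factor_weights(
    weights, {"structural": 1.0, "business": 0.0, "experience": 0.2}, accepted=False
)
```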
  • the second node selected by the user and the first node may be the same node or different nodes.
  • for the same node, the user can choose to perform steps 3 and 4 to enable exploration direction navigation, perform steps 5 and 6 to enable important target asset navigation, or perform steps 3 to 6 to enable both exploration direction navigation and important target asset navigation.
  • This method proposes an intelligent interactive navigation framework for data asset graphs. It addresses users' core needs in data asset graph exploration, such as exploring important data assets, important exploration directions, and important target data assets. To meet these needs, it provides multiple atomic interactive navigation functions that enable intelligent interactive navigation of data asset graphs, reduce users' blind choices, and improve the efficiency with which users explore and analyze data asset graphs.
  • the embodiments of the present application further provide the data asset graph management system 200 described above.
  • the data asset graph management system 200 is described below with reference to the accompanying drawings.
  • the system 200 includes:
  • the acquisition module 202 is configured to obtain the data asset graph;
  • the recommendation module 204 is configured to determine recommended exploration information for the data asset graph according to the data asset graph;
  • the interaction module 206 is used to present the recommended exploration information to the user, and the recommended exploration information is used to guide the user to explore the data asset graph.
  • the above-mentioned acquisition module 202, recommendation module 204 and interaction module 206 can be implemented through hardware modules or through software modules.
  • the acquisition module 202, the recommendation module 204, and the interaction module 206 may be application programs or application program modules running on a computing device or a cluster of computing devices.
  • the acquisition module 202 may include at least one computing device, such as a server.
  • the acquisition module 202 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the recommendation module 204 may include at least one computing device, or a device implemented using ASIC or PLD.
  • the interaction module 206 may include a communication interface device, such as a display.
  • the recommended exploration information includes any one or more of the following:
  • the interaction module 206 is specifically used to:
  • the recommended exploration target node is displayed to the user, and the recommended exploration target node is determined based on the scores of the nodes passed by the path where the second node selected by the user is the starting node.
  • the interaction module 206 is also used to:
  • a path from the second node to the third node is displayed to the user, where the third node is a node selected by the user from the recommended exploration target nodes.
  • the recommendation module 204 is specifically used to:
  • recommended exploration information for the data asset graph is determined.
  • the influencing factors include structural characteristics, business characteristics, or users' historical experience with the data asset graph.
  • the structural features include centrality
  • the business features include one or more of business weights or semantic features
  • the historical experience includes one or more of click frequency or conditional probability.
  • the interaction module 206 is also used to:
  • the system 200 also includes:
  • the update module 208 is configured to update recommendation parameters according to the user's feedback on the recommended exploration information.
  • the user's feedback on the recommended exploration information includes the user's selection or rejection of the recommended exploration information.
  • the interaction module 206 is also used to:
  • the acquisition module 202 is specifically used for:
  • the interactive module 206 is specifically used for:
  • the acquisition module 202 is specifically used for:
  • a data asset graph corresponding to the intended asset is generated.
  • the interaction module 206 is also used to:
  • the system 200 also includes:
  • the update module 208 is configured to update the data asset graph according to the extended edge connection.
  • update module 208 can be implemented by software or hardware.
  • update module 208 may be an application or application module running on a computing device or cluster of computing devices.
  • update module 208 may include at least one computing device, such as a server.
  • the update module 208 may also be a device implemented using ASIC or PLD.
  • computing device 700 includes: bus 702, processor 704, memory 706, and communication interface 708.
  • the processor 704, the memory 706 and the communication interface 708 communicate through the bus 702.
  • Computing device 700 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 700.
  • the bus 702 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 7, but it does not mean that there is only one bus or one type of bus.
  • the bus 702 may include a path that carries information between the components of the computing device 700 (e.g., the memory 706, the processor 704, and the communication interface 708).
  • the processor 704 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • Memory 706 may include volatile memory, such as random access memory (RAM).
  • the memory 706 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 706 stores executable program code, and the processor 704 executes the executable program code to implement the aforementioned data asset graph management method. Specifically, the memory 706 stores instructions for the data asset map management system 200 to execute the data asset map management method.
  • the communication interface 708 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 700 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
  • the computing device cluster includes at least one computing device 700 .
  • the memory 706 in one or more computing devices 700 in the computing device cluster may store the same instructions of the data asset graph management system 200 for executing the data asset graph management method.
  • one or more computing devices 700 in the computing device cluster may also be used to execute some of the instructions of the data asset graph management system 200 for executing the data asset graph management method.
  • a combination of one or more computing devices 700 may jointly execute instructions of the data asset graph management system for performing the data asset graph management method.
  • the memory 706 in different computing devices 700 in the computing device cluster can store different instructions for executing some functions of the management system of the data asset graph.
  • Figure 9 shows a possible implementation. As shown in Figure 9, two computing devices 700A and 700B are connected through a communication interface 708. Instructions for performing the functions of acquisition module 202 and interaction module 206 are stored on memory in computing device 700A. Stored on memory in computing device 700B are instructions for performing the functions of recommendation module 204 . Further, memory in computing device 700B may also store instructions for performing the functions of update module 208 . In other words, the memories 706 of the computing devices 700A and 700B jointly store instructions used by the management system of the data asset graph to execute the management method of the data asset graph.
  • the connection manner between the computing devices in the cluster shown in FIG. 9 takes into account that the data asset graph management method provided in this application requires interaction with the data asset graph and the recommended exploration information. Therefore, the functions implemented by the acquisition module 202 and the interaction module 206 are performed by computing device 700A, and the functions implemented by the recommendation module 204 and the update module 208 are performed by computing device 700B.
  • the functions of computing device 700A shown in FIG. 9 may also be performed by multiple computing devices 700.
  • the functions of computing device 700B may also be performed by multiple computing devices 700 .
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • Figure 10 shows a possible implementation. As shown in Figure 10, two computing devices 700C and 700D are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device.
  • instructions for performing the functions of the acquisition module 202 and the interaction module 206 are stored in the memory 706 of the computing device 700C.
  • instructions for performing the functions of the recommendation module 204 and the update module 208 are stored in the memory 706 of the computing device 700D.
  • the connection manner between the computing devices in the cluster shown in FIG. 10 likewise takes into account that the data asset graph management method provided in this application requires interaction with the data asset graph and the recommended exploration information. Therefore, the functions implemented by the acquisition module 202 and the interaction module 206 are performed by computing device 700C, and the functions implemented by the recommendation module 204 and the update module 208 are performed by computing device 700D. It should be understood that the functions of computing device 700C shown in FIG. 10 may also be performed by multiple computing devices 700. Likewise, the functions of computing device 700D may also be performed by multiple computing devices 700.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can access, or a data storage device, such as a data center, that contains one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive).
  • the computer-readable storage medium includes instructions that instruct a computing device to perform the data asset graph management method described above.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
  • when the computer program product runs on at least one computing device, the at least one computing device is caused to perform the data asset graph management method described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

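This application provides a data asset graph management method. The method includes: obtaining a data asset graph; determining, based on the data asset graph, recommended exploration information for the data asset graph; and presenting the recommended exploration information to a user to guide the user in exploring the data asset graph. The method helps the user avoid repeating exploration steps already taken, which reduces repeated or ineffective interactions. By guiding the user to select suitable nodes or edges for exploration and analysis, it also shortens exploration and reduces the redundant information that exploration generates, making exploration and analysis more efficient.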

Description

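Data Asset Graph Management Method and Related Device
This application claims priority to Chinese Patent Application No. 202210562551.1, filed with the China National Intellectual Property Administration on May 23, 2022 and entitled "Intelligent Interactive Navigation Method, Apparatus, Server and Storage Medium". It also claims priority to Chinese Patent Application No. 202210800256.5, filed with the China National Intellectual Property Administration on July 8, 2022 and entitled "Data Asset Graph Management Method and Related Device". Both applications are incorporated herein by reference in their entireties.
Technical Field
This application relates to the computer field, and in particular, to a data asset graph management method and system, a computing device cluster, a computer-readable storage medium, and a computer program product.
Background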
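With the continuous development of computer technology, digital assets of various forms have emerged. A digital asset here is a data asset owned or controlled by an entity (such as an enterprise or an organization) that is expected to bring future benefits and is recorded physically or electronically. The data asset may be, for example, documentation or electronic data.
Various relationships may exist between different asset entities in data assets. Taking asset entities as nodes and the associations between asset entities as edges yields a graph structure called a data asset graph. A data asset graph can be displayed visually and lets users explore and analyze data assets based on the relationships between them. During exploration and analysis, asset entities and the associations between them are the main objects of analysis. Using the interactive functions of the data asset graph, a user can keep switching perspectives (for example, analysis dimensions) and focus points (for example, analysis objects). This lets the user analyze in depth the data assets of interest and how their relationships evolve.
However, this approach relies entirely on manual exploration. Faced with a large number of nodes and edges, users find it difficult to choose suitable nodes or edges to explore and analyze. Moreover, when a user selects an unsuitable node or edge, exploration can become time-consuming and produce a large amount of redundant information. That redundant information makes useful information harder to find and seriously reduces analysis efficiency.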
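Summary
This application provides a data asset graph management method. The method determines recommended exploration information and presents it to the user to guide the user in exploring the data asset graph. This avoids blind click-through interaction, shortens exploration, reduces the redundant information generated during exploration, and makes exploration and analysis more efficient. This application further provides a management system, a computing device cluster, a computer-readable storage medium, and a computer program product corresponding to the method.
According to a first aspect, this application provides a data asset graph management method. The method may be performed by a data asset graph management system, hereinafter also called the management system for ease of description. The management system may be a software system deployed in a computing device cluster. The computing device cluster executes the program code of the software system to perform the data asset graph management method of the embodiments of this application. In some embodiments, the management system may instead be a hardware system with a data asset graph management function, which performs the data asset graph management method of the embodiments of this application when running. For example, the management system may be a computing device cluster with a data asset graph management function.
Specifically, the management system may obtain a data asset graph and determine recommended exploration information for it based on the graph. The recommended exploration information may be one or more of recommended exploration start nodes, recommended exploration edges, and recommended exploration target nodes. The management system then presents the recommended exploration information to the user to guide the user in exploring the data asset graph.
In this method, the recommended exploration information guides the user to suitable nodes or edges for exploration and analysis, for example nodes corresponding to important data assets or edges corresponding to important associations. This shortens exploration, reduces the redundant information generated during exploration, and makes exploration and analysis more efficient. In addition, the method helps the user avoid repeating exploration steps already taken, which reduces repeated or ineffective interactions and further improves analysis efficiency.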
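In some possible implementations, the recommended exploration information includes any one or more of the following:
a recommended exploration start node, a recommended edge, and a recommended exploration target node.
Presenting recommended exploration start nodes to the user solves the "cold start" problem of manual exploration. A "node-edge dual heterogeneous network" contains multiple node types and multiple edge types. Faced with such a network, the user finds it hard to decide which category and which attribute to start analyzing from, which leads to blind filtering and screening.
Presenting recommended exploration edges to the user solves the "cold expansion" problem of manual exploration. After entering the analysis process and finding a node of interest, the user needs to explore some of its associations. Faced with complex associations and no guiding hints, however, the user resorts to blind click-through interaction.
Presenting recommended exploration target nodes removes the need to determine an exploration starting point manually in a large-scale data asset graph and then work through many interactions to reach a target node that meets expectations. This greatly shortens exploration time and improves exploration efficiency.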
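In some possible implementations, the management system may display recommended exploration information in one of the following ways:
when a recommendation control on a graph display interface is triggered, the management system may display the recommended exploration start nodes on the graph display interface;
when the user triggers an edge recommendation operation, the management system may display to the user the recommended edges related to a first node selected by the user;
when the user triggers a target recommendation operation, the management system may display the recommended exploration target nodes to the user. The recommended exploration target nodes are determined from the scores of the nodes on paths that start from a second node selected by the user.
In this method, the management system may provide multiple atomic interactive navigation functions. The user can select one or more of them to meet the requirements of different business scenarios.
In some possible implementations, the management system may further display to the user a path from the second node to a third node, the third node being a node selected by the user from the recommended exploration target nodes. This helps the user quickly find the expected data asset, that is, the data asset of the node the user selected from the recommended exploration target nodes. It also helps the user quickly see how that data asset is associated with other data assets, such as the data asset of the exploration start node.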
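In some possible implementations, the management system may obtain influencing factors of the data asset graph and determine recommended exploration information for the data asset graph based on those influencing factors. The influencing factors can be used to evaluate the importance of nodes or edges, so recommended exploration information determined from them is relatively reliable and has high reference value.
In some possible implementations, the influencing factors include structural features, business features, or the user's historical experience with the data asset graph.
The method evaluates the importance of nodes or edges in the data asset graph comprehensively, along the structural, business, and user-experience dimensions, and therefore achieves relatively high accuracy.
In some possible implementations, the structural features include centrality. The centrality may include, for example, one or more of degree centrality, shortest-path betweenness centrality, random-walk betweenness centrality, PageRank, closeness centrality, harmonic centrality, and eigenvector centrality. The higher the centrality of a node or edge, the greater its structural importance in the data asset graph. The management system can recommend the top-ranked nodes or edges by structural importance, so that the user explores structurally important nodes or edges first and avoids blind interaction.
The business features include one or more of business weights and semantic features. Business weights are quantitative features, and semantic features are qualitative features. Most data asset graphs are typical node-edge dual heterogeneous networks, so each type of data asset or association usually has a different business weight. For example, data tables and jobs are relatively more important and have larger business weights. Data asset graphs are very large and contain many assets of the same type. The semantic information carried by each data asset (represented by semantic features), however, gives these assets different degrees of importance. For example, a data asset graph may contain thousands of data tables of the same type and therefore the same business weight. Different tables contain data with different semantic features and represent different business activities, so they have different business value. Every data asset and association in a data asset graph inherently has business attributes. The larger the business weight of a data asset or association, the higher its business importance in the data asset graph. The management system recommends the top-ranked nodes or edges by business importance, so that the user explores nodes or edges with high business importance first and avoids blind interaction.
Historical experience refers to the user's interactive exploration experience with the data asset graph during a historical period. Users explore data asset graphs based on their own domain knowledge and accumulated experience, so their exploration is strongly biased and subjective. The management system may obtain statistical indicators of the user's interactive exploration of the data asset graph during a historical period and use these indicators as the user's historical experience with the graph. The statistical indicators may include one or more of click frequency and conditional probability. The click frequency of a data asset or association reflects how often users interact with it and is positively correlated with its empirical importance. When exploring a data asset graph, users usually select data assets and associations deliberately and in a particular order. A higher conditional probability therefore indicates higher empirical importance of the data asset or association. The management system recommends the top-ranked nodes or edges by empirical importance, so that the user explores nodes or edges with high empirical importance first and avoids blind interaction.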
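In some possible implementations, the management system may further receive the user's feedback on the recommended exploration information and update recommendation parameters based on that feedback.
By iteratively updating the recommendation parameters based on user feedback, the method keeps the recommendation accuracy of the management system high and provides the user with relatively accurate recommended exploration information.
In some possible implementations, the user's feedback on the recommended exploration information includes the user's selection or rejection of the recommended exploration information.
A selection occurs when the user picks one of several recommended exploration start nodes, one of several recommended edges, or one of several recommended exploration target nodes. A rejection occurs when the user picks a node other than the recommended exploration start nodes, an edge other than the recommended edges, or a node other than the recommended exploration target nodes.
In this method, positive samples come from the user's selections of recommended exploration information and negative samples from the user's rejections. Updating the recommendation parameters with both positive and negative samples reduces overfitting and yields suitable recommendation parameters.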
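In some possible implementations, the management system may receive a keyword entered by the user, obtain an intended asset list based on the keyword, and display the list to the user. When the user selects an intended asset in the list, the management system generates the data asset graph corresponding to that intended asset.
The method supports one-click generation of a data asset graph without complex interaction. This makes data assets easier to manage and improves user experience.
In some possible implementations, the management system may receive an extended edge customized by the user and update the data asset graph based on that extended edge.
The method provides a channel for updating the data asset graph manually. The user can define custom extended edges to update the associations between data assets in the graph, which gives the method high usability.
In some possible implementations, the management system may present extension relationship types to the user. These include one or more of a parent-child relationship, a primary-foreign key relationship, a logical-physical relationship, and a data flow relationship. The management system determines the user-defined extended edge from the target relationship type the user selects among the extension relationship types. The extended edge has the user-selected node as an endpoint, and its relationship type is the target relationship type.
The method offers multiple extension relationship types, so the user can flexibly choose one or more of them and define the corresponding extended edges to meet business needs. The user only needs a few simple clicks to define an extended edge, which makes the method user-friendly.
In some possible implementations, the management system may identify the type of the intended asset. It then obtains first associated assets of the same type as the intended asset and second associated assets of different types. The management system generates the data asset graph from the intended asset, the first associated assets, the second associated assets, and the associations between the intended asset and the first and second associated assets. The resulting data asset graph is a node-edge dual heterogeneous network graph.
Because the method takes both node types and edge types into account, it can explore and analyze homogeneous networks as well as node-edge dual heterogeneous network graphs, with relatively high accuracy. It is therefore applicable to a variety of business scenarios and has high usability.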
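According to a second aspect, this application provides a data asset graph management system. The system includes:
an acquisition module, configured to obtain a data asset graph;
a recommendation module, configured to determine, based on the data asset graph, recommended exploration information for the data asset graph; and
an interaction module, configured to present the recommended exploration information to a user, where the recommended exploration information is used to guide the user in exploring the data asset graph.
In some possible implementations, the recommended exploration information includes any one or more of the following:
a recommended exploration start node, a recommended edge, and a recommended exploration target node.
In some possible implementations, the interaction module is specifically configured to:
when a recommendation control on a graph display interface is triggered, display the recommended exploration start nodes on the graph display interface; or
when the user triggers an edge recommendation operation, display to the user the recommended edges related to a first node selected by the user; or
when the user triggers a target recommendation operation, display the recommended exploration target nodes to the user, where the recommended exploration target nodes are determined from the scores of the nodes on paths that start from a second node selected by the user.
In some possible implementations, the interaction module is further configured to:
display to the user a path from the second node to a third node, where the third node is a node selected by the user from the recommended exploration target nodes.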
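In some possible implementations, the recommendation module is specifically configured to:
obtain influencing factors of the data asset graph; and
determine recommended exploration information for the data asset graph based on the influencing factors.
In some possible implementations, the influencing factors include structural features, business features, or the user's historical experience with the data asset graph.
In some possible implementations, the structural features include centrality, the business features include one or more of business weights and semantic features, and the historical experience includes one or more of click frequency and conditional probability.
In some possible implementations, the interaction module is further configured to:
receive the user's feedback on the recommended exploration information; and
the system further includes:
an update module, configured to update recommendation parameters based on the user's feedback on the recommended exploration information.
In some possible implementations, the user's feedback on the recommended exploration information includes the user's selection or rejection of the recommended exploration information.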
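In some possible implementations, the interaction module is further configured to:
receive a keyword entered by the user;
the acquisition module is specifically configured to:
obtain an intended asset list based on the keyword;
the interaction module is specifically configured to:
display the intended asset list to the user; and
the acquisition module is specifically configured to:
in response to the user's operation of selecting an intended asset in the intended asset list, generate the data asset graph corresponding to the intended asset.
In some possible implementations, the interaction module is further configured to:
receive an extended edge customized by the user; and
the system further includes:
an update module, configured to update the data asset graph based on the extended edge.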
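According to a third aspect, this application provides a computing device cluster. The computing device cluster includes at least one computing device. The at least one computing device includes at least one processor and at least one memory, which communicate with each other. The at least one processor is configured to execute instructions stored in the at least one memory, so that the computing device or computing device cluster performs the data asset graph management method according to the first aspect or any implementation of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium. The storage medium stores instructions that instruct a computing device or computing device cluster to perform the data asset graph management method according to the first aspect or any implementation of the first aspect.
According to a fifth aspect, this application provides a computer program product containing instructions. When the computer program product runs on a computing device or computing device cluster, it causes the computing device or computing device cluster to perform the data asset graph management method according to the first aspect or any implementation of the first aspect.
The implementations provided in the foregoing aspects may be further combined in this application to provide more implementations.
Brief Description of Drawings
To describe the technical methods of the embodiments of this application more clearly, the following briefly introduces the accompanying drawings used in the embodiments.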
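FIG. 1 is an abstract model diagram of a data asset graph according to an embodiment of this application;
FIG. 2 is a schematic architecture diagram of a data asset graph management system according to an embodiment of this application;
FIG. 3 is a flowchart of a data asset graph management method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a graph management interface according to an embodiment of this application;
FIG. 5 is a flowchart of a data asset graph management method according to an embodiment of this application;
FIG. 6A to FIG. 6K are schematic diagrams of a graph management interface according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a computing device cluster according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a computing device cluster according to an embodiment of this application; and
FIG. 10 is a schematic structural diagram of a computing device cluster according to an embodiment of this application.
Detailed Description of Embodiments
The terms "first" and "second" in the embodiments of this application are used for description only. They shall not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features concerned. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features.
First, some technical terms used in the embodiments of this application are introduced.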
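A data asset may include different asset entities, and different asset entities may have different categories of associations. Taking the asset entities of data assets as nodes and the associations between asset entities as edges forms a data asset graph. FIG. 1 shows an abstract model diagram of a data asset graph. This data asset graph defines 10 types of entities and 4 categories of associations, and the 4 categories expand into 18 specific associations.
In FIG. 1, nodes represent asset entities, and one-way edges represent one-way associations. Asset entities may include databases, data tables, directories, jobs, nodes, logical entities, business attributes, fields, column lineage, and insights. Column lineage is column-level data lineage. Data lineage, also called data provenance or data pedigree, is usually defined as a life cycle that mainly covers where data comes from and where it moves over time. The associations fall into 4 categories: the parent-child relationship (denoted parent_child), the data flow relationship (denoted data_flow), the primary key-foreign key relationship (denoted PK_FK), and the logical-physical relationship (denoted logical_physical).
As shown in FIG. 1, these 4 categories expand into the following 18 associations:
parent-child: directory to directory, directory to logical entity, database to data table, job to node, node to node, logical entity to business attribute, data table to field, and node to column lineage;
logical-physical: logical entity to data table, and business attribute to field;
primary-foreign key: data table to data table, and field to field;
data flow: data table to node, node to data table, node to insight, field to column lineage, column lineage to field, and column lineage to insight.
A data asset graph that contains multiple types of nodes and multiple types of edges is a node-edge dual heterogeneous network graph, and the data asset graph shown in FIG. 1 is a typical example. When all nodes of a data asset graph are of the same type, or all of its edges are of the same type, the data asset graph is a homogeneous network graph.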
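The FIG. 1 schema and the homogeneous/heterogeneous rule can be made concrete with a small sketch. It encodes the 18 associations listed above as (source type, relation, target type) triples and applies the classification rule exactly as the text states it. The snake_case type names are illustrative labels chosen for this sketch, not identifiers from the application. `node` stands for the FIG. 1 entity type of that name.

```python
"""Sketch: the FIG. 1 schema as triples, plus the text's graph-classification rule."""

# The 4 association categories named in the text.
PARENT_CHILD = "parent_child"
DATA_FLOW = "data_flow"
PK_FK = "PK_FK"
LOGICAL_PHYSICAL = "logical_physical"

# The 18 entity-type-level associations listed for FIG. 1 (labels are illustrative).
SCHEMA = [
    ("directory", PARENT_CHILD, "directory"),
    ("directory", PARENT_CHILD, "logical_entity"),
    ("database", PARENT_CHILD, "data_table"),
    ("job", PARENT_CHILD, "node"),
    ("logical_entity", LOGICAL_PHYSICAL, "data_table"),
    ("data_table", PK_FK, "data_table"),
    ("data_table", DATA_FLOW, "node"),
    ("node", DATA_FLOW, "data_table"),
    ("node", PARENT_CHILD, "node"),
    ("node", DATA_FLOW, "insight"),
    ("logical_entity", PARENT_CHILD, "business_attribute"),
    ("business_attribute", LOGICAL_PHYSICAL, "field"),
    ("field", PK_FK, "field"),
    ("data_table", PARENT_CHILD, "field"),
    ("field", DATA_FLOW, "column_lineage"),
    ("column_lineage", DATA_FLOW, "field"),
    ("column_lineage", DATA_FLOW, "insight"),
    ("node", PARENT_CHILD, "column_lineage"),
]


def classify_graph(triples):
    """Multiple node types AND multiple edge types -> dual heterogeneous;
    otherwise (one node type or one edge type) -> homogeneous."""
    node_types = {t for s, _, d in triples for t in (s, d)}
    edge_types = {r for _, r, _ in triples}
    if len(node_types) > 1 and len(edge_types) > 1:
        return "dual_heterogeneous"
    return "homogeneous"


entity_types = {t for s, _, d in SCHEMA for t in (s, d)}
print(len(entity_types), len({r for _, r, _ in SCHEMA}), len(SCHEMA))  # 10 4 18
print(classify_graph(SCHEMA))  # dual_heterogeneous
```

Counting the distinct types in the triples reproduces the 10 entity types, 4 categories, and 18 associations stated for FIG. 1.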
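A data asset graph can be displayed visually and lets users explore and analyze data assets based on the relationships between them. Currently, data asset graphs are explored mainly by hand. The user first determines the start node of exploration. For example, the user searches for a keyword, selects a data asset from the resulting intended asset list, and uses the node of that asset's entity as the start node. The display interface of the data asset graph then shows the nodes of asset entities, of all types or of other types, that are associated with that asset entity. The user continues exploring through selection and other interactions to complete the intended task.
However, this approach relies entirely on manual exploration. Faced with a large number of nodes and edges, users find it difficult to choose suitable nodes or edges to explore and analyze. Moreover, when a user selects an unsuitable node or edge, exploration can become time-consuming and produce a large amount of redundant information. That redundant information makes useful information harder to find and seriously reduces analysis efficiency. In addition, users are likely to repeat exploration steps they have already taken. This causes many repeated and ineffective interactions and further reduces analysis efficiency.
In view of this, the embodiments of this application provide a data asset graph management method. The method may be performed by a data asset graph management system, hereinafter also called the management system for ease of description. The management system may be a software system deployed in a computing device cluster. The computing device cluster executes the program code of the software system to perform the data asset graph management method of the embodiments of this application. In some embodiments, the management system may instead be a hardware system with a data asset graph management function, which performs the data asset graph management method of the embodiments of this application when running. For example, the management system may be a computing device cluster with a data asset graph management function.
Specifically, the management system may obtain a data asset graph and determine recommended exploration information for it based on the graph. The recommended exploration information may be one or more of recommended exploration start nodes, recommended exploration edges, and recommended exploration target nodes. The management system then presents the recommended exploration information to the user to guide the user in exploring the data asset graph.
In this method, the recommended exploration information guides the user to suitable nodes or edges for exploration and analysis, for example nodes corresponding to important data assets or edges corresponding to important associations. This shortens exploration, reduces the redundant information generated during exploration, and makes exploration and analysis more efficient. In addition, the method helps the user avoid repeating exploration steps already taken, which reduces repeated or ineffective interactions and further improves analysis efficiency.
Presenting recommended exploration start nodes to the user solves the "cold start" problem of manual exploration. A "node-edge dual heterogeneous network" contains multiple node types and multiple edge types. Faced with such a network, the user finds it hard to decide which category and which attribute to start analyzing from, which leads to blind filtering and screening.
Presenting recommended exploration edges to the user solves the "cold expansion" problem of manual exploration. After entering the analysis process and finding a node of interest, the user needs to explore some of its associations. Faced with complex associations and no guiding hints, however, the user resorts to blind click-through interaction.
Presenting recommended exploration target nodes removes the need to determine an exploration starting point manually in a large-scale data asset graph and then work through many interactions to reach a target node that meets expectations. This greatly shortens exploration time and improves exploration efficiency. For example, suppose an enterprise's data asset graph administrator receives a task from business staff: "Search the data asset graph and, based on the company's product sales, find reference data to support the direction of the company's new product design." This application can use a node already present in the current data asset graph as the recommended exploration start node. It can then recommend to the user exploration target nodes that correspond to high-value data assets not yet shown in the current graph, and display the paths between these nodes, which saves the user exploration time.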
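To make the technical solutions of this application clearer and easier to understand, the following describes the system architecture of the embodiments of this application with reference to the accompanying drawings.
FIG. 2 shows the architecture of the management system. The management system 200 includes an acquisition module 202, a recommendation module 204, and an interaction module 206. The acquisition module 202 and the recommendation module 204 are each connected to the interaction module 206. The modules are described below.
The acquisition module 202 is configured to obtain the data asset graph. For example, the acquisition module 202 may generate a data asset graph for the asset entity the user selects from an entity list. This graph is centered on the selected asset entity, for example with that entity as the root node.
The recommendation module 204 is configured to determine, based on the data asset graph, recommended exploration information for the data asset graph. The recommendation module 204 may have a built-in recommendation algorithm and uses it to determine the recommended exploration information from the data asset graph.
The interaction module 206 is configured to present the recommended exploration information to the user, where the recommended exploration information is used to guide the user in exploring the data asset graph. Specifically, the interaction module 206 may provide an interactive interface, also called a user interface (UI), and present the recommended exploration information to the user through it. The interactive interface may be a graphical user interface (GUI) or a command user interface (CUI).
The interaction module 206 may provide multiple UIs. For example, it may provide a graph display interface through which it presents the data asset graph to the user. The same interface can also present the recommended exploration information directly, for example by overlaying it on the data asset graph.
The UIs provided by the interaction module 206 may further include a search interface containing a search box. The user enters a keyword in the search box to trigger a search operation. In response, the acquisition module 202 obtains the data assets that match the keyword, also called intended assets. The interaction module 206 may display the found data assets as a list on the search interface, called the intended asset list because it shows intended assets. The user can select a data asset (for example, a target data asset) from the intended asset list. The acquisition module 202 then generates a data asset graph centered on the selected data asset, and the interaction module 206 presents that graph to the user through the graph display interface.
It should be noted that the graph display interface and the search interface may also be integrated into one interface. For example, the search interface may be integrated into the graph display interface, with the data asset graph on the right side and the search box and intended asset list on the left side.
Further, the management system 200 may also include an update module 208, which is configured to update the data asset graph or the recommendation parameters of the recommendation algorithm. Specifically, the interaction module 206 is further configured to receive the user's feedback on the recommended exploration information. This feedback may include the user's selection or rejection of the recommended exploration information. Correspondingly, the update module 208 is configured to update either the data asset graph or the recommendation parameters based on that feedback.
It should be noted that the management system 200 shown in FIG. 2 is only one schematic division from the perspective of functional modules. In other possible implementations of the embodiments of this application, the management system 200 may include other functional modules, or other functional modules may take over the functions of the modules in FIG. 2. This is not limited in the embodiments of this application.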
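The following describes the data asset graph management method of the embodiments of this application in detail, from the perspective of the management system 200 and with reference to the accompanying drawings.
FIG. 3 shows the flowchart of the data asset graph management method. The method includes the following steps:
S302: The management system 200 receives a keyword entered by the user.
The management system 200 provides a search function for data assets. Specifically, the management system may provide a search interface containing a search box in which the user enters a keyword. In some embodiments, the management system 200 may omit the separate search interface and instead provide a graph display interface with an integrated search box. The user then enters the keyword in the search box of the graph display interface.
The user may enter the keyword according to business requirements, so the keyword may be business-related. For example, a sports brand manufacturer produces sportswear, sports shoes, sports equipment, and other products, and the management system 200 manages the brand's data assets. The user can enter the keyword "sports shoes" to search for data assets that match it.
S304: The management system 200 obtains an intended asset list based on the keyword.
The management system 200 searches the data assets it manages for those that match the keyword and generates the intended asset list from them. The list contains the unique identifier of each matching data asset (that is, each intended asset), which may be an asset name or an ID.
Further, the intended asset list may also include metadata of the matching data assets. In some embodiments, the metadata may include one or more of the creator, the creation time, and the business category. The metadata helps the user choose a data asset from the intended asset list.
S306: The management system 200 presents the intended asset list to the user.
Specifically, the management system 200 may present the intended asset list through the search interface. In some embodiments, it may present the list through the graph display interface instead. For example, when the search box is integrated into the graph display interface, the management system 200 may present the intended asset list in part of that interface, such as the left area.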
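The following example illustrates these steps.
FIG. 4 shows a schematic diagram of the graph display interface. The graph display interface 400 includes a search box 402 and a search control 404. The user enters a keyword in the search box 402 according to business requirements and then clicks or taps the search control 404 to start a search. In response to the search operation, the management system 200 displays the data assets that match the keyword, that is, the intended assets, as a list on the left side of the graph display interface 400. In other words, the graph display interface 400 includes an intended asset list 406.
S308: In response to the user's operation of selecting an intended asset in the intended asset list, the management system 200 generates the data asset graph corresponding to the intended asset.
The user can browse the intended asset list to find an asset of interest. When the user selects an intended asset in the list, the management system 200 generates the corresponding data asset graph from three parts: the intended asset, its associations with other data assets, and the other data assets associated with it. This data asset graph may be centered on the intended asset. As shown in FIG. 4, the management system 200 may present the data asset graph 410 corresponding to the intended asset 408 on the graph display interface 400.
In some possible implementations, the management system 200 may further provide a filter function that applies conditions to the intended asset list. Filtering narrows the range and helps the user select an intended asset faster. Specifically, the user can filter the list by one or more of asset type, ingestion time, and update time, which helps the user select the intended asset 408 precisely. When the user selects the intended asset 408 in the list, for example by single selection, the management system 200 can automatically draw the corresponding data asset graph 410 on the right side of the interface.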
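Steps S304 to S308 amount to a keyword lookup, an optional filter, and the extraction of a subgraph centered on the chosen asset. The sketch below makes three assumptions. Assets are held in memory, matching is a case-insensitive substring test on the asset name, and "the data asset graph corresponding to the intended asset" is read as the asset's 1-hop neighborhood. The `Asset` class, the function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    asset_id: str
    name: str
    asset_type: str


def search_intended_assets(assets, keyword, asset_type=None):
    """S304 sketch: case-insensitive substring match on the asset name,
    optionally narrowed by asset type (one of the filter conditions above)."""
    kw = keyword.lower()
    return [
        a for a in assets
        if kw in a.name.lower() and (asset_type is None or a.asset_type == asset_type)
    ]


def build_centered_graph(center_id, edges):
    """S308 sketch: keep the selected asset, the associations that touch it,
    and the assets at the other end of those associations (1-hop neighborhood)."""
    kept = [(s, r, d) for s, r, d in edges if center_id in (s, d)]
    nodes = {center_id} | {n for s, _, d in kept for n in (s, d)}
    return nodes, kept


assets = [
    Asset("t1", "sports_shoes_sales", "data_table"),
    Asset("j1", "sports_shoes_etl", "job"),
    Asset("t2", "sportswear_sales", "data_table"),
]
edges = [("j1", "data_flow", "t1"), ("t1", "parent_child", "f1"), ("t2", "data_flow", "j2")]

hits = search_intended_assets(assets, "Sports_Shoes")           # t1 and j1
tables_only = search_intended_assets(assets, "sports_shoes", asset_type="data_table")
nodes, kept = build_centered_graph(tables_only[0].asset_id, edges)
print(sorted(nodes))  # ['f1', 'j1', 't1']
```

A real deployment would query a metadata catalog instead of scanning a list. The graph-building step would also follow the type rules of FIG. 1 rather than taking every incident edge.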
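Further, referring to FIG. 4, the management system 200 may also display the metadata 411 of the data asset graph on the graph display interface 400. The metadata 411 includes one or more of the node count and edge count of the graph. In some possible implementations, the management system 200 may also display the metadata 412 of the root node of the data asset graph in the information bar of the graph display interface 400. The root node metadata 412 may include the node name, node type, or node ID of the root node.
It should be noted that S302 to S308 are one specific way for the management system 200 to obtain the data asset graph. The data asset graph management method of the embodiments of this application may also obtain the graph in other ways. For example, the management system 200 may store a data asset graph when it first generates the graph. When the user later enters a keyword, the management system 200 can retrieve the data asset graph for that keyword directly from the stored graphs.
S310: The management system 200 obtains the structural features of the data asset graph.
Structural features are features of a geometric figure or space that remain unchanged under continuous deformation. Structural features can serve as an influencing factor for judging the structural importance of nodes and edges in a graph or network, such as a data asset graph. This enables intelligent interactive navigation of the data asset graph.
In graph and network research, centrality is commonly used to represent the structural features of a graph or network. Centrality is the degree to which a node or edge occupies a core position in the network, and a larger centrality means a more important node or edge. There are many common centrality measures, such as degree centrality, betweenness centrality, and closeness centrality, as shown in Table 1:
Table 1: Basic concepts of centrality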
Figure PCTCN2022130509-appb-000001
Figure PCTCN2022130509-appb-000002
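Degree centrality, betweenness centrality, closeness centrality, and the other measures can all be used to measure the structural features of the data asset graph and thereby enable intelligent interactive navigation of the graph.
Take degree centrality as an example. Degree centrality measures a node's importance in the network by its degree. A node with a high degree directly affects more nodes and therefore has greater influence and higher structural importance in the data asset graph.
Accordingly, the management system 200 may rank the nodes in the current data asset graph by degree centrality and recommend the top n1 nodes to the user. These top n1 nodes are also called recommended exploration start nodes. n1 is a custom parameter that the user can configure.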
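Ranking start nodes by degree centrality can be sketched as follows. The sketch uses raw degree rather than degree divided by (number of nodes minus 1), because that normalization does not change the ranking. Edges are plain (source, target) pairs, and the function names are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Unnormalized degree: number of edge endpoints at each node (direction ignored)."""
    degree = defaultdict(int)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)


def recommend_start_nodes(edges, n1):
    """Rank nodes by degree (descending) and return the top n1.
    Ties are broken by node id so the result is deterministic."""
    degree = degree_centrality(edges)
    return sorted(degree, key=lambda n: (-degree[n], n))[:n1]


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(recommend_start_nodes(edges, 2))  # ['A', 'B']
```

Here A has degree 3 and B, C, and D each have degree 2, so with n1 = 2 the tie among B, C, and D is resolved by node id.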
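When the user selects a node (representing a data asset) for further exploration, the management system 200 may traverse all potential extension edges of the selected node. It uses degree centrality to compute the importance of the data asset associated with each potential extension edge. It then ranks the potential extension edges of the selected asset by this score and recommends the top n2 of them to the user. These top n2 edges are also called recommended edges. Similarly, n2 is a custom parameter that the user can control.
When the user selects a node (representing a data asset), the management system 200 may also use degree centrality to compute the importance of every node (representing a data asset) passed on paths that start from the selected node. It ranks these nodes by importance and recommends the top n3 of them to the user. These top n3 nodes are also called recommended exploration target nodes. n3 is a custom parameter that the user can configure.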
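Target recommendation scores the nodes on paths that leave the selected node. The following is a minimal sketch built on several assumptions. "Nodes passed by paths from the selected node" is read as every node reachable from it by breadth-first search, optionally within a hop limit. Associations may be traversed in either direction, and degree is counted as the number of distinct neighbors. `recommend_targets` and `max_hops` are hypothetical names.

```python
from collections import defaultdict, deque


def recommend_targets(edges, start, n3, max_hops=None):
    """Collect every node reachable from `start` (optionally within `max_hops`),
    score each by its number of distinct neighbors, and return the top n3."""
    adj = defaultdict(set)
    for s, d in edges:
        adj[s].add(d)
        adj[d].add(s)  # assumption: paths may follow an association either way
    degree = {n: len(nbrs) for n, nbrs in adj.items()}

    hops = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if max_hops is not None and hops[cur] >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj[cur]:
            if nxt not in hops:
                hops[nxt] = hops[cur] + 1
                queue.append(nxt)

    reachable = [n for n in hops if n != start]
    return sorted(reachable, key=lambda n: (-degree[n], n))[:n3]


edges = [("S", "A"), ("A", "B"), ("B", "C"), ("B", "D"), ("B", "E"), ("X", "Y")]
print(recommend_targets(edges, "S", 3))              # ['B', 'A', 'C']
print(recommend_targets(edges, "S", 3, max_hops=1))  # ['A']
```

Nodes in a separate component (X and Y here) are never recommended because no path from the selected node reaches them.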
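S312: The management system 200 obtains the business features of the data asset graph.
A data asset graph records the data assets of an enterprise or organization and the associations between them. Through its nodes and the edges representing associations, the graph represents the business activities of the enterprise or organization and is therefore strongly business-related. Business features can thus serve as an influencing factor for judging the business importance of nodes and edges in the data asset graph, which enables intelligent interactive exploration of the graph.
The business features may include one or more of business weights and semantic features. Business weights are quantitative features, and semantic features are qualitative features. Most data asset graphs are typical node-edge dual heterogeneous networks, so each type of data asset or association usually has a different business weight. For example, data tables and jobs are relatively more important and have larger business weights. Data asset graphs are very large and contain many assets of the same type. The semantic information carried by each data asset (represented by semantic features), however, gives these assets different degrees of importance. For example, a data asset graph may contain thousands of data tables of the same type and therefore the same business weight. Different tables contain data with different semantic features and represent different business activities, so they have different business value.
Take business features that include business weights as an example. The data asset graph is closely tied to the enterprise's business, and every data asset and association in it inherently has business attributes. The larger the business weight of a data asset or association, the higher its business importance in the data asset graph.
The management system 200 may jointly consider two things: the business weight of the data asset represented by a node, and the business weights of the data assets represented by the nodes associated with it. Using this combined business weight as the score, it ranks the nodes of the data asset graph and recommends the top n1 nodes to the user. These top n1 nodes are the recommended exploration start nodes.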
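The node-level business score described above combines a node's own business weight with those of its associated nodes. The sketch below assumes a linear combination, score(v) = w(v) + λ · Σ w(u) over the neighbors u of v, with λ = `neighbor_factor`. The per-type weights are invented placeholders. They respect only the statement in the text that data tables and jobs weigh more than other types.

```python
from collections import defaultdict

# Hypothetical per-type business weights (data tables and jobs weigh more).
TYPE_WEIGHT = {"data_table": 1.0, "job": 1.0, "insight": 0.5, "field": 0.3}
DEFAULT_WEIGHT = 0.2


def business_node_scores(node_types, edges, neighbor_factor=0.5):
    """score(v) = w(v) + neighbor_factor * sum(w(u) for neighbors u of v).
    Every endpoint in `edges` must appear in `node_types`."""
    weight = {n: TYPE_WEIGHT.get(t, DEFAULT_WEIGHT) for n, t in node_types.items()}
    neighbors = defaultdict(set)
    for s, d in edges:
        neighbors[s].add(d)
        neighbors[d].add(s)
    return {
        n: weight[n] + neighbor_factor * sum(weight[u] for u in neighbors[n])
        for n in node_types
    }


node_types = {"T1": "data_table", "J1": "job", "I1": "insight", "F1": "field", "F2": "field"}
edges = [("J1", "T1"), ("T1", "F1"), ("T1", "F2"), ("T1", "I1")]
scores = business_node_scores(node_types, edges)
start_nodes = sorted(scores, key=lambda n: (-scores[n], n))[:2]
print(start_nodes)  # ['T1', 'J1']
```

The table T1 scores highest because both its own weight and its neighborhood are heavy. The job J1 comes second on the strength of its link to T1.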
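When the user selects a node (representing a data asset) for further exploration, the management system 200 may traverse the potential extension edges of the selected node and set an initial business weight for each edge according to its type. It then combines that weight with the business weight of the node associated with the edge to compute the final business weight of each edge. Using the final business weight as the score, the management system 200 ranks the potential extension edges of the selected node and recommends the top n2 of them to the user, since they represent explorable directions. These top n2 potential extension edges are the recommended edges.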
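For edges, the text sets an initial weight from the relation type and then combines it with the business weight of the node the edge leads to. This sketch assumes the combination is a product and uses made-up relation weights. Both are illustrative choices, not values from the application, and the function name is hypothetical.

```python
# Hypothetical initial business weights per relation type.
RELATION_WEIGHT = {"data_flow": 1.0, "PK_FK": 0.8, "parent_child": 0.6, "logical_physical": 0.5}


def recommend_edges_by_business(selected, candidate_edges, node_weight, n2):
    """Final edge weight = initial weight of the relation type
    * business weight of the node at the other end of the edge."""
    scored = []
    for s, rel, d in candidate_edges:
        if selected not in (s, d):
            continue  # only potential extension edges of the selected node
        other = d if s == selected else s
        score = RELATION_WEIGHT.get(rel, 0.1) * node_weight.get(other, 0.0)
        scored.append((score, (s, rel, d)))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [edge for _, edge in scored[:n2]]


node_weight = {"T1": 1.0, "T2": 1.0, "J1": 1.0, "F1": 0.3}
candidates = [("T1", "PK_FK", "T2"), ("J1", "data_flow", "T1"), ("T1", "parent_child", "F1")]
print(recommend_edges_by_business("T1", candidates, node_weight, 2))
# [('J1', 'data_flow', 'T1'), ('T1', 'PK_FK', 'T2')]
```

With a product, a high-weight relation type cannot rescue an edge that leads to a low-value asset. A weighted sum would be an equally plausible reading of "combine".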
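When the user selects a node (representing a data asset), the management system 200 may also use business weights to compute the business importance of every node (representing a data asset) passed on paths that start from the selected node. It ranks these nodes by business importance and recommends the top n3 of them to the user. These top n3 nodes are the recommended exploration target nodes.
S314: The management system 200 obtains the user's historical experience with the data asset graph.
Historical experience with a data asset graph refers to users' interactive exploration experience with the graph during a historical period. Users of data asset graphs usually have strong domain background knowledge, so their interactive exploration experience offers useful reference and guidance. Historical experience can therefore serve as an influencing factor for judging the empirical importance of nodes and edges in the data asset graph, which enables intelligent interactive exploration of the graph.
Users explore data asset graphs based on their own domain knowledge and accumulated experience, so their exploration is strongly biased and subjective. The management system 200 may obtain statistical indicators of users' interactive exploration of the data asset graph during a historical period and use these indicators as the users' historical experience with the graph.
In this embodiment, the statistical indicators may include one or more of click frequency and conditional probability. The click frequency of a data asset or association reflects how often users interact with it and is positively correlated with its empirical importance. For example, if many users access a data asset, its click frequency is high, which indicates high empirical importance. Conditional probability is the probability that an event occurs given that another event has already occurred. When exploring a data asset graph, users usually select data assets and associations deliberately and in a particular order. A higher conditional probability therefore indicates higher empirical importance of the data asset or association.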
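Both statistics can be computed from interaction logs. The sketch assumes that each exploration session is recorded as an ordered list of clicked asset ids. It estimates the conditional probability as a first-order transition probability, P(next = b | current = a), over consecutive clicks. The log format and function names are hypothetical.

```python
from collections import Counter


def click_frequency(sessions):
    """Total number of clicks per asset over all sessions."""
    return Counter(asset for session in sessions for asset in session)


def conditional_probability(sessions):
    """Estimate P(next click = b | current click = a) from consecutive clicks."""
    pair_counts = Counter()
    from_counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return {(a, b): c / from_counts[a] for (a, b), c in pair_counts.items()}


sessions = [["T1", "J1", "T2"], ["T1", "J1"], ["T1", "F1"]]
freq = click_frequency(sessions)
cond = conditional_probability(sessions)
print(freq["T1"], freq["J1"])  # 3 2
print(cond[("T1", "J1")])       # 0.6666666666666666
```

In two of the three sessions the user went from T1 to J1, so J1 is the empirically favored next step after T1.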
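Take historical experience that includes click frequency as an example. Exploring a data asset graph requires users to have some professional domain knowledge, so each user's exploration experience is valuable and can guide later users. The higher the click frequency of a data asset or association, the higher its empirical importance in the data asset graph.
The management system 200 may compute the click frequency of the nodes (that is, of the data assets they represent) in the current data asset graph. It ranks the nodes by click frequency and recommends the top n1 nodes to the user. These top n1 nodes are the recommended exploration start nodes.
When the user selects a node (representing a data asset), the management system 200 may also traverse the potential extension edges of the selected node. For each edge, it jointly computes the click frequency of the extension relationship and of the associated node. It then ranks the potential extension edges by click frequency and recommends the top n2 of them to the user. These top n2 potential extension edges are the recommended edges.
When the user selects a node (representing a data asset), the management system 200 may also compute the click frequency of every node (representing a data asset) passed on paths that start from the selected node. It ranks these nodes by click frequency and recommends the top n3 of them to the user. These top n3 nodes are the recommended exploration target nodes.
S316: The management system 200 determines the recommended exploration information based on one or more of the structural features, business features, and historical experience of the data asset graph.
The structural features, business features, and historical experience each measure the importance of nodes or edges in the data asset graph from a different dimension. The management system 200 may therefore perform node recommendation and/or edge recommendation based on a combination of features from these dimensions to determine the recommended exploration information. It may also base the recommendation on features from a single dimension. Node recommendation may include start node recommendation and target node recommendation. Accordingly, the recommended exploration information may include one or more of recommended exploration start nodes, recommended edges (also called recommended exploration directions), and recommended exploration target nodes.
Further, when combining features from different dimensions for node or edge recommendation, the management system 200 may set a recommendation weight for each dimension's features. It then computes a recommendation score for each node or edge as a weighted combination and performs node or edge recommendation according to these scores.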
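The weighted combination in S316 can be sketched as follows. Degree, business weight, and click counts live on different scales, so the sketch min-max normalizes each factor to [0, 1] before applying the recommendation weights. That normalization step is an assumption, as are the sample weights. The factor keys reuse the labels A1 to A3 that this description assigns to the three influencing factors.

```python
def min_max(scores):
    """Rescale a {item: score} dict to [0, 1]; a constant factor maps to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def combined_scores(factor_scores, factor_weights):
    """Weighted sum of normalized factor scores.
    factor_scores: {factor: {item: raw score}}; factor_weights: {factor: weight}."""
    normalized = {f: min_max(s) for f, s in factor_scores.items()}
    items = set().union(*(s.keys() for s in factor_scores.values()))
    return {
        item: sum(factor_weights[f] * normalized[f].get(item, 0.0) for f in normalized)
        for item in items
    }


def top_n(scores, n):
    return sorted(scores, key=lambda k: (-scores[k], k))[:n]


factor_scores = {
    "A1": {"T1": 3, "J1": 1, "F1": 2},        # e.g. degree centrality
    "A2": {"T1": 1.0, "J1": 2.0, "F1": 0.5},  # e.g. business weight
    "A3": {"T1": 10, "J1": 0, "F1": 5},       # e.g. click frequency
}
weights = {"A1": 0.5, "A2": 0.3, "A3": 0.2}
scores = combined_scores(factor_scores, weights)
print(top_n(scores, 2))  # ['T1', 'F1']
```

Passing a single factor with weight 1 reduces this to the single-dimension recommendation that the text also allows.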
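S318: The management system 200 presents the recommended exploration information to the user.
Specifically, the management system 200 may present the recommended exploration information through the graph display interface 400. For example, it may overlay the recommended exploration information on the data asset graph and present both through the graph display interface.
In this way, the management system 200 of this application implements intelligent interactive navigation for data asset graphs. It provides three atomic interactive navigations (N1, N2, N3) and three influencing factors (A1, A2, A3). An atomic interactive navigation is an indivisible unit of interactive navigation. N1 denotes important asset navigation, implemented by presenting recommended exploration start nodes. N2 denotes important exploration direction navigation, implemented by presenting recommended edges. N3 denotes important target asset navigation, implemented by presenting recommended exploration target nodes. A1 denotes structural features, A2 business features, and A3 historical experience.
The atomic interactive navigations can be flexibly combined into 7 forms (N1, N2, N3, N1N2, N2N3, N1N3, N1N2N3). The influencing factors can likewise be combined into 7 forms (A1, A2, A3, A1A2, A2A3, A1A3, A1A2A3). The management system 200 of the embodiments of this application can therefore cover 7 × 7 = 49 forms of intelligent interactive navigation for data asset graphs.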
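The count of 7 forms per dimension is the number of non-empty subsets of a three-element set, 2^3 - 1 = 7. The two independent choices therefore give 7 × 7 = 49 navigation modes. The short enumeration below confirms this. Its tuple labels reuse N1 to N3 and A1 to A3 from the text, and the function name is hypothetical.

```python
from itertools import combinations

NAVIGATIONS = ("N1", "N2", "N3")  # start nodes, edges, target nodes
FACTORS = ("A1", "A2", "A3")      # structural, business, historical


def non_empty_subsets(items):
    """All non-empty combinations of `items`, smallest first."""
    return [c for k in range(1, len(items) + 1) for c in combinations(items, k)]


navigation_forms = non_empty_subsets(NAVIGATIONS)
factor_forms = non_empty_subsets(FACTORS)
modes = [(nav, fac) for nav in navigation_forms for fac in factor_forms]
print(len(navigation_forms), len(factor_forms), len(modes))  # 7 7 49
```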
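Based on the foregoing description, the data asset graph management method of the embodiments of this application obtains recommended exploration information from the data asset graph and uses it to guide users' interactive exploration. This reduces blind interaction and improves both analysis efficiency and the exploration experience. Further, the method provides multiple atomic interactive navigations that cover users' basic core needs in data asset exploration. It also defines multiple influencing factors that evaluate the importance of nodes or edges from different dimensions, which makes the navigation (recommendation) results more reasonable. Moreover, the method supports flexible combinations of atomic interactive navigations and influencing factors. The result is a rich family of data asset graph management methods that can adapt to changing exploration scenarios.
In some possible implementations, the management system 200 also lets the user choose whether to enable the intelligent interactive navigation function. When the user enables it, the management system 200 may perform the data asset graph management method shown in FIG. 3.
Further, the intelligent interactive navigation function includes multiple atomic interactive navigations, and the management system 200 may let the user configure them flexibly, using any one of them or any combination. For example, the user may combine important asset navigation with exploration direction navigation to explore and analyze the data asset graph through multiple interactions. As another example, the user may combine important asset navigation with important target asset navigation to explore and analyze the graph with only a few interactions.
For ease of understanding, the following describes the data asset graph management method of this application in detail. In this example, the user uses important asset navigation, exploration direction navigation, and important target asset navigation.
FIG. 5 shows the flowchart of this data asset graph management method. The method includes the following steps:
Step 1: The management system 200 receives a keyword entered by the user and obtains an intended asset list based on it. It then generates the data asset graph for the intended asset the user selects from the list.
Specifically, after obtaining the intended asset list, the management system 200 may present it to the user. The user can left-click an intended asset to trigger one-click generation of the data asset graph. The data asset graph is centered on the intended asset selected by the user. It includes that intended asset, its associations with other data assets, and the other data assets associated with it.
For how the management system 200 presents the list to the user, refer to the related description of the embodiment shown in FIG. 3. Details are not repeated here.
Step 2: The management system 200 presents the data asset graph to the user through the graph display interface. When the user triggers the recommendation control, the management system displays the recommended exploration start nodes on the graph display interface.
FIG. 6A shows a schematic diagram of the graph display interface 400, which carries a recommendation control 413. When the user triggers the recommendation control, the management system 200 may start the important asset navigation function. It evaluates the data assets represented by the nodes in the data asset graph 410 and identifies the nodes of important data assets on the graph display interface 400. It then displays these nodes as recommended exploration start nodes.
The management system 200 may add a recommendation mark 414 to the nodes of important data assets on the graph display interface 400. The mark identifies these nodes as recommended exploration start nodes and reminds the user that such data assets can be explored first. In this embodiment, the recommendation mark 414 may be a red exclamation mark, which the management system 200 adds uniformly to the upper-right corner of each node of an important data asset.
It should be noted that the management system 200 may also set a default state for the recommendation control 413, for example the triggered state. In that case, once the user triggers one-click generation of the data asset graph, the management system 200 directly presents the data asset graph 410 on the graph display interface together with the recommendation marks 414. This shows the recommended exploration start nodes to the user without a further action.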
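Step 3: The user selects a first node, which corresponds to an important data asset, to open a menu and chooses edge extension from the menu. The user then chooses "Get recommended edges" from the submenu to trigger the edge recommendation operation. The management system 200 displays to the user the recommended edges related to the first node.
Specifically, guided by the recommended exploration information on the graph display interface 400, such as the recommended exploration start nodes identified by the recommendation marks 414, the user selects the node of a data asset for the next step of exploration. For ease of description, this embodiment calls this node the first node. As shown in FIG. 6B, the user can right-click the first node to select it, and the management system 200 then pops up a menu 416.
A data asset's relationships can be extended in multiple ways. A user who does not want to extend blindly by hand can select "Get recommended extended edges" in the menu. When this item is clicked, the management system 200 automatically obtains the potential extension relationships of the data asset of the first node and scores each of them. The management system 200 may sort the potential extension relationships by score and, as shown in FIG. 6C, display them on the graph display interface 400 in descending order.
Based on these scores, the management system 200 may determine the edges of qualifying potential extension relationships as recommended edges. It adds a recommendation icon 418 to those relationships to display the recommended edges to the user on the graph display interface 400. A potential extension relationship may qualify because its score exceeds a preset value, or because it ranks in the top m by score, where m is a positive integer. The preset value or m may be set empirically; for example, the preset value may be 85.
All relationships of the data asset are scored and displayed on the interface in descending order. Based on the scores and relationship types in the list, the user can choose high-scoring relationships to extend, for example those scoring above 85, which the system can recommend automatically with a "recommended" icon. Because exploration direction navigation is used, the data asset graph does not fill up with redundant information. Starting from the initial data asset, the user can use the recommended edges displayed by the management system 200 to find directions worth exploring efficiently, which makes exploration and analysis more efficient.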
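The following sketch marks which scored relationships receive the recommendation icon, using either of the two conditions in the text: a score above a preset value such as 85, or a rank within the top m. When both conditions are supplied, the sketch combines them with a logical OR. That combination, the names, and the sample scores are assumptions.

```python
def mark_recommended(scored_relations, threshold=None, top_m=None):
    """Sort (relation, score) pairs by descending score and flag those that
    exceed the threshold or fall within the top m."""
    ranked = sorted(scored_relations, key=lambda item: -item[1])
    marked = []
    for rank, (relation, score) in enumerate(ranked):
        above = threshold is not None and score > threshold
        in_top = top_m is not None and rank < top_m
        marked.append((relation, score, above or in_top))
    return marked


relations = [("parent_child", 72), ("data_flow", 91), ("PK_FK", 88)]
print(mark_recommended(relations, threshold=85))
# [('data_flow', 91, True), ('PK_FK', 88, True), ('parent_child', 72, False)]
print(mark_recommended(relations, top_m=1))
# [('data_flow', 91, True), ('PK_FK', 88, False), ('parent_child', 72, False)]
```

Every relationship stays in the descending list as the text describes, and only the flag decides whether the recommendation icon is shown.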
步骤4:管理系统200根据用户选中的推荐连边,更新数据资产图谱410。
具体地,用户根据推荐图标418选中想要扩展的推荐连边后,左键单击“扩展所选连边”,触发扩展连边操作,管理系统200响应于用户的扩展连边操作,在数据资产图谱410中绘制表征相应关联关系的连边以及连边所连接的节点,从而实现更新数据资产图谱410。图6D展示了更新后的数据资产图谱410,更新后的数据资产图谱410包括用户选中的、进行扩展的推荐连边,以及该连边所连接的节点。
需要说明的是,图6D是以推荐连边所连接的节点为数据资产图谱410中未出现的节点进行示例说明,在一些可能的实现方式中,推荐连边所连接的节点也可以为数据资产图谱410中已出现的节点,本申请实施例对此不作限制。
用户选中的推荐连边属于用户对推荐探索信息的一种反馈,相应地,管理系统200根据用户选中的推荐连边更新数据资产图谱410是管理系统200根据用户对推荐探索信息的反馈更新数据资产图谱410的一种实现方式,在本申请实施例其他可能的实现方式中,管理系统200也可以根据其他反馈更新数据资产图谱410。
例如,管理系统200展示的推荐连边中并不包括用户想要扩展的连边,用户也可以自定义扩展连边,以人工扩展数据资产的关联关系。具体地,用户可以通过鼠标右键单击选中该数据资产对应的第一节点,如此,管理系统200可以返回菜单栏416。参见图6E,用户可以选择菜单栏中的“自定义扩展连边”,点击该项,管理系统200弹出该项的次级菜单 栏,该次级菜单栏包括不同关系类型,用户可以选择其中一种关系类型,或者选择全部类型,以自定义扩展连边。参见图6F,用户选择“父子关系”后,图谱展示界面400展示了以用户选中的第一节点为起始节点,关系类型为“父子关系”的至少一个连边420。可见,管理系统200也可以根据用户对推荐探索信息的拒绝,更新数据资产图谱410。
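Step 5: The user selects a second node corresponding to an important data asset to open the menu bar and then selects "Recommend exploration targets" from the menu bar to trigger the target recommendation operation. The management system 200 shows the user the recommended exploration target nodes.
After the user selects the node corresponding to an important data asset (for ease of description, referred to in the embodiments of this application as the second node), opens the menu bar, and selects "Recommend exploration targets" as shown in FIG. 6G, the management system 200 can, based on the paths that start at the second node selected by the user, obtain the importance of the nodes those paths pass through. The management system 200 can then use that importance as a score and determine the recommended exploration target nodes from among those nodes.
Referring to FIG. 6H, the management system 200 can present to the user, in descending order of score, the nodes along the paths that start at the second node. The management system 200 can determine the nodes whose scores meet a condition as recommended exploration target nodes and, as shown in FIG. 6H, add a recommendation marker 422 to those nodes on the graph display interface 400, thereby showing the recommended exploration target nodes to the user.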
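The description leaves open both how node importance is computed and what condition a score must meet. A minimal sketch, assuming a directed adjacency list, a precomputed importance score per node, and a hypothetical `threshold` as the condition, that collects the nodes reachable from the second node by breadth-first search and ranks them (`reachable_nodes` and `recommend_targets` are hypothetical names):

```python
from collections import deque


def reachable_nodes(adjacency, start):
    """All nodes reachable from `start` along directed edges (BFS), excluding start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def recommend_targets(adjacency, start, importance, threshold):
    """Rank nodes on paths from `start` by importance; flag those at or above threshold."""
    candidates = reachable_nodes(adjacency, start)
    # Descending importance, ties broken by node name for a stable display order.
    ranked = sorted(candidates, key=lambda n: (-importance.get(n, 0.0), n))
    flagged = [n for n in ranked if importance.get(n, 0.0) >= threshold]
    return ranked, flagged  # flagged nodes would receive recommendation marker 422
```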
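Step 6: The user selects a third node from the recommended exploration target nodes, and the management system 200 shows the user the path from the second node to the third node.
Specifically, referring to FIG. 6H, the user can select one node from the recommended exploration target nodes (for ease of description, referred to in the embodiments of this application as the third node). The management system 200 can then automatically draw an exploration path that starts at the second node and ends at the third node selected by the user. Referring to FIG. 6I, the management system 200 can display this exploration path 424 on the graph display interface 400, and can highlight it to guide the user quickly to the desired target asset.
It should be noted that if the user is not satisfied with the recommendation results, the user can freely choose another exploration target. Referring to FIG. 6J, the user can select a node other than the recommended exploration target nodes as the target node; the management system 200 automatically draws the exploration path from the second node to the target node selected by the user and, referring to FIG. 6K, displays this exploration path 424 on the graph display interface 400.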
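The description does not specify which path is drawn when several connect the two nodes. One plausible choice, sketched here as an assumption rather than the patented method, is a shortest path by edge count found with breadth-first search over a directed adjacency list (`shortest_path` is a hypothetical name):

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return one shortest path (fewest edges) from source to target, or None."""
    if source == target:
        return [source]
    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == target:
                # Walk parent links back to the source, then reverse.
                path = [target]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


if __name__ == "__main__":
    adjacency = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["E"]}
    print(shortest_path(adjacency, "A", "E"))  # ['A', 'C', 'E']
```

The same function serves both the recommended third node and a target the user picks freely, since it only needs a source and a target.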
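As with selecting or rejecting recommended edges, the user's selection or rejection of recommended exploration target nodes is also feedback on the recommended exploration information, and the management system 200 can update the data asset graph according to that selection or rejection.
Further, the user's feedback on the recommended exploration information can be used to update the recommendation parameters of the recommendation algorithm, so that the management system 200 determines recommended exploration information that better matches the user's intent, improving recommendation accuracy. For example, when the user rejects the recommended edges or recommended exploration target nodes and instead defines custom expansion edges or chooses another exploration target, the management system 200 can adjust the recommendation weights of impact factors such as structural features, business features, or historical experience according to the user-defined expansion edges or the other exploration target, thereby improving recommendation accuracy.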
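How the weights are adjusted is not specified. A minimal sketch, assuming each impact-factor group has one weight, that the item the user chose and the recommendation the user declined are each described by normalized factor values in [0, 1], and a simple additive update followed by renormalization (`update_weights` and the learning rate `lr` are hypothetical):

```python
def update_weights(weights, accepted_features, rejected_features, lr=0.1):
    """Nudge factor weights toward what the user chose and away from what they rejected.

    weights: dict factor -> weight (returned normalized to sum to 1).
    accepted_features / rejected_features: dict factor -> value in [0, 1] for the
    item the user picked (e.g. a custom edge) and the recommendation they declined.
    """
    new = {}
    for factor, w in weights.items():
        delta = accepted_features.get(factor, 0.0) - rejected_features.get(factor, 0.0)
        # Clamp to a small positive floor so no factor is switched off entirely.
        new[factor] = max(w + lr * delta, 1e-6)
    total = sum(new.values())
    return {factor: w / total for factor, w in new.items()}


if __name__ == "__main__":
    weights = {"structure": 0.4, "business": 0.4, "history": 0.2}
    chosen = {"structure": 0.2, "business": 0.9, "history": 0.5}
    declined = {"structure": 0.9, "business": 0.3, "history": 0.5}
    print(update_weights(weights, chosen, declined))
```

In this sketch, a factor that scored the declined recommendation higher than the user's own choice loses weight, which is one simple reading of "adjusting the recommendation weights" from rejections.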
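It should also be noted that the second node selected by the user may be the same node as the first node or a different node. Moreover, for the same node, the user may choose to perform steps 3 and 4 to enable exploration-direction navigation, steps 5 and 6 to enable important-target-asset navigation, or steps 3 to 6 to enable both.
This method proposes an intelligent interactive navigation framework for data asset graphs. Addressing the core user needs in data asset graph exploration (such as exploring important data assets, important exploration directions, and important target data assets), it provides multiple atomic interactive navigation functions that enable intelligent interactive navigation of the data asset graph, reduce blind selection by the user, and improve the efficiency with which the user explores and analyzes the data asset graph.
Based on the data asset graph management method of the embodiments of this application, the embodiments of this application further provide the data asset graph management system 200 described above. The management system 200 is described below with reference to the accompanying drawings.
Referring to the schematic structural diagram of the data asset graph management system 200 shown in FIG. 2, the system 200 includes:
an acquisition module 202, configured to acquire a data asset graph;
a recommendation module 204, configured to determine, according to the data asset graph, recommended exploration information for the data asset graph; and
an interaction module 206, configured to present the recommended exploration information to a user, where the recommended exploration information is used to guide the user in exploring the data asset graph.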
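The acquisition module 202, the recommendation module 204, and the interaction module 206 may each be implemented as a hardware module or as a software module.
When implemented in software, the acquisition module 202, the recommendation module 204, and the interaction module 206 may be applications or application modules running on a computing device or a computing device cluster.
When implemented in hardware, the acquisition module 202 may include at least one computing device, such as a server. Alternatively, the acquisition module 202 may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. Similarly, the recommendation module 204 may include at least one computing device, or may be a device implemented with an ASIC or a PLD. The interaction module 206 may include a communication interface device, for example, a display.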
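In some possible implementations, the recommended exploration information includes any one or more of the following:
a recommended exploration starting node, a recommended edge, and a recommended exploration target node.
In some possible implementations, the interaction module 206 is specifically configured to:
when a recommendation control on a graph display interface is triggered, display the recommended exploration starting node on the graph display interface; or
when the user triggers an edge recommendation operation, show the user the recommended edge related to a first node selected by the user; or
when the user triggers a target recommendation operation, show the user the recommended exploration target node, where the recommended exploration target node is determined according to scores of the nodes along paths that start at a second node selected by the user.
In some possible implementations, the interaction module 206 is further configured to:
show the user a path from the second node to a third node, where the third node is a node selected by the user from the recommended exploration target nodes.
In some possible implementations, the recommendation module 204 is specifically configured to:
obtain impact factors of the data asset graph; and
determine, according to the impact factors, the recommended exploration information for the data asset graph.
In some possible implementations, the impact factors include structural features, business features, or the user's historical experience with the data asset graph.
In some possible implementations, the structural features include centrality; the business features include one or more of a business weight and a semantic feature; and the historical experience includes one or more of a click frequency and a conditional probability.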
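To make the impact factors concrete, the sketch below estimates the conditional-probability factor from a hypothetical log of past expansions and combines normalized factor values into a single 0-100 score, matching the scale of the 85-point example threshold given earlier in the description. The factor names, weights, and linear combination are assumptions, not the patented formula.

```python
from collections import Counter


def conditional_probability(history, source_type, relation_type):
    """Estimate P(user expands `relation_type` | current node has `source_type`).

    history: list of (source_type, relation_type) pairs from past exploration sessions.
    """
    from_source = [r for s, r in history if s == source_type]
    if not from_source:
        return 0.0
    return Counter(from_source)[relation_type] / len(from_source)


def factor_score(features, weights):
    """Weighted sum of impact factors normalized to [0, 1], scaled to 0-100."""
    return 100.0 * sum(weights[name] * features.get(name, 0.0) for name in weights)


if __name__ == "__main__":
    history = [("table", "lineage"), ("table", "lineage"),
               ("table", "parent-child"), ("view", "lineage")]
    features = {"centrality": 0.5, "business_weight": 1.0, "semantic": 0.8,
                "click_frequency": 0.6,
                "conditional_probability": conditional_probability(history, "table", "lineage")}
    weights = {"centrality": 0.3, "business_weight": 0.2, "semantic": 0.1,
               "click_frequency": 0.2, "conditional_probability": 0.2}
    print(round(factor_score(features, weights), 2))  # 68.33
```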
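In some possible implementations, the interaction module 206 is further configured to:
receive the user's feedback on the recommended exploration information;
and the system 200 further includes:
an update module 208, configured to update recommendation parameters according to the user's feedback on the recommended exploration information.
In some possible implementations, the user's feedback on the recommended exploration information includes the user's selection or rejection of the recommended exploration information.
In some possible implementations, the interaction module 206 is further configured to:
receive a keyword input by the user;
the acquisition module 202 is specifically configured to:
obtain an intended asset list according to the keyword;
the interaction module 206 is specifically configured to:
show the intended asset list to the user;
and the acquisition module 202 is specifically configured to:
generate, in response to the user's selection of an intended asset in the intended asset list, the data asset graph corresponding to the intended asset.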
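A minimal sketch of the keyword-driven acquisition flow, assuming a hypothetical catalog of asset names, case-insensitive substring matching to build the intended asset list, and an initial graph made of the selected asset plus its direct relationships (the patent does not say how far the initial graph extends; `search_assets` and `ego_graph` are hypothetical names):

```python
def search_assets(catalog, keyword):
    """Intended asset list: asset names containing the keyword (case-insensitive), sorted."""
    kw = keyword.casefold()
    return sorted(name for name in catalog if kw in name.casefold())


def ego_graph(relations, asset):
    """Initial data asset graph for the selected asset: the asset and its direct relations.

    relations: list of (source, relation_type, target) tuples.
    Returns (nodes, edges).
    """
    edges = [(s, r, d) for s, r, d in relations if asset in (s, d)]
    nodes = {asset} | {s for s, _, _ in edges} | {d for _, _, d in edges}
    return nodes, edges


if __name__ == "__main__":
    catalog = ["dw.Sales_Daily", "db.sales.orders", "db.hr.staff"]
    print(search_assets(catalog, "sales"))  # ['db.sales.orders', 'dw.Sales_Daily']
```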
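In some possible implementations, the interaction module 206 is further configured to:
receive an expansion edge defined by the user;
and the system 200 further includes:
an update module 208, configured to update the data asset graph according to the expansion edge.
Like the acquisition module 202 and the recommendation module 204, the update module 208 may be implemented in software or in hardware. When implemented in software, the update module 208 may be an application or application module running on a computing device or a computing device cluster. When implemented in hardware, the update module 208 may include at least one computing device, such as a server, or may be a device implemented with an ASIC or a PLD.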
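This application further provides a computing device 700. As shown in FIG. 7, the computing device 700 includes a bus 702, a processor 704, a memory 706, and a communication interface 708. The processor 704, the memory 706, and the communication interface 708 communicate with each other through the bus 702. The computing device 700 may be a server or a terminal device. It should be understood that this application does not limit the number of processors or memories in the computing device 700.
The bus 702 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be classified into address buses, data buses, control buses, and so on. For ease of representation, only one line is used in FIG. 7, but this does not mean that there is only one bus or one type of bus. The bus 702 may include a path for transferring information between the components of the computing device 700 (for example, the memory 706, the processor 704, and the communication interface 708).
The processor 704 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 706 may include volatile memory, such as random access memory (RAM). The memory 706 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The memory 706 stores executable program code, and the processor 704 executes this program code to implement the data asset graph management method described above. Specifically, the memory 706 stores the instructions with which the data asset graph management system 200 performs the data asset graph management method.
The communication interface 708 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 700 and other devices or communication networks.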
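The embodiments of this application further provide a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, for example, a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop, or a smartphone.
As shown in FIG. 8, the computing device cluster includes at least one computing device 700. The memories 706 of one or more computing devices 700 in the cluster may store the same instructions with which the data asset graph management system 200 performs the data asset graph management method.
In some possible implementations, one or more computing devices 700 in the cluster may each execute part of the instructions with which the data asset graph management system 200 performs the data asset graph management method. In other words, a combination of one or more computing devices 700 may jointly execute those instructions.
It should be noted that the memories 706 of different computing devices 700 in the cluster may store different instructions, each used to perform part of the functions of the data asset graph management system.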
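FIG. 9 shows one possible implementation. As shown in FIG. 9, two computing devices 700A and 700B are connected through the communication interface 708. The memory of computing device 700A stores instructions for performing the functions of the acquisition module 202 and the interaction module 206. The memory of computing device 700B stores instructions for performing the functions of the recommendation module 204, and may further store instructions for performing the functions of the update module 208. In other words, the memories 706 of computing devices 700A and 700B together store the instructions with which the data asset graph management system performs the data asset graph management method.
The connection mode of the computing device cluster shown in FIG. 9 takes into account that the data asset graph management method provided in this application needs to exchange the data asset graph and the recommended exploration information. Therefore, the functions implemented by the acquisition module 202 and the interaction module 206 are assigned to computing device 700A, and the functions implemented by the recommendation module 204 and the update module 208 are performed by computing device 700B.
It should be understood that the functions of computing device 700A shown in FIG. 9 may also be performed by multiple computing devices 700. Likewise, the functions of computing device 700B may also be performed by multiple computing devices 700.
In some possible implementations, one or more computing devices in the computing device cluster may be connected through a network, which may be a wide area network, a local area network, or the like. FIG. 10 shows one possible implementation. As shown in FIG. 10, two computing devices 700C and 700D are connected through a network; specifically, each computing device connects to the network through its communication interface. In this type of implementation, the memory 706 of computing device 700C stores instructions for performing the functions of the acquisition module 202 and the interaction module 206, and the memory 706 of computing device 700D stores instructions for performing the functions of the recommendation module 204 and the update module 208.
The connection mode of the computing device cluster shown in FIG. 10 takes into account that the data asset graph management method provided in this application needs to exchange the data asset graph and the recommended exploration information. Therefore, the functions implemented by the acquisition module 202 and the interaction module 206 are assigned to computing device 700C, and the functions implemented by the recommendation module 204 and the update module 208 are performed by computing device 700D. It should be understood that the functions of computing device 700C shown in FIG. 10 may also be performed by multiple computing devices 700. Likewise, the functions of computing device 700D may also be performed by multiple computing devices 700.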
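The embodiments of this application further provide a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can use for storage, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state drive), or the like. The computer-readable storage medium includes instructions that instruct a computing device to perform the data asset graph management method described above, as applied to the data asset graph management system.
The embodiments of this application further provide a computer program product containing instructions. The computer program product may be a software or program product that contains instructions and can run on a computing device or be stored in any usable medium. When the computer program product runs on at least one computing device, the at least one computing device is caused to perform the data asset graph management method described above.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of the present invention.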

Claims (25)

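  1. A data asset graph management method, wherein the method comprises:
    acquiring a data asset graph;
    determining, according to the data asset graph, recommended exploration information for the data asset graph; and
    presenting the recommended exploration information to a user, wherein the recommended exploration information is used to guide the user in exploring the data asset graph.
  2. The method according to claim 1, wherein the recommended exploration information comprises any one or more of the following:
    a recommended exploration starting node, a recommended edge, and a recommended exploration target node.
  3. The method according to claim 2, wherein the presenting the recommended exploration information to a user comprises:
    when a recommendation control on a graph display interface is triggered, displaying the recommended exploration starting node on the graph display interface; or
    when the user triggers an edge recommendation operation, showing the user the recommended edge related to a first node selected by the user; or
    when the user triggers a target recommendation operation, showing the user the recommended exploration target node, wherein the recommended exploration target node is determined according to scores of nodes along paths that start at a second node selected by the user.
  4. The method according to claim 3, wherein the method further comprises:
    showing the user a path from the second node to a third node, wherein the third node is a node selected by the user from the recommended exploration target nodes.
  5. The method according to any one of claims 1 to 4, wherein the determining, according to the data asset graph, recommended exploration information for the data asset graph comprises:
    obtaining impact factors of the data asset graph; and
    determining, according to the impact factors, the recommended exploration information for the data asset graph.
  6. The method according to claim 5, wherein the impact factors comprise structural features, business features, or historical experience of the user with respect to the data asset graph.
  7. The method according to claim 6, wherein the structural features comprise centrality, the business features comprise one or more of a business weight and a semantic feature, and the historical experience comprises one or more of a click frequency and a conditional probability.
  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    receiving feedback from the user on the recommended exploration information; and
    updating recommendation parameters according to the feedback from the user on the recommended exploration information.
  9. The method according to claim 8, wherein the feedback from the user on the recommended exploration information comprises selection or rejection of the recommended exploration information by the user.
  10. The method according to any one of claims 1 to 9, wherein the acquiring a data asset graph comprises:
    receiving a keyword input by the user;
    obtaining an intended asset list according to the keyword, and showing the intended asset list to the user; and
    generating, in response to a selection operation by the user on an intended asset in the intended asset list, a data asset graph corresponding to the intended asset.
  11. The method according to any one of claims 1 to 10, wherein the method further comprises:
    receiving an expansion edge defined by the user; and
    updating the data asset graph according to the expansion edge.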
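  12. A data asset graph management system, wherein the system comprises:
    an acquisition module, configured to acquire a data asset graph;
    a recommendation module, configured to determine, according to the data asset graph, recommended exploration information for the data asset graph; and
    an interaction module, configured to present the recommended exploration information to a user, wherein the recommended exploration information is used to guide the user in exploring the data asset graph.
  13. The system according to claim 12, wherein the recommended exploration information comprises any one or more of the following:
    a recommended exploration starting node, a recommended edge, and a recommended exploration target node.
  14. The system according to claim 13, wherein the interaction module is specifically configured to:
    when a recommendation control on a graph display interface is triggered, display the recommended exploration starting node on the graph display interface; or
    when the user triggers an edge recommendation operation, show the user the recommended edge related to a first node selected by the user; or
    when the user triggers a target recommendation operation, show the user the recommended exploration target node, wherein the recommended exploration target node is determined according to scores of nodes along paths that start at a second node selected by the user.
  15. The system according to claim 14, wherein the interaction module is further configured to:
    show the user a path from the second node to a third node, wherein the third node is a node selected by the user from the recommended exploration target nodes.
  16. The system according to any one of claims 12 to 15, wherein the recommendation module is specifically configured to:
    obtain impact factors of the data asset graph; and
    determine, according to the impact factors, the recommended exploration information for the data asset graph.
  17. The system according to claim 16, wherein the impact factors comprise structural features, business features, or historical experience of the user with respect to the data asset graph.
  18. The system according to claim 17, wherein the structural features comprise centrality, the business features comprise one or more of a business weight and a semantic feature, and the historical experience comprises one or more of a click frequency and a conditional probability.
  19. The system according to any one of claims 12 to 18, wherein the interaction module is further configured to:
    receive feedback from the user on the recommended exploration information;
    and the system further comprises:
    an update module, configured to update recommendation parameters according to the feedback from the user on the recommended exploration information.
  20. The system according to claim 19, wherein the feedback from the user on the recommended exploration information comprises selection or rejection of the recommended exploration information by the user.
  21. The system according to any one of claims 12 to 20, wherein the interaction module is further configured to:
    receive a keyword input by the user;
    the acquisition module is specifically configured to:
    obtain an intended asset list according to the keyword;
    the interaction module is specifically configured to:
    show the intended asset list to the user;
    and the acquisition module is specifically configured to:
    generate, in response to a selection operation by the user on an intended asset in the intended asset list, a data asset graph corresponding to the intended asset.
  22. The system according to any one of claims 12 to 21, wherein the interaction module is further configured to:
    receive an expansion edge defined by the user;
    and the system further comprises:
    an update module, configured to update the data asset graph according to the expansion edge.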
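  23. A computing device cluster, wherein the computing device cluster comprises at least one computing device, the at least one computing device comprises at least one processor and at least one memory, and the at least one memory stores computer-readable instructions; and the at least one processor executes the computer-readable instructions, so that the computing device cluster performs the method according to any one of claims 1 to 11.
  24. A computer-readable storage medium, comprising computer-readable instructions, wherein the computer-readable instructions are used to implement the method according to any one of claims 1 to 11.
  25. A computer program product, comprising computer-readable instructions, wherein the computer-readable instructions are used to implement the method according to any one of claims 1 to 11.
PCT/CN2022/130509 2022-05-23 2022-11-08 Data asset graph management method and related device WO2023226311A1 (zh)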

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210562551.1 2022-05-23
CN202210562551 2022-05-23
CN202210800256.5A CN117149890A (zh) 2022-05-23 2022-07-08 Data asset graph management method and related device
CN202210800256.5 2022-07-08

Publications (1)

Publication Number Publication Date
WO2023226311A1 (zh) 2023-11-30

Family

ID=88884858

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130509 WO2023226311A1 (zh) 2022-05-23 2022-11-08 Data asset graph management method and related device

Country Status (2)

Country Link
CN (1) CN117149890A (zh)
WO (1) WO2023226311A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170359236A1 (en) * 2016-06-12 2017-12-14 Apple Inc. Knowledge graph metadata network based on notable moments
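CN109800278A (zh) * 2018-12-29 2019-05-24 亚信科技(南京)有限公司 Method and apparatus for using a data asset graph, computer device, and storage medium
CN112100400A (zh) * 2020-09-14 2020-12-18 京东方科技集团股份有限公司 Knowledge graph-based node recommendation method and apparatus
CN112732924A (zh) * 2020-12-04 2021-04-30 国网安徽省电力有限公司 Knowledge graph-based power grid data asset management system and method
CN113495978A (zh) * 2020-03-18 2021-10-12 中电长城网际系统应用有限公司 Data retrieval method and apparatus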

Also Published As

Publication number Publication date
CN117149890A (zh) 2023-12-01

Legal Events

Code 121 (EP): The EPO has been informed by WIPO that EP was designated in this application.
Ref document number: 22943498
Country of ref document: EP
Kind code of ref document: A1