CN113360496A

CN113360496A - Method and device for constructing metadata tag library

Info

Publication number: CN113360496A
Application number: CN202110578763.4A
Authority: CN
Inventors: 崔维平; 孙艺新; 郑厚清; 王智敏; 王程; 贾德香; 王玓; 李心达; 陈�光; 高洪达; 刘睿; 于灏; 刘素蔚; 陈睿欣; 颜拥; 姚影; 雷涛; 赵琳; 叶文广; 齐媛媛
Original assignee: Tianyun Rongchuang Data Science & Technology Beijing Co ltd; State Grid Energy Research Institute Co Ltd; Electric Power Research Institute of State Grid Zhejiang Electric Power Co Ltd
Current assignee: Tianyun Rongchuang Data Science & Technology Beijing Co ltd; State Grid Energy Research Institute Co Ltd; Electric Power Research Institute of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2021-05-26
Filing date: 2021-05-26
Publication date: 2021-09-07

Abstract

The application relates to a method, a device, a computer device and a computer-readable storage medium for constructing a metadata tag library, wherein the method comprises the following steps: obtaining a plurality of metadata entities; obtaining a dimension label corresponding to each metadata entity according to the relationship among the metadata entities, the dimension label being used for indicating a dimension of a relationship between the metadata entity and another metadata entity; obtaining a blood relationship corresponding to each metadata entity according to the dimension label corresponding to each metadata entity; and acquiring and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity. With this method, the dimension reflecting the association relationships among the metadata entities is included in the data asset tag system, and the various metadata entities and their corresponding blood relationships can be obtained completely, so that a complete metadata tag system is constructed.

Description

Method and device for constructing metadata tag library

Technical Field

The present application relates to the field of big data management technologies, and in particular, to a method and an apparatus for constructing a metadata tag library.

Background

The conceptual boundary of data assets has expanded continuously as data management technology has changed. In the big data stage, with the application of distributed storage, distributed computing and various artificial intelligence technologies, data other than structured data has also been brought into the category of data assets, and the boundary has expanded to massive content such as tag libraries, enterprise-level knowledge graphs, documents, pictures and videos. At present, the form in which the data assets of large enterprises exist has been upgraded from "database + data warehouse" to a big data repository. When managing data assets and mining, realizing and delivering their value, an important basic task is to establish a data asset tag library.

In data asset management, metadata management is the foundation. It not only has its own management requirements, management characteristics and technical implementation modes, but also directly influences data use support, data development support, data quality management and data value management. The metadata tag is an important implementation and carrier of metadata management, and is mainly applied in the following aspects: data classification and grading, data blood relationships, data quality propagation analysis, data value analysis and judgment, data exploration and federated access, and data organization and fusion.

In data assets there are complex data dependencies and blood relationships that are difficult to reflect through relational databases. The traditional way of generating data tags relies only on calculation over entity attribute values in a relational database. When relationship attributes among entities are involved, only a small amount of local entity relationship information can be obtained, either manually or by rule-based calculation, so complete entity relationship information for the whole network cannot be obtained and a complete metadata tag system cannot be constructed.

Disclosure of Invention

To solve the above technical problem or at least partially solve the above technical problem, the present application provides a method for constructing a metadata tag library, which is used for realizing automatic batch calculation and automatic marking of metadata tags.

In a first aspect, the present application provides a method for constructing a metadata tag library, including:

obtaining a plurality of metadata entities;

obtaining a dimension label corresponding to each metadata entity according to the relationship among the metadata entities; the dimension tag is used for indicating a dimension of a relationship between the metadata entity and another metadata entity;

obtaining a blood relationship corresponding to each metadata entity according to the dimension label corresponding to each metadata entity;

and acquiring and adding at least one of an activity label, an influence label and a similar label aiming at each metadata entity according to the blood relationship of each metadata entity.

As an optional implementation manner of the embodiment of the present invention, the acquiring and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to a blood relationship of each metadata entity includes:

acquiring the number of references, the reference frequency and the referrer weight of each metadata entity according to the blood relationship of each metadata entity;

and acquiring the activity label of each metadata entity according to the number of references, the reference frequency and the referrer weight.

acquiring one or more of the centrality, the betweenness and the closeness of each metadata entity according to the blood relationship of each metadata entity;

and acquiring the influence label of each metadata entity according to one or more of the centrality, the betweenness and the closeness.

performing clustering analysis on the plurality of metadata entities to obtain a clustering result;

and acquiring the influence label of each metadata entity according to the clustering result.

As an optional implementation manner of the embodiment of the present invention, according to a blood relationship of each metadata entity, at least one of an activity tag, an influence tag, and a similar tag is obtained and added for each metadata entity, and the method further includes:

obtaining a calculation result of the similarity degree between the metadata entities according to the blood relationship of the metadata entities;

and obtaining the similar labels of the metadata entities according to the calculation result of the similarity degree between the metadata entities.

As an optional implementation manner of the embodiment of the present invention, before obtaining the dimension tag corresponding to each metadata entity according to the relationship between the plurality of metadata entities, the method further includes:

and acquiring the relationship among the plurality of metadata entities in one or more of the following ways: parsing a data dictionary, parsing SQL statements, parsing a database, and parsing an audit log.

As an optional implementation manner of the embodiment of the present invention, the method further includes:

generating a relationship graph by taking each metadata entity as a vertex and taking the relationships between that metadata entity and other metadata entities in its blood relationship as edges;

and storing the relationship graph in a graph database.

In a second aspect, the present application provides an apparatus for building a metadata tag library, including:

the acquisition metadata entity module is used for acquiring a plurality of metadata entities;

the first acquisition module is used for acquiring the dimension labels corresponding to the metadata entities according to the relationship among the metadata entities; the dimension tag is used for indicating a dimension of a relationship between the metadata entity and another metadata entity;

the second acquisition module is used for acquiring the blood relationship corresponding to each metadata entity according to the dimension label corresponding to each metadata entity;

and the third acquisition module is used for acquiring and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity.

As an optional implementation manner of the embodiment of the present invention, acquiring and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to a blood relationship of each metadata entity includes:

and storing the relationship graph in a graph database.

In a third aspect, the present application provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the method for constructing a metadata tag library according to the first aspect or any embodiment of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program is executed by a processor to perform the method for constructing a metadata tag library according to the first aspect or any embodiment of the first aspect.

Compared with the prior art, the technical scheme provided by the application has the following advantages:

The method for constructing the metadata tag library first obtains a plurality of metadata entities; then obtains a dimension label corresponding to each metadata entity according to the relationship among the metadata entities; then obtains a blood relationship corresponding to each metadata entity according to the corresponding dimension label of each metadata entity; and finally acquires and adds at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity. Compared with the traditional way of generating data tags, the method turns the relationships among metadata entities into data and includes the dimension reflecting those association relationships in the data asset tag system, so that the various metadata entities and their corresponding blood relationships can be obtained completely and a complete metadata tag system is constructed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.

FIG. 1 is a schematic flow diagram illustrating a method for building a metadata tag library in one embodiment;

FIG. 2 is a diagram that illustrates a manner in which relationships between metadata entities may be stored, according to one embodiment;

FIG. 3 is a block diagram of an apparatus for building a metadata tag library for a computer device in one embodiment;

FIG. 4 is a block diagram of a computer device in one embodiment.

Detailed Description

In order that the above-mentioned objects, features and advantages of the present application may be more clearly understood, the solution of the present application will be further described below. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways than those described herein; it is to be understood that the embodiments described in this specification are only some embodiments of the present application and not all embodiments.

Explanation of terms:

Data tag: an important means of organizing data assets. On the one hand, a data asset manager can use data tags to supplement and extend the classification and grading management of data and to enrich the expression of data characteristics and attributes; on the other hand, a user of data assets can use data tags to quickly find the data they need.

Big data tag: when the data resource is big data, the data tag evolves into a big data tag, which differs from the traditional enterprise data tag in the following ways:

(1) The starting points are different: traditional enterprise tags are defined mainly from an enterprise business perspective, whereas big data tags are in principle oriented to every type of data ID that is worth tagging and can feasibly be tagged.

(2) The data mining methods are different: traditional enterprise tags rely mainly on experience to combine relevant dimensions and set thresholds, whereas big data tags use a data model for dimension screening and threshold setting.

(3) The management of the tags is different: traditional enterprise tags generally cannot be managed in a self-organizing way, whereas big data tags emphasize full life cycle management and dynamic management.

(4) The applications the tags support are different: the application of traditional enterprise tags is based mainly on experience, for example, starting from a product, finding the product's potential target customers through tags and then promoting the product through the relevant channel touchpoints; big data tags are based more on a deep understanding of the scenario, emphasize the portrait of a given data ID, and are matched with an integrated scheme of solution, information, channel, value and the like.

Blood relationship analysis (data lineage analysis) is a technical means of comprehensively tracking the data processing process, so as to find, starting from a given data object, all related metadata objects and the relationships among them. The relationships between metadata objects refer specifically to the data stream input-output relationships among these metadata objects. The purpose of blood relationship analysis is to obtain the source information of result data by blood relationship tracking over the integrated database or view, so that when the data is updated, changes in the original database can be reflected and the way the data changes along the data stream can be examined.

Blood relationship analysis is a mapping of the internal relationships among data objects. Combined with time order and precedence relationships, it can reflect certain correlations and their causes and effects. Its range of application is therefore very wide, and it is a core tool for data asset management.

The embodiments of the application provide a method, a device, electronic equipment, a computer-readable storage medium and a program product for constructing a metadata tag library. A plurality of metadata entities are obtained, the dimension reflecting the association relationships among the metadata entities is included in the data asset tag system, and at least one of an activity label, an influence label and a similar label is obtained and added for each metadata entity according to its blood relationship. Because artificial intelligence modeling is introduced into the labeling work when obtaining the activity label, the influence label and the similar label, the low efficiency, high cost and long cycle of manual labeling can be overcome.

In one embodiment, the present application provides a method for building a metadata tag library. As shown in fig. 1, the method for constructing the metadata tag library includes the following steps:

S101, acquiring a plurality of metadata entities.

The operation mode of acquiring the metadata entity is divided into automatic acquisition and manual acquisition.

Specifically, automatic collection means that a collection task is completed automatically on a schedule. The collection task is an automatically scheduled work unit that provides an automated, periodic or time-triggered mechanism for collecting metadata. Tasks can be maintained through an interface (queried, added, modified and deleted), and the time and state of automatic execution can be configured. Illustratively, metadata is collected automatically once connection permission to the data source has been obtained. The data source types may include conventional databases such as Oracle (Oracle Database), MySQL, DB2 (IBM DB2), Informix (produced by IBM), MariaDB and Sybase, all of which are relational database management systems. Manual collection means selecting a local file and uploading it to the server in order to collect metadata by hand. In contrast to automatic collection, it is a manual way of collecting, in real time, metadata stored in a local file, such as metadata stored in an Excel file.

Further, after connecting to the database, the file storing the metadata is read into the parsing platform. Taking a data dictionary as an example, it contains tables that describe items such as table names, field names, data types and stored procedures; tables of this kind are canonical and have a uniform format. Different types of databases may use different parsers to parse these descriptive tables into nodes in a database. The method also allows the collection log to be viewed in order to check whether collection succeeded. The collection log can be queried for the following information about a collection task: start time, task status, end time, process log, number of items collected, and so on.

After metadata collection is completed, the metadata is stored in a database, and metadata applications including metadata statistics, query, blood relationship analysis, influence analysis, data asset maps and the like are supported.

S102, obtaining the corresponding dimension label of each metadata entity according to the relation among the metadata entities.

Wherein the dimension tag is used to indicate a dimension of a relationship between the metadata entity and another metadata entity.

S103, obtaining the blood relationship corresponding to each metadata entity according to the corresponding dimension label of each metadata entity.

A blood relationship is the relationship among related data that is found in the course of tracing data to its source. A big data blood relationship refers to the chain through which the data was produced, i.e., where the data came from and which processes and stages it has passed through. Through blood relationships at different levels, the migration flow of the data can be clearly understood. From the dimension label corresponding to each metadata entity, the dimension of the relationship between one metadata entity and another can be obtained, that is, how many layers of blood relationship separate the two metadata entities.
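The idea of counting how many layers of blood relationship separate two metadata entities can be sketched as a breadth-first search over upstream/downstream edges. This is only an illustrative sketch: the function name `lineage_depth`, the edge format and the table names are assumptions, not part of the described method.

```python
from collections import deque

def lineage_depth(edges, source, target):
    """Return the number of lineage layers from source to target, or None."""
    # Build a downstream adjacency list from (upstream, downstream) pairs.
    downstream = {}
    for up, down in edges:
        downstream.setdefault(up, []).append(down)
    # Breadth-first search finds the smallest number of hops.
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        for nxt in downstream.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # target is not downstream of source

# Hypothetical lineage: raw table -> cleaned table -> report view.
edges = [("ods.orders", "dw.orders_clean"), ("dw.orders_clean", "rpt.sales_view")]
depth = lineage_depth(edges, "ods.orders", "rpt.sales_view")
```

In this example the report view is two layers downstream of the raw table, and a query in the opposite direction returns `None`.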

The dimension reflecting the metadata association relation is contained in the data asset tag system, various metadata nodes and relations can be completely and automatically obtained, and a guarantee is provided for automatically constructing a complete data map. The data asset tag is a description of the asset from a plurality of different angles, one tag can be labeled on different assets, one asset can be labeled with a plurality of different tags at the same time, and the data asset tag can be classified and managed in a grouping or catalogue mode. Therefore, the construction of the label system should consider different application angles of querying, checking, recommending and the like on the data assets. For example, data security management may define a tag catalog and tags from the perspective of data security level, and label the tags to various assets, and after the labeling is completed, data assets may be searched from different tag systems. It is also contemplated that the tag directory hierarchy may be defined from the perspective of business lines, data life cycles, and the like.

S104, according to the blood relationship of each metadata entity, at least one of an activity label, an influence label and a similar label is obtained and added aiming at each metadata entity.

In one embodiment, the obtaining and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity includes: acquiring the number of references, the reference frequency and the referrer weight of each metadata entity according to the blood relationship of each metadata entity; and acquiring the activity label of each metadata entity according to the number of references, the reference frequency and the referrer weight.

Specifically, for the metadata activity label, an activity calculation rule is set, which contains the number of times of reference, the frequency of reference and the weight of the referrer.
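One possible form of such an activity calculation rule is a weighted sum of the three quantities followed by thresholding. The text does not specify the rule, so the weights, thresholds, label names and the assumption that all inputs are normalized to the range 0 to 1 are purely illustrative.

```python
def activity_score(ref_count, ref_frequency, referrer_weight,
                   w_count=0.4, w_freq=0.4, w_weight=0.2):
    """Weighted sum of three inputs, each assumed to be normalized to [0, 1].

    The weights are illustrative assumptions, not values from the source.
    """
    return w_count * ref_count + w_freq * ref_frequency + w_weight * referrer_weight

def activity_label(score, high=0.7, medium=0.3):
    """Map a score to a hypothetical three-level activity label."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"

# A frequently referenced table, a moderately used one and an unused one.
labels = [
    activity_label(activity_score(1.0, 1.0, 1.0)),
    activity_label(activity_score(0.5, 0.5, 0.5)),
    activity_label(activity_score(0.0, 0.0, 0.0)),
]
```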

In one embodiment, the obtaining and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity includes: acquiring one or more of the centrality, the betweenness and the closeness of each metadata entity according to the blood relationship of each metadata entity; and acquiring the influence label of each metadata entity according to one or more of the centrality, the betweenness and the closeness.

Centrality is an index for measuring the importance of a node. There are three basic measures of centrality: degree centrality, betweenness centrality and eigenvector centrality. Degree centrality is the most direct measure of node centrality in network analysis: the larger the degree of a node, the higher its degree centrality and the more important the node is in the network.

Degree centrality: a node that is in direct contact with many other nodes occupies a central position. That is, the broader the relationships of a node and the more adjacent nodes it has, the more important the node is.

Betweenness centrality, i.e., intermediary centrality, refers to the number of shortest paths between other nodes that pass through a node. Such a node acts as a gateway: nodes connected to it must pass through it to reach other nodes.

Closeness centrality, i.e., closeness, reflects how close a node is to the other nodes. The closer a node is to the other nodes, the less it needs to rely on other nodes when propagating information; because its distance to every point in the network is short, it is not constrained by other nodes.
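The three measures described above can be sketched for a small undirected, unweighted graph as follows. This is a minimal illustration rather than the implementation used in the embodiment: the adjacency-list format and node names are assumptions, the betweenness routine follows Brandes' well-known algorithm, and the normalizations chosen are one common convention.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Reachable node count divided by the sum of shortest-path distances."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness_centrality(adj):
    """Unnormalized betweenness (Brandes' algorithm, undirected graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical star-shaped lineage: table "b" is linked to "a", "c" and "d".
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
```

In the star example, node "b" has the highest value on all three measures, which matches the intuition that a heavily connected hub table is the most influential.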

Nodes are a graph database concept and here correspond to libraries (databases), tables, fields, views and the like; relationships are those between libraries and tables, between libraries, and so on. In database design, nodes appear as entities. An entity in a database usually refers to a collection of things of some kind; it can be a specific person or thing, or an abstract concept or relationship.

Since centrality, betweenness and closeness all represent the importance of a metadata entity in the network, different criteria can be used in different application scenarios, and this embodiment imposes no specific limitation. In addition, another method for ranking the importance of nodes in a network is the PageRank algorithm.

PageRank, i.e., web page ranking, is the algorithm originally used by Google to rank web pages; it indicates how important a web page is by treating links as votes. The calculation is not complicated: before the first iteration, every vertex sets its PageRank value to 1; in each iteration, each vertex divides its current PageRank value by the number of its outgoing edges and contributes the result to each of its neighbors as a vote, and then accumulates all the votes it receives from its neighbors as its new PageRank value; these steps are repeated until the change in the PageRank values of all vertices between two adjacent rounds falls below a given threshold. PageRank takes web pages as vertices and hyperlinks between web pages as edges, so the whole Internet can be modeled as a very large graph. When a search engine returns results, it needs to consider the quality of the web page itself in addition to the degree of correlation between the page content and the keywords.
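The iteration described above can be sketched as follows. The damping factor is not mentioned in the text; it is added here as an assumption (0.85 is the commonly used value) so that the iteration converges on graphs where the undamped version would oscillate. The graph, the function name and the tolerance are likewise illustrative.

```python
def pagerank(out_edges, damping=0.85, tol=1e-6, max_iter=200):
    """Iterative PageRank over a dict mapping each vertex to its out-neighbors."""
    nodes = set(out_edges)
    for targets in out_edges.values():
        nodes.update(targets)
    rank = {v: 1.0 for v in nodes}  # every vertex starts at 1
    for _ in range(max_iter):
        votes = {v: 0.0 for v in nodes}
        for v, targets in out_edges.items():
            if targets:
                # Split the current rank evenly over the outgoing edges.
                share = rank[v] / len(targets)
                for t in targets:
                    votes[t] += share
            # Vertices without outgoing edges cast no votes in this sketch.
        new_rank = {v: (1 - damping) + damping * votes[v] for v in nodes}
        change = max(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:  # stop once no value moved by more than tol
            break
    return rank

# Hypothetical reference graph: "a" and "b" feed "c", and "c" feeds "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Here "c" ends up with the highest value because it receives votes from two vertices, while "b", which nothing refers to, keeps only the baseline value.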

In one embodiment, the obtaining and adding at least one of an activity tag, an influence tag and a similar tag for each metadata entity according to the blood relationship of each metadata entity includes: performing clustering analysis on the plurality of metadata entities to obtain a clustering result; and acquiring the influence label of each metadata entity according to the clustering result.

Clustering is an important unsupervised technique in machine learning that partitions data points into a number of groups. In theory, data points that fall into the same group share similar characteristics, while data points in different groups have different characteristics.

In this embodiment, a clustering result may be obtained by adopting a density clustering method or a community clustering method.

Wherein, density clustering examines the connectivity among samples from the perspective of sample density, and continuously expands connectable samples until the final clustering result is obtained.

Community discovery algorithm based on spectral analysis: the graph is represented by a specific matrix built from its adjacency matrix and a diagonal matrix, such as the Laplacian matrix L = D - W, where D is the diagonal matrix whose diagonal elements are the node degrees and W is the adjacency matrix of the graph. The eigenvector components corresponding to each node are taken as spatial coordinates, so that the nodes in the network are mapped into a multi-dimensional feature vector space, and a traditional clustering method is then used to cluster the nodes into communities.
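As a small illustration of the first step only, the Laplacian matrix L = D - W can be built from an adjacency matrix as sketched below. The example graph is hypothetical, and the subsequent eigenvector computation and clustering, which would normally rely on a numerical library, are deliberately left out.

```python
def laplacian(W):
    """Return L = D - W for a symmetric 0/1 adjacency matrix W (list of lists)."""
    n = len(W)
    degrees = [sum(row) for row in W]  # diagonal entries of D
    return [[(degrees[i] if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical graph: a triangle (0, 1, 2) with vertex 3 attached to vertex 2.
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
L = laplacian(W)
```

Every row of the resulting matrix sums to zero, which is a quick sanity check that the degrees and adjacency entries are consistent.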

In one embodiment, the obtaining and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity includes: obtaining a calculation result of the similarity degree between the metadata entities according to the blood relationship of the metadata entities; and obtaining the similar labels of the metadata entities according to the calculation result of the similarity degree between the metadata entities.

Illustratively, in this embodiment, a similarity algorithm is used for modeling to obtain the calculation result of the similarity between metadata entities. Similarity measurement is a common algorithm in natural language processing, data mining and machine learning, and is the basis of text computation. Similarity measures help developers discover relationships in data and rest on two core points: the feature representation of the data, and the way the relationship between sets is represented.
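The text does not name a particular similarity algorithm. As one hedged example, the Jaccard coefficient over the sets of upstream entities can serve as a similarity measure; the entity names, the `upstream` mapping and the 0.5 threshold below are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical upstream (source) entities of three report tables.
upstream = {
    "rpt.sales": {"dw.orders", "dw.customers"},
    "rpt.revenue": {"dw.orders", "dw.customers", "dw.prices"},
    "rpt.hr": {"dw.employees"},
}

def similar_entities(name, threshold=0.5):
    """Entities whose upstream set is at least `threshold` similar to `name`."""
    return sorted(
        other for other in upstream
        if other != name and jaccard(upstream[name], upstream[other]) >= threshold
    )

similar_to_sales = similar_entities("rpt.sales")
```

Entities returned by `similar_entities` could then receive the same similar label, since they are derived from largely the same sources.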

In this embodiment, artificial intelligence modeling is used in the process of obtaining each label of a metadata entity, so that the low efficiency, high cost and long cycle of manual labeling can be overcome.

In one embodiment, before obtaining the dimension label corresponding to each metadata entity according to the relationship between the plurality of metadata entities, the relationship between the plurality of metadata entities is obtained by one or more of parsing a data dictionary, parsing an SQL statement, parsing a database, and parsing an audit log.

Specifically, the first method is to access the data dictionary tables of the database to obtain user permission information as well as information such as table names, field names, data types, primary keys and foreign keys of the base tables, to define all base tables and data items as entities in the data map, and to construct library/table relationships, table/field relationships, inter-table foreign key relationships, and relationships between users and data.

Here, a table means the definition of a table, i.e., the settings of its attributes. In a relational database, a table is also called a "relation".

A data type is a classification of data by data structure: data with the same data structure belong to the same class, and that class is called a data type. Illustratively, in the MySQL relational database management system there are three main categories of data types: text, numeric, and date/time.

A view is a table constructed, according to some condition, from one or more base tables (tables that actually store data). It is not a real table; in essence it is only a SELECT statement.

A stored procedure is a set of SQL statements that performs a specific function and is stored in the database, typically in large database systems. After being compiled the first time, it does not need to be compiled again on later calls; a user calls a stored procedure by specifying its name and supplying parameters (if the stored procedure has parameters).

Relation: one relation corresponds to what is commonly called a table.

A primary key is a column or combination of columns in a table whose value uniquely identifies each row in the table.

A foreign key constrains the value of a field in one table so that it must come from the values of a primary key field in another table.
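The data dictionary approach can be sketched as turning dictionary rows into entities and typed relationships. The row layout, field names and relationship names (`has_table`, `has_field`, `foreign_key`) below are hypothetical; a real data dictionary would be read from the system catalog of the database in question.

```python
# Hypothetical rows as they might be read from a data dictionary.
rows = [
    {"db": "sales", "table": "orders", "column": "id", "type": "INT",
     "pk": True, "fk": None},
    {"db": "sales", "table": "orders", "column": "customer_id", "type": "INT",
     "pk": False, "fk": ("customers", "id")},
    {"db": "sales", "table": "customers", "column": "id", "type": "INT",
     "pk": True, "fk": None},
]

def build_relations(rows):
    """Derive entities and (source, relation, target) triples from the rows."""
    entities, relations = set(), set()
    for r in rows:
        db = r["db"]
        table = f'{db}.{r["table"]}'
        column = f'{table}.{r["column"]}'
        entities.update([db, table, column])
        relations.add((db, "has_table", table))      # library/table relationship
        relations.add((table, "has_field", column))  # table/field relationship
        if r["fk"]:
            ref_table, _ref_column = r["fk"]
            # Inter-table foreign key relationship.
            relations.add((table, "foreign_key", f"{db}.{ref_table}"))
    return entities, relations

entities, relations = build_relations(rows)
```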

The second method is to analyze the source, destination and processing of data streams through SQL statements, so as to construct stored-procedure relationships among tables, function relationships among tables/fields, and view relationships among tables.
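As a deliberately simplified sketch of extracting the source and destination of a data stream from an SQL statement, the following uses regular expressions to find the target of an INSERT or CREATE statement and the tables named after FROM and JOIN. The patterns and table names are assumptions; production lineage extraction would need a full SQL parser to handle subqueries, aliases, comments and dialect differences.

```python
import re

# Target table of INSERT INTO / CREATE TABLE / CREATE VIEW statements.
TARGET_RE = re.compile(
    r"\b(?:insert\s+into|create\s+(?:table|view))\s+([\w.]+)", re.I)
# Source tables named directly after FROM or JOIN.
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.I)

def lineage_edges(sql):
    """Return (source_table, target_table) pairs found in one statement."""
    target = TARGET_RE.search(sql)
    if not target:
        return []
    return [(src, target.group(1)) for src in SOURCE_RE.findall(sql)]

sql = (
    "INSERT INTO dw.order_summary "
    "SELECT o.id, c.name FROM ods.orders o "
    "JOIN ods.customers c ON o.customer_id = c.id"
)
sql_edges = lineage_edges(sql)
```

Each returned pair can be treated as one edge of the blood relationship, pointing from a source table to the table it feeds.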

The third method is to obtain information such as tables, columns, data types, views, stored procedures, relations, primary keys and foreign keys through Schema analysis, supplementing the data dictionary and SQL statement analysis results to obtain a more comprehensive set of metadata association relationships.

The fourth method is to analyze the audit log to obtain the relationships between users and the libraries, tables, fields and views they access, together with the time and frequency of access.

Specifically, in data assets, the more frequently users access a given library, table or field, and the more important the accessing department, the more important that data is. When metadata tags are applied, this information is needed as one of the bases for tagging.
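A minimal sketch of turning audit log records into an importance signal is shown below: each access is counted and weighted by the importance of the accessing department. The record layout, department names and weights are hypothetical, and a real audit log would first need to be parsed into such records.

```python
from collections import defaultdict

def weighted_access(records, dept_weight):
    """Sum department weights over all accesses to each data object."""
    score = defaultdict(float)
    for _user, dept, obj in records:
        # Departments without a configured weight count as 1.0.
        score[obj] += dept_weight.get(dept, 1.0)
    return dict(score)

# Hypothetical parsed audit records: (user, department, accessed object).
records = [
    ("u1", "finance", "sales.orders"),
    ("u2", "ops", "sales.orders"),
    ("u3", "ops", "sales.customers"),
]
dept_weight = {"finance": 2.0, "ops": 1.0}  # assumed importance weights
access_scores = weighted_access(records, dept_weight)
```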

In one embodiment, each metadata entity is taken as a vertex, and the relationships between the metadata entity and other metadata entities in the blood relationship corresponding to each metadata entity are taken as edges, so as to generate a relationship graph; the relationship graph is then stored in a graph database.

A graph is an abstract data structure for representing association relationships between objects. It is described using vertices and edges, where a vertex represents an object and an edge represents a relationship between objects. Data that can be abstracted into a graph description is graph data.

Graph databases provide efficient association queries. In the data map stored in the graph database, the other entities related to an entity can be quickly obtained by querying the edges of the entity and the labels on those edges. This avoids complicated join operations across multiple tables, makes relationship queries more convenient and faster, and significantly improves efficiency.
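The vertex-and-edge model and the label-based query can be illustrated with a small in-memory structure. This is only a sketch of the data model, not a graph database; the class `RelationGraph`, the edge labels and the entity names are assumptions made for the example.

```python
from collections import defaultdict


class RelationGraph:
    """Minimal labeled directed graph: vertices are entities, edges are relationships."""

    def __init__(self):
        self.vertices = set()
        self._out = defaultdict(list)  # vertex -> [(edge label, neighbor), ...]

    def add_edge(self, src, label, dst):
        self.vertices.update((src, dst))
        self._out[src].append((label, dst))

    def neighbors(self, vertex, label=None):
        """Entities reachable over one edge, optionally filtered by edge label."""
        return [
            dst
            for edge_label, dst in self._out.get(vertex, [])
            if label is None or edge_label == label
        ]


graph = RelationGraph()
graph.add_edge("ods.orders", "blood", "dw.sales_summary")
graph.add_edge("ods.orders", "foreign_key", "ods.customers")
graph.add_edge("alice", "accesses", "dw.sales_summary")

print(graph.neighbors("ods.orders"))
print(graph.neighbors("ods.orders", label="foreign_key"))
```

Looking up related entities is a single dictionary access followed by a filter on the edge label, which mirrors how a query over edges avoids joining several tables.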

The data map is a metadata relationship graph constructed on the basis of a graph database and all metadata information. The entities it contains are metadata entities such as libraries, tables and fields, and management entities such as visitors and owners. The relationships it contains are relationships among the metadata entities, such as foreign key relationships, Schema relationships and blood relationships, as well as relationships between the metadata entities and data visitors.

Illustratively, referring to fig. 2, the obtained entity and relationship data are stored in the form of an adjacency matrix, and the adjacency matrix is split by rows and stored on the physical nodes of the big data platform. In the matrix, a value of '1' indicates that a relationship exists between two entities and a value of '0' indicates that no relationship exists, so a one-degree relationship can be read directly from the adjacency matrix.

Specifically, each row of the split matrix stored on a physical node of the big data platform is stored contiguously in a data block on the file system. To save space, the value '0' is not stored; positions are indicated by position marks instead.
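The row-wise storage can be sketched as follows. The example assumes that each row is assigned to a node by `row_index % num_nodes` and that a row is compressed to the list of column positions holding '1'; both choices, and the helper names, are assumptions, since the application does not specify the assignment rule or the exact position mark format.

```python
# Entities indexed 0..3; the matrix is symmetric because relationships here are mutual.
entities = ["db", "table_a", "table_b", "field_x"]
matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]


def compress_row(row):
    """Keep only the positions of '1' values; '0' values are not stored."""
    return [col for col, value in enumerate(row) if value == 1]


def partition_rows(adjacency, num_nodes):
    """Split the matrix by rows across nodes (assumed rule: row index modulo node count)."""
    parts = [dict() for _ in range(num_nodes)]
    for index, row in enumerate(adjacency):
        parts[index % num_nodes][index] = compress_row(row)
    return parts


parts = partition_rows(matrix, 2)

# One-degree relationships of "db" are read directly from its stored row.
one_degree = [entities[col] for col in parts[0][0]]
print(one_degree)
```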

Given a main entity, the corresponding row is located on the cluster physical node that holds the corresponding part of the split matrix, which yields the associated entities having a layer-1 relationship with the main entity. Through message passing among the cluster nodes, the values of these associated entities are sent to the physical nodes holding the corresponding matrix rows in order to find the layer-2 relationships, and so on, until all entities associated with the main entity up to the specified level have been found.
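The level-by-level search can be sketched as a breadth-first traversal over partitioned rows. Message passing between cluster nodes is simulated here by a direct lookup in the partition that owns a row; the partition layout, the ownership rule `entity % num_nodes` and the function `find_related` are assumptions made for illustration.

```python
# Assumed layout: two nodes, each holding rows as {row index: positions of '1' values}.
parts = [
    {0: [1, 2], 2: [0]},  # rows owned by node 0
    {1: [0, 3], 3: [1]},  # rows owned by node 1
]


def find_related(partitions, start, max_level):
    """Return {level: [entities first reached at that level]} up to max_level."""
    num_nodes = len(partitions)
    seen = {start}
    frontier = [start]
    levels = {}
    for level in range(1, max_level + 1):
        next_frontier = []
        for entity in frontier:
            # Stand-in for sending the entity value to the node that owns its row.
            row = partitions[entity % num_nodes].get(entity, [])
            for neighbor in row:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # nothing new to discover at deeper levels
        levels[level] = next_frontier
        frontier = next_frontier
    return levels


related = find_related(parts, start=0, max_level=2)
print(related)
```

Tracking the set of already visited entities keeps an entity from being reported again at a deeper level.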

By applying the embodiment of the application, a plurality of metadata entities are obtained first; then a dimension label corresponding to each metadata entity is obtained according to the relationships among the metadata entities; a blood relationship corresponding to each metadata entity is obtained according to the dimension label corresponding to each metadata entity; and finally, at least one of an activity label, an influence label and a similar label is acquired and added for each metadata entity according to the blood relationship of each metadata entity. Compared with the traditional way of generating data tags, the method expresses the relationships among the metadata entities as data and includes the dimension reflecting the association relationships of the metadata entities in the data asset tag system, so that the various metadata entities and their corresponding blood relationships can be completely obtained and a complete metadata tag system is constructed.

In one embodiment, as shown in FIG. 3, there is provided an apparatus for building a metadata tag library, the apparatus comprising:

A metadata entity obtaining module 301, configured to obtain a plurality of metadata entities.

A first obtaining module 302, configured to obtain, according to a relationship between the multiple metadata entities, a dimension tag corresponding to each metadata entity; the dimension tag is used to indicate a dimension of a relationship between the metadata entity and another metadata entity.

A second obtaining module 303, configured to obtain a blood relationship corresponding to each metadata entity according to the dimension tag corresponding to each metadata entity.

A third obtaining module 304, configured to obtain and add at least one of an activity label, an influence label, and a similar label for each metadata entity according to a blood relationship of each metadata entity.

As an optional implementation of the embodiment of the present invention, acquiring and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to the blood relationship of each metadata entity includes: acquiring the number of times each metadata entity is referenced, its reference frequency and its referrer weight according to the blood relationship of each metadata entity; and acquiring the activity tag of each metadata entity according to the number of times referenced, the reference frequency and the referrer weight.
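One way to combine the three quantities into an activity tag is a weighted sum followed by thresholds. The application does not give a formula, so the linear combination, the weights, the thresholds and the tag names below are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class ReferenceStats:
    times_referenced: int      # how many times the entity is referenced
    references_per_day: float  # reference frequency (assumed unit: per day)
    referrer_weight: float     # importance of the referrers (assumed scale)


def activity_score(stats, w_times=0.4, w_freq=0.4, w_referrer=0.2):
    """Assumed linear combination; the weights are illustrative only."""
    return (
        w_times * stats.times_referenced
        + w_freq * stats.references_per_day
        + w_referrer * stats.referrer_weight
    )


def activity_tag(score, high=10.0, medium=3.0):
    """Map a score to a tag using assumed thresholds."""
    if score >= high:
        return "high activity"
    if score >= medium:
        return "medium activity"
    return "low activity"


busy_table = ReferenceStats(times_referenced=20, references_per_day=5.0, referrer_weight=2.0)
quiet_table = ReferenceStats(times_referenced=2, references_per_day=0.5, referrer_weight=1.0)

print(activity_tag(activity_score(busy_table)))
print(activity_tag(activity_score(quiet_table)))
```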

As an optional implementation of the embodiment of the present invention, acquiring and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to the blood relationship of each metadata entity includes: acquiring one or more of the centrality, the betweenness and the closeness of each metadata entity according to the blood relationship of each metadata entity; and acquiring the influence tag of each metadata entity according to one or more of the centrality, the betweenness and the closeness.
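The three graph measures can be computed on a small example as follows. The sketch treats the blood relationship as an undirected, unweighted graph, uses plain degree as the centrality, breadth-first search for closeness, and Brandes' algorithm for betweenness; these are common textbook definitions and are assumed here, since the application does not fix the exact formulas. The entity names are hypothetical.

```python
from collections import deque

# Assumed undirected blood relationship graph as an adjacency list.
adj = {
    "orders": ["summary"],
    "customers": ["summary"],
    "summary": ["orders", "customers", "report"],
    "report": ["summary"],
}


def degree_centrality(graph):
    return {v: len(neighbors) for v, neighbors in graph.items()}


def closeness_centrality(graph):
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        # Reachable vertices (excluding the source) divided by total distance.
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted graphs; halved for undirected pairs."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: value / 2 for v, value in score.items()}


degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
most_influential = max(adj, key=lambda v: betweenness[v])
print(most_influential, degree[most_influential])
```

In this example the entity through which all other entities are connected scores highest on every measure, which is the kind of entity an influence tag is meant to surface.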

As an optional implementation of the embodiment of the present invention, acquiring and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to the blood relationship of each metadata entity includes: performing cluster analysis on the plurality of metadata entities to obtain a clustering result; and acquiring the influence tag of each metadata entity according to the clustering result.
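The application does not name a clustering algorithm. As a stand-in, the sketch below groups metadata entities by connected components of an undirected relationship graph, which is one of the simplest ways to cluster graph data; the algorithm choice, the entity names and the function names are assumptions. How a tag is derived from the clustering result is likewise not specified, so the example only reports each entity's cluster index and cluster size.

```python
# Assumed undirected relationship graph as an adjacency list.
adj = {
    "orders": ["summary"],
    "customers": ["summary"],
    "summary": ["orders", "customers"],
    "inventory": ["stock_report"],
    "stock_report": ["inventory"],
    "scratch_table": [],
}


def connected_components(graph):
    """Group vertices into clusters; two vertices share a cluster if a path joins them."""
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        clusters.append(sorted(component))
    return clusters


def cluster_membership(clusters):
    """Return {entity: (cluster index, cluster size)}."""
    return {
        entity: (index, len(members))
        for index, members in enumerate(clusters)
        for entity in members
    }


clusters = connected_components(adj)
membership = cluster_membership(clusters)
print(clusters)
print(membership["summary"])
```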

As an optional implementation of the embodiment of the present invention, acquiring and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to the blood relationship of each metadata entity includes: calculating the degree of similarity between the metadata entities according to the blood relationships of the metadata entities; and obtaining the similar tags of the metadata entities according to the calculated degree of similarity between the metadata entities.
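A simple way to compute a degree of similarity from blood relationships is the Jaccard index over the sets of upstream entities. The application does not specify the similarity measure, so the Jaccard index, the threshold of 0.5 and the entity names below are assumptions made for illustration.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical upstream entities of each report in the blood relationship.
upstream = {
    "report_a": {"orders", "customers"},
    "report_b": {"orders", "customers", "products"},
    "report_c": {"inventory"},
}


def similar_tags(neighborhoods, threshold=0.5):
    """Return {entity: [entities whose similarity reaches the assumed threshold]}."""
    tags = {name: [] for name in neighborhoods}
    names = sorted(neighborhoods)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(neighborhoods[first], neighborhoods[second]) >= threshold:
                tags[first].append(second)
                tags[second].append(first)
    return tags


tags = similar_tags(upstream)
print(tags)
```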

As an optional implementation of the embodiment of the present invention, before obtaining the dimension tag corresponding to each metadata entity according to the relationships among the plurality of metadata entities, the method further includes: acquiring the relationships among the plurality of metadata entities in one or more of the following ways: analyzing a data dictionary, analyzing SQL statements, analyzing a database, and analyzing an audit log.

As an optional implementation of the embodiment of the present invention, the method further includes: generating a relationship graph by taking each metadata entity as a vertex and taking the relationships between the metadata entity and other metadata entities in the blood relationship corresponding to each metadata entity as edges; and storing the relationship graph in a graph database.

For the specific limitations of the apparatus for constructing a metadata tag library, refer to the limitations of the method for constructing a metadata tag library above; they are not repeated here. The modules in the above apparatus for constructing a metadata tag library may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a notebook computer, and its internal structure diagram may be as shown in fig. 4. The computer device comprises a processor, a memory, a communication interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The computer program, when executed by a processor, implements a method of building a metadata tag library. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, the apparatus for constructing a metadata tag library provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 4. The memory of the computer device may store the program modules constituting the apparatus for constructing a metadata tag library, such as the metadata entity obtaining module, the first obtaining module, the second obtaining module, and the third obtaining module shown in fig. 3. The computer program formed by these program modules causes the processor to execute the steps of the method for constructing a metadata tag library of the embodiments of the present application described in this specification.

For example, the computer device shown in fig. 4 may perform step S101 through the metadata entity obtaining module of the apparatus for constructing a metadata tag library shown in fig. 3. The computer device may perform step S102 through the first obtaining module. The computer device may perform step S103 through the second obtaining module. The computer device may perform step S104 through the third obtaining module.

In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor, the processor implementing the following steps when executing the computer program: obtaining a plurality of metadata entities; obtaining a dimension label corresponding to each metadata entity according to the relationships among the metadata entities, the dimension label being used for indicating a dimension of a relationship between the metadata entity and another metadata entity; obtaining a blood relationship corresponding to each metadata entity according to the dimension label corresponding to each metadata entity; and acquiring and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity.

In one embodiment, the processor, when executing the computer program, further implements the following: acquiring and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity includes: acquiring the number of times each metadata entity is referenced, its reference frequency and its referrer weight according to the blood relationship of each metadata entity; and acquiring the activity label of each metadata entity according to the number of times referenced, the reference frequency and the referrer weight.

In one embodiment, the processor, when executing the computer program, further implements the following: acquiring and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity includes: acquiring one or more of the centrality, the betweenness and the closeness of each metadata entity according to the blood relationship of each metadata entity; and acquiring the influence label of each metadata entity according to one or more of the centrality, the betweenness and the closeness.

In one embodiment, the processor, when executing the computer program, further implements the following: acquiring and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity includes: performing cluster analysis on the plurality of metadata entities to obtain a clustering result; and acquiring the influence label of each metadata entity according to the clustering result.

In one embodiment, the processor, when executing the computer program, further implements the following: acquiring and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity includes: calculating the degree of similarity between the metadata entities according to the blood relationships of the metadata entities; and obtaining the similar labels of the metadata entities according to the calculated degree of similarity between the metadata entities.

In one embodiment, the processor, when executing the computer program, further performs the following step: before obtaining the dimension label corresponding to each metadata entity according to the relationships among the plurality of metadata entities, acquiring the relationships among the plurality of metadata entities in one or more of the following ways: analyzing a data dictionary, analyzing SQL statements, analyzing a database, and analyzing an audit log.

In one embodiment, the processor, when executing the computer program, further performs the steps of: generating a relationship graph by taking each metadata entity as a vertex and taking the relationships between the metadata entity and other metadata entities in the blood relationship corresponding to each metadata entity as edges; and storing the relationship graph in a graph database.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, performs the steps of: obtaining a plurality of metadata entities; obtaining a dimension label corresponding to each metadata entity according to the relationships among the metadata entities, the dimension label being used for indicating a dimension of a relationship between the metadata entity and another metadata entity; obtaining a blood relationship corresponding to each metadata entity according to the dimension label corresponding to each metadata entity; and acquiring and adding at least one of an activity label, an influence label and a similar label for each metadata entity according to the blood relationship of each metadata entity.

In one embodiment, the computer program, when executed by the processor, further performs the steps of: generating a relationship graph by taking each metadata entity as a vertex and taking the relationships between the metadata entity and other metadata entities in the blood relationship corresponding to each metadata entity as edges; and storing the relationship graph in a graph database.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

It will be understood by those of ordinary skill in the art that all or part of the processes described above in the example methods may be implemented by hardware associated with instructions of a computer program, which may be stored in a non-volatile computer-readable storage medium, and which, when executed, may comprise processes according to the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile Memory may include a ROM (Read-Only Memory), a magnetic tape, a floppy disk, a flash Memory, an optical Memory, or the like. Volatile Memory can include RAM (Random Access Memory) or external cache Memory. By way of illustration and not limitation, RAM is available in many forms, such as SRAM (Static Random Access Memory), DRAM (Dynamic Random Access Memory), and the like.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this specification.

The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of building a metadata tag library, the method comprising:

obtaining a plurality of metadata entities;

2. The method of claim 1, wherein the obtaining and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to the blood relationship of each metadata entity comprises:

3. The method of claim 1, wherein the obtaining and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to the blood relationship of each metadata entity comprises:

4. The method of claim 1, wherein the obtaining and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to the blood relationship of each metadata entity comprises:

5. The method of claim 1, wherein obtaining and adding at least one of an activity tag, an influence tag, and a similar tag for each metadata entity according to a blood relationship of each metadata entity comprises:

6. The method of claim 1, wherein before obtaining the dimension label corresponding to each metadata entity according to the relationship between the plurality of metadata entities, the method further comprises:

7. The method according to any one of claims 1-6, further comprising:

and storing the relationship graph into the graph database.

8. An apparatus for building a metadata tag library, the apparatus comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the method of building a metadata tag library of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of building a metadata tag library according to any one of claims 1 to 7.