CN116450757A

CN116450757A - Method, device, equipment and storage medium for determining evaluation index of data asset

Info

Publication number: CN116450757A
Application number: CN202310723013.0A
Authority: CN
Inventors: 曾标; 张伟宁; 魏强; 陈其宇; 覃刚
Original assignee: Shenzhen Suoxinda Data Technology Co ltd
Current assignee: Shenzhen Suoxinda Data Technology Co ltd
Priority date: 2023-06-19
Filing date: 2023-06-19
Publication date: 2023-07-18

Abstract

Embodiments of the invention disclose a method, a device, equipment and a storage medium for determining an evaluation index of a data asset. The method determines evaluation indexes of a target data asset in at least three dimensions, such as a target value index, a target cost index and a target cost-performance ratio. This enriches the evaluation indexes of data assets, makes the exploration of data assets clearer, and provides multidimensional indexes for managing or modifying data assets. The evaluation indexes are obtained by analyzing the data lineage network information, resource consumption information and activity information of the target data asset in the operation and maintenance log information of a database, which improves their authenticity and accuracy and provides a reliable reference for subsequently reducing the storage pressure and resource occupation of storage media. Finally, the visualized evaluation indexes are displayed on a preset display terminal so that users can review them more clearly and intuitively, further improving the efficiency of exploring the database.

Description

Method, device, equipment and storage medium for determining evaluation index of data asset

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining an evaluation index of a data asset.

Background

Driven by technologies such as the mobile internet and cloud computing, people's ability to acquire and control data keeps growing, and society has entered an era of creating, acquiring, and using data. A sales platform can push accurate recommendations based on buyers' browsing records to increase sales; a manufacturer can analyze production-line data to adjust production promptly and improve efficiency; a home-services company can analyze customers' living-habit data to build a "smart home" and improve service quality. These applications show that data can generate enormous value once it is effectively mined and integrated.

The data volume of enterprise-level data warehouses grows year by year, which increases the storage pressure on storage media, and data that no longer needs to be retained keeps occupying storage resources, wasting them. However, because the usable value of data assets cannot be clearly explored, data warehouses cannot be effectively managed or rectified, so the resulting increase in storage pressure and resource occupation has not been effectively solved.

Disclosure of Invention

The main object of the invention is to provide a method, a device, equipment and a storage medium for determining an evaluation index of a data asset, which solve the prior-art problem that data warehouses cannot be effectively managed or modified, so that the storage pressure on storage media increases and resources remain occupied.

To achieve the above object, a first aspect of the present invention provides a method for determining an evaluation index of a data asset, the method comprising:

acquiring operation and maintenance log information of a target database, wherein the target database comprises a plurality of target data assets, the operation and maintenance log information at least comprises data lineage network information, resource consumption information and activity information of each target data asset, and the data lineage network information reflects the dependency relationships of the target data assets;

performing value evaluation using the data lineage network information and the activity information to determine a target value index of the target data asset;

performing cost evaluation using the resource consumption information to determine a target cost index of the target data asset;

determining a target cost-performance ratio of the target data asset according to the target value index and the target cost index; and

performing visualization processing on the target value index, the target cost index and the target cost-performance ratio to generate target visual prompt data, and outputting the target visual prompt data to a preset display terminal, wherein the preset display terminal receives and displays the target visual prompt data, the target visual prompt data are visual data reflecting a target evaluation index, and the target evaluation index at least comprises the target value index, the target cost index and the target cost-performance ratio.

In one possible implementation, determining the target cost-performance ratio of the target data asset according to the target value index and the target cost index comprises:

determining a target ratio of the target value index to the target cost index, wherein the target cost-performance ratio comprises the target ratio.
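The ratio in the step above reduces to a single division. The following Python sketch is illustrative only: the function name is hypothetical, and the small `eps` floor on the denominator is an assumption for assets whose cost index is zero, a case the source does not address.

```python
def cost_performance(value_index: float, cost_index: float, eps: float = 1e-9) -> float:
    """Return the target cost-performance ratio value_index / cost_index.

    The eps floor on the denominator is a hypothetical guard for assets whose
    cost index is zero; the source does not specify how that case is handled.
    """
    return value_index / max(cost_index, eps)
```

For example, a value index of 0.8 and a cost index of 0.4 give a ratio of 2.0.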

In one possible implementation, performing value evaluation using the data lineage network information and the activity information to determine the target value index of the target data asset comprises:

performing influence evaluation according to the data lineage network information to determine a target influence index of the target data asset;

performing activity evaluation according to the activity information to determine a target activity index of the target data asset; and

determining the target value index of the target data asset according to the target influence index and the target activity index.
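The source does not state how the target influence index and the target activity index are combined. A minimal sketch, assuming both indexes are already scaled to [0, 1] and merged by a linear weighted sum with a hypothetical weight `alpha`:

```python
def value_index(influence: float, activity: float, alpha: float = 0.5) -> float:
    """Combine a target influence index and a target activity index.

    Assumes both inputs lie in [0, 1]. The linear blend and the default
    alpha = 0.5 are illustrative choices, not taken from the source.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * influence + (1.0 - alpha) * activity
```

A larger `alpha` favors tables that many other tables depend on; a smaller one favors frequently accessed tables. Any real weighting would have to be tuned on the warehouse in question.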

In one possible implementation, the activity information at least comprises target access amount data of the target data asset, the target access amount data at least comprise a correspondence between access types and access amounts, the target evaluation index further comprises a value grade of the target data asset, and the value grade reflects the overall importance of the target data asset;

the method further comprises:

determining an access level for each access type, wherein the access level is proportional to the value grade; and

clustering the value grades according to the access amount of each access type of each target data asset, the access levels, and a preset K-means clustering algorithm, to determine a target value grade of each target data asset.
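The text names a K-means algorithm but not the clustering features. The sketch below is one possible reading under stated assumptions: each access type has a numeric access level (type names and levels are hypothetical), an asset's access amounts are collapsed into one level-weighted score, the scores are clustered in one dimension with plain Lloyd's K-means (k = 4, matching the four grades shown in FIG. 2), and clusters are ranked so that grade 1 is the cluster with the highest centroid.

```python
from typing import Dict, List, Tuple


def weighted_access_score(access: Dict[str, float], levels: Dict[str, float]) -> float:
    """Sum of access amounts, each scaled by the (assumed numeric) level of its access type."""
    return sum(levels.get(kind, 0.0) * amount for kind, amount in access.items())


def kmeans_1d(xs: List[float], k: int, iters: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's K-means on one-dimensional data; returns (labels, centroids)."""
    if not 1 <= k <= len(xs):
        raise ValueError("need 1 <= k <= number of points")
    srt = sorted(xs)
    # Deterministic start: centroids spread evenly over the sorted scores.
    centroids = [srt[(len(srt) - 1) * i // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in xs]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def value_grades(scores: Dict[str, float], k: int = 4) -> Dict[str, int]:
    """Map each table to grade 1..k, where grade 1 is the cluster with the highest centroid."""
    names = list(scores)
    labels, centroids = kmeans_1d([scores[n] for n in names], k)
    order = sorted(range(k), key=lambda c: centroids[c], reverse=True)
    grade_of = {c: rank + 1 for rank, c in enumerate(order)}
    return {n: grade_of[lab] for n, lab in zip(names, labels)}
```

A one-dimensional score keeps the grades ordinal; clustering on the raw per-type access vector instead would need an extra rule to order clusters from most to least valuable.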

In one possible implementation, the resource consumption information at least comprises resource consumption data for each resource type; the resource types at least comprise central processing unit (CPU) resources, read-write resources, buffer space resources and disk space resources; and the resource consumption data at least comprise time consumption data of the CPU resources, the number of reads and writes of the read-write resources, first occupation data of the buffer space resources, and second occupation data of the disk space resources;

performing cost evaluation using the resource consumption information to determine the target cost index of the target data asset then comprises:

normalizing the time consumption data, the number of reads and writes, the first occupation data and the second occupation data respectively, to determine target resource consumption data of each resource type;

determining a target resource share of the target resource consumption data within a preset time period, wherein the target resource share reflects the degree to which the resource type is consumed within the preset time period;

determining a target weight of each resource type according to its target resource share and a preset first weight algorithm; and

determining the target cost index of the target data asset using the target weights, the target resource consumption data and a preset first weighted summation algorithm.
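The four cost steps can be sketched in Python as follows. The resource field names, min-max normalization, expressing the resource share relative to a per-type capacity for the period, and weights proportional to those shares are all assumptions standing in for the unspecified preset normalization, first weight algorithm and first weighted summation algorithm.

```python
from typing import Dict

# Hypothetical field names for the four resource types named in the text.
RESOURCES = ("cpu_time", "io_count", "buffer_used", "disk_used")


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Scale one resource type's consumption across assets into [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def cost_indices(usage: Dict[str, Dict[str, float]],
                 capacity: Dict[str, float]) -> Dict[str, float]:
    """usage[asset][resource]: raw consumption in the period; capacity[resource]: total available."""
    assets = list(usage)
    # Step 1: normalize each resource type separately across assets.
    norm = {r: min_max({a: usage[a][r] for a in assets}) for r in RESOURCES}
    # Step 2: share of the period's capacity consumed, per resource type.
    share = {r: sum(usage[a][r] for a in assets) / capacity[r] for r in RESOURCES}
    # Step 3: weights proportional to the shares (assumed "first weight algorithm").
    total = sum(share.values())
    weight = {r: (share[r] / total if total else 1.0 / len(RESOURCES)) for r in RESOURCES}
    # Step 4: weighted sum of the normalized consumption per asset.
    return {a: sum(weight[r] * norm[r][a] for r in RESOURCES) for a in assets}
```

Because the weights sum to 1 and each normalized value lies in [0, 1], every cost index also lies in [0, 1], with higher values meaning heavier resource consumption.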

In one possible implementation, performing activity evaluation according to the activity information to determine the target activity index of the target data asset comprises:

normalizing the access amount of each access type with a preset normalization algorithm to determine a normalized target access amount for each access type;

calculating entropy values using a preset entropy weight method and the target access amounts of all access types, to determine a target entropy value of each access type;

determining a target weight of each access type according to a preset second weight algorithm and the target entropy value of that access type; and

determining the target activity index of the target data asset using a preset second weighted summation algorithm, the target weights of the access types and the target access amounts.
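The text refers to the entropy weight method without giving formulas. The sketch below uses its textbook form: min-max normalize each access type across assets, compute p_ij = x_ij / sum_i x_ij, the entropy e_j = -(1 / ln n) * sum_i p_ij ln p_ij, and the weight w_j = (1 - e_j) / sum_k (1 - e_k); the activity index is the weighted sum of the normalized access amounts. Treating this as the patent's "preset" algorithms is an assumption, and the access type names in the test data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def min_max_columns(rows: List[List[float]]) -> List[List[float]]:
    """Min-max normalize each column (access type) across assets."""
    cols = list(zip(*rows))
    out = [[0.0] * len(cols) for _ in rows]
    for j, col in enumerate(cols):
        lo, hi = min(col), max(col)
        for i, x in enumerate(col):
            out[i][j] = (x - lo) / (hi - lo) if hi > lo else 0.0
    return out


def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Entropy weight method over non-negative columns; needs at least two rows."""
    n, m = len(matrix), len(matrix[0])
    if n < 2:
        raise ValueError("entropy weights need at least two assets")
    k = 1.0 / math.log(n)
    divergence = []
    for j in range(m):
        col = [row[j] for row in matrix]
        s = sum(col)
        # An all-zero column (no variation after min-max) carries no information: entropy 1.
        e = 1.0 if s == 0 else -k * sum((x / s) * math.log(x / s) for x in col if x > 0)
        divergence.append(1.0 - e)
    total = sum(divergence)
    return [d / total for d in divergence] if total else [1.0 / m] * m


def activity_indices(access: Dict[str, Dict[str, float]], types: Sequence[str]) -> Dict[str, float]:
    """Weighted sum of normalized access amounts, with entropy-derived weights."""
    assets = list(access)
    raw = [[access[a].get(t, 0.0) for t in types] for a in assets]
    norm = min_max_columns(raw)
    w = entropy_weights(norm)
    return {a: sum(wj * x for wj, x in zip(w, row)) for a, row in zip(assets, norm)}
```

Access types whose amounts barely differ between tables receive little weight, so the activity index is driven by the access types that actually distinguish tables.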

In one possible implementation, the target data asset comprises a target data table, the data lineage network information at least comprises the dependency relationships of the target data table, and determining the target influence index of the target data asset comprises:

counting the dependency relationships of each target data table to determine the number of times each target data table is referenced; and

determining the target influence index of the target data asset according to the number of times each target data table is referenced and a preset PageRank (web page ranking) algorithm.
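A self-contained power-iteration PageRank sketch follows. It assumes that each lineage dependency is an edge from the referencing table to the referenced table, that repeated references are kept as parallel edges (so reference counts effectively act as edge weights), and the common damping factor d = 0.85. How the patent actually combines the reference counts with PageRank is not specified; the table names in the test data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each pair is (referencing_table, referenced_table), taken from the lineage graph.
Edge = Tuple[str, str]


def reference_counts(edges: List[Edge]) -> Dict[str, int]:
    """Number of times each table is referenced by downstream tables."""
    counts = Counter(dst for _, dst in edges)
    return {t: counts[t] for e in edges for t in e}


def pagerank(edges: List[Edge], d: float = 0.85, iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank; a referencing table passes rank to the tables it references."""
    nodes = sorted({t for e in edges for t in e})
    if not nodes:
        return {}
    n = len(nodes)
    out: Dict[str, List[str]] = {t: [] for t in nodes}
    for src, dst in edges:
        out[src].append(dst)  # parallel edges kept: more references, larger share
    rank = {t: 1.0 / n for t in nodes}
    for _ in range(iters):
        # Rank held by tables that reference nothing is spread evenly over all tables.
        dangling = sum(rank[t] for t in nodes if not out[t])
        new = {t: (1.0 - d) / n + d * dangling / n for t in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += d * rank[src] / len(out[src])
        delta = sum(abs(new[t] - rank[t]) for t in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Compared with raw reference counts, PageRank also credits a table that is referenced by tables which are themselves heavily referenced.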

To achieve the above object, a second aspect of the present invention provides a device for determining an evaluation index of a data asset, the device comprising:

a log acquisition module, configured to acquire operation and maintenance log information of a target database, wherein the target database comprises a plurality of target data assets, the operation and maintenance log information at least comprises data lineage network information, resource consumption information and activity information of each target data asset, and the data lineage network information reflects the dependency relationships of the target data assets;

a value evaluation module, configured to perform value evaluation using the data lineage network information and the activity information to determine a target value index of the target data asset;

a cost evaluation module, configured to perform cost evaluation using the resource consumption information to determine a target cost index of the target data asset;

a cost-performance determination module, configured to determine a target cost-performance ratio of the target data asset according to the target value index and the target cost index; and

a result display module, configured to perform visualization processing on the target value index, the target cost index and the target cost-performance ratio to generate target visual prompt data and output the target visual prompt data to a preset display terminal, wherein the target visual prompt data are visual data reflecting a target evaluation index, and the target evaluation index at least comprises the target value index, the target cost index and the target cost-performance ratio.

To achieve the above object, a third aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to the first aspect or any one of its possible implementations.

To achieve the above object, a fourth aspect of the present invention provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to the first aspect or any one of its possible implementations.

The embodiment of the invention has the following beneficial effects:

The invention provides a method for determining an evaluation index of a data asset, comprising: acquiring operation and maintenance log information of a target database, wherein the target database comprises a plurality of target data assets, the operation and maintenance log information at least comprises data lineage network information, resource consumption information and activity information of each target data asset, and the data lineage network information reflects the dependency relationships of the target data assets; performing value evaluation using the data lineage network information and the activity information to determine a target value index of the target data asset; performing cost evaluation using the resource consumption information to determine a target cost index of the target data asset; determining a target cost-performance ratio of the target data asset according to the target value index and the target cost index; and performing visualization processing on the target value index, the target cost index and the target cost-performance ratio to generate target visual prompt data, and outputting the target visual prompt data to a preset display terminal, wherein the preset display terminal receives and displays the target visual prompt data, the target visual prompt data are visual data reflecting target evaluation indexes, and the target evaluation indexes at least comprise the target value index, the target cost index and the target cost-performance ratio.

With this method, evaluation indexes of a target data asset of the target database can be determined in at least three dimensions, such as the target value index, the target cost index and the target cost-performance ratio. This enriches the evaluation indexes of data assets, makes the exploration of data assets clearer, and provides multidimensional indexes for managing or rectifying data assets. Because the evaluation indexes are obtained by analyzing the data lineage network information, resource consumption information and activity information of the target data asset in the database's operation and maintenance log information, they are more authentic and accurate, and they provide a reliable reference for subsequently reducing the storage pressure and resource occupation of storage media. Finally, the visualized evaluation indexes shown on the preset display terminal let users review them more clearly and intuitively, further improving the efficiency of exploring the database.

Drawings

To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person skilled in the art may derive other drawings from them without inventive effort.

Wherein:

FIG. 1 is a flow chart of a method for determining an evaluation index of a data asset according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a visual interface corresponding to visual prompt data according to an embodiment of the present invention;

FIG. 3 is another flow chart of a method for determining an evaluation index of a data asset in accordance with an embodiment of the present invention;

FIG. 4 is a block diagram of a device for determining an evaluation index of a data asset according to an embodiment of the present invention;

FIG. 5 is a block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without inventive effort fall within the protection scope of the invention.

Referring to FIG. 1, FIG. 1 is a flowchart of a method for determining an evaluation index of a data asset, where data assets include, but are not limited to, the data tables of a database. The method can be applied to a terminal or a server; this embodiment takes a server as an example. The terminal may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server may be a stand-alone server or a server cluster composed of multiple servers. The method shown in FIG. 1 comprises the following steps:

101. Acquiring operation and maintenance log information of a target database, wherein the target database comprises a plurality of target data assets, the operation and maintenance log information at least comprises data lineage network information, resource consumption information and activity information of each target data asset, and the data lineage network information reflects the dependency relationships of the target data assets;

To better relieve the storage pressure on storage media and the occupation of storage resources, the application performs a multidimensional evaluation of the data assets in a data warehouse to explore their usable value, thereby guiding users to manage the data assets well. The storage media include, but are not limited to, physical storage media such as cache (main memory), flash memory, magnetic-disk storage, optical storage and tape storage, which are not limited herein. The data warehouse, that is, the server, stores various data assets on these storage media. Data assets include, but are not limited to, the data tables of a database; a data table is used as the example below. The evaluation indexes include, but are not limited to, the value index, cost index, cost-performance ratio, influence index and activity index of a data asset, where the value index reflects the usable value of the data asset, the cost index reflects its operation and maintenance cost, and the cost-performance ratio reflects how cost-effective its operation and maintenance are.

Further, to ensure that the usable value found for a data asset is authentic and accurate, the application determines the evaluation indexes from the operation and maintenance condition of the database's data tables. Step 101 therefore acquires the operation and maintenance log information of the target database, which reflects the operation and maintenance condition of the target data assets and at least comprises their data lineage network information, resource consumption information and activity information. The data lineage network information reflects the dependency relationships of the target data asset, the resource consumption information reflects how much storage-medium resource the target data asset consumes, and the activity information reflects how active the target data asset is.

Taking a data asset value analysis learning model as an example, the model is trained on operation and maintenance log data of the data warehouse. From the data warehouse batch scripts, it mainly collects the data table lineage relations, the data table query log history table, the data table access log history table, the data table space usage history table, the data table and view relationship tables, and other related tables, together with information such as the data area and database name of each table object. Using access date, data area, database name and table object as dimensions, it computes daily metrics such as table object capacity, total access amount, and access amount per user class. The preliminary statistics are completed inside the data warehouse, and a Python program collects and synchronizes the results into a MySQL server table for analysis by the algorithm model. These statistical results constitute the operation and maintenance log information, which includes, but is not limited to, data lineage network information, activity information and resource consumption information.
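The preliminary statistics described above amount to a group-by over access log rows. The plain-Python sketch below shows only that grouping step and assumes hypothetical record fields (`date`, `area`, `db`, `table`, `user_class`); in the described system the statistics run inside the data warehouse, and the synchronization into MySQL is omitted because it depends on the driver and schema in use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (access_date, data_area, database, table): the statistical dimensions named in the text.
Key = Tuple[str, str, str, str]


def daily_access_stats(records: Iterable[dict]) -> Dict[Key, dict]:
    """Aggregate raw access-log rows into per-day, per-table access counts.

    Each record is assumed to carry the hypothetical fields
    date, area, db, table and user_class.
    """
    total: Dict[Key, int] = defaultdict(int)
    by_class: Dict[Key, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for r in records:
        key = (r["date"], r["area"], r["db"], r["table"])
        total[key] += 1
        by_class[key][r["user_class"]] += 1
    return {k: {"total": total[k], "by_user_class": dict(by_class[k])} for k in total}
```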

By way of example, the data lineage network information may include the data asset relationship and usage information shown in Table 1 below:

TABLE 1

Illustratively, the activity information may include the user usage information of the data asset shown in Table 2 below:

TABLE 2

Illustratively, the resource consumption information may include data asset resource consumption information as shown in Table 3 below:

TABLE 3

There may be one or more target databases; their number is not limited herein. A database comprises multiple data tables, and a target data asset is any data table in the target database whose evaluation indexes are to be determined. There may be one or more target data assets, each distinguished by a unique identifier such as its table name, so the evaluation indexes of multiple data assets may be determined in parallel, or those of each data asset may be determined one after another in series; this is not limited herein. The target data asset may be specified by a data asset evaluation request that a user sends from the preset display terminal, where the request may include the unique identifier of the data asset to be evaluated. Alternatively, the server may traverse the data asset information in real time and treat each data asset it reaches as a target data asset, executing the method for determining the evaluation index of a data asset shown in the application. The present application is not limited in this respect.
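Because each target data asset is identified by a unique key such as its table name, evaluating several assets in parallel or in series is a simple dispatch. A sketch using Python's standard `concurrent.futures`, where `evaluate` is a hypothetical callable that returns the evaluation indexes of one table:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def evaluate_assets(table_names: Iterable[str],
                    evaluate: Callable[[str], dict],
                    parallel: bool = True,
                    workers: int = 4) -> Dict[str, dict]:
    """Run `evaluate` once per unique table name, either concurrently or one after another."""
    # The table name acts as the unique identifier, so duplicates are dropped (order kept).
    names = list(dict.fromkeys(table_names))
    if parallel:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(evaluate, names))
    else:
        results = [evaluate(name) for name in names]
    return dict(zip(names, results))
```

Threads suit evaluations that mostly wait on database queries; CPU-heavy scoring would favor `ProcessPoolExecutor` instead.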

102. Performing value evaluation using the data lineage network information and the activity information to determine a target value index of the target data asset;

It should be noted that a database is a "repository that organizes, stores and manages data according to a data structure": a large, organized, sharable and uniformly managed collection of data stored in a computer over the long term. The usable value of a data asset in a database generally shows in two main aspects: the extent to which other assets depend on it, and how active it is.

Therefore, to evaluate the usable value of a data asset more accurately, the method evaluates the usable value of the data table from the data lineage network information and the activity information; that is, value evaluation is performed on these two kinds of information to determine the target value index, which is the value index of the target data asset. The data lineage network information may include table-to-table dependencies. For example, if the data lineage network information shows that many assets depend on the target data asset, the target data asset can be regarded as relatively important and its target value index is high; if the activity information shows that the target data asset is very active, it can be regarded as frequently used and its target value index is high; and if it is both heavily depended upon and very active, its target value index is high.

It will be appreciated that there may be one or more target data assets; that is, step 102 can use the data lineage network information and activity information of each target data asset to obtain the target value index of each target data asset.

103. Performing cost evaluation using the resource consumption information to determine a target cost index of the target data asset;

Further, to evaluate the usable value of the data asset more accurately, the data asset may be evaluated in multiple dimensions: besides the value index, it may be evaluated from the dimension of maintenance cost. Specifically, cost evaluation is performed using the resource consumption information to determine the target cost index of the target data asset. If the resource consumption information shows that the target data asset consumes relatively many resources, its maintenance cost is high, and the corresponding cost index reflects this, for example by assigning the cost index a higher value. Step 103 can use the resource consumption information of each target data asset to obtain the target cost index of each target data asset.

104. Determining a target cost-performance ratio of the target data asset according to the target value index and the target cost index;

It should be noted that, to evaluate the usable value of the data asset more accurately, the application does not rely on single angles alone but also integrates them and explores the usability of the data asset from a combined angle; that is, another evaluation dimension is a comprehensive one. Specifically, the target cost-performance ratio of the target data asset is determined according to the target value index and the target cost index. Once these two single-angle indexes have been determined, they can be combined into a comprehensive index, which adds an evaluation dimension. For example, a weighting algorithm may combine the target value index and the target cost index into the target cost-performance ratio, giving an index that measures the usable value of the target data asset from multiple angles. Step 104 thus obtains the target cost-performance ratio of each target data asset from its target value index and target cost index.

105. Performing visualization processing on the target value index, the target cost index and the target cost-performance ratio to generate target visual prompt data, and outputting the target visual prompt data to a preset display terminal, wherein the preset display terminal receives and displays the target visual prompt data, the target visual prompt data are visual data reflecting a target evaluation index, and the target evaluation index at least comprises the target value index, the target cost index and the target cost-performance ratio.

Finally, to make the user's exploration of data assets clearer and more intuitive and to let the user review the data asset evaluation indexes promptly, the resulting multidimensional evaluation indexes can be displayed visually. Specifically, visualization processing is performed on the target value index, the target cost index and the target cost-performance ratio to generate target visual prompt data, which include, but are not limited to, any static or dynamic image data. The target evaluation index includes, but is not limited to, the target value index, the target cost index and the target cost-performance ratio of the target data asset.

For example, referring to FIG. 2, FIG. 2 is a schematic diagram of a visual interface corresponding to visual prompt data in an embodiment of the present invention. The asset cost-performance ratio in FIG. 2 is the target cost-performance ratio of a data asset in the server to be evaluated; the higher its value, the better the overall performance. The value index in FIG. 2 is the target value index of the data asset to be evaluated; the higher its value, the higher the usable value of the target data asset. The cost index in FIG. 2 is the target cost index of the data asset to be evaluated; the higher its value, the higher the operation and maintenance cost of the target data asset. The value classification in FIG. 2 is obtained by classifying the value grades of the data tables in each database. FIG. 2 shows, by way of example, the value grades of the data tables in three databases (database A, database B and database C), with grades 1 to 4, where grade 1 is the highest value and grade 4 the lowest. As the value classification in FIG. 2 shows, database A has the most grade-1 data tables and the fewest grade-4 data tables; database B likewise has the most grade-1 data tables and the fewest grade-4 data tables; and database C has the most grade-3 data tables and the fewest grade-2 data tables. FIG. 2 lets users see the evaluation indexes of the databases intuitively and review them conveniently. For example, developers can use these results to decide promptly whether and how to modify a database, such as optimizing data tables with a low cost-performance ratio to relieve the storage pressure on storage media and reduce redundant resource occupation, thereby lowering maintenance costs.
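The value classification panel of FIG. 2 is a count of data tables per value grade within each database. A sketch of that tally, assuming the grades are keyed by hypothetical (database, table) pairs and that four grades are used as in FIG. 2:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def grade_distribution(grades: Dict[Tuple[str, str], int],
                       levels: Iterable[int] = range(1, 5)) -> Dict[str, Dict[int, int]]:
    """Count tables per value grade for each database; grades are keyed by (database, table)."""
    per_db: Dict[str, Counter] = defaultdict(Counter)
    for (database, _table), grade in grades.items():
        per_db[database][grade] += 1
    level_list = list(levels)
    # Counter returns 0 for grades a database does not have, so every level is reported.
    return {db: {lv: counts[lv] for lv in level_list} for db, counts in per_db.items()}
```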

Furthermore, the user can select and analyze the evaluation indexes of a data area or database on the preset display terminal, for example, to view each item of value analysis feature data after aggregation by data area or database. Selectable screening ranges include, but are not limited to, data area, database and data date; presentable analysis items include, but are not limited to, asset cost-performance ratio, value index, cost index, value grade, space, CPU, IO, spool and influence index; and numeric feature values may be displayed as raw values or as shares, among others. Visualizations may take forms including, but not limited to, tables or charts. Detail browsing and querying, ranking analysis, trend analysis, anomaly analysis and the like may also be performed.
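Screening by data area, database or data date followed by ranking on a chosen index could look like the sketch below. The row field names are hypothetical; setting `descending=False` surfaces the tables with the lowest value of the chosen metric, for example low cost-performance candidates for optimization.

```python
from typing import Dict, List, Optional


def screen_and_rank(rows: List[Dict], metric: str,
                    area: Optional[str] = None,
                    database: Optional[str] = None,
                    date: Optional[str] = None,
                    top: Optional[int] = None,
                    descending: bool = True) -> List[Dict]:
    """Filter per-table evaluation rows by the optional screening fields, then rank by `metric`."""
    filters = {"area": area, "database": database, "date": date}
    picked = [r for r in rows
              if all(v is None or r.get(k) == v for k, v in filters.items())]
    picked.sort(key=lambda r: r[metric], reverse=descending)
    return picked if top is None else picked[:top]
```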

The invention thus provides a method for determining evaluation indexes of a data asset. With this method, evaluation indexes in at least three dimensions, such as a target value index, a target cost index and a target cost performance, can be determined for a target data asset of a target database. This enriches the evaluation indexes of data assets, gives a clearer picture when exploring data assets, and provides multi-dimensional indexes for managing or modifying them. Because the indexes are derived by analyzing the lineage (blood-relationship) network information, resource consumption information and liveness information of the target data asset recorded in the operation and maintenance logs of the database, they are more authentic and accurate, providing a reliable reference for subsequently reducing the storage pressure and resource occupation of the storage medium. Finally, the visualized data of the evaluation indexes shown on the preset display terminal present the results to the user more clearly and intuitively, further improving the efficiency of exploring the database.

Referring to fig. 3, fig. 3 is another flowchart of a method for determining an evaluation index of a data asset according to an embodiment of the invention, where the method shown in fig. 3 includes the following steps:

301. Acquiring operation and maintenance log information of a target database, wherein the target database comprises a plurality of target data assets, the operation and maintenance log information comprises at least the lineage (blood-relationship) network information, resource consumption information and liveness information of each target data asset, and the lineage network information reflects the dependency relationships of the target data assets;

It should be noted that step 301 is similar to step 101 shown in fig. 1; to avoid repetition, reference may be made to the description of step 101 shown in fig. 1.

302. Performing influence evaluation according to the blood relationship network information, and determining a target influence index of the target data asset;

It should be noted that, to improve the accuracy of the value index, the value index is obtained by quantifying influence and liveness, for which a DataRank model can be established: the value index of the database is quantified from influence (the number of dependents, found through the lineage relationships) and liveness (access volume and the like). The influence index can be calculated with Google's web-page ranking algorithm (the PageRank algorithm), and the value index is obtained as a weighted sum of the two indexes. An influence index and a liveness index are therefore required: the target influence index of the target data asset is obtained in step 302, and the target liveness index of the target data asset is obtained in step 303.

In one possible implementation, the target data asset comprises a target data table and the blood relationship network information comprises at least a dependency of the target data table, and thus step 302 may comprise steps A1 to A2:

A1, counting the dependency relationships of each target data table and determining the target number of times each target data table is referenced;

A2, determining the target influence index of the target data asset according to the target reference count of each target data table and a preset web-page ranking algorithm.

The influence index of a data asset can be obtained as an influence score by applying Google's PageRank algorithm to the lineage (blood-relationship) network of the data assets. The idea behind PageRank is borrowed from the way the importance of academic papers is assessed: the more often a paper is cited, the more important it is. A dependency in the lineage of data assets is analogous to a link between web pages. If a data asset is relied upon by many other data assets, its influence is high, i.e., its PageRank value is relatively high; and if a data asset with a high PageRank value depends on another data asset, the PageRank value of that other data asset increases accordingly. A data asset with dependency relationships either references other data assets or is referenced by them, so its dependency relationships comprise the number of times it references others and the number of times it is referenced; the more often a data asset is referenced, the greater its influence in the database. Therefore, the dependency relationships of each data asset of the target database can be counted to determine the target reference count of each data asset, and the target influence index of the target data asset is then determined from the target reference counts and a preset web-page ranking algorithm (the PageRank algorithm). The target influence index is the influence index of the target data asset and reflects the influence of that data asset.
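The counting of step A1 and the PageRank scoring of step A2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the table names, the `deps` dictionary layout (each table mapped to the upstream tables it reads from), the damping factor 0.85 and the even redistribution of rank from tables with no dependencies are all assumptions.

```python
from collections import Counter


def reference_counts(deps):
    """Step A1 (sketch): how many times each table is referenced by other tables."""
    return Counter(t for targets in deps.values() for t in targets)


def pagerank(deps, damping=0.85, tol=1e-10, max_iter=200):
    """Step A2 (sketch): power-iteration PageRank over a lineage graph.

    deps maps a table to the list of tables it depends on. "A depends on B"
    is treated like a hyperlink A -> B, so part of A's rank flows to B.
    """
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Tables that depend on nothing spread their rank evenly over all tables.
        dangling = sum(rank[v] for v in nodes if not deps.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in deps.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical lineage: each table lists the upstream tables it reads from.
deps = {
    "ods_orders": [],
    "dwd_orders": ["ods_orders"],
    "dws_sales": ["dwd_orders", "ods_customers"],
    "ads_report": ["dws_sales", "dwd_orders"],
}
counts = reference_counts(deps)
ranks = pagerank(deps)
for table, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{table:14s} refs={counts[table]} influence={score:.4f}")
```

Because the scores sum to 1, a table that nobody references (here `ads_report`) receives only the baseline share, while heavily referenced upstream tables accumulate influence. If the networkx library is available, its `pagerank` function could replace the hand-written loop.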

303. Performing liveness evaluation according to the liveness information, and determining a target liveness index of the target data asset;

For a target data asset that is a data table, its liveness can be regarded as the read-write activity of the target data table and can be measured from the accesses of different users: the target liveness index is formed as a weighted sum of the access counts of the different access types, with the weights determined by the objective entropy weight method to improve their accuracy.

The basic idea of determining weights with the entropy method is to derive objective weights from the variability of each index. In general, the smaller the information entropy of an index, the greater the variation of its values, the more information it provides, the larger its role in the comprehensive evaluation, and hence the larger its weight. Conversely, the larger the information entropy of an index, the smaller the variation of its values, the less information it provides, the smaller its role in the comprehensive evaluation, and hence the smaller its weight.

Specifically, the liveness information includes at least target access amount data of the target data asset, where the target access amount data includes at least a correspondence between an access type and an access amount, and step 303 may include steps B1 to B4:

B1, normalizing the access amount of each access type with a preset normalization algorithm to determine the normalized target access amount of each access type;

B2, calculating entropy values with a preset entropy weight method and the target access amount data of each access type to determine the target entropy value of each access type;

B3, determining the target weight of each access type according to a preset second weight algorithm and the target entropy value of each access type;

B4, determining the target liveness index of the target data asset with a preset second weighted summation algorithm, the target weight of each access type and the target access amounts.

It should be noted that the access types include, but are not limited to, individual user access, derived user access and access from the various downstream data areas (downstream data area 1, downstream data area 2, downstream data area 3, ..., downstream data area N); referring to table 2, the maximum value of N in the present application is 10. Accordingly, the target access amount data includes the access amount of individual users, the access amount of derived users and the access amount of each downstream data area. The entropy weight method mainly comprises three steps: normalizing the data, calculating the entropy of each index, and calculating the weight coefficient of each index. The liveness index is finally obtained by a weighted calculation.

Exemplarily, the target access volume data form an $n \times m$ matrix $X = (X_{ij})_{n \times m}$, where $n$ is the total number of data tables in the database and $m$ is the total number of access types:

1) Normalize the original data matrix $X$ column by column using the normalization method suggested for this scheme, obtaining the normalized matrix $Z = (Z_{ij})_{n \times m}$,

where $Z_{ij}$ is the normalized target access volume of the $j$-th access type for the $i$-th data table, i.e., the access volume of the $j$-th access type on the $i$-th data table, and $X_{ij}$ is the raw access volume of the $j$-th access type for the $i$-th data table;

2) Calculate the entropy $E_j$ of each index, where the index is the access type $j$:

$$E_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij}, \qquad p_{ij} = \frac{Z_{ij}}{\sum_{i=1}^{n} Z_{ij}}$$

The base of the logarithm is generally e, 2 or 10; this evaluation scheme uses base e. If $p_{ij} = 0$, then $p_{ij}\ln p_{ij}$ is defined as 0.

3) Calculate the weight coefficient $w_j$ of each index, where the index is the access type and $w_j$ is the target weight of access type $j$:

$$w_j = \frac{1 - E_j}{m - \sum_{j=1}^{m} E_j}$$

where $m$ is the total number of access types.

4) Take the weighted sum of the normalized target access volumes and the weights of the corresponding access types to obtain the target liveness index of the $i$-th data table:

$$A_i = \sum_{j=1}^{m} w_j Z_{ij}$$
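Steps B1 to B4 can be sketched in plain Python. The specification does not preserve its exact normalization formula, so this sketch assumes column-wise min-max scaling to [0, 1]; the access matrix, the choice of three access types, and the equal-weight fallback when every column is constant are likewise illustrative assumptions. At least two data tables are required, since the entropy is divided by ln n.

```python
import math


def minmax_columns(x):
    """B1 (ASSUMED formula): column-wise min-max normalization to [0, 1]."""
    scaled = []
    for col in zip(*x):
        lo, hi = min(col), max(col)
        scaled.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def entropy_weights(z):
    """B2 + B3: entropy E_j of each column, then w_j = (1 - E_j) / (m - sum E)."""
    n, m = len(z), len(z[0])
    entropies = []
    for j in range(m):
        total = sum(row[j] for row in z)
        if total == 0:
            entropies.append(1.0)  # no variation -> no information -> zero weight
            continue
        h = 0.0
        for row in z:
            p = row[j] / total
            if p > 0:  # convention: p * ln(p) = 0 when p = 0
                h -= p * math.log(p)
        entropies.append(h / math.log(n))
    denom = m - sum(entropies)
    if denom <= 1e-12:  # every column uninformative: fall back to equal weights
        return [1.0 / m] * m
    return [(1.0 - e) / denom for e in entropies]


def liveness_indices(x):
    """B4: A_i = sum_j w_j * Z_ij for every data table i."""
    z = minmax_columns(x)
    w = entropy_weights(z)
    return [sum(wj * zij for wj, zij in zip(w, row)) for row in z], w


# Rows: data tables; columns: personal-user, derived-user, downstream-area-1 accesses.
access = [[120, 30, 5], [10, 2, 0], [300, 90, 40], [0, 0, 1]]
liveness, weights = liveness_indices(access)
print([round(v, 3) for v in liveness], [round(w, 3) for w in weights])
```

The third table has the largest access count in every column, so all of its normalized values are 1 and its liveness index equals the sum of the weights, i.e. 1.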

304. Determining a target value index of the target data asset according to the target influence index and the target liveness index;

It should be noted that, once the target influence index and the target liveness index are obtained, the target value index of the target data asset can be determined from them. Specifically, the target value index can be obtained by a weighted calculation of the target influence index and the target liveness index of the target data asset, where the weight coefficients of the two indexes can be determined by applying the entropy weight method to the influence indexes and liveness indexes of all data assets in the server, which is not described further here.
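The combination in step 304 might look like the sketch below, which applies the entropy weight method to two columns (influence and liveness) over all data assets. Rescaling both indexes to [0, 1] before weighting is an assumption made here so that a PageRank score and a liveness score are comparable; the sample values are illustrative only.

```python
import math


def minmax(values):
    """ASSUMPTION: rescale one indicator to [0, 1] before weighting."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def entropy_weights(columns):
    """Entropy weight of each column; each column holds one indicator for all assets."""
    n = len(columns[0])
    entropies = []
    for col in columns:
        total = sum(col)
        if total == 0:
            entropies.append(1.0)  # constant column carries no information
            continue
        h = -sum((v / total) * math.log(v / total) for v in col if v > 0)
        entropies.append(h / math.log(n))
    denom = len(columns) - sum(entropies)
    if denom <= 1e-12:
        return [1.0 / len(columns)] * len(columns)
    return [(1.0 - e) / denom for e in entropies]


def value_indices(influence, liveness):
    """Value index = w_inf * influence' + w_live * liveness' for each asset."""
    inf_n, live_n = minmax(influence), minmax(liveness)
    w_inf, w_live = entropy_weights([inf_n, live_n])
    values = [w_inf * a + w_live * b for a, b in zip(inf_n, live_n)]
    return values, (w_inf, w_live)


influence = [0.30, 0.12, 0.25, 0.33]  # e.g. PageRank scores of four tables
liveness = [0.40, 0.03, 1.00, 0.01]   # e.g. liveness indexes of the same tables
values, weights = value_indices(influence, liveness)
print([round(v, 3) for v in values], [round(w, 3) for w in weights])
```

Since both inputs lie in [0, 1] and the weights are non-negative and sum to 1, every value index also lies in [0, 1].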

In one possible implementation, the data assets (data tables) of the database can be graded by value, specifically with the Kmeans clustering algorithm, an unsupervised clustering algorithm based on Euclidean distance: the closer two samples are, the more similar they are considered to be.

Specifically, the target data asset comprises a target data table, the liveness information at least comprises target access amount data of the target data asset, the target access amount data at least comprises a corresponding relation between an access type and an access amount, the target evaluation index also comprises a value grade of the target data asset, and the value grade is used for reflecting the comprehensive importance degree of the target data asset; the method further comprises the steps C1 and C2:

C1, determining access levels of all access types, wherein the access levels are in direct proportion to the value levels;

Specifically, clustering is performed on the access characteristics of the data tables, which may include the access types of the database. Access levels are preset for the different access types, and the data tables are clustered according to the access levels they have, yielding the value-grade cluster to which each data table belongs; a higher access level corresponds to a higher value grade. For example, the access level of individual user access > that of derived user access > that of downstream data area access, and accesses from different downstream data areas also have different access levels, e.g., downstream data area 1 access > downstream data area 2 access > downstream data area 3 access > downstream data area 4 access > other user access.

And C2, carrying out clustering processing on the value grades according to the access quantity of the access type of each target data asset, the access grade and a preset Kmeans clustering algorithm, and determining the target value grade of each target data asset.

Further, Kmeans clustering can be performed on the access types, access amounts and access levels of each data table to determine the target value grade of each data table; one possible clustering result is the value classification shown in fig. 2.

Illustratively, the Kmeans core algorithm steps are:

1) Select K samples as the initial cluster centers $a = \{a_1, a_2, a_3, \ldots, a_K\}$.

2) For each sample $x_i$ in the data set, calculate its distance to the K cluster centers and assign it to the class of the nearest cluster center.

3) For each class $a_j$, recalculate its cluster center as the mean of the samples assigned to it.

4) Repeat steps 2) and 3) until a termination condition is reached (for example, the assignments no longer change or a maximum number of iterations is reached), obtaining the final clustering result.
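Steps C1, C2 and the four Kmeans steps above can be sketched without external libraries. How access amounts and access levels are combined into features is not specified, so this sketch assumes a hypothetical feature `level_weight * log(1 + access_amount)` per access type; the deterministic initialization (first K samples), the level weights and the sample data are assumptions too. Clusters are mapped to grades 1 to K by ordering their centers, so grade 1 is the cluster with the largest weighted activity.

```python
import math


def kmeans(points, k, max_iter=100):
    # Step 1: use the first k samples as initial centres (ASSUMPTION: simple init).
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest centre (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Step 3: recompute each centre as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centres


def value_grades(access, level_weights, k=4):
    """Hypothetical features: level weight times log(1 + access amount)."""
    feats = [[w * math.log1p(a) for w, a in zip(level_weights, row)] for row in access]
    labels, centres = kmeans(feats, k)
    order = sorted(range(k), key=lambda c: sum(centres[c]), reverse=True)
    grade_of = {c: rank + 1 for rank, c in enumerate(order)}
    return [grade_of[lab] for lab in labels]


# Columns: personal-user, derived-user, downstream-area access amounts per table.
access = [[1000, 500, 200], [100, 50, 20], [10, 5, 2], [0, 0, 1],
          [900, 450, 180], [120, 60, 25], [8, 4, 3], [0, 1, 0]]
level_weights = [3, 2, 1]  # higher access level -> larger weight (ASSUMPTION)
grades = value_grades(access, level_weights)
print(grades)  # [1, 2, 3, 4, 1, 2, 3, 4]
```

With a random or k-means++ initialization the result can vary between runs; the fixed initialization is used here only to keep the example reproducible.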

305. Performing cost evaluation by utilizing the resource consumption information, and determining a target cost index of the target data asset;

It should be noted that step 305 is similar to step 103 shown in fig. 1; to avoid repetition, reference may be made to step 103 shown in fig. 1.

In one possible implementation, the resource consumption information comprises at least resource consumption data of each resource type. As can be seen from table 3, the resource types comprise at least a central processing unit (CPU) resource, a read-write (IO) resource, a buffer space (SPOOL) resource and a disk space (DiskSpace) resource, and the resource consumption data comprise at least the time consumption of the CPU resource (CPU time), the number of reads and writes of the IO resource (IO count), first occupation data of the buffer space resource (SPOOL occupation) and second occupation data of the disk space resource (DiskSpace occupation);

Step 305 may include steps D1 to D4:

D1, normalizing the time consumption data, the read-write counts, the first occupation data and the second occupation data respectively to determine the target resource consumption data of each resource type;

After the resource consumption data of each resource type of the target data asset are obtained, they are normalized: the time consumption data, the read-write counts, the first occupation data and the second occupation data are each normalized to determine the target resource consumption data of each resource type.

Illustratively, the normalization algorithm is:

$$X' = \frac{2\,(X - \mathrm{MinValue})}{\mathrm{MaxValue} - \mathrm{MinValue}} - 1$$

where $X'$ is the normalized target resource consumption data, $X$ is the raw resource consumption data, and MaxValue and MinValue are the maximum and minimum values of the sample, respectively, so that $X'$ lies in $[-1, 1]$.
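The normalization of step D1 is a min-max scaling onto [-1, 1] and can be written directly from the formula above. How a constant sample (MaxValue equal to MinValue) is handled is not stated, so mapping it to the midpoint 0 is an assumption of this sketch; the CPU-time values are illustrative.

```python
def normalize_pm1(values):
    """X' = 2 * (X - MinValue) / (MaxValue - MinValue) - 1, mapping onto [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # ASSUMPTION: a constant sample maps to the midpoint of the range.
        return [0.0 for _ in values]
    return [2 * (x - lo) / (hi - lo) - 1 for x in values]


cpu_seconds = [12.0, 3.0, 48.0, 0.0]  # hypothetical CPU time of four tables
print(normalize_pm1(cpu_seconds))     # [-0.5, -0.875, 1.0, -1.0]
```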

D2, determining the target resource share of the target resource consumption data within a preset time period, wherein the target resource share reflects the degree to which the resource type is consumed within the preset time period;

Further, the target resource share of the target resource consumption data within the preset time period is determined, i.e., the share of each item of resource consumption data over the past month is calculated from the normalized data.

For example, the target resource share may be obtained by aggregating the target resource consumption data of the same resource type across different target data assets; the preset time period is the statistical period, which may be one month.

D3, determining the target weight of each resource type according to its target resource share and a preset first weight algorithm;

Further, once the target resource shares are obtained, the target weight of each resource type of the target data asset is determined from the target resource share of each resource type and the preset first weight algorithm, which may be:

target weight of a resource type = target resource share of that resource type / sum of the target resource shares of all resource types.

Illustratively, the sum of the target resource shares of all resource types is the sum of the target resource shares of the different resource types of the target data asset.

And D4, determining a target cost index of the target data asset by using the target weight, the target resource consumption data and a preset first weighted summation algorithm.

Finally, the target cost index of the target data asset is determined from the target weights, the target resource consumption data and a preset first weighted summation algorithm, for example:

$$C = \mathrm{CPU}' \cdot w_{\mathrm{CPU}} + \mathrm{IO}' \cdot w_{\mathrm{IO}} + \mathrm{SPOOL}' \cdot w_{\mathrm{SPOOL}} + \mathrm{DISKSPACE}' \cdot w_{\mathrm{DISKSPACE}}$$

where $C$ is the target cost index of the data asset, the primed quantities are the normalized resource consumption data of each resource type, and $w$ is the target weight of the corresponding resource type.
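Steps D3 and D4 reduce to a share-proportional weighting followed by a weighted sum. In this sketch the monthly resource shares are taken as given inputs, because the specification does not fix exactly how they are aggregated; the share values and the normalized consumption values are illustrative assumptions.

```python
RESOURCE_TYPES = ("CPU", "IO", "SPOOL", "DISKSPACE")


def resource_weights(shares):
    """D3: weight of a type = its share / sum of the shares of all types."""
    total = sum(shares[r] for r in RESOURCE_TYPES)
    return {r: shares[r] / total for r in RESOURCE_TYPES}


def cost_index(normalized, weights):
    """D4: sum of normalized consumption times the weight of each resource type."""
    return sum(normalized[r] * weights[r] for r in RESOURCE_TYPES)


# Hypothetical monthly shares (they need not sum to 1; D3 rescales them).
shares = {"CPU": 0.40, "IO": 0.30, "SPOOL": 0.20, "DISKSPACE": 0.30}
weights = resource_weights(shares)
# Hypothetical normalized consumption of one table, each value in [-1, 1].
normalized = {"CPU": 0.5, "IO": -0.2, "SPOOL": 0.1, "DISKSPACE": 0.8}
print(round(cost_index(normalized, weights), 4))  # 0.3333
```

Here the weights are 1/3, 1/4, 1/6 and 1/4, so the cost index is 0.5/3 - 0.05 + 0.1/6 + 0.2 = 1/3.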

306. Determining a target cost performance of the target data asset according to the target value index and the target cost index;

It should be noted that step 306 is similar to step 104 shown in fig. 1; to avoid repetition, reference may be made to step 104 shown in fig. 1.

Illustratively, step 306 may include determining the target ratio of the target value index to the target cost index, the target cost performance comprising this target ratio; for example, the target ratio is used as the target cost performance.

Specifically, target cost performance = target ratio = target value index/target cost index.
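The ratio itself is a single division, but because the normalized cost index can be zero, a guard is needed in practice. The specification does not say how a zero (or negative) cost index is treated, so returning `None` below a small threshold is only an assumption of this sketch; the table names and index values are illustrative.

```python
def cost_performance(value_index, cost_index, eps=1e-9):
    """Target cost performance = target value index / target cost index.

    ASSUMPTION: returns None when the cost index is (almost) zero, because the
    ratio is undefined there; the source does not specify this case.
    """
    if abs(cost_index) < eps:
        return None
    return value_index / cost_index


tables = {"t_a": (0.8, 0.4), "t_b": (0.2, 0.5), "t_c": (0.6, 0.0)}
ratios = {name: cost_performance(v, c) for name, (v, c) in tables.items()}
print(ratios)
```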

307. And performing visual processing by using the target value index, the target cost index and the target cost performance to generate target visual prompt data, and outputting the target visual prompt data to a preset display terminal, wherein the preset display terminal is used for receiving and displaying the target visual prompt data, the target visual prompt data is used for reflecting the visual data of a target evaluation index, and the target evaluation index at least comprises the target value index, the target cost index and the target cost performance.

It should be noted that step 307 is similar to step 105 shown in fig. 1; to avoid repetition, reference may be made to step 105 shown in fig. 1.

Further, depending on how the user wants to relieve the storage pressure and resource occupation of the server or storage medium, the user may send various data analysis requests from the preset terminal, such as a summary analysis request, a detail browsing and query request, a ranking analysis request, a trend analysis request and an anomaly analysis request, and use the results of these requests to relieve the storage pressure and resource occupation of the server or storage medium.

Illustratively, the summary analysis request is used to analyze, view and process each item of value-analysis feature data after aggregation by data area/database. The summary analysis request may carry a screening range and analysis data items: the screening range includes, but is not limited to, data area, database and data date; the analysis data items include, but are not limited to, asset cost performance, value index, cost index, value grade, space, CPU, IO, spool and influence index. The result of the request is displayed as numerical values on the display screen of the preset display terminal, either as raw values or as proportions, in the form of tables and graphs.

The detail browsing and query request is used to browse and query the value-analysis feature data of individual data tables. It may carry query conditions, a browsing mode and a display mode: the query conditions include, but are not limited to, data area/database name, table name (fuzzy query supported), date constraints (the most recent update date by default) and numerical ranges; the browsing mode includes, but is not limited to, filtering by date and data area; and the display mode includes, but is not limited to, tables and page display.

The ranking analysis request is used to rank each item of value-analysis feature data. It may carry query/filter conditions (including, but not limited to, date and data area), ranking requirements and a display mode; the ranking requirements include, but are not limited to, a Top/Bottom ranking of each item with a configurable number of ranked entries, and the display mode includes, but is not limited to, tables and page display.
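A Top/Bottom-N ranking of one value-analysis item can be sketched as a sort over a name-to-score mapping. The function name `rank_items` and the sample cost-performance scores are hypothetical; filtering by date or data area would be applied before this step.

```python
def rank_items(scores, n, top=True):
    """Return the Top-n (highest) or Bottom-n (lowest) entries of one indicator."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=top)
    return ordered[:n]


cost_perf = {"t_a": 2.5, "t_b": 0.4, "t_c": 1.1, "t_d": 3.0}
print(rank_items(cost_perf, 2))             # [('t_d', 3.0), ('t_a', 2.5)]
print(rank_items(cost_perf, 1, top=False))  # [('t_b', 0.4)]
```

A Bottom-N list by cost performance is one way to surface the low-value, high-cost tables that the description suggests optimizing first.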

The trend analysis request is used to show the trend of each item of value-analysis feature data over a period of time. It may carry query/filter conditions and a display mode; the query/filter conditions include, but are not limited to, date range, data area, data table and index item; the display mode includes, but is not limited to, tables and line charts.

The anomaly analysis request is used to browse the detail objects and summary data of data anomalies according to set thresholds. It may carry anomaly threshold settings, query conditions and a display mode. The anomaly threshold settings include, but are not limited to, conventional values, numerical mutations and trend anomalies: conventional values are, e.g., zero, maximum and minimum values; a numerical mutation is the percentage change relative to the data of the previous cycle; and a trend anomaly is the percentage change relative to the mean over a period of time. The query conditions include, but are not limited to, data area name, alert level (summary, detail), anomaly type, threshold and anomaly index; the display mode includes, but is not limited to, tables and page display.
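The three threshold types could be checked on a chronological series of one indicator as sketched below. The thresholds (50% for a mutation, 30% for a trend anomaly), the seven-cycle window, using the absolute value of the baseline in the percentage, and checking only zero among the conventional values are all assumptions of this sketch, not values from the specification.

```python
def pct_change(current, baseline):
    """Percentage change relative to a baseline; None if the baseline is zero."""
    if baseline == 0:
        return None
    return (current - baseline) / abs(baseline) * 100


def detect_anomalies(series, mutation_pct=50.0, trend_pct=30.0, window=7):
    """series: chronological values of one indicator for one object (ASSUMED layout)."""
    alerts = []
    current = series[-1]
    if current == 0:  # conventional value check (only zero is checked here)
        alerts.append("zero value")
    if len(series) >= 2:  # numerical mutation: change versus the previous cycle
        change = pct_change(current, series[-2])
        if change is not None and abs(change) >= mutation_pct:
            alerts.append(f"mutation {change:+.1f}%")
    history = series[-window - 1:-1]
    if history:  # trend anomaly: change versus the mean of the recent window
        mean = sum(history) / len(history)
        change = pct_change(current, mean)
        if change is not None and abs(change) >= trend_pct:
            alerts.append(f"trend {change:+.1f}%")
    return alerts


daily_cost = [100, 104, 98, 101, 97, 103, 100, 180]
print(detect_anomalies(daily_cost))  # ['mutation +80.0%', 'trend +79.2%']
```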

The method shown in fig. 3 likewise determines evaluation indexes in at least three dimensions (the target value index, the target cost index and the target cost performance) for a target data asset of a target database from the lineage (blood-relationship) network information, resource consumption information and liveness information recorded in the operation and maintenance logs of the database. It therefore yields richer, more authentic and more accurate evaluation indexes, provides a reliable reference for subsequently reducing the storage pressure and resource occupation of the storage medium, and presents the results clearly and intuitively on the preset display terminal, further improving the efficiency of exploring the database. In addition, the value grades of the data tables can be determined with an unsupervised clustering algorithm such as the Kmeans algorithm, which is more intelligent and reasonable than the traditional equal-division method. With the PageRank model, the PR values are all computed offline and stored rather than calculated when a user searches, which improves query efficiency. The PageRank model also has a certain resistance to manipulation, because it is difficult for the owner of one object to make other important objects point to that object. Finally, the weights are set with the entropy weight method, which requires no subjective judgment of the importance of the indexes, reduces the influence of subjectivity on the decision result, handles the interplay among the indexes better, and is simple, intuitive, operable and easy to implement.

Referring to fig. 4, fig. 4 is a block diagram of a device for determining an evaluation index of a data asset according to an embodiment of the present invention, where the device shown in fig. 4 includes the following modules:

Log acquisition module 401: configured to acquire operation and maintenance log information of a target database, wherein the target database comprises a plurality of target data assets, the operation and maintenance log information comprises at least the lineage (blood-relationship) network information, resource consumption information and liveness information of each target data asset, and the lineage network information reflects the dependency relationships of the target data assets;

Value evaluation module 402: configured to perform value evaluation using the lineage network information and the liveness information to determine a target value index of the target data asset;

Cost evaluation module 403: configured to perform cost evaluation using the resource consumption information to determine a target cost index of the target data asset;

Cost performance determination module 404: configured to determine a target cost performance of the target data asset according to the target value index and the target cost index;

Result display module 405: configured to perform visualization using the target value index, the target cost index and the target cost performance to generate target visual prompt data and to output the target visual prompt data to a preset display terminal, which receives and displays it, wherein the target visual prompt data reflect visualized data of target evaluation indexes, and the target evaluation indexes comprise at least the target value index, the target cost index and the target cost performance.

It should be noted that the functions of each module in the apparatus shown in fig. 4 are similar to those of each step in the method shown in fig. 1, and for avoiding repetition, reference may be made to the contents of each step in the method shown in fig. 1.

The invention provides a device for determining evaluation indexes of a data asset. With this device, evaluation indexes in at least three dimensions, such as a target value index, a target cost index and a target cost performance, can be determined for a target data asset of a target database. This enriches the evaluation indexes of data assets, gives a clearer picture when exploring data assets, and provides multi-dimensional indexes for managing or modifying them. Because the indexes are derived by analyzing the lineage (blood-relationship) network information, resource consumption information and liveness information of the target data asset in the operation and maintenance logs of the database, they are more authentic and accurate, providing a reliable reference for subsequently reducing the storage pressure and resource occupation of the storage medium. Finally, the visualized data of the evaluation indexes shown on the preset display terminal present the results to the user more clearly and intuitively, further improving the efficiency of exploring the database.

FIG. 5 illustrates an internal block diagram of a computer device in one embodiment. The computer device may specifically be a terminal or a server. As shown in fig. 5, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system, and may also store a computer program which, when executed by a processor, causes the processor to implement the method described above. The internal memory may also have stored therein a computer program which, when executed by a processor, causes the processor to perform the method described above. It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is presented comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method as shown in fig. 1 or 3.

In an embodiment, a computer-readable storage medium is proposed, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as shown in fig. 1 or fig. 3.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.

The above examples represent only a few embodiments of the present application; although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A method of determining an evaluation index for a data asset, the method comprising:

2. The method of claim 1, wherein determining the target cost performance of the target data asset based on the target value index and the target cost index comprises:

3. The method of claim 1, wherein performing value assessment using the blood relationship (data lineage) network information and the liveness information to determine the target value index of the target data asset comprises:

4. The method according to any one of claims 1 to 3, wherein the liveness information comprises at least target access volume data of the target data asset, the target access volume data comprising at least a correspondence between access type and access volume, and the target evaluation index further comprises a value level of the target data asset, the value level reflecting the overall importance of the target data asset;

the method further comprises:

5. The method of claim 1, wherein the resource consumption information includes at least resource consumption data for each resource type; the resource types include at least a central processing unit (CPU) resource, a read-write resource, a buffer space resource, and a disk space resource; and the resource consumption data includes at least time consumption data of the CPU resource, the number of read-write operations of the read-write resource, first occupancy data of the buffer space resource, and second occupancy data of the disk space resource;

6. The method of claim 4, wherein determining a target liveness indicator of the target data asset by performing liveness assessment on the liveness information comprises:

7. The method of claim 1, wherein the target data asset comprises a target data table, the blood relationship network information comprises at least a dependency relationship of the target data table, and performing the impact assessment according to the blood relationship network information to determine a target impact indicator of the target data asset comprises:

8. A device for determining an evaluation index of a data asset, the device comprising:

9. A computer readable storage medium storing a computer program, which when executed by a processor causes the processor to perform the steps of the method according to any one of claims 1 to 7.

10. A computer device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 7.
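The claims above name the dimensions of the evaluation index (an impact indicator from the dependency relationships of a data table, a liveness indicator from access volume per access type, a cost index from per-resource-type consumption, a value index from lineage and liveness, and a cost performance derived from value and cost), but the extracted claim text does not state the formulas. The following Python sketch shows one possible way to compute such indices. Every function name (`impact_index`, `liveness_index`, `cost_index`, `value_index`, `cost_performance`), every weight, the baseline normalisation, the "count of downstream tables" impact measure, the weighted-sum value index, and the value-over-cost ratio are hypothetical assumptions for illustration only, not the method claimed in this application.

```python
from collections import deque

# Hypothetical weights per resource type (claim 5 names the types, not the weights).
RESOURCE_WEIGHTS = {"cpu_seconds": 0.4, "read_write_count": 0.2,
                    "buffer_mb": 0.2, "disk_mb": 0.2}
# Hypothetical weights per access type (claim 4 names access types only abstractly).
ACCESS_WEIGHTS = {"read": 1.0, "write": 2.0}


def downstream_tables(lineage, table):
    """Return all tables that depend on `table`, directly or transitively.

    Assumed layout: `lineage` maps a table name to the tables that read from it.
    Breadth-first search; the `seen` set prevents loops on cyclic lineage.
    """
    seen = set()
    queue = deque(lineage.get(table, ()))
    while queue:
        current = queue.popleft()
        if current in seen or current == table:
            continue
        seen.add(current)
        queue.extend(lineage.get(current, ()))
    return seen


def impact_index(lineage, table):
    # Hypothetical impact measure: number of affected downstream tables.
    return len(downstream_tables(lineage, table))


def liveness_index(access_volume):
    # Weighted sum of access volume per access type; unknown types weigh 1.0.
    return sum(ACCESS_WEIGHTS.get(kind, 1.0) * count
               for kind, count in access_volume.items())


def cost_index(consumption, baseline):
    # Each resource's consumption is divided by an assumed baseline, then weighted.
    return sum(weight * consumption.get(kind, 0.0) / baseline[kind]
               for kind, weight in RESOURCE_WEIGHTS.items())


def value_index(impact, liveness, alpha=0.5):
    # Assumes both inputs are already on comparable scales.
    return alpha * impact + (1.0 - alpha) * liveness


def cost_performance(value, cost):
    # Hypothetical: value obtained per unit of cost.
    if cost <= 0:
        raise ValueError("cost index must be positive")
    return value / cost


if __name__ == "__main__":
    lineage = {"ods_orders": ["dwd_orders"],
               "dwd_orders": ["dws_sales", "ads_report"],
               "dws_sales": ["ads_report"]}
    impact = impact_index(lineage, "ods_orders")
    liveness = liveness_index({"read": 300, "write": 20})
    cost = cost_index(
        {"cpu_seconds": 120, "read_write_count": 5000, "buffer_mb": 256, "disk_mb": 1024},
        {"cpu_seconds": 60, "read_write_count": 10000, "buffer_mb": 512, "disk_mb": 1024})
    print(impact, liveness, round(cost, 3),
          round(cost_performance(value_index(impact, liveness), cost), 3))
```

Breadth-first traversal with a visited set is used so that cyclic or diamond-shaped lineage (for example, `ads_report` reachable by two paths above) counts each dependent table once. In practice the weights and baselines would have to be calibrated against the operation and maintenance log data of the actual database.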