CN111724084A

CN111724084A - Data asset value display method, device, equipment and storage medium

Info

Publication number: CN111724084A
Application number: CN202010729454.8A
Authority: CN
Inventors: 勇萌哲; 尹星富; 滕一帆; 王世清; 史双
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2020-09-29

Abstract

The application discloses a method, an apparatus, a device and a storage medium for displaying the value of a data asset, and belongs to the technical field of computers. The method comprises: acquiring a data set corresponding to the data asset; calling a data asset analysis model to determine a data quality value grade and a data application value grade of the data set; determining a data asset value grade of the data set according to the data quality value grade and the data application value grade; and displaying the data asset value grade in a user interface. The data asset value grade of the data set reflects the value of the data asset in terms of both data quality and data application. No manual information collection and analysis is needed when determining the value of a data asset, which improves the efficiency of determining that value.

Description

Data asset value display method, device, equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for displaying a value of a data asset.

Background

As informatization deepens across industries, data assets are becoming increasingly important. Data assets include data that is collected, used, generated, or managed through business application systems and that can serve as a basis for decision making.

Currently, market methods are commonly used to determine the value of data assets. To determine the value of a data asset using a market method, recent trading prices for the same or similar data assets in the market must first be collected. The data asset to be valued is then compared with the collected data assets, and its value is estimated according to the comparison result.

Determining the value of a data asset using market methods therefore relies on manual collection and analysis of recent trading prices for similar or identical data assets, which makes the determination inefficient.

Disclosure of Invention

The application provides a method, a device, equipment and a storage medium for displaying the value of a data asset, which can improve the efficiency of determining the value of the data asset. The technical scheme is as follows:

according to an aspect of the present application, there is provided a value display method of a data asset, the method including:

acquiring a data set corresponding to the data assets, wherein the data assets are assets existing in a data form;

calling a data asset analysis model, and determining a data quality value grade and a data application value grade of the data set, wherein the data asset analysis model is a calculation model for determining the data quality value grade and the data application value grade through at least two kinds of data quantization indexes;

determining a data asset value grade of the data set according to the data quality value grade and the data application value grade;

displaying the data asset value rating in a user interface.

According to another aspect of the present application, there is provided a value determining apparatus for a data asset, the apparatus comprising:

the acquisition module is used for acquiring a data set corresponding to the data assets, wherein the data assets are assets existing in a data form;

the first determination module is used for calling a data asset analysis model and determining a data quality value grade and a data application value grade of the data set, wherein the data asset analysis model is a calculation model for determining the data quality value grade and the data application value grade through at least two kinds of data quantization indexes;

a second determining module, configured to determine a data asset value level of the data set according to the data quality value level and the data application value level;

a display module to display the data asset value rating in a user interface.

Optionally, the at least two kinds of data quantization indexes include: the quality quantization index corresponding to the data quality type and the application quantization index corresponding to the data application type;

the first determining module is configured to:

extracting data in the dataset;

determining the data quality value grade of the data according to the quality quantization index; and determining the data application value level of the data according to the application quantization index.

Optionally, the quality quantization indexes include data integrity, data correctness, data consistency and data repeatability, and the quality quantization indexes correspond to respective quality quantization standards;

the first determining module is configured to:

determining an integrity level, a correctness level, a consistency level and a repeatability level of the data according to the quality quantization standard, wherein the integrity level is the level of the data under the integrity of the data, the correctness level is the level of the data under the correctness of the data, the consistency level is the level of the data under the consistency of the data, and the repeatability level is the level of the data under the repeatability of the data;

and determining the data quality value grade according to the integrity grade, the correctness grade, the consistency grade and the repeatability grade.

Optionally, the first determining module includes:

a first determining submodule, configured to determine the integrity level according to a first quality quantization criterion corresponding to the data integrity, where the first quality quantization criterion is used to indicate that the integrity level is equal to a ratio of the number of complete data in the data to the total number of data multiplied by one hundred;

and a second determining submodule, configured to determine the correctness level according to a second quality quantization standard corresponding to the correctness of the data, where the second quality quantization standard is used to indicate that the correctness level is equal to a ratio of the number of correct data in the data to the total number of data multiplied by one hundred;

and a third determining submodule, configured to determine the consistency level according to a third quality quantization standard corresponding to the data consistency, where the third quality quantization standard is used to indicate that the consistency level is equal to a ratio of the number of consistent data in the data to the total number of data multiplied by one hundred;

and a fourth determining submodule, configured to determine the repeatability level according to a fourth quality quantization criterion corresponding to the data repeatability, where the fourth quality quantization criterion is used to indicate that the repeatability level is equal to one minus a ratio of the number of repeated data in the data to the total number of the data, and then multiplied by one hundred.

Optionally, the application quantization indexes include data timeliness, data application extent and data application heat, and the application quantization indexes correspond to respective application quantization standards;

the first determining module is configured to:

according to the application quantization standard, determining a timeliness grade, an application breadth grade and an application heat grade of the data, wherein the timeliness grade is the grade of the data under the timeliness of the data, the application breadth grade is the grade of the data under the data application breadth, and the application heat grade is the grade of the data under the data application heat;

and determining the data application value grade according to the timeliness grade, the application breadth grade and the application heat grade.

Optionally, the first determining module includes:

a fifth determining submodule, configured to determine the timeliness level according to a first application quantization standard corresponding to the timeliness of the data, where the first application quantization standard is used to indicate that the timeliness level is related to an update frequency of the data;

the sixth determining submodule is used for determining the application breadth level according to a second application quantization standard corresponding to the data application breadth, and the second application quantization standard is used for indicating that the application breadth level is positively correlated with the number of systems using the data;

and a seventh determining sub-module, configured to determine the application heat level according to a third application quantization criterion corresponding to the data application heat, where the third application quantization criterion is used to indicate that the application heat level is positively correlated with the number of times the data is used.

Optionally, the first determining module is configured to:

determining a first weight corresponding to the integrity of the data, a second weight corresponding to the correctness of the data, a third weight corresponding to the consistency of the data and a fourth weight corresponding to the repeatability of the data through a first machine learning model, wherein the first machine learning model is obtained by training through a first sample set based on a Bayesian algorithm, the first sample set comprises first sample data and first relative importance corresponding to the first sample data, the first relative importance exists between every two quality quantization indexes, and the first sample data and the data comprise the same data item;

determining the data quality value level according to the completeness level, the correctness level, the consistency level, the repeatability level, the first weight, the second weight, the third weight and the fourth weight.

Optionally, the first determining module is configured to:

determining a fifth weight corresponding to the timeliness of the data, a sixth weight corresponding to the application breadth of the data and a seventh weight corresponding to the application heat of the data through a second machine learning model, wherein the second machine learning model is obtained by training through a second sample set based on a Bayesian algorithm, the second sample set comprises second sample data and second relative importance corresponding to the second sample data, the second relative importance exists between every two application quantization indexes, and the second sample data and the data comprise the same data item;

and determining the data application value grade according to the timeliness grade, the application breadth grade, the application heat grade, the fifth weight, the sixth weight and the seventh weight.

Optionally, the obtaining module is configured to:

reducing the dimension of the data set corresponding to the data asset through a principal component analysis algorithm to obtain a dimension reduction data set;

and acquiring the dimension reduction data set.

Optionally, the second determining module is configured to:

determining the average of the data quality value rating and the data application value rating as the data asset value rating of the data set;

or, determining the data asset value rating of the data set as a weighted average of the data quality value rating and the data application value rating.

According to yet another aspect of the present application, there is provided a computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the method of value display of a data asset of the above aspect.

According to yet another aspect of the present application, there is provided a computer storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions that, when loaded and executed by a processor of a computer device, implements a method of value display for a data asset according to the above aspect.

According to yet another aspect of the application, a computer program product or a computer program is provided, comprising computer instructions, which are stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the method for displaying the value of a data asset provided in the various alternative implementations of the above aspects.

The beneficial effects of the technical solution provided by the application include at least the following:

The data asset value rating of the data set is determined by the data asset analysis model. Because the data asset value rating is determined according to the data quality value rating and the data application value rating, it reflects the value of the data asset in terms of both data quality and data application. No manual information collection and analysis is needed when determining the value of a data asset, which improves the efficiency of determining that value. The method provided by the application enables the owner of a data asset to clearly understand the value of the data assets they are responsible for and the causes of any low value. The owner can then take targeted remedial measures and continuously improve the value of the data assets. This facilitates the assetization of data and the preservation and appreciation of the value of data assets.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a schematic diagram of a data asset analysis model provided by an embodiment of the present application;

FIG. 2 is a flow chart illustrating a method for displaying value of a data asset according to an embodiment of the present disclosure;

FIG. 3 is a schematic flow chart diagram illustrating another method for displaying the value of a data asset provided by an embodiment of the present application;

FIG. 4 is a schematic flow chart diagram illustrating a method for acquiring a data set according to an embodiment of the present application;

FIG. 5 is a flow chart illustrating a method for determining a data quality value level according to an embodiment of the present application;

FIG. 6 is a schematic flow chart diagram of a method for determining an integrity level, a correctness level, a consistency level, and a repeatability level according to an embodiment of the present application;

FIG. 7 is a schematic flow chart diagram of a method for determining a data quality value grade according to an integrity grade, a correctness grade, a consistency grade and a repeatability grade according to an embodiment of the application;

FIG. 8 is a schematic flow chart diagram illustrating a method for determining a value rating of a data application provided by an embodiment of the present application;

FIG. 9 is a schematic flow chart diagram of a method for determining a timeliness rating, an application breadth rating, and an application heat rating provided by an embodiment of the present application;

FIG. 10 is a schematic flow chart diagram of a method for determining a data application value rating based on a timeliness rating, an application breadth rating, and an application heat rating provided by an embodiment of the present application;

FIG. 11 is a schematic diagram of a graph of a statistical analysis of the value of a data asset provided by an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a value display device for data assets according to an embodiment of the present application;

FIG. 13 is a schematic structural diagram of a first determining module provided in an embodiment of the present application;

FIG. 14 is a schematic structural diagram of another first determining module provided in an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a server according to an embodiment of the present application.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of a data asset analysis model provided by an embodiment of the application. As shown in fig. 1, a data set corresponding to a data asset is input in a client. The client determines a data quality value rating and a data application value rating for the data set based on the data asset analysis model 100. The data asset analysis model determines a data quality value grade through data quantization indexes belonging to a data quality category, and determines a data application value grade through the data quantization indexes belonging to a data application category. The data quantization indexes belonging to the data quality category include data integrity 1001, data correctness 1002, data consistency 1003 and data repeatability 1004. The data quantization indexes belonging to the data application category include data timeliness 1005, data application breadth 1006, and data application heat 1007. Each data quantization index also corresponds to a weight and a quantization standard. And then the client determines the data asset value grade of the data set according to the data quality value grade and the data application value grade of the data set. Optionally, the client determines an average of the data quality value rating and the data application value rating as a data asset value rating for the data set. The data asset value rating of the data set can reflect the value of the data asset.

Illustratively, Table 1 shows the information included in the data asset analysis model.

TABLE 1

As shown in Table 1, the data quantization indexes belonging to the data quality category include at least one of data integrity, data correctness, data consistency, and data repeatability. The data quantization indexes belonging to the data application category include at least one of data timeliness, data application breadth, and data application heat. Optionally, the client determines the weight corresponding to each data quantization index through an analytic hierarchy process. The client judges the level of the data in the data set under each data quantization index according to the quantization standard of that index. The weighted average of the levels of the data set under the data quantization indexes belonging to the data quality category is determined as the data quality value grade of the data set, and the weighted average of the levels under the data quantization indexes belonging to the data application category is determined as the data application value grade of the data set. Illustratively, if the data timeliness level of the data set is 96, the data application breadth level is 92, and the data application heat level is 91, with corresponding weights of 0.14, 0.43 and 0.43, the data application value grade of the data set is 96 × 0.14 + 92 × 0.43 + 91 × 0.43 = 92.13.
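The weighted-average calculation in the example above can be sketched in Python as follows. This is a minimal illustration only: the function name `weighted_grade` and the dictionary keys are hypothetical, and dividing by the sum of the weights is an added assumption so that weights which do not sum exactly to one are still handled.

```python
def weighted_grade(levels, weights):
    """Return the weighted average of per-index levels (each on a 0-100 scale).

    `levels` and `weights` are dicts keyed by index name. Normalizing by the
    total weight is an assumption of this sketch, not stated in the text.
    """
    if set(levels) != set(weights):
        raise ValueError("levels and weights must cover the same indexes")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(levels[name] * weights[name] for name in levels) / total_weight


# Levels and weights taken from the worked example in the text.
application_levels = {"timeliness": 96, "breadth": 92, "heat": 91}
application_weights = {"timeliness": 0.14, "breadth": 0.43, "heat": 0.43}

application_value = weighted_grade(application_levels, application_weights)
print(round(application_value, 2))
```

With the levels and weights of the example, the result agrees with the value 92.13 given in the text.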

In the process of determining the value of the data asset, manual information collection and analysis are not needed, and the computer equipment can determine the data asset value grade of the data set according to the data set corresponding to the data asset. And the data asset value rating of the data set can reflect the value of the data asset in terms of the quality of the data and the value in terms of the application of the data. The efficiency of determining the value of the data asset is improved.

Fig. 2 is a schematic flow chart of a method for displaying the value of a data asset according to an embodiment of the present application. The method may be used for a computer device or a client on a computer device. As shown in fig. 2, the method includes:

step 201, acquiring a data set corresponding to the data asset.

The data assets are assets that exist in the form of data, including data assets in any industry. For example, the data asset is a government affairs data asset. Optionally, the data assets are data in a database, operation records of the system, statistical data, and the like. The data set corresponding to the data asset includes all or a portion of the data in the data asset.

Step 202, calling a data asset analysis model, and determining a data quality value grade and a data application value grade of the data set.

The data asset analysis model is a computational model that determines the data quality value level and the data application value level through at least two kinds of data quantization indexes. Optionally, the client determines the level of the data set under each data quantization index, and from these levels determines the data quality value level and the data application value level. Optionally, each data quantization index corresponds to a weight and a quantization standard. The weight reflects how important the level of the data set under that data quantization index is when the client determines the data quality value level or the data application value level. The quantization standard is used by the client to judge the level of the data set under the data quantization index.

The data quality value level reflects the quality of the data in the data set. For example, the less erroneous data and duplicate data a data set contains, the higher its data quality value level. The data application value level reflects how the data in the data set is applied. For example, the more times the data in the data set is used by systems and the higher the update frequency of the data in the data set, the higher the data application value level.

Optionally, the client establishes the data asset analysis model through an analytic hierarchy process.

And step 203, determining the data asset value grade of the data set according to the data quality value grade and the data application value grade.

Optionally, the client determines an average of the data quality value rating and the data application value rating as a data asset value rating for the data set. Or the client determines the weighted average of the data quality value grade and the data application value grade according to the weight of the data quality value grade and the weight of the data application value grade, so that the data asset value grade of the data set is obtained.
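As a minimal sketch of step 203, the two options (plain average and weighted average) can be expressed with one function. The function name `asset_value_grade` and the parameter `quality_weight` are hypothetical; the text does not specify how the two weights are chosen, so the sketch assumes they sum to one.

```python
def asset_value_grade(quality_grade, application_grade, quality_weight=0.5):
    """Combine the two grades into a data asset value grade.

    With the default quality_weight of 0.5 this is the plain average;
    any other value in [0, 1] gives a weighted average in which the
    application grade receives the remaining weight (an assumption).
    """
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be between 0 and 1")
    application_weight = 1.0 - quality_weight
    return quality_grade * quality_weight + application_grade * application_weight


plain_average = asset_value_grade(90, 80)
weighted = asset_value_grade(90, 80, quality_weight=0.7)
print(plain_average, round(weighted, 2))
```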

Step 204, displaying the data asset value rating in the user interface.

Optionally, the user interface is a statistical analysis interface corresponding to the data asset. The client can display a statistical graph of the value of the data assets in the user interface according to the determined data quality value grade, data application value grade and data asset value grade.

In summary, the value display method for data assets provided by the embodiment of the application determines the data asset value rating of the data set through the data asset analysis model. Because the data asset value rating is determined according to the data quality value rating and the data application value rating, it reflects the value of the data asset in terms of both data quality and data application. No manual information collection and analysis is needed when determining the value of a data asset, which improves the efficiency of determining that value. The method provided by the embodiment of the application enables the owner of a data asset to clearly understand the value of the data assets they are responsible for and the causes of any low value. The owner can then take targeted remedial measures and continuously improve the value of the data assets. This facilitates the assetization of data and the preservation and appreciation of the value of data assets.

FIG. 3 is a flow chart illustrating another method for displaying the value of a data asset according to an embodiment of the present application. The method may be used for a computer device or a client on a computer device. As shown in fig. 3, the method includes:

step 301, acquiring a data set corresponding to the data asset.

Optionally, as shown in fig. 4, the implementation process of step 301 includes the following steps 301a and 301b:

in step 301a, a principal component analysis algorithm is used to perform dimensionality reduction on a data set corresponding to the data asset, so as to obtain a dimensionality reduction data set.

Optionally, when the data volume of the data set corresponding to the data asset reaches a target data volume, the client performs dimensionality reduction on the data set through a Principal Component Analysis (PCA) algorithm. Optionally, the target data volume is 1 TB. Illustratively, the data set corresponding to the data asset includes identification number data and age data of residents. Because age can be determined from the identification number, the client reduces the dimension of the data set through the PCA algorithm, and the resulting dimension reduction data set includes only the identification number data of the residents.

In step 301b, a dimension reduction dataset is acquired.

The client takes the acquired dimension reduction data set as the data set corresponding to the data asset. Using the dimension reduction data set improves the efficiency with which the client determines the value of the data asset.
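To illustrate the idea behind steps 301a and 301b, the following toy sketch applies PCA to a data set with two numeric columns, one of which is fully determined by the other. It is an assumption-laden illustration, not the implementation described in the application: the function name `pca_first_component` is hypothetical, the redundancy is linear (unlike the identification number and age example), and the closed-form eigen-decomposition only works for two columns.

```python
import math


def pca_first_component(rows):
    """Project two-column rows onto their first principal component.

    Returns (scores, explained_ratio), where explained_ratio is the share
    of total variance captured by the first component.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows")
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    centered = [(r[0] - mean_x, r[1] - mean_y) for r in rows]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    top = half_trace + radius

    # Matching eigenvector, with the diagonal case handled separately.
    if b != 0:
        vx, vy = b, top - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    scores = [x * vx + y * vy for x, y in centered]
    explained_ratio = top / (a + c) if (a + c) > 0 else 0.0
    return scores, explained_ratio


# The second column is exactly twice the first, so it carries no extra information.
scores, explained_ratio = pca_first_component([(1, 2), (2, 4), (3, 6), (4, 8)])
print(len(scores), round(explained_ratio, 6))
```

When the explained ratio is close to one, the single projected column preserves nearly all of the variance of the original two columns, which is the sense in which a redundant column can be dropped.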

Step 302, extracting data in the data set.

Optionally, the client extracts all data in the data set and determines the value of the data asset from all of that data. Alternatively, the client extracts a portion of the data in the data set and determines the value of the data asset based on that portion. Illustratively, when the client determines the value of a demographic data asset, only the demographic data in the data set is extracted.

And step 303, calling a data asset analysis model, and determining the data quality value grade of the data according to the quality quantization index.

The data asset analysis model is a calculation model for determining a data quality value grade and a data application value grade through at least two kinds of data quantization indexes. Optionally, the at least two kinds of data quantization indexes include: the quality quantization index corresponding to the data quality type and the application quantization index corresponding to the data application type.

Optionally, the quality quantization indexes include data integrity, data correctness, data consistency and data repeatability, and each quality quantization index corresponds to a quality quantization standard. As shown in fig. 5, the implementation procedure of step 303 includes the following steps 3031 and 3032:

in step 3031, the integrity level, correctness level, consistency level, and repeatability level of the data are determined based on quality quantization standards.

The integrity level is the level of data under data integrity, the correctness level is the level of data under data correctness, the consistency level is the level of data under data consistency, and the repeatability level is the level of data under data repeatability.

Optionally, as shown in fig. 6, the implementation procedure of step 3031 includes the following steps 3031a to 3031d:

in step 3031a, an integrity level is determined according to a first quality quantization standard corresponding to data integrity.

Optionally, the first quality quantification criterion is used to indicate that the integrity level is equal to the ratio of the number of complete data in the data to the total number of data, multiplied by one hundred. Complete data means that every element required by the rule corresponding to the data has a value.

For example, population data must include a name, a native place, an identity card number and the like; when a record lacks an identity card number value, that record is not complete. In monthly expenditure data, when a month has no corresponding expenditure value, the data is not complete. Education level data includes a name, an identification number and an educational background; when a record lacks the educational background value, that record is not complete.
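The first quality quantization standard can be sketched as follows. The function name `integrity_level`, the field names, and the choice to treat `None` and the empty string as missing values are assumptions made for illustration.

```python
def integrity_level(records, required_fields):
    """Integrity level = complete records / total records * 100.

    A record counts as complete when every required field is present and
    is neither None nor an empty string (an assumption of this sketch).
    """
    if not records:
        raise ValueError("records must not be empty")
    complete = sum(
        1
        for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records) * 100


# Hypothetical population records; the last one lacks an identity card number.
population = [
    {"name": "A", "native_place": "X", "id_number": "110000000000000001"},
    {"name": "B", "native_place": "Y", "id_number": "110000000000000002"},
    {"name": "C", "native_place": "Z", "id_number": "110000000000000003"},
    {"name": "D", "native_place": "X", "id_number": ""},
]
population_integrity = integrity_level(population, ["name", "native_place", "id_number"])
print(population_integrity)
```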

In step 3031b, a correctness level is determined according to a second quality quantization standard corresponding to the correctness of the data.

Optionally, the second quality quantification criterion is used to indicate that the level of correctness is equal to the ratio of the number of correct data in the data to the total number of data multiplied by one hundred. Correct data means that the value of the data is the correct value.

For example, a mobile phone number has 11 digits; when the mobile phone number value in a record has 10 digits, that record is not correct. When a city appears in province data, the data is not correct. In monthly average rate data, when the value for a certain month is more than 3 times the values for the other months, the data is not correct.
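The second quality quantization standard can be sketched in the same way, with the correctness rule supplied as a predicate. The names `correctness_level` and `is_valid_mobile` are hypothetical, and the predicate only encodes the 11-digit rule from the example; real validation rules would depend on the data item.

```python
def correctness_level(values, is_correct):
    """Correctness level = correct values / total values * 100."""
    if not values:
        raise ValueError("values must not be empty")
    correct = sum(1 for value in values if is_correct(value))
    return correct / len(values) * 100


def is_valid_mobile(number):
    """Illustrative rule from the example: exactly 11 ASCII digits."""
    return len(number) == 11 and number.isascii() and number.isdigit()


# Hypothetical phone numbers: the second has 10 digits, the fourth contains a letter.
phones = ["13800000000", "1380000000", "13912345678", "1391234567x"]
phone_correctness = correctness_level(phones, is_valid_mobile)
print(phone_correctness)
```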

In step 3031c, a consistency level is determined according to a third quality quantization standard corresponding to the data consistency.

Optionally, a third quality quantization criterion is used to indicate that the level of consistency is equal to the ratio of the number of consistent data in the data to the total number of data multiplied by one hundred. Consistent data refers to the case where there is no contradiction between data and values of other data representing the same information.

For example, if the same native place is written in different forms in different records (for instance as Beijing city in one record and as beijing city in another), the native place data is not consistent. Likewise, if one value in payroll data keeps a different number of digits after the decimal point than the other values, the payroll data is not consistent in its decimal precision.
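The text does not say how consistent data is identified in practice. The sketch below makes one possible assumption: each value is reduced to a format signature (here, the number of digits after the decimal point), and values sharing the most common signature are counted as consistent. The names `decimal_places` and `consistency_level` are hypothetical.

```python
from collections import Counter


def decimal_places(text):
    """Number of digits after the decimal point in a numeric string."""
    _, _, fraction = text.partition(".")
    return len(fraction)


def consistency_level(values, signature):
    """Consistency level = values in the majority format / total values * 100.

    Treating the most common signature as the reference format is an
    assumption of this sketch, not a rule stated in the text.
    """
    if not values:
        raise ValueError("values must not be empty")
    signatures = Counter(signature(value) for value in values)
    _, majority_count = signatures.most_common(1)[0]
    return majority_count / len(values) * 100


# Hypothetical payroll values; the third keeps only one decimal digit.
payroll = ["5000.00", "6200.50", "7100.5", "4800.25"]
payroll_consistency = consistency_level(payroll, decimal_places)
print(payroll_consistency)
```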

In step 3031d, a repeatability grade is determined according to a fourth quality quantification standard corresponding to data repeatability.

Optionally, the fourth quality quantification criterion is used to indicate that the repeatability level is equal to one minus the ratio of the number of duplicate data in the data to the total number of data, with the result multiplied by one hundred. Duplicate data refers to data that should not be identical to any other data but for which at least two identical pieces of data nevertheless exist.

For example, in the population data, if the identity card numbers of three pieces of data are the same, two of those pieces of data are duplicate data. In the employee contact telephone data, if the mobile phone numbers in five pieces of data are the same, four of those pieces of data are duplicate data. When the total number of data is 100 and the number of duplicate data is 30, the repeatability level is $(1 - 30/100) \times 100 = 70$.
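
The four ratio-based quality levels of steps 3031a to 3031d can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the application: the function names and the count-based inputs are hypothetical, and how a record is judged complete, correct, or consistent is left to the caller.

```python
from typing import Hashable, Sequence


def ratio_level(matching: int, total: int) -> float:
    """Level = (matching / total) * 100; used for integrity, correctness and consistency."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matching / total * 100


def repeatability_level(duplicates: int, total: int) -> float:
    """Level = (1 - duplicates / total) * 100, per the fourth quality quantification criterion."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (1 - duplicates / total) * 100


def count_duplicates(keys: Sequence[Hashable]) -> int:
    """Count records beyond the first occurrence of each key.

    Three records sharing one identity card number yield two duplicates,
    matching the population data example in the text.
    """
    return len(keys) - len(set(keys))


# Example from the text: 100 records, 30 of them duplicates.
example_level = repeatability_level(30, 100)
```

Counting duplicates by a key such as the identity card number is an assumption of this sketch; the text only requires that identical pieces of data be detected.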

Optionally, the client performs steps 3031a to 3031d simultaneously or one after another in order, which is not limited in this embodiment of the application.

In step 3032, a data quality value level is determined based on the integrity level, correctness level, consistency level, and repeatability level.

Optionally, the client determines an average of the integrity level, the correctness level, the consistency level, and the repeatability level as the data quality value level. Alternatively, as shown in fig. 7, the implementation procedure of step 3032 includes the following steps 3032a and 3032b:

in step 3032a, a first weight corresponding to data integrity, a second weight corresponding to data correctness, a third weight corresponding to data consistency, and a fourth weight corresponding to data repeatability are determined by the first machine learning model.

The first machine learning model is obtained by training with a first sample set based on a Bayesian algorithm. The first machine learning model is used for determining, according to input data, the relative importance between every two quality quantization indexes among all the quality quantization indexes. The first sample set includes first sample data and the first relative importance, corresponding to the first sample data, between every two quality quantization indexes among all the quality quantization indexes; the first sample data includes the same data items as the data extracted from the data set. Optionally, the first relative importance is manually labeled by a worker. The first relative importance reflects how important one quality quantization index is compared with another quality quantization index. By way of example, table 2 shows the values and meanings of relative importance.

TABLE 2

Value	Meaning
1	The two objects are equally important
3	One object is slightly more important than the other
5	One object is significantly more important than the other
7	One object is very much more important than the other
9	One object is extremely more important than the other
2, 4, 6, 8	The importance lies between the importance levels of the adjacent values above and below

As shown in table 2, when the importance of data integrity compared to data consistency is 3, it means that data integrity is slightly more important than data consistency.

Optionally, the client inputs the extracted data into the first machine learning model, and obtains a relative importance degree matrix of the quality quantization indexes according to the relative importance between every two quality quantization indexes in all the quality quantization indexes determined by the first machine learning model. Illustratively, table 3 shows a relative importance matrix of the quality quantization index.

TABLE 3

	Data integrity	Data correctness	Data consistency	Data repeatability
Data integrity	1	1	3	5
Data correctness	1	1	3	5
Data consistency	1/3	1/3	1	3
Data repeatability	1/5	1/5	1/3	1

As shown in table 3, data integrity is significantly more important than data repeatability, and data consistency is slightly more important than data repeatability. Based on the relative importance matrix of the quality quantization indexes, the client performs a consistency check on the matrix through the analytic hierarchy process and obtains the eigenvector corresponding to the maximum eigenvalue of the matrix. The client then normalizes the eigenvector to determine the first weight corresponding to data integrity, the second weight corresponding to data correctness, the third weight corresponding to data consistency, and the fourth weight corresponding to data repeatability. Optionally, the client determines the weight corresponding to each quality quantization index by calling a system corresponding to the analytic hierarchy process according to the relative importance matrix of the quality quantization indexes. Illustratively, the client determines that the first weight corresponding to data integrity is 0.33, the second weight corresponding to data correctness is 0.33, the third weight corresponding to data consistency is 0.21, and the fourth weight corresponding to data repeatability is 0.13.
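
The eigenvector-based weighting and the consistency check of the analytic hierarchy process can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `ahp_weights` is hypothetical, power iteration is one common way to approximate the principal eigenvector, and the random index values are Saaty's standard table. The application does not say which numerical method the client or the called system uses.

```python
from typing import List, Tuple

# Saaty's random consistency index for matrix orders 1 to 9 (standard AHP table).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix: List[List[float]],
                iterations: int = 100) -> Tuple[List[float], float]:
    """Return (normalized principal eigenvector, consistency ratio)."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then rescale so the weights sum to 1.
        product = [sum(row[j] * weights[j] for j in range(n)) for row in matrix]
        total = sum(product)
        weights = [value / total for value in product]
    # Estimate the maximum eigenvalue as the mean of (A w)_i / w_i.
    product = [sum(row[j] * weights[j] for j in range(n)) for row in matrix]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]  # only orders 1 to 9 are supported here
    consistency_ratio = consistency_index / random_index if random_index else 0.0
    return weights, consistency_ratio


# Relative importance matrix of the quality quantization indexes (table 3).
TABLE_3 = [
    [1, 1, 3, 5],
    [1, 1, 3, 5],
    [1 / 3, 1 / 3, 1, 3],
    [1 / 5, 1 / 5, 1 / 3, 1],
]

weights, consistency_ratio = ahp_weights(TABLE_3)
```

A consistency ratio below 0.1 is the customary acceptance threshold in the analytic hierarchy process. Weights computed this way depend on the numerical method and need not coincide with the illustrative values quoted in the text.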

In step 3032b, a data quality value level is determined based on the integrity level, the correctness level, the consistency level, the repeatability level, the first weight, the second weight, the third weight, and the fourth weight.

Optionally, the client sums the product of the integrity level and the first weight, the product of the correctness level and the second weight, the product of the consistency level and the third weight, and the product of the repeatability level and the fourth weight, and determines a result of the summation as the data quality value level.

Illustratively, the integrity level is $q_1$, the correctness level is $q_2$, the consistency level is $q_3$, and the repeatability level is $q_4$. The first weight is $w_1$, the second weight is $w_2$, the third weight is $w_3$, and the fourth weight is $w_4$. The data quality value level is $q = q_1 w_1 + q_2 w_2 + q_3 w_3 + q_4 w_4$. For example, when the integrity level is 90, the correctness level is 89, the consistency level is 94, and the repeatability level is 98, the data quality value level is $90 \times 0.33 + 89 \times 0.33 + 94 \times 0.21 + 98 \times 0.13 = 91.55$.
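
The weighted sum of step 3032b can be checked with a short Python sketch. The function name `weighted_level` is hypothetical; the levels and weights are the illustrative values from the text.

```python
from typing import Sequence


def weighted_level(levels: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of level * weight over all quantization indexes."""
    if len(levels) != len(weights):
        raise ValueError("levels and weights must have the same length")
    return sum(level * weight for level, weight in zip(levels, weights))


# Integrity, correctness, consistency, repeatability levels and their weights.
quality_levels = [90, 89, 94, 98]
quality_weights = [0.33, 0.33, 0.21, 0.13]
quality_value_level = weighted_level(quality_levels, quality_weights)
```

Because the weights sum to 1, the result stays on the same 0 to 100 scale as the individual levels.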

In step 304, the data asset analysis model is called, and the data application value level of the data is determined according to the application quantization index.

Optionally, the application quantization indexes include data timeliness, data application breadth and data application heat, and each application quantization index corresponds to an application quantization standard. As shown in fig. 8, the implementation process of step 304 includes the following steps 3041 and 3042:

in step 3041, a timeliness level, an application breadth level, and an application heat level of the data are determined according to the application quantization standard.

The timeliness grade is the grade of data under the timeliness of the data, the application breadth grade is the grade of the data under the application breadth, and the application heat grade is the grade of the data under the application heat.

Optionally, as shown in fig. 9, the implementation process of step 3041 includes the following steps 3041a to 3041c:

in step 3041a, a timeliness level is determined according to a first application quantization standard corresponding to the timeliness of the data.

Optionally, the first application quantization criterion is used to indicate that the timeliness level is related to the frequency of updating of the data.

For example, for data that needs to be updated frequently, the timeliness level is positively correlated with the update frequency of the data. For newborn population data, for instance, the higher the update frequency, the higher the timeliness level. For data that does not require frequent updates, the timeliness level is related to the required update frequency of the data. Annual revenue data, for instance, is required to be updated once a year. If the update frequency is less than once a year, the timeliness level is 60. If the update frequency is once a year, the timeliness level is 100.

In step 3041b, an application breadth level is determined according to a second application quantization standard corresponding to the data application breadth.

Optionally, the second application quantification criterion is used to indicate that the level of application breadth is positively correlated with the number of systems using the data. A system using data refers to a system accessing, downloading, or transmitting the data.

Illustratively, when the number of systems using the data is 5 or fewer, the application breadth level is 60. When the number of systems using the data is 6 to 10, the application breadth level is 80. When the number of systems using the data is 11 to 20, the application breadth level is 90. When the number of systems using the data is 21 or more, the application breadth level is 100. For example, per capita output data is used by 7 systems, so its application breadth level is 80. Newborn population data is used by 2 systems, so its application breadth level is 60.

In step 3041c, an application heat level is determined according to a third application quantization standard corresponding to the data application heat.

Optionally, the third application quantification criterion is used to indicate that the application heat level is positively correlated with the number of times the data is used. Once the data is accessed, downloaded or transmitted, the data is used once.

Illustratively, when the data is used 100 times or fewer, the application heat level is 60. When the data is used 101 to 1000 times, the application heat level is 70. When the data is used 1001 to 5000 times, the application heat level is 85. When the data is used 5001 times or more, the application heat level is 100. For example, per capita output data is used 70 times, so its application heat level is 60. Newborn population data is used 2000 times, so its application heat level is 85.
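
The tiered mappings of steps 3041b and 3041c can be expressed as a simple threshold lookup. This is a minimal Python sketch: the thresholds and levels are the illustrative values from the text, while the names `BREADTH_TIERS`, `HEAT_TIERS` and `tier_level` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# (upper bounds, levels): a count <= bounds[i] maps to levels[i];
# a count above the last bound maps to the last level.
Tiers = Tuple[List[int], List[int]]

BREADTH_TIERS: Tiers = ([5, 10, 20], [60, 80, 90, 100])     # number of systems using the data
HEAT_TIERS: Tiers = ([100, 1000, 5000], [60, 70, 85, 100])  # number of times the data is used


def tier_level(count: int, tiers: Tiers) -> int:
    """Map a usage count to its level using inclusive upper bounds."""
    bounds, levels = tiers
    return levels[bisect_left(bounds, count)]


# Examples from the text.
per_capita_breadth = tier_level(7, BREADTH_TIERS)   # used by 7 systems
newborn_breadth = tier_level(2, BREADTH_TIERS)      # used by 2 systems
per_capita_heat = tier_level(70, HEAT_TIERS)        # used 70 times
newborn_heat = tier_level(2000, HEAT_TIERS)         # used 2000 times
```

Keeping the thresholds in data rather than in branching code makes it easy to substitute a different application quantization standard.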

Optionally, the client executes steps 3041a to 3041c simultaneously or one after another in order, which is not limited in this embodiment of the present application.

In step 3042, a data application value level is determined based on the timeliness level, the application breadth level, and the application heat level.

Optionally, the client determines the average of the timeliness level, the application breadth level and the application heat level as the data application value level. Alternatively, as shown in fig. 10, the implementation process of step 3042 includes the following steps 3042a and 3042b:

in step 3042a, a fifth weight corresponding to data timeliness, a sixth weight corresponding to data application breadth, and a seventh weight corresponding to data application heat are determined by the second machine learning model.

The second machine learning model is obtained by training through a second sample set based on a Bayesian algorithm. The second machine learning model is used for determining the relative importance between every two application quantization indexes in all the application quantization indexes according to the input data. The second sample set comprises second sample data and second relative importance between every two applied quantization indexes in all the applied quantization indexes, wherein the second sample data comprises the same data items as the data extracted from the data set. Optionally, the second relative importance is manually calibrated by a worker. The second relative importance is used to reflect the degree of importance of one applied quantization index compared to another applied quantization index. Optionally, the first sample data is the same as or different from the second sample data.

Optionally, the client inputs the extracted data into the second machine learning model, and obtains a relative importance matrix of the application quantization indexes according to the relative importance between every two application quantization indexes among all the application quantization indexes determined by the second machine learning model. Optionally, the first machine learning model is the same as or different from the second machine learning model. Illustratively, table 4 shows a relative importance matrix of the application quantization indexes.

TABLE 4

As shown in table 4, data application breadth is slightly more important than data timeliness, and data application heat is as important as data application breadth. Based on the relative importance matrix of the application quantization indexes, the client performs a consistency check on the matrix through the analytic hierarchy process and obtains the eigenvector corresponding to the maximum eigenvalue of the matrix. The client then normalizes the eigenvector to determine the fifth weight corresponding to data timeliness, the sixth weight corresponding to data application breadth, and the seventh weight corresponding to data application heat. Optionally, the client determines the weight corresponding to each application quantization index by calling a system corresponding to the analytic hierarchy process according to the relative importance matrix of the application quantization indexes. Illustratively, the client determines that the fifth weight corresponding to data timeliness is 0.14, the sixth weight corresponding to data application breadth is 0.43, and the seventh weight corresponding to data application heat is 0.43.

In step 3042b, a data application value level is determined based on the timeliness level, the application breadth level, the application heat level, the fifth weight, the sixth weight, and the seventh weight.

Optionally, the client sums the product of the timeliness level and the fifth weight, the product of the application breadth level and the sixth weight, and the product of the application heat level and the seventh weight, and determines the result of the summation as the data application value level.

Illustratively, the timeliness level is $a_1$, the application breadth level is $a_2$, and the application heat level is $a_3$. The fifth weight is $w_5$, the sixth weight is $w_6$, and the seventh weight is $w_7$. The data application value level is $a = a_1 w_5 + a_2 w_6 + a_3 w_7$.

In step 305, a data asset value level of the data set is determined according to the data quality value level and the data application value level.

Optionally, the client determines the average of the data quality value level and the data application value level as the data asset value level of the data set. Alternatively, the client determines the weighted average of the data quality value level and the data application value level as the data asset value level of the data set. When the weight of the data quality value level is higher than the weight of the data application value level, the data asset value level mainly reflects the data quality value. When the weight of the data application value level is higher than the weight of the data quality value level, the data asset value level mainly reflects the data application value.

Optionally, the data quality value level is $q$ and the data application value level is $a$. The data asset value level of the data set is $s = (q + a)/2$. Alternatively, the data asset value level of the data set is $s = (q w_q + a w_a)/2$, where $w_q$ is the weight corresponding to the data quality value level and $w_a$ is the weight corresponding to the data application value level.
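
The two ways of combining the levels in step 305 can be sketched in Python as follows. The function names are hypothetical, and the application value level of 80 and the weights 1.2 and 0.8 are assumed example inputs that do not come from the text. The weighted form follows the formula as written, including the division by 2.

```python
def asset_value_mean(q: float, a: float) -> float:
    """Plain average of the quality value level q and the application value level a."""
    return (q + a) / 2


def asset_value_weighted(q: float, a: float, w_q: float, w_a: float) -> float:
    """Weighted form as written in the text: (q * w_q + a * w_a) / 2.

    With this form, weights that sum to 2 give a true weighted mean,
    and w_q = w_a = 1 reduces to the plain average.
    """
    return (q * w_q + a * w_a) / 2


q = 91.55   # data quality value level from the earlier example in the text
a = 80.0    # assumed application value level, for illustration only

s_mean = asset_value_mean(q, a)
s_quality_led = asset_value_weighted(q, a, w_q=1.2, w_a=0.8)  # emphasizes data quality
```

Choosing $w_q > w_a$ makes the result track the quality value level more closely, which matches the behaviour described for step 305.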

In step 306, the data asset value level is displayed in a user interface.

Optionally, the user interface is a statistical analysis interface corresponding to the data asset. Optionally, the client can display a statistical graph of the value of the data asset in the user interface according to the determined data quality value level, data application value level and data asset value level.

Illustratively, fig. 11 is a schematic illustration of a statistical graph of the value of a data asset provided by an embodiment of the present application. As shown in fig. 11, the statistical graph 1101 includes data asset value level information 1103, an application heat level change trend analysis graph 1104, an application breadth level change trend analysis graph 1105, an application heat distribution analysis graph 1106, an application heat change trend analysis graph 1107, an application breadth distribution analysis graph 1108, an application breadth change trend analysis graph 1109, an application breadth scene distribution analysis graph 1110, and application heat details 1111. The statistical graph 1101 also includes update time information 1102 of the statistical graph. Optionally, the client determines the data asset value level of the data asset according to a sum of the data asset value levels of the various types of data in the data set corresponding to the data asset. The types of data in the data set include data tables, index data, model data, tag data, and files. The application heat distribution refers to the proportions of data whose application heat is hot, warm, cold and ice, respectively. The client determines whether the data corresponds to hot, warm, cold or ice according to the application heat level of the data. The application breadth distribution refers to the proportions of data whose application breadth is wide, medium, small and micro, respectively. The client determines whether the data corresponds to wide, medium, small or micro according to the application breadth level of the data. The data application scenario refers to the type of system that uses the data, such as a news system, a vehicle management system, or an identity management system.

In addition, the dimensionality of the data set is reduced through the PCA algorithm, and the value of the data asset is determined according to the dimension-reduced data set, which can further improve the efficiency of determining the value of the data asset. The data quality value level is determined according to the quality quantization indexes and the data application value level is determined according to the application quantization indexes, so the data asset can be comprehensively evaluated from multiple dimensions, which improves the accuracy of determining the value of the data asset. The weights corresponding to the quality quantization indexes and the weights corresponding to the application quantization indexes are determined according to the machine learning models and the analytic hierarchy process, which improves the efficiency and accuracy of determining the weights and therefore the accuracy of the determined data quality value level and data application value level.

It should be noted that, the sequence of the steps of the method for displaying the value of the data asset provided in the embodiment of the present application may be appropriately adjusted, and the steps may also be increased or decreased according to the circumstances, and any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application should be included in the protection scope of the present application, and therefore, the details are not repeated.

Fig. 12 is a schematic structural diagram of a value display device for a data asset according to an embodiment of the present application. The apparatus may be for a computer device or a client on a computer device. As shown in fig. 12, the apparatus 120 includes:

an obtaining module 1201, configured to obtain a data set corresponding to a data asset, where the data asset is an asset existing in a data form.

The first determining module 1202 is configured to invoke a data asset analysis model, and determine a data quality value level and a data application value level of the data set, where the data asset analysis model is a calculation model that determines the data quality value level and the data application value level through at least two kinds of data quantization indexes.

A second determining module 1203 is configured to determine a data asset value level of the data set according to the data quality value level and the data application value level.

A display module 1204 for displaying the data asset value ratings in a user interface.
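
The cooperation of the four modules can be pictured with a short Python sketch. It is only an assumed structure for illustration: the class name, the callable-based wiring, and the placeholder levels are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

DataSet = List[dict]  # assumption: a data set is a list of records


@dataclass
class ValueDisplayApparatus:
    obtain: Callable[[], DataSet]                      # obtaining module 1201
    analyze: Callable[[DataSet], Tuple[float, float]]  # first determining module 1202
    combine: Callable[[float, float], float]           # second determining module 1203
    display: Callable[[float], None]                   # display module 1204

    def run(self) -> float:
        """Obtain the data set, determine both levels, combine them, and display the result."""
        data_set = self.obtain()
        quality_level, application_level = self.analyze(data_set)
        asset_value_level = self.combine(quality_level, application_level)
        self.display(asset_value_level)
        return asset_value_level


shown: List[float] = []  # stands in for the user interface
apparatus = ValueDisplayApparatus(
    obtain=lambda: [{"name": "example"}],
    analyze=lambda data_set: (90.0, 80.0),  # placeholder levels, not a real analysis model
    combine=lambda q, a: (q + a) / 2,       # plain average of the two levels
    display=shown.append,
)
result = apparatus.run()
```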

In summary, the value display apparatus for data assets provided in the embodiments of the present application determines the data asset value level of the data set through the data asset analysis model. Since the data asset value level of the data set is determined by the second determining module according to the data quality value level and the data application value level, the data asset value level of the data set can reflect both the value of the data asset in terms of data quality and its value in terms of data application. Information collection and analysis need not be performed manually in determining the value of a data asset, which improves the efficiency of determining the value of the data asset. The method for determining the value of the data asset provided by the embodiments of the present application enables the owner of a data asset to clearly understand the value of the data assets they are responsible for and the problems that cause low data asset value, so that targeted measures can be taken and the value of the data assets can be continuously improved. This facilitates turning data into assets and preserving and increasing the value of data assets.

Optionally, the at least two kinds of data quantization indexes include: the quality quantization index corresponding to the data quality type and the application quantization index corresponding to the data application type. The first determining module 1202 is configured to:

extract data in the data set; determine the data quality value level of the data according to the quality quantization index; and determine the data application value level of the data according to the application quantization index.

Optionally, the quality quantization indexes include data integrity, data correctness, data consistency and data repeatability, and the quality quantization indexes correspond to respective quality quantization standards. The first determining module 1202 is configured to:

determine, according to the quality quantization standards, an integrity level, a correctness level, a consistency level and a repeatability level of the data, where the integrity level is the level of the data in terms of data integrity, the correctness level is the level of the data in terms of data correctness, the consistency level is the level of the data in terms of data consistency, and the repeatability level is the level of the data in terms of data repeatability.

Optionally, as shown in fig. 13, the first determining module 1202 includes:

a first determining submodule 12021, configured to determine an integrity level according to a first quality quantization standard corresponding to data integrity, where the first quality quantization standard is used to indicate that the integrity level is equal to the ratio of the number of complete data in the data to the total number of data, multiplied by one hundred.

a second determining submodule 12022, configured to determine a correctness level according to a second quality quantization standard corresponding to data correctness, where the second quality quantization standard is used to indicate that the correctness level is equal to the ratio of the number of correct data in the data to the total number of data, multiplied by one hundred.

a third determining submodule 12023, configured to determine a consistency level according to a third quality quantization standard corresponding to data consistency, where the third quality quantization standard is used to indicate that the consistency level is equal to the ratio of the number of consistent data in the data to the total number of data, multiplied by one hundred.

a fourth determining submodule 12024, configured to determine a repeatability level according to a fourth quality quantization standard corresponding to data repeatability, where the fourth quality quantization standard is used to indicate that the repeatability level is equal to one minus the ratio of the number of duplicate data in the data to the total number of data, with the result multiplied by one hundred.

Optionally, the application quantization indexes include data timeliness, data application breadth and data application heat, and the application quantization indexes correspond to respective application quantization standards. The first determining module 1202 is configured to:

determine, according to the application quantization standards, the timeliness level, the application breadth level and the application heat level of the data, where the timeliness level is the level of the data in terms of data timeliness, the application breadth level is the level of the data in terms of application breadth, and the application heat level is the level of the data in terms of application heat.

Optionally, as shown in fig. 14, the first determining module 1202 includes:

a fifth determining submodule 12025, configured to determine the timeliness level according to the first application quantization standard corresponding to the timeliness of the data, where the first application quantization standard is used to indicate that the timeliness level is related to the update frequency of the data.

a sixth determining submodule 12026, configured to determine an application breadth level according to a second application quantization standard corresponding to the data application breadth, where the second application quantization standard is used to indicate that the application breadth level is positively correlated with the number of systems using the data.

a seventh determining submodule 12027, configured to determine an application heat level according to a third application quantization standard corresponding to the data application heat, where the third application quantization standard is used to indicate that the application heat level is positively correlated with the number of times the data is used.

Optionally, the first determining module 1202 is configured to:

determine, through a first machine learning model, a first weight corresponding to data integrity, a second weight corresponding to data correctness, a third weight corresponding to data consistency and a fourth weight corresponding to data repeatability, where the first machine learning model is obtained by training with a first sample set based on a Bayesian algorithm, the first sample set includes first sample data and first relative importance corresponding to the first sample data, the first relative importance exists between every two quality quantization indexes, and the first sample data and the data include the same data items;

and determine the data quality value level according to the integrity level, the correctness level, the consistency level, the repeatability level, the first weight, the second weight, the third weight and the fourth weight.

Optionally, the first determining module 1202 is configured to:

determine, through a second machine learning model, a fifth weight corresponding to data timeliness, a sixth weight corresponding to data application breadth and a seventh weight corresponding to data application heat, where the second machine learning model is obtained by training with a second sample set based on a Bayesian algorithm, the second sample set includes second sample data and second relative importance corresponding to the second sample data, the second relative importance exists between every two application quantization indexes, and the second sample data and the data include the same data items.

Optionally, the obtaining module 1201 is configured to:

reduce the dimension of the data set corresponding to the data asset through a principal component analysis algorithm to obtain a dimension-reduced data set, and acquire the dimension-reduced data set.
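
Dimension reduction by principal component analysis can be illustrated with a small pure-Python sketch. It rests on several assumptions: it keeps only the first principal component, uses power iteration on the sample covariance matrix, and the function name is hypothetical. The application does not state how many components are retained or which implementation is used.

```python
from typing import List


def pca_first_component(rows: List[List[float]], iterations: int = 200) -> List[float]:
    """Project each row onto the first principal component (reduction to 1-D)."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("at least two rows are required")
    # Center every column on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration for the dominant eigenvector of the covariance matrix.
    # Caveat: it cannot converge if the start vector is exactly orthogonal
    # to that eigenvector.
    vector = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in product) ** 0.5
        if norm == 0:
            break  # zero product, e.g. all rows identical
        vector = [v / norm for v in product]
    return [sum(r[j] * vector[j] for j in range(d)) for r in centered]


# Two perfectly correlated columns collapse onto one axis without losing variance.
projected = pca_first_component([[1, 2], [2, 4], [3, 6]])
```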

Optionally, the second determining module 1203 is configured to:

determine the average of the data quality value level and the data application value level as the data asset value level of the data set;

or determine the weighted average of the data quality value level and the data application value level as the data asset value level of the data set.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and the modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Embodiments of the present application further provide a computer device, including a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the method for value display of data assets provided by the above-described method embodiments.

Optionally, the computer device is a server. Illustratively, fig. 15 is a schematic structural diagram of a server provided in an embodiment of the present application.

The server 1500 includes a Central Processing Unit (CPU) 1501, a system memory 1504 including a Random Access Memory (RAM) 1502 and a read-only memory (ROM) 1503, and a system bus 1505 connecting the system memory 1504 and the central processing unit 1501. The server 1500 also includes a basic input/output system (I/O system) 1506 for facilitating information transfer between various devices within the computer apparatus, and a mass storage device 1507 for storing an operating system 1513, application programs 1514 and other program modules 1515.

The basic input/output system 1506 includes a display 1508 for displaying information and an input device 1509 such as a mouse, keyboard, etc. for a user to input information. Wherein the display 1508 and the input device 1509 are connected to the central processing unit 1501 via an input output controller 1510 connected to the system bus 1505. The basic input/output system 1506 may also include an input/output controller 1510 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input-output controller 1510 also provides output to a display screen, a printer, or other type of output device.

The mass storage device 1507 is connected to the central processing unit 1501 through a mass storage controller (not shown) connected to the system bus 1505. The mass storage device 1507 and its associated computer-readable storage media provide non-volatile storage for the server 1500. That is, the mass storage device 1507 may include a computer-readable storage medium (not shown) such as a hard disk or a compact disc read-only memory (CD-ROM) drive.

Without loss of generality, the computer-readable storage media may include computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state memory devices, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 1504 and the mass storage device 1507 described above may be collectively referred to as memory.

The memory stores one or more programs configured to be executed by the one or more central processing units 1501, the one or more programs containing instructions for implementing the method embodiments described above, and the central processing unit 1501 executes the one or more programs to implement the methods provided by the respective method embodiments described above.

According to various embodiments of the present application, the server 1500 may also operate as a remote server connected to a network, such as the Internet. That is, the server 1500 may be connected to the network 1512 through the network interface unit 1511 connected to the system bus 1505, or may be connected to other types of networks or remote server systems (not shown) using the network interface unit 1511.

The memory also stores one or more programs, which include instructions for performing the steps performed by the server in the methods provided by the embodiments of the present application.

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the value display method of the data assets provided by the method embodiments.

The embodiment of the present application further provides a computer storage medium, where at least one instruction, at least one program, a code set, or a set of instructions may be stored in the storage medium, and when the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor of a computer device, the method for displaying the value of the data asset provided by the above method embodiments is implemented.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only an example of the present application and should not be taken as limiting. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method of displaying value of a data asset, the method comprising:

displaying the data asset value rating in a user interface.

2. The method of claim 1, wherein the at least two categories of data quantization indices comprise: the quality quantization index corresponding to the data quality type and the application quantization index corresponding to the data application type;

the determining a data quality value rating and a data application value rating of the data set comprises:

extracting data in the dataset;

3. The method of claim 2, wherein the quality quantization indices include data integrity, data correctness, data consistency, and data repeatability, each quality quantization index corresponding to a respective quality quantization standard;

the determining the data quality value grade of the data according to the quality quantization index comprises:

4. The method of claim 3, wherein determining the integrity level, correctness level, consistency level, and repeatability level of the data according to the quality quantization standards comprises:

determining the integrity level according to a first quality quantization standard corresponding to the data integrity, wherein the first quality quantization standard is used for indicating that the integrity level is equal to the ratio of the number of complete data in the data to the total number of the data multiplied by one hundred;

determining the correctness level according to a second quality quantization standard corresponding to the data correctness, wherein the second quality quantization standard is used for indicating that the correctness level is equal to the ratio of the number of correct data in the data to the total number of the data multiplied by one hundred;

determining the consistency level according to a third quality quantization standard corresponding to the data consistency, wherein the third quality quantization standard is used for indicating that the consistency level is equal to the ratio of the number of consistent data in the data to the total number of the data multiplied by one hundred;

and determining the repeatability level according to a fourth quality quantization standard corresponding to the data repeatability, wherein the fourth quality quantization standard is used for indicating that the repeatability level is equal to one minus the ratio of the number of repeated data in the data to the total number of the data, multiplied by one hundred.
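For illustration only, the four quality quantization standards recited in claim 4 can be sketched in Python as follows. This is a minimal sketch and not the claimed implementation: the function names, the predicate parameters `is_complete`, `is_correct`, and `is_consistent`, and the sample records are all hypothetical. It also assumes that a record counts as "repeated" when it is the second or later occurrence of an identical record, which the claim does not specify.

```python
from typing import Callable, Dict, Hashable, Sequence


def percentage(count: int, total: int) -> float:
    """Return count / total * 100, the form shared by the first three standards."""
    return count / total * 100


def quality_levels(
    records: Sequence[Hashable],
    is_complete: Callable[[Hashable], bool],
    is_correct: Callable[[Hashable], bool],
    is_consistent: Callable[[Hashable], bool],
) -> Dict[str, float]:
    """Compute the four levels of claim 4 for a non-empty sequence of records."""
    total = len(records)
    if total == 0:
        raise ValueError("the data set must contain at least one record")

    # Assumption: every occurrence after the first counts as repeated data.
    repeated = total - len(set(records))

    return {
        "integrity": percentage(sum(1 for r in records if is_complete(r)), total),
        "correctness": percentage(sum(1 for r in records if is_correct(r)), total),
        "consistency": percentage(sum(1 for r in records if is_consistent(r)), total),
        # Fourth standard: (1 - repeated / total) * 100.
        "repeatability": (1 - repeated / total) * 100,
    }


# Hypothetical sample: four (id, label) records, one duplicate, one missing label.
records = [(1, "a"), (1, "a"), (2, None), (3, "c")]
levels = quality_levels(
    records,
    is_complete=lambda r: None not in r,
    is_correct=lambda r: r[0] > 0,
    is_consistent=lambda r: r[1] is None or isinstance(r[1], str),
)
print(levels)
```

With this sample, three of four records are complete and one of four is a repeat, so the integrity level and the repeatability level both evaluate to 75.0, while the correctness and consistency levels are 100.0.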

5. The method of claim 3, wherein said determining the data quality value rating based on the integrity rating, the correctness rating, the consistency rating, and the repeatability rating comprises:

6. The method according to claim 1 or 2, wherein the obtaining of the data set corresponding to the data asset comprises:

and acquiring the dimension reduction data set.

7. The method of any of claims 1 to 5, wherein determining a data asset value rating for the data set based on the data quality value rating and the data application value rating comprises:

or,

determining the data asset value rating of the data set as a weighted average of the data quality value rating and the data application value rating.
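As a hedged illustration of the weighted-average alternative in claim 7, the following Python sketch combines the two ratings. The function name `asset_value_rating`, the parameter `quality_weight`, and the default weight of 0.5 are assumptions made for this example; the claim does not fix any particular weights.

```python
def asset_value_rating(
    quality_rating: float,
    application_rating: float,
    quality_weight: float = 0.5,  # hypothetical default; not specified by the claim
) -> float:
    """Weighted average of the data quality and data application value ratings.

    The application rating receives the complementary weight (1 - quality_weight),
    so the two weights always sum to one.
    """
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must lie between 0 and 1")
    return quality_weight * quality_rating + (1.0 - quality_weight) * application_rating


# Hypothetical ratings on a 0-100 scale.
equal_weights = asset_value_rating(80.0, 60.0)
quality_heavy = asset_value_rating(80.0, 60.0, quality_weight=0.75)
print(equal_weights, quality_heavy)
```

With equal weights the two sample ratings average to 70.0; weighting quality at 0.75 gives 75.0.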

8. An apparatus for displaying value of a data asset, the apparatus comprising:

the first determination module is used for calling a data asset analysis model and determining a data quality value grade and a data application value grade of the data set, wherein the data asset analysis model is a calculation model for determining the data quality value grade and the data application value grade through at least two categories of data quantization indices;

a display module to display the data asset value rating in a user interface.

9. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement a method of value display of a data asset of any of claims 1 to 7.

10. A computer storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions which, when loaded and executed by a processor of a computer device, carries out a method of value display of a data asset according to any one of claims 1 to 7.