CN106845846A - Big data asset evaluation method - Google Patents
- Publication number
- CN106845846A (application number CN201710058720.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- attribute
- value
- sigma
- tables
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Entrepreneurship & Innovation (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Educational Administration (AREA)
- Technology Law (AREA)
- Game Theory and Decision Science (AREA)
- Accounting & Taxation (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a big data asset evaluation method comprising: (1) data quality assessment, where the data quality indices are accuracy, integrity, consistency and timeliness; (2) data scale assessment, where the data scale indices are the number of data attributes, the number of data tuples and the unit information amount; (3) data content assessment, where data content comprises transaction data, personal information, commodity information, production management data, user evaluation data and social network data; (4) industry value calculation; and (5) data asset value calculation. The method provides specific quantitative criteria for the evaluation of data assets, makes the evaluation process simpler and clearer, eliminates the influence of the evaluator's subjective factors, and brings the evaluation result closer to reality.
Description
Technical field
The present invention relates to the technical field of asset evaluation, and in particular to a method for evaluating data assets.
Background art
Because the value of data differs between industries, and tax revenue indicates the size of business transaction amounts, data are divided by industry, on the basis of sources such as the tax yearbook, into:
(1) agricultural data
(2) mining industry data
(3) manufacturing industry data
(4) electricity, heat, gas and water production and supply industry data
(5) construction industry data
(6) wholesale and retail industry data
(7) transportation, storage and postal industry data
(8) accommodation and catering industry data
(9) information transmission, software and information technology service industry data
(10) financial industry data
(11) real estate industry data
(12) leasing and business services industry data
(13) education data
(14) health and social work data
(15) culture, sports and entertainment industry data
(16) public administration, social security and social organization data
(17) other industry data
As is well known, each data file usually contains several kinds of information, so data can further be divided by content into:
(1) transaction data
(2) personal information
(3) commodity (service) information
(4) production management data
(5) user evaluation data
(6) social network data
Here, personal information comprises merchant information and consumer information. Note that each data file contains one or more of the six classes of data above.
In recent years, as the term "big data" has appeared more and more often in daily life, the evaluation of data assets has become a topic of wide public interest. Research on data assets is still far from mature. Intangible asset evaluation has already produced a certain body of research in China, and because data assets are a special kind of intangible asset, their valuation can be linked to ordinary intangible asset evaluation. Sun Rongling et al. first studied the value of intangible assets and the quantification of value realization, but the traditional evaluation methods were relatively coarse. Chen Changyun then proposed the Black-Scholes option pricing model and the EVA method and applied them to the evaluation of overall enterprise value; the model is more accurate but does not consider the differences between enterprises. Through the continued work of experts and scholars, a relatively complete system of intangible asset evaluation has formed, consisting mainly of the income approach, the market approach and the cost approach. These methods, however, still conflict with the evaluation criteria and elements of data assets, so they cannot be applied to data assets as they stand. At the same time, the definitions of data asset value are not uniform, data asset valuation models or reference models and a system of valuation dimensions are lacking, and data asset evaluation lacks a specific quantitative criterion, all of which makes research more difficult.
In addition, different categories of data differ in their importance for evaluation. Lumping them loosely into a single class produces analysis results that are divorced from reality and run counter to society's actual demand for data. The evaluation structure of data assets should also take more aspects into account, and past research is deficient in structure as well.
Summary of the invention
In view of this, the purpose of the present invention is to address the shortcomings of earlier methods by proposing a new method for evaluating data assets.
The big data asset evaluation method of the present invention is characterised by comprising:
First, data quality assessment, including:
1. Calculation of data accuracy
A training set, a validation set and an accuracy prediction set are first sampled from the data table. For each predictable attribute f in the training set in turn, f is set as the class label, a classifier is trained, and its performance is checked on the validation set. The classifier is then used to predict the value of attribute f for each tuple in the prediction set. If the predicted value agrees with the actual value (for a numerical attribute, if their difference does not exceed a certain threshold, such as the standard deviation), the attribute value is considered correct, and the proportion of accurately predicted tuples is the accuracy a_f of the data table on that attribute. This process is repeated for each attribute in the data table to obtain the accuracy a_j of each attribute:
$$a_j = \frac{nr_j}{n_t}$$
where j = 1, 2, ..., m, m is the number of predictable attributes, n_t is the number of tuples in the prediction set, and nr_j is the number of correctly classified tuples in the prediction set. The weighted arithmetic mean of these a_j gives the overall accuracy A of the data table, i.e.:
where j is the index of the predicted attribute and wf_j is the weight of attribute j. Its value can be determined from the value range and degree of dispersion of attribute j: the larger the value range and the higher the dispersion, the lower the prediction accuracy, and the smaller the weight assigned should be. The weight is calculated by the formula:
where h_j is the entropy of attribute j, which reflects the size of the attribute's value range and its degree of dispersion, calculated by the formula:
where v is the number of distinct values and p_f is the probability that the attribute takes its f-th value;
Finally, the total accuracy of the whole data set is:
where wt_i is the weight of table i and t is the total number of tables in the evaluated data set. The weight is given by the formula:
where nt_i is the number of tuples of table i, nf_i is the number of attributes of table i, nt is the number of tuples of the whole data set, and nf is the number of attributes of the whole data set;
2. Calculation of the data integrity I
where n_null is the number of data items that are missing or null, and n_item is the total number of data items;
3. Calculation of the data consistency C
This formula takes a single database in the data set as the object of examination, where C_i is the consistency of the i-th database of the evaluated data set; fn_i is the total number of attributes in the i-th database; n_name is the number of attributes in the i-th database whose naming conventions are inconsistent; n_code is the number of attributes in the i-th database whose data codes are inconsistent; n_form is the number of attributes in the i-th database whose input field formats are inconsistent; L is the number of databases contained in the evaluated data set; and W_i is the weight of the i-th database;
4. Calculation of the data time value T
where t_p is the time at which the information was published, t_c is the current time, C(t_c, t_p) is the influence of the information at time t_c, i.e. its time value at t_c, and a is the aging rate coefficient of the information, which is set to 0.1;
5. The data quality is assessed by the formula
where Q_i is the data quality factor of the i-th class of data, classified by data content;
Second, data scale assessment, including:
1. Calculation of the number of data attributes
1) Calculating the number of attributes of numerical data
(1) The correlation coefficient r_{A,B} of attributes A and B is calculated by the formula:
$$r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n \sigma_A \sigma_B}$$
where n is the number of data tuples, a_i and b_i are the values of tuple i on A and B respectively, \bar{A} and \bar{B} are the means of A and B respectively, and σ_A and σ_B are the standard deviations of A and B respectively;
(2) Once the correlation coefficients are obtained, the attribute count of the numerical data is compressed to obtain the sum of the attribute counts of the attributes;
2) Calculating the number of attributes of nominal (categorical) data
(1) Correlation is judged by the χ² test:
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
where o_ij is the observed frequency of the joint event (A_i, B_j) and e_ij is the expected frequency of (A_i, B_j):
$$e_{ij} = \frac{\mathrm{count}(A = a_i) \times \mathrm{count}(B = b_j)}{n}$$
where n is the number of data tuples, count(A = a_i) is the number of tuples with value a_i on A, and count(B = b_j) is the number of tuples with value b_j on B. The χ² statistic tests the hypothesis that A and B are independent; the test is based on a significance level, with (R-1) × (C-1) degrees of freedom. The χ² value is calculated by the formula above and compared with the rejection region of the χ² test to judge whether the two attributes are correlated;
Repeated calculation and checking show that χ² = n in the case of self-correlation. Therefore, provided χ² > 10.828, r_{A,B} can be used as the degree of correlation between the two attributes, with the following formula:
where R and C are the numbers of categories of the two categorical variables;
(2) Once the correlation coefficients are obtained, the attribute count is compressed by the following steps:
① Build the correlation matrix
where r_ij is the degree of correlation between attributes f_i and f_j, and R_i is the sum of the correlations between attribute f_i and the other attributes,
② Sort the rows of matrix R in descending order of R_i, giving
③ Add a column f_0 representing the initial scale baseline of a single attribute
④ Obtain the compression matrix
⑤ Add the elements on the diagonal to obtain the attribute count after compression:
$$fn_c = r'_{11} + r'_{22} + \dots + r'_{nn}$$
2. The number of data tuples in the data table, tn_j, is obtained by direct counting;
3. Calculation of the unit information amount
(1) The information entropy of a discrete attribute is calculated by the formula:
$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i)$$
where P(x_i) is the probability of occurrence of each attribute value;
(2) Calculation of the information entropy of a continuous attribute:
a discretization method is first chosen to discretize the attribute, and the entropy is then calculated by the formula for discrete attributes;
(3) Once the information entropy of each attribute is obtained, the average information entropy of the attributes is calculated:
$$\bar{H} = \frac{1}{fn} \sum_{j=1}^{fn} H_j$$
where fn is the number of attributes of a single data table before compression;
The formula for the scale of a single data table is then obtained:
where S is the data scale factor of a given data table (in bits), fn_c is the number of data attributes of this table after compression, tn is the number of tuples of this table, and \bar{H} is the average information entropy of all attributes;
Third, data content assessment
A comparison matrix B = (b_ij)_{n×n} is constructed using the AHP three-scale method, where b_ij is the scale value obtained by comparing elements on the same level, specifically
The importance ranking index of each element is calculated by the following formula:
Let r_max = max{r_i}, r_min = min{r_i} and b_m = r_max / r_min; the judgment matrix C = (c_ij)_{n×n} is then obtained:
which gives
Once the judgment matrix is obtained, it is calculated and checked by the following steps:
(1) The weights are calculated by the root method, with the following formula:
Calculation steps: ① multiply the elements of C row by row to obtain a new vector,
② take the n-th root of each component of the new vector,
③ normalize the resulting vector to obtain the weight vector;
(2) Calculate the consistency index CI
$$CI = \frac{\lambda_{max} - n}{n - 1}$$
where λ_max is the largest eigenvalue of the judgment matrix C;
(3) Look up the random consistency index RI
(4) Calculate the consistency ratio CR
$$CR = \frac{CI}{RI}$$
When CR < 0.10, the consistency of the judgment matrix is considered acceptable; otherwise the judgment matrix should be adjusted appropriately. This gives the weight of each class of data classified by content;
Fourth, industry value calculation
1. Take the industry with the highest tax revenue and set its value score to 100;
2. Divide the tax revenue of each other industry by the tax revenue of the highest industry and multiply by 100 to obtain the industry value of that industry;
Fifth, data asset value calculation
1. Multiply the data quality factor Q_ij, the data scale factor S_ij and the content-class weight W_i; if the i-th class of data contains several data tables, calculate each data table individually and then sum the results of those tables;
2. Perform the above calculation for every content class and accumulate the results in turn;
3. Multiply the accumulated result by the calculated industry value to obtain the value score V of the data asset;
Value score:
4. Assess the value of the data asset by the value score V.
Beneficial effects of the present invention:
The big data asset evaluation method of the present invention provides specific quantitative criteria for the evaluation of data assets, makes the evaluation process simpler and clearer, eliminates the influence of the evaluator's subjective factors, and brings the evaluation result closer to reality.
Brief description of the drawings
Fig. 1 is the overall structure diagram of data asset value assessment.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawing and an embodiment.
The big data asset evaluation method of this embodiment comprises:
First, data quality assessment, including:
1. Calculation of data accuracy; data accuracy describes whether the data agree with the characteristics of the objective entity they correspond to;
A training set, a validation set and an accuracy prediction set are first sampled from the data table. For each predictable attribute f in the training set in turn, f is set as the class label, a classifier is trained, and its performance is checked on the validation set. The classifier is then used to predict the value of attribute f for each tuple in the prediction set. If the predicted value agrees with the actual value (for a numerical attribute, if their difference does not exceed a certain threshold), the attribute value is considered correct, and the proportion of accurately predicted tuples is the accuracy a_f of the data table on that attribute. This process is repeated for each attribute in the data table to obtain the accuracy a_j of each attribute:
$$a_j = \frac{nr_j}{n_t}$$
where j = 1, 2, ..., m, m is the number of predictable attributes, n_t is the number of tuples in the prediction set, and nr_j is the number of correctly classified tuples in the prediction set. The classification algorithm can be chosen freely (for example, decision tree induction with C4.5, CART, etc.);
The weighted arithmetic mean of these a_j gives the overall accuracy A of the data table, i.e.:
where j is the index of the predicted attribute and wf_j is the weight of attribute j. Its value can be determined from the value range and degree of dispersion of attribute j: the larger the value range and the higher the dispersion, the lower the prediction accuracy, and the smaller the weight assigned should be. The weight is calculated by the formula:
where h_j is the entropy of attribute j, which reflects the size of the attribute's value range and its degree of dispersion, calculated by the formula:
where v is the number of distinct values and p_f is the probability that the attribute takes its f-th value;
Finally, the total accuracy of the whole data set is:
where wt_i is the weight of table i and t is the total number of tables in the evaluated data set. The weight is given by the formula:
where nt_i is the number of tuples of table i, nf_i is the number of attributes of table i, nt is the number of tuples of the whole data set, and nf is the number of attributes of the whole data set;
Predictable attributes: some attributes have a very large value range and a certain randomness, and some of this information often involves personal privacy and generally has to be desensitized in applications involving big data trading and analysis, for example names, telephone numbers and addresses. Others have no physical meaning, for example tuple IDs and certain item codes. There is no need to evaluate the accuracy of such attributes; they are called unpredictable attributes, and all other attributes are called predictable attributes;
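As a minimal sketch of the accuracy step, the following Python computes a per-attribute accuracy from predicted and actual values, the entropy of an attribute, and a weighted table accuracy. The function names are hypothetical and the classifier itself is left out. The weighting (proportional to 1 / (1 + h_j), normalized to sum to 1) is only an assumption chosen so that higher-entropy attributes weigh less; it is not taken from the method, whose weight formula is not reproduced in this text.

```python
import math
from collections import Counter


def attribute_accuracy(predicted, actual, threshold=None):
    """Share of tuples whose predicted value counts as correct (nr_j / n_t).

    Nominal attributes (threshold=None) must match exactly; numerical
    attributes count as correct when the absolute difference does not
    exceed the threshold (for example the attribute's standard deviation).
    """
    if threshold is None:
        correct = sum(p == a for p, a in zip(predicted, actual))
    else:
        correct = sum(abs(p - a) <= threshold for p, a in zip(predicted, actual))
    return correct / len(actual)


def entropy(values):
    """Shannon entropy (base 2, an assumption) of one attribute's observed values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def table_accuracy(accuracies, entropies):
    """Weighted arithmetic mean of the per-attribute accuracies.

    ASSUMPTION: weights are proportional to 1 / (1 + h_j) and normalized,
    so that attributes with higher entropy receive smaller weights.
    """
    raw = [1.0 / (1.0 + h) for h in entropies]
    total = sum(raw)
    return sum(w / total * a for w, a in zip(raw, accuracies))


# Hypothetical predictions for one nominal and one numerical attribute.
a_city = attribute_accuracy(["A", "B", "A", "A"], ["A", "B", "B", "A"])
a_price = attribute_accuracy([10.0, 12.5, 11.5, 11.0],
                             [10.5, 12.0, 12.0, 11.0], threshold=1.0)
h_city = entropy(["A", "B", "B", "A"])
h_price = entropy([10.5, 12.0, 12.0, 11.0])
A = table_accuracy([a_city, a_price], [h_city, h_price])
```

Any classifier (for example a decision tree) could supply the `predicted` lists; only the counting and weighting are shown here.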
2. Calculation of the data integrity I; the data integrity I describes whether the data have missing records or missing fields,
where n_null is the number of data items that are missing or null, and n_item is the total number of data items;
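The integrity index can be illustrated with a short sketch. It assumes the form I = 1 - n_null / n_item, which is consistent with the two symbols defined above but is not confirmed by the text; the function name and the sample rows are hypothetical.

```python
def integrity(rows):
    """ASSUMED form: I = 1 - n_null / n_item.

    n_item counts every data item (cell) in the table; n_null counts
    the cells that are None or an empty string.
    """
    n_item = sum(len(row) for row in rows)
    n_null = sum(1 for row in rows for v in row if v is None or v == "")
    return 1 - n_null / n_item


# Hypothetical table with two missing items out of nine.
rows = [("Li", 30, "Beijing"), ("Wang", None, "Shanghai"), ("Zhao", 41, "")]
I = integrity(rows)
```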
3. Calculation of the data consistency C; data consistency describes whether the value of the same attribute of the same entity is consistent across different systems or data sets;
This formula takes a single database in the data set as the object of examination, where C_i is the consistency of the i-th database of the evaluated data set; fn_i is the total number of attributes in the i-th database; n_name is the number of attributes in the i-th database whose naming conventions are inconsistent; n_code is the number of attributes in the i-th database whose data codes are inconsistent; n_form is the number of attributes in the i-th database whose input field formats are inconsistent; L is the number of databases contained in the evaluated data set; and W_i is the weight of the i-th database;
4. Calculation of the data time value T
where t_p is the time at which the information was published, t_c is the current time, C(t_c, t_p) is the influence of the information at time t_c, i.e. its time value at t_c, and a is the aging rate coefficient of the information, which is set to 0.1;
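A possible reading of the time value is an exponential decay in the age of the information. The sketch below assumes C(t_c, t_p) = exp(-a (t_c - t_p)); both this functional form and the time unit are assumptions, since only the symbols and the value a = 0.1 are given in the text.

```python
import math


def time_value(t_c, t_p, a=0.1):
    """ASSUMED exponential aging: C(t_c, t_p) = exp(-a * (t_c - t_p)).

    t_c is the current time, t_p the publication time (same unit,
    which the text does not specify), a the aging rate coefficient.
    """
    return math.exp(-a * (t_c - t_p))


fresh = time_value(5, 5)    # information published just now
aged = time_value(10, 0)    # information ten time units old
```

Under this assumption, freshly published information has time value 1 and the value decreases monotonically with age.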
5. The data quality is assessed by the formula
where Q_i is the data quality factor of the i-th class of data, classified by data content;
Second, data scale assessment, including:
1. Calculation of the number of data attributes; a data attribute is also called a field. In most cases the columns of a table are called fields, and each field contains information on a particular topic;
1) Calculating the number of attributes of numerical data
(1) The correlation coefficient r_{A,B} of attributes A and B is calculated by the formula:
$$r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n \sigma_A \sigma_B}$$
where n is the number of data tuples, a_i and b_i are the values of tuple i on A and B respectively, \bar{A} and \bar{B} are the means of A and B respectively, and σ_A and σ_B are the standard deviations of A and B respectively;
(2) Once the correlation coefficients are obtained, the attribute count of the numerical data is compressed to obtain the sum of the attribute counts of the attributes;
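The correlation coefficient of two numerical attributes can be sketched as follows. The code uses the standard Pearson form with population standard deviations (dividing by n), which matches the symbols n, a_i, b_i, the two means and σ_A, σ_B defined above; the function name and sample columns are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient r_{A,B} of two numerical columns."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Population standard deviations (divide by n), matching the n in the denominator.
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    cov_sum = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov_sum / (n * sd_a * sd_b)


r_same_direction = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_opposite = pearson([1, 2, 3], [3, 2, 1])
```

Two attributes that are exact linear functions of each other give a coefficient of magnitude 1, which is the case in which one of them adds little to the effective attribute count.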
2) Calculating the number of attributes of nominal (categorical) data
(1) Correlation is judged by the χ² test:
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
where o_ij is the observed frequency of the joint event (A_i, B_j) and e_ij is the expected frequency of (A_i, B_j):
$$e_{ij} = \frac{\mathrm{count}(A = a_i) \times \mathrm{count}(B = b_j)}{n}$$
where n is the number of data tuples, count(A = a_i) is the number of tuples with value a_i on A, and count(B = b_j) is the number of tuples with value b_j on B. The χ² statistic tests the hypothesis that A and B are independent; the test is based on a significance level, with (R-1) × (C-1) degrees of freedom. The χ² value is calculated by the formula above and compared with the rejection region of the χ² test to judge whether the two attributes are correlated;
Repeated calculation and checking show that χ² = n in the case of self-correlation. Therefore, provided χ² > 10.828, r_{A,B} can be used as the degree of correlation between the two attributes, with the following formula:
where R and C are the numbers of categories of the two categorical variables;
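The χ² step can be sketched in a few lines of Python. The function below computes the statistic and the degrees of freedom from two nominal columns; the function name and the sample data are hypothetical. The threshold 10.828 is the χ² critical value for 1 degree of freedom at the 0.001 significance level. The conversion of χ² into the degree of correlation r_{A,B} is not shown, because its formula is not reproduced in this text.

```python
from collections import Counter


def chi_square(a_values, b_values):
    """Pearson chi-square statistic and degrees of freedom for two nominal columns."""
    n = len(a_values)
    count_a = Counter(a_values)
    count_b = Counter(b_values)
    observed = Counter(zip(a_values, b_values))
    chi2 = 0.0
    for a in count_a:
        for b in count_b:
            e = count_a[a] * count_b[b] / n    # expected frequency e_ij
            o = observed.get((a, b), 0)        # observed frequency o_ij
            chi2 += (o - e) ** 2 / e
    dof = (len(count_a) - 1) * (len(count_b) - 1)
    return chi2, dof


# Hypothetical 2 x 2 example with 1500 tuples.
gender = ["m"] * 300 + ["f"] * 1200
reading = ["fiction"] * 250 + ["non-fiction"] * 50 + ["fiction"] * 200 + ["non-fiction"] * 1000
chi2, dof = chi_square(gender, reading)
correlated = chi2 > 10.828
```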
(2) Once the correlation coefficients are obtained, the attribute count is compressed by the following steps:
① Build the correlation matrix
where r_ij is the degree of correlation between attributes f_i and f_j, and R_i is the sum of the correlations between attribute f_i and the other attributes,
② Sort the rows of matrix R in descending order of R_i, giving
③ Add a column f_0 representing the initial scale baseline of a single attribute, set to 1;
④ Calculate the attribute scale compression matrix by the following procedure
to obtain
⑤ Add the elements on the diagonal to obtain the attribute count after compression:
$$fn_c = r'_{11} + r'_{22} + \dots + r'_{nn}$$
2. The number of data tuples in the data table, tn_j, is obtained by direct counting; in a two-dimensional table a tuple is also called a record: each row of the table, i.e. each record in the database, is one tuple;
3. Calculation of the unit information amount; the unit information amount index reflects how many different values the same attribute takes within a file;
(1) The information entropy of a discrete attribute is calculated by the formula:
$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i)$$
where P(x_i) is the probability of occurrence of each attribute value;
(2) Calculation of the information entropy of a continuous attribute:
a discretization method is first chosen to discretize the attribute, and the entropy is then calculated by the formula for discrete attributes;
(3) Once the information entropy of each attribute is obtained, the average information entropy of the attributes is calculated:
$$\bar{H} = \frac{1}{fn} \sum_{j=1}^{fn} H_j$$
where fn is the number of attributes of a single data table before compression;
The formula for the scale of a single data table is then obtained:
where S is the data scale factor of a given data table (in bits), fn_c is the number of data attributes of this table after compression, tn is the number of tuples of this table, and \bar{H} is the average information entropy of all attributes;
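The unit information amount and the table scale can be sketched as follows. The entropy uses base 2 because S is stated in bits. The product form S = fn_c × tn × average entropy is an assumption inferred from the three symbols defined above, and the compressed attribute count fn_c is simply passed in, since the compression procedure is not reproduced here. Function names and sample columns are hypothetical.

```python
import math
from collections import Counter


def info_entropy(values):
    """Information entropy (in bits) of one discrete attribute."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def table_scale(columns, fn_c):
    """ASSUMED form: S = fn_c * tn * mean entropy over the fn original attributes.

    columns maps attribute name -> list of values (all of equal length tn);
    fn_c is the attribute count after compression, supplied by the caller.
    """
    tn = len(next(iter(columns.values())))
    mean_h = sum(info_entropy(col) for col in columns.values()) / len(columns)
    return fn_c * tn * mean_h


# Hypothetical table with two attributes and four tuples.
columns = {"city": ["A", "B", "A", "B"], "tier": ["x", "x", "x", "y"]}
S = table_scale(columns, fn_c=1.5)
```

A continuous attribute would first be discretized (for example into equal-width bins) and then passed to `info_entropy` like any other column.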
Third, data content assessment
A comparison matrix B = (b_ij)_{n×n} is constructed using the AHP three-scale method, where b_ij is the scale value obtained by comparing elements on the same level, specifically
The importance ranking index of each element is calculated by the following formula:
Let r_max = max{r_i}, r_min = min{r_i} and b_m = r_max / r_min; the judgment matrix C = (c_ij)_{n×n} is then obtained:
which gives
Once the judgment matrix is obtained, it is calculated and checked by the following steps:
(1) The weights are calculated by the root method, with the following formula:
Calculation steps: ① multiply the elements of C row by row to obtain a new vector,
② take the n-th root of each component of the new vector,
③ normalize the resulting vector to obtain the weight vector;
(2) Calculate the consistency index CI
$$CI = \frac{\lambda_{max} - n}{n - 1}$$
where λ_max is the largest eigenvalue of the judgment matrix C;
(3) Look up the random consistency index RI
(4) Calculate the consistency ratio CR
$$CR = \frac{CI}{RI}$$
When CR < 0.10, the consistency of the judgment matrix is considered acceptable; otherwise the judgment matrix should be adjusted appropriately. This gives the weight of each class of data classified by content;
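The root method and the consistency check can be sketched for a given judgment matrix C. The construction of C from the three-scale comparison matrix is not shown. As assumptions of this sketch, λ_max is estimated by averaging (Cw)_i / w_i, a common approximation rather than an exact eigenvalue computation, and the RI table holds the commonly tabulated Saaty values; all names are hypothetical.

```python
import math

# Commonly tabulated random consistency index RI by matrix order n.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def root_method_weights(c):
    """Row products, n-th roots, then normalization (steps 1 to 3 of the root method)."""
    n = len(c)
    roots = [math.prod(row) ** (1.0 / n) for row in c]
    total = sum(roots)
    return [r / total for r in roots]


def consistency_ratio(c, w):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(c)
    if n <= 2:
        return 0.0  # matrices of order 1 or 2 are always consistent
    cw = [sum(c[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(cw[i] / w[i] for i in range(n)) / n  # approximation of lambda_max
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]


# Hypothetical, perfectly consistent 3 x 3 judgment matrix.
C = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
w = root_method_weights(C)
cr = consistency_ratio(C, w)
acceptable = cr < 0.10
```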
Fourth, industry value calculation
1. Take the industry with the highest tax revenue and set its value score to 100;
2. Divide the tax revenue of each other industry by the tax revenue of the highest industry and multiply by 100 to obtain the industry value of that industry;
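The two steps above amount to scaling every industry's tax revenue against the largest one. A minimal sketch, with a hypothetical function name and made-up revenue figures:

```python
def industry_values(tax_revenue):
    """Scale each industry's tax revenue so that the highest industry scores 100."""
    top = max(tax_revenue.values())
    return {name: revenue / top * 100 for name, revenue in tax_revenue.items()}


# Hypothetical tax revenues (arbitrary units, not real statistics).
scores = industry_values({"manufacturing": 400.0, "finance": 200.0, "education": 10.0})
```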
Fifth, data asset value calculation
1. Multiply the data quality factor Q_ij, the data scale factor S_ij and the content-class weight W_i; if the i-th class of data contains several data tables, calculate each data table individually and then sum the results of those tables;
2. Perform the above calculation for every content class and accumulate the results in turn;
3. Multiply the accumulated result by the calculated industry value to obtain the value score V of the data asset;
Value score:
4. Assess the value of the data asset by the value score V.
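Steps 1 to 3 above can be sketched as a nested sum: for each content class i, the products W_i × Q_ij × S_ij are summed over its tables j, the class results are accumulated, and the total is multiplied by the industry value. The data layout, function name and numbers below are hypothetical.

```python
def value_score(classes, industry_value):
    """Value score V = industry value * sum_i sum_j (W_i * Q_ij * S_ij).

    classes is a list of (W_i, tables) pairs, where tables is a list of
    (Q_ij, S_ij) pairs, one per data table in content class i.
    """
    total = 0.0
    for weight, tables in classes:
        total += sum(weight * q * s for q, s in tables)
    return industry_value * total


# Hypothetical data set: class 1 has two tables, class 2 has one.
classes = [
    (0.6, [(0.9, 100.0), (0.8, 50.0)]),
    (0.4, [(0.5, 200.0)]),
]
V = value_score(classes, industry_value=50.0)
```

With these made-up inputs the class results are 78 and 40, so the accumulated result is 118 and the score is 50 × 118.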
The big data asset evaluation method of this embodiment provides specific quantitative criteria for the evaluation of data assets, makes the evaluation process simpler and clearer, eliminates the influence of the evaluator's subjective factors, and brings the evaluation result closer to reality.
Finally, it should be noted that the above embodiment is intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to a preferred embodiment, those of ordinary skill in the art will understand that the technical solution of the invention may be modified or replaced by equivalents without departing from its purpose and scope, and all such modifications shall fall within the scope of the claims of the present invention.
Claims (1)
1. A big data asset evaluation method, characterised by comprising:
First, data quality assessment, including:
1. Calculation of data accuracy
A training set, a validation set and an accuracy prediction set are first sampled from the data table. For each predictable attribute f in the training set in turn, f is set as the class label, a classifier is trained, and its performance is checked on the validation set. The classifier is then used to predict the value of attribute f for each tuple in the prediction set. If the predicted value agrees with the actual value (for a numerical attribute, if their difference does not exceed a certain threshold), the attribute value is considered correct, and the proportion of accurately predicted tuples is the accuracy a_f of the data table on that attribute. This process is repeated for each attribute in the data table to obtain the accuracy a_j of each attribute:
$$a_j = \frac{nr_j}{n_t}$$
where j = 1, 2, ..., m, m is the number of predictable attributes, n_t is the number of tuples in the prediction set, and nr_j is the number of correctly classified tuples in the prediction set. The weighted arithmetic mean of these a_j gives the overall accuracy A of the data table, i.e.:
where j is the index of the predicted attribute and wf_j is the weight of attribute j. Its value can be determined from the value range and degree of dispersion of attribute j: the larger the value range and the higher the dispersion, the lower the prediction accuracy, and the smaller the weight assigned should be. The weight is calculated by the formula:
where h_j is the entropy of attribute j, which reflects the size of the attribute's value range and its degree of dispersion, calculated by the formula:
where v is the number of distinct values and p_f is the probability that the attribute takes its f-th value;
Finally, the total accuracy of the whole data set is:
where wt_i is the weight of table i and t is the total number of tables in the evaluated data set. The weight is given by the formula:
where nt_i is the number of tuples of table i, nf_i is the number of attributes of table i, nt is the number of tuples of the whole data set, and nf is the number of attributes of the whole data set;
2. Calculation of the data integrity I
where n_null is the number of data items that are missing or null, and n_item is the total number of data items;
3. Calculation of the data consistency C
This formula takes a single database in the data set as the object of examination, where C_i is the consistency of the i-th database of the evaluated data set; fn_i is the total number of attributes in the i-th database; n_name is the number of attributes in the i-th database whose naming conventions are inconsistent; n_code is the number of attributes in the i-th database whose data codes are inconsistent; n_form is the number of attributes in the i-th database whose input field formats are inconsistent; L is the number of databases contained in the evaluated data set; and W_i is the weight of the i-th database;
4. Calculation of the data time value T
where t_p is the time at which the information was published, t_c is the current time, C(t_c, t_p) is the influence of the information at time t_c, i.e. its time value at t_c, and a is the aging rate coefficient of the information, which is set to 0.1;
5. The data quality is assessed by the formula
where Q_i is the data quality factor of the i-th class of data, classified by data content;
Second, data scale assessment, including:
1. Calculation of the number of data attributes
1) Calculating the number of attributes of numerical data
(1) The correlation coefficient r_{A,B} of attributes A and B is calculated by the formula:
$$r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n \sigma_A \sigma_B}$$
where n is the number of data tuples, a_i and b_i are the values of tuple i on A and B respectively, \bar{A} and \bar{B} are the means of A and B respectively, and σ_A and σ_B are the standard deviations of A and B respectively;
(2) Once the correlation coefficients are obtained, the attribute count of the numerical data is compressed to obtain the sum of the attribute counts of the attributes;
2) Calculating the attribute number of nominal (categorical) data
(1) Correlation is judged by the $\chi^2$ test:
$$\chi^2 = \sum_{i=1}^{R}\sum_{j=1}^{C}\frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
where $o_{ij}$ is the observed frequency of the joint event $(A_i, B_j)$ and $e_{ij}$ is the expected frequency of $(A_i, B_j)$:
$$e_{ij} = \frac{count(A = a_i) \times count(B = b_j)}{n}$$
where n is the number of data tuples, $count(A = a_i)$ is the number of tuples whose value on A is $a_i$, and $count(B = b_j)$ is the number of tuples whose value on B is $b_j$. The $\chi^2$ statistic tests the hypothesis that A and B are independent, based on a significance level, with $(R-1) \times (C-1)$ degrees of freedom. The $\chi^2$ value is computed by the above formula and compared with the rejection region of the $\chi^2$ test, which determines whether the two attributes are correlated;
Repeated calculation and testing show that $\chi^2 = n$ when an attribute is compared with itself. Therefore, provided that $\chi^2 > 10.828$ (the critical value at the 0.001 significance level with one degree of freedom), $r_{A,B}$ can be taken as the degree of correlation between the two attributes, with the formula as follows:
where R and C are the numbers of categories of the two categorical variables;
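The $\chi^2$ test described above can be sketched in Python as follows. This is a minimal illustration, assuming the standard Pearson $\chi^2$ statistic with expected frequency count(A = a_i) × count(B = b_j) / n. The function name `chi_square` and the constant name `CHI2_THRESHOLD` are hypothetical; only the threshold value 10.828 is taken from the text. The patent's formula that converts $\chi^2$ into $r_{A,B}$ is not reproduced here and is not implemented.

```python
from collections import Counter

# Threshold quoted in the text; above it the attributes are treated as correlated.
CHI2_THRESHOLD = 10.828


def chi_square(a_values, b_values):
    """Return (chi2, degrees_of_freedom) for two categorical attributes."""
    n = len(a_values)
    if n == 0 or n != len(b_values):
        raise ValueError("inputs must be non-empty and of equal length")
    count_a = Counter(a_values)                   # count(A = a_i)
    count_b = Counter(b_values)                   # count(B = b_j)
    observed = Counter(zip(a_values, b_values))   # o_ij
    chi2 = 0.0
    for a_i, c_a in count_a.items():
        for b_j, c_b in count_b.items():
            expected = c_a * c_b / n              # e_ij
            chi2 += (observed[(a_i, b_j)] - expected) ** 2 / expected
    dof = (len(count_a) - 1) * (len(count_b) - 1)
    return chi2, dof


# An attribute compared with itself: chi2 equals the number of tuples n.
col = ["x", "x", "y", "y"]
chi2_self, dof_self = chi_square(col, col)

# Two independent attributes: chi2 is zero.
chi2_indep, _ = chi_square(col, ["p", "q", "p", "q"])
is_correlated = chi2_indep > CHI2_THRESHOLD
```

The self-comparison case reproduces the observation in the text that $\chi^2 = n$ for a two-category attribute compared with itself.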
(2) After the correlation coefficients are obtained, the attribute number of the numeric data is compressed. The attribute compression steps are:
1. Build the correlation matrix
where $r_{ij}$ is the degree of correlation between attributes $f_i$ and $f_j$, and $R_i$ is the sum of the correlations of attribute $f_i$ with the other attributes,
2. Sort the rows of matrix R by $R_i$ in descending order, obtaining
3. Add a column $f_0$ representing the initial scale benchmark of a single attribute
4. Obtain the compression matrix
5. Sum the elements on the diagonal to obtain the attribute number after compression
$fn_c = r'_{11} + r'_{22} + \dots + r'_{nn}$
2. The number of data tuples in each data table is obtained by direct counting: $tn_j$;
3. Calculation of the unit information amount
(1) The information entropy of a discrete attribute is computed as:
$$H(X) = -\sum_{i} P(x_i)\,\log_2 P(x_i)$$
where $P(x_i)$ is the probability with which each attribute value occurs;
(2) Calculation of the information entropy of a continuous attribute:
A discretization method is first selected and applied to the attribute, and the entropy is then computed with the formula for discrete attributes;
(3) After the information entropy of each attribute is obtained, the average information entropy of the attributes is obtained:
$$\bar{H} = \frac{1}{fn}\sum_{k=1}^{fn} H(f_k)$$
where fn is the number of attributes of a single data table before compression;
The formula for the scale of a single data table is then obtained:
where S is the data scale factor of a given data table (in bits), $fn_c$ is the attribute number of this data table after compression, tn is the number of tuples in this data table, and $\bar{H}$ is the average information entropy of all attributes;
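The entropy and scale steps above can be sketched in Python as follows. Several points are assumptions and not statements of the patent: the logarithm is taken to base 2 (suggested by the unit "bit"), equal-width binning stands in for the unspecified discretization method, and the table scale is taken to be the product of the compressed attribute number, the tuple count and the average entropy, because the scale formula itself is not reproduced in this text. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a discrete attribute."""
    n = len(values)
    if n == 0:
        raise ValueError("attribute has no values")
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def equal_width_bins(values, bins):
    """Discretize a continuous attribute (assumed method: equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def average_entropy(columns):
    """Mean entropy over the fn attributes of a table (before compression)."""
    return sum(entropy(col) for col in columns.values()) / len(columns)


def table_scale(fn_c, tn, avg_entropy):
    """ASSUMED form of the scale factor S: fn_c * tn * average entropy."""
    return fn_c * tn * avg_entropy


# Hypothetical two-attribute table with four tuples.
columns = {"color": ["r", "g", "r", "g"], "size": ["s", "s", "s", "s"]}
h_bar = average_entropy(columns)                     # (1.0 + 0.0) / 2
s = table_scale(fn_c=1.5, tn=4, avg_entropy=h_bar)   # fn_c value is illustrative
```

A continuous column would first be passed through `equal_width_bins` and the resulting bin indices then fed to `entropy`.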
3. Data content assessment
A comparison matrix $B = (b_{ij})_{n \times n}$ is constructed using the AHP three-scale method, where $b_{ij}$ is the scale value obtained by comparing elements on the same level, specifically
The importance ranking index of each element is calculated by the following formula:
Let $r_{max} = \max\{r_i\}$, $r_{min} = \min\{r_i\}$ and $b_m = r_{max}/r_{min}$; the judgment matrix $C = (c_{ij})_{n \times n}$ is then obtained:
which gives
After the judgment matrix is obtained, it is calculated and checked in the following steps:
(1) The weights are calculated by the root method, with the formula as follows:
$$w_i = \frac{\left(\prod_{j=1}^{n} c_{ij}\right)^{1/n}}{\sum_{k=1}^{n}\left(\prod_{j=1}^{n} c_{kj}\right)^{1/n}}$$
Calculation steps: 1. multiply the elements of C together row by row to obtain a new vector,
2. take the n-th root of each component of the new vector,
3. normalize the resulting vector to obtain the weight vector;
(2) Calculate the consistency index CI
$$CI = \frac{\lambda_{max} - n}{n - 1}$$
where $\lambda_{max}$ is the largest eigenvalue of the judgment matrix C;
(3) Look up the random consistency index RI
(4) Calculate the consistency ratio CR
$$CR = \frac{CI}{RI}$$
When CR < 0.10, the consistency of the judgment matrix is considered acceptable; otherwise the judgment matrix should be revised appropriately;
The weight of each class of data, classified by content, is thus obtained;
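The root-method steps and the consistency check can be sketched in Python as follows. This is a minimal illustration under stated assumptions: CI and CR use the usual AHP definitions (λmax − n)/(n − 1) and CI/RI, λmax is approximated from the weight vector without a full eigen-decomposition, and `RI_TABLE` holds commonly tabulated random-index values because the patent's own RI table is not shown. All names are hypothetical.

```python
import math

# Commonly tabulated random consistency index values (assumption, n = 1..9).
RI_TABLE = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
            6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def root_method_weights(c):
    """Weights from judgment matrix c: row products, n-th roots, normalize."""
    n = len(c)
    roots = [math.prod(row) ** (1.0 / n) for row in c]
    total = sum(roots)
    return [r / total for r in roots]


def consistency_ratio(c, weights):
    """Return (lambda_max, CI, CR); lambda_max is the mean of (C w)_i / w_i."""
    n = len(c)
    if n < 3:
        # Matrices of order 1 or 2 are always consistent.
        return float(n), 0.0, 0.0
    cw = [sum(c[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(cw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return lambda_max, ci, ci / RI_TABLE[n]


# A perfectly consistent 3 x 3 judgment matrix (illustrative values).
c = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
w = root_method_weights(c)
lambda_max, ci, cr = consistency_ratio(c, w)
acceptable = cr < 0.10
```

For a perfectly consistent matrix such as this one, the weights are proportional to any column of the matrix and CR is zero up to rounding, so the check passes.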
4. Industry value calculation
1. Take the industry with the highest tax revenue and set its value score to 100;
2. Divide the tax revenue of each other industry by the tax revenue of the highest industry and multiply by 100 to obtain the industry value of that industry;
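The normalization in the two steps above amounts to scaling every industry's tax revenue against the maximum. A minimal Python sketch follows; the function name, the industry names and the revenue figures are hypothetical and serve only as illustration.

```python
def industry_values(tax_revenue):
    """Scale tax revenue so that the highest-revenue industry scores 100."""
    top = max(tax_revenue.values())
    if top <= 0:
        raise ValueError("highest tax revenue must be positive")
    return {name: revenue / top * 100 for name, revenue in tax_revenue.items()}


# Illustrative figures in arbitrary currency units.
scores = industry_values({"finance": 200.0, "retail": 50.0, "logistics": 100.0})
```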
5. Data asset value calculation
1. Multiply the quality factor $Q_{ij}$, the data scale factor $S_{ij}$ and the content-classification weight $W_i$ together; if the i-th class of data contains multiple data tables, compute each data table individually first and then sum the results of these tables;
2. Perform the above calculation for every class in the content classification and accumulate the results in turn;
3. Multiply the accumulated result by the calculated industry value to obtain the value score V of the data asset;
4. Assess the value of the data asset by means of the value score V.
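Read together, the four steps describe a weighted sum over classes and tables that is then scaled by the industry value, i.e. V = (industry value) × Σ_i Σ_j Q_ij · S_ij · W_i. The following Python sketch shows that aggregation; the data layout (a list of weight and table pairs), the function name and all numbers are assumptions made for illustration.

```python
def asset_value(classes, industry_value):
    """Value score V of a data asset.

    classes: list of (W_i, tables), where tables is a list of
             (Q_ij, S_ij) pairs, one per data table in class i.
    """
    total = 0.0
    for weight, tables in classes:
        # Step 1: per-table product Q_ij * S_ij * W_i, summed within the class.
        class_sum = sum(q * s * weight for q, s in tables)
        # Step 2: accumulate over the content classes.
        total += class_sum
    # Step 3: scale by the industry value.
    return total * industry_value


# Two content classes; the first contains two data tables.
classes = [
    (0.5, [(1.0, 10.0), (0.5, 20.0)]),
    (0.5, [(1.0, 4.0)]),
]
v = asset_value(classes, industry_value=50.0)
```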
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710058720.7A CN106845846A (en) | 2017-01-23 | 2017-01-23 | Big data asset evaluation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710058720.7A CN106845846A (en) | 2017-01-23 | 2017-01-23 | Big data asset evaluation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106845846A true CN106845846A (en) | 2017-06-13 |
Family
ID=59122737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710058720.7A Pending CN106845846A (en) | 2017-01-23 | 2017-01-23 | Big data asset evaluation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106845846A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633257A (en) * | 2017-08-15 | 2018-01-26 | 上海数据交易中心有限公司 | Data Quality Assessment Methodology and device, computer-readable recording medium, terminal |
CN107995020A (en) * | 2017-10-23 | 2018-05-04 | 北京兰云科技有限公司 | A kind of asset valuation method and apparatus |
CN108804655A (en) * | 2018-06-07 | 2018-11-13 | 福建江夏学院 | A kind of intangible asset method and system based on big data |
CN108829750A (en) * | 2018-05-24 | 2018-11-16 | 国信优易数据有限公司 | A kind of quality of data determines system and method |
CN109615431A (en) * | 2018-12-13 | 2019-04-12 | 普元信息技术股份有限公司 | The system and method for data assets perception and pricing function are realized under big data background |
CN110766429A (en) * | 2018-07-26 | 2020-02-07 | 国信优易数据有限公司 | Data value evaluation system and method |
CN111475695A (en) * | 2020-03-30 | 2020-07-31 | 贵阳大数据交易所有限责任公司 | Service data asset pricing method based on metadata |
CN111539770A (en) * | 2020-04-27 | 2020-08-14 | 启迪数华科技有限公司 | Intelligent data asset assessment method and system |
CN112101447A (en) * | 2020-09-10 | 2020-12-18 | 北京百度网讯科技有限公司 | Data set quality evaluation method, device, equipment and storage medium |
CN113673889A (en) * | 2021-08-26 | 2021-11-19 | 上海罗盘信息科技有限公司 | Intelligent data asset identification method |
CN113806356A (en) * | 2020-06-16 | 2021-12-17 | 中国移动通信集团重庆有限公司 | Data identification method and device and computing equipment |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633257B (en) * | 2017-08-15 | 2020-04-17 | 上海数据交易中心有限公司 | Data quality evaluation method and device, computer readable storage medium and terminal |
CN107633257A (en) * | 2017-08-15 | 2018-01-26 | 上海数据交易中心有限公司 | Data Quality Assessment Methodology and device, computer-readable recording medium, terminal |
CN107995020B (en) * | 2017-10-23 | 2021-05-07 | 北京兰云科技有限公司 | Asset value assessment method and device |
CN107995020A (en) * | 2017-10-23 | 2018-05-04 | 北京兰云科技有限公司 | A kind of asset valuation method and apparatus |
CN108829750A (en) * | 2018-05-24 | 2018-11-16 | 国信优易数据有限公司 | A kind of quality of data determines system and method |
CN108804655A (en) * | 2018-06-07 | 2018-11-13 | 福建江夏学院 | A kind of intangible asset method and system based on big data |
CN110766429A (en) * | 2018-07-26 | 2020-02-07 | 国信优易数据有限公司 | Data value evaluation system and method |
CN109615431A (en) * | 2018-12-13 | 2019-04-12 | 普元信息技术股份有限公司 | The system and method for data assets perception and pricing function are realized under big data background |
CN111475695A (en) * | 2020-03-30 | 2020-07-31 | 贵阳大数据交易所有限责任公司 | Service data asset pricing method based on metadata |
CN111539770A (en) * | 2020-04-27 | 2020-08-14 | 启迪数华科技有限公司 | Intelligent data asset assessment method and system |
CN111539770B (en) * | 2020-04-27 | 2023-06-16 | 国云数字科技(重庆)有限公司 | Intelligent evaluation method and system for data assets |
CN113806356A (en) * | 2020-06-16 | 2021-12-17 | 中国移动通信集团重庆有限公司 | Data identification method and device and computing equipment |
CN113806356B (en) * | 2020-06-16 | 2024-03-19 | 中国移动通信集团重庆有限公司 | Data identification method and device and computing equipment |
CN112101447A (en) * | 2020-09-10 | 2020-12-18 | 北京百度网讯科技有限公司 | Data set quality evaluation method, device, equipment and storage medium |
CN112101447B (en) * | 2020-09-10 | 2024-04-16 | 北京百度网讯科技有限公司 | Quality evaluation method, device, equipment and storage medium for data set |
CN113673889A (en) * | 2021-08-26 | 2021-11-19 | 上海罗盘信息科技有限公司 | Intelligent data asset identification method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106845846A (en) | Big data asset evaluation method | |
Tang et al. | Neural networks analysis in business failure prediction of Chinese importers: A between-countries approach | |
CN101894316A (en) | Method and system for monitoring indexes of international market prosperity conditions | |
JP2004500642A (en) | Methods and systems for assessing cash flow recovery and risk | |
CN111738843B (en) | Quantitative risk evaluation system and method using running water data | |
Tang et al. | Online-purchasing behavior forecasting with a firefly algorithm-based SVM model considering shopping cart use | |
Xu et al. | Novel key indicators selection method of financial fraud prediction model based on machine learning hybrid mode | |
Kiv et al. | Machine learning of emerging markets in pandemic times | |
CN112966962A (en) | Electric business and enterprise evaluation method | |
CN112419030B (en) | Method, system and equipment for evaluating financial fraud risk | |
CN109146611A (en) | A kind of electric business product quality credit index analysis method and system | |
Hamal et al. | A novel integrated AHP and MULTIMOORA method with interval-valued spherical fuzzy sets and single-valued spherical fuzzy sets to prioritize financial ratios for financial accounting fraud detection | |
Chimonaki et al. | Identification of financial statement fraud in Greece by using computational intelligence techniques | |
Wardana et al. | Comparation of SAW method and TOPSIS in assesing the best area using HSE standards | |
US20140156544A1 (en) | Non-Tangible Assets Valuation Tool | |
Sharma et al. | Predicting purchase probability of retail items using an ensemble learning approach and historical data | |
CN105447117A (en) | User clustering method and apparatus | |
CN113450004A (en) | Power credit report generation method and device, electronic equipment and readable storage medium | |
CN115905319B (en) | Automatic identification method and system for abnormal electricity fees of massive users | |
CN105956012A (en) | Database mode abstract method based on graphical partition strategy | |
CN114626940A (en) | Data analysis method and device and electronic equipment | |
Li et al. | A commonsense knowledge-enabled textual analysis approach for financial market surveillance | |
Shmueli et al. | Data mining in excel: Lecture notes and cases | |
CN111091410B (en) | Node embedding and user behavior characteristic combined net point sales prediction method | |
Kasinadh et al. | Building fuzzy OLAP using multi-attribute summarization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170613 |