CN108764995A - Data value determination system and method - Google Patents

Data value determination system and method

Info

Publication number
CN108764995A
Authority
CN
China
Prior art keywords
data
determined
index
value
index value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810511472.1A
Other languages
Chinese (zh)
Inventor
吴燕飞
庞钰宁
王肃
夏珺峥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd
Priority to CN201810511472.1A
Publication of CN108764995A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides a data value determination system and method. The system includes: an index value determining module, configured to determine index values of preset indexes for data to be determined, where the preset indexes include a data quality index characterizing data quality and a business value index characterizing the business application value of the data; a market value determining module, configured to determine a market value for the data to be determined according to market information; and a data value determining module, configured to determine the data value of the data to be determined based on a preset operational relationship between the determined index values and the market value.

Description

Data value determination system and method
Technical field
This application relates to the technical field of data analysis, and in particular to a data value determination system and method.
Background
With the rapid development of digital information, data has an increasingly strong influence on enterprises, and more and more enterprises need to "let the data speak". For an enterprise, intangible assets account for a growing share of total assets. Besides intellectual property such as patents, software copyrights and trademarks, the importance of business data as an intangible asset should not be underestimated, and the value of business data sometimes directly determines the value of the enterprise.
The prior art provides an assessment service for business data, used to assess the value of business data. The providers of this service are mainly asset appraisal institutions. To have business data assessed, the party seeking assessment contacts an asset appraisal institution, and the two sides negotiate the assessment terms face to face. Once the terms are settled, the party provides the business data to the institution, and the institution's asset appraisal experts assess the business data according to a certain assessment procedure. This approach leaves the assessment process strongly influenced by subjective human factors, so the assessment result is not objective enough and its accuracy is low.
Summary of the invention
In view of this, the purpose of this application is to provide a data value determination system and method, to solve the problem in the prior art that data value is determined with low accuracy.
In a first aspect, an embodiment of this application provides a data value determination system, the system including:
an index value determining module, configured to determine index values of preset indexes for data to be determined;
where the preset indexes include a data quality index characterizing data quality and a business value index characterizing the business application value of the data;
a market value determining module, configured to determine a market value for the data to be determined according to market information;
a data value determining module, configured to determine the data value of the data to be determined based on a preset operational relationship between the determined index values and the market value.
Optionally, the data quality index includes a data consistency index;
the index value determining module is specifically configured to determine the degree of consistency between the data content included in the data to be determined and the description information corresponding to the data to be determined, and to determine the index value of the data consistency index of the data to be determined based on the degree of consistency, where a higher degree of consistency indicates a higher index value of the data consistency index of the data to be determined.
Optionally, the index value determining module is specifically configured to determine the degree of consistency between one or more of the following items of data content and the corresponding description information, where a higher degree of consistency between any item of data content and its corresponding description information indicates a higher index value of the data consistency index of the data to be determined:
the data volume included in the data to be determined and the data volume stated in the description information of the data to be determined;
the size of the data to be determined and the size stated in the description information of the data to be determined;
the data format of the data to be determined and the data format stated in the description information of the data to be determined.
Optionally, the data quality index includes one or more of the following indexes: a data integrity index, a data redundancy index, a data timeliness index, and a data volume index;
where a data integrity index is included, the index value determining module is specifically configured to determine the proportion of null values in the data entries included in the data to be determined, and to determine the index value of the data integrity index of the data to be determined based on the proportion of null values, where a lower proportion of null values indicates higher data integrity of the data to be determined;
where a data redundancy index is included, the index value determining module is specifically configured to determine the proportion of duplicate entries among the data entries included in the data to be determined, and to determine the index value of the data redundancy index of the data to be determined based on the proportion of duplicate entries, where a lower proportion of duplicate entries indicates lower data redundancy of the data to be determined;
where a data timeliness index is included, the index value determining module is specifically configured to determine the time interval spanned by the generation times of the data to be determined, and the time difference between the generation time of the data to be determined and the time at which the data to be determined is provided, and to determine the index value of the data timeliness index of the data to be determined based on the time interval and the time difference, where a larger time interval indicates a higher index value of the data timeliness index, and a smaller time difference indicates a higher index value of the data timeliness index;
where a data volume index is included, the index value determining module is specifically configured to determine the data volume included in the data to be determined, and to determine the index value of the data volume index of the data to be determined based on the data volume, where a larger data volume indicates a higher index value of the data volume index of the data to be determined.
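The null-value and duplicate-entry proportions described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the treatment of blank strings as null values, and the mapping `1 - null_ratio` from proportion to integrity score are all assumptions, since the text only states the direction of each relationship.

```python
from typing import Optional, Sequence

# One data entry is modelled as a sequence of data elements (assumption).
Entry = Sequence[Optional[str]]


def null_ratio(entries: Sequence[Entry]) -> float:
    """Proportion of data elements that are empty (None or blank string)."""
    total = sum(len(entry) for entry in entries)
    if total == 0:
        return 0.0
    empty = sum(
        1
        for entry in entries
        for element in entry
        if element is None or str(element).strip() == ""
    )
    return empty / total


def duplicate_ratio(entries: Sequence[Entry]) -> float:
    """Proportion of entries that repeat an earlier entry."""
    if not entries:
        return 0.0
    unique = len({tuple(entry) for entry in entries})
    return (len(entries) - unique) / len(entries)


def integrity_score(entries: Sequence[Entry]) -> float:
    """Hypothetical mapping: a lower null proportion gives a higher score."""
    return 1.0 - null_ratio(entries)


sample = [("apple", "1.0"), ("apple", "1.0"), ("pear", None), ("plum", "")]
print(null_ratio(sample), duplicate_ratio(sample), integrity_score(sample))
```

Any monotonically decreasing function of the null proportion would satisfy the stated relationship equally well; the linear form is chosen only for readability.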
Optionally, the system further includes a data crawling module and a similar data determining module;
the data crawling module is configured to crawl multiple data sets from the preset platform;
the similar data determining module is configured to parse the data to be determined and each of the multiple data sets, and determine the lexical features of the data to be determined and of each data set; to match the lexical features of the data to be determined against the lexical features of each data set for text similarity; and to determine each data set whose text similarity reaches a preset similarity threshold as similar data of the data to be determined.
Optionally, the data quality index includes a data scarcity index;
the index value determining module is specifically configured to determine the number of occurrences, on the preset platform, of the data to be determined and of similar data that is similar to the data to be determined, and to determine the index value of the data scarcity index of the data to be determined based on the number of occurrences, where a smaller number of occurrences indicates higher scarcity of the data to be determined.
Optionally, the business value index includes one or more of the following indexes: an industry field classification index, an application scenario index, and a supplier index;
where an industry field classification index is included, the index value determining module is specifically configured to determine the ratio of the number of industry field labels corresponding to the data set to which the data to be determined belongs to the number of industry field labels corresponding to the data category to which the data to be determined belongs, and to determine the index value of the industry field classification index of the data to be determined based on the ratio, where a larger ratio indicates a larger index value of the industry field classification index of the data to be determined;
where an application scenario index is included, the index value determining module is specifically configured to determine the number of application scenarios corresponding to the data to be determined, and to determine the index value of the application scenario index of the data to be determined based on the number of application scenarios, where a larger number of application scenarios indicates a larger index value of the application scenario index of the data to be determined;
where a supplier index is included, the index value determining module is specifically configured to judge whether the data to be determined is native data of the data provider, and to determine the index value of the supplier index of the data to be determined based on the judgment result.
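The three business value indexes above can be illustrated with a short sketch. The function names and the scores `1.0` and `0.5` assigned to native and non-native data are hypothetical; the text only says that the supplier index value depends on the judgment result, without giving concrete values.

```python
from typing import Iterable


def industry_label_ratio(
    dataset_labels: Iterable[str], category_labels: Iterable[str]
) -> float:
    """Ratio of the data set's industry labels to the data category's labels."""
    category = set(category_labels)
    if not category:
        return 0.0
    return len(set(dataset_labels)) / len(category)


def application_scenario_count(scenarios: Iterable[str]) -> int:
    """Number of distinct application scenarios associated with the data."""
    return len(set(scenarios))


def supplier_index(
    is_native: bool, native_score: float = 1.0, other_score: float = 0.5
) -> float:
    """Hypothetical scoring: native data of the provider scores higher."""
    return native_score if is_native else other_score


print(industry_label_ratio(["telecom", "finance"],
                           ["telecom", "finance", "retail", "health"]))
print(application_scenario_count(["ads", "risk", "ads"]))
print(supplier_index(True), supplier_index(False))
```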
Optionally, the market value determining module is specifically configured to determine a weighted sum of the values, on the preset platform, of the data to be determined and of the similar data of the data to be determined.
Optionally, the data value determining module is specifically configured to determine the weighted sum of the determined index values as a value correction coefficient, and to determine the value obtained by correcting the market value with the value correction coefficient as the data value of the data to be determined.
In a second aspect, an embodiment of this application provides a data value determination method, the method including:
determining index values of preset indexes for data to be determined, where the preset indexes include a data quality index characterizing data quality and a business value index characterizing the business application value of the data;
determining a market value for the data to be determined according to market information;
determining the data value of the data to be determined based on a preset operational relationship between the determined index values and the market value.
In the data value determination system provided by the embodiments of this application, index values of preset indexes characterizing data quality and business application value are determined from the data to be determined, the market value of the data to be determined is determined from market information, and the data value of the data to be determined is determined according to the preset operational relationship between the index values and the market value. By quantifying the index values of a diverse set of preset indexes together with the market value of the data to be determined, the accuracy of the determined data value is improved; and because the application takes a more comprehensive set of factors into account, the reliability of the finally determined data value is increased.
To make the above objects, features and advantages of this application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of this application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a data value determination system provided by an embodiment of this application;
Fig. 2 is a schematic diagram of a data value determination method provided by an embodiment of this application;
Fig. 3 is a schematic structural diagram of a computer device 300 provided by an embodiment of this application.
Detailed description
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this application. The components of the embodiments generally described and illustrated in the drawings herein can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of this application, but merely represents selected embodiments of it. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
An embodiment of this application provides a data value determination system. As shown in Fig. 1, the system includes:
an index value determining module 11, configured to determine index values of preset indexes for data to be determined, where the preset indexes include a data quality index characterizing data quality and a business value index characterizing the business application value of the data;
a market value determining module 12, configured to determine a market value for the data to be determined according to market information;
a data value determining module 13, configured to determine the data value of the data to be determined based on a preset operational relationship between the determined index values and the market value.
Here, the data to be determined may be business data whose data value needs to be determined. The data to be determined can be obtained in several ways: for example, it may be data in preset fields crawled from preset platforms, where the preset platforms include enterprise websites, statistics bureaus, data trading platforms, button platforms and the like, and the preset fields may be the communications field, the internet field and so on; or it may be data provided directly by a data source that has a data assessment need. The data quality index includes a data consistency index, a data integrity index, a data redundancy index, a data timeliness index, a data scarcity index, a data volume index and the like; the business value index includes an industry field classification index, an application scenario index, a supplier index and the like. The market information may be the data value of the data to be determined on each preset platform. The preset operational relationship between the index values and the market value may be a linear relationship, a nonlinear relationship, an exponential relationship and so on, which this application does not limit.
In a specific implementation, the index value of each data quality index and of each business value index is determined from the acquired data to be determined (described in detail below); the market value of the data to be determined is determined according to its data value (for example, data cost) on each preset platform; and the data value of the data to be determined is calculated using the preset operational relationship between the index values and the market value.
Preferably, the object handled in each run of the embodiment may be one class of data; if that class of data includes multiple data sets, the object of data value determination may be a single data set.
The application may also determine similar data of the data to be determined. Besides obtaining the data to be determined, the system further includes a data crawling module 14 and a similar data determining module 15;
the data crawling module 14 is configured to crawl multiple data sets from the preset platform;
the similar data determining module 15 is configured to parse the data to be determined and each of the multiple data sets, and determine the lexical features of the data to be determined and of each data set; to match the lexical features of the data to be determined against the lexical features of each data set for text similarity; and to determine each data set whose text similarity reaches a preset similarity threshold as similar data of the data to be determined.
Here, the data sets can be crawled by techniques such as web crawlers and crawling tools, which this application does not limit. Text similarity matching is already described in detail in the prior art and is not repeated here; it should be understood, however, that any method capable of calculating text similarity falls within the protection scope of this application.
In a specific implementation, word segmentation is performed on each acquired data set to obtain first lexical data. The first lexical data are sorted from high to low by their number of occurrences in the corresponding data set, and the top preset number of first lexical data are selected. For each item of data in each data set, the lexical feature of that item is determined according to the frequency with which each selected first lexical datum occurs in the data set.
Word segmentation is performed on the data to be determined to obtain second lexical data. The second lexical data are sorted from high to low by their number of occurrences in the data to be determined, and the top preset number of second lexical data are selected. For each item of data in the data to be determined, the lexical feature of that item is determined according to the frequency with which each selected second lexical datum occurs in the data to be determined.
For each lexical feature in each data set, the text similarity between that lexical feature and the lexical features in the data to be determined is calculated. A data set whose text similarity is greater than or equal to the preset similarity threshold is determined as similar data of the data to be determined.
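The segmentation, top-N selection and threshold matching steps above can be sketched as follows. This is an illustration under stated assumptions: the text does not name a segmentation method or a similarity measure, so a regular-expression tokenizer and cosine similarity over term frequencies stand in for them, each data set is treated as a single text, and the names `lexical_feature`, `cosine_similarity` and `similar_datasets` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def lexical_feature(text: str, top_n: int = 5) -> Dict[str, int]:
    """Term frequencies of the top_n most frequent tokens in the text."""
    tokens = re.findall(r"\w+", text.lower())  # stand-in for word segmentation
    return dict(Counter(tokens).most_common(top_n))


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_datasets(
    target: str, candidates: Dict[str, str], threshold: float = 0.5, top_n: int = 5
) -> List[str]:
    """Names of candidate data sets whose similarity reaches the threshold."""
    target_feature = lexical_feature(target, top_n)
    return [
        name
        for name, text in candidates.items()
        if cosine_similarity(target_feature, lexical_feature(text, top_n)) >= threshold
    ]


crawled = {"a": "apple price price pear", "b": "weather rain wind"}
print(similar_datasets("price price apple pear", crawled))
```

For Chinese text, a dedicated word segmenter would replace the regular expression; the rest of the pipeline is unchanged.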
Further, where multiple feature words are determined for the data to be determined and for the data set, each feature word of the data to be determined can be compared for text similarity with each feature word of the data set. A feature word of the data set whose similarity reaches a first preset similarity threshold is determined as a similar word of that feature word, and when the number of similar words reaches a second preset threshold, the data to be determined and the data set are determined to be similar data.
Further, where the data to be determined and the data set carry annotated industry labels, the industry labels can be used directly as the feature words of the corresponding data, and the feature words compared directly for similarity.
When calculating the market value, both the data to be determined and its similar data can be considered. The market value determining module 12 is specifically configured to determine a weighted sum of the values, on the preset platforms, of the data to be determined and of the similar data of the data to be determined.
In a specific implementation, where the preset platforms do not hold the data to be determined, the weighted sum of the values of the similar data of the data to be determined on each preset platform is calculated and used as the market value of the data to be determined; where the preset platforms hold the data to be determined, the weighted sum of the values of the data to be determined and of its similar data on each preset platform is calculated and used as the market value of the data to be determined.
In a specific implementation, the data cost (value) of the data to be determined on each preset platform and the data value of its similar data on each preset platform are collected; these values are summed and averaged, and the average is used as the market value of the data to be determined. Alternatively, weights are set in advance for each data value of the data to be determined and each data value of its similar data (different platforms may be given different weights), the weighted average of those data values is calculated, and the weighted average is used as the market value of the data to be determined. Alternatively, only the average of the values of the data to be determined on each preset platform is used as the market value. This application does not limit this.
The market value of the data to be determined can be calculated with reference to the following formula:
$$\bar{P} = \frac{1}{k}\sum_{i=1}^{k} P_i$$
where $\bar{P}$ is the market value of the data to be determined, $P_i$ is the value of the data to be determined on the i-th preset platform, and k is the number of data trading platforms, generally a positive integer. Alternatively, the same average can be taken over data sets, where $P_i$ is the value of the i-th data set on any preset platform and k is the number of data sets, generally a positive integer: where the preset platform holds the data to be determined, k is the number of data sets comprising the data to be determined and its similar data; where the preset platform does not hold the data to be determined, k is the number of data sets serving as similar data.
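The averaging options described above (plain average and weighted average of platform values) can be sketched in a single function. The name `market_value` and the normalization by the sum of the weights are assumptions made for illustration; with no weights supplied, the function reduces to the plain average.

```python
from typing import Optional, Sequence


def market_value(
    values: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Weighted average of platform values; equal weights give the plain mean.

    `values` holds the value of the data to be determined and/or of its
    similar data on the preset platforms (assumed layout).
    """
    if not values:
        raise ValueError("at least one platform value is required")
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


print(market_value([100.0, 200.0, 300.0]))      # plain average
print(market_value([100.0, 200.0], [3.0, 1.0]))  # platform-weighted average
```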
Based on the obtained market value and the index values of the preset indexes, the data value determining module 13 determines the weighted sum of the determined index values as the value correction coefficient, and determines the value obtained by correcting the market value with the value correction coefficient as the data value of the data to be determined.
In a specific implementation, the weighted sum h of the index values of the preset indexes is calculated, and the data value of the data to be determined is determined based on h and the market value.
The weighted sum h of the index values of the preset indexes is calculated using the following formula:
$$h = \sum_{i=1}^{r} \beta_i \omega_i$$
where h is the weighted sum of the index values of the preset indexes, $\omega_i$ is the index value of the i-th preset index, $\beta_i$ is the weight of the i-th preset index, and r is the total number of preset indexes, a positive integer, preferably 9.
The weights β of different preset indexes may be the same or different. The weights are generally determined through scoring by personnel in the field to which the data to be determined belongs, or calculated by a deep learning algorithm; this application does not limit this.
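The weighted sum h of the index values can be computed as in the following sketch. The function name `correction_coefficient` and the example numbers are illustrative assumptions; in practice the weights would come from expert scoring or a learned model, as described above.

```python
from typing import Sequence


def correction_coefficient(
    index_values: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted sum h of the index values, one weight per preset index."""
    if len(index_values) != len(weights):
        raise ValueError("each index value needs exactly one weight")
    return sum(w * x for w, x in zip(weights, index_values))


# Two hypothetical indexes with equal weights.
h = correction_coefficient([1.0, 0.5], [0.5, 0.5])
print(h)
```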
The data value I of the data to be determined is calculated using the following formula:
where I is the data value of the data to be determined, and h is the weighted sum of the index values of the preset indexes.
The calculation of the index value of each preset index is described in detail below.
When the data quality index is a data consistency index, the index value determining module 11 is specifically configured to determine the degree of consistency between the data content included in the data to be determined and the description information corresponding to the data to be determined, and to determine the index value of the data consistency index of the data to be determined based on the degree of consistency, where a higher degree of consistency indicates a higher index value of the data consistency index of the data to be determined.
Specifically, when determining the degree of consistency between the data content included in the data to be determined and the corresponding description information, the index value determining module 11 determines the degree of consistency between one or more of the following items of data content and the corresponding description information, where a higher degree of consistency between any item of data content and its corresponding description information indicates a higher index value of the data consistency index of the data to be determined:
First: the data volume included in the data to be determined and the data volume stated in the description information of the data to be determined.
The data content of the data to be determined is carried in a file of some format. The data to be determined may consist of multiple data entries, each consisting of multiple data elements, where a data element is the most basic data unit making up the data to be determined.
For example, when the data to be determined are commodity price data, the data elements of one data entry may be: commodity name, commodity manufacturer, place of origin, production time, shelf life, net content, nutritional ingredients, product batch number, and on-sale date.
That is, the data to be assessed is preferably in the form of data entries. Where the data with an assessment need is text data, key information can be extracted from the text data before assessment to generate data in entry form. For example, if the data with an assessment need is a buyer's guide text, keywords such as commodity name, commodity manufacturer, place of origin and production time can be extracted into data entries before assessment, and the extracted data entries used as the data to be determined.
The data volume included in the data to be determined may be the data volume of the valid data elements it includes, the number of data elements it includes, or the number of data entries. Taking the data volume of valid data elements as an example: in the example above, a complete data entry includes 9 data elements, so the data volume corresponding to each data entry is 9; if the data to be determined includes 100 data entries, the data volume it should have is 900, that is, the data volume of the data to be determined is 900. In practice, some data elements may be empty; an empty data element has no actual content, so the actual data volume of the data to be determined is less than the data volume stated in the description information. Taking the number of data entries as an example, the number of data entries included in the data to be determined can be compared with the number of data entries stated in the description information of the data to be determined.
Therefore, the degree of consistency between the data content of the data to be determined and the description information can be characterized by determining the degree of consistency between the data volume included in the data to be determined and the data volume stated in its description information.
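The 9 × 100 = 900 example above can be made concrete with a short sketch that counts valid (non-empty) data elements and compares the count with the stated data volume. The function names and the use of an absolute gap as the comparison are assumptions for illustration; treating blank strings as empty elements is likewise an assumption.

```python
from typing import List, Optional, Sequence


def effective_volume(entries: Sequence[Sequence[Optional[str]]]) -> int:
    """Number of data elements that actually carry content."""
    return sum(
        1
        for entry in entries
        for element in entry
        if element is not None and str(element).strip() != ""
    )


def volume_gap(
    entries: Sequence[Sequence[Optional[str]]], declared_volume: int
) -> int:
    """Absolute gap between the stated and the actual data volume."""
    return abs(declared_volume - effective_volume(entries))


# 100 entries of 9 data elements each, as in the commodity price example.
entries: List[List[Optional[str]]] = [["v"] * 9 for _ in range(100)]
entries[0][2] = ""    # an empty data element
entries[5][0] = None  # another empty data element
print(effective_volume(entries), volume_gap(entries, 900))
```

A gap of zero means the stated and actual data volumes agree; a larger gap indicates lower consistency for this item.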
Second: the size of the data to be determined and the size described in the description information of the data to be determined;
Here, the size of the data to be determined can be regarded as the file size of the file carrying the data to be determined. For example, missing data elements in data entries (i.e., empty data elements) also cause the actual file size of the file carrying the data to be determined to differ from the file size described in the description information.
Therefore, the degree of consistency between the data content of the data to be determined and its description information can be characterized by determining the degree of consistency between the size of the data to be determined and the size described in its description information.
Third: the data format of the data to be determined and the data format described in the description information of the data to be determined;
Here, the data format of the data to be determined is the file format of the file carrying the data to be determined. The file format of the file carrying the data to be determined may differ from the file format described in the description information.
Therefore, the degree of consistency between the data content of the data to be determined and its description information can be characterized by determining the degree of consistency between the data format of the data to be determined and the data format described in its description information.
It should be understood, however, that the data content of the data to be determined may be, but is not limited to, data volume, size, and data format. The description information corresponding to the data to be determined is generally used to describe that data, and likewise includes content such as data volume, size, and data format.
In a specific implementation, a first absolute difference is calculated between the data volume included in the data to be determined and the data volume described in its description information, and a second absolute difference is calculated between the size of the data to be determined and the size described in its description information. If the data format of the data to be determined is consistent with the data format described in its description information, the consistency degree D of the data to be determined is set to a first preset value; otherwise, D is set to a second preset value. The index value of the data consistency index is then calculated from the first absolute difference, the second absolute difference, and the consistency degree. The first preset value is generally 0 and the second preset value is generally 1. The two preset values may also take other values, determined according to the actual situation; usually, the second preset value is greater than the first preset value.
The first absolute difference L1 is calculated using the following formula:
L1 = |L_a - L_m|
where L1 is the first absolute difference of the data to be determined, L_a is the data volume included in the data to be determined, and L_m is the data volume described in the description information of the data to be determined.
The second absolute difference L2 is calculated using the following formula:
L2 = |S_a - S_m|
where L2 is the second absolute difference of the data to be determined, S_a is the size of the data to be determined, and S_m is the size described in the description information of the data to be determined.
The index value ω1 of the data consistency index is calculated using the following formula:
where ω1 is the data consistency index of the data to be determined, α is a positive real number not greater than 1, preferably 1/3, and D is the consistency degree of the data to be determined.
The value range of ω1 is generally [0, 1]. The larger the value of ω1, the higher the degree of consistency of the data to be determined, and the higher the data value of the data to be determined.
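The three inputs to the consistency index (L1, L2, and D) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Description` class, the function name, and the case-insensitive format comparison are assumptions made for the example. The sketch stops at the three inputs and does not guess how they are combined with α into ω1.

```python
from dataclasses import dataclass


@dataclass
class Description:
    """Hypothetical container for what the description information declares."""
    data_volume: int   # L_m: declared data volume
    size_bytes: int    # S_m: declared file size
    data_format: str   # declared file format, e.g. "csv"


def consistency_components(actual_volume, actual_size, actual_format,
                           desc, d_match=0, d_mismatch=1):
    """Return (L1, L2, D) for the data to be determined.

    d_match / d_mismatch are the first and second preset values
    (0 and 1 by default, as the text suggests).
    """
    l1 = abs(actual_volume - desc.data_volume)   # L1 = |L_a - L_m|
    l2 = abs(actual_size - desc.size_bytes)      # L2 = |S_a - S_m|
    # Assumption: formats are compared case-insensitively.
    same_format = actual_format.lower() == desc.data_format.lower()
    d = d_match if same_format else d_mismatch
    return l1, l2, d


desc = Description(data_volume=900, size_bytes=2048, data_format="csv")
print(consistency_components(880, 2000, "CSV", desc))
```

Here 20 data elements are missing relative to the declared 900, the file is 48 bytes smaller than declared, and the formats match, so D takes the first preset value.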
When the data quality index is the data integrity index, the index value determining module 11 is specifically configured to determine the null-value proportion in the data entries included in the data to be determined, and to determine the index value of the data integrity index of the data to be determined based on the null-value proportion. The lower the null-value proportion, the higher the data integrity of the data to be determined.
Here, the null-value proportion may be the proportion of invalid (empty) data elements among all data elements in the data entries of the data to be determined.
In a specific implementation, to assign a validity to each data entry in the data to be determined, each data element in each data entry is checked in turn to see whether it is empty. An integrity value is assigned to each data element according to the detection result: if the data element is empty, its integrity value (i.e., validity) is 0; if the data element is not empty, its integrity value is 1. The ratio of the sum of the integrity values of all data elements to the number of data elements gives the proportion of non-empty data elements, from which the null-value proportion follows. Alternatively, the total number of empty data elements in all data entries of the data to be determined is counted, and the ratio of that total to the total number of data elements in the data to be determined is taken as the null-value proportion. The average of the validity values over the data entries of the data to be determined is then calculated and used as the index value of the data integrity index.
For example, suppose the data to be determined has 10 rows and 10 columns. Each data element is traversed: if the data element in row i, column j is empty, the validity of row i, column j is 0; if it is not empty, the validity of row i, column j is 1.
The index value of the data integrity index ω2 of the data to be determined is calculated using the following formula:
where ω2 is the data integrity index of the data to be determined, a_ij is the validity of the data element in row i, column j of the data to be determined, S is the number of data entries in the data to be determined (i.e., the number of rows), T is the number of data elements in each data entry (i.e., the number of columns), and N is the total number of data elements in the data to be determined, with N = S × T.
The value range of ω2 is [0, 1]. The larger the value of ω2, the better the data integrity of the data to be determined.
Further, the null-value proportion may also be the proportion of invalid data entries in the total number of data entries in the data to be determined. A data entry containing a preset number of empty data elements can be determined to be an invalid data entry. In this case, ω2 is the quotient of the number of invalid data entries and the total number of data entries.
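The integrity calculation can be sketched as below. The sketch assumes that ω2 is the mean of the validity values a_ij over all N = S × T data elements, which is how the surrounding text describes it. The function names, the rule for what counts as "empty" (None or a blank string), and the `min_empty` parameter are assumptions for illustration.

```python
def is_empty(cell):
    """Assumed emptiness rule: None or a blank string."""
    return cell is None or (isinstance(cell, str) and cell.strip() == "")


def completeness_index(rows):
    """Mean validity over all data elements (assumed form of omega_2)."""
    total = sum(len(row) for row in rows)          # N = S x T
    if total == 0:
        raise ValueError("no data elements to evaluate")
    valid = sum(1 for row in rows for cell in row if not is_empty(cell))
    return valid / total


def invalid_entry_ratio(rows, min_empty=1):
    """Share of entries holding at least `min_empty` empty data elements."""
    if not rows:
        raise ValueError("no data entries to evaluate")
    invalid = sum(
        1 for row in rows
        if sum(1 for cell in row if is_empty(cell)) >= min_empty
    )
    return invalid / len(rows)


rows = [["a", None, "c"], ["d", "", "f"]]
print(completeness_index(rows))               # 4 of 6 elements are non-empty
print(invalid_entry_ratio(rows, min_empty=1))
```

The element-level function rewards complete data with a higher value, while the entry-level function reports the share of invalid entries, so a caller would need to decide which direction the final index should run.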
When the data quality index is the data redundancy index, the index value determining module 11 is specifically configured to determine the proportion of repeated entries in the data entries included in the data to be determined, and to determine the index value of the data redundancy index of the data to be determined based on the proportion of repeated entries. The lower the proportion of repeated entries, the lower the data redundancy of the data to be determined.
The proportion of repeated entries characterizes the ratio of the number of repeated data entries to the total number of data entries in the data to be determined; information redundancy measures the rate at which repeated data occurs. In a data set, repeated data is redundant data; the higher the information redundancy, the lower the data value.
In a specific implementation, the repeat count of each data entry in the data to be determined is counted; the average of the repeat counts over the data entries is calculated; and the index value of the data redundancy index is calculated based on that average.
When counting how often each data entry in the data to be determined repeats, each data entry is checked in turn, in the order in which the data entries are arranged, to see whether it has appeared before. Two data entries are considered identical when the contents of their data elements are completely consistent, or when the number of data elements whose contents are consistent or similar reaches a preset threshold. If a data entry is a repeated entry, that is, another data entry identical to the current one was detected before the current one, its repeatability value is 1. If a data entry is not a repeated entry, that is, no data entry identical to the current one was detected before the current one, its repeatability value is 0. The ratio of the sum of the repeatability values of all data entries to the number of data entries is taken as the proportion of repeated entries in the data entries included in the data to be determined.
The index value of the data redundancy index ω3 is calculated using the following formula:
where ω3 is the data redundancy index of the data to be determined, b_i is the repeat count (repeatability value) of the i-th data entry in the data to be determined, and N is the total number of data entries in the data to be determined.
The value range of ω3 is [0, 1]. The larger the value of ω3, the smaller the data redundancy of the data to be determined, and the higher the corresponding data value.
For example, the data to be determined includes 5 data entries, a, b, c, d, and e, where a, b, and e are identical and c and d are identical. Each data entry from a to e is checked in turn to see whether it is a repeated entry. a appears for the first time, so its repeatability value is 0; b is identical to a and is a repeated entry, so its repeatability value is 1; c appears for the first time, so its repeatability value is 0; d is identical to c and is a repeated entry, so its repeatability value is 1; e is identical to a and is a repeated entry, so its repeatability value is 1. The proportion of repeated entries in the data entries included in the data to be determined is therefore 0.6. According to the above formula, the index value of the data to be determined under the data redundancy index is ω3 = 0.4.
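The worked example above (repeat proportion 0.6, ω3 = 0.4) is consistent with ω3 being one minus the proportion of repeated entries. The sketch below assumes that form and treats two entries as identical only when all their data elements match exactly; the threshold-based notion of similarity mentioned in the text is not implemented. The function name is hypothetical.

```python
def redundancy_index(entries):
    """Return (repeat_ratio, omega_3) for a list of data entries.

    An entry is flagged as repeated (value 1) when an identical entry
    was already seen earlier in the list, otherwise 0.
    Assumption: omega_3 = 1 - (sum of repeat flags) / N.
    """
    if not entries:
        raise ValueError("no data entries to evaluate")
    seen = set()
    flags = []
    for entry in entries:
        key = tuple(entry)                 # exact-match comparison only
        flags.append(1 if key in seen else 0)
        seen.add(key)
    repeat_ratio = sum(flags) / len(entries)
    return repeat_ratio, 1 - repeat_ratio


# Entries a, b, e are identical; c and d are identical.
a = b = e = ("widget", "factory-1")
c = d = ("gadget", "factory-2")
ratio, omega_3 = redundancy_index([a, b, c, d, e])
print(round(ratio, 2), round(omega_3, 2))
```

Scanning in arrival order with a set of already-seen entries keeps the check linear in the number of entries, at the cost of only recognising exact duplicates.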
When the data quality index includes the data timeliness index, the index value determining module 11 is specifically configured to determine the time interval spanned by the generation times of the data to be determined, and the time difference between the provision time of the data to be determined and the generation time of the data to be determined; and to determine the index value of the data timeliness index of the data to be determined based on the time interval and the time difference. The larger the span of the time interval and the smaller the time difference, the higher the index value of the data timeliness index of the data to be determined.
Here, the time interval spanned by the generation times of the data to be determined is the interval from the time at which the data to be determined started to be generated to the time at which it finished being generated. The unit of the time interval is set according to the length of the interval. If the start time and end time of the data to be determined cannot be determined from the data itself, they can be determined from the description information of the data to be determined. The generation time may be the start time or the end time of the time interval spanned by the data to be determined, or an average time; it is preferably the start time.
In a specific implementation, the maximum time span among the data in the data to be determined is calculated, that is, the difference between the end time and the start time of the time interval; and the index value of the data timeliness index is calculated based on the difference between the provision time of the data to be determined and the generation time of the data to be determined.
For example, if the length of the time interval is 1 day, the unit of the time interval is set to minutes; if the length of the time interval is 2 months, the unit is set to days; if the length of the time interval is 3 years, the unit may be set to weeks. It should be noted that these unit settings are only examples provided by the embodiments of the present application and should not be regarded as limiting the technical solution of the present application.
The data provision time refers to the time at which the data acquisition module of the data quality determination system acquires the data to be determined. It should be noted that, because the data to be determined has a certain data volume, it is in practice difficult for the data acquisition module 14 to acquire all of the data to be determined at a single point in time. Therefore, the data provision time may be the start time at which the data acquisition module acquires the data to be determined, or the end time at which it acquires the data to be determined. In addition, after acquiring the data to be determined, the data acquisition module can transmit it to the index determining module for processing within a short time, so the difference between the start or end time of acquisition and the current time at which the value determining module determines the index value under the timeliness index is small. Therefore, the current time at which the value determining module determines the index value of the data to be determined under the timeliness index may also be used as the data provision time.
For example, the data to be determined includes 100 data entries. Among these 100 data entries, the generation time of the earliest-generated data entry (i.e., the start time of the data to be determined) is January 1, 2018, and the generation time of the latest-generated data entry (i.e., the end time of the data to be determined) is January 30, 2018. The time interval spanned by the generation times of the data to be determined (i.e., the maximum time span) is therefore 30 days. If the provision time of the data to be determined is April 1, 2018, the time difference between the provision time and the generation time of the data to be determined is the difference between April 1, 2018 and January 1, 2018.
The index value of the data timeliness index ω4 is calculated using the following formula:
where ω4 is the data timeliness index of the data to be determined; T_f is the end time of the time interval spanned by the generation times of the data to be determined, in days (if the end time cannot be determined from the data to be determined, the end time given in the corresponding description information is used); T_s is the start time of the time interval spanned by the generation times of the data to be determined, in days (if the start time cannot be determined from the data to be determined, the start time given in the corresponding description information is used); and T_n is the provision time of the data to be determined.
The value range of ω4 is [0, 1]. The larger the value of ω4, the stronger the timeliness of the data to be determined.
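The two quantities that feed the timeliness index, the span T_f - T_s and the gap between the provision time T_n and the generation time, can be computed as sketched below. Two assumptions are made to match the worked example: the span is counted inclusively (January 1 to January 30 gives 30 days), and the generation time is taken to be the start time, which the text calls the preferred choice. The function name is hypothetical, and the sketch deliberately stops short of combining the two values into ω4.

```python
from datetime import date


def timeliness_inputs(generation_dates, provision_date):
    """Return (span_days, lag_days) for the data to be determined.

    span_days: inclusive day count between earliest and latest generation date.
    lag_days:  days between the provision date and the earliest generation date.
    """
    if not generation_dates:
        raise ValueError("no generation dates supplied")
    start = min(generation_dates)   # T_s
    end = max(generation_dates)     # T_f
    span_days = (end - start).days + 1          # inclusive count (assumption)
    lag_days = (provision_date - start).days    # T_n minus generation time
    return span_days, lag_days


span, lag = timeliness_inputs(
    [date(2018, 1, 1), date(2018, 1, 15), date(2018, 1, 30)],
    provision_date=date(2018, 4, 1),
)
print(span, lag)
```

A larger span together with a smaller lag should lead to a higher timeliness index, per the description above.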
When the data quality index is the data scarcity index, the index value determining module 11 is specifically configured to determine the number of occurrences, on preset platforms, of the data to be determined and of similar data that is similar to the data to be determined; and to determine the index value of the data scarcity index of the data to be determined based on the number of occurrences. The fewer the occurrences, the higher the scarcity of the data to be determined.
Here, scarcity refers to the degree of scarcity of the data, calculated from how data of the same kind is provided on the preset platforms and in the acquired data information. The more data of the same kind there is, the lower the scarcity; the less data of the same kind there is, the higher the scarcity. The higher the scarcity of the data to be determined, the higher its value.
In a specific implementation, the similarity between the data to be determined and the data of each preset platform is calculated, as is the similarity between the similar data of the data to be determined and the data of each preset platform. The number of preset platforms whose similarity exceeds a set similarity threshold is counted, and the ratio of that number to the total number of preset platforms is calculated. The reciprocal of the natural constant e raised to the power of that ratio is then calculated, and the index value of the data scarcity index is calculated from this reciprocal.
For example, after the data to be determined and its similar data are obtained, a large amount of data is crawled from each preset platform. The data crawled from each preset platform may be transaction data, so that each preset platform ends up corresponding to a data set containing a large amount of data. For each preset platform, the similarity between the data to be determined and the data set of that platform is calculated using a similarity formula; the similarity between the similar data of the data to be determined and the data set of that platform may also be calculated. From the resulting similarities, the number of data sets whose similarity exceeds the set similarity threshold is counted, the ratio of that number to the total number of preset platforms is calculated, and the index value of the data scarcity index of the data to be determined is calculated based on this ratio.
The index value of the data scarcity index ω5 of the data to be determined is calculated using the following formula:
ω5 = 1 - e^(-x/y)
where ω5 is the data scarcity index of the data to be determined, x is the number of occurrences on the preset platforms of the data to be determined and of the similar data of the data to be determined, and y is the total number of preset platforms.
In addition, the index value ω5 of the data to be determined under the data scarcity index may also be calculated using the following formula:
where x is the number of occurrences on the preset platforms of the data to be determined and of the similar data of the data to be determined, and y is the total number of data sets crawled.
The value range of ω5 is [0, 1]. When ω5 is close to 1, similar data exists on every preset platform and the scarcity of the data to be determined is low; when ω5 equals 0, no similar data exists on any preset platform and the scarcity of the data to be determined is high.
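The first scarcity formula, ω5 = 1 - e^(-x/y), translates directly into code. The function name and the input validation below are assumptions added for the example.

```python
import math


def scarcity_index(occurrences, total_platforms):
    """omega_5 = 1 - e^(-x / y).

    occurrences (x): how many times the data or its similar data appears
    on the preset platforms; total_platforms (y): number of preset platforms.
    """
    if total_platforms <= 0:
        raise ValueError("total_platforms must be positive")
    if occurrences < 0:
        raise ValueError("occurrences must be non-negative")
    return 1 - math.exp(-occurrences / total_platforms)


for x in (0, 1, 3, 5):
    print(x, round(scarcity_index(x, 5), 4))
```

The value is 0 when no platform carries similar data and rises monotonically with the number of occurrences. If x never exceeds y, the result stays at or below 1 - 1/e, roughly 0.632.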
When the data quality index includes the data volume index, the index value determining module 11 is specifically configured to determine the data volume included in the data to be determined, and to determine the index value of the data volume index of the data to be determined based on that data volume. The larger the data volume, the higher the index value of the data volume index of the data to be determined.
In a specific implementation, the ratio of the calculated data volume of the data to be determined to the total data volume of the data of all preset platforms may be used as the index value of the data volume index; alternatively, the data volume of the data to be determined may be used directly as the index value of the data volume index, as determined by the actual situation.
In practical applications, a large amount of data can be obtained from each preset platform by, but not limited to, crawling. The data crawled from the preset platforms may include data identical or similar to the data to be determined, as well as data different from the data to be determined.
The data crawled from each preset platform consists of multiple data entries, and each data entry includes multiple data elements. For each preset platform, if the data crawled from that platform includes 100 data entries and each data entry includes 10 data elements, the data volume crawled from that platform is 1000; if there are 5 preset platforms, the total data volume across the preset platforms is 5000. How the data volume of the data to be determined is counted has been described above and is not repeated here.
In a specific implementation, either of the following two methods may be used to determine the index value of the data to be determined under the data volume index:
First, the ratio of the calculated data volume of the data to be determined to the total data volume of the data of all preset platforms may be used as the index value of the data volume index; alternatively, the data volume of the data to be determined may be used directly as the index value of the data volume index, as determined by the actual situation.
For example, when the ratio of the data volume of the data to be determined to the total data volume of the data of all preset platforms is used as the index value of the data volume index, the index value ω6 of the data volume index may be calculated using the following formula:
where N is the data volume of the data to be determined, and O is the total data volume of the data of all preset platforms.
The value of ω6 lies in [0, 1]. When ω6 = 0, the data volume of the data to be determined is small; otherwise, the data volume is large.
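The first method can be sketched as a simple ratio, assuming ω6 = N / O as the text describes. The function name is hypothetical, and clamping the result to 1 is an added assumption so that the value stays inside the stated [0, 1] range even when the data to be determined is larger than everything crawled.

```python
def data_volume_index(data_volume, platform_volumes):
    """omega_6 = N / O, clamped to [0, 1] (clamping is an assumption).

    data_volume (N): data volume of the data to be determined.
    platform_volumes: data volume crawled from each preset platform.
    """
    total = sum(platform_volumes)          # O
    if total <= 0:
        raise ValueError("total platform data volume must be positive")
    if data_volume < 0:
        raise ValueError("data_volume must be non-negative")
    return min(1.0, data_volume / total)


# 5 preset platforms, each with 100 entries x 10 data elements = 1000.
platforms = [100 * 10] * 5
print(data_volume_index(900, platforms))
```

With 900 valid data elements against a crawled total of 5000, the ratio is 900 / 5000.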
Second, the index value of the data to be determined under the data volume index is calculated based on the committed data volume carried in the description information of the data to be determined, the data volume described in the description information, the data volume included in the data to be determined, and the volume of similar data, similar to the data to be determined, obtained by data acquisition from the preset platforms.
The committed data volume refers to the data volume that the user, when providing the data to be determined, expects to provide.
The data volume included in the data to be determined is the data volume of the valid data elements included in the data to be determined.
The process of obtaining the volume of similar data, similar to the data to be determined, by data acquisition from the preset platforms is similar to the process of obtaining similar data when determining the index value of the data to be determined under the data scarcity index. The specific process is as follows:
The data acquisition module 10 crawls multiple data sets from the preset platforms. The similar data determining module 40 is configured to parse the data to be determined and the multiple data sets respectively, and determine the lexical features of the data to be determined and of each data set; to perform text similarity matching between the lexical features of the data to be determined and the lexical features of each data set; to determine the data sets whose text similarity reaches a preset similarity threshold as the similar data of the data to be determined; and to perform a data volume determination operation on the determined similar data, so as to obtain the volume of similar data that is similar to the data to be determined.
Specifically, the index value of the data to be determined under the data volume index may be calculated using the following formula:
where m indicates the data volume included in the data to be determined; N1 indicates the volume of similar data, similar to the data to be determined, obtained by data acquisition from the preset platforms; N2 indicates the data volume described in the description information; and N3 indicates the committed data volume.
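The similar-data determination step described above (extract lexical features, match them against each crawled data set, keep the sets that reach a threshold) can be illustrated as follows. The text does not specify the feature extraction or the similarity measure, so this sketch swaps in two simple stand-ins that are purely assumptions: lower-cased word tokens as lexical features and Jaccard similarity as the text similarity. All function names and the 0.5 threshold are hypothetical.

```python
import re


def lexical_features(text):
    """Assumed feature extraction: the set of lower-cased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two feature sets (stand-in similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar_datasets(target_text, datasets, threshold=0.5):
    """Return names of crawled data sets whose similarity reaches the threshold.

    datasets: mapping of data set name -> text content.
    """
    target = lexical_features(target_text)
    return [
        name for name, text in datasets.items()
        if jaccard(target, lexical_features(text)) >= threshold
    ]


crawled = {
    "platform_a": "product name manufacturer origin price",
    "platform_b": "weather temperature humidity",
}
print(find_similar_datasets("product name manufacturer origin", crawled))
```

The volume of similar data (N1) would then be obtained by applying the data volume count to the data sets returned here. Word-token splitting suits space-delimited text; other languages would need a proper tokenizer.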
When the business value index is the industry field classification index, the index value determining module 11 is specifically configured to determine the ratio of the number of industry field labels corresponding to the data set to which the data to be determined belongs to the number of industry field labels corresponding to the data category to which the data to be determined belongs; and to determine the index value of the industry field classification index of the data to be determined based on the ratio. The larger the ratio, the larger the index value of the industry field classification index of the data to be determined.
Here, an industry field label characterizes the industry field corresponding to a data set. Data categories are generally preset; a data category generally includes multiple data sets, and each data set corresponds to multiple industry field labels. The more industry field classifications there are, the more industry field labels the data set to which the data to be determined belongs has, and the larger the index value of the industry field classification index.
In a specific implementation, the number of industry field labels corresponding to the data set to which the data to be determined belongs is counted, together with the total number of industry field labels corresponding to the multiple data sets included in the data category to which the data to be determined belongs. The ratio of the former to the latter is calculated and used as the index value of the industry field classification index of the data to be determined. Alternatively, the number of industry field labels of the data set to which the data to be determined belongs may itself be used as the index value of the industry field classification index; the present application does not limit this.
The data set to which the data to be determined belongs may be one of the data sets in the data category to which the data to be determined belongs, or it may be another data set; here, the case where it is one of the data sets in the data category is used for illustration. For example, the data set to which the data to be determined belongs is Chinese patent abstract data, whose industry field labels include government affairs, patent, intellectual property, abstract, and enterprise. The data category to which the data to be determined belongs is patent data, which includes multiple data sets, such as Chinese patent abstract data, Chinese patent legal status data (grant announcement), and Chinese patent legal status data (invention publication announcement). The industry field labels of Chinese patent legal status data (grant announcement) include government affairs, patent, intellectual property, law, and enterprise, and the industry field labels of Chinese patent legal status data (invention publication announcement) include government affairs, patent, intellectual property, law, and enterprise. The number of industry field labels of the data set to which the data to be determined belongs is therefore 5, the number of industry field labels of the data category to which the data to be determined belongs is 15, and the index value of the industry field classification index is 5/15, i.e., about 0.3.
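The patent-data example above can be reproduced with a few lines of code. As in the example, the category total is the sum of the per-data-set label counts (15), not the number of distinct labels. The function name and the dictionary layout are assumptions for illustration.

```python
def industry_field_index(dataset_name, category_labels):
    """Ratio of one data set's label count to the category's total label count.

    category_labels: mapping of data set name -> list of industry field labels
    for every data set in the data category.
    """
    total = sum(len(labels) for labels in category_labels.values())
    if total == 0:
        raise ValueError("the data category has no industry field labels")
    return len(category_labels[dataset_name]) / total


patent_data = {
    "Chinese patent abstract data": [
        "government affairs", "patent", "intellectual property",
        "abstract", "enterprise",
    ],
    "Chinese patent legal status data (grant announcement)": [
        "government affairs", "patent", "intellectual property",
        "law", "enterprise",
    ],
    "Chinese patent legal status data (invention publication announcement)": [
        "government affairs", "patent", "intellectual property",
        "law", "enterprise",
    ],
}
print(round(industry_field_index("Chinese patent abstract data", patent_data), 2))
```

With 5 labels out of a category total of 15, the ratio is one third.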
When the business value index is the application scenario index, the index value determining module 11 is specifically configured to determine the number of application scenarios corresponding to the data to be determined, and to determine the index value of the application scenario index of the data to be determined based on the number of application scenarios. The larger the number of application scenarios, the larger the index value of the application scenario index of the data to be determined.
In a specific implementation, an application scenario is a field to which the data to be determined is applicable. The more scenarios in which the data to be determined can be applied, the better its applicability and the higher its data value.
The number of application scenarios of the data to be determined is counted. For example, if the data to be determined can be applied in 5 application scenarios, the number of application scenarios of the data to be determined is 5. The counted number may be used as the index value of the application scenario index of the data to be determined; the index value may also be determined from a linear, non-linear, positive-correlation, or negative-correlation relationship between the number of application scenarios and the application scenario index. The present application does not limit this.
When the business value index is the supplier index, the index value determining module 11 is specifically configured to judge whether the data to be determined is native data of the data provider, and to determine the index value of the supplier index of the data to be determined based on the judgment result.
Here, native data may be data generated by the data provider itself. A higher supplier index indicates that the source of the data is more reliable and more authoritative, and that the value of the data is higher.
In a specific implementation, if the data to be determined is native data of the data provider, the index value of the supplier index of the data to be determined is a first preset value; if the data to be determined is not native data of the data provider (e.g., purchased data, or data crawled from other platforms), the index value of the supplier index of the data to be determined is a second preset value. Here, the first preset value is 1 and the second preset value is 0. It should be noted that the first preset value is greater than the second preset value; the first preset value and the second preset value may also take other values, determined according to the actual situation, and the present application does not limit this. Alternatively, the ratio of the quantity of native data of the data provider to the total amount of data in the data to be determined is used as the index value of the supplier index of the data to be determined.
For example, the data to be determined includes native data and secondary data provided by the supplier, where the secondary data generally comes from other platforms or websites (data related to the supplier's business). If the supplier is enterprise A, the native data is business data provided directly by enterprise A, and the secondary data may be data related to the business of enterprise A that is crawled or purchased from platforms such as the NetEase platform or the statistics bureau. If all the data in the data to be determined is native data of enterprise A, the index value of the supplier index of the data to be determined is 1; if 50% of the data to be determined is native data of enterprise A and 50% is secondary data, the index value of the supplier index of the data to be determined is 0.5; if all the data in the data to be determined is secondary data, the index value of the supplier index of the data to be determined is 0.
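The ratio-based variant of the supplier index can be sketched as below. It assumes each data entry carries a flag saying whether it is native data of the provider; how that flag is obtained is outside the sketch, and the function name is hypothetical.

```python
def supplier_index(native_flags):
    """Share of native data in the data to be determined.

    native_flags: one boolean per data entry, True when the entry is
    native data of the data provider and False when it is secondary data.
    """
    flags = list(native_flags)
    if not flags:
        raise ValueError("no data entries to evaluate")
    return sum(1 for is_native in flags if is_native) / len(flags)


print(supplier_index([True, True, True, True]))      # all native
print(supplier_index([True, False, True, False]))    # half native
print(supplier_index([False, False]))                # all secondary
```

The all-native and all-secondary cases coincide with the first and second preset values (1 and 0), so the ratio form generalizes the binary rule.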
The data value determination system provided by the embodiments of the present application determines, from the data to be determined, the index values of preset indices that characterize data quality and data business application value; determines the market value of the data to be determined from market information; and determines the data value of the data to be determined according to a preset operation relation between the index values and the market value. By quantifying the index values of diverse preset indices and the market value of the data to be determined, the accuracy of the determined data value is increased; and because the present application considers a more comprehensive set of factors, the reliability of the finally determined data value is increased.
The embodiments of the present application provide a data value determination method. As shown in Fig. 2, the method includes the following specific steps:
S201: determine the index values of preset indices for the data to be determined, where the preset indices include a data quality index characterizing data quality and a business value index characterizing the business application value of the data;
S202: determine a market value for the data to be determined according to market information;
S203: determine the data value of the data to be determined based on a preset operation relation between the determined index values and the market value.
Optionally, the data quality indexes include a data consistency index;
determining the index values of the preset indexes for the data to be determined includes:
determining the degree of consistency between the data content contained in the data to be determined and the description information corresponding to the data to be determined; and determining the index value of the data consistency index of the data to be determined based on the degree of consistency, wherein the higher the degree of consistency, the higher the index value of the data consistency index of the data to be determined.
Optionally, determining the degree of consistency between the data content contained in the data to be determined and the description information corresponding to the data to be determined includes:
determining the degree of consistency between one or more of the following items of data content and the corresponding description information, wherein the higher the degree of consistency between any item of data content and its corresponding description information, the higher the index value of the data consistency index of the data to be determined:
the data volume contained in the data to be determined versus the data volume stated in the description information of the data to be determined;
the size of the data to be determined versus the size stated in the description information of the data to be determined;
the data format of the data to be determined versus the data format stated in the description information of the data to be determined.
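The application does not state how the degree of consistency is turned into a number. One possible reading, sketched here purely as an assumption, is to compare each described item (data volume, size, format) with the actual value and take the share of items that match. The key names `record_count`, `size_bytes`, and `format` are hypothetical.

```python
# Hypothetical item names; the application only lists volume, size, and format.
CHECKED_ITEMS = ("record_count", "size_bytes", "format")


def consistency_index(actual, described):
    """Share of described items whose actual value matches exactly.

    Exact matching and equal weighting are assumptions, not the
    application's stated method.
    """
    items = [key for key in CHECKED_ITEMS if key in described]
    if not items:
        raise ValueError("description contains no checkable item")
    matches = sum(1 for key in items if actual.get(key) == described[key])
    return matches / len(items)


described = {"record_count": 1000, "size_bytes": 52000, "format": "csv"}
actual = {"record_count": 1000, "size_bytes": 48000, "format": "csv"}
score = consistency_index(actual, described)  # two of three items match
```

A graded comparison (for example, a relative error on the volume and size) would also satisfy the "higher consistency, higher index value" requirement; exact matching is only the simplest choice.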
Optionally, the data quality indexes include a data integrity index;
determining the index values of the preset indexes for the data to be determined includes:
determining the proportion of null values in the data entries contained in the data to be determined; and determining the index value of the data integrity index of the data to be determined based on the proportion of null values, wherein the lower the proportion of null values, the higher the data integrity of the data to be determined.
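As a hedged illustration of this step, the sketch below counts null fields across all entries and uses one minus that proportion as the integrity index. Treating `None` and the empty string as null, and the `1 - ratio` mapping, are assumptions; the application only requires that a lower null proportion give higher integrity.

```python
def null_ratio(rows):
    """Proportion of fields that are null across all entries.

    Each row is a dict of field name to value; None and "" count as null
    (an assumption made for this sketch).
    """
    total = sum(len(row) for row in rows)
    if total == 0:
        raise ValueError("no fields to inspect")
    nulls = sum(1 for row in rows for value in row.values()
                if value is None or value == "")
    return nulls / total


def integrity_index(rows):
    """Assumed mapping: fewer nulls gives a higher index value."""
    return 1.0 - null_ratio(rows)


rows = [{"a": 1, "b": None}, {"a": 2, "b": 3}]
ratio = null_ratio(rows)       # one null out of four fields
index = integrity_index(rows)
```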
Optionally, the data quality indexes include a data redundancy index;
determining the index values of the preset indexes for the data to be determined includes:
determining the proportion of duplicate entries among the data entries contained in the data to be determined; and determining the index value of the data redundancy index of the data to be determined based on the proportion of duplicate entries, wherein the lower the proportion of duplicate entries, the lower the data redundancy of the data to be determined.
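The proportion of duplicate entries can be sketched as follows. Counting every occurrence after the first as a duplicate is an assumption, and the function name is illustrative; how this proportion is mapped onto the final index value is not specified in the application.

```python
def duplicate_ratio(rows):
    """Proportion of entries that repeat an earlier entry.

    Rows must be hashable (for example, tuples). Each occurrence after
    the first counts as a duplicate, which is an assumption here.
    """
    if not rows:
        raise ValueError("at least one entry is required")
    return (len(rows) - len(set(rows))) / len(rows)


entries = [(1, "a"), (1, "a"), (2, "b"), (3, "c")]
redundancy = duplicate_ratio(entries)  # one repeated entry out of four
```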
Optionally, the data quality indexes include a data timeliness index;
determining the index values of the preset indexes for the data to be determined includes:
determining the time interval spanned by the generation time of the data to be determined, and the time difference between the generation time of the data to be determined and the time at which the data to be determined is provided; and determining the index value of the data timeliness index of the data to be determined based on the time interval and the time difference,
wherein the larger the span of the time interval and the smaller the time difference, the higher the index value of the data timeliness index of the data to be determined.
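The application gives only the direction of the relationship, not a formula. The sketch below uses `span / (span + lag)` as one assumed function that increases with the span and decreases with the lag; measuring the lag from the end of the generation interval is likewise an assumption.

```python
from datetime import date


def timeliness_index(generated_from, generated_to, provided_on):
    """Assumed formula: span / (span + lag), with both measured in days.

    span: length of the interval over which the data was generated.
    lag:  days between the end of generation and the provision date.
    """
    span_days = (generated_to - generated_from).days
    lag_days = (provided_on - generated_to).days
    if span_days < 0 or lag_days < 0:
        raise ValueError("dates must be in chronological order")
    if span_days + lag_days == 0:
        return 0.0  # no span and no lag: nothing to score
    return span_days / (span_days + lag_days)


# 30 days of data provided 10 days after the last record was generated.
score = timeliness_index(date(2018, 1, 1), date(2018, 1, 31), date(2018, 2, 10))
```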
Optionally, the method further includes: crawling multiple data sets from a preset platform;
parsing the data to be determined and the multiple data sets respectively to determine the lexical features of the data to be determined and of each data set; performing text similarity matching between the lexical features of the data to be determined and the lexical features of each data set; and determining each data set whose text similarity reaches a preset similarity threshold as similar data of the data to be determined.
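The application does not name a specific lexical feature or similarity measure. As a stand-in, the sketch below uses lowercase word sets as lexical features and Jaccard similarity for matching; the function names and the threshold of 0.5 are assumptions for illustration only.

```python
import re


def lexical_features(text):
    """Lowercase word set used as a stand-in for lexical features."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(target_text, candidates, threshold=0.5):
    """Names of candidate data sets whose similarity reaches the threshold."""
    target = lexical_features(target_text)
    return [name for name, text in candidates.items()
            if jaccard(target, lexical_features(text)) >= threshold]


candidates = {
    "a": "retail sales beijing 2017 monthly",
    "b": "weather records shanghai",
}
similar = find_similar("beijing retail sales 2017", candidates)
```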
Optionally, the data quality indexes include a data scarcity index;
determining the index values of the preset indexes for the data to be determined includes:
determining the number of occurrences, on a preset platform, of the data to be determined and of similar data of the data to be determined; and determining the index value of the data scarcity index of the data to be determined based on the number of occurrences, wherein the fewer the occurrences, the higher the scarcity of the data to be determined.
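Any decreasing function of the occurrence count satisfies the stated relationship. The sketch below uses `1 / (1 + n)` as one assumed choice; neither the formula nor the function name comes from the application.

```python
def scarcity_index(occurrences):
    """Assumed mapping 1 / (1 + n): fewer occurrences, higher scarcity."""
    if occurrences < 0:
        raise ValueError("occurrence count cannot be negative")
    return 1.0 / (1 + occurrences)


unique = scarcity_index(0)   # never seen on the platform
common = scarcity_index(3)   # seen three times
```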
Optionally, the data quality indexes include a data volume index;
determining the index values of the preset indexes for the data to be determined includes:
determining the data volume contained in the data to be determined; and determining the index value of the data volume index of the data to be determined based on the data volume, wherein the larger the data volume, the higher the index value of the data volume index of the data to be determined.
Optionally, the business value indexes include an industry field classification index;
determining the index values of the preset indexes for the data to be determined includes:
determining the ratio of the number of industry field labels corresponding to the data set to which the data to be determined belongs to the number of industry field labels corresponding to the data category to which the data to be determined belongs; and determining the index value of the industry field classification index of the data to be determined based on the ratio, wherein the larger the ratio, the larger the index value of the industry field classification index of the data to be determined.
Optionally, the business value indexes include an application scenario index;
determining the index values of the preset indexes for the data to be determined includes:
determining the number of application scenarios corresponding to the data to be determined; and determining the index value of the application scenario index of the data to be determined based on the number of application scenarios, wherein the more application scenarios there are, the larger the index value of the application scenario index of the data to be determined.
Optionally, the business value indexes include a supplier index;
determining the index values of the preset indexes for the data to be determined includes:
judging whether the data to be determined is native data of the data supplier; and determining the index value of the data supplier index of the data to be determined based on the judgment result.
Optionally, determining the market value for the data to be determined according to market information includes:
determining a weighted sum of the values, on a preset platform, of the data to be determined and of similar data of the data to be determined.
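A minimal sketch of this weighted sum follows. The prices and weights are made-up illustrative numbers, and the application does not say how the weights are chosen.

```python
def market_value(prices, weights):
    """Weighted sum of platform prices for the data and its similar data.

    prices[i] and weights[i] refer to the same item; the weights are
    assumed to be supplied by the caller.
    """
    if not prices or len(prices) != len(weights):
        raise ValueError("prices and weights must be non-empty and aligned")
    return sum(price * weight for price, weight in zip(prices, weights))


# The data to be determined (weight 0.5) and two similar data sets.
value = market_value([100, 80, 120], [0.5, 0.25, 0.25])
```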
Optionally, determining the data value of the data to be determined based on the preset operation relationship between the determined index values and the market value includes:
determining a weighted sum of the determined index values as a value correction coefficient; and determining the value obtained by correcting the market value with the value correction coefficient as the data value of the data to be determined.
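The final step can be sketched as below. The application says the market value is "corrected" with the coefficient without fixing the operation; multiplication is assumed here, and the index names, weights, and values are illustrative only.

```python
def correction_coefficient(index_values, weights):
    """Weighted sum of the index values (weights keyed by index name)."""
    return sum(weights[name] * value for name, value in index_values.items())


def data_value(index_values, weights, market_value):
    """Assumed correction: multiply the market value by the coefficient."""
    return correction_coefficient(index_values, weights) * market_value


index_values = {"integrity": 0.75, "supplier": 0.5}
weights = {"integrity": 0.5, "supplier": 0.5}
coefficient = correction_coefficient(index_values, weights)
value = data_value(index_values, weights, 100.0)
```

With weights that sum to 1 and index values in [0, 1], the coefficient also stays in [0, 1], so under this assumption the corrected value never exceeds the market value.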
Corresponding to the data value determination method in Fig. 1, an embodiment of the present application further provides a computer device 300. As shown in Fig. 3, the device includes a memory 301, a processor 302, and a computer program stored in the memory 301 and executable on the processor 302, wherein the processor 302 implements the above data value determination method when executing the computer program.
Specifically, the memory 301 and the processor 302 may be a general-purpose memory and a general-purpose processor, and are not specifically limited here. When the processor 302 runs the computer program stored in the memory 301, it can execute the above data value determination method, thereby addressing the low accuracy of data value calculation in the prior art. The present application determines, from the data to be determined, the index values of preset indexes that characterize data quality and the business application value of the data; determines the market value of the data to be determined from market information; and determines the data value of the data to be determined according to a preset operation relationship between the index values and the market value. By quantifying a diverse set of preset indexes together with the market value of the data to be determined, the accuracy of the determined data value is improved; and because more factors are taken into account, the reliability of the finally determined data value is increased.
Corresponding to the data value determination method in Fig. 1, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, wherein the steps of the above data value determination method are executed when the computer program is run by a processor.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, it can execute the above data value determination method, thereby addressing the low accuracy of data value calculation in the prior art. The present application determines, from the data to be determined, the index values of preset indexes that characterize data quality and the business application value of the data; determines the market value of the data to be determined from market information; and determines the data value of the data to be determined according to a preset operation relationship between the index values and the market value. By quantifying a diverse set of preset indexes together with the market value of the data to be determined, the accuracy of the determined data value is improved; and because more factors are taken into account, the reliability of the finally determined data value is increased.
In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, systems, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments provided by the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In addition, the terms "first", "second", "third", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific implementations of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present application, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data value determination system, characterized in that the system includes:
an index value determining module, configured to determine index values of preset indexes for data to be determined;
wherein the preset indexes include data quality indexes that characterize data quality and business value indexes that characterize the business application value of the data;
a market value determining module, configured to determine a market value for the data to be determined according to market information;
a data value determining module, configured to determine the data value of the data to be determined based on a preset operation relationship between the determined index values and the market value.
2. The system according to claim 1, characterized in that the data quality indexes include a data consistency index;
the index value determining module is specifically configured to determine the degree of consistency between the data content contained in the data to be determined and the description information corresponding to the data to be determined; and to determine the index value of the data consistency index of the data to be determined based on the degree of consistency, wherein the higher the degree of consistency, the higher the index value of the data consistency index of the data to be determined.
3. The system according to claim 2, characterized in that the index value determining module is specifically configured to determine the degree of consistency between one or more of the following items of data content and the corresponding description information, wherein the higher the degree of consistency between any item of data content and its corresponding description information, the higher the index value of the data consistency index of the data to be determined:
the data volume contained in the data to be determined versus the data volume stated in the description information of the data to be determined;
the size of the data to be determined versus the size stated in the description information of the data to be determined;
the data format of the data to be determined versus the data format stated in the description information of the data to be determined.
4. The system according to claim 1, characterized in that the data quality indexes include one or more of the following indexes: a data integrity index, a data redundancy index, a data timeliness index, and a data volume index;
where the data integrity index is included, the index value determining module is specifically configured to determine the proportion of null values in the data entries contained in the data to be determined; and to determine the index value of the data integrity index of the data to be determined based on the proportion of null values, wherein the lower the proportion of null values, the higher the data integrity of the data to be determined;
where the data redundancy index is included, the index value determining module is specifically configured to determine the proportion of duplicate entries among the data entries contained in the data to be determined; and to determine the index value of the data redundancy index of the data to be determined based on the proportion of duplicate entries, wherein the lower the proportion of duplicate entries, the lower the data redundancy of the data to be determined;
where the data timeliness index is included, the index value determining module is specifically configured to determine the time interval spanned by the generation time of the data to be determined, and the time difference between the generation time of the data to be determined and the time at which the data to be determined is provided; and to determine the index value of the data timeliness index of the data to be determined based on the time interval and the time difference, wherein the larger the span of the time interval, the higher the index value of the data timeliness index of the data to be determined, and the smaller the time difference, the higher the index value of the data timeliness index of the data to be determined;
where the data volume index is included, the index value determining module is specifically configured to determine the data volume contained in the data to be determined; and to determine the index value of the data volume index of the data to be determined based on the data volume, wherein the larger the data volume, the higher the index value of the data volume index of the data to be determined.
5. The system according to claim 1, characterized in that the system further includes a data crawling module and a similar data determining module;
the data crawling module is configured to crawl multiple data sets from a preset platform;
the similar data determining module is configured to parse the data to be determined and the multiple data sets respectively to determine the lexical features of the data to be determined and of each data set; to perform text similarity matching between the lexical features of the data to be determined and the lexical features of each data set; and to determine each data set whose text similarity reaches a preset similarity threshold as similar data of the data to be determined.
6. The system according to claim 5, characterized in that the data quality indexes include a data scarcity index;
the index value determining module is specifically configured to determine the number of occurrences, on the preset platform, of the data to be determined and of similar data of the data to be determined; and to determine the index value of the data scarcity index of the data to be determined based on the number of occurrences, wherein the fewer the occurrences, the higher the scarcity of the data to be determined.
7. The system according to claim 1, characterized in that the business value indexes include one or more of the following indexes: an industry field classification index, an application scenario index, and a supplier index;
where the industry field classification index is included, the index value determining module is specifically configured to determine the ratio of the number of industry field labels corresponding to the data set to which the data to be determined belongs to the number of industry field labels corresponding to the data category to which the data to be determined belongs; and to determine the index value of the industry field classification index of the data to be determined based on the ratio, wherein the larger the ratio, the larger the index value of the industry field classification index of the data to be determined;
where the application scenario index is included, the index value determining module is specifically configured to determine the number of application scenarios corresponding to the data to be determined; and to determine the index value of the application scenario index of the data to be determined based on the number of application scenarios, wherein the more application scenarios there are, the larger the index value of the application scenario index of the data to be determined;
where the supplier index is included, the index value determining module is specifically configured to judge whether the data to be determined is native data of the data supplier; and to determine the index value of the data supplier index of the data to be determined based on the judgment result.
8. The system according to claim 5, characterized in that the market value determining module is specifically configured to determine a weighted sum of the values, on the preset platform, of the data to be determined and of similar data of the data to be determined.
9. The system according to claim 1, characterized in that
the data value determining module is specifically configured to determine a weighted sum of the determined index values as a value correction coefficient; and to determine the value obtained by correcting the market value with the value correction coefficient as the data value of the data to be determined.
10. A data value determination method, characterized in that the method includes:
determining index values of preset indexes for data to be determined, wherein the preset indexes include data quality indexes that characterize data quality and business value indexes that characterize the business application value of the data;
determining a market value for the data to be determined according to market information;
determining the data value of the data to be determined based on a preset operation relationship between the determined index values and the market value.
CN201810511472.1A 2018-05-24 2018-05-24 A kind of data value determines system and method Pending CN108764995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810511472.1A CN108764995A (en) 2018-05-24 2018-05-24 A kind of data value determines system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810511472.1A CN108764995A (en) 2018-05-24 2018-05-24 A kind of data value determines system and method

Publications (1)

Publication Number Publication Date
CN108764995A true CN108764995A (en) 2018-11-06

Family

ID=64006285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810511472.1A Pending CN108764995A (en) 2018-05-24 2018-05-24 A kind of data value determines system and method

Country Status (1)

Country Link
CN (1) CN108764995A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013506180A (en) * 2009-09-25 2013-02-21 ファキーフ,アドナン Database and data evaluation method from database
CN106355447A (en) * 2016-08-31 2017-01-25 国信优易数据有限公司 Price evaluation method and system for data commodities
CN106469195A (en) * 2016-08-31 2017-03-01 国信优易数据有限公司 Based on conforming data file Valuation Method and system
CN106469395A (en) * 2016-08-31 2017-03-01 国信优易数据有限公司 A kind of data commodity dynamic comprehensive appraisal procedure and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019174A (en) * 2018-12-13 2019-07-16 阿里巴巴集团控股有限公司 The quality of data determines method, apparatus, electronic equipment and storage medium
CN112732726A (en) * 2021-04-02 2021-04-30 云和恩墨(北京)信息技术有限公司 Data processing method and device, processor and computer storage medium
CN112732726B (en) * 2021-04-02 2022-04-29 云和恩墨(北京)信息技术有限公司 Data processing method and device, processor and computer storage medium
CN113704811A (en) * 2021-07-16 2021-11-26 杭州医康慧联科技股份有限公司 Data value management method

Similar Documents

Publication Publication Date Title
CN108734405A (en) A kind of data value Evaluation Platform and method
CN108764705A (en) A kind of data quality accessment platform and method
CN108763277B (en) Data analysis method, computer readable storage medium and terminal device
CN109558541B (en) Information processing method and device and computer storage medium
CN108764707A (en) A kind of data assessment system and method
CN106355447A (en) Price evaluation method and system for data commodities
CN104834731A (en) Recommendation method and device for self-media information
CN103927615B (en) Entity is associated with classification
CN109615454A (en) Determine the method and device of user's finance default risk
CN110544155A (en) User credit score acquisition method, acquisition device, server and storage medium
CN108764995A (en) A kind of data value determines system and method
CN110019163A (en) Method, system, equipment and the storage medium of prediction, the recommendation of characteristics of objects
CN110766428A (en) Data value evaluation system and method
CN112560491A (en) Information extraction method and device based on AI technology and storage medium
CN111260189B (en) Risk control method, risk control device, computer system and readable storage medium
KR101441164B1 (en) Object customization and management system
CN110659926A (en) Data value evaluation system and method
CN115409419A (en) Value evaluation method and device of business data, electronic equipment and storage medium
CN111209480A (en) Method and device for determining pushed text, computer equipment and medium
CN114723492A (en) Enterprise portrait generation method and equipment
CN108399545B (en) Method and device for detecting quality of electronic commerce platform
CN108829750A (en) A kind of quality of data determines system and method
CN109450963B (en) Message pushing method and terminal equipment
CN107093103B (en) Brand value evaluation method and system based on big data statistical analysis
CN114782224A (en) Webpage evaluation cheating monitoring method and device based on user characteristics and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100070, No. 101-8, building 1, 31, zone 188, South Fourth Ring Road, Beijing, Fengtai District

Applicant after: Guoxin Youyi Data Co., Ltd

Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing

Applicant before: SIC YOUE DATA Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181106
