CN106469195A - Consistency-based data file value assessment method and system - Google Patents
Consistency-based data file value assessment method and system
- Publication number
- CN106469195A (application CN201610791831.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- data file
- conforming
- file
- estimating system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
Abstract
The present invention provides a consistency-based data file value assessment method and system. The method includes: collecting data files to be assessed; classifying the collected data files by type and determining the proportion of the whole data file that each type of data file accounts for; and processing the consistency of the data files using a preset processing method. By processing the consistency of the data files, the invention provides a basis for data pricing and data trading from the perspective of data structure consistency.
Description
Technical field
The present invention relates to the field of big data, and in particular to a consistency-based data file value assessment method and system.
Background art
Data trading is currently at an early stage as an industry. It is developing rapidly but lacks mature theoretical guidance. Quantifying the value of data is extremely difficult, owing to the essential characteristics of data and to the current business environment. The work is also hindered by numerous objective factors, such as the accurate assessment of data compilation costs, the depreciation of data and changes over its life cycle, and the added value of data.
There is therefore an urgent need for a way to quantify data value and to value data assets, so as to better serve data market activity and help data trading and data projects get off the ground.
Content of the invention
For above-mentioned technical problem, the present invention provides one kind from data structure conforming angle, data value to be commented
Estimate, appraisal procedure and the system of certain reference frame is provided for data price data transaction.
Concordance is one of internationally recognized spatial data quality index, can divide into Space Consistency, attribute consistent
Property, the type such as topological coherence, semantic consistency.Mainly in detection source data and Backup Data whether existing agreement
Unanimously to guarantee the technology of Backup Data high availability.The present invention is not to weigh source data and Backup Data whether crash consistency
Problem, but be directed to a data folder, the inside comprise various data types file, such as JSON, picture, video, sound
Frequency etc. file, how to weigh this document consistency problem be present invention mainly solves problem.
For this reason, one embodiment of the invention provides one kind to be based on conforming data file Valuation Method, including:Adopt
Collect data file to be assessed;The type of the data file of collection is divided, and calculates the data file of each type and exist
Shared ratio in whole data file;Using default processing method, the concordance of described data file is processed.
Another embodiment of the present invention provides one kind to be based on conforming data file valve estimating system, and its feature exists
In, including:Data acquisition module, gathers data file to be assessed;Type division module, the type to the data file of collection
Divided, and calculated the shared ratio in whole data file of the data file of each type;Consistency treatment module, profit
With default processing method, the concordance of described data file is processed.
The present invention provide based on conforming data file Valuation Method and system, the method is passed through data literary composition
In part set, different files carry out stylistic division, sort out unstructured data, semi-structured data and structural data
Ratio, then by the ratio of different shape file, calculate the form concordance of this set of data files, solve data valency
Value assessment and a link of price evaluation, set of data files is worth from the concordance angle of data form
Assessment, provides certain foundation for data value price data transaction.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data value assessment process provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the data value assessment system provided by an embodiment of the present invention.
Detailed description
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
【Technical idea of the present invention】
Based on the principle of consistency, the present invention uses a consistency scoring formula to evaluate the consistency of a data folder containing files of various data types, such as JSON, picture, video and audio files, and thereby assesses the value of the data folder and quantifies data value.
Fig. 1 is a schematic diagram of the data value assessment process provided by an embodiment of the present invention. Fig. 2 is a schematic structural diagram of the data value assessment system provided by an embodiment of the present invention. The data assessment method and system of the present invention are introduced below with reference to the drawings.
【Data file assessment method】
As shown in Fig. 1, the data file assessment method of the present invention comprises the following steps:
S101: collect the data files to be assessed;
S102: classify the data files by type and determine the proportion of each type;
S103: process the consistency of the data files.
In step S101, data can be collected with an existing data collection tool; for example, a web crawler can collect data files from the network. The data file in the present invention may be a data package containing multiple sets of data files or a single file. The collected data package may contain JSON, picture, video and audio files, among others, but is not limited to these.
In step S102, the collected data files are divided by data type into unstructured data, semi-structured data and structured data, and the proportion of the total file size accounted for by each type is calculated. In practice, the data types can be assigned manually, and the proportion of each type can be calculated with the help of the R language together with manual operation.
In an embodiment of the present invention, unstructured, semi-structured and structured data are defined as follows:
Unstructured data: data without a fixed structure, for example office documents in all formats, text, pictures, reports of all kinds, images, and audio and video information.
Semi-structured data: data that has an implicit structure but does not take the form of a two-dimensional table. It is a kind of knowledge source that lies between structured and unstructured sources. Examples include stored employee resumes and files such as XML, HTML and JSON.
Structured data: data in the traditional relational model, that is, row data stored in a database that can be represented as a two-dimensional table, for example data stored in CSV files, Excel files or two-dimensional tables.
The proportions of the whole data file accounted for by unstructured, semi-structured and structured data are denoted q, p and h respectively.
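The text says the type division is done manually with help from the R language and gives no code. As a rough illustration only, the following Python sketch assigns files to the three categories by file extension and computes q, p and h by size. The extension table, the function names and the rule that unknown extensions count as unstructured are all assumptions made for this example, not part of the patent.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical extension-to-category table, loosely following the
# definitions in the text; a real deployment would need its own rules.
CATEGORY_BY_EXTENSION = {
    ".csv": "structured",
    ".xls": "structured",
    ".xlsx": "structured",
    ".json": "semi_structured",
    ".xml": "semi_structured",
    ".html": "semi_structured",
}


def classify(filename: str) -> str:
    """Return the category for a file name.

    Assumption: any extension not listed above is treated as unstructured.
    """
    ext = PurePath(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(ext, "unstructured")


def size_proportions(files):
    """Given (filename, size_in_bytes) pairs, return the tuple (q, p, h)."""
    totals = defaultdict(int)
    for name, size in files:
        totals[classify(name)] += size
    total = sum(totals.values())
    if total == 0:
        raise ValueError("no data to assess")
    return (
        totals["unstructured"] / total,     # q
        totals["semi_structured"] / total,  # p
        totals["structured"] / total,       # h
    )


files = [("report.docx", 300), ("photo.jpg", 200),
         ("records.json", 100), ("table.csv", 400)]
q, p, h = size_proportions(files)
print(q, p, h)
```

Measuring proportions by size rather than by file count follows the statement that each type's share of the total file size is calculated.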
In step S103, the consistency of the data files can be evaluated using the following consistency scoring formula:
where f is the consistency score, with range [0, 1]; the larger the value of f, the higher the consistency of the data files; and q, p and h are the proportions of unstructured, semi-structured and structured data respectively, with q + p + h = 1.
The value of the data files can be assessed from the consistency score f calculated in step S103. The score f is directly proportional to the value of the data files: the larger f is, that is, the closer it is to 1, the higher the corresponding valuation. The resulting consistency score is stored.
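The scoring formula itself does not appear in this text; it was presumably an image in the original publication. Purely as an assumption, one expression that satisfies the stated range [0, 1] and the constraint q + p + h = 1, and that reproduces the score of 0.269 reported in the embodiment for q = 0.229, p = 0.1 and h = 0.671, is:

```latex
f \;=\; \frac{3}{2}\left[\left(q-\tfrac{1}{3}\right)^{2}
      +\left(p-\tfrac{1}{3}\right)^{2}
      +\left(h-\tfrac{1}{3}\right)^{2}\right]
  \;=\; \frac{3\left(q^{2}+p^{2}+h^{2}\right)-1}{2},
\qquad q+p+h=1 .
```

Under this reading, f = 0 when the three types are equally represented and f = 1 when the data consists of a single type. The patent's actual formula may differ.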
<Embodiment>
The collected data file to be assessed is a patent data package of JSON type, 1 GB in size. The assessment method provided by the present invention is applied as follows:
(1) Calculate the proportion of each type of data in the patent data package
Classifying the data file by type and calculating the proportions confirms that the package contains structured, semi-structured and unstructured data: 234.5 MB of unstructured data, 103.36 MB of semi-structured data and 686.14 MB of structured data. The proportions are therefore:
Unstructured data proportion: q = 234.5/1024 = 0.229
Semi-structured data proportion: p = 103.36/1024 = 0.1
Structured data proportion: h = 686.14/1024 = 0.671
(2) Evaluate the consistency of the patent data package
Applying the assessment formula to the patent data package gives a consistency score of f = 0.269.
Because f is only 0.269, the consistency of the contents of this patent data package is not high, so its price valuation will not be very high either when the value of the package is assessed.
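The arithmetic of this embodiment can be checked with a short Python sketch. Because the scoring formula is not reproduced in this text, the function below uses a candidate expression, f = (3(q² + p² + h²) − 1)/2, chosen only because it has range [0, 1] and returns 0.269 for the rounded proportions printed above. Both the formula and the name `consistency_score` are assumptions for illustration.

```python
def consistency_score(q: float, p: float, h: float) -> float:
    """Candidate consistency score (an assumption, not the patent's
    confirmed formula): 0 for an even three-way split, 1 for a single type."""
    if abs(q + p + h - 1.0) > 1e-6:
        raise ValueError("proportions must sum to 1")
    return (3.0 * (q * q + p * p + h * h) - 1.0) / 2.0


# Rounded proportions as printed in the embodiment.
q, p, h = 0.229, 0.1, 0.671
f = consistency_score(q, p, h)
print(round(f, 3))  # 0.269
```

Note that the result depends on the rounded proportions given in the text; using unrounded ratios of the file sizes shifts the third decimal place.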
【Data file assessment system】
Another embodiment of the present invention provides an assessment system that includes: a data acquisition module 1, which collects data files to be assessed; a type division module 2, which classifies the collected data files by type and calculates the proportion of the whole data file that each type of data file accounts for; and a consistency processing module 3, which determines the consistency score of the data files using a predetermined formula.
Specifically, the data acquisition module can collect data with an existing data collection tool; for example, a web crawler can collect data files from the network. The data file in the present invention may be a data package containing multiple sets of data files or a single file. The collected data package may contain JSON, picture, video and audio files, among others, but is not limited to these.
When classifying the data files, the type division module divides the collected data files by data type into unstructured data, semi-structured data and structured data, and calculates the proportion of the total file size accounted for by each type. In practice, the data types can be assigned manually, and the proportion of each type can be calculated with the help of the R language together with manual operation.
In an embodiment of the present invention, unstructured, semi-structured and structured data are defined as follows. Unstructured data: data without a fixed structure, for example office documents in all formats, text, pictures, reports of all kinds, images, and audio and video information. Semi-structured data: data that has an implicit structure but does not take the form of a two-dimensional table; it is a kind of knowledge source that lies between structured and unstructured sources, for example stored employee resumes and files such as XML, HTML and JSON. Structured data: data in the traditional relational model, that is, row data stored in a database that can be represented as a two-dimensional table, for example data stored in CSV files, Excel files or two-dimensional tables. The proportions of the whole data file accounted for by unstructured, semi-structured and structured data are denoted q, p and h respectively.
When processing the consistency of the data files, the consistency processing module can evaluate that consistency using the following consistency scoring formula:
where f is the consistency score, with range [0, 1]; the larger the value of f, the higher the consistency of the data files; and q, p and h are the proportions of unstructured, semi-structured and structured data respectively, with q + p + h = 1. The value of the data files can be assessed from the consistency score f produced by the consistency processing module. The score f is directly proportional to the value of the data files: the larger f is, that is, the closer it is to 1, the higher the corresponding valuation.
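The division into three modules can be sketched in Python as follows. This is an illustrative arrangement only: the class and method names are invented for the example, the acquisition module walks a local directory instead of crawling the network, and the scoring formula is passed in by the caller because it is not reproduced in this text.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DataAcquisitionModule:
    """Module 1: gathers (path, size) pairs.

    Assumption: files are read from a local directory; the text names a
    web crawler as one possible source instead.
    """
    root: str

    def collect(self) -> List[Tuple[str, int]]:
        found = []
        for dirpath, _dirs, names in os.walk(self.root):
            for name in names:
                path = os.path.join(dirpath, name)
                found.append((path, os.path.getsize(path)))
        return found


@dataclass
class TypeDivisionModule:
    """Module 2: assigns each file a category and computes size proportions.

    `classify` must return "unstructured", "semi_structured" or "structured".
    """
    classify: Callable[[str], str]

    def proportions(self, files: List[Tuple[str, int]]) -> Dict[str, float]:
        totals = {"unstructured": 0, "semi_structured": 0, "structured": 0}
        for path, size in files:
            totals[self.classify(path)] += size
        total = sum(totals.values())
        if total == 0:
            raise ValueError("no data to assess")
        return {kind: size / total for kind, size in totals.items()}


@dataclass
class ConsistencyProcessingModule:
    """Module 3: applies a caller-supplied scoring function to (q, p, h)."""
    score: Callable[[float, float, float], float]

    def evaluate(self, props: Dict[str, float]) -> float:
        return self.score(props["unstructured"],
                          props["semi_structured"],
                          props["structured"])


def assess(root: str,
           classify: Callable[[str], str],
           score: Callable[[float, float, float], float]) -> float:
    """Run the three modules in sequence and return the consistency score."""
    files = DataAcquisitionModule(root).collect()
    props = TypeDivisionModule(classify).proportions(files)
    return ConsistencyProcessingModule(score).evaluate(props)
```

Keeping the classifier and the scoring function as parameters lets the manual classification step and the predetermined formula be swapped in without changing the module structure.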
In the present invention, the consistency score calculated by the consistency processing module is saved in the storage system of the assessment system and can be passed to a data pricing system, where it serves as a reference for assessing the value of the data files. The estimated value of the data files can be shown on a data trading display terminal or display platform. For a patent data package, for example, it can be shown on a platform that presents the package, such as a patent consulting website, for reference by the relevant personnel.
It should be noted that data files can be assessed from many angles, and a final valuation can only be reached by considering all of them. The present invention provides just one aspect of estimating the value of data files, as a reference basis for their valuation.
Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) that contain computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the application have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the application.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the application without departing from their spirit and scope. If these modifications and variations fall within the scope of the claims of the application and their technical equivalents, the application is intended to cover them as well.
Claims (24)
1. A consistency-based data file value assessment method, characterized by comprising:
collecting data files to be assessed;
classifying the collected data files by type, and determining the proportion of the whole data file that each type of data file accounts for;
processing the consistency of the data files using a preset processing method.
2. The consistency-based data file value assessment method according to claim 1, characterized in that the data files are divided by type into unstructured data, semi-structured data and structured data.
3. The consistency-based data file value assessment method according to claim 2, characterized in that the preset processing method processes the consistency of the data files using the following formula:
where f is the consistency score of the data files, with range [0, 1]; and q, p and h are the proportions of unstructured data, semi-structured data and structured data in the whole data file respectively, where q + p + h = 1.
4. The consistency-based data file value assessment method according to claim 3, characterized in that the value of f is directly proportional to the estimated value of the data files.
5. The consistency-based data file value assessment method according to claim 1, characterized in that the data file is a data package containing multiple sets of data files or is a single file.
6. The consistency-based data file value assessment method according to claim 2, characterized in that the unstructured data refers to data without a fixed structure, including office documents, text, pictures, reports of all kinds, images, and audio and video information.
7. The consistency-based data file value assessment method according to claim 2, characterized in that the semi-structured data refers to data that has an implicit structure but does not take the form of a two-dimensional table.
8. The consistency-based data file value assessment method according to claim 7, characterized in that the semi-structured data includes stored employee resumes and files such as XML, HTML and JSON.
9. The consistency-based data file value assessment method according to claim 2, characterized in that the structured data refers to data in the traditional relational model, that is, row data stored in a database that can be represented as a two-dimensional table.
10. The consistency-based data file value assessment method according to claim 9, characterized in that the structured data includes data stored in CSV files, Excel files and two-dimensional tables.
11. The consistency-based data file value assessment method according to any one of claims 1 to 10, characterized in that the proportion of the whole data file that each type of data file accounts for is determined by means of the R language and manual operation.
12. The consistency-based data file value assessment method according to any one of claims 1 to 10, characterized in that the data files are collected from the network by a web crawler.
13. A consistency-based data file value assessment system, characterized by comprising:
a data acquisition module, which collects data files to be assessed;
a type division module, which classifies the collected data files by type and determines the proportion of the whole data file that each type of data file accounts for;
a consistency processing module, which processes the consistency of the data files using a preset processing method.
14. The consistency-based data file value assessment system according to claim 13, characterized in that the type division module divides the data files by type into unstructured data, semi-structured data and structured data.
15. The consistency-based data file value assessment system according to claim 14, characterized in that the consistency processing module processes the consistency of the data files using the following formula:
where f is the consistency score of the data files, with range [0, 1]; and q, p and h are the proportions of unstructured data, semi-structured data and structured data in the whole data file respectively, where q + p + h = 1.
16. The consistency-based data file value assessment system according to claim 15, characterized in that the value of f is directly proportional to the estimated value of the data files.
17. The consistency-based data file value assessment system according to claim 13, characterized in that the data file is a data package containing multiple sets of data files or is a single file.
18. The consistency-based data file value assessment system according to claim 14, characterized in that the unstructured data refers to data without a fixed structure, including office documents, text, pictures, reports of all kinds, images, and audio and video information.
19. The consistency-based data file value assessment system according to claim 14, characterized in that the semi-structured data refers to data that has an implicit structure but does not take the form of a two-dimensional table.
20. The consistency-based data file value assessment system according to claim 19, characterized in that the semi-structured data includes stored employee resumes and files such as XML, HTML and JSON.
21. The consistency-based data file value assessment system according to claim 14, characterized in that the structured data refers to data in the traditional relational model, that is, row data stored in a database that can be represented as a two-dimensional table.
22. The consistency-based data file value assessment system according to claim 21, characterized in that the structured data includes data stored in CSV files, Excel files and two-dimensional tables.
23. The consistency-based data file value assessment system according to any one of claims 13 to 22, characterized in that the type division module determines the proportion of the whole data file that each type of data file accounts for by means of the R language and manual operation.
24. The consistency-based data file value assessment system according to any one of claims 13 to 22, characterized in that the data acquisition module collects the data files from the network by a web crawler.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610791831.4A CN106469195A (en) | 2016-08-31 | 2016-08-31 | Consistency-based data file value assessment method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610791831.4A CN106469195A (en) | 2016-08-31 | 2016-08-31 | Consistency-based data file value assessment method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106469195A true CN106469195A (en) | 2017-03-01 |
Family
ID=58230358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610791831.4A Pending CN106469195A (en) | Consistency-based data file value assessment method and system | 2016-08-31 | 2016-08-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106469195A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107807972A (en) * | 2017-10-19 | 2018-03-16 | 北京科技大学 | A kind of test data consistency detecting method |
CN108734405A (en) * | 2018-05-24 | 2018-11-02 | 国信优易数据有限公司 | A kind of data value Evaluation Platform and method |
CN108764995A (en) * | 2018-05-24 | 2018-11-06 | 国信优易数据有限公司 | A kind of data value determines system and method |
CN109981632A (en) * | 2018-12-20 | 2019-07-05 | 上海分布信息科技有限公司 | Data value transmission method and data value Transmission system |
EP3660778A4 (en) * | 2017-07-26 | 2020-06-03 | Sony Corporation | Information processing device, information processing system, information processing method, and program |
WO2021179496A1 (en) * | 2020-03-10 | 2021-09-16 | 南方电网科学研究院有限责任公司 | Data transaction method and data transaction system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102915373A (en) * | 2012-11-06 | 2013-02-06 | 无锡江南计算技术研究所 | Data storage method and device |
CN104794190A (en) * | 2015-04-16 | 2015-07-22 | 成都睿峰科技有限公司 | Method and device for effectively storing big data |
CN105373605A (en) * | 2015-11-11 | 2016-03-02 | 中国农业大学 | Batch storage method and system for data files |
CN105488699A (en) * | 2015-12-25 | 2016-04-13 | 国信优易数据有限公司 | Data asset value assessment method |
CN105825413A (en) * | 2016-03-11 | 2016-08-03 | 国网天津市电力公司 | Bilateral multi-attribute big data resource value evaluation and exchange method |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3660778A4 (en) * | 2017-07-26 | 2020-06-03 | Sony Corporation | Information processing device, information processing system, information processing method, and program |
CN107807972A (en) * | 2017-10-19 | 2018-03-16 | 北京科技大学 | A kind of test data consistency detecting method |
CN107807972B (en) * | 2017-10-19 | 2020-12-22 | 北京科技大学 | Test data consistency detection method |
CN108734405A (en) * | 2018-05-24 | 2018-11-02 | 国信优易数据有限公司 | A kind of data value Evaluation Platform and method |
CN108764995A (en) * | 2018-05-24 | 2018-11-06 | 国信优易数据有限公司 | A kind of data value determines system and method |
CN109981632A (en) * | 2018-12-20 | 2019-07-05 | 上海分布信息科技有限公司 | Data value transmission method and data value Transmission system |
CN109981632B (en) * | 2018-12-20 | 2021-04-02 | 上海分布信息科技有限公司 | Data valuization transmission method and data valuization transmission system |
WO2021179496A1 (en) * | 2020-03-10 | 2021-09-16 | 南方电网科学研究院有限责任公司 | Data transaction method and data transaction system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106469195A (en) | Consistency-based data file value assessment method and system | |
CN104391860B (en) | content type detection method and device | |
US9305279B1 (en) | Ranking source code developers | |
CN106886862A (en) | One kind bid and purchase management system and method | |
US9639353B1 (en) | Computing quality metrics of source code developers | |
CN106611291A (en) | Information push method and device | |
CN103646016A (en) | Implementation method of user-defined financial statement and server | |
US11050677B2 (en) | Enhanced selection of cloud architecture profiles | |
CN103279846A (en) | Project acceptance method and system based on BIM model | |
US20110047058A1 (en) | Apparatus and method for modeling loan attributes | |
WO2021190379A1 (en) | Method and device for realizing automatic machine learning | |
CN110135701A (en) | Control automatic generation method, device, electronic equipment and the readable medium of rule | |
CN110347855A (en) | Paintings recommended method, terminal device, server, computer equipment and medium | |
CN108572988A (en) | A kind of house property assessment data creation method and device | |
CN103116827A (en) | Rural power grid engineering control system | |
CN110675216A (en) | Bill data generation method and device | |
CN111242658A (en) | Information sharing reward method and device and computer readable storage medium | |
US11675756B2 (en) | Data complementing system and data complementing method | |
Tian et al. | Pricing barrier and American options under the SABR model on the graphics processing unit | |
CN107507023B (en) | Information delivery method and device | |
Guo et al. | Impact analysis of air pollutant emission policies on thermal coal supply chain enterprises in China | |
CN108062423B (en) | Information-pushing method and device | |
CN117236624A (en) | Issue repairer recommendation method and apparatus based on dynamic graph | |
CN107180083A (en) | A kind of analysis and processing method to investment project | |
CN109741172B (en) | Credit early warning method, device, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170301 |