CN106469195A - Consistency-based data file value assessment method and system - Google Patents

Info

Publication number
CN106469195A
Authority
CN
China
Prior art keywords
data
data file
conforming
file
estimating system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610791831.4A
Other languages
Chinese (zh)
Inventor
孙玉权
张斌德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd
Priority to CN201610791831.4A
Publication of CN106469195A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types

Abstract

The present invention provides a consistency-based data file value assessment method and system. The method includes: collecting a data file to be assessed; dividing the collected data files by type and determining the proportion of the whole data file accounted for by the data files of each type; and processing the consistency of the data file using a preset processing method. By processing the consistency of the data file, the invention provides a basis for data pricing and data trading from the perspective of data structure consistency.

Description

Consistency-based data file value assessment method and system
Technical field
The present invention relates to the field of big data, and in particular to a consistency-based data file value assessment method and system.
Background art
Data trading is currently in its early stage as an industry. It is developing rapidly but lacks mature theoretical guidance. Quantifying the value of data is extremely difficult, a consequence of the essential characteristics of data and of the current business environment. The work is also hindered by many objective factors, such as accurately assessing the cost of compiling data, the depreciation and life-cycle changes of data, and the added value of data.
There is therefore an urgent need for a way to quantify data value and to value data assets, so as to better serve data market activity and to help data trading and data projects get off the ground.
Summary of the invention
In view of the above technical problem, the present invention provides an assessment method and system that assess data value from the perspective of data structure consistency and provide a reference basis for data pricing and data trading.
Consistency is one of the internationally recognized indicators of spatial data quality and can be divided into types such as spatial consistency, attribute consistency, topological consistency, and semantic consistency. In existing practice it is mainly a technique for checking whether source data and backup data are identical, so as to guarantee high availability of the backup data. The present invention does not address whether source data and backup data are fully consistent. Rather, it concerns a data folder containing files of various data types, such as JSON, picture, video, and audio files, and the main problem it solves is how to measure the consistency of that folder.
To this end, one embodiment of the invention provides a consistency-based data file value assessment method, including: collecting a data file to be assessed; dividing the collected data files by type and calculating the proportion of the whole data file accounted for by the data files of each type; and processing the consistency of the data file using a preset processing method.
Another embodiment of the invention provides a consistency-based data file value assessment system, characterized by including: a data acquisition module, which collects the data file to be assessed; a type division module, which divides the collected data files by type and calculates the proportion of the whole data file accounted for by the data files of each type; and a consistency processing module, which processes the consistency of the data file using a preset processing method.
In the consistency-based data file value assessment method and system provided by the invention, the different files in a data file set are divided by form, the proportions of unstructured data, semi-structured data, and structured data are determined, and the form consistency of the data file set is then calculated from those proportions. This addresses one link in data value assessment and price evaluation: the data file set is valued from the perspective of the consistency of its data forms, which provides a basis for data pricing and data trading.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data value assessment process provided by an embodiment of the invention;
Fig. 2 is a structural diagram of the data value assessment system provided by an embodiment of the invention.
Detailed description
Specific embodiments of the invention are described below with reference to the accompanying drawings.
【Technical idea of the invention】
Based on the principle of consistency, the invention uses a consistency scoring formula to assess the consistency of a data folder containing files of various data types, such as JSON, picture, video, and audio files, thereby assessing the value of the data folder and quantifying data value.
Fig. 1 is a schematic diagram of the data value assessment process provided by an embodiment of the invention. Fig. 2 is a structural diagram of the data value assessment system provided by an embodiment of the invention. The data assessment method and system of the invention are introduced below with reference to the drawings.
【Data file assessment method】
As shown in Fig. 1, the data file assessment method of the invention includes the following steps:
S101: collect the data file to be assessed;
S102: divide the data files by type and determine the proportions;
S103: process the consistency of the data file.
In step S101, data can be collected with an existing data collector, for example by using a web crawler to collect data files from the network. The data file in the invention can be a data package containing a set of multiple data files, or a single file. A collected data package can contain JSON, picture, video, audio, and other files, but is not limited to these.
In step S102, the collected data files are divided by data type into unstructured data, semi-structured data, and structured data, and the proportion of the whole file's size accounted for by each data type is calculated. In practice, the data types can be divided manually, and the proportion of each type can be calculated with the help of the R language and manual operation.
In the embodiments of the invention, unstructured data, semi-structured data, and structured data are defined as follows:
Unstructured data: data without a fixed structure, for example office documents in all formats, text, pictures, forms of all kinds, images, and audio and video information.
Semi-structured data: data that has an implicit structure but does not exist in the form of a two-dimensional table or the like; a knowledge source between structured and unstructured knowledge sources, for example stored employee resumes and files such as XML, HTML, and JSON.
Structured data: the traditional relational data model; row data stored in a database that can be represented as a two-dimensional table, for example data stored in CSV files, Excel files, or two-dimensional tables.
The proportions of the whole data file accounted for by unstructured data, semi-structured data, and structured data are denoted q, p, and h respectively.
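The type division and ratio calculation of step S102 can be sketched as follows. This is a minimal Python illustration, not the procedure of the specification, which leaves the division to manual judgment assisted by the R language. The extension mapping `CATEGORY_BY_SUFFIX` and the helper names `size_shares` and `scan_folder` are assumptions introduced here; any suffix outside the mapping is counted as unstructured.

```python
from pathlib import Path

# Hypothetical extension-to-category mapping (an assumption for illustration;
# the specification divides data types manually).
CATEGORY_BY_SUFFIX = {
    ".csv": "structured", ".xls": "structured", ".xlsx": "structured",
    ".json": "semi_structured", ".xml": "semi_structured", ".html": "semi_structured",
}


def size_shares(sizes_by_path):
    """Return the share of the total size held by each data category.

    sizes_by_path maps a file path (str or Path) to its size in bytes.
    Suffixes missing from CATEGORY_BY_SUFFIX are treated as unstructured.
    """
    totals = {"unstructured": 0, "semi_structured": 0, "structured": 0}
    for path, size in sizes_by_path.items():
        category = CATEGORY_BY_SUFFIX.get(Path(path).suffix.lower(), "unstructured")
        totals[category] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no data to assess")
    return {name: size / grand_total for name, size in totals.items()}


def scan_folder(folder):
    """Collect the size of every regular file under a folder, recursively."""
    return {p: p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()}
```

For example, `size_shares({"a.csv": 200, "b.json": 300, "c.mp4": 500})` yields q = 0.5 (unstructured), p = 0.3 (semi-structured), and h = 0.2 (structured). Weighting by file size rather than file count follows the statement in step S102 that the ratio is taken over the size of the whole file.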
In step S103, the consistency of the data file can be assessed using the following consistency scoring formula:
$$ f = \frac{3}{2}\left(q^{2} + p^{2} + h^{2}\right) - \frac{1}{2} $$
where f is the consistency score, with range [0, 1]; the larger f is, the higher the consistency of the data file. q, p, and h denote the proportions of unstructured data, semi-structured data, and structured data respectively, with q + p + h = 1.
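The stated range of f can be checked directly from the scoring formula given in claim 3, f = 3/2 (q² + p² + h²) − 1/2. The short derivation below is an added illustration, not part of the original specification. It rewrites f as a scaled squared distance between (q, p, h) and the uniform mix (1/3, 1/3, 1/3), using only q + p + h = 1 and non-negative shares.

$$
\begin{aligned}
\sum_{x \in \{q,p,h\}} \left(x - \tfrac{1}{3}\right)^{2}
  &= q^{2} + p^{2} + h^{2} - \tfrac{2}{3}(q + p + h) + \tfrac{1}{3}
   = q^{2} + p^{2} + h^{2} - \tfrac{1}{3}, \\
f &= \tfrac{3}{2}\left(q^{2} + p^{2} + h^{2}\right) - \tfrac{1}{2}
   = \tfrac{3}{2} \sum_{x \in \{q,p,h\}} \left(x - \tfrac{1}{3}\right)^{2} \ge 0, \\
q^{2} + p^{2} + h^{2} &\le (q + p + h)^{2} = 1 \quad\Longrightarrow\quad f \le 1 .
\end{aligned}
$$

So f = 0 exactly when the three data types are equally represented, and f = 1 exactly when a single type makes up the whole data file.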
The value of the data file can be assessed from the consistency score f calculated in step S103. The score f is directly proportional to the value of the data file: the larger f is, that is, the closer it is to 1, the higher the corresponding valuation of the data file. The resulting consistency score is stored.
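As a minimal sketch of step S103, the scoring formula f = 3/2 (q² + p² + h²) − 1/2 can be written as a small Python function. The function name `consistency_score` and the input checks are assumptions added for illustration; the specification only gives the formula and the condition q + p + h = 1.

```python
import math


def consistency_score(q, p, h):
    """Consistency score f = 3/2 * (q^2 + p^2 + h^2) - 1/2.

    q, p, h are the unstructured, semi-structured and structured shares
    of the whole data file and must be non-negative and sum to 1.
    """
    shares = (q, p, h)
    if any(s < 0 for s in shares):
        raise ValueError("shares must be non-negative")
    # Tolerance is an illustrative choice so that rounded shares still pass.
    if not math.isclose(sum(shares), 1.0, abs_tol=1e-6):
        raise ValueError("shares must sum to 1")
    return 1.5 * sum(s * s for s in shares) - 0.5
```

A data file made up of a single type, `consistency_score(1, 0, 0)`, scores 1.0, while an even three-way split scores 0 up to floating-point rounding.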
<Embodiment>
The collected data file to be assessed is a patent data package of JSON type, 1 GB in size. It is assessed with the assessment method provided by the invention as follows:
(1) Calculate the proportion of each type of data in the patent data package
Dividing this data file by type and calculating the proportions confirms that it contains unstructured data, semi-structured data, and structured data: 234.5 MB of unstructured data, 103.36 MB of semi-structured data, and 686.14 MB of structured data. The proportion of each is therefore as follows:
Unstructured data share: q = 234.5/1024 ≈ 0.229
Semi-structured data share: p = 103.36/1024 ≈ 0.1
Structured data share: h = 686.14/1024 ≈ 0.671
(2) Assess the consistency of the patent data package
The score of this patent data package is calculated with the assessment formula $f = \frac{3}{2}\left(q^{2} + p^{2} + h^{2}\right) - \frac{1}{2}$. The result is as follows:
$$ f = \frac{3}{2}\left(0.229^{2} + 0.1^{2} + 0.671^{2}\right) - \frac{1}{2} \approx 0.269 $$
Because f is only 0.269, the consistency of the content of this patent data package is not high, so when the value of the package is assessed its price valuation will not be very high either.
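The arithmetic of this embodiment can be re-run with a few lines of Python. The sketch below is an added check, not part of the specification. It evaluates f = 3/2 (q² + p² + h²) − 1/2 twice: once with shares computed directly from the stated sizes (234.5 MB, 103.36 MB, 686.14 MB), and once with the shares as rounded in the embodiment (0.229, 0.1, 0.671). The variable and function names are assumptions.

```python
# Sizes in MB as stated in the embodiment; together they make up the 1 GB package.
sizes_mb = {"unstructured": 234.5, "semi_structured": 103.36, "structured": 686.14}
total_mb = sum(sizes_mb.values())


def score(q, p, h):
    """Consistency score f = 3/2 * (q^2 + p^2 + h^2) - 1/2."""
    return 1.5 * (q**2 + p**2 + h**2) - 0.5


# Shares taken directly from the sizes, without intermediate rounding.
exact = score(*(size / total_mb for size in sizes_mb.values()))
# Shares as rounded in the embodiment.
rounded = score(0.229, 0.1, 0.671)

print(f"unrounded shares: f = {exact:.3f}")
print(f"rounded shares:   f = {rounded:.3f}")
```

With the rounded shares the script reproduces the 0.269 reported in the embodiment; with unrounded shares the score comes out at about 0.267. The difference stems only from rounding the shares and does not change the conclusion that the consistency of the package is low.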
【Data file assessment system】
Another embodiment of the invention also provides an assessment system. As shown in Fig. 2, the system includes: a data acquisition module 1, which collects the data file to be assessed; a type division module 2, which divides the collected data files by type and calculates the proportion of the whole data file accounted for by the data files of each type; and a consistency processing module 3, which determines the consistency score of the data file using a predetermined formula.
Specifically, the data acquisition module can collect data with an existing data collector, for example by using a web crawler to collect data files from the network. The data file in the invention can be a data package containing a set of multiple data files, or a single file. A collected data package can contain JSON, picture, video, audio, and other files, but is not limited to these.
When dividing the data files by type, the type division module divides the collected data files by data type into unstructured data, semi-structured data, and structured data, and calculates the proportion of the whole file's size accounted for by each data type. In practice, the data types can be divided manually, and the proportion of each type can be calculated with the help of the R language and manual operation.
In the embodiments of the invention, unstructured data, semi-structured data, and structured data are defined as follows:
Unstructured data: data without a fixed structure, for example office documents in all formats, text, pictures, forms of all kinds, images, and audio and video information.
Semi-structured data: data that has an implicit structure but does not exist in the form of a two-dimensional table or the like; a knowledge source between structured and unstructured knowledge sources, for example stored employee resumes and files such as XML, HTML, and JSON.
Structured data: the traditional relational data model; row data stored in a database that can be represented as a two-dimensional table, for example data stored in CSV files, Excel files, or two-dimensional tables.
The proportions of the whole data file accounted for by unstructured data, semi-structured data, and structured data are denoted q, p, and h respectively.
When processing the consistency of the data file, the consistency processing module can assess that consistency using the following consistency scoring formula:
$$ f = \frac{3}{2}\left(q^{2} + p^{2} + h^{2}\right) - \frac{1}{2} $$
where f is the consistency score, with range [0, 1]; the larger f is, the higher the consistency of the data file. q, p, and h denote the proportions of unstructured data, semi-structured data, and structured data respectively, with q + p + h = 1. The value of the data file can be assessed from the consistency score f obtained by the consistency processing module. The score f is directly proportional to the value of the data file: the larger f is, that is, the closer it is to 1, the higher the corresponding valuation of the data file.
In the invention, the consistency score of the data file calculated by the consistency processing module is saved in the storage system of the assessment system and can be passed to a data pricing system, where it serves as a reference basis for valuing the data file. The estimated value of the data file can be displayed on a data trading display terminal or display platform. For a patent data package, for example, it can be displayed on the platform that presents the package, such as a patent consulting website, for reference by the relevant personnel.
It should be noted that a data file can be assessed from many angles, and its final valuation can be reached only by considering all of them together. The invention provides just one aspect of estimating a data file's value, offering one reference basis for its valuation.
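The hand-off from the assessment system to storage can be sketched as below. This Python fragment is a hypothetical illustration: the specification does not define a storage format or an interface to the data pricing system, so the `Assessment` record, the `assess` and `save` helpers, and the choice of a JSON file are all assumptions. The input is taken to be the per-category sizes already produced by the type division module.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Assessment:
    q: float  # unstructured share
    p: float  # semi-structured share
    h: float  # structured share
    f: float  # consistency score


def assess(sizes_by_category):
    """Turn per-category sizes into shares and a consistency score."""
    total = sum(sizes_by_category.values())
    if total <= 0:
        raise ValueError("no data to assess")
    q = sizes_by_category.get("unstructured", 0) / total
    p = sizes_by_category.get("semi_structured", 0) / total
    h = sizes_by_category.get("structured", 0) / total
    f = 1.5 * (q * q + p * p + h * h) - 0.5
    return Assessment(q, p, h, f)


def save(assessment, path):
    """Persist the result as JSON (assumed format) for later use in pricing."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(asdict(assessment), fh)
```

For instance, `assess({"unstructured": 1, "semi_structured": 1, "structured": 2})` gives shares 0.25, 0.25, and 0.5 and a score of 0.0625. Keeping the shares next to the score in the stored record lets a downstream pricing step see how the score arose, which fits the note above that the score is only one input to the final valuation.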
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the application have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the application.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the application without departing from their spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the application and their technical equivalents, the application is also intended to include them.

Claims (24)

1. A consistency-based data file value assessment method, characterized by comprising:
collecting a data file to be assessed;
dividing the collected data files by type, and determining the proportion of the whole data file accounted for by the data files of each type;
processing the consistency of the data file using a preset processing method.
2. The consistency-based data file value assessment method according to claim 1, characterized in that the data file is divided by type into unstructured data, semi-structured data, and structured data.
3. The consistency-based data file value assessment method according to claim 2, characterized in that the preset processing method processes the consistency of the data file using the following formula:
$$ f = \frac{3}{2}\left(q^{2} + p^{2} + h^{2}\right) - \frac{1}{2} $$
where f is the consistency score of the data file, with value range [0, 1]; q, p, and h denote the proportions of unstructured data, semi-structured data, and structured data in the whole data file respectively, with q + p + h = 1.
4. The consistency-based data file value assessment method according to claim 3, characterized in that the value of f is directly proportional to the estimated value of the data file.
5. The consistency-based data file value assessment method according to claim 1, characterized in that the data file is a data package containing a set of multiple data files, or is a single file.
6. The consistency-based data file value assessment method according to claim 2, characterized in that the unstructured data is data without a fixed structure, including office documents, text, pictures, forms of all kinds, images, and audio and video information.
7. The consistency-based data file value assessment method according to claim 2, characterized in that the semi-structured data is data that has an implicit structure but does not exist in the form of a two-dimensional table or the like.
8. The consistency-based data file value assessment method according to claim 7, characterized in that the semi-structured data includes stored employee resumes and files such as XML, HTML, and JSON.
9. The consistency-based data file value assessment method according to claim 2, characterized in that the structured data is the traditional relational data model: row data stored in a database that can be represented as a two-dimensional table.
10. The consistency-based data file value assessment method according to claim 9, characterized in that the structured data includes data stored in CSV files, Excel files, and two-dimensional tables.
11. The consistency-based data file value assessment method according to any one of claims 1 to 10, characterized in that the proportion of the whole data file accounted for by the data files of each type is determined by means of the R language and manual operation.
12. The consistency-based data file value assessment method according to any one of claims 1 to 10, characterized in that the data file is collected from the network by a web crawler.
13. A consistency-based data file value assessment system, characterized by comprising:
a data acquisition module, which collects a data file to be assessed;
a type division module, which divides the collected data files by type and determines the proportion of the whole data file accounted for by the data files of each type;
a consistency processing module, which processes the consistency of the data file using a preset processing method.
14. The consistency-based data file value assessment system according to claim 13, characterized in that the type division module divides the data file by type into unstructured data, semi-structured data, and structured data.
15. The consistency-based data file value assessment system according to claim 14, characterized in that the consistency processing module processes the consistency of the data file using the following formula:
$$ f = \frac{3}{2}\left(q^{2} + p^{2} + h^{2}\right) - \frac{1}{2} $$
where f is the consistency score of the data file, with value range [0, 1]; q, p, and h denote the proportions of unstructured data, semi-structured data, and structured data in the whole data file respectively, with q + p + h = 1.
16. The consistency-based data file value assessment system according to claim 15, characterized in that the value of f is directly proportional to the estimated value of the data file.
17. The consistency-based data file value assessment system according to claim 13, characterized in that the data file is a data package containing a set of multiple data files, or is a single file.
18. The consistency-based data file value assessment system according to claim 14, characterized in that the unstructured data is data without a fixed structure, including office documents, text, pictures, forms of all kinds, images, and audio and video information.
19. The consistency-based data file value assessment system according to claim 14, characterized in that the semi-structured data is data that has an implicit structure but does not exist in the form of a two-dimensional table or the like.
20. The consistency-based data file value assessment system according to claim 19, characterized in that the semi-structured data includes stored employee resumes and files such as XML, HTML, and JSON.
21. The consistency-based data file value assessment system according to claim 14, characterized in that the structured data is the traditional relational data model: row data stored in a database that can be represented as a two-dimensional table.
22. The consistency-based data file value assessment system according to claim 21, characterized in that the structured data includes data stored in CSV files, Excel files, and two-dimensional tables.
23. The consistency-based data file value assessment system according to any one of claims 13 to 22, characterized in that the type division module determines the proportion of the whole data file accounted for by the data files of each type by means of the R language and manual operation.
24. The consistency-based data file value assessment system according to any one of claims 13 to 22, characterized in that the data acquisition module collects the data file from the network by a web crawler.
CN201610791831.4A 2016-08-31 2016-08-31 Consistency-based data file value assessment method and system Pending CN106469195A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610791831.4A CN106469195A (en) 2016-08-31 2016-08-31 Consistency-based data file value assessment method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610791831.4A CN106469195A (en) 2016-08-31 2016-08-31 Consistency-based data file value assessment method and system

Publications (1)

Publication Number Publication Date
CN106469195A (en) 2017-03-01

Family

ID=58230358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610791831.4A Pending CN106469195A (en) 2016-08-31 2016-08-31 Consistency-based data file value assessment method and system

Country Status (1)

Country Link
CN (1) CN106469195A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107807972A (en) * 2017-10-19 2018-03-16 北京科技大学 A kind of test data consistency detecting method
CN108734405A (en) * 2018-05-24 2018-11-02 国信优易数据有限公司 A kind of data value Evaluation Platform and method
CN108764995A (en) * 2018-05-24 2018-11-06 国信优易数据有限公司 A kind of data value determines system and method
CN109981632A (en) * 2018-12-20 2019-07-05 上海分布信息科技有限公司 Data value transmission method and data value Transmission system
EP3660778A4 (en) * 2017-07-26 2020-06-03 Sony Corporation Information processing device, information processing system, information processing method, and program
WO2021179496A1 (en) * 2020-03-10 2021-09-16 南方电网科学研究院有限责任公司 Data transaction method and data transaction system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915373A (en) * 2012-11-06 2013-02-06 无锡江南计算技术研究所 Data storage method and device
CN104794190A (en) * 2015-04-16 2015-07-22 成都睿峰科技有限公司 Method and device for effectively storing big data
CN105373605A (en) * 2015-11-11 2016-03-02 中国农业大学 Batch storage method and system for data files
CN105488699A (en) * 2015-12-25 2016-04-13 国信优易数据有限公司 Data asset value assessment method
CN105825413A (en) * 2016-03-11 2016-08-03 国网天津市电力公司 Bilateral multi-attribute big data resource value evaluation and exchange method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3660778A4 (en) * 2017-07-26 2020-06-03 Sony Corporation Information processing device, information processing system, information processing method, and program
CN107807972A (en) * 2017-10-19 2018-03-16 北京科技大学 A kind of test data consistency detecting method
CN107807972B (en) * 2017-10-19 2020-12-22 北京科技大学 Test data consistency detection method
CN108734405A (en) * 2018-05-24 2018-11-02 国信优易数据有限公司 A kind of data value Evaluation Platform and method
CN108764995A (en) * 2018-05-24 2018-11-06 国信优易数据有限公司 A kind of data value determines system and method
CN109981632A (en) * 2018-12-20 2019-07-05 上海分布信息科技有限公司 Data value transmission method and data value Transmission system
CN109981632B (en) * 2018-12-20 2021-04-02 上海分布信息科技有限公司 Data valuization transmission method and data valuization transmission system
WO2021179496A1 (en) * 2020-03-10 2021-09-16 南方电网科学研究院有限责任公司 Data transaction method and data transaction system

Similar Documents

Publication Publication Date Title
CN106469195A (en) Consistency-based data file value assessment method and system
CN104391860B (en) content type detection method and device
US9305279B1 (en) Ranking source code developers
CN106886862A (en) One kind bid and purchase management system and method
US9639353B1 (en) Computing quality metrics of source code developers
CN106611291A (en) Information push method and device
CN103646016A (en) Implementation method of user-defined financial statement and server
US11050677B2 (en) Enhanced selection of cloud architecture profiles
CN103279846A (en) Project acceptance method and system based on BIM model
US20110047058A1 (en) Apparatus and method for modeling loan attributes
WO2021190379A1 (en) Method and device for realizing automatic machine learning
CN110135701A (en) Control automatic generation method, device, electronic equipment and the readable medium of rule
CN110347855A (en) Paintings recommended method, terminal device, server, computer equipment and medium
CN108572988A (en) A kind of house property assessment data creation method and device
CN103116827A (en) Rural power grid engineering control system
CN110675216A (en) Bill data generation method and device
CN111242658A (en) Information sharing reward method and device and computer readable storage medium
US11675756B2 (en) Data complementing system and data complementing method
Tian et al. Pricing barrier and American options under the SABR model on the graphics processing unit
CN107507023B (en) Information delivery method and device
Guo et al. Impact analysis of air pollutant emission policies on thermal coal supply chain enterprises in China
CN108062423B (en) Information-pushing method and device
CN117236624A (en) Issue repairer recommendation method and apparatus based on dynamic graph
CN107180083A (en) A kind of analysis and processing method to investment project
CN109741172B (en) Credit early warning method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170301