CN117171812A - Multi-source trusted data production method based on blockchain, blockchain node and system - Google Patents

Multi-source trusted data production method based on blockchain, blockchain node and system

Info

Publication number
CN117171812A
Authority
CN
China
Prior art keywords
data
source
verified
blockchain
verification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311189157.9A
Other languages
Chinese (zh)
Inventor
洪薇
洪健
李京昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Yangzhong Jushi Information Technology Co ltd
Original Assignee
Hubei Yangzhong Jushi Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Yangzhong Jushi Information Technology Co ltd
Priority to CN202311189157.9A
Publication of CN117171812A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a blockchain-based multi-source trusted data production method, a blockchain node, and a system. The method comprises the following steps: identifying data sources and acquiring source data from a plurality of data sources; after preprocessing each source data, extracting the minimum data units from each source data in the source data set and combining them to obtain data to be verified; the blockchain nodes each comparing the data to be verified based on smart contracts and confirming whether the consensus of the corresponding blockchain node passes; and obtaining the consensus pass rate of all blockchain nodes participating in the consensus and, if the data to be verified is determined to be trusted data, storing the trusted data and its related production information in the blockchain network. By comparing multi-source data for consistency, the invention achieves automatically verified, traceable production of trusted multi-source data for massive data, and addresses the reliability, credibility, and usability of massive aggregated data.

Description

Multi-source trusted data production method based on blockchain, blockchain node and system
Technical Field
The invention belongs to the technical field of data asset management, and particularly relates to a multi-source trusted data production method based on a blockchain, a blockchain node and a system.
Background
For data to be trusted, its authenticity and credibility must be established from the production stage of the data, including its various production attributes. Otherwise, once the data has become a database record, it may be affected by risks such as attack or destruction, leaving it inauthentic and unreliable.
Government data refers to the various data resources that government departments at all levels and their technical support units collect, generate, store, and manage in the course of performing their duties. By permitted scope of dissemination, government data generally comprises shareable government data, public data that can be opened, and government data unsuitable for opening or sharing. Public figures show that the national integrated government data sharing hub has connected 5,951 government departments at all levels, published 13,500 data resources of various kinds from 53 national departments, and supported a cumulative total of more than 400 billion shared data calls nationwide.
The open sharing of government data rests on outputting trusted data and safeguarding government credibility. However, as data management institutions such as provincial or municipal big data centers promote the open sharing of government data, outputting trusted data is difficult because the data sources are numerous and differ from one another. The main reasons include:
1. Government data aggregates a large volume of information, and much of the data in existing databases lacks traceable original production information, so the credibility of data records is low and auditing is difficult, which hinders the circulation and use of the data;
2. If the database adopts a trusted data production traceability mechanism that is coupled to production information, the data preprocessing system faces significant performance and efficiency constraints during large-scale automated data processing, and a trusted endorsement cannot be made for each data record;
3. Data conflicts: the same data may be stored simultaneously in provincial, municipal, and county databases and in business unit database nodes. These databases are deployed on different servers with different geographic locations and operating environments, so differences in data acquisition methods, data storage standards, data storage structures, and the like leave some data with multiple, mutually inconsistent sources.
Therefore, the open sharing of government data requires an automatically verified, traceable method for producing multi-source trusted data at massive scale, one that reduces the cost of producing trusted data while maintaining high data reliability.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a multi-source trusted data production method based on a blockchain, a blockchain node and a system, which aim to solve the problems of reliability, credibility and usability of mass aggregated data.
To achieve the above object, in a first aspect, the present invention provides a blockchain-based multi-source trusted data production method applied to a blockchain node, the blockchain node accessing a blockchain network, the blockchain network including a plurality of blockchain nodes, comprising the steps of:
S1, the blockchain node identifies data sources and acquires source data from a plurality of data sources to obtain a source data set;
S2, the blockchain node performs data preprocessing on each source data in the source data set;
S3, for each preprocessed source data in the source data set, the blockchain node extracts the minimum data units in the source data and combines them to obtain data to be verified;
S4, all or some of the blockchain nodes each compare the data to be verified based on smart contracts and confirm whether the consensus of the corresponding blockchain node passes;
S5, the consensus pass rate of all blockchain nodes participating in the consensus is obtained, and whether the data to be verified is trusted data is judged; if so, the trusted data and its related production information are stored in the blockchain network.
The invention aggregates multi-source data, extracts the minimum data units from the source data and combines them, and then compares the results for consistency on the blockchain. In this way it achieves automatically verified, traceable production of trusted multi-source data for massive data, relying on the distributed network storage of the blockchain network.
In a specific business scenario, a single data record or a field-level data combination taken alone from the source data may have no data circulation value. In the invention, the data to be verified that results from splitting and recombination is data with specific business attributes and circulation value, so the production of multi-source trusted data has practical significance.
The verification, consensus, and trusted data storage processes for the multi-source data are all carried out in the blockchain network, which ensures the transparency, reliability, credibility, and usability of multi-source trusted data production.
In a possible implementation, in step S2, the data preprocessing includes source data verification and source data cleaning, and step S2 includes the following sub-steps:
S2.1, the blockchain node performs source data verification on each source data in the source data set according to a data verification rule, where the source data verification includes any one or more of structure verification, duplicate verification, missing value verification, and outlier verification;
S2.2, after the source data passes verification, the blockchain node cleans each source data in the source data set according to a data cleaning rule to obtain a cleaning result for each source data;
the source data cleaning includes any one or more of data deduplication, missing value processing, data conversion, and outlier processing;
S2.3, after the source data has been cleaned, the blockchain node judges, according to a data judgment rule, whether the cleaning result of the source data meets the passing condition in that rule; if so, the source data is marked as first data to be processed and serves as the preprocessed source data;
if the passing condition in the data judgment rule is not met, the operation stops, and the cleaned source data is marked as first data to be confirmed and stored in the blockchain network as evidence;
the data verification rule, the data cleaning rule, and the data judgment rule together constitute the data preprocessing rule, which is preset by the blockchain node.
It can be understood that the data preprocessing rule may be preset and stored as evidence by the blockchain node in advance, or may be set and stored as evidence during multi-source trusted data production. Those skilled in the art can choose the timing according to actual needs.
Specifically, the evidence storage process is as follows: the blockchain node presets the data preprocessing rule, and the rule is stored in the blockchain network as evidence after passing the consensus of all blockchain nodes.
Further, the first data to be confirmed needs to be stored in the blockchain network as evidence;
furthermore, the first data to be confirmed can be collected into a database. For example, a risk early-warning database can be established in which the first data to be confirmed serves as risk data for risk early warning.
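As an illustration only, sub-steps S2.1 to S2.3 can be sketched in Python as follows. The schema in `REQUIRED_FIELDS`, the concrete rules, the `max_null_ratio` passing condition, and the `verification_failed` label are all hypothetical; the source does not fix any of them, and in the described method the rules would be read from the blockchain network rather than hard-coded.

```python
REQUIRED_FIELDS = ("id", "name", "age")  # hypothetical schema, not from the source


def verify_source(records):
    """S2.1 (sketch): structure verification, every record is a dict with an id."""
    return all(isinstance(r, dict) and r.get("id") is not None for r in records)


def clean_source(records):
    """S2.2 (sketch): deduplication, missing value and outlier processing."""
    seen = set()
    cleaned = []
    for r in records:
        if r["id"] in seen:  # data deduplication
            continue
        seen.add(r["id"])
        row = {f: r.get(f) for f in REQUIRED_FIELDS}
        if row["name"] is None:  # missing value processing
            row["name"] = "UNKNOWN"
        age = row["age"]
        if not isinstance(age, int) or not 0 <= age <= 150:  # outlier processing
            row["age"] = None
        cleaned.append(row)
    return cleaned


def judge(cleaned, max_null_ratio=0.2):
    """S2.3 (sketch): assumed passing condition on the share of unresolved ages."""
    if not cleaned:
        return False
    nulls = sum(1 for r in cleaned if r["age"] is None)
    return nulls / len(cleaned) <= max_null_ratio


def preprocess(records):
    """Run S2.1 to S2.3 and return (label, cleaned_records)."""
    if not verify_source(records):
        # The source does not say what happens here; this label is an assumption.
        return "verification_failed", []
    cleaned = clean_source(records)
    if judge(cleaned):
        return "first_data_to_be_processed", cleaned
    return "first_data_to_be_confirmed", cleaned
```

In this sketch, a batch that ends up as `first_data_to_be_confirmed` is the one that would be stored on chain as evidence and optionally routed to a risk early-warning database.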
In one possible implementation, step S3 includes the sub-steps of:
S3.1, the blockchain node acquires the preprocessed source data;
S3.2, the blockchain node combines the minimum data units of the preprocessed source data according to a data combination rule that has passed consensus and been stored as evidence in the blockchain network, obtains the data to be verified, and stores the data to be verified in the blockchain network as evidence; a minimum data unit is the combination of one field and its field value for a given data record in a data table of the source data.
It can be understood that the data combination rule may be preset and stored as evidence by the blockchain node in advance, or may be set and stored as evidence during multi-source trusted data production. Those skilled in the art can choose the timing according to actual needs.
Specifically, the evidence storage process is as follows: the blockchain node configures the data combination rule, and the rule is stored in the blockchain network as evidence after passing the consensus of all blockchain nodes.
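A minimal Python sketch of the split-and-recombine idea in step S3 is given below. It assumes that a record is a flat dictionary, that a minimum data unit can be represented as a `(record key, field, value)` triple, and that a data combination rule is simply the set of fields to keep. These representations and the function names are hypothetical and are not prescribed by the source.

```python
def min_units(record, key_field):
    """Split one record into minimum data units: (record key, field, value)."""
    key = record[key_field]
    return [(key, field, value) for field, value in record.items() if field != key_field]


def combine(records, key_field, rule_fields):
    """Recombine the units selected by the rule into data to be verified.

    Returns one dictionary per record key, containing only the fields that
    the (hypothetical) combination rule names.
    """
    combined = {}
    for record in records:
        for key, field, value in min_units(record, key_field):
            if field in rule_fields:
                combined.setdefault(key, {key_field: key})[field] = value
    return combined
```

For example, `combine(records, "id", {"name", "addr"})` would drop every field other than `name` and `addr`, yielding a recombined record that carries only the business attributes the rule selects.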
In one possible implementation, step S4 includes the sub-steps of:
S4.1, the data to be verified is compared based on smart contracts in one of the following two modes:
Mode 1: all blockchain nodes act as executing nodes and, based on the smart contract, execute the multi-source verification rule to compare the data to be verified;
Mode 2: at least one blockchain node acts as an executing node and, based on the smart contract, executes the multi-source verification rule to compare the data to be verified; at least one blockchain node acts as a participating node, which only confirms, based on the smart contract, whether the multi-source verification rule executed by the executing node meets the requirements, and does not itself execute the rule to compare the data to be verified;
S4.2, if the data to be verified satisfies the multi-source verification rule, the consensus of the corresponding blockchain node passes; the multi-source verification rule is configured according to the key comparison field and stored in the blockchain network as evidence; the key comparison field is used to uniquely identify the data to be verified.
It can be understood that the multi-source verification rule in step S4.1 may be preset and stored as evidence by the blockchain node in advance, or may be set and stored as evidence during multi-source trusted data production. Those skilled in the art can choose the timing according to actual needs.
Specifically, the evidence storage process is as follows: the blockchain node determines a key comparison field, which is used to uniquely identify the data to be verified; the blockchain node configures the multi-source verification rule according to the key comparison field, and the rule is stored in the blockchain network as evidence after passing the consensus of all blockchain nodes.
Further, the verification of the data to be verified may be a multi-source data comparison performed by any one or more blockchain nodes based on the smart contract. The other blockchain nodes do not take part in the comparison computation and participate only in the consensus on the multi-source verification rule, to verify whether the rule is correct.
It should be noted that there may be one or more key comparison fields. The multi-source verification rule may be a comparison rule based on data types such as text, number, and date, used to determine whether the data to be verified generated separately from each source data in the source data set is consistent.
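The type-based comparison described above can be illustrated with the following Python sketch, which a single executing node might run. The normalization choices (case-insensitive text, numbers rounded to six decimals, dates as ISO strings), the function names, and the rule that fewer than two candidates cannot pass are assumptions made for the example; in the described method the actual rule is the one stored on chain.

```python
from datetime import date


def normalize(value):
    """Bring text, number, and date values to a comparable form (assumed rules)."""
    if isinstance(value, str):
        return value.strip().casefold()
    if isinstance(value, (int, float)):
        return round(float(value), 6)
    if isinstance(value, date):
        return value.isoformat()
    return value


def node_consensus_passes(candidates, key_fields, compare_fields):
    """Return True if all per-source candidates agree on key and compared fields.

    candidates: one dict per data source, each being the data to be verified
    that was produced from that source for the same record.
    """
    if len(candidates) < 2:
        return False  # assumption: a single source cannot be cross-checked

    def fingerprint(candidate):
        fields = (*key_fields, *compare_fields)
        return tuple(normalize(candidate.get(f)) for f in fields)

    first = fingerprint(candidates[0])
    return all(fingerprint(c) == first for c in candidates[1:])
```

A result of `True` corresponds to "the consensus of the corresponding blockchain node passes" in S4.2; collecting these per-node results yields the consensus pass rate used in step S5.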
In one possible implementation manner, in step S5, determining whether the data to be verified is trusted data includes:
if the consensus pass rate is equal to or greater than a first preset value, the data to be verified is judged to be trusted data; otherwise, the data to be verified is judged to be untrusted data; once the judgment result has been determined, the data to be verified is stored in the blockchain as evidence;
if the consensus pass rate is between the second preset value and the first preset value and the participants whose consensus passed include an authoritative node, the data to be verified is judged to be trusted data; the authoritative node is one or more of the blockchain nodes, namely the blockchain node where the original production source of the data corresponding to at least one key comparison field is located;
if the consensus pass rate is between the second preset value and the first preset value and the participants whose consensus passed do not include an authoritative node, the data to be verified is judged to be second data to be confirmed;
if the consensus pass rate is not greater than the second preset value, the data to be verified is judged to be untrusted data; the first preset value is greater than the second preset value.
When the consensus passing rate meets the preset condition, the consensus passes, and the data to be verified is trusted data.
The first preset value and the second preset value can be set by a person skilled in the art according to actual needs. For example, the first preset value may be configured as 70%, 80%, 85%, 90%, 95%, or 100%, or as a value between 60% and 100%. The second preset value may be configured as 50%, 60%, 70%, 80%, or 90%, or as a value between 50% and 99%. Of course, the first preset value and the second preset value may also be set to other values.
In addition, "between" in the above description generally refers to a range that excludes both endpoints.
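The two-threshold judgment can be written out as a short Python sketch. The default values 0.9 and 0.6 are only examples chosen from the ranges mentioned above, and the function names and return labels are hypothetical.

```python
def pass_rate(votes):
    """Consensus pass rate: share of participating nodes whose consensus passed.

    votes: dict mapping node id to True (passed) or False (did not pass).
    """
    if not votes:
        raise ValueError("at least one participating node is required")
    return sum(1 for passed in votes.values() if passed) / len(votes)


def classify(rate, authoritative_passed, first=0.9, second=0.6):
    """Classify data to be verified from the pass rate and authoritative-node vote.

    first and second are the first and second preset values (example defaults).
    """
    if not second < first:
        raise ValueError("the first preset value must exceed the second")
    if rate >= first:
        return "trusted"
    if rate <= second:
        return "untrusted"
    # Strictly between the two preset values: the authoritative node decides.
    if authoritative_passed:
        return "trusted"
    return "second_data_to_be_confirmed"
```

With these example defaults, a pass rate of 0.75 yields trusted data only when an authoritative node is among the nodes whose consensus passed; otherwise the data becomes second data to be confirmed.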
In one possible implementation manner, in step S5, determining whether the data to be verified is trusted data further includes:
the authoritative node receives the second data to be confirmed and, under smart contract control, judges whether it is trusted data according to the verification method for second data to be confirmed that is stored in the blockchain network as evidence; the authoritative node obtains an authoritative verification result and broadcasts it to the blockchain network for consensus, and after the consensus passes, the trusted data corresponding to the second data to be confirmed and its related production information are stored in the blockchain network; the verification method for second data to be confirmed is established by the authoritative node and is stored as evidence after passing consensus in the blockchain network.
It can be understood that the verification method for second data to be confirmed may be preset and stored as evidence by the blockchain node in advance, or may be set and stored as evidence during multi-source trusted data production. Those skilled in the art can choose the timing according to actual needs.
Specifically, the evidence storage process is as follows: the authoritative node receives the second data to be confirmed, creates a corresponding verification method based on the received data, and submits the method to the blockchain network for consensus; after the consensus passes, the method is stored in the blockchain network as evidence.
Further, the verification method for second data to be confirmed may be set according to all historical versions of the data to be confirmed and/or a data change rule. If the current change does not satisfy the preset data change rule, the data is identified as untrusted data; if the current change satisfies the preset data change rule, the data is identified as trusted data.
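One way an authoritative node could check a change against historical versions is sketched below in Python. The specific data change rule used here (a numeric value may move by at most 10% relative to the latest historical version) is purely a hypothetical example; the source does not specify any particular rule, and the function name is likewise an assumption.

```python
def verify_against_history(history, new_value, max_relative_change=0.10):
    """Return True if new_value satisfies the (assumed) data change rule.

    history: earlier versions of the value, oldest first.
    A change that satisfies the rule is treated as trusted, otherwise untrusted.
    """
    if not history:
        return False  # assumption: without history the change cannot be confirmed
    last = history[-1]
    if last == 0:
        return new_value == 0
    return abs(new_value - last) / abs(last) <= max_relative_change
```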
In a possible implementation manner, the step S5 further includes the following substeps:
when the data to be verified is trusted data, the multi-source trusted data is processed with a preset algorithm to obtain the production information related to the multi-source trusted data, and part or all of the obtained production information is then stored in the blockchain network as evidence; the related production information includes: data source information, source data processing rule information, data processing information related to the source data, and blockchain network consensus information;
the preset algorithm includes one or more of the following algorithms:
Algorithm 1: a hash algorithm is used; the pieces of input information are combined and then processed by the hash algorithm to obtain a hash value;
for example, three pieces of input information are represented by string 1, string 2, and string 3, which are combined into the string "string 1-string 2-string 3"; the corresponding hash value is then obtained through a hash algorithm such as SHA-256.
Algorithm 2: the input information is arranged in chronological order to construct a Merkle hash tree, and the corresponding root hash value is output.
Specifically, the hash value associated with the trusted data may be stored as evidence as the related production information of the trusted data.
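Both algorithms can be illustrated with Python's standard `hashlib` module. The sketch below assumes UTF-8 encoding, the `-` separator from the example above, and, for an odd number of nodes at any tree level, duplication of the last node; the odd-level convention is a common choice but is not stated in the source.

```python
import hashlib


def combined_hash(parts):
    """Algorithm 1 (sketch): join the input strings with '-' and hash with SHA-256."""
    joined = "-".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def merkle_root(items):
    """Algorithm 2 (sketch): Merkle root over inputs arranged in time order.

    items: list of (timestamp, info) pairs, where info is a string.
    """
    if not items:
        raise ValueError("at least one piece of input information is required")
    ordered = sorted(items, key=lambda item: item[0])  # chronological order
    level = [hashlib.sha256(info.encode("utf-8")).digest() for _, info in ordered]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention for odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Either digest could serve as the compact production information stored on chain, while the full inputs remain off chain and can be rechecked against the digest later.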
In a second aspect, the present invention provides a blockchain node comprising:
the data source identification unit is used for identifying data sources, acquiring source data from a plurality of data sources and obtaining a source data set;
the data preprocessing unit is used for respectively preprocessing the data of each source data in the source data set, and the data preprocessing comprises source data verification and source data cleaning;
the data combination unit is used for respectively extracting the minimum data units in the source data from each source data in the source data set after the data preprocessing to carry out data combination so as to obtain data to be verified;
the data verification unit is used for comparing the data to be verified based on the smart contract, and if the data to be verified satisfies the multi-source verification rule, the consensus passes;
the consensus unit is used for obtaining the consensus pass rate of all blockchain nodes participating in the consensus and judging whether the data to be verified is trusted data;
and the storage unit is used for storing the trusted data and the production information related to the trusted data into the blockchain if the consensus verification of the consensus unit is passed.
In a third aspect, the present invention provides a blockchain system, comprising:
the client is used for sending a data source to a blockchain node in the blockchain network;
the blockchain node is used for identifying data sources and acquiring source data from a plurality of data sources to obtain a source data set; performing data preprocessing on each source data in the source data set, the data preprocessing including source data verification and source data cleaning; extracting, for each preprocessed source data in the source data set, the minimum data units in the source data and combining them to obtain data to be verified; comparing the data to be verified based on the smart contract, the consensus passing if the data to be verified satisfies the multi-source verification rule; and then obtaining the consensus pass rate of all blockchain nodes participating in the consensus, judging whether the data to be verified is trusted data, and if so, instructing the blockchain network to store the trusted data and its related production information.
In a fourth aspect, the present invention provides an electronic device, comprising: at least one processor and interface circuitry; the processor obtains program instructions via the interface circuit according to which the method described in the first aspect or any one of the possible implementations of the first aspect is performed.
In a fifth aspect, the present invention provides a computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, the invention provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In general, the above technical solutions conceived by the present invention have the following beneficial effects compared with the prior art:
The invention provides a blockchain-based multi-source trusted data production method, a blockchain node, and a system.
In a specific business scenario, a single data record or a field-level data combination taken alone from the source data may have no data circulation value. In the invention, the data to be verified that results from splitting and recombination is data with specific business attributes and circulation value, so the production of multi-source trusted data has practical significance.
The verification, consensus, and trusted data storage processes for the multi-source data are all carried out in the blockchain network, which ensures the transparency, reliability, credibility, and usability of multi-source trusted data production.
In the blockchain network, each blockchain node has a unique identity. Through smart contract control on the blockchain, only authorized nodes and their corresponding participants can view, verify, and share the trusted data and its production process. This provides a trust endorsement for the circulation and use of the trusted data, lays the foundation for its use, and increases its circulation value.
Drawings
FIG. 1 is a flow chart of a blockchain-based multi-source trusted data production method provided by an embodiment of the present invention;
FIG. 2 is a block chain node architecture diagram provided by an embodiment of the present invention;
FIG. 3 is a block chain system architecture diagram provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
First, technical terms involved in the embodiments of the present invention will be described.
A blockchain is a chained data structure that links data blocks sequentially in chronological order and uses cryptography to ensure that the data blocks cannot be tampered with or forged. Each block in the blockchain is linked to the immediately preceding block by including a cryptographic hash of that preceding block. Each block also includes a timestamp, its own cryptographic hash, and one or more transactions. Transactions that have been validated by nodes of the blockchain network are hashed and form a Merkle tree. In the Merkle tree, the data at the leaf nodes is hashed, and for each branch of the tree, all hash values of that branch are concatenated at the root of the branch. This process is repeated up the tree until the root node of the entire Merkle tree is reached. The root node of the Merkle tree stores a hash value that represents all the data in the tree. When a hash value is claimed to correspond to a transaction stored in the Merkle tree, it can be verified quickly by determining whether the hash value is consistent with the structure of the Merkle tree.
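The hash-linking of blocks described above can be illustrated with a small, self-contained Python sketch. It is a toy model rather than any particular blockchain platform: the block layout, the JSON serialization, and the all-zero previous hash for the first block are assumptions made for the example.

```python
import hashlib
import json

HASHED_FIELDS = ("index", "timestamp", "transactions", "prev_hash")


def block_hash(block):
    """SHA-256 over a canonical JSON form of the block's hashed fields."""
    payload = json.dumps({k: block[k] for k in HASHED_FIELDS}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions, timestamp):
    """Create a block that embeds the previous block's hash and append it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64  # assumed genesis value
    block = {
        "index": len(chain),
        "timestamp": timestamp,
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

Because each block's hash covers the previous block's hash, altering any earlier transaction makes `chain_is_valid` return `False` unless every later block is recomputed as well.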
A blockchain network is a network of computing nodes that is used to manage, update, and maintain one or more blockchain structures. In this specification, a blockchain network may include a public blockchain network, a private blockchain network, or a federated blockchain network.
In a public blockchain network, the consensus process is controlled by the nodes of the consensus network. For example, thousands of entities may cooperate in a public blockchain network, each operating at least one node in the network. A public blockchain network can therefore be regarded as a public network of the participating entities. In some examples, a majority of entities (nodes) must sign each block for the block to be valid and added to the blockchain of the blockchain network. Examples of public blockchain networks include certain peer-to-peer payment networks.
A private blockchain network is provided for a particular entity, which tightly controls the read and write permissions of each node in the network. Private blockchain networks are therefore also commonly called permissioned networks, which limit who is allowed to participate in the network and at what level. Various types of access control mechanisms may be used in a private blockchain network.
The network scope of a consortium chain (the federated blockchain network mentioned above) lies between that of a public chain and a private chain. It is typically used in environments with multiple member roles, such as payment settlement between banks or logistics between enterprises, in which members with different permissions participate. Consortium chain systems usually have identity authentication and permission settings, and the number of nodes is usually fixed, which makes them suitable for transactions between enterprises or institutions. A consortium chain has the following characteristics: first, transactions are cheaper, because a transaction only needs to be verified by a few trusted nodes with high computing power rather than confirmed by the whole network; second, the nodes can be well connected, faults can be repaired quickly through manual intervention, and a consensus algorithm can be used to shorten the block time; third, if read permissions are restricted, better privacy protection can be provided.
In the related art, public chains, private chains, and consortium chains may all provide smart contract functionality. A smart contract on a blockchain is a contract on the blockchain system whose execution can be triggered by a transaction. Smart contracts may be defined in the form of code. The smart contract that needs to be started is determined by code run by the blockchain node. In one case, execution of a smart contract is triggered by a transaction, and the blockchain node cannot actively initiate it; in another case, it may be desirable for the node to initiate the smart contract actively. For example, a blockchain node in a blockchain network may actively execute a smart contract at a scheduled time in order to complete a timed task.
Next, the technical scheme provided in the embodiment of the present invention is described.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a multi-source trusted data production method based on a blockchain. The method is applied to a blockchain node that accesses a blockchain network comprising a plurality of blockchain nodes, and includes the following steps:
S1, the blockchain node identifies data sources and acquires source data from a plurality of data sources to obtain a source data set;
S2, the blockchain node performs data preprocessing on each source data in the source data set and records data preprocessing information, wherein the data preprocessing comprises source data verification and source data cleaning;
S3, after data preprocessing, the blockchain node extracts, for each source data in the source data set, the minimum data units in the source data and combines them to obtain data to be verified;
S4, all or part of the blockchain nodes each compare at least the data to be verified based on smart contracts, and confirm whether the consensus of the corresponding blockchain node passes;
S5, the consensus passing rate of all blockchain nodes participating in the consensus is obtained, and whether the data to be verified is trusted data is judged; if so, the trusted data and production information related to the trusted data are stored in the blockchain network.
In the invention, the source data comes from a plurality of data sources, and for the same data record, source data from different data sources may conflict, which lowers the reliability of the data; the method of the invention is therefore needed to produce multi-source trusted data.
Example 2
This embodiment is a supplementary description of step S1 of embodiment 1, "the blockchain node identifies data sources and acquires source data from a plurality of data sources to obtain a source data set". Step S1 includes the following sub-steps:
S1.1, identifying data sources, acquiring source data from a plurality of data sources to obtain a source data set, and recording data source information;
the method for acquiring source data from a plurality of data sources comprises the following steps:
S1.1.1, if the data source is a database, connecting and querying through a client of the corresponding database or through a library of a programming language; for example, a MySQL database is connected and queried through the MySQL client or the MySQLdb library of Python, a MongoDB database is accessed through the MongoDB Compass client, and a PostgreSQL database is connected and queried through the psycopg2 library of Python;
S1.1.2, if the data source is an API, accessing the API endpoint through an HTTP request library of the programming language and acquiring the returned data; for example, the API endpoint is accessed through the Requests library of Python;
S1.1.3, if the data source is a file, acquiring data through a file processing library of the programming language; for example, file data is acquired through the csv and json modules of Python;
S1.1.4, if the data source is a website, capturing data with a web crawler tool; for example, website data is captured with the BeautifulSoup library of Python;
the data source information includes at least two of: database type, database connection information, data driver type, data source name, user name, and user password;
in a preferred scheme, if access rights or credentials are required to acquire the source data, data source sensitive information is recorded; the data source sensitive information includes at least one of a user name, a password, and an API key.
In the invention, the acquisition process of the source data can be verified through the data source information, which makes it convenient to trace whether the acquisition of the source data set is consistent with the actual business situation; for example, the judgment can be made from the computing capacity of the verifying computer equipment, the data volume of the source data, and the verification time.
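As an illustration of sub-step S1.1 for file-type data sources, the following minimal sketch reads CSV and JSON content with the csv and json modules of Python and records basic data source information. The function names, the in-memory strings standing in for files, and the `record_count` and `acquired_at` entries are assumptions made for this example and are not prescribed by the embodiment.

```python
import csv
import io
import json
from datetime import datetime, timezone


def load_csv(text):
    """Parse CSV text into a list of dict records (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse JSON text that holds a list of record objects."""
    return json.loads(text)


def acquire(source_name, source_type, raw_text):
    """Read one data source and return (records, data_source_info)."""
    loaders = {"csv": load_csv, "json": load_json}
    records = loaders[source_type](raw_text)
    # Hypothetical data source information kept for later tracing.
    info = {
        "data_source_name": source_name,
        "data_source_type": source_type,
        "record_count": len(records),
        "acquired_at": datetime.now(timezone.utc).isoformat(),
    }
    return records, info


# In-memory strings stand in for real files in this sketch.
csv_text = "name,id_number\nAlice,A001\nBob,B002\n"
json_text = '[{"name": "Alice", "id_number": "A001"}]'

source_data_set = {}
source_info = []
for name, kind, text in [("dept_a", "csv", csv_text), ("dept_b", "json", json_text)]:
    records, info = acquire(name, kind, text)
    source_data_set[name] = records
    source_info.append(info)
```

Database, API and website sources would plug in as further loaders; they are left out here because they need external services.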
Example 3
This embodiment is a supplementary description of step S2 of embodiment 1, "the blockchain node performs data preprocessing on each source data in the source data set, wherein the data preprocessing comprises source data verification and source data cleaning". Step S2 includes the following sub-steps:
S2.1, the blockchain node performs source data verification on each source data in the source data set according to a data verification rule, wherein the source data verification comprises any one or more of structure verification, repeatability verification, missing value verification and abnormal value verification;
the structure verification checks whether the fields and data structures in the source data are consistent with expectations; if they are consistent, the structure verification passes, otherwise it fails. The structure verification ensures that all source data contain the necessary fields and that the types and names of the fields are correct. Correspondingly, structure verification information is recorded, which comprises: 1. whether the fields and data structures in the source data are consistent with expectations; 2. the start and end time of the structure verification; 3. the structure verification rule; 4. information on the blockchain node that performed the structure verification.
The repeatability verification checks whether duplicate records exist in the source data; if no duplicate record exists, the repeatability verification passes, otherwise it fails. The repeatability verification ensures that the source data provided by a data source contain no duplicate data, which reduces the computational overhead of multiple participants processing the same data in a blockchain network scenario. Correspondingly, repeatability verification information is recorded, which comprises: 1. whether duplicate records exist in the source data; 2. the start and end time of the repeatability verification; 3. the repeatability verification rule; 4. information on the blockchain node that performed the repeatability verification.
The missing value verification checks whether missing values exist in the source data; if no missing value exists, the missing value verification passes, otherwise it fails. Correspondingly, missing value verification information is recorded, which comprises: 1. whether missing values exist in the source data; 2. the start and end time of the missing value verification; 3. the missing value verification rule; 4. information on the blockchain node that performed the missing value verification.
The abnormal value verification checks whether abnormal values exist in the source data; if no abnormal value exists, the abnormal value verification passes, otherwise it fails. Abnormal values include invalid data or unreasonable data. Correspondingly, abnormal value verification information is recorded, which comprises: 1. whether abnormal values exist in the source data; 2. the start and end time of the abnormal value verification; 3. the abnormal value verification rule; 4. information on the blockchain node that performed the abnormal value verification.
For missing values and abnormal values, either of two processing schemes may be adopted: 1. if the source data are found to contain missing values or abnormal values, no subsequent operation is carried out; 2. the blockchain node initiates a data preprocessing proposal for the data that failed source data verification, a specific data preprocessing scheme is confirmed after consensus, the deletion, correction or retention of the data concerned is then executed, and subsequent operations are carried out.
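The four checks of S2.1 and the verification information recorded for them can be sketched as follows. This is only an illustration under stated assumptions: the expected field set, the alphanumeric rule used as the abnormal value check, the node identifier and the layout of the returned report are all hypothetical.

```python
from datetime import datetime, timezone

EXPECTED_FIELDS = {"name", "address", "credit_code"}  # hypothetical schema


def check_source(records, node_id):
    """Run the four checks on a list of dict records and record the outcome."""
    started = datetime.now(timezone.utc).isoformat()
    rows = [tuple(sorted(r.items())) for r in records]
    results = {
        # Structure check: every record carries exactly the expected fields.
        "structure": all(set(r) == EXPECTED_FIELDS for r in records),
        # Repeatability check: no two records are identical.
        "repeatability": len(rows) == len(set(rows)),
        # Missing value check: no field is None or an empty string.
        "missing_value": all(v not in (None, "") for r in records for v in r.values()),
        # Abnormal value check (hypothetical rule): a non-empty credit code is alphanumeric.
        "abnormal_value": all(str(r.get("credit_code") or "X").isalnum() for r in records),
    }
    finished = datetime.now(timezone.utc).isoformat()
    return {"node": node_id, "start": started, "end": finished, "results": results}


records = [
    {"name": "Acme", "address": "1 Main St", "credit_code": "ABC123"},
    {"name": "Acme", "address": "1 Main St", "credit_code": "ABC123"},
    {"name": "Beta", "address": "", "credit_code": "XY-9"},
]
report = check_source(records, "node-1")
```

In this sample the structure check passes, while the repeated first record, the empty address and the non-alphanumeric code make the other three checks fail.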
In a preferred scheme, after the source data is verified, Key-Value check metadata corresponding to each source data is obtained, and the check metadata is stored and certified in the blockchain network;
for example, the calculation is performed based on a data preprocessing rule. Take a data record of a certain enterprise A, whose fields include: enterprise name, enterprise contact address and unified social credit code; the corresponding field values include: * Company limited, city, street, 9142, 5725, ZX.
The data preprocessing rule includes: the check rule for the enterprise name is whether the field is empty; the check rule for the enterprise contact address is whether the field is empty; the check rule for the unified social credit code is a length check.
For example, the Key-Value check metadata corresponding to the data record of enterprise A is:
enterprise name field: not empty;
enterprise contact address field: not empty;
unified social credit code field: length 15.
To save storage resources of the blockchain network, preferably only the hash value of the check metadata, obtained by hash calculation, is stored and certified.
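The Key-Value check metadata and its hash can be illustrated with the short sketch below. SHA-256 and the canonical JSON serialization are assumptions (the text only says "hash calculation"), and the field names and sample values are hypothetical placeholders, not the redacted values of the example above.

```python
import hashlib
import json


def build_check_metadata(record):
    """Apply the three example check rules and return Key-Value check metadata."""
    return {
        "enterprise_name_not_empty": bool(record.get("enterprise_name")),
        "contact_address_not_empty": bool(record.get("contact_address")),
        "credit_code_length": len(record.get("credit_code", "")),
    }


def metadata_hash(metadata):
    """Hash a canonical (key-sorted) JSON form so key order does not matter."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder record; the 15-character code is invented for illustration only.
record = {
    "enterprise_name": "Example Co., Ltd.",
    "contact_address": "Example City, Example Street",
    "credit_code": "0123456789ABCDE",
}
metadata = build_check_metadata(record)
digest = metadata_hash(metadata)
```

Only `digest` would then need to be placed on chain, while the full metadata can be kept off chain and re-hashed when it has to be checked.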
S2.2, after the source data passes the verification, the blockchain node performs source data cleaning on each source data in the source data set according to a data cleaning rule to obtain a cleaning result for each source data;
the source data cleaning comprises any one or more of data deduplication processing, missing value processing, data conversion processing and abnormal value processing;
the data deduplication processing refers to removing duplicate data according to the result of the repeatability verification; correspondingly, data deduplication processing information is recorded, which comprises the start and end time of the data deduplication processing, the data deduplication processing rule, and information on the blockchain node that performed the data deduplication processing.
The missing value processing refers to marking the data identified by the missing value verification, without processing the missing values by conventional methods such as deleting, filling or interpolating; in a preferred scheme, the data with missing values is marked through a hash algorithm and recorded as source data missing information hash-qsz; correspondingly, missing value processing information is recorded, which comprises the start and end time of the missing value processing, the missing value processing rule, and information on the blockchain node that performed the missing value processing.
The data conversion processing refers to unifying data formats, such as date formats, numerical formats and character string formats, according to the result of the structure verification, and then converting the data into a form suitable for further analysis, for example from a character string type to a numerical type; correspondingly, data conversion processing information is recorded, which comprises the start and end time of the data conversion processing, the data conversion processing rule, and information on the blockchain node that performed the data conversion processing.
The abnormal value processing refers to marking the data identified as containing abnormal values, without processing the abnormal values by conventional methods such as deleting, correcting, or treating them as missing values; correspondingly, abnormal value processing information is recorded, which comprises the start and end time of the abnormal value processing, the abnormal value processing rule, and information on the blockchain node that performed the abnormal value processing.
In a preferred scheme, after the source data is cleaned, Key-Value cleaning metadata corresponding to each source data is obtained, and the cleaning metadata is stored and certified in the blockchain network.
In a preferred embodiment, the start and end time may be replaced by a completion timestamp of the data processing operation or other step.
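A minimal sketch of the deduplication and missing value marking described in S2.2 is given below. It assumes SHA-256 over a key-sorted JSON form as the marking hash and stores the markers under a `hash_qsz` key mirroring the name hash-qsz; the function name, the rule label and the shape of the recorded processing information are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def clean_source(records, node_id):
    """Drop exact duplicates and mark, without altering, records with missing values."""
    seen = set()
    cleaned = []
    missing_marks = []
    for record in records:
        key = json.dumps(record, sort_keys=True, ensure_ascii=False)
        if key in seen:
            continue  # data deduplication: skip a record already kept
        seen.add(key)
        cleaned.append(record)
        if any(value in (None, "") for value in record.values()):
            # The record is kept unchanged; only its hash is recorded as a marker.
            missing_marks.append(hashlib.sha256(key.encode("utf-8")).hexdigest())
    info = {
        "node": node_id,
        "rule": "deduplicate; mark missing values",  # hypothetical rule label
        "hash_qsz": missing_marks,
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }
    return cleaned, info


records = [
    {"name": "Acme", "address": "1 Main St"},
    {"name": "Acme", "address": "1 Main St"},
    {"name": "Beta", "address": ""},
]
cleaned, info = clean_source(records, "node-1")
```

The record with the empty address stays in `cleaned` exactly as received, which matches the statement that missing values are marked rather than deleted, filled or interpolated.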
S2.3, after the source data is cleaned, the blockchain node judges, according to a data judgment rule, whether the cleaning result of the source data meets the passing condition in the data judgment rule; if so, the source data is marked as first data to be processed and serves as the source data after data preprocessing;
if the passing condition in the data judgment rule is not met, the operation is stopped, the cleaned source data is marked as first data to be confirmed, and the first data to be confirmed is stored in the blockchain network;
S2.4, if the cleaning result of the source data meets the passing condition in the data judgment rule, the source data is identified as first data to be processed and the next operation is executed; if the cleaning result of the source data does not meet the passing condition in the data judgment rule, the operation is stopped and the cleaned source data is identified as first data to be confirmed.
The data verification rule, the data cleaning rule and the data judgment rule all belong to the data preprocessing rules, which are preset through the blockchain nodes.
The invention first verifies the source data; if the source data has problems, the subsequent operation steps are not executed, which ensures the authenticity and usability of the data and saves limited blockchain computing resources.
Furthermore, the data processing of the invention does not use operations that change the data itself, such as data conversion, which improves the reliability of the data processing and the efficiency of the system.
Example 4
This embodiment is a supplementary description of step S3 of embodiment 1, "after data preprocessing, the blockchain node extracts, for each source data in the source data set, the minimum data units in the source data and combines them to obtain data to be verified". Step S3 includes the following sub-steps:
S3.1, the blockchain node acquires the source data after data preprocessing;
S3.2, the blockchain node configures a data combination rule; after the data combination rule has passed consensus and been certified in the blockchain network, the minimum data units of the preprocessed source data are combined based on the data combination rule to obtain data to be verified, and the data to be verified is certified in the blockchain network.
The minimum data unit of the invention is the combination of a field and a field value of a data record in a data table; in some cases, a data table or data record may itself be regarded as a minimum data unit, because that data table or data record stores only a single field and field value combination.
For example, for a data record of a certain enterprise A, the fields include: enterprise name, enterprise contact address and unified social credit code; the corresponding field values include: * Company limited, city, street, 9142, 5725, ZX;
the data with business circulation value comprises the enterprise name and the unified social credit code; after the data record of enterprise A is obtained from the enterprise information record table, the enterprise name field, the unified social credit code field and the corresponding data values are extracted and then combined to obtain the data to be verified.
In a specific business scenario, a single data record or a field-level data combination in the source data may have no data circulation value by itself. In the invention, the split and recombined data to be verified is preset data that carries business attributes and has circulation value in one or more specific data circulation application scenarios, which gives the production of multi-source trusted data practical significance.
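The extraction and combination of minimum data units in step S3 can be pictured with the sketch below. It treats each (field, field value) pair as one minimum data unit and represents the data combination rule as an ordered list of field names; the function names, field names and sample values are assumptions for illustration.

```python
def extract_units(record):
    """Split one data record into minimum data units: (field, value) pairs."""
    return list(record.items())


def combine(record, combination_rule):
    """Keep only the units whose field is named in the combination rule."""
    units = dict(extract_units(record))
    return {field: units[field] for field in combination_rule if field in units}


# Hypothetical rule: only these two fields carry business circulation value.
combination_rule = ["enterprise_name", "credit_code"]

record = {
    "enterprise_name": "Example Co., Ltd.",
    "contact_address": "Example City, Example Street",
    "credit_code": "0123456789ABCDE",
}
data_to_verify = combine(record, combination_rule)
```

Applying the same rule to the record obtained from each data source yields one piece of data to be verified per source, which is what the later comparison step works on.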
Example 5
This embodiment is a supplementary description of step S4 of embodiment 1, "all or part of the blockchain nodes each compare at least the data to be verified based on smart contracts, and confirm whether the consensus of the corresponding blockchain node passes", and of step S5, "the consensus passing rate of all blockchain nodes participating in the consensus is obtained, and whether the data to be verified is trusted data is judged". Step S4 includes the following sub-steps:
S4.1, the blockchain node determines a key comparison field, wherein the key comparison field is used for uniquely identifying the data to be verified;
there may be one or more key comparison fields; for example, the key comparison field may be key comparison field information Hash-ZD obtained by applying a hash algorithm to data record information; the data record information comprises at least one of business information, field information and field value information of the data;
in practice, since the data record is not stored, in order to reduce the amount of calculation, the business information, the data field and the data value can be assembled and used directly as the key comparison field, or the assembled information can be processed by a hash algorithm and the result used as the key comparison field.
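The two ways of forming a key comparison field, direct assembly or hashing the assembled information, can be sketched as follows. The "|" separator, SHA-256 and the function name are assumptions; the text does not fix an assembly format or a particular hash algorithm.

```python
import hashlib


def key_comparison_field(business_info, fields, values, hashed=True):
    """Assemble business info, field names and field values; optionally hash the result."""
    assembled = "|".join([business_info, *fields, *values])
    if not hashed:
        return assembled  # use the assembled string directly
    return hashlib.sha256(assembled.encode("utf-8")).hexdigest()


hash_zd = key_comparison_field(
    "enterprise_registration",        # hypothetical business information
    ["enterprise_name", "credit_code"],
    ["Example Co., Ltd.", "0123456789ABCDE"],
)
plain = key_comparison_field(
    "enterprise_registration",
    ["enterprise_name", "credit_code"],
    ["Example Co., Ltd.", "0123456789ABCDE"],
    hashed=False,
)
```

The hashed form has a fixed length and hides the underlying values, while the plain form avoids the hash computation at the cost of exposing them.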
S4.2, the blockchain node configures a multi-source verification rule according to the key comparison field and stores the multi-source verification rule in the blockchain network;
the multi-source verification rule is used for comparing the data to be verified; for example, it can be a comparison rule based on data types such as text, number and date, used to determine whether the data to be verified generated from each source data in the source data set are consistent with one another.
There may be one or more multi-source verification rules.
S4.3, the data to be verified is compared based on smart contracts, using mode 1 or mode 2:
mode 1: all blockchain nodes serve as executing nodes and execute the multi-source verification rule comparison on the data to be verified based on the smart contract;
mode 2: at least one blockchain node serves as an executing node and executes the multi-source verification rule comparison on the data to be verified based on the smart contract; at least one blockchain node serves as a participating node, which only confirms, based on the smart contract, whether the multi-source verification rule executed by the executing node meets the requirements and does not itself execute the comparison of the data to be verified.
S4.4, if the data to be verified meets the multi-source verification rule, the consensus of the corresponding blockchain node passes; the multi-source verification rule is configured according to the key comparison field and stored in the blockchain network; the key comparison field is used for uniquely identifying the data to be verified.
In a first application scenario, all blockchain nodes each compare the data to be verified. This ensures that the multi-source trusted data obtained by the method receive consistent confirmation, in the circulation stage, from the entities corresponding to all blockchain nodes in the blockchain network, which guarantees the public credibility of the multi-source trusted data;
in a second application scenario, all blockchain nodes in the blockchain network participate in data verification, with some nodes acting as executing nodes and the others as participating nodes; the effect of the first application scenario can also be achieved.
In a third application scenario, only some of the blockchain nodes in the blockchain network take part in the comparison and verification of the data, and the others do not. When the entities corresponding to some blockchain nodes temporarily do not take part in the production or subsequent circulation of the multi-source trusted data, this reduces computing cost, lowers the consensus condition, and promotes efficient production and circulation of the multi-source trusted data. Correspondingly, when blockchain nodes that did not take part in comparing the data to be verified later need to share, circulate or use the multi-source trusted data, the data can be confirmed again through secondary verification, for example manual verification or a further blockchain network consensus.
In a fourth application scenario, the blockchain nodes in the blockchain network are divided into three parts: a first part acts as executing nodes, a second part acts as participating nodes, and a third part does not take part in the comparison and verification process; the same effect as in the third application scenario can be achieved.
S4.5, any one or more blockchain nodes perform the multi-source data comparison based on smart contracts, and the other blockchain nodes, which do not take part in the comparison calculation, check whether the multi-source verification rule is correct; if the multi-source verification rule is correct and the data to be verified meets the rule, the consensus of those blockchain nodes passes.
It may be appreciated that the nodes that compare the data to be verified may be called executing nodes, which execute the multi-source verification rule comparison on the data to be verified based on smart contracts; the nodes that take part in selecting the multi-source verification rule and in judging whether the rule meets the requirements for executing the comparison, that is, whether the rule can be used to compare the data to be verified, may be called participating nodes. Based on this division of blockchain node functions, the nodes in the blockchain network comprise executing nodes and participating nodes. In some special cases, the blockchain network may also include nodes other than executing nodes and participating nodes, which perform other functions.
In one case, when there is one key comparison field, for example Key01, the following are compared: (1) the field Key01; (2) the field value Value01 corresponding to the field Key01; (3) the hash value Hash01 of the data record. If the comparison results of (1), (2) and (3) are the same for every piece of data to be verified, that is, Key01, Value01 and Hash01 from the different data sources are consistent, the multi-source data are confirmed to be consistent and the blockchain node consensus passes.
In another case, when there are several key comparison fields, for example Key02, Key03 and Key04, the following are compared: (4) the fields Key02, Key03 and Key04; (5) the field value Value02 corresponding to Key02, the field value Value03 corresponding to Key03, and the field value Value04 corresponding to Key04; (6) the hash value Hash01 of the data record. If the comparison results of (4), (5) and (6), or of (4) and (6), or of (5) and (6), are the same for every piece of data to be verified, that is, the comparison results from the different data sources are consistent, the multi-source data are confirmed to be consistent and the blockchain node consensus passes.
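The comparison of key comparison fields, field values and record hash across data sources can be sketched as below. The sketch implements only the strictest variant, in which fields, values and hash must all agree; SHA-256 over key-sorted JSON as the record hash, the function names and the sample data are assumptions.

```python
import hashlib
import json


def record_hash(fields):
    """Hash the whole data record in a key-sorted JSON form."""
    canonical = json.dumps(fields, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def consensus_passes(candidates, key_fields):
    """True when every source agrees on the key fields, their values and the record hash."""
    signatures = set()
    for fields in candidates:
        if not all(k in fields for k in key_fields):
            return False  # a key comparison field is absent from this source
        values = tuple(fields[k] for k in key_fields)
        signatures.add((values, record_hash(fields)))
    # Exactly one distinct signature means all sources are consistent.
    return len(signatures) == 1


source_a = {"enterprise_name": "Example Co", "credit_code": "CODE001"}
source_b = {"enterprise_name": "Example Co", "credit_code": "CODE001"}
source_c = {"enterprise_name": "Example Co", "credit_code": "CODE999"}

agree = consensus_passes([source_a, source_b], ["enterprise_name", "credit_code"])
conflict = consensus_passes([source_a, source_c], ["enterprise_name", "credit_code"])
```

The looser variants mentioned above, which compare only fields and hash or only values and hash, would drop the corresponding element from the signature.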
Judging whether the data to be verified is trusted data comprises:
strong consensus for multi-source trusted data production: if the consensus passing rate Y = 100%, the data to be verified is judged to be trusted data; otherwise it is judged to be untrusted data; after the judgment result is confirmed, the data to be verified is stored in the blockchain;
in another example, the consensus passing rate required for strong consensus may also be set to exceed a certain threshold, such as greater than 90%, greater than 95%, or greater than 98%. In that case, the upper limit of the weak consensus described below is adjusted accordingly.
Weak consensus for multi-source trusted data production: if the consensus passing rate Y is less than 100% and the passing parties include an authoritative node, the data to be verified is judged to be trusted data; the authoritative node is one or more of the blockchain nodes, and is the blockchain node where the original production source of the data corresponding to at least one key comparison field is located;
if the consensus passing rate Y is less than 100% and the passing parties do not include an authoritative node, the data to be verified is judged to be second data to be confirmed;
if the consensus passing rate Y is less than or equal to 50%, the data to be verified is judged to be untrusted data.
For example, table 1 shows the data record information of a certain natural person as recorded by the public security department, the civil administration department and the education department, respectively:
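The weak consensus decision can be sketched as a small classification function. The text does not state which rule takes precedence when the passing rate is at most 50% and an authoritative node is among the passing parties; this sketch assumes the 50% rule is checked first. The function name, the result labels and the `strong_threshold` parameter are likewise assumptions.

```python
def classify(votes, authoritative_nodes, strong_threshold=1.0):
    """Classify data to be verified from per-node consensus results.

    votes maps a node identifier to True (passed) or False (not passed).
    """
    if not votes:
        raise ValueError("at least one participating node is required")
    passed = {node for node, ok in votes.items() if ok}
    rate = len(passed) / len(votes)
    if rate >= strong_threshold:
        return "trusted"  # every node passed (or the configured threshold is met)
    # Assumption: the 50% rule is applied before the authoritative-node rule.
    if rate <= 0.5:
        return "untrusted"
    if passed & set(authoritative_nodes):
        return "trusted"  # an authoritative node is among the passing parties
    return "second_data_to_be_confirmed"


all_pass = classify({"n1": True, "n2": True, "n3": True, "n4": True}, {"n1"})
with_authority = classify({"n1": True, "n2": True, "n3": True, "n4": False}, {"n1"})
without_authority = classify({"n1": False, "n2": True, "n3": True, "n4": True}, {"n1"})
low_rate = classify({"n1": True, "n2": False, "n3": False, "n4": False}, {"n1"})
```

Passing a lower `strong_threshold`, for example 0.95, corresponds to the variant in which strong consensus is defined by a passing rate above a chosen threshold rather than exactly 100%.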
TABLE 1
In table 1: the public security department is the authoritative verification unit for the name, gender and identity card number fields, so these fields can be given the identifier "public security department + administrative department"; the civil administration department is the authoritative verification unit for the marital status field, so this field can be given the identifier "civil administration department + administrative department". Here "administrative department" indicates that the department is an authoritative verification unit, and "public security department" or "civil administration department" indicates the verification unit of the data record. The node corresponding to a department is a blockchain node, and if the department is an authoritative verification unit, its node is an authoritative node.
More specifically, the authoritative node in the invention is one or more of the blockchain nodes, and is the blockchain node where the original production source of the data corresponding to at least one key comparison field is located; for example, the civil administration department is the original production department of the marital status information, and the public security department is the original production department of the name, gender and identity card number information.
Referring to table 1, under the strong consensus scheme for multi-source trusted data production, data can be confirmed as trusted only when the comparison results of the source data obtained from the plurality of data sources are completely consistent; that is, the data sources provide a consistent trust endorsement for the trusted data, which reflects the consistent result of the blockchain consensus. In table 1, the field values corresponding to the name and identity card number fields can therefore be confirmed as trusted data, whereas the data in the gender, age and address fields are judged to be untrusted data because they are not completely identical across the education department, the public security department and the civil administration department.
Referring to table 1, under the weak consensus scheme for multi-source trusted data production, data can be confirmed as trusted if the passing parties include an authoritative node identifier and the verification by the other participants meets the threshold condition; conversely, if no authoritative node identifier is included, the data cannot be confirmed as trusted.
As a further alternative, further determining whether the second data to be confirmed is trusted data includes the following steps:
the authoritative node receives the second data to be confirmed, creates a corresponding verification method for the second data to be confirmed, and submits the verification method to the blockchain network for consensus; after the consensus passes, whether the second data to be confirmed is trusted data is judged according to the verification method under smart contract control, and an authoritative verification result is obtained; the authoritative verification result is broadcast to the blockchain network for consensus and is stored after the consensus passes.
The verification method for the second data to be confirmed comprises judging whether the data change is reasonable under preset rule conditions; for example, the data value of the identity card number field stays the same, while the age changes with time by a constant amount;
the verification method for the second data to be confirmed comprises the following steps:
1. acquiring all historical versions of the second data to be confirmed;
2. based on the acquired historical versions, automatically analyzing whether the data change meets the preset rule conditions to obtain an authoritative verification result; if the data change meets the preset rule conditions, the second data to be confirmed is confirmed as trusted data; if the second data to be confirmed has no historical version or the data change does not meet the preset rule conditions, the second data to be confirmed is identified as untrusted data.
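The two steps above can be illustrated with a small sketch of a history-based check. The preset rule conditions used here, an unchanged identity card number and an age that never decreases between versions, are assumptions standing in for whatever rules the authoritative node actually configures; the function name, field names and sample versions are hypothetical.

```python
def verify_history(versions):
    """Judge second data to be confirmed from its historical versions.

    versions is a list of dicts ordered from oldest to newest, each holding
    "id_number" and "age".
    """
    if not versions:
        return "untrusted"  # no historical version is available
    for older, newer in zip(versions, versions[1:]):
        if older["id_number"] != newer["id_number"]:
            return "untrusted"  # the identifier must stay the same
        if newer["age"] < older["age"]:
            return "untrusted"  # age may only stay the same or grow over time
    return "trusted"


consistent = verify_history([
    {"id_number": "ID-0001", "age": 30},
    {"id_number": "ID-0001", "age": 31},
    {"id_number": "ID-0001", "age": 33},
])
changed_id = verify_history([
    {"id_number": "ID-0001", "age": 30},
    {"id_number": "ID-0002", "age": 31},
])
no_history = verify_history([])
```

A stricter rule could also compare the age difference with the time elapsed between versions, which would require a timestamp on each version.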
Through multi-source data consistency comparison, the invention realizes automatically verified and traceable trusted production of multi-source data for massive data. The multi-source data are aggregated and compared, and distributed storage on the blockchain network ensures the transparency and reliability of the data, which can be viewed and verified by users with permission. Because the participating nodes only check the multi-source verification rule and do not take part in the calculation, the production cost of trusted data can be reduced while the credibility of the data is ensured.
Example 6
This embodiment is a supplementary description of step S5 of embodiment 1, "if so, the trusted data and production information related to the trusted data are stored in the blockchain network". Step S5 includes the following sub-steps:
S5.1, for the process information related to multi-source trusted data production, the input information is processed using algorithm 1 or algorithm 2, and data trusted computing is executed to obtain output information;
the data trusted computing refers to automatic computing performed under smart contract control;
algorithm 1: a hash algorithm is adopted; the input information is combined and then processed by the hash algorithm to obtain a hash value;
algorithm 2: the input information is arranged in time order to construct a Merkle hash tree, and the corresponding root hash value is output.
S5.2, at least one piece of output information is stored in the blockchain network, or all pieces of output information are stored in the blockchain network.
Specifically, the input and output information corresponding to the production information of the trusted data to be certified is shown in table 2.
In the invention, the production information of the trusted data comprises data source related information, source data processing rule information, source data processing information, blockchain network consensus information and the like. Based on the output information of the data identifiers, unforgeable production records of the multi-source trusted data are constructed, which ensures the consistency and atomicity of the trusted verification of multiple source data. The production process and results of the trusted data are placed under data version control, so that changes in data state can be tracked, which solves the problem of data assets being untrusted because multi-source data are inconsistent.
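Algorithm 1 and algorithm 2 can be sketched as follows. SHA-256, plain string concatenation for combining inputs, and duplicating the last node on levels with an odd number of nodes are assumptions of this sketch; the text does not specify the hash function or the tree construction details.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a string as hexadecimal text."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def combined_hash(items):
    """Algorithm 1: combine all input items and hash the result once."""
    return sha256_hex("".join(items))


def merkle_root(items):
    """Algorithm 2: build a Merkle hash tree over items already in time order."""
    if not items:
        raise ValueError("at least one input item is required")
    level = [sha256_hex(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256_hex(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical process information, listed in time order.
process_info = ["source-info", "check-metadata", "cleaning-metadata"]
single_hash = combined_hash(process_info)
root_hash = merkle_root(process_info)
```

The single hash is cheaper to compute, whereas the Merkle root additionally allows one input item to be proven part of the record without revealing the others.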
TABLE 2
Through tightly coupled steps such as data preprocessing and trusted data computation, the invention keeps trusted data production high-value, low-cost, and traceable. It performs automatic multi-source data consistency comparison under smart contract control to compute and produce trusted, reliable data, thereby increasing the value of the data in circulation.
Example 7
Based on Embodiment 1, this embodiment provides the corresponding hardware, such as a blockchain node and a blockchain network, for implementing the method described in the foregoing embodiment.
FIG. 2 is an architecture diagram of a blockchain node according to an embodiment of the present invention. As shown in FIG. 2, the blockchain node includes:
a data source identification unit 210, configured to identify data sources and acquire source data from a plurality of data sources to obtain a source data set;
a data preprocessing unit 220, configured to perform data preprocessing on each source data in the source data set, where the data preprocessing includes source data verification and source data cleaning;
a data combination unit 230, configured to extract, from each source data in the preprocessed source data set, the minimum data units of that source data and combine them to obtain data to be verified;
a data verification unit 240, configured to compare the data to be verified based on a smart contract, where the consensus passes if the data to be verified satisfies the multi-source verification rule;
a consensus unit 250, configured to obtain the consensus passing rate of all blockchain nodes participating in the consensus and judge whether the data to be verified is trusted data;
and a storage unit 260, configured to store the trusted data and the production information related to the trusted data in the blockchain if the consensus verification of the consensus unit 250 passes.
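The work of units 230 and 240 can be sketched in Python as follows. The sketch treats a minimum data unit as a (field, field value) pair, as defined in claim 3, and groups units by a key comparison field, as described in claim 4. The concrete rule in `meets_rule` (at least two sources, all reporting identical units) is a hypothetical stand-in, since the text leaves the multi-source verification rule configurable; all function and parameter names are assumptions.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
MinUnit = Tuple[str, Any]  # one minimum data unit: (field, field value)


def extract_min_units(record: Record, fields: List[str]) -> List[MinUnit]:
    """Return the (field, value) pairs of one record for the selected fields."""
    return [(f, record[f]) for f in fields if f in record]


def combine(
    sources: Dict[str, List[Record]], key_field: str, fields: List[str]
) -> Dict[Any, Dict[str, List[MinUnit]]]:
    """Group minimum data units by key comparison field, then by source name."""
    combined: Dict[Any, Dict[str, List[MinUnit]]] = {}
    for source_name, records in sources.items():
        for record in records:
            if key_field not in record:
                continue  # without the key, the record cannot be matched across sources
            units = extract_min_units(record, fields)
            combined.setdefault(record[key_field], {})[source_name] = units
    return combined


def meets_rule(per_source: Dict[str, List[MinUnit]], min_sources: int = 2) -> bool:
    """Hypothetical rule: enough sources, and every source reports the same units.

    Field values must be hashable (strings, numbers) for the set comparison.
    """
    if len(per_source) < min_sources:
        return False
    unit_sets = [frozenset(units) for units in per_source.values()]
    return all(s == unit_sets[0] for s in unit_sets)
```

For example, two sources that both hold a record with the same key and the same `name` and `amount` values would satisfy this rule, while a record present in only one source would not.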
FIG. 3 is an architecture diagram of a blockchain system according to an embodiment of the present invention. As shown in FIG. 3, the blockchain system includes:
a client, configured to send a data source to a blockchain node in the blockchain network;
a blockchain node, configured to: identify data sources and acquire source data from a plurality of data sources to obtain a source data set; perform data preprocessing on each source data in the source data set, where the data preprocessing includes source data verification and source data cleaning; extract, from each source data in the preprocessed source data set, the minimum data units of that source data and combine them to obtain data to be verified; compare the data to be verified based on a smart contract, where the consensus passes if the data to be verified satisfies the multi-source verification rule; and then obtain the consensus passing rate of all blockchain nodes participating in the consensus, judge whether the data to be verified is trusted data, and, if so, instruct the blockchain network to store the trusted data and the production information related to the trusted data.
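The final judgment on the consensus passing rate can be sketched in Python using the two-threshold rule recited in claim 5. The node names, the `Verdict` enum, and the function `judge` are hypothetical, and the preset values are left as parameters because the text does not fix them.

```python
from enum import Enum
from typing import Dict, Set


class Verdict(Enum):
    TRUSTED = "trusted data"
    TO_BE_CONFIRMED = "second data to be confirmed"
    UNTRUSTED = "untrusted data"


def judge(
    votes: Dict[str, bool],
    authoritative: Set[str],
    first_preset: float,
    second_preset: float,
) -> Verdict:
    """Classify data to be verified from per-node consensus results.

    votes maps each participating node to True if its consensus passed.
    """
    if not votes:
        raise ValueError("at least one participating node is required")
    if not first_preset > second_preset:
        raise ValueError("the first preset value must exceed the second")
    passed = {node for node, ok in votes.items() if ok}
    rate = len(passed) / len(votes)  # consensus passing rate
    if rate >= first_preset:
        return Verdict.TRUSTED
    if rate > second_preset:
        # Between the two preset values: an authoritative node among the
        # passing participants decides the outcome.
        if passed & authoritative:
            return Verdict.TRUSTED
        return Verdict.TO_BE_CONFIRMED
    return Verdict.UNTRUSTED
```

Data classified as second data to be confirmed would then go to the authoritative node for the further verification described in claim 6.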
It should be understood that the above blockchain node and blockchain network are used to execute the method of the above embodiment. The implementation principles and technical effects of the corresponding program units are similar to those described for the method; for the working process of the apparatus, reference may be made to the corresponding process in the method, which is not repeated here.
Based on the method in the above embodiment, an embodiment of the invention provides an electronic device. The device may include at least one memory for storing a program and at least one processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor performs the method described in the above embodiments.
Based on the method in the above embodiment, an embodiment of the present invention provides an electronic device, including: at least one processor and interface circuitry; the processor obtains program instructions through the interface circuit, and performs the method described in the above embodiment according to the program instructions.
Based on the method in the above embodiment, the embodiment of the present invention provides a computer-readable storage medium storing a computer program, which when executed on a processor, causes the processor to perform the method in the above embodiment.
Based on the method in the above embodiments, an embodiment of the present invention provides a computer program product, which when run on a processor causes the processor to perform the method in the above embodiments.
It is to be appreciated that the processor in the embodiments of the present invention may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor or any conventional processor.
The steps of the method in the embodiments of the present invention may be implemented by hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
It will be appreciated that the various numerals used in the embodiments of the present invention are merely for ease of description and are not intended to limit the scope of the embodiments of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention is intended to be included within the scope of the invention.

Claims (10)

1. A multi-source trusted data production method based on a blockchain, which is applied to a blockchain node, and is characterized by comprising the following steps:
S1: the blockchain node identifies data sources and acquires source data from a plurality of data sources to obtain a source data set;
S2: the blockchain node performs data preprocessing on each source data in the source data set;
S3: the blockchain node extracts, from each source data in the preprocessed source data set, the minimum data units of that source data and combines them to obtain data to be verified;
S4: all or some of the blockchain nodes each compare the data to be verified based on a smart contract and determine whether the consensus of the corresponding blockchain node passes;
S5: the consensus passing rate of all blockchain nodes participating in the consensus is obtained and used to judge whether the data to be verified is trusted data; if so, the trusted data and the production information related to the trusted data are stored in the blockchain network.
2. The method according to claim 1, wherein in step S2 the data preprocessing comprises source data verification and source data cleaning, and step S2 comprises the following sub-steps:
S2.1: the blockchain node performs source data verification on each source data in the source data set according to a data verification rule, the source data verification comprising any one or more of structure verification, duplication verification, missing value verification, and abnormal value verification;
S2.2: after the source data passes verification, the blockchain node performs source data cleaning on each source data in the source data set according to a data cleaning rule to obtain a cleaning result for each source data;
the source data cleaning comprises any one or more of data deduplication, missing value processing, data conversion, and outlier processing;
S2.3: after the source data has been cleaned, the blockchain node judges, according to a data judgment rule, whether the cleaning result of the source data satisfies the passing condition in the data judgment rule; if so, the source data is marked as first data to be processed and serves as the source data after data preprocessing;
if the passing condition in the data judgment rule is not satisfied, the operation stops, the cleaned source data is marked as first data to be confirmed, and the first data to be confirmed is stored in the blockchain network;
the data verification rule, the data cleaning rule, and the data judgment rule are data preprocessing rules, and the data preprocessing rules are preset by the blockchain node.
3. A method according to claim 1 or 2, characterized in that step S3 comprises the sub-steps of:
S3.1: the blockchain node acquires the source data after data preprocessing;
S3.2: based on a data combination rule stored as evidence by consensus in the blockchain network, the blockchain node combines the minimum data units of the source data after data preprocessing to obtain the data to be verified, and stores the data to be verified in the blockchain network; the minimum data unit is the combination of a field and its field value in a data record of a data table of the source data.
4. The method according to claim 1, characterized in that step S4 comprises the sub-steps of:
S4.1: the data to be verified is compared based on the smart contract in one of the following two modes:
mode 1: all blockchain nodes act as executing nodes, each comparing the data to be verified against the multi-source verification rule based on the smart contract;
mode 2: at least one blockchain node acts as an executing node and compares the data to be verified against the multi-source verification rule based on the smart contract; at least one blockchain node acts as a participating node, which only confirms, based on the smart contract, whether the multi-source verification rule comparison performed by the executing node meets the requirements and does not itself perform the comparison on the data to be verified;
S4.2: if the data to be verified satisfies the multi-source verification rule, the consensus of the corresponding blockchain node passes; the multi-source verification rule is configured according to a key comparison field and is stored in the blockchain network; the key comparison field uniquely identifies the data to be verified.
5. The method according to claim 4, wherein determining whether the data to be verified is trusted data in step S5 comprises:
if the consensus passing rate is greater than or equal to a first preset value, the data to be verified is judged to be trusted data; otherwise, the data to be verified is judged to be untrusted data; after being marked with the judgment result, the data to be verified is stored in the blockchain;
if the consensus passing rate is between a second preset value and the first preset value and the participants whose consensus passed include an authoritative node, the data to be verified is judged to be trusted data; the authoritative node is one or more of the blockchain nodes, namely the blockchain node at which the original production source of the data corresponding to at least one key comparison field is located; the first preset value is greater than the second preset value;
if the consensus passing rate is between the second preset value and the first preset value and the participants whose consensus passed do not include an authoritative node, the data to be verified is judged to be second data to be confirmed;
and if the consensus passing rate is not greater than the second preset value, the data to be verified is judged to be untrusted data.
6. The method according to claim 5, wherein in step S5, determining whether the data to be verified is trusted data, further comprises:
the authoritative node receives the second data to be confirmed and, under smart contract control, judges whether the second data to be confirmed is trusted data according to a verification method for second data to be confirmed that is stored in the blockchain network, obtains an authoritative verification result, and broadcasts the authoritative verification result to the blockchain network for consensus; after the consensus passes, the trusted data corresponding to the second data to be confirmed and the production information related to that trusted data are stored in the blockchain network; the verification method for second data to be confirmed is formulated by the authoritative node and is stored as evidence in the blockchain network after passing consensus.
7. The method according to claim 1, characterized in that step S5 further comprises the sub-steps of:
when the data to be verified is trusted data, the multi-source trusted data is processed using a preset algorithm to obtain production information related to the multi-source trusted data, and part or all of the obtained production information is then stored in the blockchain network; the related production information includes: data source information, source data processing rule information, data processing information related to the source data, and blockchain network consensus information;
the preset algorithm comprises one or more of the following algorithms:
a hash algorithm, in which the input information is combined and then processed by the hash algorithm to obtain a hash value;
or a Merkle hash tree algorithm, in which the input information is arranged in chronological order to construct a Merkle hash tree and the corresponding root hash value is output.
8. A blockchain node, comprising:
a data source identification unit, configured to identify data sources and acquire source data from a plurality of data sources to obtain a source data set;
a data preprocessing unit, configured to perform data preprocessing on each source data in the source data set, the data preprocessing comprising source data verification and source data cleaning;
a data combination unit, configured to extract, from each source data in the preprocessed source data set, the minimum data units of that source data and combine them to obtain data to be verified;
a data verification unit, configured to compare the data to be verified based on a smart contract, wherein the consensus passes if the data to be verified satisfies the multi-source verification rule;
a consensus unit, configured to obtain the consensus passing rate of all blockchain nodes participating in the consensus and judge whether the data to be verified is trusted data;
and a storage unit, configured to store the trusted data and the production information related to the trusted data in the blockchain if the consensus verification of the consensus unit passes.
9. A blockchain system, comprising:
a client, configured to send a data source to a blockchain node in the blockchain network;
a blockchain node, configured to: identify data sources and acquire source data from a plurality of data sources to obtain a source data set; perform data preprocessing on each source data in the source data set, the data preprocessing comprising source data verification and source data cleaning; extract, from each source data in the preprocessed source data set, the minimum data units of that source data and combine them to obtain data to be verified; compare the data to be verified based on a smart contract, wherein the consensus passes if the data to be verified satisfies the multi-source verification rule; and then obtain the consensus passing rate of all blockchain nodes participating in the consensus, judge whether the data to be verified is trusted data, and, if so, instruct the blockchain network to store the trusted data and the production information related to the trusted data.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when run on a processor, causes the processor to perform the method of any one of claims 1-7.
CN202311189157.9A 2023-09-14 2023-09-14 Multi-source trusted data production method based on blockchain, blockchain node and system Pending CN117171812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311189157.9A CN117171812A (en) 2023-09-14 2023-09-14 Multi-source trusted data production method based on blockchain, blockchain node and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311189157.9A CN117171812A (en) 2023-09-14 2023-09-14 Multi-source trusted data production method based on blockchain, blockchain node and system

Publications (1)

Publication Number Publication Date
CN117171812A true CN117171812A (en) 2023-12-05

Family

ID=88931625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311189157.9A Pending CN117171812A (en) 2023-09-14 2023-09-14 Multi-source trusted data production method based on blockchain, blockchain node and system

Country Status (1)

Country Link
CN (1) CN117171812A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination