CN117353891A - Data transaction platform of decentralization - Google Patents

Publication number: CN117353891A
Authority: CN (China)
Prior art keywords: data, metadata, sold, transaction, seller
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202311183915.6A
Other languages: Chinese (zh)
Inventors: 郭嘉丰, 于雷, 廖华明, 程学旗
Current Assignee: Institute of Computing Technology of CAS (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Institute of Computing Technology of CAS
Application filed by: Institute of Computing Technology of CAS
Priority: CN202311183915.6A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0442 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
    • H04L63/18 Network architectures or network communication protocols for network security using different networks or channels, e.g. using out of band channels
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3247 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving digital signatures
    • H04L9/40 Network security protocols
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/56 Financial cryptography, e.g. electronic payment or e-cash
    • H04L2209/72 Signcrypting, i.e. digital signing and encrypting simultaneously
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/102 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00 applying security measure for e-commerce

Abstract

The invention provides a decentralized data transaction platform comprising: a metadata management module configured to acquire metadata corresponding to data to be sold by a seller and encrypted data corresponding to that data, wherein the metadata comprises data description information and a seller address; a data duplicate-checking module configured to perform duplicate checking without decryption, based on the encrypted data corresponding to the data and the encrypted data corresponding to existing data, and to determine the repetition rate of the data to be sold, wherein the repetition rate affects the score of the data; and a data transaction module configured to, after a buyer decides to purchase the data from the seller according to the metadata and the score of the data to be sold, establish a smart-contract-based data transmission payment protocol between the seller and the buyer and transmit the data through an encrypted channel.

Description

Data transaction platform of decentralization
Technical Field
The invention relates to the field of computers, in particular to the technical field of data transaction, and more particularly relates to a decentralised data transaction platform.
Background
Current centralized data transaction models mainly include a hosting mode and an aggregation mode. In the hosting mode, a data owner hosts its data on a transaction platform. The significant problem is that the platform gains control of the data, and the data owner may lose control over how the data is used and traded, which exposes the user to the risk of data leakage. Where data sensitivity and privacy requirements are high, the hosting mode carries even greater risk. The aggregation mode, by contrast, is a relatively popular data transaction model that alleviates the drawbacks of the hosting mode. In the aggregation mode, the data owner connects the data to the platform through an API without hosting the data on the transaction platform, and can therefore better control the use and trading of the data. However, even in the aggregation mode, the platform may retain the data as it passes through, which poses a risk to data security. Furthermore, the single point of failure in the aggregation mode is a non-negligible issue.
Although centralized data transaction models have been continuously refined in practice, various problems and challenges remain. First, buyers and sellers are difficult to match. The transaction flow of a centralized data transaction platform usually has to be confirmed and matched through the platform as intermediary, which results in low transaction efficiency and slow transactions. Second, the fairness of data transactions is difficult to guarantee. Operators of centralized data transaction platforms often hold relatively high privileges and can influence the fairness of data transactions by modifying data traffic, setting prices, limiting data access, and so on; moreover, once a buyer has bought bad data, it is difficult for the buyer to seek redress, which is also unfair. Third, data resale is difficult to prevent. On a centralized data transaction platform, once data is sold, it is difficult for the seller to control its further use and dissemination. Because copying and transmitting data costs little, secondary resale of data is common. Fourth, data privacy is difficult to protect. On a centralized data transaction platform, user data often has to be uploaded to a platform server, which creates a risk that data privacy is compromised. In addition, platform operators often have access to user data and may violate user privacy by misusing those access rights. In summary, data transactions on a traditional centralized platform suffer from the following defects: buyers and sellers are difficult to match, the fairness of data transactions is difficult to ensure, data resale is difficult to prevent, and data privacy is difficult to protect.
Therefore, there is a need to further explore and develop a safer, more efficient and more reliable decentralized data transaction mechanism, in order to address the drawbacks of the centralized platforms described above and to meet the needs of various application scenarios.
As data transactions have grown, conventional centralized data transaction platforms have run into many problems, including difficulty guaranteeing transaction fairness, difficulty preventing data resale, and difficulty protecting data privacy. These problems have become pressing for current data transaction platforms.
To solve these problems, decentralized data transaction platforms have attracted attention in recent years. A decentralized data transaction platform adopts a decentralized architecture based on blockchain and smart contract technology, which makes the transaction process safer and fairer. However, current decentralized data transaction platforms do not completely solve the above problems. There are few competing products on the market; the main one is a data transaction platform based on the decentralized data exchange protocol Ocean Protocol.
However, this platform does not solve the problems mentioned above:
1. The platform's transaction process is unfair. When a user uploads data, the platform requires a data sample to be provided for buyers to inspect. However, the data sample may be artificially embellished, so that buyers end up buying data that is inconsistent with the sample. In addition, once a transaction on the platform is finalized it cannot be changed, which leaves buyers at an information disadvantage and leads to unfairness.
2. The platform lacks data privacy protection measures, so data privacy cannot be protected. The platform relies on a provider module (Provider) to upload data and requires the user to upload a plaintext URL whose content can be downloaded by anyone. Although the platform claims that the URL is encrypted after publication, there is still a risk of leakage while the URL is being uploaded to the provider module.
Accordingly, there is a need for improvements over the prior art.
Disclosure of Invention
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a data transaction platform that is decentralised.
The object of the invention is achieved by the following technical solution:
According to an embodiment of the present invention, there is provided a decentralized data transaction platform, comprising: a metadata management module configured to acquire metadata corresponding to data to be sold by a seller and encrypted data corresponding to that data, wherein the metadata comprises data description information and a seller address; a data duplicate-checking module configured to perform duplicate checking without decryption, based on the encrypted data corresponding to the data and the encrypted data corresponding to existing data, and to determine the repetition rate of the data to be sold, wherein the repetition rate affects the score of the data; and a data transaction module configured to, after a buyer decides to purchase the data from the seller according to the metadata and the score of the data to be sold, establish a smart-contract-based data transmission payment protocol between the seller and the buyer and transmit the data through an encrypted channel.
Optionally, the encrypted data corresponding to the data is one-way encrypted data that reflects the content of the data.
Optionally, the data transaction module is configured to: establish, based on the data transmission payment protocol, a transaction mechanism between the buyer and the seller that provides point-to-point segmented data transmission, per-segment verification, per-segment payment and resumable transmission from a breakpoint.
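By way of illustration only, the segmented transmission, per-segment verification, per-segment payment and breakpoint resume described above might be sketched as follows. The class `BuyerTransfer`, the helper `split_segments`, the use of SHA-256 digests committed to by the seller in advance, and the fixed price per segment are all assumptions made for this sketch; the description does not specify how segments are verified or priced.

```python
import hashlib


def split_segments(data: bytes, size: int) -> list[bytes]:
    """Cut the payload into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BuyerTransfer:
    """Buyer-side state for one order (hypothetical structure)."""

    def __init__(self, segment_digests: list[str], price_per_segment: int):
        # Assumption: the seller publishes one SHA-256 digest per segment up front.
        self.segment_digests = segment_digests
        self.price_per_segment = price_per_segment
        self.received: list[bytes] = []
        self.paid = 0

    def next_index(self) -> int:
        # Breakpoint resume: continue from the first segment not yet accepted.
        return len(self.received)

    def accept(self, index: int, segment: bytes) -> bool:
        """Verify one segment; pay for it only if it is the expected, intact one."""
        if index != self.next_index():
            return False
        if hashlib.sha256(segment).hexdigest() != self.segment_digests[index]:
            return False
        self.received.append(segment)
        self.paid += self.price_per_segment
        return True


data = b"example dataset payload"
segments = split_segments(data, 8)
digests = [hashlib.sha256(s).hexdigest() for s in segments]

transfer = BuyerTransfer(digests, price_per_segment=10)
transfer.accept(0, segments[0])
rejected = transfer.accept(1, b"corrupted")  # fails verification, nothing is paid
# After an interruption, transmission resumes from transfer.next_index().
for i in range(transfer.next_index(), len(segments)):
    transfer.accept(i, segments[i])
```

Because payment advances only when a segment verifies, neither party is exposed for more than one segment at a time under these assumptions.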
Optionally, the data transaction module is configured to: during the segmented transmission, in response to a request from the seller or the buyer to terminate the transaction, terminate the transaction and settle it based on the ratio of the amount of data already transmitted to the total amount of data.
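As a minimal sketch of such proportional settlement, assuming the buyer has escrowed the full price in advance and that amounts are integers in the smallest currency unit (both assumptions, as are the function name `settle` and the choice to round in the buyer's favor):

```python
def settle(total_price: int, bytes_sent: int, total_bytes: int) -> tuple[int, int]:
    """Split an escrowed price between seller and buyer pro rata.

    Returns (amount paid to the seller, amount refunded to the buyer).
    """
    if total_bytes <= 0 or not 0 <= bytes_sent <= total_bytes:
        raise ValueError("invalid transfer amounts")
    # Integer division keeps the arithmetic exact; any remainder stays with the buyer.
    seller_amount = total_price * bytes_sent // total_bytes
    return seller_amount, total_price - seller_amount


quarter = settle(1000, 250, 1000)
third = settle(100, 1, 3)
```

For example, terminating after a quarter of the data has been transmitted pays the seller a quarter of the escrowed price and refunds the rest.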
Optionally, when the data to be sold is a document-type data product, the encrypted data corresponding to the data to be sold is obtained as follows: the data to be sold undergoes document preprocessing, which comprises removing stop words and punctuation marks and then segmenting the resulting document into words to obtain a word segmentation result; a hash signature is then computed from the word segmentation result using the MinHash algorithm, and the hash signature is used as the encrypted data corresponding to the data to be sold.
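The preprocessing and MinHash steps above can be illustrated with the following sketch. It is not the implementation of the invention: the stop-word list, the regular-expression tokenizer (adequate only for space-delimited text; a dedicated word segmenter would be needed for Chinese), the use of seeded SHA-256 as the hash family, and the signature length of 64 are all assumptions.

```python
import hashlib
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}  # illustrative only


def tokenize(text: str) -> list[str]:
    """Lower-case, drop punctuation, split into words and remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def _seeded_hash(seed: int, token: str) -> int:
    digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens: list[str], num_hashes: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash over the token set."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("document has no tokens after preprocessing")
    return [min(_seeded_hash(seed, t) for t in token_set) for seed in range(num_hashes)]


def estimate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


sig_1 = minhash_signature(tokenize("The price of gold, and silver!"))
sig_2 = minhash_signature(tokenize("price gold silver"))
sig_3 = minhash_signature(tokenize("weather station rainfall records"))
```

Only the signature leaves the seller's machine in this sketch, so similarity can be estimated without access to the document text.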
Optionally, for document-type data products, the data duplicate-checking module uses a locality-sensitive hashing (LSH) algorithm to build a plurality of hash buckets from the hash signatures of the document-type data products, so that similar hash signatures fall into the same hash bucket; it then finds the hash bucket containing the hash signature of the data to be sold, computes the similarity between the hash signature of the data to be sold and that of each existing data item in the same bucket, and takes the maximum similarity as the repetition rate of the data to be sold.
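One common way to realize such bucketing is the banding technique, in which a signature is cut into bands and each band is used as a bucket key. The sketch below assumes that technique; the class name `LSHIndex`, the number of bands, and the positional-match similarity are assumptions, and the description does not state which LSH variant is used.

```python
from collections import defaultdict


def band_keys(signature: list[int], bands: int) -> list[tuple]:
    """Cut a signature into equal bands; each (band number, band values) is a bucket key."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


class LSHIndex:
    def __init__(self, bands: int):
        self.bands = bands
        self.buckets = defaultdict(set)   # bucket key -> ids of existing data
        self.signatures = {}              # id -> signature

    def add(self, doc_id: str, signature: list[int]) -> None:
        self.signatures[doc_id] = signature
        for key in band_keys(signature, self.bands):
            self.buckets[key].add(doc_id)

    def repetition_rate(self, signature: list[int]) -> float:
        """Maximum similarity against existing data sharing at least one bucket."""
        candidates = set()
        for key in band_keys(signature, self.bands):
            candidates |= self.buckets.get(key, set())
        return max((similarity(signature, self.signatures[c]) for c in candidates),
                   default=0.0)


index = LSHIndex(bands=2)
index.add("a", [1, 2, 3, 4])
index.add("b", [9, 9, 9, 9])
partial = index.repetition_rate([1, 2, 7, 8])   # shares the first band with "a"
unseen = index.repetition_rate([5, 6, 7, 8])    # shares no bucket with anything
```

Only items that share a bucket are compared, which is what keeps the check cheap as the number of registered products grows.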
Optionally, when the data to be sold is a tabular data product, the encrypted data corresponding to the data to be sold is obtained as follows: the table undergoes table preprocessing, which comprises removing blank rows and blank columns and then sorting the remaining table according to a preset row-column sorting rule to obtain a sorted table; the sorted table is split into a plurality of sub-tables according to a preset splitting mode and the input requirements of a Bloom filter, each sub-table being one table item within a row; and the sub-tables of each row are inserted into the Bloom filter, which outputs a bitmap corresponding to the table, and this bitmap is used as the encrypted data corresponding to the data to be sold.
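A minimal sketch of this table-to-bitmap step follows. The concrete choices are assumptions made for illustration: rows are sorted lexicographically as the "preset sorting rule", each inserted item is a cell tagged with its column index, the filter has 256 bits and 3 seeded SHA-256 hash functions, and the bitmap is held in a Python integer.

```python
import hashlib


def preprocess(table: list[list[str]]) -> list[list[str]]:
    """Remove blank rows and blank columns, then sort the rows."""
    rows = [r for r in table if any(c.strip() for c in r)]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]           # pad ragged rows
    keep = [j for j in range(width) if any(r[j].strip() for r in rows)]
    rows = [[r[j].strip() for j in keep] for r in rows]
    return sorted(rows)                                          # assumed sorting rule


def bloom_bitmap(table: list[list[str]], m: int = 256, k: int = 3) -> int:
    """Insert every cell of the preprocessed table into a Bloom filter of m bits."""
    bitmap = 0
    for row in preprocess(table):
        for col, cell in enumerate(row):
            item = f"{col}:{cell}".encode()                      # assumed item format
            for seed in range(k):
                digest = hashlib.sha256(bytes([seed]) + item).digest()
                bitmap |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bitmap


t1 = [["alice", "30"], ["", ""], ["bob", "25"]]
t2 = [["bob", "25"], ["alice", "30"]]
t3 = [["alice", "", "30"], ["bob", "", "25"]]
bitmaps = [bloom_bitmap(t) for t in (t1, t2, t3)]
```

In this sketch, tables that differ only in blank rows, blank columns or row order produce the same bitmap, so superficial edits do not hide a copy.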
Optionally, for tabular data products, the data duplicate-checking module computes the similarity between the bitmap of the data to be sold and the bitmap of each existing data item, and takes the maximum similarity as the repetition rate of the data to be sold.
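The description does not name the similarity measure applied to the bitmaps. The sketch below assumes Jaccard similarity over the set bits, with bitmaps held as Python integers; the function names are hypothetical.

```python
def bitmap_similarity(a: int, b: int) -> float:
    """Jaccard similarity of two bitmaps: shared set bits over all set bits."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def repetition_rate(new_bitmap: int, existing_bitmaps: list[int]) -> float:
    """Maximum similarity between the new bitmap and every existing bitmap."""
    return max((bitmap_similarity(new_bitmap, e) for e in existing_bitmaps),
               default=0.0)


rate = repetition_rate(0b1111, [0b0011, 0b0111])
```

Here the second existing bitmap shares three of four set bits with the new one, so the repetition rate is 0.75.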
Optionally, the data transaction platform uses blockchain smart contracts to register, delist and query metadata, to create orders, to verify payments and settle transactions, to check the validity of message signatures, and to evaluate data.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of a decentralized data transaction platform according to one embodiment of the invention;
FIG. 2 is a schematic diagram of an implementation structure of a data transaction platform with decentralization according to an embodiment of the invention;
FIG. 3 is a flow chart of uploading data from a seller and retrieving data from a buyer on a decentralized data transaction platform according to an embodiment of the invention;
FIG. 4 is a flow diagram of buyer retrieval data, seller publication metadata and seller off-shelf metadata according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an interface of a registry contract according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a smart contract implementation Registry interface, according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data transaction according to an embodiment of the present invention.
Detailed Description
For the purpose of making the technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by way of specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As mentioned in the Background section, existing data transaction platforms suffer from unfair transaction processes and a lack of data privacy protection measures. Designing and implementing a data transaction platform with fair transactions and secure privacy is of great practical significance, because both traditional centralized platforms and existing decentralized platforms still have many problems with data privacy security and transaction fairness. Security here covers several aspects: data storage security, data transmission security, user fund security, user data rights security, and user data privacy security. To this end, the invention constructs a decentralized data transaction platform. Referring to FIG. 1, the platform obtains, through the metadata management module, only the metadata corresponding to the data uploaded by the seller and the encrypted data corresponding to that data; the original data need not be uploaded to the transaction platform, which ensures data storage security and user data privacy security. In addition, the platform does not decrypt the encrypted data it obtains, and the data duplicate-checking module checks for duplicates directly on the encrypted data, which ensures user data rights security. Finally, after the seller and the buyer agree on a transaction, the data transaction module establishes a data transmission payment protocol between them and the data is transmitted over an encrypted channel, which ensures data transmission security and user fund security. A data transaction platform that is fair and protects data privacy is thus provided for buyers and sellers.
According to an embodiment of the present invention, referring to FIG. 2, the present invention provides a decentralized data transaction platform whose architecture includes four layers, namely a front end, a functional layer, a middleware layer and a smart contract layer. Wherein:
The first layer is the front end, which provides the user's interactive interface. The interface comprises four interactive functions: data uploading, data delisting, data querying and data transaction.
The second layer is the functional layer, which comprises a metadata management module, a data duplicate-checking module and a data transaction module. Wherein:
The metadata management module is configured to: obtain metadata corresponding to data to be sold by a seller and encrypted data corresponding to that data, wherein the metadata comprises data description information and a seller address. The metadata management module is responsible for functions including metadata uploading, metadata delisting, metadata querying and confirmation of data ownership.
The data duplicate-checking module is configured to: perform duplicate checking without decryption, based on the encrypted data corresponding to the data and the encrypted data corresponding to existing data, and determine the repetition rate of the data to be sold, wherein the repetition rate affects the score of the data. The data duplicate-checking module is responsible for functions including data preprocessing, text word segmentation, the Bloom filter, the MinHash algorithm and the LSH algorithm.
The data transaction module is configured to: after the buyer decides to purchase the data from the seller according to the metadata and the score of the data to be sold, establish a data transmission payment protocol between the seller and the buyer and transmit the data over an encrypted channel. The data transaction module achieves transaction fairness through data segmentation and smart contracts over a TLS-encrypted channel. The data transmission payment protocol supports functions such as resumable (breakpoint) transmission, forced settlement and payment verification. Forced settlement means that either party can terminate the transaction while it is in progress, and the platform settles according to the amount of data transmitted. Payment verification means that the seller's server can automatically verify the validity of the buyer's payment without the buyer providing additional proof.
The third layer is the middleware layer, comprising an Elasticsearch database and a blockchain-based distributed file storage system (InterPlanetary File System, IPFS for short). IPFS is used to store the metadata corresponding to the data and/or the encrypted data corresponding to the data; the Elasticsearch database is used to build an inverted index and to support data queries. Of course, those skilled in the art may also adopt other similar schemes, for example storing the metadata corresponding to the data and/or the encrypted data corresponding to the data in a Redis database.
The fourth layer is the smart contract layer, which is the foundation of the whole platform and mainly comprises a registry contract, a signature verification contract and an order book contract. The registry contract manages the metadata information of the entire platform and provides contract support for metadata registration, delisting and querying. The signature verification contract provides contract support for verifying the validity of message signatures. The order book contract records the order status of the entire platform and provides contract support for order creation, credit collateral, payment verification and transaction settlement.
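To make the role of the registry contract concrete, the following is a plain in-memory Python model of its register, delist and query operations. It is a sketch only, not contract code: the class and method names and the owner check on delisting are assumptions, while the use of the array index as the product identifier follows the docID convention described later in this document.

```python
class Registry:
    """In-memory stand-in for the registry contract (hypothetical interface)."""

    def __init__(self):
        self._entries = []  # position in this list serves as the docID

    def register(self, owner: str, metadata_hash: str) -> int:
        """Record a metadata hash for a seller and return its docID."""
        self._entries.append({"owner": owner, "hash": metadata_hash, "active": True})
        return len(self._entries) - 1

    def delist(self, caller: str, doc_id: int) -> None:
        """Mark an entry inactive; only the registering seller may do so (assumed rule)."""
        entry = self._entries[doc_id]
        if entry["owner"] != caller:
            raise PermissionError("only the seller who registered the data may delist it")
        entry["active"] = False

    def query(self, doc_ids: list[int]) -> list[str]:
        """Return the metadata hashes of the requested entries that are still listed."""
        return [self._entries[i]["hash"] for i in doc_ids if self._entries[i]["active"]]


registry = Registry()
first = registry.register("seller-1", "hash-A")
second = registry.register("seller-2", "hash-B")
registry.delist("seller-1", first)
listed = registry.query([first, second])
```

Delisting here only flips a flag, so the history of registrations stays intact, which is in keeping with an append-only ledger.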
The functions of the platform in use are described below in three stages.
According to one embodiment of the invention, the data transaction platform is configured as follows:
in the data product registration stage, the steps performed include:
the metadata management module obtains the metadata corresponding to the data to be sold uploaded by the seller and the encrypted data corresponding to that data;
the data duplicate-checking module checks the newly uploaded data for duplicates against the historically uploaded data to obtain the repetition rate;
the metadata management module sends the encrypted data corresponding to the data that has passed duplicate checking to the distributed file storage system for storage;
the metadata management module sends the metadata corresponding to the data that has passed duplicate checking to the blockchain-based distributed file storage system for storage, and obtains the hash value that the distributed file storage system returns after storing the metadata;
the data is registered in the registry contract based on that hash value, yielding the identity information corresponding to the data;
an inverted index is built in the Elasticsearch database from the identity information and the data description information corresponding to the data, and a search service is provided;
in the data product screening stage, the steps performed include:
the buyer enters keywords to query data, and the metadata management module calls the search service provided by the Elasticsearch database to find related data and obtain the identity information corresponding to the related data;
the list of hash values corresponding to the related data is looked up in the registry contract according to the identity information corresponding to the related data;
the corresponding metadata list is retrieved from the distributed file storage system according to the list of hash values, and the score of each metadata entry in the list is queried through an evaluation contract;
the metadata entries in the list are sorted by score and returned to the buyer, who screens the required data according to the metadata list and the scores;
in the data product transaction stage, the steps performed include:
after screening out the required data, the buyer contacts the seller using the seller address in the corresponding metadata;
the buyer and the seller carry out data transmission and the transaction through the data transmission payment protocol;
after the data required by the buyer has been transmitted, the buyer gives a personal evaluation of the data, and this evaluation affects the score of the data.
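The screening stage above chains four lookups: keyword to identity information, identity information to hash value, hash value to metadata, and identity information to score. The sketch below models that chain with plain dictionaries standing in for the Elasticsearch index, the registry contract, the distributed file storage system and the evaluation contract; all names and sample values are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(descriptions: dict[int, str]) -> dict[str, set[int]]:
    """Map each word of a data description to the ids of the data containing it."""
    index = defaultdict(set)
    for doc_id, text in descriptions.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def screen(keyword: str, index: dict, registry: dict, storage: dict, scores: dict) -> list[dict]:
    """Return the metadata of matching data, highest score first."""
    doc_ids = sorted(index.get(keyword.lower(), set()))         # search service
    scored = []
    for doc_id in doc_ids:
        metadata_hash = registry[doc_id]                        # registry contract lookup
        metadata = storage[metadata_hash]                       # distributed storage lookup
        scored.append((scores.get(doc_id, 0.0), metadata))      # evaluation contract lookup
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [metadata for _, metadata in scored]


descriptions = {0: "weather data beijing", 1: "stock price data"}
index = build_inverted_index(descriptions)
registry = {0: "hash-0", 1: "hash-1"}
storage = {"hash-0": {"info": descriptions[0]}, "hash-1": {"info": descriptions[1]}}
scores = {0: 3.5, 1: 4.8}

results = screen("data", index, registry, storage, scores)
```

Note that the index holds only identifiers, so the description text itself is fetched from storage at query time rather than duplicated in the search layer.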
Referring to fig. 3, a flow example of uploading data by a seller and retrieving data by a buyer is described below. FIG. 3 is a block diagram of a decentralized data transaction platform illustrating the components contained within the platform and the manner in which the components cooperate. In the figure, solid lines represent seller data streams, dashed lines represent buyer data streams, dotted lines represent affiliations (e.g., smart contracts and blockchains), and arrows represent data streams.
According to one example of the present invention, a de-centralized data transaction platform is deployed on a server, wherein the process of uploading data by a seller and retrieving data by a buyer includes:
1. the vendor needs to provide a server with a detailed copy of metadata in JSON format, including: data description information (info field), vendor contact { IP: port, corresponding to vendor address }, and encrypted data.
2. And the duplicate checking service checks the duplicate of the data and acquires a result.
3. After the duplicate checking, the server stores the encrypted data corresponding to the data into the IPFS.
4. The server stores the JSON formatted metadata in IPFS, which returns a hash value to the server.
5. The server registers the hash value in the registry intelligent contract and acquires the identity information (i.e. docID) corresponding to the data. The docID refers to the subscript of the data product in the smart contract array, has uniqueness, and can be recorded in the smart contract and cannot be tampered if the product is suspected to be resale.
6. The server writes the Info field in docID and JSON into the elastic search database to build an inverted index, and in order to improve the index efficiency and reduce the storage rate, the elastic search database does not store the Info field, and the Info field is only used for building the index.
7. After the seller registers the product through steps 1-6, the buyer can query the data, for example: the server is queried for specific data using the keywords.
8. The server obtains the relevant docID list from the Elasticsearch database.
9. A hash value list is obtained from the relevant docID list, for example: the server queries the registry smart contract (i.e., registry contract) for a list of hash values based on the associated docID list.
10. The server queries IPFS to obtain the metadata list, then queries the corresponding scores through the evaluation smart contract (i.e., the evaluation contract), sorts the list, and returns it to the user. If a product is suspected of being resold (whether the product is non-original or of low originality is determined from the repetition rate), it is ranked far down in the displayed ordering, labeled, and linked to the original data.
11. The buyer performs a P2P data transaction with the seller, for example: after screening out the desired product, the buyer contacts the seller through the seller address {IP:port} and performs a point-to-point data transaction via the data transmission payment protocol.
12. Data quality evaluation: after the data transmission is finished, the buyer can evaluate the data.
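The publish and retrieval steps above can be sketched with in-memory stand-ins. This is a minimal illustration, not the platform's implementation: FakeIPFS, Platform, and every method name are hypothetical, a Python list stands in for the registry contract array, a dictionary stands in for the Elasticsearch inverted index, and duplicate checking (step 2) is omitted.

```python
import hashlib
import json


class FakeIPFS:
    """In-memory stand-in for IPFS: content is addressed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}

    def add(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs[digest] = content
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


class Platform:
    """Hypothetical server wiring the stand-ins together."""

    def __init__(self):
        self.ipfs = FakeIPFS()
        self.registry = []   # stand-in for the registry contract; index == docID
        self.index = {}      # stand-in inverted index: keyword -> set of docIDs
        self.scores = {}     # stand-in for the evaluation contract: docID -> score

    def publish(self, metadata: dict, encrypted_data: bytes) -> int:
        self.ipfs.add(encrypted_data)                                   # step 3
        meta_hash = self.ipfs.add(
            json.dumps(metadata, sort_keys=True).encode("utf-8"))       # step 4
        self.registry.append(meta_hash)                                 # step 5
        doc_id = len(self.registry) - 1
        for word in metadata["info"].lower().split():                   # step 6
            self.index.setdefault(word, set()).add(doc_id)
        self.scores[doc_id] = 0.0
        return doc_id

    def evaluate(self, doc_id: int, score: float) -> None:
        self.scores[doc_id] = score                                     # step 12

    def search(self, keyword: str) -> list:
        doc_ids = self.index.get(keyword.lower(), set())                # step 8
        hits = []
        for doc_id in doc_ids:
            meta_hash = self.registry[doc_id]                           # step 9
            metadata = json.loads(self.ipfs.get(meta_hash))             # step 10
            hits.append((self.scores[doc_id], doc_id, metadata))
        hits.sort(key=lambda h: (-h[0], h[1]))   # highest score first
        return [(doc_id, metadata) for _, doc_id, metadata in hits]
```

Only the info field is indexed, and the full metadata is always read back from the content-addressed store, which mirrors the division of labor described in steps 6 and 10.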
Embodiments of the metadata management module, the data duplicate-checking module, and the data transaction module in the functional layer are described in more detail below.
1. Metadata management module
Metadata management is an important component in a decentralized data transaction platform, and mainly provides metadata management functions, namely: is responsible for managing the metadata information of the seller and providing the buyer with a keyword retrieval function. In addition, the vendor address and data need to be bound to determine the data ownership.
The metadata management function requires the platform to support publishing, delisting (taking off the shelf), and retrieval of user metadata. Publishing metadata involves writing to the blockchain, storing with IPFS, and building an inverted index; the inverted index makes retrieval efficient.
Three aspects of functional architecture schematic design, storage and index design, and design of registry contracts are described below.
1A. Schematic design of functional Structure
According to one embodiment of the present invention, the metadata management module mainly includes the following 6 classes of functions:
The core class (MetadataManager) of the metadata management module is responsible for handling user requests and managing metadata.
An IPFS client class (IPFSClient) is responsible for interacting with the IPFS, uploading, obtaining metadata and corresponding encrypted data.
A smart contract client class (smartContractClient) is responsible for interacting with smart contracts, storing and retrieving hash values of IPFS.
An Elasticsearch client class (ESClient), responsible for interacting with the Elasticsearch database.
Metadata class (Metadata) for representing Metadata stored in a platform.
Metadata index class (metadata index) for building and querying inverted indexes.
Fig. 4a shows a flow of buyer retrieval data, the illustrative steps comprising:
1. The buyer initiates a search request to the server, for example: the buyer searches for keywords of interest.
2. The server queries the Elasticsearch database to obtain identity information, for example: the server queries the Elasticsearch database with the keywords to obtain a list of docIDs (i.e., identity information; the docID is also the Elasticsearch index id and is unique) in order of decreasing match.
3. The server queries the smart contract based on the docIDs, for example: if the docID list is not empty, relevant data exists on the platform, and the server looks up the smart contract with the docID list to obtain a list of IPFS hashes.
4. The server retrieves the metadata list from the IPFS and returns it to the user.
5. The user further selects data of interest based on the metadata description and pricing.
Fig. 4b shows a schematic flow chart of vendor publishing metadata. According to one example of the present invention, a process for a seller to publish data includes:
1. The seller uploads the encrypted data and the metadata, and the server performs duplicate checking on the data.
2. If the check passes, the metadata and the encrypted data are stored in IPFS and the hash value is obtained; if it fails, error information is returned. For example: if the repetition rate found by the duplicate check is lower than a preset repetition-rate threshold, the check is considered passed; otherwise it is considered failed and registration is not performed.
3. The server writes the hash value to the blockchain, for example: the server writes the hash value to the intelligent contract of the blockchain and obtains the subscript (i.e., docID in the figure) of the intelligent contract, which has uniqueness.
4. The server enters the docID and metadata into the Elasticsearch database to build an inverted index.
According to one embodiment of the present invention, metadata is delisted (taken off the shelf) in the following manner: the seller uploads the metadata to be delisted; the data duplicate-checking module queries the metadata that best matches it; and, if the seller is determined to be the data owner, the best-matching metadata is delisted.
Fig. 4c shows a schematic flow chart of a seller delisting metadata. According to one example of the invention, the flow for a seller to delist metadata includes:
1. The user uploads the metadata that needs to be delisted.
2. The server queries the Elasticsearch database to obtain the docID of the best matching item.
3. The server queries the smart contract for docID to obtain the IPFS store hash.
4. The server obtains the metadata through IPFS.
5. The server determines whether the metadata exists and whether the user is the data owner; if so, the corresponding items in IPFS and the Elasticsearch database are deleted. For example: the server checks the correctness of the data and whether the user holds the property rights to the data; if both conditions hold, the corresponding items in ES (i.e., the Elasticsearch database), IPFS, and the contract are deleted one by one; otherwise an error is returned.
1B storage and index design
The platform needs to guarantee the integrity and availability of the metadata store. The safest way is to store the metadata directly on a large public blockchain (e.g., Ethereum); however, taking transaction costs into account, a better way is to store the metadata in IPFS and store the hashes returned by IPFS on the public blockchain.
The Elasticsearch database is itself implemented with an inverted index, but its default setting of storing the search field adds index overhead. Therefore, the store variable of the search field can be set to false when the index is created, so that the Elasticsearch database no longer stores the search field, improving indexing efficiency.
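As a hedged illustration of such an index definition, the request body below marks the search field as indexed but not stored. The field names info and doc_id are hypothetical. In Elasticsearch, store is a per-field mapping parameter, and the original document is separately retained in _source unless a field is excluded; whether the platform also excludes the field from _source is an assumption made here, not something the text states.

```python
import json

# Hypothetical index body; field names are illustrative only.
index_body = {
    "mappings": {
        # Assumption: also keep the info text out of the retained _source document.
        "_source": {"excludes": ["info"]},
        "properties": {
            # Indexed for full-text search, with no separately stored copy.
            "info": {"type": "text", "store": False},
            # Identifier returned to the server on a match.
            "doc_id": {"type": "long"},
        },
    }
}

print(json.dumps(index_body, indent=2))
```

With this shape a keyword query can still match on info, while the description text itself has to be fetched from IPFS via the registry contract.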
1C design of registry contracts
Referring to fig. 5, the interface in fig. 5 defines a method and a data type of a Registry interface, and the Registry interface is named as IRegistry (interface IRegistry).
Product structure: representing a data product containing a hash value of the uint256 type and a status code of the uint8 type. The status code is used to indicate whether the metadata is valid or invalid.
register () interface: for registering a new data product with the Registry contract.
delist () interface: for delisting (taking off the shelf) a registered data product.
query () interface: for querying all valid data products.
The smart contract in FIG. 6 implements a Registry interface and defines some private methods and private variables, including:
Product[] private productList: an array of Product type that stores all registered data products. The array is private and can only be used inside the smart contract.
mapping(uint256 => address) private productSellers: a mapping from hash values to seller addresses, used to optimize query performance. The mapping is private and can only be used inside the contract.
function getProductPosition(uint256 _hash) private view returns (uint256): a private view method; view means that the method does not change the contract state, i.e., does not consume gas fees. The method is used to query the position in the productList array of the Product corresponding to a given hash value. It looks up the productSellers mapping by the hash value; if the seller address corresponding to the hash value is not zero, it converts the hash value into position information of type uint256 and returns it, and otherwise returns -1.
function register(uint256 _hash, uint8 _status) external override returns (bool): an external method for registering a new data product with the Registry contract. It first checks whether the hash value already exists and, if so, returns false. If not, it adds the new product to the end of the productList array and to the productSellers mapping. If the addition succeeds, it returns true.
function delist(uint256 _hash) external override returns (bool): an external method for delisting a registered data product; it requires that the caller be the seller of the data product and that the product be valid (i.e., the first bit of the status code is 1). If the delisting succeeds, the method returns true.
function query() external view override returns (Product[] memory): an external view method for querying all valid data products, i.e., the products whose status code has its first bit set to 1. It traverses the productList array, adds the valid products to a new array, and returns that array.
Both register() and delist() first need to find the position in the productList array of the Product corresponding to a given hash value, and therefore call getProductPosition(). If getProductPosition() returns -1, the hash value does not exist or has already been delisted: register() then adds a new Product to the end of the productList array and a new record to the productSellers mapping, while delist() returns false directly. If getProductPosition() returns a non-negative position, the hash value already exists: register() returns false, while delist() further checks whether the Product is valid and, if so, delists it and deletes the corresponding record from the productSellers mapping.
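The control flow of register(), delist(), and query() described above can be modeled off-chain as follows. This is a sketch of the logic only, not the Solidity contract: the caller argument stands in for msg.sender, the positions dictionary is a hypothetical helper for locating a hash in the array, and treating the lowest bit of the status code as the "valid" flag is an assumption.

```python
from dataclasses import dataclass

VALID = 1        # assumption: lowest bit of the status code marks a valid product
NOT_FOUND = -1   # sentinel mirroring the "-1" described in the text


@dataclass
class Product:
    hash: int
    status: int


class Registry:
    def __init__(self):
        self.product_list = []      # mirrors productList
        self.product_sellers = {}   # mirrors productSellers: hash -> seller address
        self.positions = {}         # hypothetical helper: hash -> index in product_list

    def get_product_position(self, h: int) -> int:
        # A hash with no seller record is unknown or already delisted.
        if h not in self.product_sellers:
            return NOT_FOUND
        return self.positions[h]

    def register(self, caller: str, h: int, status: int = VALID) -> bool:
        if self.get_product_position(h) != NOT_FOUND:
            return False                       # hash already registered
        self.product_list.append(Product(h, status))
        self.positions[h] = len(self.product_list) - 1
        self.product_sellers[h] = caller
        return True

    def delist(self, caller: str, h: int) -> bool:
        pos = self.get_product_position(h)
        if pos == NOT_FOUND:
            return False
        product = self.product_list[pos]
        if self.product_sellers[h] != caller or not (product.status & VALID):
            return False                       # only the seller may delist a valid product
        product.status &= ~VALID               # clear the valid bit
        del self.product_sellers[h]            # drop the seller record
        return True

    def query(self) -> list:
        return [p for p in self.product_list if p.status & VALID]
```

Because delisting removes the seller record, a later register() of the same hash is treated as a new product appended to the end of the array, matching the behavior described for a -1 result.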
2. Data duplicate checking module
To protect the data property rights of platform users, the platform needs to guarantee that data is uniquely bound to its real owner. To this end, a duplicate-checking algorithm for the decentralized data transaction platform is designed so that data ownership can be traced more reliably.
The data duplicate-checking module provides an in-platform duplicate-checking function with the following requirements. First, the platform should support duplicate checking of multiple data formats and should be able to identify effectively, within a limited time, whether data is derivative data obtained by limited modification of data already on the platform; the accuracy and precision of duplicate checking must be ensured. Second, the platform needs to ensure the efficiency and response time of duplicate checking; the response time is determined by the platform's existing data volume and the size of the data itself, so in this business scenario the duplicate-checking algorithm needs to be optimized to reduce system response time. Since data is divided into documents and tables, the two are described separately below:
2A design of document duplication checking algorithm
Assume that a wrongdoer will convert sentences between active and passive voice, substitute synonyms, and delete or add sentences in a document; the text duplicate-checking algorithm needs to identify these cases accurately.
k-Shingle is a text similarity calculation method that converts text into a set of substrings of length k. In this approach, every k adjacent characters in the text are treated as a shingle, and the text is represented as a set of shingles. For example, with k=3, the string "Hello World" is converted into the shingle set {"Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld"}. The similarity of two texts can then be computed from the intersection and union of their shingle sets. This algorithm, although relatively naive, has high accuracy. However, when the text becomes large, its O(n^2) time complexity can severely impact system throughput.
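A minimal sketch of character-level k-shingling and the intersection-over-union (Jaccard) comparison follows. The function names are illustrative and not taken from the platform.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of all substrings of length k (character shingles)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Similarity as |intersection| / |union|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


hello = shingles("Hello World", k=3)
print(sorted(hello))
print(jaccard(shingles("abcd", 2), shingles("abce", 2)))  # 2 shared of 4 total -> 0.5
```

Comparing a new document against every stored shingle set in this way grows with the number and size of stored documents, which is the cost that the signature-based approach below is meant to reduce.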
According to one embodiment of the present invention, when the data to be sold is a document-type data product, the encrypted data corresponding to the data to be sold is obtained as follows: document preprocessing is performed on the data to be sold, which includes removing stop words and punctuation marks and then segmenting the resulting document into words to obtain a word segmentation result; a hash signature is then calculated from the word segmentation result using the MinHash algorithm, and the hash signature is used as the encrypted data corresponding to the data to be sold. The MinHash algorithm uses a set of random permutation functions (permutations) to convert the elements of a data set into a signature vector. Each element of the signature vector is the minimum hash value of the data set's elements under the corresponding permutation function.
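The signature computation can be sketched as below. This is an illustration under stated assumptions rather than the platform's code: random permutations are approximated by linear hash functions of the form (a*x + b) mod p, tokens are assumed to be already segmented and cleaned, and all names and parameter values (64 functions, seed 42) are hypothetical.

```python
import hashlib
import random

PRIME = (1 << 61) - 1   # a Mersenne prime used as the modulus


def token_hash(token: str) -> int:
    """Stable 64-bit integer hash of a token (Python's built-in hash is salted per run)."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_perm: int = 64, seed: int = 42) -> list:
    """Draw (a, b) pairs; each pair defines one stand-in permutation."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash_signature(tokens, params) -> list:
    """For each stand-in permutation, keep the minimum hash over all distinct tokens."""
    hashed = [token_hash(t) for t in set(tokens)]
    if not hashed:
        raise ValueError("cannot sign an empty token list")
    return [min((a * x + b) % PRIME for x in hashed) for a, b in params]


def estimate_similarity(sig_a, sig_b) -> float:
    """The fraction of equal positions estimates the Jaccard similarity of the token sets."""
    return sum(1 for x, y in zip(sig_a, sig_b) if x == y) / len(sig_a)
```

Only the fixed-length signature needs to leave the seller's machine, and two signatures can be compared without access to either original document.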
For duplicate checking, the MinHash algorithm may be used together with the LSH (locality-sensitive hashing) algorithm to achieve faster and more scalable approximate similarity calculation. According to one embodiment of the invention, for document-type data products, the data duplicate-checking module builds a plurality of hash buckets from the hash signatures of the document-type data products based on a locality-sensitive hashing algorithm, so that similar hash signatures fall into the same hash bucket; it then obtains the hash bucket in which the hash signature of the data to be sold is located, calculates the similarity between the hash signature of the data to be sold and that of each existing item in that bucket, and takes the maximum similarity as the repetition rate of the data to be sold. The LSH algorithm uses hash functions to group the elements of a data set so that similar elements can be found quickly. Used together, MinHash and LSH therefore address similarity calculation on large-scale data sets effectively.
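One common way to build such buckets is banding: the signature is cut into bands, and two signatures become candidates if any band matches exactly. The sketch below assumes this banding scheme, which the text does not specify; the class name, the default band and row counts, and the method names are all hypothetical.

```python
from collections import defaultdict


class LSHIndex:
    """Banded LSH over fixed-length signatures (lists of integers)."""

    def __init__(self, bands: int = 16, rows: int = 4):
        self.bands = bands
        self.rows = rows
        self.buckets = [defaultdict(set) for _ in range(bands)]  # one bucket table per band
        self.signatures = {}                                     # doc_id -> signature

    def _band_keys(self, signature):
        if len(signature) != self.bands * self.rows:
            raise ValueError("signature length must equal bands * rows")
        for i in range(self.bands):
            yield i, tuple(signature[i * self.rows:(i + 1) * self.rows])

    def insert(self, doc_id, signature) -> None:
        self.signatures[doc_id] = list(signature)
        for band, key in self._band_keys(signature):
            self.buckets[band][key].add(doc_id)

    def candidates(self, signature) -> set:
        """Existing items that share at least one bucket with the given signature."""
        found = set()
        for band, key in self._band_keys(signature):
            found |= self.buckets[band].get(key, set())
        return found

    def repetition_rate(self, signature) -> float:
        """Maximum estimated similarity against the candidates only."""
        best = 0.0
        for doc_id in self.candidates(signature):
            other = self.signatures[doc_id]
            same = sum(1 for x, y in zip(signature, other) if x == y)
            best = max(best, same / len(signature))
        return best
```

The similarity is computed only against items that collide in some band, so lookup cost depends on bucket sizes rather than on the total number of registered documents.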
2B design of table duplication checking algorithm
Tables are structured data, whereas text is unstructured data. Because the rows and columns of a table can be reordered arbitrarily, extracting a feature vector from a table is more difficult than extracting one from text, and the data preprocessing for tables is more complex.
Assume that a wrongdoer performs row and column transformations, cell modifications, and row and column deletions on a table. The table duplicate-checking algorithm needs to detect these three cases.
Prior work proposes a method for extracting a table feature vector: each row is treated as a set, all subsets of the row are inserted into a bloom filter, and once all rows have been inserted in this way the bloom filter is taken as the feature vector of the table; however, its duplicate-checking accuracy needs improvement. According to one embodiment of the present invention, when the data to be sold is a tabular data product, the encrypted data corresponding to the data to be sold is obtained as follows: table preprocessing is performed on the tabular data product, which includes removing blank rows and blank columns and then sorting the table according to a preset row and column sorting rule to obtain a sorted table; each row of the sorted table is split into a plurality of sub-tables according to a preset splitting mode and the input requirements of the bloom filter, each sub-table being a table item in that row; and the sub-tables of each row are inserted into the bloom filter, the bitmap corresponding to the table is output, and this bitmap is used as the encrypted data corresponding to the data to be sold. A bloom filter hashes strings into a bitmap (bit array), from which a repetition rate (similarity) can be calculated. The similarity of two tables can be calculated by dividing the intersection of their bloom filters by the union. For tabular data products, the data duplicate-checking module calculates the similarity between the bitmap of the data to be sold and the bitmap of each existing item, and takes the maximum similarity as the repetition rate.
The technical scheme of this embodiment can achieve at least the following beneficial technical effects: the table duplicate-checking algorithm adds a data preprocessing step that first removes blank rows and columns and then sorts the table by rows and columns, so that similar tables have the same row and column order, which improves duplicate-checking accuracy.
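The preprocessing and bitmap steps can be sketched as follows. Several choices here are assumptions, since the text leaves the sorting rule and splitting mode as "preset": columns are ordered by their sorted contents, rows are ordered lexicographically, each cell is inserted together with its column position, and the bitmap has 1024 bits with three SHA-256-derived hash positions per item.

```python
import hashlib


def preprocess(table) -> list:
    """Drop blank rows and columns, then put columns and rows into a canonical order."""
    rows = [r for r in table if any(str(c).strip() for c in r)]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    rows = [list(r) + [""] * (width - len(r)) for r in rows]      # pad ragged rows
    keep = [j for j in range(width) if any(str(r[j]).strip() for r in rows)]
    columns = [[str(r[j]).strip() for r in rows] for j in keep]
    columns.sort(key=lambda col: sorted(col))     # assumed column sorting rule
    return sorted(list(row) for row in zip(*columns))             # assumed row rule


def bloom_bitmap(table, num_bits: int = 1024, num_hashes: int = 3) -> int:
    """Insert every cell of the preprocessed table into a bloom filter held as an int."""
    bits = 0
    for row in preprocess(table):
        for col, cell in enumerate(row):
            item = f"{col}:{cell}".encode("utf-8")
            for seed in range(num_hashes):
                digest = hashlib.sha256(bytes([seed]) + item).digest()
                bits |= 1 << (int.from_bytes(digest[:8], "big") % num_bits)
    return bits


def bitmap_similarity(a: int, b: int) -> float:
    """Set bits in common divided by set bits in either bitmap."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0
```

Under these assumptions a table whose rows and columns were merely shuffled, or padded with blank rows, yields the same bitmap as the original, while a modified cell changes only the few bits contributed by that cell.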
3. Data transaction module
The data transaction module provides the data transaction function. On the platform, a seller can accept requests from several buyers at the same time, a buyer can purchase several data products at the same time, and one user can be both a buyer and a seller. Because data transactions are end-to-end segmented transactions, the time spent on data transmission is determined by both the amount of data and the time users spend on manual verification, and system throughput is limited by the seller's computing resources, so it is difficult to evaluate system throughput and response time quantitatively. In such business scenarios, the transaction system should be lightweight enough to improve the throughput and concurrency of the system. Furthermore, the platform needs to provide a fair transaction mechanism, which requires data transmission and payment to be "atomic".
According to one embodiment of the invention, the data transaction module is configured to establish, based on the data transmission payment protocol, a point-to-point transaction mechanism between the buyer and the seller featuring segmented data transmission, segment verification, segment payment, and breakpoint resume. During segmented transmission, in response to a request from the seller or the buyer to terminate the transaction, the transaction is terminated and settled based on the ratio of the amount of data already transmitted to the total amount of data. The technical scheme of this embodiment can achieve at least the following beneficial technical effects: because segmented transmission may be interrupted abnormally, breakpoint resume is supported; segmented transmission approximates "atomicity" as closely as possible, so that in the worst case the seller loses only the proceeds of one data segment; and because either party can terminate the transaction during transmission, the protocol settles the transaction according to the ratio of the transmitted data amount to the total data amount.
In addition, existing segmented transmission techniques do not introduce smart contracts, so many problems remain unsolved. First, as a data segmented-transmission protocol, the existing technique does not support breakpoint resume: if an abnormality occurs on either side during transmission, the whole transaction cannot be traced and cannot be recovered. Second, credentials are all verified manually, which is time-consuming and laborious and makes correctness difficult to guarantee.
Here, an application layer data transmission payment protocol supporting credit mortgage, transaction settlement, automatic verification and breakpoint resume is designed by using the ideas of intelligent contracts and segmented transmission.
3A application layer message Format design
The application layer message adopts a unified JSON format and contains three fields: message_type, payload, and signature. The message_type field indicates the message type, the payload field is the payload (also called the message payload or data payload), and the signature field carries the signature of the sending party over the message (both parties sign the messages they send). The contents of the message_type and payload fields vary with the type of the message. There are six different message types:
HandShake messages (handshakes) are initiated by the buyer and the payload field fills in the metadata corresponding to the data (data product). After receiving the HandShake message, the seller returns a corresponding HandShake message, and the payload field is true, which indicates that the data can be downloaded.
An Order message (Order) is issued by the buyer and the payload field fills in the buyer's Order number.
A Data message (Data) issued by the seller, the payload field filling out the sold Data.
An acknowledgement message (ACK) is issued by the buyer.
A termination message (Exit) may be issued by both parties.
A breakpoint Resume message (Resume) is issued by the buyer.
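A minimal sketch of building and parsing such a message is given below. The three field names come from the text; the literal type strings, the helper names, and the rule for which part of the message is signed are assumptions. placeholder_sign is not a real signature: it is a SHA-256 digest standing in for the wallet signature the protocol would use.

```python
import hashlib
import json

# Assumed literal spellings of the six message types.
MESSAGE_TYPES = ("HandShake", "Order", "Data", "ACK", "Exit", "Resume")
REQUIRED_FIELDS = ("message_type", "payload", "signature")


def build_message(message_type: str, payload, sign) -> str:
    """Serialize a message; `sign` is any callable mapping the signed text to a signature."""
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {message_type}")
    signed_part = json.dumps({"message_type": message_type, "payload": payload},
                             sort_keys=True)
    return json.dumps({"message_type": message_type,
                       "payload": payload,
                       "signature": sign(signed_part)})


def parse_message(raw: str) -> dict:
    """Decode a message and reject it if a field is missing or the type is unknown."""
    message = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in message]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if message["message_type"] not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {message['message_type']}")
    return message


def placeholder_sign(text: str) -> str:
    """NOT a signature: a digest used only so the example runs without a wallet."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

For example, `parse_message(build_message("Order", "order-123", placeholder_sign))` returns a dictionary whose payload is the order number, which is the shape the seller would inspect before querying the order book contract.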
3B timing design
As shown in fig. 7, according to an example of the present invention, after a buyer obtains metadata corresponding to data of a seller through a server, a connection is established between the buyer and the seller through a seller address (IP: port) in the metadata to perform data transaction, wherein an exemplary step of one data transaction is as follows:
1. The buyer sends a handshake message.
2. After receiving the handshake message, the seller locates the file according to the metadata in the handshake message, derives the buyer's address and public key from the message and its signature, and returns a handshake message.
3. After receiving the returned handshake message, the buyer derives the seller's address and public key from the message and its signature. Each party has now obtained the other's public key, which provides the basis for subsequent automatic verification.
4. An order number is generated and assets are pledged; for example: the buyer generates an order ID and order information (order info), writes them to the order book contract, and pledges (mortgages) sufficient assets (such as credit, currency, or tokens).
5. The buyer sends the order number to the seller as the payload of an order message. After receiving it, the seller queries the order book contract by the order number, and if the buyer has already pledged assets greater than or equal to the selling price of the data into the contract, the seller starts to transmit the data.
6. After the buyer receives a data segment and checks it manually, the buyer calls the billing interface of the order book contract to increase the recorded amount of transmitted data, which represents the buyer's confirmation of actual payment for that segment, and then sends a confirmation message to the seller.
7. After receiving the confirmation message, the seller automatically transmits the next piece of data to the buyer according to the data transmission record in the intelligent contract.
Steps 6-7 above are repeated until the data transmission is completed or either party sends a termination message to force settlement; the smart contract then, in proportion to the amount of data transmitted, pays the seller and refunds the excess assets to the buyer. In addition, if either party is disconnected during transmission, the smart contract has already recorded the number of bytes transmitted; the next time, the buyer sends a breakpoint resume message to the seller, and the seller first checks the legitimacy of the order, namely that the order really belongs to both parties and its state is incomplete. The seller then continues transmitting from the byte count recorded in the contract until the transaction is completed or either party sends a termination message to terminate the transaction.
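The segment loop, proportional settlement, and resume behavior can be simulated as below. This is a sketch under stated assumptions: OrderStub is a hypothetical in-memory stand-in for a single order book entry, amounts are integers with floor division, the verify callable stands in for the buyer's manual check, and the stop_after parameter simulates a disconnection or termination message.

```python
class OrderStub:
    """Stand-in for one order: pledged price and confirmed byte count."""

    def __init__(self, price: int, total_bytes: int):
        self.price = price
        self.total_bytes = total_bytes
        self.finished_bytes = 0
        self.finished = False

    def add_bytes(self, n: int) -> None:
        """Buyer confirms payment for n more bytes (step 6)."""
        if self.finished:
            raise RuntimeError("order already settled")
        self.finished_bytes = min(self.total_bytes, self.finished_bytes + n)

    def settle(self):
        """Pay the seller in proportion to confirmed bytes; refund the rest to the buyer."""
        seller_paid = self.price * self.finished_bytes // self.total_bytes
        self.finished = True
        return seller_paid, self.price - seller_paid


def transfer(data: bytes, order: OrderStub, segment_size: int, verify, stop_after=None) -> bytes:
    """Send segments starting at the byte count recorded in the order (steps 6-7)."""
    received = bytearray()
    sent = 0
    while order.finished_bytes < len(data):
        if stop_after is not None and sent >= stop_after:
            break                                   # simulated disconnect or Exit message
        start = order.finished_bytes                # resume point comes from the order record
        segment = data[start:start + segment_size]
        if not verify(segment):
            break                                   # buyer rejects the segment
        received += segment
        order.add_bytes(len(segment))
        sent += 1
    return bytes(received)
```

Because the start offset is always read from the recorded byte count, calling transfer again after an interruption continues where the confirmed data ends, and settle at any point splits the pledged amount by the confirmed fraction.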
The data transaction module mainly comprises 6 kinds of functions, namely:
A class (SmartContract) for interacting with smart contracts, including functions such as verification, pledging assets, settlement, obtaining contract status, and canceling contracts.
A user class (User), including functions to obtain the balance, add funds, withdraw funds, and trace back transaction history.
A data segment transfer class (Data Segment Transfer) responsible for the functions of creating, encrypting, decrypting, validating, transmitting, receiving, and obtaining segment status of segment data, such as: is responsible for initiating transmission, requesting segments, sending segments, completing transmission, acquiring transmission status, acquiring transmission history, canceling transmission, acquiring transmission details and the like.
Payment class (Payment) includes functions to create orders, verify payments, settle accounts, obtain Payment status, obtain Payment history, cancel transactions, and the like.
A notification class (Notification) for displaying verification information during the data transaction.
3C Design of signature verification contracts
The signature verification contract is implemented with the help of Ethers.js. Ethers.js provides an interface for hashing a message: the interface receives the message and returns the hashed result. Its signing method can sign the hashed message but only accepts parameters of the Uint8Array type, so after obtaining the message hash by calling ethers.utils.id(), ethers.utils.arrayify() must also be called to convert the hashed message into an array.
After receiving the other party's message and signature, the other party's public key needs to be recovered.
The ECDSA library of OpenZeppelin provides a public key recovery method, recover(), which receives four parameters, namely the hashed message msghash and the r, s, v values of the signature, all of which are 32 bytes. Obtaining the r, s, v values of the signature requires calling an Ethers.js interface. Because the hash produced by ethers.utils.id() is not in the Ethereum standard form, msghash needs to be standardized before recover() is called, that is, the toEthSignedMessageHash() method of the ECDSA library is called first, and then public key recovery is performed.
Verifier.sol calls the ECDSA library. As shown in the following table, Verifier.sol exposes a verifyHash() interface that accepts the address, the message hash, and the r, s, v values as parameters, and compares the public key calculated by the ECDSA library with the incoming one; if they are the same, the verification succeeds.
3D order book contract design
The order book contract records the order status of the entire platform and provides a method for order modification and automatic verification.
The interface provides functions of recording orders, updating order status, settling orders and querying orders, and also provides functions of sellers to verify order information to ensure the security of transactions.
Order information structure: order information is recorded including buyer address, seller address, total number of bytes, number of bytes completed, order start time, expiration time, data selling price, whether the order is completed or not.
As shown in the table above, the contract implements the methods in IOrderbook.sol and defines some private elements:
Private mapping order book (orderBook): stores all order information, with order numbers as keys and order structures as values.
Private mapping latest-order lookup table (latestOrderLookUp): records the latest order number, with the buyer address and seller address as keys and the order number as the value.
Order creation function supporting segmented transactions (function create(OrderInfo memory info) external payable returns (bytes32 order_id)): the method accepts a parameter info of type OrderInfo containing the buyer address, seller address, commodity ID (i.e., the identity information of the data, docID), total number of pieces, number of pieces completed, order start time, expiration time, cost, and completion flag. When this method is called, assets equal to the data selling price must also be paid as collateral. The method first validates the parameters to ensure that the buyer and seller addresses are not empty, the completed transmission amount is 0, and the total transmission amount is greater than 0. It then generates a unique order ID, stores the order information in the order book, and stores the order number in the private mapping latestOrderLookUp. After execution, the method returns the order number. The order number can be generated with a while loop and the keccak256 hash algorithm: in each iteration, keccak256 hashes the buyer address, the seller address, and an incrementing salt value to obtain a hash value as the candidate order number. If an order with that hash value already exists, the salt value is incremented and the hash is recomputed until a unique order number is found. This mechanism ensures the uniqueness of the order number.
A function for recording the transmission amount of a data segment (function addBytes, with a uint256 num_bytes parameter): based on the received order number and the transmission amount of the segment, it first checks whether the order has already been completed and whether the caller is the buyer. If the order has been completed or the caller is not the buyer, revert() is called to end the transaction. Otherwise, the transmission amount is added to the finished bytes field of the order.
function getOrderbook(bytes32 order_id) external view returns (OrderInfo memory info): the method accepts an order ID and returns the order information structure.
function settlement(bytes32 order_id) external: settles the order according to its data transmission amount. The method accepts the unique identifier of the order. The execution flow is as follows: it first checks whether the caller is the buyer or the seller, and if not, calls revert(). The order is then settled based on its data transmission amount and the corresponding funds are transferred to the seller. If the order is not complete, the remaining funds are returned to the buyer. Finally, the order is marked as completed.
function getLatestOrder(address buffer) external view returns (bytes32): returns the latest order according to the public keys of the seller and the buyer, so that the order number can be located quickly when resuming from a breakpoint.
A function (function sellerVerifyOrderOnCreated) (bytes 32 order_id, address seller, address layer, ui 256 cost) external view returns (pool) for the seller to verify the accuracy of order information when an order has just been created, i.e., the function is used for the seller to verify whether order information is correct immediately after an order has been created. This function accepts four parameters that the seller wishes: order number, seller address, buyer address, and selling price of data. First, the functional unit acquires order information corresponding to an order number for subsequent condition checking. Then, the functional component firstly checks whether the order is completed or not, and returns error information if the order is completed; checking whether the completed data transmission quantity of the order is 0, and if not, returning error information; finally, whether the incoming seller address, buyer address, and data selling price are equal to the records within the contract. If equal, indicating compliance with the expectation, returning to correct (true); otherwise an error (false) is returned.
function sellerVerifyOrderOnPayment(bytes32 order_id, address seller, address buyer, uint256 cost, uint256 finished) external view returns (bool): used by the seller to verify that the order information is correct after the buyer has paid the fee for a given data segment. The function accepts the order number, seller address, buyer address, selling price, and completed data transfer amount that the seller expects, and compares them with the records in the contract. If they are equal, the function returns true; otherwise it returns false.
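The two seller-side checks can be modelled in a few lines of Python. This is an illustrative sketch rather than the contract source: the `OrderRecord` type, the `orders` dictionary standing in for contract storage, and the handling of an unknown order number are assumptions, and the error returns described above are collapsed into `False`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRecord:
    seller: str
    buyer: str
    cost: int        # selling price recorded in the contract
    finished: int    # amount of data recorded as transferred
    completed: bool


def seller_verify_order_on_created(orders, order_id, seller, buyer, cost):
    """Check a freshly created order against the seller's expectations."""
    order = orders.get(order_id)
    # Assumed behaviour: an unknown order number fails verification.
    if order is None:
        return False
    # A new order must be neither completed nor partially transferred.
    if order.completed or order.finished != 0:
        return False
    return (order.seller, order.buyer, order.cost) == (seller, buyer, cost)


def seller_verify_order_on_payment(orders, order_id, seller, buyer, cost, finished):
    """Check the order after the buyer has paid for another segment."""
    order = orders.get(order_id)
    if order is None:
        return False
    recorded = (order.seller, order.buyer, order.cost, order.finished)
    return recorded == (seller, buyer, cost, finished)
```

In this model the seller would call the first check once before sending any data, and the second check before releasing each further segment, passing the transfer amount it expects the contract to have recorded.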
Under this design, the order book contract records the amount of data sent and uses it as the basis for settlement. By creating orders, recording the transferred amounts, and settling orders, the contract implements data transfer and payment between buyers and sellers. The contract also provides auxiliary functions such as order information query and order information verification.
In general, the present invention uses blockchain technology to design and implement a decentralized data transaction platform. First, the invention stores metadata through smart contracts and IPFS, builds a search engine on an Elasticsearch database, and establishes a metadata management system that supports uploading, downloading, and searching by users, ensuring secure and complete metadata storage and efficient, accurate search. Second, through smart contracts and an end-to-end application-layer protocol over a TLS-encrypted channel, the invention implements a data transmission payment protocol that supports automatic verification, forced settlement, resumable transfer after interruption, and alternating payment and transmission, which addresses the unfairness and opacity of data transactions. Third, the data duplicate-checking mechanism uses MinHash signatures for text data and Bloom filters for tabular data to improve duplicate-checking efficiency and significantly reduce storage space, yielding a reliable, efficient, and storage-efficient duplicate-checking system that provides a reliable basis for confirming data rights on the platform. Fourth, the decentralized data transaction platform does not store any original data, and the data is managed by the seller; in addition, when duplicate checking is required, the original data is encrypted locally before being uploaded to the server, which eliminates at the source the possibility of data leakage on the platform. Together, these two points protect the data privacy of users.
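The two fingerprints named above, MinHash signatures for text and Bloom-filter bitmaps for tables, can be sketched in Python as follows. This is a simplified illustration, not the platform's implementation: the SHA-256-based seeded hash, the parameter defaults (`num_hashes`, `num_bits`), the treatment of each table cell as one Bloom-filter entry, and the Jaccard measure over set bits are all assumptions. Preprocessing (stop-word removal, word segmentation, table sorting) and locality-sensitive bucketing are omitted.

```python
import hashlib


def _hash64(seed, text):
    """Deterministic 64-bit hash of text under a given seed (assumed scheme)."""
    data = seed.to_bytes(4, "big") + text.encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes=64):
    """MinHash signature of a tokenized document; tokens must be non-empty."""
    token_set = set(tokens)
    return [min(_hash64(seed, t) for t in token_set) for seed in range(num_hashes)]


def signature_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def table_bitmap(rows, num_bits=1024, num_hashes=3):
    """Bloom-filter bitmap of a table, held as a Python int used as a bit set."""
    bits = 0
    for row in rows:
        for cell in row:
            for seed in range(num_hashes):
                bits |= 1 << (_hash64(seed, str(cell)) % num_bits)
    return bits


def bitmap_similarity(bits_a, bits_b):
    """Jaccard similarity over set bits (an assumed similarity measure)."""
    union = bin(bits_a | bits_b).count("1")
    return bin(bits_a & bits_b).count("1") / union if union else 0.0


def repetition_rate(new_item, existing_items, similarity):
    """Maximum similarity of the new fingerprint against the existing ones."""
    return max((similarity(new_item, old) for old in existing_items), default=0.0)
```

Because only signatures and bitmaps are compared, the check never needs the plaintext of previously registered data, which is consistent with the privacy goal stated above.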
It should be noted that, although the steps are described above in a specific order, this does not mean that they must be performed in that order; in fact, some of the steps may be performed concurrently or in a different order, as long as the required functions are achieved.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A decentralized data transaction platform, comprising:
a metadata management module configured to: acquire metadata corresponding to data to be sold by a seller and encrypted data corresponding to the data, wherein the metadata comprises data description information and a seller address;
a data duplicate-checking module configured to: perform duplicate checking, without decryption, on the encrypted data corresponding to the data to be sold against the encrypted data corresponding to existing data, and determine the repetition rate of the data to be sold, wherein the repetition rate affects the score of the data;
a data transaction module configured to: after the buyer decides, according to the metadata and the score of the data to be sold, to purchase the data from the seller, establish a smart-contract-based data transmission payment protocol between the seller and the buyer and transmit the data through an encrypted channel.
2. The data transaction platform according to claim 1, wherein the encrypted data corresponding to the data is one-way encrypted data for reflecting the content of the data.
3. The data transaction platform according to claim 2, wherein the data transaction module is configured to: based on the data transmission payment protocol, establish between the buyer and the seller a transaction mechanism of point-to-point segmented data transmission, per-segment verification, per-segment payment, and resumable transfer after interruption.
4. The data transaction platform according to claim 3, wherein the data transaction module is configured to: during the segmented transmission, in response to a request from the seller or the buyer to terminate the transaction, terminate the transaction and settle it based on the ratio of the amount of data already transmitted to the total amount of data.
5. The data transaction platform according to any one of claims 1 to 4, wherein when the data to be sold is a document-type data product, the encrypted data corresponding to the data to be sold is obtained in the following manner:
performing document preprocessing on the data to be sold, wherein the document preprocessing comprises removing stop words, removing punctuation marks, and performing word segmentation on the document after the stop words and punctuation marks are removed, to obtain a word segmentation result;
and calculating a hash signature from the word segmentation result using the MinHash algorithm, and taking the hash signature as the encrypted data corresponding to the data to be sold.
6. The data transaction platform according to claim 5, wherein for document-type data products, the data duplicate-checking module establishes, based on a locality-sensitive hashing algorithm, a plurality of hash buckets from the hash signatures corresponding to the document-type data products, so that similar hash signatures are placed in the same hash bucket; and obtains the hash bucket in which the hash signature of the data to be sold is located, calculates the similarity between the hash signature of the data to be sold and the hash signatures of the existing data in that hash bucket, and takes the maximum similarity corresponding to the data to be sold as the repetition rate of the data to be sold.
7. The data transaction platform according to claim 5, wherein when the data to be sold is a tabular data product, the encrypted data corresponding to the data to be sold is obtained in the following manner:
performing table preprocessing on the tabular data product, wherein the table preprocessing comprises removing blank rows and blank columns, and sorting the table after the blank rows and blank columns are removed according to a preset row and column sorting rule, to obtain a sorted table;
splitting each row of the sorted table into a plurality of sub-tables according to a preset splitting mode and the input requirements of a Bloom filter, wherein each sub-table is a table entry in the row;
and inserting the plurality of sub-tables of each row into the Bloom filter, outputting a bitmap corresponding to the table, and taking the bitmap corresponding to the table as the encrypted data corresponding to the data to be sold.
8. The data transaction platform according to claim 7, wherein for tabular data products, the data duplicate-checking module calculates the similarity between the bitmap of the data to be sold and the bitmap of each item of existing data, and takes the maximum similarity corresponding to the data to be sold as the repetition rate.
9. The data transaction platform according to any one of claims 1 to 3, wherein the data transaction platform, based on blockchain smart contracts, registers, delists, and queries metadata, creates orders, verifies payments, settles transactions, verifies the validity of message signatures, and evaluates data.
10. A method of conducting a data transaction based on the data transaction platform according to any one of claims 1 to 9, the method comprising:
in the data product registration stage, the steps performed include:
obtaining, by the metadata management module, the metadata corresponding to the data to be sold uploaded by the seller and the encrypted data corresponding to the data;
performing, by the data duplicate-checking module, duplicate checking on the newly uploaded data against historically uploaded data to obtain the repetition rate;
sending, by the metadata management module, the encrypted data corresponding to the data that has passed duplicate checking to the distributed file storage system for storage;
sending, by the metadata management module, the metadata corresponding to the data that has passed duplicate checking to a blockchain-based distributed file storage system for storage, and obtaining the hash value returned by the distributed file storage system after it stores the metadata;
registering the data in the registry contract based on the hash value returned after the metadata is stored in the distributed file storage system, to obtain the identity information corresponding to the data;
building an inverted index with an Elasticsearch database according to the identity information and the data description information corresponding to the data, and providing a search service;
in the data product screening stage, the steps performed include:
entering, by a buyer, keywords to query data, and calling, by the metadata management module, the search service provided by the Elasticsearch database to query related data and obtain the identity information corresponding to the related data;
querying, in the registry contract, the list of hash values corresponding to the related data according to the identity information corresponding to the related data;
querying the distributed file storage system for the corresponding metadata list according to the list of hash values corresponding to the related data, and querying the score corresponding to each metadata item in the metadata list based on an evaluation contract;
sorting the metadata items in the metadata list by score and returning them to the buyer, the buyer screening the required data according to the metadata list and the scores;
in the data product transaction stage, the steps performed include:
after the buyer screens out the required data, establishing contact with the seller according to the seller address in the corresponding metadata;
performing, by the buyer and the seller, segmented data transmission and transaction through the data transmission payment protocol;
after the transmission of the data required by the buyer is finished, evaluating the data by the buyer, wherein the buyer's evaluation affects the score of the data.
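As an illustration of the screening stage recited in claim 10, the lookup chain from keyword to ranked metadata can be sketched in Python. Plain dictionaries stand in for the search service, the registry contract, the distributed file storage system, and the evaluation contract; the function name `screen_products`, the default score of 0 for unrated data, and the descending sort order are assumptions made for this sketch.

```python
def screen_products(keyword, search_index, registry, storage, scores):
    """Return (score, metadata) pairs for a keyword, highest score first.

    search_index: keyword -> list of identity values (stand-in for the search service)
    registry:     identity -> metadata hash (stand-in for the registry contract)
    storage:      metadata hash -> metadata (stand-in for distributed file storage)
    scores:       identity -> score (stand-in for the evaluation contract)
    """
    results = []
    for identity in search_index.get(keyword, []):
        metadata_hash = registry[identity]      # identity -> hash value
        metadata = storage[metadata_hash]       # hash value -> metadata
        # Assumed default: data without any evaluation scores 0.
        results.append((scores.get(identity, 0), metadata))
    # Sort on the score only, so metadata dictionaries are never compared.
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

The buyer would then read the seller address from the chosen metadata entry to start the transaction stage.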
CN202311183915.6A 2023-09-14 2023-09-14 Data transaction platform of decentralization Pending CN117353891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311183915.6A CN117353891A (en) 2023-09-14 2023-09-14 Data transaction platform of decentralization

Publications (1)

Publication Number Publication Date
CN117353891A true CN117353891A (en) 2024-01-05

Family

ID=89358382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311183915.6A Pending CN117353891A (en) 2023-09-14 2023-09-14 Data transaction platform of decentralization

Country Status (1)

Country Link
CN (1) CN117353891A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination