CN115576945A - Method for improving block chain data processing efficiency by data pre-screening - Google Patents

Info

Publication number
CN115576945A
Authority
CN
China
Prior art keywords
data
ipfs
stored
users
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211252817.9A
Other languages
Chinese (zh)
Inventor
吴锡
彭静
廖春梅
罗阳
苏红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN202211252817.9A
Publication of CN115576945A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for improving blockchain data processing efficiency by using data pre-screening, comprising the following steps: preprocessing data, the preprocessing comprising classifying and cropping the data; storing the preprocessed data in IPFS (InterPlanetary File System) and generating a hash value corresponding to the stored data; returning the corresponding hash value, as a credential for the stored data, to the user terminal of the user who stored the data, wherein the credential is used by that user to read the stored data from an IPFS client; and deploying a smart contract in an Ethereum private chain, wherein the smart contract is used by users of the Ethereum private chain other than the user who stored the data to obtain permission to read the stored data from the IPFS client. The method can save blockchain storage space and time and enables data interaction.

Description

Method for improving block chain data processing efficiency by data pre-screening
Technical Field
The invention relates to the field of data processing, and in particular to a method for improving blockchain data processing efficiency by using data pre-screening.
Background
With the continuous development of internet technology, data management and value mining have become more complex and difficult. In this data age, data is a valuable asset and should be used under the management authority of its owner.
However, as data increasingly becomes an asset, many companies have begun collecting all kinds of data, which is then no longer managed solely by its owner. How the data is used, and by whom, is no longer transparent, and illegal use of the data cannot be traced. This exposes data owners to serious risks of privacy disclosure and poses major challenges for data security. Managing massive data is also a major pain point for database management systems: a traditional database management system supports only time-limited tasks and cannot guarantee the integrity and consistency of massive data.
In particular, medical data, a kind of private data, is managed almost entirely by the central server of each hospital. Centralized storage not only leaves the data constantly at risk of loss or leakage, but also creates isolated data islands that hinder data interaction.
To solve these problems, it has been proposed to store data on a blockchain and to realize data interaction through the blockchain's encryption technology. However, blockchain storage has low throughput and is not suitable for storing large amounts of data.
Disclosure of Invention
In view of the foregoing problems, the present invention provides a method and an apparatus for improving blockchain data processing efficiency by using data pre-screening, aiming to solve the problems of insufficient security and privacy of user-stored data, inconvenient interaction, and the difficulty of storing large amounts of data.
To achieve the above object, in a first aspect, the present invention provides a method for improving blockchain data processing efficiency by using data pre-screening, comprising:
preprocessing data, the preprocessing comprising: classifying and cropping the data;
storing the preprocessed data in IPFS (InterPlanetary File System), and generating a hash value corresponding to the stored data;
returning the corresponding hash value, as a credential for the stored data, to the user terminal of the user who stored the data, wherein the credential is used by that user to read the stored data from an IPFS client;
deploying a smart contract in an Ethereum private chain, wherein the smart contract is used by users of the Ethereum private chain other than the user who stored the data to obtain permission to read the stored data from the IPFS client.
Further, preprocessing the data comprises:
classifying the data by using a ResNet network model;
placing the data required by the classification task into one class;
placing the data not required by the classification task into another class;
and selecting the data required by the classification task.
Further, each smart contract corresponds to the interaction of one category of the stored data; deploying the smart contract in the Ethereum private chain comprises:
in the case that the data to be stored by the user is divided into a plurality of categories, deploying a plurality of smart contracts, each corresponding to the interaction of the data of one category.
Further, the positions of part of the corresponding hash value are saved in the smart contract;
and the other users directly read part of the stored data from the IPFS client by means of the positions of part of the corresponding hash value saved in the smart contract.
Further, in the case that the other users need to obtain the complete stored data, the other users call the smart contract and send a read request to the user who stored the data;
the user who stored the data performs verification, and after the verification passes, the other users obtain permission to read the complete stored data from the IPFS client and read the stored data from the IPFS client.
Further, the other users obtaining permission to read the complete stored data from the IPFS client comprises:
the other users call the smart contract to send their account resources and public key to the user who stored the data, and after the user who stored the data confirms the information, the corresponding hash value, encrypted with the other users' public key, is sent to the other users;
and after receiving the corresponding hash value encrypted with their public key, the other users decrypt it with their private key and read the stored data from the IPFS client according to the decrypted hash value.
Further, the stored data is split proportionally into a first part, a second part and a remaining part; the first part is disclosed directly on the Ethereum private chain, so that all other users can read the first part directly from the IPFS client;
the second part is read from the IPFS client by another user after that user calls the smart contract and verification is completed;
and the remaining part is withheld: in the case that another user does not fulfil the terms of the smart contract after reading the second part, that user is marked as dishonest and the remaining part is retained, so that the user cannot obtain the complete stored data.
Further, the corresponding hash value is encrypted and then stored in the Ethereum private chain;
before the other users obtain permission to read the stored data from the IPFS client, whether the corresponding hash value has changed is verified;
in the case that the corresponding hash value has not changed, the stored data has not been tampered with.
In a second aspect, the present invention further provides an apparatus for improving blockchain data processing efficiency by using data pre-screening, the apparatus comprising:
a preprocessing module, configured to preprocess data, the preprocessing comprising: classifying and cropping the data;
a data storage module, configured to store the preprocessed data in IPFS and to generate a hash value corresponding to the stored data;
a sending module, configured to return the corresponding hash value, as a credential for the stored data, to the user terminal of the user who stored the data, wherein the credential is used by that user to read the stored data from an IPFS client;
and a permission obtaining module, configured to deploy a smart contract in the Ethereum private chain, through which users of the Ethereum private chain other than the user who stored the data obtain permission to read the stored data from the IPFS client.
The invention provides a method for improving blockchain data processing efficiency by using data pre-screening. The data is preprocessed by classification and cropping; the preprocessed data is stored in IPFS (InterPlanetary File System) and a corresponding hash value is generated; and the hash value is returned to the user as a credential for the stored data, so that the user holding the hash value can read the data. A smart contract is deployed in the Ethereum private chain, the positions of part of the corresponding hash value are saved in the smart contract, and users read the stored data from IPFS according to the corresponding hash value. First, preprocessing reduces the storage of redundant data at the source, lightens the storage burden and makes the data easier to use in subsequent tasks. Second, storing data with IPFS combined with an Ethereum private chain keeps cost low, provides a large storage space and shortens storage time. At the same time, data interaction is facilitated and data security and privacy are guaranteed.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for improving efficiency of processing blockchain data using data pre-screening according to an embodiment of the present invention;
FIG. 2 is a block diagram of an overall architecture of an "artificial intelligence blockchain" system that utilizes data pre-screening to improve blockchain data processing efficiency in one embodiment of the present invention;
FIG. 3 is a diagram illustrating an apparatus for improving blockchain data processing efficiency by using data pre-screening according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The core concept of the invention is to store the preprocessed data in IPFS and store the hash values on an Ethereum private chain, thereby expanding the storage space of the blockchain, and to use smart contracts to enable interaction with the stored data.
Referring to FIG. 1, a flowchart of a method for improving blockchain data processing efficiency by using data pre-screening according to an embodiment of the present invention is shown.
In one embodiment of the present invention, as shown in fig. 1, the method for improving the efficiency of processing the blockchain data by using data pre-screening comprises the following steps:
Step 101: preprocessing data, the preprocessing comprising: classifying and cropping the data.
Data preprocessing is an essential part of data processing. Provided the data can be used for computation, preprocessing makes subsequent processing of the images easier.
On the one hand, the present invention takes into account the space and time efficiency of storing data in a blockchain.
Taking medical data as an example, each medical institution stores diagnosis and treatment data, research and development data, patient data, payment and medical insurance data, and so on, whereas a cardiac medical image segmentation task needs only the cardiac medical image data. If all of a medical institution's data were stored on the blockchain without preprocessing, the blockchain would hold a large amount of useless data, wasting its space resources and greatly reducing the efficiency of data retrieval.
On the other hand, so that the stored data can play a greater role in subsequent task applications and improve the execution efficiency of other tasks, the data needs to be preprocessed, including classification and cropping, to bring it to a uniform standard.
Based on the above, the present invention proposes performing a classification operation before the data is stored on-chain, so as to restrict the data and reduce unnecessary on-chain data at the source. When the data consists of medical images and the volume is huge, the data can be divided into specific categories during preprocessing.
Step 102: storing the preprocessed data in the IPFS, and generating a hash value corresponding to the stored data.
IPFS (InterPlanetary File System) is an open-source project initiated by Protocol Labs in 2014. It is essentially a content-addressable, peer-to-peer, distributed hypermedia storage and transport protocol.
IPFS stores files using decentralized, fragmented, encrypted storage: a file is divided into fixed-size fragments that are stored in a distributed manner across nodes of the network, and a hash value is generated to identify the file. This decentralized storage technique also draws on blockchain techniques.
A file is read from IPFS as follows: the file fragments are obtained from the nodes of the network through an IPFS client or an IPFS front-end page, and the file hash value computed from the fragments is compared with the hash value stored in the network. If the two are consistent, the file has not been tampered with; otherwise, it has been. Distributed storage allows the IPFS system to hold more data, with higher storage and reading efficiency and lower cost.
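The store-and-verify cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the IPFS API: `ToyContentStore`, `CHUNK_SIZE` and the plain SHA-256 manifest are hypothetical simplifications (real IPFS uses a Merkle DAG and multihash-based content identifiers).

```python
import hashlib

CHUNK_SIZE = 8  # bytes per fragment; tiny on purpose (assumed value)


def digest(data: bytes) -> str:
    """Hex SHA-256 digest used as the content address in this sketch."""
    return hashlib.sha256(data).hexdigest()


class ToyContentStore:
    """Hypothetical in-memory stand-in for an IPFS node (not the IPFS API)."""

    def __init__(self) -> None:
        self.blocks = {}  # content address -> raw bytes

    def add(self, data: bytes) -> str:
        """Split data into fixed-size fragments and return the file hash."""
        fragment_ids = []
        for start in range(0, len(data), CHUNK_SIZE):
            fragment = data[start:start + CHUNK_SIZE]
            fragment_id = digest(fragment)
            self.blocks[fragment_id] = fragment
            fragment_ids.append(fragment_id)
        # The file hash covers the ordered list of fragment hashes.
        manifest = "\n".join(fragment_ids).encode("ascii")
        file_hash = digest(manifest)
        self.blocks[file_hash] = manifest
        return file_hash

    def get(self, file_hash: str) -> bytes:
        """Fetch fragments and recompute every hash before returning data."""
        manifest = self.blocks[file_hash]
        if digest(manifest) != file_hash:
            raise ValueError("manifest does not match the file hash")
        fragments = []
        for fragment_id in manifest.decode("ascii").splitlines():
            fragment = self.blocks[fragment_id]
            if digest(fragment) != fragment_id:
                raise ValueError("fragment was tampered with")
            fragments.append(fragment)
        return b"".join(fragments)


store = ToyContentStore()
file_hash = store.add(b"cardiac CT image bytes")
restored = store.get(file_hash)
```

Because every address is derived from the content it names, any change to a stored fragment makes `get` raise instead of returning altered data.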
Step 103: returning the corresponding hash value, as a credential for the stored data, to the user terminal of the user who stored the data, wherein the credential is used by that user to read the stored data from the IPFS client.
After the data is stored on IPFS, a hash value corresponding to the stored data content is generated and returned to the user terminal on which the IPFS client is installed. The user who stored the data keeps this hash value and can use it to read that data from IPFS. However, storing data on IPFS alone has a drawback: anyone who learns the storage location of the data in IPFS can read it directly. Permission to read the stored data therefore needs to be restricted by combining IPFS with an Ethereum private chain.
Step 104: deploying a smart contract in the Ethereum private chain, wherein the smart contract is used by users of the Ethereum private chain other than the user who stored the data to obtain permission to read the stored data from the IPFS client.
A blockchain is essentially a decentralized, distributed storage database with peer-to-peer transmission. Through an internal consensus mechanism, the nodes of the blockchain network store data once consensus is reached. Because blocks are only ever generated forward and cannot be rolled back, data cannot be modified once it has been stored in a block, which is a natural advantage of blockchains for storing data and protecting data security.
A blockchain is also composed of distributed nodes, but unlike IPFS, every node validates and confirms the data to be stored according to the consensus mechanism and then stores the data in its own copy of the block. Using a blockchain to store data ensures the security of the data, but if a large amount of data needs to be stored, the efficiency of using a blockchain alone decreases as the amount of data increases.
Because putting data on-chain requires consensus among all nodes, the throughput of blockchain data storage is very low. For large-scale data, the size limit of a block must also be considered, and the data will likely have to be split across multiple blocks. Storing large-scale data on a blockchain also increases cost.
The present invention combines IPFS and blockchain techniques together to store and read data.
Blockchains can be divided into three categories according to access and management permissions: public chains, private chains and consortium chains.
A public chain is a fully public blockchain: anyone has the right to access and manage it, in contrast to private and consortium chains.
Both private and consortium chains restrict access and management rights. A private chain is owned by an individual or a private institution, and only that individual or institution has the right to access and manage it.
Consortium chains are typically used by particular groups or organizations; likewise, only the groups or organizations participating in a consortium chain have the right to access and manage it. Where data volumes are huge and data exchange among users is complex, a consortium chain can enable exchange among different users simply by deploying chaincode.
Given the requirements of the invention, an Ethereum private chain is considered a good choice. Each hospital acts as a node on the private chain, participating in and maintaining the on-chain interaction.
A smart contract is a program that runs on a blockchain. The content of the contract is specified in advance, and when the conditions that trigger the contract are met, the contract content is executed automatically. Smart contracts on Ethereum are mostly created for transaction or exchange activities. The invention deploys smart contracts on the Ethereum private chain in order to realize interaction and sharing of data among hospitals.
Through the smart contract, other users can learn that a given category of data is available for interaction, but they cannot access it directly. The data is no longer stored directly on the private chain nodes but in IPFS, so the storage space is larger.
Users of the Ethereum private chain other than the user who stored the data can call the smart contract if they want to obtain permission to read the stored data from the IPFS client. Security is guaranteed because the interaction is carried out automatically and the interaction records are traceable and tamper-proof.
In one embodiment of the invention, preprocessing the data comprises: classifying the data by using a ResNet network model; placing the data required by the classification task into one class; placing the data not required by the classification task into another class; and selecting the data required by the classification task.
In this embodiment, the classification task of data pre-processing is performed using a neural network model of artificial intelligence.
Taking medical data as an example, consider a classification task on medical images. Medical images differ from natural images: their structure is complex, multiple imaging modalities exist, and the images are grayscale with little pixel variety, so information such as shapes and boundaries is blurred. A convolutional neural network with high classification accuracy and few parameters is therefore chosen to classify the medical images. Extensive research has shown that network depth is crucial to model performance: as the network deepens, it can extract more complex image features.
However, as network depth increases, vanishing or exploding gradients cause network degradation, so that a deep network performs worse than a shallow one. This problem was largely solved by the introduction of the deep residual network, which has consistently performed well on image classification and uses a residual learning structure to address the degradation caused by vanishing or exploding gradients in deeper networks. Given the particular nature of the data, a well-performing ResNet network model is selected to complete the classification task before the data is stored in IPFS, achieving high classification and recognition accuracy.
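The residual learning structure mentioned above can be summarized in one line. The notation follows the standard formulation of a residual block and is not taken from the patent text: $x$ is the block input, $\mathcal{F}$ the stacked layers with weights $W_i$, $y$ the block output, and $L$ the loss (all symbols are assumptions for illustration).

```latex
y = \mathcal{F}(x, \{W_i\}) + x,
\qquad
\frac{\partial L}{\partial x}
  = \frac{\partial L}{\partial y}
    \left( \frac{\partial \mathcal{F}}{\partial x} + I \right)
```

Because of the identity term $I$, part of the gradient reaches earlier layers unchanged even when $\partial \mathcal{F} / \partial x$ is small, which is one common explanation of why deeper residual networks remain trainable.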
In this embodiment, only the cardiac image data needs to be stored for the classification task, so the data is divided into just two classes: cardiac image data forms one class, and all remaining data forms the other. That is, the network model performs a binary classification task.
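The screening step can be sketched as a filter driven by a binary classifier. This is a minimal sketch under stated assumptions: `screen_records`, `toy_classifier`, the record fields and the label names are all hypothetical, and the toy classifier only reads a tag where the embodiment would call a trained ResNet model.

```python
def screen_records(records, classify, wanted_label="cardiac"):
    """Split records into those the task needs and everything else."""
    kept, discarded = [], []
    for record in records:
        if classify(record) == wanted_label:
            kept.append(record)
        else:
            discarded.append(record)
    return kept, discarded


def toy_classifier(record):
    """Placeholder for a trained ResNet: returns 'cardiac' or 'other'."""
    return "cardiac" if record.get("kind") == "heart_ct" else "other"


records = [
    {"id": 1, "kind": "heart_ct"},
    {"id": 2, "kind": "insurance_form"},
    {"id": 3, "kind": "heart_ct"},
    {"id": 4, "kind": "brain_mri"},
]
kept, discarded = screen_records(records, toy_classifier)
```

Only the records in `kept` would go on to be stored in IPFS; the rest never reach storage, which is where the space saving comes from.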
In an embodiment of the present invention, each smart contract corresponds to the interaction of one category of the stored data; deploying the smart contract in the Ethereum private chain comprises: in the case that the data to be stored by the user is divided into a plurality of categories, deploying a plurality of smart contracts, each corresponding to the interaction of the data of one category.
If a hospital needs to store not only cardiac image data but also brain image data and stomach image data, three smart contracts can be deployed, corresponding respectively to the interaction of the three categories of data, which makes fine-grained management of each category easier.
In an embodiment of the present invention, the positions of part of the corresponding hash value are saved in the smart contract; and the other users directly read part of the stored data from the IPFS client by means of the positions of part of the corresponding hash value saved in the smart contract.
Because data in IPFS is stored across many nodes, the positions of part of the corresponding hash value saved in the smart contract refer to the locations at which some of the data blocks are stored on nodes in the IPFS network.
Other users can read part of the data directly from the IPFS client by means of the positions of part of the hash value saved in the smart contract. This part of the data is in effect disclosed directly on the Ethereum private chain, and all users of the Ethereum private chain can read it directly, which attracts other users who need the stored data to carry out subsequent resource interaction.
In an embodiment of the present invention, in the case that the other users need to obtain the complete stored data, the other users call the smart contract and send a read request to the user who stored the data;
the user who stored the data performs verification, and after the verification passes, the other users obtain permission to read the complete stored data from the IPFS client and read the stored data from the IPFS client.
In an embodiment of the present invention, specifically, the other users obtaining permission to read the complete stored data from the IPFS client comprises:
the other users call the smart contract to send their account resources and public key to the user who stored the data, and after the user who stored the data confirms the information, the corresponding hash value, encrypted with the other users' public key, is sent to the other users;
and after receiving the corresponding hash value encrypted with their public key, the other users decrypt it with their private key and read the stored data from the IPFS client according to the decrypted hash value.
After the smart contract is published, other nodes on the private chain can view the partial data disclosed by the hospital by calling the smart contract. If the data meets a node's needs, the node can call the contract to initiate an exchange request to the hospital and then transfer into the smart contract a certain amount of account resources in exchange for the data, the specific amount being determined by the hospital as data provider. The node also obtains the address that published the smart contract (namely, the hospital's account address on the Ethereum private chain).
The party that transferred the account resources then provides its public key and the transaction hash of its smart-contract call to the address just obtained (the hospital's account address on the Ethereum private chain). After receiving the transaction hash of the smart-contract call, the hospital first verifies it. After the verification passes, the hospital uses the public key provided by the party that transferred the account resources to encrypt the hash value generated when the data was stored on IPFS (the corresponding hash value). Data encrypted with a public key can be decrypted only with the matching private key. In this way, the privacy of the data hash value is guaranteed and data interaction on the blockchain is realized.
After the party that transferred the account resources receives the hash value encrypted with its public key, it decrypts the hash value with its private key to obtain the IPFS hash value and can then fetch the data from IPFS. After the data is obtained, that party calls the smart contract, and the smart contract transfers the previously received account resources to the hospital. Data interaction between the two hospitals is thus realized without leaking data, and data security is guaranteed.
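The exchange described above can be walked through with a small simulation. This is a sketch under loud assumptions: `ExchangeContract` is a hypothetical in-memory stand-in for the smart contract, the hospital names, price and hash string are placeholders, and the encryption is textbook RSA with tiny primes, used only to show that the matching private key is needed. It is not secure and is not the scheme the patent specifies.

```python
from dataclasses import dataclass, field


def make_keypair(p=61, q=53, e=17):
    """Toy textbook RSA key pair from tiny primes (illustration only)."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, needs Python 3.8+
    return (e, n), (d, n)  # (public key, private key)


def encrypt(text, public_key):
    """Encrypt each UTF-8 byte separately with the public key."""
    e, n = public_key
    return [pow(b, e, n) for b in text.encode("utf-8")]


def decrypt(cipher, private_key):
    """Recover the text with the matching private key."""
    d, n = private_key
    return bytes(pow(c, d, n) for c in cipher).decode("utf-8")


@dataclass
class ExchangeContract:
    """Hypothetical stand-in for the deployed smart contract."""
    owner: str
    price: int  # amount set by the data provider
    escrow: dict = field(default_factory=dict)    # requester -> deposit
    balances: dict = field(default_factory=dict)  # account -> resources

    def request(self, requester, amount):
        """Requester transfers account resources into the contract."""
        if amount < self.price:
            raise ValueError("deposit below the price set by the owner")
        self.escrow[requester] = amount

    def confirm_receipt(self, requester):
        """After the data is obtained, release the deposit to the owner."""
        amount = self.escrow.pop(requester)
        self.balances[self.owner] = self.balances.get(self.owner, 0) + amount


# Hospital B requests data that hospital A stored on IPFS.
public_key, private_key = make_keypair()
contract = ExchangeContract(owner="hospital_a", price=10)
contract.request("hospital_b", 10)

ipfs_hash = "QmPlaceholderHashValue"     # placeholder, not a real identifier
cipher = encrypt(ipfs_hash, public_key)  # done by hospital A
recovered = decrypt(cipher, private_key) # done by hospital B
contract.confirm_receipt("hospital_b")
```

Only the holder of `private_key` can turn `cipher` back into the IPFS hash, so the hash can travel over the chain without being readable by other nodes.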
In one embodiment of the invention, the stored data is split proportionally into a first part, a second part and a remaining part;
the first part is disclosed directly on the Ethereum private chain, so that all other users can read the first part directly from the IPFS client;
the second part is read from the IPFS client by another user after that user calls the smart contract and verification is completed;
and the remaining part is withheld: in the case that another user does not fulfil the terms of the smart contract after reading the second part, that user is marked as dishonest and the remaining part is retained, so that the user cannot obtain the complete stored data.
In data interaction, some users behave dishonestly, and an exchange process based on blockchain consensus cannot always fully protect the rights of the party providing the data. Marking users who do not fulfill the smart contract as dishonest, and preventing them from obtaining the complete data, avoids excessive loss to the party providing the data to a certain extent.
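The three-way split can be sketched in Python as follows. This is a minimal sketch under assumptions: the ratios (20% public, 50% released through the contract) and the function names `split_stored_data` and `release_remaining` are hypothetical, since no proportions are specified here.

```python
def split_stored_data(items, public_ratio=0.2, contract_ratio=0.5):
    """Split items into (first, second, remaining) parts by proportion.

    The ratios are illustrative assumptions, not values from the source.
    """
    if public_ratio < 0 or contract_ratio < 0 or public_ratio + contract_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    n = len(items)
    first_end = int(n * public_ratio)
    second_end = first_end + int(n * contract_ratio)
    return items[:first_end], items[first_end:second_end], items[second_end:]


def release_remaining(remaining, fulfilled_contract, requester, dishonest_users):
    """Release the remaining part only if the requester fulfilled the contract."""
    if fulfilled_contract:
        return list(remaining)
    dishonest_users.add(requester)  # mark the requester as dishonest
    return []                       # withhold the remaining part


images = list(range(10))            # stand-ins for ten stored images
first, second, remaining = split_stored_data(images)
dishonest_users = set()
withheld = release_remaining(remaining, False, "hospital-B", dishonest_users)
```

A requester who defaults after reading the second part ends up with at most the first two parts, so the loss to the data provider is bounded by the chosen ratios.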
In an embodiment of the present invention, the corresponding hash value is encrypted and stored in the Ethereum private chain;
before the other users obtain permission to read the stored data from the IPFS client, whether the corresponding hash value has changed is verified;
if the corresponding hash value has not changed, the stored data has not been tampered with.
The user storing the data encrypts the corresponding hash value with its own public key in the Ethereum private chain and then stores it in a block, so other users cannot obtain the corresponding hash value directly. After the data is read from IPFS, the hash value is recalculated and compared with the hash value originally stored on the blockchain; if the two are identical, the originally stored data has not been tampered with.
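The tamper check amounts to recomputing a content hash and comparing it with the recorded one. The Python sketch below assumes SHA-256 as a stand-in for the IPFS hash and omits the encryption of the on-chain value; `content_hash` and `is_untampered` are hypothetical names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Stand-in for the hash value IPFS derives from the stored content."""
    return hashlib.sha256(data).hexdigest()


def is_untampered(data_from_ipfs: bytes, hash_on_chain: str) -> bool:
    """Recompute the hash of the retrieved data and compare with the chain record."""
    return content_hash(data_from_ipfs) == hash_on_chain


original = b"cardiac CT image bytes"
hash_on_chain = content_hash(original)   # recorded when the data was stored
tampered = original + b" (modified)"
```

Because the hash is derived from the content itself, any change to the stored bytes produces a different value and the comparison fails.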
The method for improving the block chain data processing efficiency by using data pre-screening can not only ensure the safety and privacy of data, but also improve the data storage efficiency and enlarge the data storage space.
In one embodiment of the application, the hospital only needs to store and read data and does not use the stored data for interaction. No smart contract is therefore deployed in the Ethereum private chain, and data is only stored and read on the "IPFS + Ethereum private chain" structure.
Taking a hospital storing cardiac CT image data as an example, the hospital first provides a unique certificate that proves its identity, the data, and the result obtained from preprocessing the data. A node on the Ethereum private chain then verifies whether the hospital's certificate is correct; if it is incorrect, the request is not processed;
if the certificate is correct, the preprocessing results are examined in turn. If the data is of the type that needs to be stored, the certificate and the data are stored in IPFS (InterPlanetary File System), an IPFS node calculates a hash value, and the hash value is stored on the Ethereum private chain; if the data is not of the type that needs to be stored, it is skipped.
Finally, the hospital saves the private chain node signature it obtains once the data has been stored on the blockchain (that is, once the hash value has been stored on the Ethereum private chain).
Optionally, the user first confirms with the private chain node and then stores the data in IPFS. The private chain node can compute in advance the hash value that will be generated when the preprocessed data is stored on IPFS, after which the data is stored in IPFS. That is, the precomputed hash value corresponding to the preprocessed data is stored on the private chain node first, and the preprocessed data is then stored in IPFS.
To read data, the hospital first initiates a read request and provides the address at which the hash value is stored on the blockchain together with its own signature. A node on the private chain then verifies whether the signature is correct. If it is incorrect, the request is not processed. If it is correct, the node uses the address to look up the information stored on the blockchain, namely the hash value corresponding to the data stored in IPFS, and the corresponding data is then retrieved from IPFS according to that hash value.
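The store and read flows in this embodiment can be sketched in Python with in-memory stand-ins. Everything here is a simplifying assumption made for illustration: `InMemoryIPFS` and `PrivateChainNode` are hypothetical classes, SHA-256 stands in for the IPFS hash, a set-membership check replaces certificate and signature verification, and a list index plays the role of the on-chain address.

```python
import hashlib


class InMemoryIPFS:
    """Stand-in for IPFS: content-addressed storage keyed by a SHA-256 digest."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blocks[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blocks[digest]


class PrivateChainNode:
    """Stand-in for a private chain node that records only hash values."""

    def __init__(self, ipfs, known_credentials, wanted_label):
        self.ipfs = ipfs
        self.known_credentials = set(known_credentials)
        self.wanted_label = wanted_label
        self.ledger = []  # hash values; the list index acts as the address

    def store(self, credential, data, predicted_label):
        if credential not in self.known_credentials:
            return None                      # verification failed: not processed
        if predicted_label != self.wanted_label:
            return None                      # pre-screening: skip unwanted data
        self.ledger.append(self.ipfs.add(data))
        return len(self.ledger) - 1          # address of the hash on the chain

    def read(self, credential, address):
        if credential not in self.known_credentials:
            return None                      # verification failed: not processed
        if not 0 <= address < len(self.ledger):
            return None
        return self.ipfs.get(self.ledger[address])


node = PrivateChainNode(InMemoryIPFS(), {"hospital-A"}, wanted_label="heart")
address = node.store("hospital-A", b"heart-ct-bytes", "heart")
skipped = node.store("hospital-A", b"brain-ct-bytes", "brain")
```

The ledger grows only for data that passes both the credential check and the pre-screening check, which is how the scheme keeps unneeded data off the chain.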
The "IPFS + Ethereum private chain" structure uses IPFS as an expansion database for the private chain. Storing data in IPFS yields a hash value, and the blockchain stores only this hash value. Because the data itself does not need to be stored on the blockchain, the efficiency of storing and reading data is greatly improved and the pressure on blockchain storage space is reduced.
Compared with an ordinary blockchain structure, storing and reading data on the "IPFS + private chain" structure adds only one step, namely storing or reading the data on IPFS. This step has little effect on storage and reading efficiency, while greatly increasing the storage space available for data.
The method for improving blockchain data processing efficiency by data pre-screening is suitable for medical image segmentation tasks. Preprocessing the data before it is put on the chain greatly improves the precision of the image segmentation task. For example, because data is classified in advance, a medical image segmentation task works on medical images of a single class, which saves the time otherwise spent downloading and identifying data.
In an embodiment of the present invention, an artificial intelligence blockchain system for improving efficiency of processing blockchain data by using data prescreening is also provided, which takes cardiac image data in medical data as an example to perform an image segmentation task on a cardiac CT image.
Image segmentation has long been a research focus in computer vision, and semantic segmentation, which aims to classify each pixel in an image, has been widely applied in the medical field. Medical images suffer from heavy noise, blurred boundary information, and similar problems. An image in which every pixel has been classified by a semantic segmentation method can overcome these problems to a certain extent and greatly helps doctors analyze the image.
FIG. 2 is a block diagram of the overall architecture of an "artificial intelligence blockchain" system that uses data pre-screening to improve blockchain data processing efficiency in one embodiment of the present invention. As shown in FIG. 2, the artificial intelligence blockchain system mainly comprises three modules: a data preprocessing module, a blockchain network module, and an image segmentation module. Working from bottom to top, the system first preprocesses the images by classifying and cropping them. The screened data is then stored in the "IPFS + Ethereum private chain" structure, and once the data is on the chain, data intercommunication is realized through smart contracts. Finally, data is read from the blockchain to perform the image segmentation task.
In this embodiment, the system for improving the efficiency of processing the blockchain data by using data pre-screening mainly includes the following processes:
1. preprocessing medical data, including classifying the data to obtain the medical images required by the task and then cropping the medical images;
2. uploading the data to a blockchain network, wherein the blockchain network comprises Ethereum private chain nodes, an IPFS network, and smart contracts;
3. authenticating download permission on the blockchain, that is, calling a smart contract to authenticate the download permission, so that the image data can be downloaded only after authentication passes;
4. performing an image segmentation task on the downloaded image data, the task using semantic segmentation, an encoder-decoder structure, and the UNet model.
In the present embodiment, classification separates the data used for cardiac medical image segmentation from all other data, giving two categories. This ensures that only data usable for cardiac medical image segmentation is stored on the blockchain; other data is identified and not put on the chain. Such processing can greatly improve the efficiency and performance of segmentation. Cropping resizes the images of different sizes collected by each hospital to a specified size, so that the data on the chain follows a uniform standard, which facilitates the subsequent segmentation task.
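One way to bring images of different sizes to a specified size is a center crop with padding, sketched below in Python. The exact cropping rule is not given here, so centering and zero padding are assumptions, and `center_crop_or_pad` is a hypothetical helper operating on plain nested lists.

```python
def center_crop_or_pad(image, target_h, target_w, fill=0):
    """Return a target_h x target_w copy of a 2D list.

    Larger images are cropped around the center; smaller images are padded
    with `fill` on all sides. Both choices are illustrative assumptions.
    """
    src_h = len(image)
    src_w = len(image[0]) if src_h else 0
    top = (src_h - target_h) // 2    # negative when padding is needed
    left = (src_w - target_w) // 2
    result = []
    for r in range(target_h):
        row = []
        for c in range(target_w):
            sr, sc = r + top, c + left
            inside = 0 <= sr < src_h and 0 <= sc < src_w
            row.append(image[sr][sc] if inside else fill)
        result.append(row)
    return result


large = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
cropped = center_crop_or_pad(large, 2, 2)
padded = center_crop_or_pad([[1]], 3, 3)
```

Whatever rule is used, applying the same target size to every hospital's images is what gives the data on the chain a uniform standard.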
Since fully convolutional networks (FCNs) replaced the fully connected layers of traditional convolutional neural networks (CNNs) with convolutional layers, the "encoder-decoder" structure has become almost the general network model for image semantic segmentation. The U-Net network model also adopts an encoder-decoder structure. In the encoder stage, downsampling is performed 4 times in total, reducing the resolution by a factor of 16; this low-resolution information is well suited to classification. Correspondingly, the decoder stage upsamples 4 times, so that the output returns to the original resolution of the image; the high-resolution information provides finer features and supports accurate segmentation. In other words, the U-Net network model classifies using low-resolution information and segments accurately by combining it with high-resolution information, which makes it very suitable for semantic segmentation of medical images. The present embodiment also uses the U-Net network model to semantically segment the cardiac data.
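The resolution arithmetic can be checked with a short Python sketch. It assumes each of the 4 steps halves or doubles the side length and ignores any border pixels lost to convolutions; `unet_resolutions` is a hypothetical helper, and the 256-pixel input is only an example.

```python
def unet_resolutions(size, depth=4):
    """Return (encoder_sizes, decoder_sizes) for `depth` 2x down/upsampling steps."""
    encoder = [size]
    for _ in range(depth):
        if encoder[-1] % 2:
            raise ValueError("size must be divisible by 2**depth")
        encoder.append(encoder[-1] // 2)   # each downsampling halves the side
    decoder = [encoder[-1]]
    for _ in range(depth):
        decoder.append(decoder[-1] * 2)    # each upsampling doubles the side
    return encoder, decoder


encoder, decoder = unet_resolutions(256)
reduction_factor = encoder[0] // encoder[-1]
```

Four halvings give a factor of 2 to the power 4, that is 16, and four doublings in the decoder bring the output back to the input size.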
This embodiment mainly addresses medical image segmentation. To store data, each hospital first classifies its medical data with a ResNet network model, separating the data for cardiac CT image segmentation from other data, and then crops the selected cardiac image data to a uniform standard. The processed cardiac image data is then stored in IPFS (InterPlanetary File System) to obtain a hash value. Finally, part of the hash value is stored on the Ethereum private chain, and a smart contract is issued for the uploaded data.
Medical image segmentation usually targets one type of data. Segmenting one type of data with one network model can achieve accurate segmentation, whereas segmenting several types of data with a single model reduces segmentation accuracy.
We first evaluate the efficiency with which the blockchain stores and reads data. A blockchain network with the "IPFS + Ethereum" structure is built, with the Ethereum private chain based on the Ropsten test network. The tests evaluate the storage and reading throughput, and the storage space, of data that has undergone the preprocessing operation and of data that has not.
In this embodiment, the images are saved in IPFS and the generated hash values are stored using a smart contract. After each piece of image data is stored, IPFS calculates a 32-byte hash value, which is stored on Ethereum in the smart contract. However, Ethereum currently limits a contract to storing at most 24 KB of data. That is, each smart contract can store only 24 KB (24,576 bytes), while each piece of data stored on IPFS requires 32 bytes for its hash value, so a single smart contract can hold the hash values of 768 pieces of data.
The experiment as a whole simulates screening cardiac images out of a mixture of cardiac CT images and brain CT images, storing them, and segmenting them, in order to verify the storage space, time, and segmentation efficiency that the artificial intelligence blockchain system achieves for cardiac CT images by using data pre-screening.
The experimental data comprises 1,622 brain CT images and 654 cardiac CT images. Without classification there are 2,276 images in total, and three smart contracts must be issued to store all of the data. Moreover, hospitals generally hold more than two kinds of data and would need to issue even more smart contracts to store everything, which wastes blockchain storage space and reduces efficiency. With the proposed preprocessing operation, the cardiac CT images are separated from the other images and only the cardiac images are stored for the cardiac medical image segmentation task. In this experiment, a single smart contract is then enough to store all of the data.
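The contract counts quoted above follow from the 24 KB limit and the 32-byte hash size stated for this embodiment. A minimal Python check, with `contracts_needed` as a hypothetical helper name:

```python
import math

CONTRACT_CAPACITY_BYTES = 24 * 1024   # 24 KB limit stated for this embodiment
HASH_SIZE_BYTES = 32                  # one hash value per stored image
HASHES_PER_CONTRACT = CONTRACT_CAPACITY_BYTES // HASH_SIZE_BYTES


def contracts_needed(num_images: int) -> int:
    """Number of smart contracts required to hold one hash per image."""
    return math.ceil(num_images / HASHES_PER_CONTRACT)


brain_images, heart_images = 1622, 654
without_preprocessing = contracts_needed(brain_images + heart_images)
with_preprocessing = contracts_needed(heart_images)
```

With 768 hash values per contract, 2,276 images need three contracts while the 654 cardiac images fit in one.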
For the data that has undergone the preprocessing operation, we record the time taken to issue one smart contract and store the data, and the time taken to call the contract and read the data once deployment is complete. Similarly, for the data that has not undergone the preprocessing operation, we record the time taken to issue three smart contracts and store the data, and the time taken to call the contracts and read the data once deployment is complete. The results are shown in Table 1.
TABLE 1 Time to store and read data
                        Deploying smart contracts   Storing data   Reading data
With preprocessing      38 s                        13 s           2 s
Without preprocessing   75 s                        63 s           6 s
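The ratios implied by Table 1 can be computed directly. The Python sketch below uses only the six timings from the table; the variable names are illustrative.

```python
# (with preprocessing, without preprocessing) in seconds, taken from Table 1
times_s = {
    "deploy": (38, 75),
    "store": (13, 63),
    "read": (2, 6),
}

# How many times longer each step takes without preprocessing
slowdown = {
    step: round(without / with_pre, 2)
    for step, (with_pre, without) in times_s.items()
}
```

Deployment takes roughly twice as long without preprocessing, storage nearly five times as long, and reading three times as long.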
As for storage space, Table 2 shows the space required to store the data with and without the preprocessing operation. The data without preprocessing occupies about 76 times as much space as the preprocessed data. Although the "IPFS + blockchain" storage structure stores only hash values on the blockchain rather than the data itself, storing more data still means storing more hash values, which takes more space.
TABLE 2 Storage space
        Size      Bytes
All     1.21 GB   1,300,990,887
Heart   16.3 MB   17,136,473
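The figures in Table 2 can be cross-checked with a few lines of Python. The sketch assumes the GB and MB values are binary units (2**30 and 2**20 bytes), which is what reproduces the sizes in the table.

```python
all_bytes = 1_300_990_887     # all images, from Table 2
heart_bytes = 17_136_473      # cardiac images only, from Table 2

ratio = all_bytes / heart_bytes
saved_fraction = 1 - heart_bytes / all_bytes

all_gb = round(all_bytes / 2**30, 2)      # assumes binary gigabytes
heart_mb = round(heart_bytes / 2**20, 1)  # assumes binary megabytes
```

The ratio comes out just under 76, so pre-screening removes nearly 99% of the raw data volume in this experiment.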
It can be seen that, after the preprocessing operation, the time spent deploying the smart contracts and the time spent storing and reading the image data are greatly reduced.
Therefore, preprocessing data before it is uploaded to the blockchain reduces the time and space consumed by uploading redundant data.
For the cardiac medical image segmentation task, on the one hand, different types of images call for different segmentation tasks, and forcing them to be segmented together degrades the segmentation result. On the other hand, different hospitals store images in different formats, so the data requires some processing, which also affects the segmentation result. The experiment uses 654 cardiac CT images in MHA format and 1,622 brain CT images in DICOM format. To evaluate the accuracy of the cardiac medical image segmentation task, the segmentation accuracy with and without the preprocessing operation is compared. After the preprocessing operation, the resulting images are all cardiac CT images in MHA format. Images without the preprocessing operation are in different formats and can be segmented together only after being converted to PNG format.
Next, we perform the image segmentation task on the stored data after downloading using the FCN and UNet network models.
We compare the segmentation results of the preprocessed and non-preprocessed data in the FCN and UNet network structures, respectively.
In addition, medical images come in several storage formats, including DICOM, NIfTI, and Analyze. Leaving the formats unprocessed would also affect the experimental process and results, so the images that are not preprocessed are likewise converted uniformly to PNG format in the experiment, and their segmentation results are tested as a reference.
For the cardiac medical image segmentation task, we simulate two hospitals whose data formats and segmentation content differ. To simulate data without preprocessing, that is, the two kinds of data mixed together without classification, the data is converted to a common format before the segmentation task is performed. (Because hospitals store medical image data to different standards, images converted to MHA or DICOM format still carry other embedded information and the formats cannot be completely unified, so we convert both kinds of images to PNG.) The quantitative indexes of the cardiac medical image segmentation task are shown in Table 3.
TABLE 3 Quantitative indicators for the cardiac medical image segmentation task
                 Dice      Jaccard   AUC
All-PNG-FCN      0.29794   0.19846   0.64385
All-PNG-UNet     0.29493   0.19668   0.65367
Heart-PNG-FCN    0.49019   0.34289   0.6712
Heart-PNG-UNet   0.62223   0.46067   0.73022
Heart-MHA-FCN    0.91515   0.8498    0.93069
Heart-MHA-UNet   0.93565   0.88559   0.9498
The comparison of the preprocessed and non-preprocessed data under the FCN and UNet network structures shows that data without the preprocessing operation, converted to PNG format, segments far worse than data with the preprocessing operation. Preprocessed data segments better than non-preprocessed data whether it is converted to PNG or kept in the original MHA format. It can also be seen that converting the data format itself degrades the segmentation result; the preprocessed data is therefore left in its original format, which gives the best results in Table 3.
Therefore, preprocessing the data (including classification) before it is put on the chain is also necessary for medical image segmentation.
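For reference, the Dice and Jaccard indicators reported in Table 3 are overlap measures between a predicted mask and a ground-truth mask. The Python sketch below assumes the standard definitions, since the formulas are not given here; `dice_and_jaccard` is a hypothetical helper, and the two small masks are made-up examples rather than experimental data.

```python
def dice_and_jaccard(pred, target):
    """Compute (Dice, Jaccard) for two equal-length sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection
    if union == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    dice = 2 * intersection / (pred_sum + target_sum)
    jaccard = intersection / union
    return dice, jaccard


pred = [1, 1, 1, 0, 0, 0]      # made-up predicted mask
target = [1, 1, 0, 1, 0, 0]    # made-up ground-truth mask
dice, jaccard = dice_and_jaccard(pred, target)
```

For a single pair of masks the two measures are linked by Jaccard = Dice / (2 - Dice); averages over many images, as in a results table, need not satisfy this relation exactly.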
In terms of performance, the artificial intelligence blockchain system that uses data pre-screening to improve blockchain data processing efficiency makes reasonable use of storage space without increasing the time needed to store and read data, and the stored data improves segmentation precision and efficiency in the medical image segmentation task.
In this embodiment, the "artificial intelligence blockchain" system that uses data pre-screening to improve blockchain data processing efficiency runs on a PC with the Windows 10 operating system, an i5-1135G7 CPU, and 16 GB of memory, and on a curalcoud deep learning server configured as follows: Ubuntu 18.04 LTS operating system, an i7-7700K CPU, and two Nvidia RTX 2080 Ti graphics cards with 11 GB of video memory each.
This embodiment targets the input data of the medical image segmentation task in artificial intelligence and addresses storage security, efficiency, and data interaction by combining artificial intelligence with blockchain technology. Blockchain technology solves the problems of secure data storage and interaction, while artificial intelligence technology helps solve the storage efficiency problem of blockchain technology.
Experiments show that the method for improving blockchain data processing efficiency by data pre-screening provided by the invention benefits the medical image segmentation task. It provides a detailed and targeted data source for the task, saves the time needed to download the stored data, and puts the data on the chain only after preprocessing, thereby improving the precision and efficiency of medical image segmentation.
In an embodiment of the present invention, an apparatus for improving blockchain data processing efficiency by data pre-screening is provided. Referring to FIG. 3, FIG. 3 is a schematic diagram of an apparatus 300 for improving blockchain data processing efficiency by data pre-screening in an embodiment of the present invention, the apparatus including:
a preprocessing module 301, configured to preprocess data, where the preprocessing includes:
classifying and cropping the data;
a data storage module 302, configured to store the preprocessed data in IPFS and generate a hash value corresponding to the stored data;
a sending module 303, configured to return the corresponding hash value, serving as a credential for the user's stored data, to the user terminal that stored the data, where the credential is used by the user who stored the data to read the stored data from the IPFS client;
and a permission obtaining module 304, configured to deploy a smart contract in the Ethereum private chain, where users of the Ethereum private chain other than the user who stored the data obtain permission to read the stored data from the IPFS client through the smart contract.
In an embodiment of the present invention, the permission obtaining module includes:
a first permission obtaining sub-module, used by users of the Ethereum private chain other than the user who stored the data to directly obtain permission to read the part of the stored data disclosed on the Ethereum private chain;
and a second permission obtaining sub-module, used by users of the Ethereum private chain other than the user who stored the data to call a smart contract to send a permission request and, after the request is verified, to obtain permission to read the complete stored data.
The apparatus further includes a data acquisition module, configured to read and download the stored data from IPFS.
The apparatus for improving blockchain data processing efficiency by data pre-screening preprocesses data before storage, which reduces the storage space and storage time consumed by unnecessary data. Using the "IPFS + Ethereum private chain" structure, the data is stored in IPFS, which expands the storage space of the blockchain and saves cost. At the same time, a smart contract is deployed in the Ethereum private chain so that other users can call it to obtain the stored data; the data is no longer stored centrally or in isolation and can deliver greater value through interaction. Because the data is in effect stored in IPFS through the Ethereum private chain, security and privacy are guaranteed.
The above description is only for the purpose of illustrating the preferred embodiments of the present disclosure and is not to be construed as limiting the embodiments of the present disclosure, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the embodiments of the present disclosure are intended to be included within the scope of the embodiments of the present disclosure.
The above description is only a specific implementation of the embodiments of the present disclosure, but the scope of the embodiments of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present disclosure, and all the modifications or substitutions should be covered within the scope of the embodiments of the present disclosure. Therefore, the protection scope of the embodiments of the present disclosure shall be subject to the protection scope of the claims.

Claims (9)

1. A method for improving blockchain data processing efficiency by data pre-screening, the method comprising:
preprocessing data, the preprocessing comprising: classifying and cropping the data;
storing the preprocessed data in IPFS (InterPlanetary File System), and generating a hash value corresponding to the stored data;
returning the corresponding hash value, as a credential for the user's stored data, to the user terminal that stored the data, wherein the credential is used by the user who stored the data to read the stored data from the IPFS client;
deploying a smart contract in the Ethereum private chain, wherein the smart contract is used by users of the Ethereum private chain other than the user who stored the data to obtain permission to read the stored data from the IPFS client.
2. The method of claim 1, wherein preprocessing the data comprises:
classifying the data by adopting a ResNet network model;
dividing data required by a classification task into a class;
dividing data not needed by the classification task into another class;
and screening out data required by the classification task.
3. The method of claim 1, wherein the smart contract corresponds to a type of interaction of the stored data; and deploying the smart contract in the Ethereum private chain comprises:
in the case that the data to be stored by the user is divided into a plurality of categories, deploying a plurality of smart contracts, each corresponding to the interaction of the data of one category.
4. The method of claim 1, further comprising:
saving the position of a part of the corresponding hash value in the smart contract;
wherein the other users read a part of the stored data directly from the IPFS client by means of the position, saved in the smart contract, of the part of the corresponding hash value.
5. The method of claim 4, further comprising:
in the case that the other users need to obtain the complete stored data, the other users call the smart contract and send a read request to the user who stored the data;
the user who stored the data verifies the request, and after verification passes, the other users obtain permission to read the complete stored data from the IPFS client and read the stored data from the IPFS client.
6. The method of claim 5, wherein the other users obtaining permission to read the complete stored data from the IPFS client comprises:
the other users call the smart contract to send account resources and their public key to the user who stored the data, and after the user who stored the data confirms the information, the corresponding hash value encrypted with the public key of the other users is sent to the other users;
after receiving the corresponding hash value encrypted with their public key, the other users decrypt it with their private key and read the stored data from the IPFS client according to the decrypted hash value.
7. The method of claim 1, further comprising:
proportionally splitting the stored data into a first part of data, a second part of data, and a remaining part of data;
wherein the first part of data is disclosed directly on the Ethereum private chain, so that all other users can read the first part of data directly from the IPFS client;
the second part of data is read from the IPFS client by another user after that user calls the smart contract and verification is completed;
and the remaining part of data is withheld when another user fails to fulfill the content of the smart contract after reading the second part of data: that user is marked as dishonest, and the remaining part of data is retained so that the user cannot obtain the complete stored data.
8. The method of claim 1, further comprising:
encrypting the corresponding hash value and storing the encrypted hash value in the Ethereum private chain;
verifying whether the corresponding hash value has changed before the other users obtain permission to read the stored data from the IPFS client;
wherein, if the corresponding hash value has not changed, the stored data has not been tampered with.
9. An apparatus for improving blockchain data processing efficiency by data pre-screening, the apparatus comprising:
a preprocessing module, configured to preprocess data, the preprocessing comprising: classifying and cropping the data;
a data storage module, configured to store the preprocessed data in IPFS and generate a hash value corresponding to the stored data;
a sending module, configured to return the corresponding hash value, serving as a credential for the user's stored data, to the user terminal that stored the data, wherein the credential is used by the user who stored the data to read the stored data from the IPFS client;
and a permission obtaining module, configured to deploy a smart contract in the Ethereum private chain, wherein users of the Ethereum private chain other than the user who stored the data obtain permission to read the stored data from the IPFS client through the smart contract.
CN202211252817.9A 2022-10-13 2022-10-13 Method for improving block chain data processing efficiency by data pre-screening Pending CN115576945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211252817.9A CN115576945A (en) 2022-10-13 2022-10-13 Method for improving block chain data processing efficiency by data pre-screening

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211252817.9A CN115576945A (en) 2022-10-13 2022-10-13 Method for improving block chain data processing efficiency by data pre-screening

Publications (1)

Publication Number Publication Date
CN115576945A true CN115576945A (en) 2023-01-06

Family

ID=84585292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211252817.9A Pending CN115576945A (en) 2022-10-13 2022-10-13 Method for improving block chain data processing efficiency by data pre-screening

Country Status (1)

Country Link
CN (1) CN115576945A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112817916A (en) * 2021-02-07 2021-05-18 中国科学院新疆理化技术研究所 Data acquisition method and system based on IPFS
CN112818038A (en) * 2021-02-02 2021-05-18 山东伏羲智库互联网研究院 Data management method based on combination of block chain and IPFS (Internet protocol file system) and related equipment
CN113434094A (en) * 2021-07-08 2021-09-24 山东中科好靓科技有限公司 Data file storage and extraction method based on IPFS
CN114489477A (en) * 2021-12-20 2022-05-13 青岛鹏海软件有限公司 Decentralized distributed storage method based on block chain
CN114676462A (en) * 2022-01-10 2022-06-28 南京铉盈网络科技有限公司 Data storage system, method and device based on Ether house and intelligent contract

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
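Academic Committee of the China National Intellectual Property Administration (国家知识产权局学术委员会) (ed.): "Industry Patent Analysis Report, Vol. 66: Blockchain" (《产业专利分析报告 第66册 区块链》), Beijing: Intellectual Property Publishing House (知识产权出版社), pages: 92 *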

Similar Documents

Publication Publication Date Title
Neumann et al. Computation of likelihood ratios in fingerprint identification for configurations of three minutiae
EP3616383A1 (en) Systems and methods for enforcing centralized privacy controls in de-centralized systems
Mohsin et al. Based medical systems for patient’s authentication: Towards a new verification secure framework using CIA standard
Devaraj et al. An efficient framework for secure image archival and retrieval system using multiple secret share creation scheme
CN110929806B (en) Picture processing method and device based on artificial intelligence and electronic equipment
CN109815051A (en) Blockchain data processing method and system
Chen et al. Study and implementation on the application of blockchain in electronic evidence generation
US11769577B1 (en) Decentralized identity authentication framework for distributed data
CN116168820A (en) Medical data interoperation method based on virtual integration and blockchain fusion
CN117238458A (en) Critical care cross-mechanism collaboration platform system based on cloud computing
Dobbs et al. On art authentication and the Rijksmuseum challenge: A residual neural network approach
Elgohary et al. Improving uncertainty in chain of custody for image forensics investigation applications
Liu et al. A hybrid with distributed pooling blockchain protocol for image storage
Zhang et al. [Retracted] Design and Application of Electronic Rehabilitation Medical Record (ERMR) Sharing Scheme Based on Blockchain Technology
US20230070625A1 (en) Graph-based analysis and visualization of digital tokens
CN116414875A (en) Data processing apparatus and data processing method
CN110535630A (en) Key generation method, device and storage medium
Ukanah et al. Blockchain application in healthcare
CN107392042A (en) Electric network data monitoring method and device
CN115576945A (en) Method for improving block chain data processing efficiency by data pre-screening
CN116346448A (en) Group image drawing method and device based on federal learning
Al-Dhlan et al. Customizable encryption algorithms to manage data assets based on blockchain technology in smart city
Maghraby et al. Applied blockchain technology in saudi arabia electronic health records
US11537742B2 (en) Sampling from a remote dataset with a private criterion
CN113626776A (en) Information carrier concept attribute transfer and electronic signature printable method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230106