CN113343284B - Private data sharing method based on block chain - Google Patents

Private data sharing method based on block chain

Info

Publication number
CN113343284B
CN113343284B (application CN202110878099.5A)
Authority
CN
China
Prior art keywords
data
bridge server
value
private
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110878099.5A
Other languages
Chinese (zh)
Other versions
CN113343284A (en)
Inventor
张金琳
俞学劢
高航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shuqin Technology Co Ltd
Original Assignee
Zhejiang Shuqin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Shuqin Technology Co Ltd filed Critical Zhejiang Shuqin Technology Co Ltd
Priority to CN202110878099.5A priority Critical patent/CN113343284B/en
Publication of CN113343284A publication Critical patent/CN113343284A/en
Application granted granted Critical
Publication of CN113343284B publication Critical patent/CN113343284B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/56 Financial cryptography, e.g. electronic payment or e-cash
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of block chains, and in particular to a private data sharing method based on a block chain, comprising the following steps: a bridge server and a plurality of data stations are built, and the bridge server encrypts private data and then stores it dispersedly on the data stations; a data demander submits a data processing model to the bridge server; the bridge server receives data introductions written by the participants and displays them in association with the identifiers; the data source party and the data demander both open virtual accounts on the block chain; the data demander specifies a private data identifier and transfers a corresponding number of tokens to the account bound to the bridge server; the private data is recovered; data lines are input into the data processing model and included in the billing; the recovered private data is destroyed, the tokens corresponding to the billing are transferred to the virtual account of the data source party, and the remaining tokens are returned. The substantial effect of the invention is that, while the private data is being used, it never leaves the bridge server and the data stations, which guarantees the privacy and security of the private data.

Description

Private data sharing method based on block chain
Technical Field
The invention relates to the technical field of block chains, in particular to a private data sharing method based on a block chain.
Background
With the development of information technology, a great deal of data is generated by human life and production activities, which in turn drives the rapid development of big data and artificial intelligence technologies. Further progress in big data technology is currently limited mainly by data shortage. The most important branch of artificial intelligence is machine learning, and data is the basis of machine learning; when data is insufficient or not rich enough, an effective machine learning model cannot be established. In most industries, data exists as isolated islands because of industry competition, privacy and security concerns, complex administrative procedures and the like. Even centralizing data across different departments of the same company faces significant resistance. In reality, integrating data scattered across places and organizations is almost impossible, or at least very expensive. With the further development of artificial intelligence, attaching importance to data privacy and security has become a worldwide trend.
Besides data privacy, the distribution of benefits during data integration is another important limiting factor. A data source party that owns data worries about leakage: once the data is sold, the buyer may resell it without the data source party's knowledge, which creates a leakage risk for the private data and harms the data source party's interests. A data demander that purchases data, in turn, has difficulty verifying the integrity and authenticity of the data. Purchasing duplicate data is costly, and if the value of the data is hard to assess before purchase, the buyer's willingness to pay is very low. Therefore, a data integration method is needed that guarantees data privacy, prevents data from being disclosed, and at the same time provides the data demander with a reference for the value of the data.
Chinese patent CN110580245B, published on 2020.3.10, discloses a private data sharing method and apparatus. The method is applied to a block chain node and may comprise the following steps: receiving a first creation transaction for a service contract initiated by a user, wherein the first creation transaction comprises service code and permission-control code, and the service code is executed when a transaction invoking the service contract is received; and deploying the service contract, so that when a query transaction, initiated by a querying party, for private data involved in a historical transaction invoking the service contract is received, the service contract is invoked to execute the permission-control code defined in it to determine the querying party's query permission, and the private data to be viewed is obtained when the query is permitted. This method cannot ensure that the querying party does not keep a private copy while viewing the data, so the privacy and security of the data cannot be guaranteed, and it provides the querying party with no reference for predicting the value of the data.
Disclosure of Invention
The technical problem to be solved by the invention is the current lack of a scheme that makes sharing private data convenient while still guaranteeing the security of the private data. The invention provides a private data sharing method based on a block chain, which can guarantee the security of the data source party's private data, give the data demander a reference for judging the value of the data, and thereby effectively increase the willingness to share data.
In order to solve the technical problem, the technical scheme adopted by the invention is as follows: a private data sharing method based on a block chain, comprising the following steps: a bridge server and a plurality of data stations are built; a data source party holding private data submits the private data to the bridge server, and the bridge server assigns an identifier to the private data, encrypts it and stores it dispersedly on the data stations; a data demander that needs to use data submits a data processing model to the bridge server, and the bridge server establishes a historical execution data hash value table for each data processing model; the bridge server receives a data introduction written by the participants and displays it in association with the identifier;
the data source party and the data demander both open virtual accounts on the block chain; when the data demander wants to use the private data, it specifies the private data identifier and transfers a corresponding number of tokens to the account bound to the bridge server; the bridge server communicates with the plurality of data stations and recovers the private data; for each data line of the private data, the bridge server verifies whether its hash value already exists in the corresponding historical execution data hash value table; if so, the data line is skipped; if not, the hash value of the data line is added to the historical execution data hash value table, the data line is input into the data processing model and included in the billing; the recovered private data is then destroyed, the result of the data processing model is fed back to the data demander, the tokens corresponding to the billing are transferred to the virtual account of the data source party, and the remaining tokens are returned.
Preferably, the data introduction comprises several pieces of example data and a data description, and the data description comprises the meaning, format, value range, time range and source of each field of the private data; a data demander publishes a detection model through the bridge server, the input of the detection model being private data and its output a value index; when the bridge server receives new private data it executes the detection model, and if the value index output by the detection model exceeds a preset threshold, the identifier and value index of the private data are transmitted to the data demander.
Preferably, the bridge server adds a normalization field to each numerical field according to the field value range recorded in the data introduction. The bridge server runs a statistical detection model which calculates the integrity of the private data, the boundary values of the numerical fields, the average value of the numerical fields, the variance of the numerical fields, and the dispersion degree of the tag values. The integrity of the private data is the percentage of data values that are not empty among all data values, and the dispersion degree of the tag values is the ratio of the count of the least frequent value of a tag-type field to the count of its most frequent value.
Preferably, the bridge server runs a learning detection model. The learning detection model establishes a detection neural network model whose inputs are several fields of the private data and whose output is a tag field of the private data, with initialized weight coefficients. The learning detection model divides the private data into a training set and a test set; after the data in the training set is input into the detection neural network model, the accuracy is obtained on the test set. The learning detection model discloses the input field names, the output field name and the accuracy of the detection neural network model. The input and output fields of the detection neural network model are initially selected at random from the fields of the private data by the learning detection model, and a data demander may submit a request to change the selected input and output fields; when the number of data demanders submitting the same change request exceeds a preset threshold, the input and output fields of the detection neural network model are changed to the fields specified by the change request.
Preferably, the detection model comprises a neural network model trained by the data demander; each data line of the private data is input into the neural network model, and if the number of data lines for which the output of the neural network model does not match the data line exceeds a preset threshold, the identifier of the private data is transmitted to the data demander.
Preferably, the method for storing the encrypted private data on a plurality of data stations in a scattered manner comprises the following steps: the bridge server extracts the hash value of the data line and assigns an identifier to the data line; the bridge server cuts the encrypted data line into a plurality of subdata of preset length, padding the last subdata with 0 if it is too short, the number of subdata matching the number of data stations; the subdata are distributed to the data stations and the identifier is sent to the data stations; each data station opens up a plurality of storage areas, each storage area comprising a plurality of storage blocks whose space matches the space required by one subdata plus its identifier, the identifier occupying a preset-length space in front of the subdata when stored; the data station stores a plurality of exchange pairs, each exchange pair comprising two binary sequences; the data station appends the newly received subdata to a free storage block of the current storage area and checks whether the new subdata and the previous subdata contain a bit-aligned exchange pair; if so, the contents at the aligned positions are exchanged, except that no exchange is made when the exchange pair falls in the region corresponding to the identifier; if the storage area has no free storage block, the subdata is stored in the first storage block of a new storage area without checking for exchange pairs; when the bridge server recovers a data line, it sends the identifier of the data line to the data station, and the data station finds the storage block where the subdata is stored according to the identifier; the data station first checks downward whether an aligned exchange pair exists between this block's data and the next storage block's data, and if so, continues to check at the found exchange-pair positions whether an aligned exchange pair exists between the next block and the block after it; if so, it keeps checking at the newly found exchange-pair positions until no exchange pair is found or the last storage block of the storage area is reached; it then checks whether an aligned exchange pair exists with the data of the previous storage block, and if so, exchanges the contents at the aligned positions; a copy is made of all the storage blocks found downward to contain exchange pairs, and, starting from the last storage block, the aligned exchange pairs are swapped back in sequence to recover the data until all exchange pairs are restored; the data in the storage block corresponding to the identifier in the copy is fed back to the bridge server.
Preferably, the data station establishes an exchange pair table and associates several exchange pairs with each storage area; the exchange pair table records the storage area identifier and the exchange pair, and a storage area associated with multiple exchange pairs has multiple records in the exchange pair table.
Preferably, the data processing model is a neural network model and the result the participant wants to obtain is the neural network model itself trained on the private data; the bridge server extracts the weight coefficients of the neural network model to form a weight coefficient vector, and compares the distance between the weight coefficient vectors before and after a data row is substituted into the neural network: if the distance is smaller than a preset threshold, the data row is not charged; if it is greater than or equal to the preset threshold, the data row is charged.
Preferably, the method for storing the encrypted private data on a plurality of data stations in a scattered manner comprises the following steps: the bridge server extracts the hash value of the data line and assigns an identifier to the data line; several copies are made of each data line of the private data, the number of copies being the same as the number of data stations; for each numerical field in the data line, the original value is split into a plurality of addends, which are distributed to the different copies; and the data stations encrypt the copies and store them in association with the identifier.
Preferably, when the data demander submits the data processing model to the bridge server and declares it to be a neural network model, the bridge server and the data stations execute the following steps: reading layer 0 and layer 1 of the neural network model; obtaining the relevant numerical fields from layer 0, sending them together with the identifiers of the data lines to the data stations, and extracting the neurons and weight coefficients of layer 1; the bridge server then performs the following steps for each neuron of layer 1 in turn: enumerating the connected input neurons to obtain the corresponding field names; transmitting the field names, the corresponding weight coefficients, the offset value and the excitation function to the data stations; if a field is of numerical type, each data station multiplies the stored copy value corresponding to the field name by the weight coefficient and feeds the product back to the bridge server, and the bridge server adds the received feedback values, adds the offset value, and substitutes the sum into the excitation function to obtain the output of the neuron; and if the field is of tag type, the data station directly feeds back the stored tag value corresponding to the field name to the bridge server.
Preferably, the method for storing the encrypted private data on a plurality of data stations in a scattered manner comprises the following steps: the bridge server extracts the hash value of the data line and assigns an identifier to the data line; several copies are made of each data line of the private data, the number of copies being the same as the number of data stations; for each numerical field in the data line, the 1st to Nth powers of the normalized value are calculated, and each of these powers of the original value is split into a plurality of addends; the copies are sent to the data stations together with the identifier of the data line, and the data stations encrypt the copies and store them in association with the identifier; when the data processing model is executed, each data station substitutes its stored addends of the 1st to Nth powers into the Taylor expansion of the exp(x) function, obtaining the coefficient of each term from the expansion; it multiplies each coefficient by the stored addend of the corresponding power, adds the products, encrypts the result and sends it to the bridge server; the bridge server adds the results sent by all the data stations and substitutes the sum into the excitation function as the output of the neuron.
The substantial effects of the invention are as follows: 1) data stations are established to store the private data in encrypted form, and the bridge server receives and runs the data processing model of the data demander, so that the private data never leaves the bridge server and the data stations while it is being used, the data demander cannot directly see the specific values of the private data, and the privacy and security of the private data are guaranteed; 2) the bridge server automatically bills according to the use of the private data and automatically transfers tokens, so that the data of the data source party generates income without worries about repudiation or delayed payment, and at the same time the bridge server compares each data line against the historical execution data hash value table and does not charge again for data that has already been executed once, which protects the interests of the data demander; 3) with the improved encrypted storage, the true encrypted value does not exist in the storage space at rest; it is only temporarily recovered when used and destroyed afterwards, which shortens the lifetime of the true value and improves data security; 4) when the private data is used to train a neural network model, the bridge server disassembles the neural network model, hands layer 0 and layer 1 to the data stations for execution, receives the output values of the layer-1 neurons directly from the data stations, and processes the subsequent layers itself, so that the true values of the numerical fields never need to be recovered during the entire training process; the true values remain hidden, and the security and privacy of the data are improved.
Drawings
Fig. 1 is a schematic diagram of private data sharing according to an embodiment.
Fig. 2 is a schematic diagram of a private data flow according to an embodiment.
FIG. 3 is a diagram illustrating private data preprocessing according to an embodiment.
FIG. 4 is a diagram illustrating private data line preprocessing, according to an embodiment.
FIG. 5 is a diagram illustrating an implementation of a learning detection model according to an embodiment.
FIG. 6 is a diagram illustrating a sub data storage according to an embodiment.
FIG. 7 is a schematic diagram of a data line storing process according to an embodiment.
Fig. 8 is a schematic diagram illustrating a data line restoration process according to an embodiment.
FIG. 9 is a flow chart illustrating an embodiment of a two data line storage process.
FIG. 10 is a schematic diagram of a processing flow of a neural network model according to an embodiment.
Wherein: 10. data station, 20, bridge server, 30, data source side, 31, privacy data, 32, data description, 33, example data, 40, data demander, 41, data processing model, 42, result, 43, probe model, 44, value index, 311, numeric field, 312, normalized field, 313, tag field, 314, identity, 315, subdata, 316, exchange pair, 431, learning probe model.
Detailed Description
The following provides a more detailed description of the present invention, with reference to the accompanying drawings.
The first embodiment is as follows:
a method for sharing private data 31 based on a blockchain, referring to fig. 1, comprising the following steps:
step S1), constructing a bridge server 20 and a plurality of data stations 10, and submitting the private data 31 to the bridge server 20 by the data source party 30 with the private data 31;
step S2), the bridge server 20 distributes the identification 314 for the private data 31, and the private data 31 are stored on a plurality of data stations 10 in a scattered manner after being encrypted;
step S3) the data demander 40 that needs to use the data submits the data processing model 41 to the bridge server 20;
step S4) the bridge server 20 creates a history execution data hash value table for each data processing model 41;
step S5) the bridge server 20 receives the data introduction composed by the participants and presents it in association with the identification 314;
step S6) the data source party 30 and the data demand party 40 both open virtual accounts on the blockchain;
step S7) when the data demander 40 wants to use the private data 31, designating the private data 31 identifier 314 and transferring a corresponding amount of tokens to the account bound by the bridge server 20;
step S8) the bridge server 20 communicates with the plurality of data stations 10 and restores the private data 31;
step S9) verifies whether the hash value of the data line of the privacy data 31 exists in the corresponding history execution data hash value table;
step S10), if the data exists, skipping the data line;
step S11), if the hash value does not exist, adding the hash value of the data line to the historical execution data hash value table, inputting the data line into the data processing model 41, and including the data line in the billing;
step S12), the recovered private data 31 is destroyed, the result 42 of the data processing model 41 is fed back to the data demander 40, the tokens corresponding to the billing are transferred to the virtual account of the data source party 30, and the remaining tokens are returned. Opening virtual accounts and transferring tokens on the block chain belong to the prior art in the field and are not described herein. The price of each piece of data is negotiated between the data sharing participants or regulated by market supply and demand. A minimal sketch of the dedup-and-billing loop of steps S9) to S12) is given below.
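The following sketch illustrates this dedup-and-billing loop; the CountingModel stub, the per-row price and the hashing of a row's items are illustrative assumptions rather than part of the patent:

```python
import hashlib

class CountingModel:
    """Stand-in for a data processing model 41 (hypothetical interface)."""
    def __init__(self):
        self.rows_seen = 0
    def feed(self, row):
        self.rows_seen += 1
    def result(self):
        return {"rows_trained_on": self.rows_seen}

def run_model_with_billing(rows, model, history_hashes, price_per_row):
    """Steps S9)-S12): run the model over the recovered rows, billing only rows whose
    hash is not yet in this model's historical execution data hash value table."""
    billed = 0
    for row in rows:
        row_hash = hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
        if row_hash in history_hashes:   # S10: this row was already executed once -> skip it
            continue
        history_hashes.add(row_hash)     # S11: record the hash ...
        model.feed(row)                  # ... input the row into the data processing model ...
        billed += 1                      # ... and include it in the billing
    return model.result(), billed * price_per_row   # S12: tokens owed to the data source party 30

history = set()
rows = [{"deposit": 100_000, "overdue": 0}, {"deposit": 250_000, "overdue": 1}]
print(run_model_with_billing(rows, CountingModel(), history, price_per_row=2))  # bills both rows
print(run_model_with_billing(rows, CountingModel(), history, price_per_row=2))  # bills nothing
```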
The connection relationship between the bridge server 20 and the data stations 10 is shown in fig. 2: a plurality of data stations 10 are connected to the bridge server 20. Apart from the communication connections between the data stations 10 themselves, the data stations 10 are connected only to the bridge server 20. Because encryption is used, these connections preserve data privacy even when the links are untrusted, but they cannot by themselves guarantee that data is not tampered with, which would force retransmission; measures should therefore be taken to ensure communication reliability, for example a private network or a virtual local area network. The data stations 10 are established in different geographical locations, close to different participants, or kept in the custody of the participants. The programs running on a data station 10 are limited to the software implementing the method of this embodiment and the basic services necessary for the computer to run: the software can be installed on a clean operating system, after which the computer hosting the data station 10 is prevented from installing any other software. The work logs of the data stations 10 should be uploaded periodically to the block chain for supervision. In this embodiment a data station 10 only stores subdata 315 of the private data 31 and never obtains a complete data line, so data leakage from a single station is not a serious concern. The bridge server 20 should be built in a trusted execution environment provided by a vendor of such environments. The bridge server 20 is the weak point of this embodiment: its failure means the service cannot be provided, and an attacker controlling it could leak the private data 31, so its security must be guaranteed. The technical solution of this embodiment targets the honest-but-curious setting, in which no participant is malicious but every participant will try to view whatever private data 31 it receives, and provides a secure sharing method for the private data 31 under this assumption. In the prior art, when a data demander 40 uses private data 31, the original data stays on, or is even stored by, the data demander 40's computer. This embodiment provides a scheme in which the data demander 40 uses the private data 31 without touching or storing the original data: a data-isolation layer is built between the data demander 40 and the data source party 30, so that the data is usable but not visible, protecting the security of the private data 31. The bridge server 20 may also be implemented with distributed server technology, which gives higher security.
In that case the distributed servers each recover only part of the private data 31 and then use secure multi-party computation to execute the data processing model 41; in this process no single server can obtain a complete data line, which is more secure and reliable, at the cost of execution efficiency.
After the data source party 30 submits the private data 31, preprocessing is performed. Referring to fig. 3, the data introduction prepared during preprocessing includes several pieces of example data 33 and a data description 32, and the data description 32 covers the meaning, format, value range, time range and source of each field of the private data 31. Preprocessing further includes the data demander 40 publishing a detection model 43 through the bridge server 20; the input of the detection model 43 is private data 31 and its output is a value index 44. When the bridge server 20 receives new private data 31 it executes the detection model 43, and if the value index 44 output by the detection model 43 exceeds a preset threshold, the identifier 314 and the value index 44 of the private data 31 are transmitted to the data demander 40. The detection model 43 thus automatically discovers private data 31 of high use value for the data demander 40, saving the time and effort of searching for data. Likewise, because the value index 44 is available before purchase, the data demander 40 can estimate and judge the value of the data and is more willing to pay, without worrying that the purchased data will generate no value. This helps raise the selling price of the data and further increases the data source party 30's motivation to provide data. The data source party 30 then provides richer data, which brings more value to the data demander 40, forming a virtuous circle.
Referring to fig. 4, the bridge server 20 adds a normalization field 312 to each numerical field 311 according to the field value range described in the data introduction. The bridge server 20 runs a statistical detection model that computes the integrity of the private data 31, the boundary values of the numerical fields 311, the average value of the numerical fields 311, the variance of the numerical fields 311, and the dispersion degree of the tag values of the tag fields 313. The integrity of the private data 31 is the percentage of data values that are not empty among all data values, and the dispersion degree of a tag field 313 is the ratio of the count of its least frequent value to the count of its most frequent value. For example, if the private data 31 provided by a bank includes the bank deposit balance, the minimum balance is 0 and the maximum is 100 million; a balance above 100 million has a normalized value of 1, and a balance between 0 and 100 million is normalized to the quotient of the balance and 100 million. Such bank data should not leave the bank's control, so the bank can build and keep one data station 10. As another example, the private data 31 provided by a merchant includes a consumer's number of consumption days per month: a day with any consumption counts as 1, and repeated consumption on the same day is still counted as 1, so the maximum value is the number of days in that month and the normalized value is the ratio of the monthly consumption days to the number of days in the month. A minimal sketch of the normalization and the statistical detection model follows.
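The sketch below computes the normalization field 312 and the statistics of the statistical detection model; the field names, example rows and the 100-million cap are illustrative assumptions:

```python
from collections import Counter
from statistics import mean, pvariance

def normalize(value, lo, hi):
    """Normalization field 312: map a numerical field 311 into [0, 1] using the value
    range declared in the data introduction (values above hi clip to 1)."""
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)

def statistical_detection(rows, numeric_field, tag_field, lo, hi):
    """Statistics published by the bridge server 20's statistical detection model."""
    values = [r[numeric_field] for r in rows if r.get(numeric_field) is not None]
    tags = [r[tag_field] for r in rows if r.get(tag_field) is not None]
    total_cells = len(rows) * 2                  # two fields considered in this sketch
    counts = Counter(tags)
    return {
        "integrity_pct": 100.0 * (len(values) + len(tags)) / total_cells,  # % non-empty values
        "min": min(values), "max": max(values),  # boundary values of the numerical field
        "mean": mean(values), "variance": pvariance(values),
        # dispersion of tag values: count of least frequent tag / count of most frequent tag
        "tag_dispersion": min(counts.values()) / max(counts.values()),
        "normalized_example": normalize(values[0], lo, hi),
    }

# Example with the bank-deposit field capped at 100 million yuan:
rows = [{"deposit": 100_000, "repay_mode": "equal_principal"},
        {"deposit": 2_500_000, "repay_mode": "revolving"},
        {"deposit": None, "repay_mode": "equal_principal"}]
print(statistical_detection(rows, "deposit", "repay_mode", 0, 100_000_000))
```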
The bridge server 20 also runs a learning detection model 431. The learning detection model 431 establishes a detection neural network model; referring to fig. 5, the inputs of the detection neural network model are several fields of the private data 31, its output is a tag field of the private data 31, and its weight coefficients are initialized. The learning detection model 431 divides the private data 31 into a training set and a test set; after the training set is input into the detection neural network model, the accuracy is obtained on the test set. The learning detection model 431 discloses the input field names, the output field name and the accuracy of the detection neural network model. The input and output fields of the detection neural network model are initially selected at random from the fields of the private data 31 by the learning detection model 431, and data demanders 40 may submit requests to change the selected input and output fields; when the number of data demanders 40 submitting the same change request exceeds a preset threshold, the input and output fields of the detection neural network model are changed to the fields specified by that request, as illustrated in the sketch below.
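The sketch below shows the field-change voting rule; how a change request is encoded and the example field names are assumptions:

```python
from collections import Counter

def apply_field_change_requests(current_inputs, current_output, requests, threshold):
    """Adopt new input/output fields for the detection neural network model once enough
    data demanders 40 submit the same change request (the request encoding is an assumption)."""
    if not requests:
        return current_inputs, current_output
    # A request is encoded as (tuple_of_input_field_names, output_field_name).
    candidate, count = Counter(requests).most_common(1)[0]
    if count >= threshold:
        new_inputs, new_output = candidate
        return list(new_inputs), new_output       # change to the requested fields
    return current_inputs, current_output         # keep the randomly chosen initial fields

# Usage: five demanders ask for (assets, repayments, credit, education) -> overdue
reqs = [(("assets", "repayments", "credit", "education"), "overdue")] * 5
print(apply_field_change_requests(["assets", "repayments", "loan_amount"], "education", reqs, 5))
```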
For example, the private data 31 provided by a bank is its bad-debt loan records, which record information about borrowers with bad debts, including property information, historical repayment information, credit investigation, loan amount, interest rate, the borrower's academic information, the overdue amount and the overdue duration. The initial detection neural network model takes property information, historical repayment information and loan amount as input and outputs the borrower's academic information. Such a model has little practical significance, so many data demanders 40 submit change requests to change the inputs to property information, historical repayment information, credit investigation and the borrower's academic information, and the output to whether the loan is overdue. After a sufficient number of identical change requests are received, the detection neural network model is changed accordingly. The modified detection neural network model has a much higher reference value.
The detection model 43 may include a neural network model trained by the data demander 40: each data line of the private data 31 is input into this neural network model, and if the number of data lines whose recorded result the model fails to predict exceeds a preset threshold, the identifier 314 of the private data 31 is transmitted to the data demander 40. For example, a data demander 40 has trained a neural network model that predicts the repayment mode a borrower is most likely to choose, based on property information, credit investigation, historical loan information, historical repayment information, the borrower's academic information and the loan amount. The repayment modes include equal principal and interest, equal principal, monthly interest with the principal repaid at maturity, early repayment of part of the loan, early repayment of the entire loan, and revolving repayment. Each mode has different characteristics and suits different borrowers' capital situations and interest-bearing capacities. Predicting the repayment mode a borrower is most likely to choose allows the loan business to be promoted in a targeted way. A bank has trained such a neural network model whose accuracy reaches 83% in testing and in actual use, and it wants to improve the accuracy further; but once the accuracy has reached 83%, more data of the kind the network has already adapted to no longer improves it, and a large amount of such data has already been used in training, so buying and training on the same kind of data is pointless. At this point the bank faces a dilemma: even if suitable data exists, it hesitates to buy because the effect is hard to predict, and pushing the accuracy beyond 83% requires a large amount of data and is costly, so that if the contribution of the data is hard to estimate in advance, still more cost is wasted. The bank may then settle for 83% accuracy and stop turning to the data market, the data market loses transactions, and the value of the data is not realized. With the detection model 43 provided by this embodiment, the data that the bank's neural network model has not yet adapted to can be discovered actively and purchased in a targeted way, continued training on that data has a guaranteed effect, the data demander 40 saves a large amount of money, and supply and demand in the data market are matched accurately. A minimal sketch of this mismatch-counting detection model follows.
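The sketch below counts prediction mismatches; predict() stands in for the data demander 40's trained model, and the field names are assumptions:

```python
def probe_private_data(rows, predict, label_field, mismatch_threshold):
    """Detection model 43: count rows where the demander's trained model disagrees with
    the recorded label; if mismatches exceed the threshold, this private data 31 is worth
    buying for further training (predict() is the demander's model, an assumption)."""
    mismatches = 0
    for row in rows:
        features = {k: v for k, v in row.items() if k != label_field}
        if predict(features) != row[label_field]:
            mismatches += 1
    return mismatches > mismatch_threshold   # True -> transmit identifier 314 and value index 44

# Usage: a trivial stand-in model that always predicts "equal_principal"
rows = [{"assets": 3, "repay_mode": "equal_principal"},
        {"assets": 9, "repay_mode": "revolving"},
        {"assets": 7, "repay_mode": "revolving"}]
print(probe_private_data(rows, lambda f: "equal_principal", "repay_mode", 1))  # True
```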
Referring to fig. 7, a method for storing the encrypted private data 31 in a plurality of data stations 10 in a distributed manner includes:
step S101), the bridge server 20 extracts the hash value of the data line and distributes an identifier 314 for the data line;
step S102), the bridge server 20 cuts the encrypted data line into a plurality of subdata 315 with preset length, 0 is supplemented if the length of the last subdata 315 is not enough, and the number of the subdata 315 is matched with the number of the data stations 10;
step S103) assigning the plurality of sub-data 315 to the plurality of data stations 10, respectively, and sending the identifier 314 to the data station 10;
step S104) the data station 10 opens up a plurality of storage areas, each storage area comprises a plurality of storage blocks, the space of each storage block is matched with the space required by the subdata 315 and the identifier 314, and the identifier 314 occupies the space with the preset length and is positioned in front of the subdata 315 during storage; the data station 10 stores a plurality of exchange pairs 316, each exchange pair 316 comprises two binary sequences, and the data station 10 additionally stores the latest received subdata 315 in a free storage block of a current storage area, as shown in fig. 6;
step S105) checking whether the newest subdata 315 and the previous subdata 315 contain a bit-aligned exchange pair 316; if so, exchanging the contents at the aligned positions, except that no exchange is made when the exchange pair 316 falls within the region corresponding to the identifier 314; if the storage area has no free storage block, the subdata 315 is stored in the first storage block of a new storage area without checking for exchange pairs 316.
Table 1: Exchange pair 316 table
Storage area number | Exchange pair 316A | Exchange pair 316B
9AD | 100011 | 101000
9AE | 101011 | 101110
9AE | 000111 | 101010
The data station 10 establishes an exchange pair 316 table and associates several exchange pairs 316 with each storage area; the exchange pair 316 table records the storage area identifier and the exchange pair 316, and a storage area associated with multiple exchange pairs 316 has multiple records in the table. Table 1 shows such an exchange pair 316 table, listing the exchange pairs 316 used by each numbered storage area. Exchange pair 316A and exchange pair 316B form a pair of exchangeable contents with no order constraint: if the previous subdata 315 contains exchange pair 316A and the aligned bits of the next subdata 315 are exactly exchange pair 316B, the two are exchanged, and if the previous subdata 315 contains exchange pair 316B while the aligned bits of the next subdata 315 are exchange pair 316A, they are exchanged as well. Table 2 shows several subdata 315 stored in storage area 9AE, which uses two exchange pairs 316, namely "101011"-"101110" and "000111"-"101010". As can be seen in Table 2, there are three aligned exchange pairs 316 among the three subdata 315 whose identifiers 314 are B36A55DE, B36A55DF and B36A55E0. Table 3 shows the result of exchanging the values at those three positions: after the exchanges the content of each subdata 315 is scrambled, and the correct plaintext cannot be recovered even if the decryption key is obtained, so the real data is hidden; an attacker would also need the exchange pair 316 table to recover the data. In this embodiment the exchange pair 316 table is encrypted with the public key of the data station 10 for storage; when it is needed it is decrypted with the private key into memory and destroyed after use. A minimal sketch of the aligned exchange-pair check appears after Table 3.
Table 2: Identifiers 314 and subdata 315 stored in storage area 9AE
Identifier 314 | Subdata 315
B36A55DE | 101011010…0101010100…1101010100111001
B36A55DF | 101110000…0000111010…0100101011110111
B36A55E0 | 101010100…0110111111…1001101110101010
B36A56D7 | 101100100…0010000110…1100001100001111
Table 3: Contents of storage area 9AE after exchange over the exchange pairs 316
Identifier 314 | Subdata 315
B36A55DE | 101110010…0000111100…1101010100111001
B36A55DF | 101011000…0101010010…0100101110110111
B36A55E0 | 101010100…0110111111…1001101011101010
B36A56D7 | 101100100…0010000110…1100001100001111
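The sketch below shows the bit-aligned exchange-pair 316 check and swap of step S105), assuming subdata 315 are handled as bit strings and the identifier 314 region at the front is skipped via id_len; the pair values come from Table 1:

```python
def find_aligned_pairs(prev_bits, new_bits, pairs, id_len):
    """Return positions where the previous and newest subdata 315 contain an aligned
    exchange pair 316 (in either order), skipping the identifier 314 region at the front."""
    hits = []
    for a, b in pairs:
        k = len(a)
        for i in range(id_len, min(len(prev_bits), len(new_bits)) - k + 1):
            p, n = prev_bits[i:i + k], new_bits[i:i + k]
            if (p, n) in ((a, b), (b, a)):     # no order constraint between 316A and 316B
                hits.append((i, k))
    return hits

def swap_aligned(prev_bits, new_bits, pairs, id_len=0):
    """Step S105): exchange the contents at every aligned exchange-pair 316 position."""
    prev, new = list(prev_bits), list(new_bits)
    for i, k in find_aligned_pairs(prev_bits, new_bits, pairs, id_len):
        prev[i:i + k], new[i:i + k] = new[i:i + k], prev[i:i + k]
    return "".join(prev), "".join(new)

# The two exchange pairs 316 of storage area 9AE from Table 1:
pairs_9AE = [("101011", "101110"), ("000111", "101010")]
print(swap_aligned("101011000111", "101110101010", pairs_9AE))
```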
Referring to FIG. 8, when bridge server 20 restores a data line:
step S106) sends the identification 314 of the data line to the data station 10;
step S107) the data station 10 finds the storage block stored by the sub-data 315 according to the identifier 314;
step S108) checking downward whether an aligned exchange pair 316 exists between this block's data and the data of the next storage block, and if so, continuing to check, at the positions of the exchange pairs 316 just found, whether an aligned exchange pair 316 exists between the next storage block and the block after it;
step S109) if so, continuing to check for aligned exchange pairs 316 at the newly found positions, until no exchange pair 316 is found or the last storage block of the storage area is reached;
step S110) then detecting whether an aligned exchange pair 316 exists with the data of the previous storage block, and if so, exchanging the contents at the aligned positions;
step S111) making a copy of all the storage blocks found downward to contain exchange pairs 316;
step S112) starting from the last storage block, swapping back the aligned exchange pairs 316 in the copy in sequence to recover the data, until all exchange pairs 316 are restored;
step S113) feeds back the data in the storage block corresponding to the identifier 314 in the copy to the bridge server 20.
The beneficial technical effects of this embodiment do: the data station 10 is established to encrypt and store the private data 31, the bridge server 20 is used to receive the data processing model 41 of the data demand party 40 and the data processing model is operated, so that the private data 31 does not need to leave the bridge server 20 and the data station 10 when being used, the data demand party 40 can not directly see the specific value of the private data 31, and the privacy and the safety of the private data 31 are guaranteed.
The bridge server 20 automatically charges according to the calling of the private data 31, and automatically transfers money through the token, so that the data of the data source party 30 can bring income without worrying about the repudiation or delayed payment of the data demand party 40, meanwhile, the bridge server 20 compares a data row with a historical execution data hash value table, if the same data is executed once, the data is not charged, and the benefit of the data demand party 40 is ensured.
With the improved encrypted storage, the true encrypted value does not exist in the storage space at rest; it is only temporarily recovered when used and destroyed after use, which shortens the time the true value exists and improves data security.
The second implementation:
This embodiment provides a private data 31 sharing method based on a block chain that is a specific improvement for federated learning: it exploits the characteristics of the neural network model built during federated learning to further hide the private data 31 and thus further improve its security. The data processing model 41 is a neural network model, and the result 42 the participant wants to obtain is the neural network model itself trained on the private data 31. The bridge server 20 extracts the weight coefficients of the neural network model to form a weight coefficient vector and compares the distance between the weight coefficient vectors before and after a data row is substituted into the neural network: if the distance is smaller than a preset threshold, the data row is not charged; if it is greater than or equal to the preset threshold, the data row is charged, as sketched below.
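The sketch below implements this charging rule; the Euclidean distance is an assumption, as the text only speaks of the distance between weight coefficient vectors:

```python
import math

def row_is_billable(weights_before, weights_after, threshold):
    """Charge for a data row only if it actually moved the neural network model:
    distance between weight coefficient vectors before/after >= threshold."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(weights_before, weights_after)))
    return dist >= threshold

# Usage: a row that barely changes the weights is not billed
print(row_is_billable([0.10, -0.30, 0.70], [0.10, -0.30, 0.70001], 1e-3))  # False
print(row_is_billable([0.10, -0.30, 0.70], [0.25, -0.10, 0.55], 1e-3))     # True
```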
Referring to fig. 9, a method for storing the encrypted private data 31 in a plurality of data stations 10 in a distributed manner includes:
step S201) the bridge server 20 extracts the hash value of the data line and assigns an identifier 314 to the data line;
step S202) copying a plurality of copies for each data line of the privacy data 31, wherein the number of the copies is the same as that of the data stations 10;
step S203), dividing the original value of the numerical field 311 into a plurality of addends;
step S204) respectively distributing the addends to a plurality of copies;
step S205) sending the copies to the data stations 10 respectively, together with the identifier 314 of the data line; each data station 10 encrypts its copy and stores it in association with the identifier 314. For example, if a bank deposit of 100,000 yuan is split into three addends of 20,000, 30,000 and 50,000 yuan allocated to three data stations 10, the deposit stored at any single data station 10 is not the real balance, and the real deposit can be obtained only if the data of all three data stations 10 is leaked; the probability of an attacker breaking all three data stations 10 is clearly extremely low. A minimal sketch of this additive splitting follows.
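The sketch below splits a numerical original value into addends, one per data station 10; the random split is an assumption, since the text only requires that the addends sum back to the original value:

```python
import random

def split_into_addends(value, n_stations):
    """Steps S203)-S204): split a numerical original value into n addends that sum back to it."""
    cuts = sorted(random.uniform(0, value) for _ in range(n_stations - 1))
    bounds = [0.0] + cuts + [float(value)]
    return [bounds[i + 1] - bounds[i] for i in range(n_stations)]

# Usage: a 100,000-yuan deposit split across 3 data stations 10
shares = split_into_addends(100_000, 3)
print(shares, sum(shares))   # the three addends recombine to 100000 (up to float rounding)
```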
Referring to fig. 10, when the data demander 40 submits the data processing model 41 to the bridge server 20, if the data processing model 41 is declared to be the neural network model, the bridge server 20 and the data station 10 execute the following steps:
step S301) reading the 0 th layer and the 1 st layer of the neural network model;
step S302) obtaining the relevant numerical fields 311 from layer 0, and sending the relevant numerical fields 311 and the identifiers 314 of the data lines to the data stations 10;
step S303) extracting neurons and weight coefficients of the layer 1;
the bridge server 20 performs the following steps for each neuron of layer 1 in turn:
step S304) enumerating the connected input neurons, and obtaining corresponding field names;
step S305) transmitting the field name, the corresponding weight coefficient, the offset value, and the excitation function to the data station 10;
step S306), if the field is numerical, the data station 10 multiplies the copy value corresponding to the stored field name by the weight coefficient and feeds back the product to the bridge server 20;
step S307), the bridge server 20 adds the received feedback values, adds the offset value, substitutes the sum into the excitation function, and uses the result as the output of the neuron;
step S308) if the field is of the tag type, directly feeding back the tag value corresponding to the stored field name to the bridge server 20.
This embodiment is illustrated with a 3-layer fully connected neural network model. The input layer, layer 0, comprises the bank deposit, the historical loan amount, the average historical monthly repayment, the borrower's academic degree and the current loan amount, where the academic degree is encoded as 0-4 to represent academic levels from below junior college up to doctorate; layer 0 therefore has 5 neurons. The middle layer, layer 1, comprises 3 neurons, and the excitation function is the sigmoid function 1/(1 + exp(-x)). The values 0-5 represent the repayment modes listed above: equal principal and interest, equal principal, monthly interest with the principal repaid at maturity, early repayment of part of the loan, early repayment of the entire loan, and revolving repayment. Layer 2 is the output layer, which outputs the probability of each result 42 using the Softmax function. The loss function is the absolute value of the difference between the probability assigned to the correct result 42 and 1. Since the neural network model in this embodiment is fully connected, each of the 3 neurons of layer 1 is connected to the 5 neurons of layer 0. For the 1st neuron of layer 1, x is obtained by multiplying the values v1-v5 of the 5 layer-0 neurons by the weight coefficients w1-w5 respectively and adding the offset value b1, i.e. x = v1*w1 + v2*w2 + v3*w3 + v4*w4 + v5*w5 + b1; substituting x into the sigmoid function gives the output of the 1st neuron of layer 1. In this embodiment v1-v5 are each split into several addends stored on different data stations 10, i.e. v1 = v11 + v12 + v13, where v11, v12 and v13 are stored on the 3 data stations 10 respectively, and the same holds for v2-v5. The first data station 10 thus stores v11, v21, v31, v41 and v51; after obtaining all the weight coefficients w1-w5 it computes v11*w1, v21*w2, v31*w3, v41*w4 and v51*w5, and the result is signed and sent to the bridge server 20. After receiving the data returned by all the data stations 10, the bridge server 20 sums all the returned values and adds b1; this sum is the input value x of the 1st neuron of layer 1, and substituting it into the sigmoid excitation function gives the output of the 1st neuron of layer 1. A minimal sketch of this assembly follows.
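The sketch below reproduces this assembly; each station holds one additive share per input field, and the field names, weights and share values are illustrative:

```python
import math

def station_partial_sum(shares_at_station, weights):
    """What one data station 10 returns (step S306): the sum of (stored addend * weight)
    over the neuron's input fields; the true field values never leave the split form."""
    return sum(shares_at_station[name] * w for name, w in weights.items())

def bridge_neuron_output(partial_sums, bias):
    """Bridge server 20 (step S307): add the stations' feedback, add the offset b1,
    and apply the sigmoid excitation function 1/(1 + exp(-x))."""
    x = sum(partial_sums) + bias
    return 1.0 / (1.0 + math.exp(-x))

# Three stations each hold one additive share of the fields v1..v5.
weights = {"v1": 0.2, "v2": -0.5, "v3": 0.1, "v4": 0.4, "v5": 0.3}
station_shares = [
    {"v1": 0.1, "v2": 0.2, "v3": 0.0, "v4": 0.3, "v5": 0.1},
    {"v1": 0.2, "v2": 0.1, "v3": 0.3, "v4": 0.1, "v5": 0.2},
    {"v1": 0.3, "v2": 0.1, "v3": 0.2, "v4": 0.2, "v5": 0.4},
]
partials = [station_partial_sum(s, weights) for s in station_shares]
print(bridge_neuron_output(partials, bias=0.05))   # equals sigmoid(sum_i v_i*w_i + b1)
```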
After the output of 3 neurons in layer 1 is obtained by the same method, the subsequent calculation is the same as that of the neural network in the prior art. The output result of the output layer can be obtained. After the loss function value is obtained, the gradient is solved, and the weight coefficient is updated. The weight coefficients are sent to the data station 10 and the data station 10 is instructed to call the next data line and calculate according to the same method. The loss function value and gradient will be obtained again and the weight coefficients adjusted further. Until the input of all data lines is completed.
In the process provided by this embodiment, the real values of the privacy data 31, namely v1-v5, are never directly restored; the split addends participate in the calculation, and the training of the neural network model is thereby completed. Because the real values are never restored, they are effectively hidden, and the privacy and safety of the data are greatly improved.
Compared with the first embodiment, in this embodiment, when training the neural network model used for data valuation, the bridge server 20 disassembles the neural network model: layer 0 and layer 1 are handed to the data stations 10 for execution, the data stations 10 directly return the output values of the layer-1 neurons, and the bridge server 20 processes the subsequent layers. As a result, the real values of the numerical fields 311 never need to be restored during the whole training of the neural network model; the real values remain hidden, and the safety and privacy of the data are improved.
This embodiment has a further improved implementation to deal with non-fully-connected neural network models. The bridge server 20 checks the connection status of the neural network model; if it finds that a layer-1 neuron is connected to only one input-layer neuron, it would be possible to restore the true value of the private data 31, for example when the weight coefficient happens to be 1.
Step S203) is modified as follows: for a numerical field 311 in a data line, the 1st to Nth powers of the normalized value of the numerical field 311 are computed, and each of these powers of the original value is split into a plurality of addends. Step S204) correspondingly allocates the addends split from the 1st to Nth powers to the plurality of copies. That is, the first data station 10 stores v111, v112, v113, v114, …, v11N, v211, v212, …, v21N, … and v51N, where the last digit of the subscript indicates the power of the original value.
If no layer-1 neuron is connected to only one input-layer neuron, no improved processing is performed; otherwise, when the data stations 10 are notified to calculate the output of a layer-1 neuron connected to only one input-layer neuron, the following method is executed:
each data station 10 calls the stored addends split from the 1st to Nth powers of the original value and obtains the coefficient of each term from the Taylor expansion of the exp(x) function. The coefficients of the powers 0 to N of x in the Taylor expansion of exp(x) are 1, 1, 1/2!, 1/3!, …, 1/N!. The plus sign before the coefficients of the odd-power terms is changed into a minus sign, so that the series represents exp(-x); the stored addends of the 1st to Nth powers, v111, v112, … and v11N, are then substituted into the modified Taylor expansion, and the result is encrypted and sent to the bridge server 20. The other data stations 10 do the same. The sum finally obtained by the bridge server 20 is the value of exp(-x) for the original value, which is used directly in the excitation function to obtain the output of the neuron: since the excitation function is the sigmoid function 1/(1+exp(-x)), the output of the layer-1 neuron is obtained directly. The offset value can be taken into account by multiplying the value of exp(-x) by exp(-b1). Since the Taylor expansion cannot be carried out indefinitely, the obtained value has a certain error; it mainly needs to be ensured that the error is within an allowable range. When N is 100, a rather high precision is obtained, which meets general requirements, and increasing N does not bring an obvious change in computational complexity. At the same time, the slightly inaccurate value also serves the purpose of hiding the real value.
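A small numerical sketch of this Taylor-expansion trick follows (again illustrative, not the patent's implementation: encryption and signing are omitted, the weight coefficient is taken as 1 as in the case discussed above, and letting the bridge server 20 add the power-0 term of the expansion and the factor exp(-b1) is an assumption, since the text leaves those details implicit).

    import math

    N = 100  # expansion order; the text notes that N = 100 gives sufficient precision

    def station_contribution(power_shares):
        # power_shares[k-1] is this station's addend of the k-th power of the
        # normalized value. The Taylor coefficients 1/k! of exp(x) are applied
        # with alternating signs, which turns the series into exp(-x).
        return sum(((-1) ** k) / math.factorial(k) * power_shares[k - 1]
                   for k in range(1, N + 1))

    def bridge_neuron_output(contributions, b1):
        # the sum of the station contributions plus the power-0 term (1)
        # approximates exp(-v); multiplying by exp(-b1) accounts for the offset,
        # after which the sigmoid excitation function is evaluated
        exp_neg_x = (1.0 + sum(contributions)) * math.exp(-b1)
        return 1.0 / (1.0 + exp_neg_x)

    # example: normalized value v = 0.7, offset b1 = 0.25, 3 data stations
    v, b1 = 0.7, 0.25
    shares = [[] for _ in range(3)]
    for k in range(1, N + 1):
        p = v ** k
        parts = [0.5 * p, 0.3 * p, 0.2 * p]   # each power split into 3 addends
        for j in range(3):
            shares[j].append(parts[j])

    approx = bridge_neuron_output([station_contribution(s) for s in shares], b1)
    exact = 1.0 / (1.0 + math.exp(-(v + b1)))
    assert abs(approx - exact) < 1e-9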
In this embodiment, processing the private data 31 takes a certain amount of time only when the data is first submitted to the bridge server 20. With the Taylor expansion used to calculate the neural network model, any function that can be expressed in terms of only the 1st to Nth powers of the original value can be computed directly by summation without restoring the original value, such as the commonly used sine, cosine and natural logarithm functions. In the data processing model 41 submitted by the data consumer 40, these Taylor-expandable functions are identified by the bridge server 20, which informs the data stations 10 to obtain the function values directly by summation in this way, reducing the number of times the real values must be restored. Similarly, using secure multiparty computation between the data stations 10 avoids restoring the real values altogether, but secure multiparty computation is very inefficient and places high demands on communication bandwidth. Where the Taylor expansion is applicable, this method greatly accelerates the calculation, has low communication requirements, can replace part of the secure multiparty computation, and helps to improve the execution efficiency of the data processing model 41.
Embodiment three:
A block-chain-based method for sharing private data 31 is applied to the sharing of user data among city banks. City banks are small in scale, and the number and types of their users are not abundant. Since the data of bank users belongs to the private data 31, data complementation between city banks cannot be carried out directly. Meanwhile, the data of a city bank is of limited value to a larger bank, so the cost of establishing data cooperation between a city bank and a larger bank is high.
A plurality of city banks construct private data 31 sharing through this method, which solves the problems that their data are not abundant and that the cost of acquiring data is high. Each city bank builds its own data station 10, or several city banks jointly establish one data station 10 and maintain and manage it together. A city bank sends its service data to the bridge server 20, and after preprocessing the data is stored in the plurality of data stations 10 in a dispersed manner. The data storage method is the same as in the second embodiment.
The preprocessing includes adding a data description 32 of the private data 31, provided by the city bank, as shown in Table 4. Some items, such as the integrity and the value boundaries of the privacy data 31, are computed by the bridge server 20 from the privacy data 31. To avoid showing the maximum and minimum values directly, the boundary values of this batch of privacy data 31, such as the deposit, should be shown in the form of a label or a theoretical range, for example displayed as 0.1 to 9999 ten thousand yuan.
TABLE 4: Data description 32 submitted by bank XX
XX bank, XX branch
Card flow data of this branch's common user accounts for nearly two years
Data volume: 3 ten thousand lines
Data introduction: the data is generated by simple preliminary statistics on the card flow data of users with the common account type at this branch. The branch is located at XX, XX district, XX city; its users are mainly residents within 8 kilometers and nearby employees who use the card as a salary card. Accounts used with very low frequency, meaning fewer than 10 transactions per year with a total flow amount of less than 1 thousand yuan, are filtered out. The data is complete, with no missing values, and the card flow guarantees that the data is authentic. The data fields include name, age, deposit balance, monthly income in the last two years, annual income in the last two years, monthly consumption, annual consumption in the last two years, education level and loan data. Consumption refers to funds flowing to goods and service providers; user-to-card transfers, credit card repayment, loan repayment and the purchase of financial products are not counted in monthly or annual consumption. The deposit balance value range is 0-10000 ten thousand yuan, the age range is 0-150, the monthly income range is 0-1000 ten thousand yuan, the annual income range is 0-10000 ten thousand yuan, the monthly consumption range is 0-1000 ten thousand yuan, and the annual consumption range is 0-10000 ten thousand yuan …
City A bank establishes a neural network model whose inputs are age, deposit balance, monthly income in the last two years and annual income in the last two years, and whose output is the monthly consumption amount, used to predict the consumption enthusiasm of a user and thereby provide a reference for the consumption limit granted on the user's credit card, as well as a reference for formulating credit card swiping preferential policies. However, because the data volume of city A bank is insufficient, the accuracy of the neural network model is low.
City A bank submits a detection model 43 to the bridge server 20. The detection model 43 checks whether the privacy data 31 contains the age, the deposit balance, the monthly income in the last two years, the annual income in the last two years and the monthly consumption amount. If the data contains these fields, the distributions of the age, the deposit balance and the monthly consumption amount are checked, to avoid data that is concentrated on users of only one age group; that is, the boundary values of these items should span beyond the thresholds provided by city A bank, and the distribution of the data should be as uniform as possible. If all fields are contained and the spans exceed the thresholds, a basic score is given, and the more uniform the distributions are, the more points are added, which yields the final value index 44. If the value index 44 exceeds the base score, the bridge server 20 notifies city A bank that it is recommended to purchase the data.
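The sketch below suggests one possible shape of such a detection model 43; the field names, span thresholds, base score and uniformity bonus are hypothetical placeholders chosen only to illustrate the scoring logic described above, not values taken from the patent.

    # Illustrative sketch of one possible detection model 43. All constants
    # below are hypothetical placeholders.

    REQUIRED = ["age", "deposit_balance", "monthly_income", "annual_income", "monthly_consumption"]
    SPAN_THRESHOLDS = {"age": 40, "deposit_balance": 50000, "monthly_income": 8000,
                       "annual_income": 100000, "monthly_consumption": 5000}
    BASE_SCORE = 60

    def uniformity(values, bins=10):
        # crude uniformity measure: ratio of the emptiest histogram bin to the fullest
        lo, hi = min(values), max(values)
        if hi == lo:
            return 0.0
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
        return min(counts) / max(counts)

    def value_index(columns):
        # columns: dict mapping field name -> list of numeric values in the batch
        if any(f not in columns or not columns[f] for f in REQUIRED):
            return 0
        if any(max(columns[f]) - min(columns[f]) < SPAN_THRESHOLDS[f] for f in REQUIRED):
            return 0
        bonus = sum(uniformity(columns[f]) for f in ("age", "deposit_balance", "monthly_consumption"))
        return BASE_SCORE + 10 * bonus   # anything above BASE_SCORE triggers a purchase recommendation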
City A bank thereby finds several batches of private data 31 and purchases them. The purchased data is used to train the neural network model; the specific training method is as described in embodiment two. Tested on test data, the accuracy of the finally trained neural network model reaches 70%. At this point, if the original detection model 43 is still used, city A bank will receive many purchase recommendations for private data 31 that cannot further improve the accuracy, which makes decisions difficult: if it stops purchasing data, the 70% accuracy is low; if it continues to purchase data, funds are consumed, and even with new data there is no guarantee that the accuracy of the neural network model will improve after training.
City A bank therefore submits the trained neural network model directly to the bridge server 20 as the detection model 43.
If a city bank newly submits a batch of private data 31 containing all the corresponding fields, the bridge server 20 substitutes its data lines into the neural network model in sequence. If, after several data-line tests, the prediction accuracy of the neural network model is found to be low, it shows that the model has not yet adapted to these data, which can therefore help to improve its accuracy, and the bridge server 20 sends purchase recommendation information to city A bank. Conversely, if the neural network model already predicts the correct results accurately once the data rows are substituted, the value of the data to city A bank is not high. This helps city A bank to screen valuable data and reduces its expenditure of time and energy. Eventually city A bank obtains a neural network model with a prediction accuracy of 93%, sufficient for business use, and therefore withdraws the detection model 43 from the bridge server 20. It makes no further attempts to purchase data for this service until, in actual use, the accuracy of the neural network model drops below a certain threshold, indicating that the users' consumption habits have changed; city A bank then resubmits the detection model 43 to look for appropriate data to purchase.
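A hypothetical sketch of this use of the trained neural network as the detection model 43 follows; the sample size, the accuracy cut-off and the model.predict interface are illustrative assumptions, not details given in the patent.

    # Hypothetical sketch: data lines that the current model mispredicts are the
    # ones that can still improve it, so low sample accuracy triggers a
    # purchase recommendation.

    def recommend_purchase(model, data_lines, sample_size=50, accuracy_cutoff=0.5):
        # data_lines: list of (input_fields, correct_label) pairs from the new batch
        sample = data_lines[:sample_size]
        if not sample:
            return False
        correct = sum(1 for x, y in sample if model.predict(x) == y)
        accuracy = correct / len(sample)
        # low accuracy means the model has not adapted to this data, so it is worth buying
        return accuracy < accuracy_cutoff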
The above embodiment is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the technical scope of the claims.

Claims (7)

1. A private data sharing method based on a block chain is characterized in that,
the method comprises the following steps:
the method comprises the steps that a bridge server and a plurality of data stations are built, a data source side with private data submits the private data to the bridge server, the bridge server distributes identification for the private data, and the private data are stored on the data stations in a scattered mode after being encrypted;
submitting the data processing model to a bridge server by a data demand party needing to use data, wherein the bridge server establishes a historical execution data hash value table for each data processing model;
the bridge server receives a data introduction written by the participants, associates the data introduction with the identification and then displays the data introduction;
the data source side and the data demand side both open virtual accounts on the block chain;
when a data demand party wants to use the private data, it designates a private data identifier and transfers a corresponding quantity of tokens to an account bound to the bridge server;
the bridge server communicates with the plurality of data stations and recovers the private data;
verifying whether the hash value of the data line of the private data exists in the corresponding historical execution data hash value table or not, if so, skipping the data line, and if not, adding the hash value of the data line into the historical execution data hash value table, inputting the data line into a data processing model and bringing the data line into charging;
destroying the recovered private data, feeding back the result of the data processing model to the data demand party, transferring the token corresponding to the charging to the virtual account of the data source party, and returning the rest tokens;
the data introduction comprises a plurality of example data and data descriptions, and the data descriptions comprise the meaning, the format, the value range, the data time range and the data source of each field of the privacy data;
the method comprises the steps that a data demander publishes a detection model through a bridge server, the input of the detection model is privacy data, the output of the detection model is a value index, when the bridge server receives new privacy data, the detection model is executed, and if the value index output by the detection model exceeds a preset threshold value, the identification and the value index of the privacy data are transmitted to the data demander;
the bridge server runs a learning detection model, the learning detection model establishes a detection neural network model, the input of the detection neural network model is a plurality of fields of private data, the output of the detection neural network model is a label field of the private data, the detection neural network model has initialized weight coefficients, the learning detection model divides the private data into a training set and a testing set, after the data in the training set is input into the detection neural network model, the accuracy is obtained through the testing set, the learning detection model discloses the input field name, the output field name and the accuracy of the detection neural network model, the input field and the output field of the detection neural network model are randomly selected from the fields of the private data by the learning detection model as initial values, and a data demand party submits a request for changing the input field and the output field selected by the learning detection model, when the number of data demanders submitting the same change request exceeds a preset threshold, changing the input field and the output field of the detection neural network model into a field specified by the change request;
the method for dispersedly storing the encrypted private data in a plurality of data stations comprises the following steps:
the bridge server extracts the hash value of the data line and distributes an identifier for the data line;
copying a plurality of copies for each data line of the private data, wherein the number of the copies is the same as that of the data stations;
dividing the original value of the numerical type field into a plurality of addends by the numerical type field in the row data line, and respectively distributing the addends to a plurality of copies;
and the data station encrypts the copies and stores the encrypted copies in association with the identifiers.
2. The method for sharing private data according to claim 1,
the bridge server adds a normalization field to the numerical field according to a field value range recorded in the data introduction; the bridge server runs a statistical detection model, which counts the integrity of the privacy data, the boundary values of the numerical fields, the average value of the numerical fields, the variance of the numerical fields and the dispersion degree of the tag values; the integrity of the privacy data is the percentage of data values that are not empty out of the total number of data values, and the dispersion degree of the tag values is the ratio of the count of the least frequent value of a tag-type field to the count of its most frequent value.
3. The method for sharing private data according to claim 1 or 2,
the detection model comprises a neural network model trained by a data demand party, a data line of the private data is input into the neural network model, and if the number of times that an output result of the neural network model is inconsistent with the data line exceeds a preset threshold value, an identifier of the private data is transmitted to the data demand party.
4. The method for sharing private data according to claim 1 or 2,
the method for dispersedly storing the encrypted private data in a plurality of data stations comprises the following steps:
the bridge server extracts the hash value of the data line and distributes an identifier for the data line;
the bridge server cuts the encrypted data line into a plurality of subdata with preset length, if the length of the last subdata is not enough, 0 is supplemented, and the number of the subdata is matched with the number of the data stations;
distributing the sub data to data stations and sending the mark to the data stations;
the data station is provided with a plurality of storage areas, each storage area comprises a plurality of storage blocks, the space of each storage block is matched with the space required by the subdata and the identifier, and the identifier occupies a preset length space and is positioned in front of the subdata during storage;
the data station stores a plurality of exchange pairs, each exchange pair comprises two binary sequences, the data station additionally stores latest received subdata in an idle storage block of a current storage area, checks whether the latest subdata and the last subdata have an exchange pair aligned according to bits, exchanges the aligned exchange pairs with contents in a storage position if the latest subdata and the last subdata exist, and does not exchange if the exchange pair is located in an area corresponding to an identifier;
if the storage area has no free storage block, storing the subdata in the first storage block of the new storage area without checking the exchange pair;
when the bridge server restores the data line, the identification of the data line is sent to the data station, and the data station finds the storage block of the subdata storage according to the identification;
firstly, downwards checking whether an aligned exchange pair exists between the data of the next storage block and the data of the next storage block, and if so, continuously checking whether an aligned exchange pair exists between the next storage block and the next storage block at the position of the checked exchange pair;
if yes, continuing to check whether an aligned exchange pair exists or not at the newly checked exchange pair position until the exchange pair is not checked or the last storage block of the storage area is reached;
then detecting whether an aligned exchange pair exists with the data of the previous storage block, and if so, exchanging the storage position of the aligned exchange pair;
copying a copy of all the storage blocks which are checked to exist the exchange pairs downwards, and starting from the last storage block, sequentially exchanging and aligning the storage positions of the exchange pairs to recover data until all the exchange pairs are recovered;
and feeding back the data in the storage block corresponding to the identification in the copy to the bridge server.
5. The method of claim 4, wherein the private data sharing method based on block chain,
the data station establishes an exchange pair table, the data station associates a plurality of exchange pairs with each storage area, the exchange pair table records a storage area identifier and an exchange pair, and the storage area associates a plurality of exchange pairs, so that a plurality of records are established in the exchange pair table.
6. The method for sharing private data according to claim 1 or 2,
the data processing model is a neural network model, the result to be obtained by the participant is the neural network model trained by private data, and the bridge server extracts the weight coefficient of the neural network model to form a weight coefficient vector;
and comparing the distance between the weight coefficient vectors before and after the data row is substituted into the neural network; if the distance is less than a preset threshold value, the data row is not charged, and if the distance is greater than or equal to the preset threshold value, the data row is charged.
7. The method for sharing private data according to claim 1,
when the data demand party submits the data processing model to the bridge server, if the data processing model is declared to be the neural network model, the bridge server and the data station execute the following steps:
reading layer 0 and layer 1 of the neural network model;
obtaining related numerical fields according to the layer 0, sending the related numerical fields and the identification of the data line to a data station, and extracting neurons and weight coefficients of the layer 1;
the bridge server performs the following steps for each neuron of layer 1 in turn:
enumerating connected input neurons to obtain corresponding field names;
transmitting the field name, the corresponding weight coefficient, the offset value and the excitation function to a data station;
if the field is a numerical value type, the data station multiplies a copy value corresponding to the stored field name by a weight coefficient and feeds back the multiplied copy value to a bridge server, and the bridge server adds the feedback values received by the field and adds an offset value to the added feedback values and substitutes the added feedback values into a value obtained by an excitation function to be used as the output of the neuron;
and if the field is of a label type, directly feeding back a label value corresponding to the stored field name to the bridge server.
CN202110878099.5A 2021-08-02 2021-08-02 Private data sharing method based on block chain Active CN113343284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110878099.5A CN113343284B (en) 2021-08-02 2021-08-02 Private data sharing method based on block chain

Publications (2)

Publication Number Publication Date
CN113343284A CN113343284A (en) 2021-09-03
CN113343284B (en) 2021-11-02

Family

ID=77480521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110878099.5A Active CN113343284B (en) 2021-08-02 2021-08-02 Private data sharing method based on block chain

Country Status (1)

Country Link
CN (1) CN113343284B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792337B (en) * 2021-09-09 2023-08-11 浙江数秦科技有限公司 Qualification auditing system based on privacy calculation
CN113792311A (en) * 2021-09-09 2021-12-14 浙江数秦科技有限公司 Neural network model sharing method based on block chain
CN113780552B (en) * 2021-09-09 2024-03-22 浙江数秦科技有限公司 Safe multiparty computing method for bidirectional privacy protection
CN116341014B (en) * 2023-05-29 2023-08-29 之江实验室 Multiparty federal private data resource interaction method, device and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862548A (en) * 2017-11-03 2018-03-30 国云科技股份有限公司 A kind of broad range of data sharing method based on block chain
CN109729168A (en) * 2018-12-31 2019-05-07 浙江成功软件开发有限公司 A kind of data share exchange system and method based on block chain
CN111259430A (en) * 2020-02-14 2020-06-09 北京哥伦布时代科技发展有限公司 Data processing method and device, electronic equipment and computer storage medium
CN112347470A (en) * 2020-11-27 2021-02-09 国家电网有限公司大数据中心 Power grid data protection method and system based on block chain and data security sandbox

Also Published As

Publication number Publication date
CN113343284A (en) 2021-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant