CN113377878A - Block chain-based hot data sharing platform - Google Patents
- Publication number
- CN113377878A (application CN202110918986.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- node
- reserved
- value
- copies
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Bioethics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of blockchains, and in particular to a blockchain-based hot data sharing platform comprising a synchronization node, a reservation node, a plurality of cooperative nodes and an access node. The synchronization node acquires hot data, makes a plurality of copies of each data row, associates the copies with a row number, and splits the true value of each numerical field of the data row into a plurality of addends. One copy is kept as the reserved copy and the remaining copies are distributed to the cooperative nodes for storage. The synchronization node periodically synchronizes the hot data, calculates the change amount of each numerical field, and synchronizes the change amount to the corresponding reserved copy. A data demander submits a data call request to the access node; the reservation node and the cooperative nodes obtain the result of the data processing model and send it to the access node, which forwards it to the data demander. The beneficial effects of the invention are that the privacy and security of the hot data are effectively protected, the timeliness of the data is improved, and the data processing model executes more efficiently.
Description
Technical Field
The invention relates to the technical field of blockchains, and in particular to a blockchain-based hot data sharing platform.
Background
With the rapid and thorough development of information technology, human society is shifting from an industrial society to an information society. Many enterprises and organizations enter large volumes of business data into information management systems through electronic office systems. This not only allows the human, financial and material resources of an enterprise to be allocated more efficiently and reasonably and improves the utilization of production resources, but also generates a large amount of business data. Data can be divided into hot data, warm data and cold data according to how frequently it is used. Hot data is data that is frequently queried or updated. Cold data is data that is used very infrequently, such as archived data, and is typically suited to offline analysis, such as model training in machine learning or big data analysis. Data analysis allows the value carried by information to be fully exploited, greatly improving the production efficiency and resource utilization of enterprises. At present, however, industries are highly fragmented: each enterprise holds only its own data and finds it difficult to mine the value of that data effectively. Because of privacy concerns, the lack of guaranteed benefits and other reasons, data rarely circulates effectively between enterprises and mostly exists as isolated islands. The data sharing systems that do exist involve only the trading and sharing of archived data. The value of time-sensitive data cannot be mined, which limits the further development of data applications.
Chinese patent CN112100279A, published on December 18, 2020, discloses a blockchain-based data sharing system that includes a blockchain, a processor, and a memory storing a computer program. The blockchain includes a data storage unit that stores event information in a first data table whose fields include an event type id, a node id, a content id, content data, and a content state value. The system executes the following steps: step S1, receiving event information to be stored sent by a first node; step S2, parsing a first event type id, a first node id, a first content id and first content data from the event information to be stored, traversing the first data table, and determining whether event information corresponding to the first content id is already stored; if so, executing step S3, otherwise executing step S5; step S3, traversing all content data stored for the first content id in the first data table and determining whether content data identical to the first content data exists; if so, executing step S4, otherwise executing step S5; step S4, adding a first preset step value to the content state value of the content data identical to the first content data, and ending the process; step S5, storing the first event type id, the first node id, the first content id and the first content data in the first data table. This technical scheme improves the efficiency and accuracy of data sharing, but it cannot effectively protect the privacy and security of the data and is only suitable for sharing data among a small number of associated enterprises.
Disclosure of Invention
The technical problem to be solved by the invention is the current lack of a hot data sharing system that effectively protects data privacy. The invention provides a blockchain-based hot data sharing platform that can apply hot data to data sharing while ensuring data security, further releasing the value of the data.
To solve the above technical problem, the invention adopts the following technical scheme: a blockchain-based hot data sharing platform comprising a synchronization node, a reservation node, a plurality of cooperative nodes and an access node, wherein the synchronization node and the reservation node are arranged on the data source side. The synchronization node synchronously obtains hot data from the data source side system and assigns a row number to each data row of the hot data. The synchronization node makes a plurality of copies of the data row, associates the copies with the row number, splits the true value of each numerical field of the data row into a plurality of addends, and distributes the addends among the copies. The reservation node keeps one copy as the reserved copy, and the remaining copies are distributed to the cooperative nodes for storage. The reserved copy stores the true values of non-numerical fields, while the remaining copies store confusion values. The synchronization node periodically synchronizes the hot data, calculates the change amount of each numerical field, synchronizes the change amount to the corresponding reserved copy, and writes new values of non-numerical fields directly into the reserved copy. A data demander submits a data call request, which includes a row number and a data processing model, to the access node. The access node has a virtual account on the blockchain, and the data demander transfers a corresponding amount of tokens to the virtual account of the access node. The access node sends the data call request to the reservation node of the corresponding data source party. The reservation node and the plurality of cooperative nodes construct a secure multi-party computation, obtain the result of the data processing model, and send the result to the access node. The access node sends the result to the data demander and transfers the corresponding amount of tokens to the virtual account of the data source party.
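The splitting of a numerical field into addends, and the way a later change touches only the reserved copy, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the integer encoding, the `spread` bound on the random addends and the sample balance are all hypothetical, and the source does not say how the addends are drawn.

```python
import random


def split_into_addends(true_value, n_copies, rng, spread=10**6):
    """Split an integer into n_copies integer addends that sum to it.

    Assumption: addends are drawn uniformly from [-spread, spread];
    the source text does not specify a distribution.
    """
    addends = [rng.randint(-spread, spread) for _ in range(n_copies - 1)]
    addends.append(true_value - sum(addends))
    return addends


def apply_change(reserved_addend, old_value, new_value):
    """Add the change amount to the reserved copy only.

    The cooperative nodes' addends are left untouched.
    """
    return reserved_addend + (new_value - old_value)


rng = random.Random(0)  # seeded only to make the sketch reproducible
shares = split_into_addends(5000, 4, rng)  # hypothetical deposit balance
reserved, cooperative = shares[0], shares[1:]

# The balance later changes from 5000 to 5300 at the data source.
reserved = apply_change(reserved, 5000, 5300)
total = reserved + sum(cooperative)
```

Because the addends always sum to the true value, adding the change amount to a single addend is enough to keep the distributed representation current.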
Preferably, the synchronization node obtains a source number from the access node and sends the column structure of the data rows to the access node to obtain a class number and field numbers. The synchronization node assigns a unique serial number to each data row, and the source number and the unique serial number together form the row number of the data row. The access node publishes the column structure and the row numbers. The cooperative node opens up a storage area for each column structure and a plurality of storage blocks within the storage area, the size of each storage block matching the maximum space a copy can occupy. The cooperative node stores each copy, associated with its row number, in a storage block; when the storage blocks in a storage area are full, a new storage area and new storage blocks are opened up for that column structure.
Preferably, the cooperative node sets a longitudinal displacement sequence for each storage area, which records the longitudinal offset of each column. The storage blocks in a storage area are numbered sequentially. When a copy is stored, the longitudinal offset of each column of the copy is looked up and that column is stored in the storage block indicated by the offset. When a copy is restored, the storage block holding each column is located by searching downwards according to the longitudinal displacement sequence, the value of each column of the copy is read, and the copy is restored.
Preferably, the synchronization node establishes a synchronization table and stores each change amount, associated with its row number, in the synchronization table. The reservation node periodically checks the synchronization table, synchronizes each stored change amount to the reserved copy with the matching row number, and deletes the change amount from the synchronization table after synchronization. The data call request further includes a synchronization check flag bit. If the flag bit is 1, the reservation node first checks the synchronization table and, if a change amount exists, synchronizes it to the reserved copy before going on to establish the secure multi-party computation and obtain the result of the data processing model.
Preferably, the synchronization node lists the non-numerical fields of the data rows and builds a comparison table that maps each non-numerical field value to a substitute number. The non-numerical field values are replaced by their substitute numbers, and the comparison table is sent to the reservation node. Each substitute number is split into a plurality of addends, which are distributed among the copies. When the synchronization node periodically synchronizes the hot data, it calculates the difference between the substitute numbers before and after a non-numerical field changes and synchronizes the difference to the corresponding reserved copy.
Preferably, the access node publishes the comparison table, and the data processing model is a set of calculation formulas. Before the reservation node constructs the secure multi-party computation, it performs the following steps on the data processing model: extracting the weighted-sum formulas from the data processing model; sending each weighted-sum formula to the cooperative nodes, each of which retrieves the corresponding copy, substitutes the addends stored in the copy into the weighted-sum formula to obtain an intermediate sum, and sends the intermediate sum to the reservation node; substituting the addends stored in the reserved copy into the weighted-sum formula to obtain a reserved sum, and adding the reserved sum to all intermediate sums sent by the cooperative nodes, which yields the result of substituting the true values into the weighted-sum formula; and substituting the results of the weighted-sum formulas into the data processing model and constructing a secure multi-party computation to solve the formulas that are not weighted sums.
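The weighted-sum shortcut works because a weighted sum is linear: applying the weights to each node's addends and adding up the partial sums gives the same result as applying the weights to the true values. The sketch below illustrates this under stated assumptions; the helper names, the column values, the weights and the number of nodes are hypothetical, and a constant term, if the formula had one, would have to be added exactly once rather than by every node.

```python
import random


def split(value, n, rng):
    """Split an integer into n integer addends (hypothetical helper)."""
    parts = [rng.randint(-1000, 1000) for _ in range(n - 1)]
    return parts + [value - sum(parts)]


def local_weighted_sum(weights, addend_row):
    """Each node substitutes only its own addends into the formula."""
    return sum(w * a for w, a in zip(weights, addend_row))


rng = random.Random(1)
true_row = [1200, 35, 800]   # hypothetical true values of three columns
weights = [2, -3, 5]         # hypothetical weighted-sum formula
n_nodes = 4                  # one reservation node + three cooperative nodes

# shares_by_column[j][i] is node i's addend of column j.
shares_by_column = [split(v, n_nodes, rng) for v in true_row]
# copies[i] is the copy held by node i; index 0 is the reserved copy.
copies = [[col[i] for col in shares_by_column] for i in range(n_nodes)]

reserved_sum = local_weighted_sum(weights, copies[0])
intermediate_sums = [local_weighted_sum(weights, c) for c in copies[1:]]
result = reserved_sum + sum(intermediate_sums)
```

No node ever sees another node's addends; only the partial sums travel to the reservation node.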
Preferably, the synchronization node calculates the 2nd through Nth powers of the true value of each numerical field and creates new 2nd-power through Nth-power columns for each numerical field in the copies. The 1st through Nth powers of the true value of the numerical field are each split into a plurality of addends, which are distributed to the reserved copy and the remaining copies for storage. Before the reservation node constructs the secure multi-party computation, it performs the following steps on the data processing model: extracting each univariate formula that takes a true value as input and admits a Taylor expansion, and expanding it into a Taylor expansion, which is a weighted sum of the 1st through Nth powers of the true value; sending the weighted-sum formula to the cooperative nodes, each of which retrieves the corresponding copy, substitutes the 1st-power through Nth-power addends stored in the copy into the weighted-sum formula to obtain an intermediate sum, and sends the intermediate sum to the reservation node; substituting the 1st-power through Nth-power addends stored in the reserved copy into the weighted-sum formula to obtain a reserved sum, and adding the reserved sum to all intermediate sums sent by the cooperative nodes, which yields an approximate result of substituting the true value into the univariate formula; and substituting the approximate result of the univariate formula into the data processing model and constructing a secure multi-party computation to solve the formulas that are not weighted sums.
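As an illustration of the Taylor-expansion step, the sketch below approximates exp(x) from addends of the powers of x. It is only a sketch under stated assumptions: the choice of exp, the expansion point 0, N = 8, the true value 0.5, the floating-point addends and all names are hypothetical and not taken from the source.

```python
import math
import random


def split(value, n, rng):
    """Split a float into n float addends (hypothetical helper)."""
    parts = [rng.uniform(-1.0, 1.0) for _ in range(n - 1)]
    return parts + [value - sum(parts)]


N = 8          # highest stored power (assumption)
x = 0.5        # hypothetical true value of a numerical field
n_nodes = 3    # one reservation node + two cooperative nodes
rng = random.Random(2)

# power_shares[k - 1][i] is node i's addend of x**k, for k = 1..N.
power_shares = [split(x ** k, n_nodes, rng) for k in range(1, N + 1)]

# Taylor coefficients of exp about 0 for the terms x**1 .. x**N.
coeffs = [1.0 / math.factorial(k) for k in range(1, N + 1)]


def local_sum(node):
    """Weighted sum of one node's power addends."""
    return sum(c * power_shares[k][node] for k, c in enumerate(coeffs))


reserved_sum = local_sum(0)
intermediate_sums = [local_sum(i) for i in range(1, n_nodes)]
# The constant term of the expansion (here 1.0) is added exactly once.
approx = 1.0 + reserved_sum + sum(intermediate_sums)
```

Storing the powers in advance is what turns a nonlinear formula into a weighted sum that each node can evaluate locally.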
Preferably, the reservation node judges the error of the approximate result that the Taylor expansion gives for the univariate formula. If the error exceeds a preset percentage, the approximate result is discarded and a secure multi-party computation is established to solve the univariate formula instead; if the error does not exceed the preset percentage, the approximate result is kept. The reservation node judges the error as follows: it back-calculates an approximate true value from the approximate result and the univariate formula; it multiplies the approximate true value by a coefficient k, where k = 1 + δ and δ is the preset percentage, and substitutes the corrected approximate true value into the univariate formula to obtain a corrected approximate result; if the absolute value of the difference between the corrected approximate result and the approximate result exceeds the preset percentage, the error is judged to exceed the preset percentage.
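The error judgment can be sketched as below. The source does not state what the difference is measured against, so this sketch assumes the difference is compared with δ times the magnitude of the approximate result; that reading, the function names and the choice of exp with its inverse log are assumptions made for illustration only.

```python
import math


def error_exceeds(approx_result, f, f_inverse, delta):
    """Return True if the approximate result should be discarded.

    Assumption: the difference is judged relative to the approximate
    result, i.e. |corrected - approx| > delta * |approx|.
    """
    approx_true = f_inverse(approx_result)   # back-calculate the input
    k = 1.0 + delta                          # correction coefficient
    corrected = f(approx_true * k)           # corrected approximate result
    return abs(corrected - approx_result) > delta * abs(approx_result)


delta = 0.05  # hypothetical preset percentage (5 %)

# exp near x = 0.5: a 5 % change of the input moves the output by ~2.5 %.
discard_small = error_exceeds(math.exp(0.5), math.exp, math.log, delta)
# exp near x = 3.0: a 5 % change of the input moves the output by ~16 %.
discard_large = error_exceeds(math.exp(3.0), math.exp, math.log, delta)
```

Under this reading the check flags inputs for which the formula is so sensitive that a small deviation in the recovered true value would shift the result by more than the preset percentage.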
Preferably, the reservation node runs a privacy security check module. The module enumerates the outputs of the data processing model and the input columns each output depends on, and treats each output together with its related input columns as a submodel. If any submodel contains only one input column, the privacy security check fails: the reservation node rejects the data call request and notifies the access node to withdraw it.
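The privacy security check reduces to a simple rule over the dependency structure of the model. In the sketch below the representation of a submodel as a mapping from an output name to a set of input column names is an assumption, as are the function and column names.

```python
def passes_privacy_check(submodels):
    """Fail if any output depends on exactly one input column.

    submodels maps each output name to the set of input columns that
    output depends on (hypothetical representation).
    """
    return all(len(columns) != 1 for columns in submodels.values())


# Both outputs mix several columns, so neither reveals a single column.
safe_model = {
    "risk_score": {"deposit_balance", "annual_income"},
    "spend_ratio": {"annual_consumption", "annual_income"},
}

# "echo" depends on one column only, so its output could reveal that column.
leaky_model = {
    "risk_score": {"deposit_balance", "annual_income"},
    "echo": {"deposit_balance"},
}

safe_ok = passes_privacy_check(safe_model)
leaky_ok = passes_privacy_check(leaky_model)
```

An output that depends on a single column can often be inverted to recover that column, which is the leak the check is meant to block.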
Preferably, the access node designates several columns as external primary keys, and the access node and the synchronization node agree on a salt. The synchronization node salts the external primary key values of each data row and computes their hash value, and the access node displays the hash value in association with the row number.
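A salted hash over the external primary key columns might look like the following sketch. The source names neither the hash function nor the encoding, so SHA-256, the length-prefixed UTF-8 encoding, the salt, the key values and the function name are all assumptions.

```python
import hashlib


def salted_key_hash(salt, key_values):
    """Hash the external primary key values of one data row with a salt.

    Assumption: SHA-256 over the salt followed by each value, with a
    4-byte length prefix per value so that ("ab", "c") and ("a", "bc")
    do not collide.
    """
    digest = hashlib.sha256()
    digest.update(salt)
    for value in key_values:
        data = str(value).encode("utf-8")
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


salt = b"agreed-salt"  # hypothetical salt agreed by the two nodes
h1 = salted_key_hash(salt, ["Zhang San", "ID-0001"])  # made-up key values
h2 = salted_key_hash(salt, ["Zhang San", "ID-0001"])
h3 = salted_key_hash(b"other-salt", ["Zhang San", "ID-0001"])

# The access node can publish the hash next to the row number.
published_index = {h1: "9025.1"}
```

Equal key values hash to the same digest under the same salt, so rows can be matched by key without the key itself being published.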
The substantial effects of the invention are as follows: 1) the hot data is stored in scattered form across the cooperative nodes and the reservation node, and the call requester receives only the result of running the data processing model on the hot data, so the call requester obtains the value of the data without ever touching the hot data itself, which effectively protects the privacy and security of the hot data and breaks down data islands; 2) a reservation node is arranged on the data source side, and when the hot data is updated the update is synchronized to the reserved copy through the reservation node, so the cooperative nodes need not be changed and the call requester can execute the data processing model on the latest data, which effectively improves the timeliness of the data; 3) the improved collaborative computing mode reduces the number of secure multi-party computations that must be established when executing the data processing model, which greatly improves its execution efficiency.
Drawings
Fig. 1 is a schematic structural diagram of a hot data sharing platform according to an embodiment.
Fig. 2 is a schematic diagram of cooperative node storage according to an embodiment.
Fig. 3 is a schematic diagram of hot data synchronization by the synchronization node according to an embodiment.
Fig. 4 is a schematic diagram of the method by which the reservation node executes the data processing model according to an embodiment.
Fig. 5 is a schematic diagram of the method by which the reservation node executes the data processing model according to the second embodiment.
Fig. 6 is a schematic diagram of the error judgment method according to the second embodiment.
Fig. 7 is a schematic diagram of the privacy security check method according to the second embodiment.
Wherein: 10. synchronization node; 11. row number; 12. synchronization table; 20. reservation node; 30. cooperative node; 31. storage area; 32. storage block; 40. access node; 50. data processing model; 51. input column; 52. intermediate value; 53. output; 60. data demander; 70. result.
Detailed Description
The following provides a more detailed description of the present invention, with reference to the accompanying drawings.
The first embodiment is as follows:
Referring to Fig. 1, a blockchain-based hot data sharing platform includes a synchronization node 10, a reservation node 20, a plurality of cooperative nodes 30 and an access node 40. The synchronization node 10 and the reservation node 20 are arranged on the data source side. The synchronization node 10 synchronously obtains hot data from the data source side system and assigns a row number 11 to each data row of the hot data. The synchronization node 10 makes a plurality of copies of the data row, associates the copies with the row number 11, splits the true value of each numerical field of the data row into a plurality of addends, and distributes the addends among the copies. The reservation node 20 keeps one copy as the reserved copy, and the remaining copies are distributed to the cooperative nodes 30 for storage. The reserved copy stores the true values of non-numerical fields, while the remaining copies store confusion values. The synchronization node 10 periodically synchronizes the hot data, calculates the change amount of each numerical field, synchronizes the change amount to the corresponding reserved copy, and writes new values of non-numerical fields directly into the reserved copy. A data demander 60 submits a data call request, which includes a row number 11 and a data processing model 50, to the access node 40. The access node 40 has a virtual account on the blockchain, and the data demander 60 transfers a corresponding amount of tokens to the virtual account of the access node 40. The access node 40 sends the data call request to the reservation node 20 of the corresponding data source party. The reservation node 20 and the plurality of cooperative nodes 30 construct a secure multi-party computation, obtain the result 70 of the data processing model 50, and send the result 70 to the access node 40. The access node 40 sends the result 70 to the data demander 60 and transfers the corresponding amount of tokens to the virtual account of the data source party.
The synchronization node 10 is connected to the data source side system and directly obtains hot data that may contain private or confidential data. The synchronization node 10 then splits the hot data into a plurality of copies, so that no individual copy contains the private data, and distributes the copies to the cooperative nodes 30 and the reservation node 20, which effectively improves the security and privacy of the hot data. An attacker would have to compromise all of the cooperative nodes 30 and the reservation node 20 to obtain a piece of hot data that may contain private or confidential data; the cost of such an attack is high, so the security of the data can be ensured. Each confusion value falls within the value range of the true value. To make it easier for the data call requester to look up the row number 11, the access node 40 publishes a description of the data together with the row numbers 11. Table 1 shows the data description submitted by Bank A, where 9025 is the source number of Bank A and 9025.1-9025.100000 are the row numbers 11 of the data.
Table 1. Data description submitted by Bank A
Bank A
Card-usage data of ordinary user accounts at this branch over the past two years
Data volume: 100,000 rows
Row numbers: 9025.1-9025.100000
Data introduction: The data was generated by simple preliminary statistics on the card transaction records of ordinary-account users at this branch. The branch is located on XX Road, XX District, XX City, and its users are mainly nearby residents and employees of nearby businesses who use the account as a salary card. Users with very low account usage (fewer than 10 transactions per year and less than 1 thousand in total transactions) have been screened out. The data is complete with no missing values and reflects real card usage. The data fields include name, age, deposit balance, monthly income for the last two years, annual income for the last two years, monthly consumption for the last two years, annual consumption for the last two years, education level and loan data. Consumption refers to expenditure flowing to providers of goods and services; transfers between a user's own cards, credit card repayments, loan repayments and purchases of financial products are not counted in monthly or annual consumption. The deposit balance ranges from 0 to 100 million yuan, age from 0 to 150, monthly income from 0 to 10 million yuan, annual income from 0 to 100 million yuan, monthly consumption from 0 to 10 million yuan, and annual consumption from 0 to 100 million yuan …
Cold data refers to archived data that is generally not subject to further change, such as a user's credit card transaction data, transportation data and supermarket purchase data from the previous year. Cold data is considered suitable as training data for machine learning models because it is stable and usually large in volume.
Hot data refers to data that is accessed and updated frequently. It is usually stored directly in the data source side system, so reading it affects both the confidentiality of the hot data and the stability of the data source side system. One example of hot data is the deposit data of bank depositors: when a depositor applies for a loan, the lender can use the deposit data to check the depositor's assets, evaluate repayment capability and judge the loan risk. Deposit data that has been archived as cold data is of little use for judging the depositor's current asset position.
Cold data can be used to train a neural network model that determines the loan risk level from a depositor's asset position. Cold data is therefore used to train machine learning models such as neural network models and classifiers, and the latest data obtained through this embodiment is then fed into the trained model to obtain accurate results. Of course, machine learning models can also be trained directly using this embodiment; if frequently changing fields are avoided, training proceeds much as it would on cold data. Because the reservation node 20 is arranged on the data source side and the remaining copies are assigned to the cooperative nodes 30, changes to the hot data are synchronized through the reserved copy, so that those changes take part in the calculation of the data processing model 50. For example, the total real-time deposit amount of a depositor across several banks can be obtained through this embodiment as the basis for granting loan credit.
The cooperative nodes 30 may differ for each synchronization node 10. A number of cooperative nodes 30 are created, and a subset of them is selected and made known to each synchronization node 10, so that each synchronization node 10 obtains a different set of cooperative nodes 30. This is equivalent to adding a further layer of encryption to the data rows and further improves their security. The synchronization node 10 is deployed independently, which reduces the load on the data source system. Although the synchronization node 10 obtains the original data, it deletes the original data once the copies have been made and does not retain it for long. Furthermore, the synchronization node 10 connects only to the intranet while synchronizing data with the data source system; only after the copies have been made and the original data deleted does it connect to the reservation node 20 and the cooperative nodes 30, so the original data is never exposed to the external network. The reservation node 20 holds the complete true values of the non-numerical fields, so the reservation node 20 itself must also be secured.
The tokens serve as prepayment: once the data has been substituted into the data processing model 50 and the result has been returned, payment is made automatically. This eliminates the possibility of payment default, effectively protects the interests of the data source party, and encourages data source parties to share hot data. The token used in this embodiment is a stablecoin, that is, the exchange rate between the token and fiat currency is fixed. The embodiment either provides nodes for exchanging tokens and fiat currency, or relies on an existing blockchain that already has exchange nodes and uses a stablecoin.
The synchronization node 10 obtains a source number from the access node 40 and sends the column structure of the data rows to the access node 40 to obtain a class number and field numbers. The synchronization node 10 assigns a unique serial number to each data row, and the source number and the unique serial number together form the row number 11 of the data row. The access node 40 publishes the column structure and the row numbers 11. The cooperative node 30 opens up a storage area 31 for each column structure and a plurality of storage blocks 32 within the storage area 31, the size of each storage block 32 matching the maximum space a copy can occupy. The cooperative node 30 stores each copy, associated with its row number 11, in a storage block 32; when the storage blocks 32 in a storage area 31 are full, a new storage area 31 and new storage blocks 32 are opened up for that column structure. Because each synchronization node 10 has a unique source number, the synchronization node 10 can assign serial numbers to data rows on its own, and combining the source number with the serial number yields a globally unique row number 11.
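The composition of a row number from a source number and a locally assigned serial number can be sketched as follows. The dotted format follows the row numbers shown in Table 1 (9025.1-9025.100000); the class and method names are hypothetical.

```python
from itertools import count


class RowNumberAllocator:
    """Issues row numbers of the form '<source number>.<serial number>'.

    The source number is assumed to be unique per synchronization node,
    so local serial numbers are enough for global uniqueness.
    """

    def __init__(self, source_number):
        self.source_number = source_number
        self._serial = count(1)  # serial numbers start at 1, as in Table 1

    def next_row_number(self):
        return f"{self.source_number}.{next(self._serial)}"


bank_a = RowNumberAllocator(9025)   # source number of Bank A from Table 1
bank_b = RowNumberAllocator(9026)   # hypothetical second data source

first = bank_a.next_row_number()
second = bank_a.next_row_number()
other = bank_b.next_row_number()
```

Two synchronization nodes can reuse the same serial numbers without clashing, because the source number prefix differs.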
Referring to Fig. 2, the cooperative node 30 sets a longitudinal displacement sequence for each storage area 31, which records the longitudinal offset of each column. The storage blocks 32 in the storage area 31 are numbered sequentially. When a copy is stored, the longitudinal offset of each column of the copy is looked up and that column is stored in the storage block 32 indicated by the offset. When a copy is restored, the storage block 32 holding each column is located by searching downwards according to the longitudinal displacement sequence, the value of each column of the copy is read, and the copy is restored. The fields that make up a data row constitute its column structure: which fields the data row contains, the data type of each field, and the storage space each field is allowed to occupy. Using a storage area 31 and storage blocks 32 matched to the column structure makes data easy to locate and read and improves the utilization of storage space. It also makes longitudinal cyclic displacement of the copies, and its reversal, straightforward. The longitudinal displacement sequence is stored and used only by the cooperative node 30 and never leaves the local node, so it is highly secure. Even if an attacker obtains the contents of one storage block 32, that block does not necessarily hold one complete copy and may instead hold a mixture of columns from several copies, so the attacker cannot use it to restore the original data, which improves the security of the data.
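One possible reading of the longitudinal displacement scheme is sketched below: each column of a copy is written to the block that lies a column-specific number of blocks below the copy's logical slot, wrapping around cyclically. The class, the slot concept, the offsets and the stored values are all hypothetical; the source does not fix these details.

```python
class StorageArea:
    """A storage area whose columns are cyclically shifted between blocks."""

    def __init__(self, n_blocks, offsets):
        self.n_blocks = n_blocks
        self.offsets = offsets  # longitudinal displacement sequence
        self.blocks = [[None] * len(offsets) for _ in range(n_blocks)]

    def _block_index(self, slot, column):
        # Search downwards from the logical slot, wrapping cyclically.
        return (slot + self.offsets[column]) % self.n_blocks

    def store(self, slot, values):
        """Scatter the columns of one copy across blocks."""
        for column, value in enumerate(values):
            self.blocks[self._block_index(slot, column)][column] = value

    def restore(self, slot):
        """Gather the columns of one copy back together."""
        return [self.blocks[self._block_index(slot, column)][column]
                for column in range(len(self.offsets))]


area = StorageArea(n_blocks=4, offsets=[0, 1, 2])  # hypothetical offsets
area.store(0, [11, 12, 13])   # copy stored at logical slot 0
area.store(1, [21, 22, 23])   # copy stored at logical slot 1

restored_first = area.restore(0)
mixed_block = area.blocks[1]  # holds columns from two different copies
```

In this sketch a single physical block mixes columns from different copies, which matches the observation that reading one block does not yield one complete copy.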
Referring to Fig. 3, the synchronization node 10 establishes a synchronization table 12 and stores each change amount, associated with its row number 11, in the synchronization table 12. The reservation node 20 periodically checks the synchronization table 12, synchronizes each stored change amount to the reserved copy with the matching row number 11, and deletes the change amount from the synchronization table 12 after synchronization. The data call request further includes a synchronization check flag bit. If the flag bit is 1, the reservation node 20 first checks the synchronization table 12 and, if a change amount exists, synchronizes it to the reserved copy before going on to establish the secure multi-party computation and obtain the result of the data processing model 50. The synchronization table 12 records the pending changes to the hot data, and each change amount is simply added to the corresponding field of the reserved copy, which makes the hot data easy to synchronize. Some data call requests have strict timeliness requirements. For example, a request may call the current deposit balances of a depositor, sum the latest balances across several banks, and compare the sum with a preset standard to determine whether the depositor's assets meet the loan conditions. No deposit information is leaked in the process, yet the result is highly valuable to the lender for controlling loan risk. In this case, before executing the request, the reservation node 20 of each bank checks whether the deposit data of the depositor has unsynchronized change amounts, which ensures that the data is current and accurate. Other data sharing systems do not provide this technical effect.
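The interplay between the synchronization table and the synchronization check flag bit can be sketched as follows. The class, the tuple layout of a synchronization table entry, the field name and the amounts are hypothetical; the reserved copy is reduced to a dictionary of addends for brevity.

```python
class ReservationNode:
    """Holds reserved copies and applies pending change amounts."""

    def __init__(self, reserved_copies):
        # row number -> {field name: addend held in the reserved copy}
        self.reserved_copies = reserved_copies

    def apply_sync_table(self, sync_table):
        """Add each pending change amount, then delete it from the table."""
        while sync_table:
            row_number, field, change = sync_table.pop(0)
            self.reserved_copies[row_number][field] += change

    def prepare_request(self, row_number, sync_check_flag, sync_table):
        """Synchronize first when the flag bit is 1, then return the copy."""
        if sync_check_flag == 1 and sync_table:
            self.apply_sync_table(sync_table)
        return self.reserved_copies[row_number]


node = ReservationNode({"9025.1": {"deposit_balance": 1700}})
# Entries written by the synchronization node: (row number, field, change).
sync_table = [("9025.1", "deposit_balance", 300),
              ("9025.1", "deposit_balance", -50)]

stale = dict(node.prepare_request("9025.1", 0, sync_table))
fresh = dict(node.prepare_request("9025.1", 1, sync_table))
```

With the flag bit at 0 the request runs on the last periodically synchronized state; with the flag bit at 1 the pending changes are folded in before the computation starts.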
The synchronization node 10 lists the non-numeric fields of the data rows and builds a comparison table containing each non-numeric field value and its substitute number. The non-numeric field values are replaced by the substitute numbers, and the comparison table is sent to the reservation node 20. Each substitute number is divided into a plurality of addends, which are distributed to the plurality of copies. When the synchronization node 10 periodically synchronizes the hot data, it calculates the difference between the substitute numbers before and after a non-numeric field changes and synchronizes that difference into the corresponding reserved copy. Non-numeric fields include tag fields and text fields. Both take text values, but the value range of a tag field is limited. For example, the value range of the education level field is: high school and below, junior college, bachelor's degree, and postgraduate. A tag field with a limited value range can easily be converted into a numeric field by building a comparison table; here the numbers 1 to 4 represent high school and below, junior college, bachelor's degree, and postgraduate respectively. A text field is a field whose value range is unknown or too large, such as a remark field or an address field. Remarks can be filled in freely by the data source side as needed, follow no regular pattern, are of uncertain value, and are difficult to use. Although an address is constrained to some extent and cannot be filled in arbitrarily, its value range is so large that it is close to free text, and a specific address usually contains no usable information either.
This embodiment ignores text fields that are not used as external primary key fields. That is, the text fields are deleted from the data rows submitted by the data source side, and only the numeric fields and tag fields are kept. If a text field is an external primary key field, the salted hash value of its value is extracted and stored in all copies.
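A small sketch of the comparison table and of synchronizing a changed tag field as a difference of substitute numbers. The English label strings are an assumed rendering of the four education levels, the addends `[-9, 5, 7]` are illustrative, and the function names are hypothetical.

```python
# Assumed comparison table for the education level tag field (values 1 to 4).
COMPARISON_TABLE = {
    "high school and below": 1,
    "junior college": 2,
    "bachelor's degree": 3,
    "postgraduate": 4,
}


def substitute(label):
    """Replace a tag value by its substitute number."""
    return COMPARISON_TABLE[label]


def change_amount(old_label, new_label):
    """Difference of substitute numbers before and after the field changes."""
    return substitute(new_label) - substitute(old_label)


# Illustrative addends of the substitute number 3; the last one is the reserved copy's.
shares = [-9, 5, 7]
delta = change_amount("bachelor's degree", "postgraduate")
shares[-1] += delta  # only the reserved copy is touched
```

After the update the addends sum to the new substitute number, while the addends held by the other copies are unchanged.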
Referring to fig. 4, the access node 40 publishes the comparison table, and the data processing model 50 is a set of calculation formulas. Before the reservation node 20 constructs the secure multiparty computation, the following steps are performed on the data processing model 50:
Step A01) extract the weighted sum formulas in the data processing model 50. Machine learning models are one of the main ways in which big data is applied, and the neural network model, as the most widely used machine learning model, is in theory applicable to almost any problem. A neural network model contains a large number of weighted sum formulas, so this scheme has a wide range of application scenarios even though it targets only weighted sum formulas.
Step A02) send the weighted sum formulas to the plurality of cooperative nodes 30.
Step A03) the cooperative node 30 calls the corresponding copy, substitutes the addend stored in the copy into the weighted sum calculation formula to obtain the intermediate sum, and sends the intermediate sum to the reservation node 20.
Step A04), the reserved node 20 substitutes the addend stored in the reserved copy into the weighted summation calculation formula to obtain a reserved sum, and adds the reserved sum and all intermediate sums sent by the cooperative node 30 to obtain a result of substituting the true value into the weighted summation calculation formula; the results of the weighted summation equations are substituted into the data processing model 50, and a secure multiparty calculation solution is constructed for the non-weighted summation equations.
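Steps A01 to A04 rely on the fact that a weighted sum is linear, so it can be evaluated on each copy separately and the partial results added. The sketch below illustrates this under stated assumptions: the helper names, the integer weights, and the random splitting range are hypothetical and are not taken from the source.

```python
import random


def split_into_addends(value, n, spread=100):
    """Split an integer into n addends that sum to it (the first n-1 are random)."""
    parts = [random.randint(-spread, spread) for _ in range(n - 1)]
    parts.append(value - sum(parts))
    return parts


def weighted_sum(weights, values):
    """The weighted sum formula, evaluated on whatever values a node holds."""
    return sum(w * v for w, v in zip(weights, values))


weights = [2, 3, 5]       # illustrative integer weights
true_row = [33, 30, 80]   # true field values of one data row

# One list of addends per field, then regroup so that each copy has one addend per field.
per_field = [split_into_addends(v, 4) for v in true_row]
copies = [list(col) for col in zip(*per_field)]

# A03: each cooperative node evaluates the formula on its own copy.
intermediate_sums = [weighted_sum(weights, c) for c in copies[:-1]]
# A04: the reservation node does the same on the reserved copy and adds everything up.
reserved_sum = weighted_sum(weights, copies[-1])
result = reserved_sum + sum(intermediate_sums)
```

The final `result` equals the weighted sum of the true values even though no single node evaluated the formula on the true values.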
In a neural network model, for example, the input layer has three neurons corresponding to the depositor's age, annual average deposit amount, and current deposit balance. The first layer has two neurons and is fully connected. One of these neurons is connected to the three neurons of the input layer, its activation function is the sigmoid function, its weight coefficients are denoted a11, a12, and a13, and its offset is denoted b1. The output 53 of this neuron equals sigmoid(x), where x = a11 × depositor age + a12 × annual average deposit amount + a13 × current deposit balance + b1.
The deposit data of the depositor at bank A is as follows: the depositor's age is 33, the annual average deposit amount is 30 (in units of ten thousand), and the current deposit balance is 80 (in units of ten thousand).
Four addends are generated for the depositor age 33: 33 = -12 + 13 + 14 + 18, so the 4 copies are assigned the values -12, 13, 14, and 18. Four addends are generated for the annual average deposit amount 30 and are stored by the 3 cooperative nodes 30 and the reservation node 20: 30 = 3 + 5 + 10 + 12, so the 4 copies are assigned the values 3, 5, 10, and 12. The addends generated for the current deposit balance 80 are: 80 = 12 + 18 + 23 + 27, so the 4 copies are assigned the values 12, 18, 23, and 27. After the addends are shuffled and distributed to the 3 cooperative nodes 30 and the reservation node 20, assume that the data stored by the first cooperative node 30 is -12, 3, 12 and the data stored by the reservation node 20 is 18, 12, 27.
Since the first cooperative node 30 stores -12, 3, 12, the sum it calculates is a11 × (-12) + a12 × 3 + a13 × 12, and so on for the other nodes. The sums sent by all 3 cooperative nodes 30 and the sum obtained by the reservation node 20 are added together, giving: a11 × (-12 + 13 + 14 + 18) + a12 × (3 + 5 + 10 + 12) + a13 × (12 + 18 + 23 + 27), that is, a11 × 33 + a12 × 30 + a13 × 80. Adding the offset b1 gives the value of x, which is substituted into sigmoid(x) to obtain the neuron output 53. During the calculation, the original true values are hidden among a number of confusion values and addends and are difficult to identify, which improves the privacy and security of the data.
When a depositor applies for a loan at bank B, bank B performs a preliminary review and initiates a data call request to bank A. If the depositor is found to meet the preliminary review conditions, bank B prepares a loan approval form and submits it for higher-level approval so that the loan can be processed for the depositor.
After some time, bank B has sufficient funds and wants to expand its lending, so it actively screens for target customers and again initiates a data call request for the depositor's deposit data at bank A. Assume that the depositor's business has gone well since the loan was obtained and that the deposit balance at bank A has increased to 260 (in units of ten thousand). Before the request is invoked, the synchronization node 10 at bank A generates the corresponding change amount, +180, in the synchronization table 12, and the reservation node 20 finds this change amount when it periodically queries the synchronization table 12 and performs the synchronization. The data originally held by the reservation node 20 is 18, 12, 27; after +180 is synchronized to the reserved copy, the reserved copy stores 18, 12, 27 + 180, that is, 18, 12, 207. When bank A executes the data processing model 50 of bank B again, the sum calculated by the first cooperative node 30 is still a11 × (-12) + a12 × 3 + a13 × 12, but the sum calculated by the reservation node 20 has become a11 × 18 + a12 × 12 + a13 × 207. The sums of all 3 cooperative nodes 30 and the sum of the reservation node 20 are added together, giving: a11 × (-12 + 13 + 14 + 18) + a12 × (3 + 5 + 10 + 12) + a13 × (12 + 18 + 23 + 207), that is, a11 × 33 + a12 × 30 + a13 × 260. The most up-to-date data is thus applied to the calculation of the data processing model 50, and the cooperative nodes 30 do not need to make any change. At the same time, the true data is still stored in dispersed form, as 260 = 12 + 18 + 23 + 207, and the leakage of any one addend does not leak private data. In fact, as long as at least one addend is not leaked, the true value is not leaked.
With this scheme, the privacy of the data is effectively protected without any fuzzification of the data, which improves the accuracy of the calculation, and the latest changes of the data are synchronized into the calculation, which improves the timeliness of the data. Bank B can therefore proactively raise the depositor's credit limit and set a validity period, so that once the depositor needs to invest more funds in the business, the loan can be handled directly.
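The worked example above can be checked numerically. In the sketch below the copies of the first cooperative node and the reservation node come from the text; the assignment of the remaining addends to the second and third cooperative nodes, and the weights and offset `a11`, `a12`, `a13`, `b1`, are assumptions chosen only so the arithmetic is easy to follow.

```python
import math

# Illustrative weights and offset (assumed; the text leaves them symbolic).
a11, a12, a13, b1 = 0.5, 0.25, 0.125, -20.0

# Copies as (age addend, annual average deposit addend, balance addend).
cooperative_copies = [
    (-12, 3, 12),   # first cooperative node, as in the text
    (13, 5, 18),    # assumed assignment
    (14, 10, 23),   # assumed assignment
]
reserved_copy = [18, 12, 27]  # reservation node, as in the text


def node_sum(copy):
    """Weighted sum evaluated on the addends held by one node."""
    age, avg, bal = copy
    return a11 * age + a12 * avg + a13 * bal


def neuron_output(reserved):
    """Add the reserved sum, all intermediate sums, and the offset, then apply sigmoid."""
    x = node_sum(reserved) + sum(node_sum(c) for c in cooperative_copies) + b1
    return x, 1.0 / (1.0 + math.exp(-x))


x_before, out_before = neuron_output(reserved_copy)

# The balance rises from 80 to 260: only the reserved copy receives the +180 change.
reserved_copy[2] += 180
x_after, out_after = neuron_output(reserved_copy)
```

Only `reserved_copy` changes between the two evaluations; the cooperative copies are reused as they are.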
The beneficial technical effects of this embodiment are: the cooperative node 30 and the retention node 20 are used for storing the hot data in a dispersed manner, and the data processing model 50 is used for providing the result obtained after the hot data is processed for the call requester, so that the call requester can obtain the value of the data without contacting the hot data, the privacy safety of the hot data is effectively protected, and the data island breaking is facilitated. The reservation node 20 is arranged on the data source side, when the hot data is updated, the data is updated and synchronized to the reservation copy through the reservation node 20, so that the cooperative node 30 can enable the calling requester to obtain the latest data and execute the result of the data processing model 50 without any change, and the timeliness of the data is effectively improved.
Example two:
On the basis of the first embodiment, this embodiment extends the cases in which a weighted sum calculation can be applied beyond the neural network model. Referring to fig. 5, the calculation of the data processing model 50 in this embodiment includes the following steps:
step B01) the synchronization node 10 calculates the 2nd to Nth powers of the true value of each numeric field, and creates new 2nd-power to Nth-power fields for each numeric field in the copy;
step B02) the values of the 1st to Nth powers of the true value of the numeric field are each split into a plurality of addends, which are distributed to the reserved copy and the remaining copies for storage;
step B03) before the reservation node 20 constructs the secure multiparty computation, the following steps are performed on the data processing model 50:
step B04) extracting each unary formula that takes the true value as input and admits a Taylor expansion, and expanding it into a Taylor expansion, which is a weighted sum formula over the 1st to Nth powers of the true value;
step B05) sending the weighted sum formula to the plurality of cooperative nodes 30, and the cooperative nodes 30 calling the corresponding copies;
step B06) substituting the 1st-power to Nth-power addends stored in the copy into the weighted sum formula to obtain an intermediate sum, and sending the intermediate sum to the reservation node 20;
step B07) the reservation node 20 substituting the 1st-power to Nth-power addends stored in the reserved copy into the weighted sum formula to obtain a reserved sum;
step B08) adding the reserved sum and all the intermediate sums sent by the cooperative nodes 30 to obtain an approximate result of substituting the true value into the unary formula;
step B09) substituting the approximate result of the unary formula into the data processing model 50, and constructing a secure multiparty computation to solve the formulas that are not weighted sums.
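Steps B01 to B08 can be sketched for the function exp(x), whose Taylor coefficients are 1/k!. This is an illustration under assumptions: the helper names, the choice N = 20, the four copies, and the decision to add the constant (0th-power) term once at the summing side are not specified by the source.

```python
import math
import random


def split(value, n):
    """Split a real value into n addends that sum to it."""
    parts = [random.uniform(-1.0, 1.0) for _ in range(n - 1)]
    parts.append(value - sum(parts))
    return parts


def shared_exp(x, N=20, n_copies=4):
    """Approximate exp(x) from addends of x**1 .. x**N held by different copies."""
    coeffs = [1.0 / math.factorial(k) for k in range(N + 1)]
    # B01/B02: compute the powers once and split each power into addends.
    power_shares = [split(x ** k, n_copies) for k in range(1, N + 1)]
    # B06/B07: every copy holder evaluates the same weighted sum on its own addends.
    partial_sums = [
        sum(coeffs[k] * power_shares[k - 1][i] for k in range(1, N + 1))
        for i in range(n_copies)
    ]
    # B08: add all partial sums; the constant term is added once (assumption).
    return coeffs[0] + sum(partial_sums)


approx = shared_exp(1.5)
```

Because the expansion is linear in the powers of the true value, the nonlinear function reduces to the same additive scheme used for weighted sums in the first embodiment.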
A Taylor expansion expresses a function as a series, that is, as a weighted sum of the 0th to Nth powers of the argument. When N is large enough, the error between the result calculated with the Taylor expansion and the true result is small enough. For example, the Taylor expansion of the function exp(x) is a weighted sum of the 0th to Nth powers of x, in which the coefficients of the 0th to Nth powers are 1, 1, 1/2!, 1/3!, …, 1/N!. The full Taylor series has infinitely many terms; when a certain error is tolerable, a sufficiently large finite N can be used. Increasing N costs additional storage space, but the calculation consists mainly of multiplications and additions with linear complexity, so the effect on computational efficiency is small. The scheme can therefore increase the accuracy of the calculation at very low cost. If N is increased from 10 to 20, only a small amount of extra storage is needed and about 2 × 20 calculations are added (splitting into addends once and computing the weighted sum once), so the cost increases by less than one order of magnitude. Estimating the accuracy by the last term, the accuracy improves by a factor of 20! divided by 10!, which is approximately 6.7 × 10^11. For the same argument, the accuracy of the approximation using the Taylor series improves by almost 12 orders of magnitude.
The reservation node 20 judges the error of the approximate result obtained for the unary formula through the Taylor expansion. If the error exceeds a preset percentage, the approximate result is discarded and a secure multiparty computation is re-established to solve the unary formula; if the error does not exceed the preset percentage, the approximate result is kept. Referring to fig. 6, the error determination method includes the following steps:
step C01) the reservation node 20 works backwards from the approximate result through the unary formula to find an approximate true value;
step C02) the approximate true value is multiplied by a coefficient k, where k = 1 + δ and δ is the preset percentage, and the approximate true value corrected by the coefficient k is substituted into the unary formula to obtain a corrected approximate result;
step C03) if the difference between the absolute value of the corrected approximate result and the absolute value of the approximate result exceeds the preset percentage, the error is judged to exceed the preset percentage.
The error of an approximation calculated with the Taylor expansion depends on the value of the argument itself. If the argument is 3, the 10th-power term is pow(3,10)/10! = 0.016, which is only 0.19% of the sum of the first three terms, 8.5, and is already an acceptable accuracy. If the argument is 10, the 10th-power term is 2755, far greater than the sum of the first three terms. The larger the value of the argument, the worse the accuracy. Therefore, when a specific value of N is chosen, it is determined from the upper limit of the value range of the numeric field so that the precision requirement is met over the whole range.
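The figures quoted above for exp(x) can be reproduced with a few lines. The function `smallest_n` is a hypothetical heuristic, not taken from the source: it returns the first N, no smaller than the upper limit of the argument, at which the Nth term falls below a chosen fraction of the partial sum.

```python
import math


def term(x, k):
    """k-th term of the Taylor expansion of exp(x): x**k / k!."""
    return x ** k / math.factorial(k)


def smallest_n(x_max, tol):
    """Heuristic (assumption): first N >= x_max whose term is below tol of the partial sum."""
    partial = 0.0
    n = 0
    while True:
        t = term(x_max, n)
        partial += t
        if n >= x_max and t / partial < tol:
            return n
        n += 1


first_three_at_3 = sum(term(3, k) for k in range(3))  # 1 + 3 + 4.5
tenth_at_3 = term(3, 10)
tenth_at_10 = term(10, 10)
n_for_3 = smallest_n(3, 1e-3)
n_for_10 = smallest_n(10, 1e-3)
```

A larger upper limit of the value range calls for a larger N under the same tolerance, which matches the reasoning in the paragraph above.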
The reservation node 20 runs a privacy security check module. Referring to fig. 7, the privacy security check module enumerates the outputs 53 of the data processing model 50, enumerates the input columns 51 and intermediate values 52 involved in each output 53, and treats each output 53 together with the input columns 51 it involves as a submodel. If any submodel contains only one input column 51, the input can be derived in reverse from the output 53, so the privacy security check fails; the reservation node 20 refuses to execute the data processing model 50, rejects the data call request, and notifies the access node 40 to revoke the data call request. For example, if the first output 53 is calculated from the first input through a function f1 and is fed back to the data call requester, the first input can be obtained directly through the inverse function of f1, which would leak the data and destroy its privacy. The privacy security check therefore fails, and execution of such a data processing model 50 is refused. If the data processing model 50 exposes only the second and third outputs 53 in fig. 7, each of which involves multiple input columns 51, it is difficult to deduce the specific input values after the function operations, and the privacy and security of the data are protected.
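The privacy security check reduces to inspecting, for each output, the set of input columns it depends on. A minimal sketch follows, assuming the model has already been analysed into a mapping from output names to input column sets; the names `output_1`, `col_1`, and so on are hypothetical.

```python
def privacy_check(submodels):
    """Pass only if every output depends on at least two input columns."""
    return all(len(columns) >= 2 for columns in submodels.values())


def handle_request(submodels):
    """Reject the data call request when the check fails (sketch of the decision only)."""
    return "execute" if privacy_check(submodels) else "reject"


# Shaped like fig. 7: the first output depends on a single input column.
full_model = {
    "output_1": {"col_1"},
    "output_2": {"col_1", "col_2"},
    "output_3": {"col_2", "col_3"},
}
# The same model exposing only the second and third outputs.
restricted_model = {k: v for k, v in full_model.items() if k != "output_1"}
```

Here `full_model` is rejected because `output_1` could be inverted to recover `col_1`, while `restricted_model` passes.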
The access node 40 designates a plurality of columns as external primary keys, and the access node 40 and the synchronization node 10 agree on a salt. The synchronization node 10 salts the external primary key values of each data row and extracts the hash value, and the access node 40 publishes the hash value in association with the line number 11. The access node 40 exposes the external primary keys so that a data call requester can find the intersection between different sets of hot data. For example, the deposit, consumption, and loan data associated with the same identity card number can be looked up across the data of different banks; bank B can use the identity card number of a loan applicant to query the deposits held under that identity card number in the data of bank A. However, the original value of an external primary key is not disclosed directly; instead, the value is salted and the salted hash value is published.
In early databases, private data was stored in plaintext. If the password was 123456, the plaintext 123456 was stored in the database, and once the database leaked, the password leaked with it. To avoid this defect of first-generation password design, the plaintext password is no longer stored in the database; a hash of the password is stored instead. For the password 123456, the database stores its MD5 digest, E10ADC3949BA59ABBE56E057F20F883E. When the user logs in, MD5 is applied to the password entered by the user and the result is compared with the value in the database to determine whether the user's identity is legitimate. A leak of the database then no longer directly exposes the plaintext password.
The same applies to the value of a public external primary key field. If the external primary key field is an identity card number or a mobile phone number, disclosing the user's number directly on the access node 40 would create a privacy problem, so the access node 40 publishes the hash value of the identity card number or mobile phone number instead. Hash values can still be used to intersect data. For example, the loan data published by bank A records the mobile phone number of a borrower, and the deposit statistics published by bank B record the mobile phone number of the same resident; assume the number is 18863638282. Bank A publishes SHA256(18863638282) = 57A4AC1BBC03679EF2EEB5DA678095746FFC6a055DFB25C4538BCABEEC988E9F, and bank B likewise calculates SHA256 of the mobile phone number 18863638282 and obtains the same result. Comparing the two values associates the two data rows. If loan institution C is assessing credit for the same resident, whose mobile phone number is also 18863638282, it can query the resident's business data at other banks using either the identity card number or the mobile phone number. Loan institution C extracts the hash value of the mobile phone number 18863638282 and, among the mobile phone number hash values published by the access node 40, finds two records with the same hash value, namely the loan data of bank A and the deposit data of bank B.
Although the original mobile phone number cannot be computed directly from the hash value, a risk remains: the original identity card number or mobile phone number can be recovered by an exhaustive (brute-force) attack. When the mobile phone number hash value 57A4AC1BBC03679EF2EEB5DA678095746FFC6a055DFB25C4538BCABEEC988E9F is known, an attacker can enumerate all mobile phone numbers in the number segments sold by the telecom carriers and compare the SHA256 hash value of each. Within a limited time the attacker will reach a mobile phone number whose hash value is exactly 57A4AC1BBC03679EF2EEB5DA678095746FFC6a055DFB25C4538BCABEEC988E9F, and the original mobile phone number is obtained. For an identity card number, the first 6 digits are a region code and the middle 8 digits are a date of birth, both of which take a limited set of values, so the plaintext identity card number can also be obtained by exhaustive search. Salting is therefore needed to further improve data security. In cryptography, salting refers to inserting a specific character string at an arbitrary fixed position in a password so that the hash of the result no longer matches the hash of the original password. Suppose the agreed salt for mobile phone numbers adds the characters PHE at the start and the characters HUD at the end. The salted mobile phone number 18863638282 is then PHE18863638282HUD, and the hash value of PHE18863638282HUD is BDA2773420943B5589CC8C5A406E97A921D140753C431958C8D80B96E59506C1. Because the salt is random and its length and form are not fixed, an exhaustive search would take an unacceptably long time, which effectively protects the user's private data. Meanwhile, the data source side can still conveniently find the intersection of data by extracting hash values with the same salt.
To avoid leaking the salt, the access node 40 should expose an API that returns the salted hash value of the submitted data. That is, submitting the mobile phone number 18863638282 to the API returns the salted hash value BDA2773420943B5589CC8C5A406E97A921D140753C431958C8D80B96E59506C1, and this hash value is used to find the intersection. The data source side thus does not know the salt, and the salt is not revealed. Publishing the salted hash values of the external primary key fields allows a data call requester to find the intersection of data conveniently, so that data can be integrated and its value fully used without leaking the data.
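A minimal sketch of salted hashing and of intersecting records by salted hash, using Python's standard `hashlib`. The prefix PHE and suffix HUD are the example salt from the text; the function names, the published dictionaries, and the line numbers 101, 202, and 203 are hypothetical. The sketch does not print or assert any particular digest.

```python
import hashlib

# Example salt from the text; in practice it stays inside the access node.
PREFIX, SUFFIX = "PHE", "HUD"


def salted_hash(value, prefix, suffix):
    """SHA-256 of the value with the salt added at the start and the end."""
    salted = prefix + value + suffix
    return hashlib.sha256(salted.encode("utf-8")).hexdigest().upper()


def api_salted_hash(value):
    """Stand-in for the access node API: callers never see the salt itself."""
    return salted_hash(value, PREFIX, SUFFIX)


# Published external primary keys: salted hash -> line number (hypothetical rows).
bank_a_published = {api_salted_hash("18863638282"): 101}
bank_b_published = {
    api_salted_hash("18863638282"): 202,
    api_salted_hash("18800006666"): 203,
}

# A requester hashes the phone number through the API and intersects the records.
key = api_salted_hash("18863638282")
matches = (bank_a_published.get(key), bank_b_published.get(key))
unsalted = hashlib.sha256("18863638282".encode("utf-8")).hexdigest().upper()
```

Because the same salt is applied everywhere, equal plaintext values still produce equal published keys, while an attacker who enumerates phone numbers without the salt obtains hashes that match nothing.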
Compared with the first embodiment, this embodiment uses an improved collaborative computing method to reduce the number of secure multiparty computations that must be established when executing the data processing model 50, which greatly improves the execution efficiency of the data processing model 50.
Example three:
Bank A, bank B, and supermarket C each establish a synchronization node 10 and a reservation node 20. Bank A and bank B synchronize the deposit data of their depositors to their synchronization nodes 10 at a preset period, and supermarket C synchronizes the consumption records of its membership cards to its synchronization node 10.
Assume that the deposit data includes the depositor name, depositor identity card number, depositor mobile phone number, balance, and depositor education level, and that the membership card consumption data of the supermarket includes the member name, member mobile phone number, member grade, discount grade, and credit line. The supermarket's membership card is a real-name card for high-quality consumers; it offers a larger discount and a certain credit line, which lets the consumer pay after consuming. The supermarket therefore reviews membership card applications strictly.
The depositor name/member name, depositor identity card number, and depositor mobile phone number/member mobile phone number are used as external primary key fields, and bank A, bank B, and supermarket C each fill in the values of the corresponding fields. For example, resident Wang Wu has opened deposit accounts at both bank A and bank B. His identity card number is 1234567890123, his mobile phone number is 18800006666, and his education level is bachelor's degree, for which the substitute number is 3. The deposit balances of resident Wang Wu at bank A and bank B are 10 and 6 respectively (in units of ten thousand).
Bank A makes 3 copies. The name Wang Wu, the identity card number 1234567890123, and the mobile phone number 18800006666 are each submitted to the access node 40 through the API, and the salted hash values of the three fields are obtained and stored in the corresponding fields of all copies. Bank A divides the balance of depositor Wang Wu into addends, 10 = 2 + 3 + 5, and the education level into addends, 3 = -9 + 5 + 7. Bank B divides the balance of depositor Wang Wu into addends, 6 = 1 + 2 + 3, and the education level into addends, 3 = -6 + 4 + 5. The first cooperative node 30 stores a balance of 2 and an education level of -9 in the copy for bank A, and a balance of 1 and an education level of -6 in the copy for bank B. The second cooperative node 30 stores a balance of 3 and an education level of 5 in the copy for bank A, and a balance of 2 and an education level of 4 in the copy for bank B. The reservation node 20 of bank A stores a balance of 5 and an education level of 7, and the reservation node 20 of bank B stores a balance of 3 and an education level of 5.
The membership card issuing rule of supermarket C uses the deposit balance as the judgment condition: the higher the education level, the lower the required deposit balance. According to the comparison table published by the access node 40, supermarket C designs the data processing model 50 for membership card applications: R = v > 50 - 10 × education level, where v is the resident's total deposit across all banks. Assuming Wang Wu has deposit accounts at bank A and bank B, the data processing model 50 becomes: R = v1 + v2 > 50 - 10 × education level / 2, where v1 and v2 are the resident's deposit balances at bank A and bank B respectively, and the education level term is the sum of the education level fields of the two data rows; because the education level is thus counted twice, it is divided by 2. The right side of the equation is a comparison expression whose result is true or false, and the left side is a Boolean variable. If the deposits of resident Wang Wu were spread over 3 banks, the education level would be counted 3 times and would be divided by 3 in the data processing model. R is the output of the data processing model: R being true indicates that the resident qualifies for the membership card, and R being false indicates that the resident does not qualify. Further, supermarket C rewrites the data processing model as a weighted sum, r = v1 + v2 + 10 × education level / 2, obtains the result r, and judges for itself whether r is greater than 50. The formula r = v1 + v2 + 10 × education level / 2 is the final data processing model submitted to the access node. The formula that the reservation nodes 20 and cooperative nodes 30 of bank A and bank B need to calculate is therefore v1 + v2 + 10 × education level / 2, which is a weighted sum formula. It needs to call the balance field and the education level field in the data of bank A and bank B.
The access node allocates field numbers to the columns, and the data processing model uses the field numbers to have the reservation nodes and cooperative nodes read the fields of the data rows with the specified line numbers and place them at the corresponding positions in the formula. For example, the balance field of the data row of bank A is read and placed at the position of v1 in the formula.
When resident Wang Wu first applies for a discount membership card at supermarket C, supermarket C obtains his name and mobile phone number, submits the mobile phone number to the application program interface (API) of the access node 40, and obtains the salted hash value. It compares the salted hash value with the external primary key fields published by bank A and bank B, finds the line numbers 11 of Wang Wu's deposit data at bank A and bank B, and submits the two line numbers 11 and the data processing model 50 to the access node 40. The access node 40 sends the line numbers 11 and the data processing model 50 to the reservation nodes 20 of bank A and bank B and designates the reservation node 20 of bank A to feed back the result.
After receiving the line numbers 11 and the data processing model 50, the reservation nodes 20 of bank A and bank B each send them to the cooperative nodes 30. Since each cooperative node 30 stores copies of the data rows of both bank A and bank B, the first cooperative node 30 directly reads the values of the corresponding fields of the two data rows and substitutes them into r = v1 + v2 + 10 × education level / 2 of the data processing model 50, obtaining r1 = 2 + 1 + 10 × (-9 - 6) / 2 = -72. Similarly, the second cooperative node 30 obtains r2 = 3 + 2 + 10 × (5 + 4) / 2 = 50. The reservation node 20 of bank A calculates rA = 5 + 0 + 10 × (7 + 0) / 2 = 40, and the reservation node 20 of bank B calculates rB = 0 + 3 + 10 × (0 + 5) / 2 = 28. All the cooperative nodes 30 and the reservation node 20 of bank B send their results to the reservation node 20 of bank A. Alternatively, all the cooperative nodes 30 and the reservation nodes 20 send their results to the access node 40. That is, the access node 40 may itself act as the summing node, or it may designate a reservation node 20 as the summing node. The access node 40 may also designate one of the cooperative nodes 30 as the summing node.
In this embodiment, the reservation node 20 of bank A calculates the final output r = r1 + r2 + rA + rB = -72 + 50 + 40 + 28 = 46. The access node 40 sends the value 46 to supermarket C as the execution result of the data processing model 50, and transfers the corresponding number of tokens, which supermarket C deposited in advance in a virtual account at the access node 40, to bank A and bank B respectively. Supermarket C determines that 46 does not reach the preset threshold of 50, so the membership card application of resident Wang Wu is rejected. After a period of time, the deposit balance of depositor Wang Wu at bank A changes from 10 to 15 (in units of ten thousand), that is, the change amount is +5, and a record of the change amount +5 for the corresponding data row is created in the synchronization table 12 established by the synchronization node 10. When the reservation node 20 of bank A periodically queries the synchronization table 12, it synchronizes the change amount +5 into the reserved copy, so that the balance and education level stored by the reservation node 20 of bank A become 10 and 7. Resident Wang Wu then applies again for the discount membership card of supermarket C. Supermarket C initiates a data call request again in the same way; after the same calculation, the data processing model 50 yields the value 51, the access node 40 feeds the value 51 back to supermarket C, and the corresponding tokens are transferred to the virtual accounts of bank A and bank B respectively. Supermarket C determines that resident Wang Wu now qualifies for the discount membership card, accepts the application, and carries out the related business.
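The arithmetic of this embodiment can be replayed directly from the addends given above. In the sketch below each node's holding is written as (balance at bank A, education level at bank A, balance at bank B, education level at bank B), with zeros where a reservation node holds nothing for the other bank; the function and variable names are hypothetical.

```python
def partial_r(bal_a, edu_a, bal_b, edu_b):
    """r = v1 + v2 + 10 * education level / 2, evaluated on one node's addends."""
    return bal_a + bal_b + 10 * (edu_a + edu_b) / 2


holdings = {
    "cooperative_1": [2, -9, 1, -6],
    "cooperative_2": [3, 5, 2, 4],
    "reserved_bank_a": [5, 7, 0, 0],
    "reserved_bank_b": [0, 0, 3, 5],
}


def total_r():
    """Summing node: add the partial results of all nodes."""
    return sum(partial_r(*h) for h in holdings.values())


partials_first = {name: partial_r(*h) for name, h in holdings.items()}
r_first = total_r()

# Balance at bank A rises from 10 to 15: +5 is synchronized to bank A's reserved copy only.
holdings["reserved_bank_a"][0] += 5
r_second = total_r()

THRESHOLD = 50
decisions = (r_first > THRESHOLD, r_second > THRESHOLD)
```

The first request yields 46 and is refused; after the +5 change reaches the reserved copy of bank A, the second request yields 51 and is accepted, with the cooperative nodes' copies untouched.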
In this embodiment, Supermarket C can draw on the accurate deposit data of resident Wang Wu held by the banks, compute the index it needs, and carry out accurate risk management and control. Moreover, Supermarket C never comes into contact with any of the resident's private data during the calculation, and under this scheme the banks do not leak any private data while that data is being used. Safe sharing of hot data is thereby achieved, which effectively promotes the discovery and use of the value of data.
The above embodiment is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the technical scope of the claims.
Claims (10)
1. A blockchain-based hot data sharing platform, comprising:
a synchronization node, a reserved node, a plurality of cooperative nodes and an access node, wherein the synchronization node and the reserved node are deployed on the data source side; the synchronization node synchronously obtains hot data from the data source's system and assigns a row number to each data row of the hot data; the synchronization node makes a plurality of copies of each data row, each copy being associated with the row number; the true value of each numeric field of the data row is split into a plurality of addends, which are distributed among the plurality of copies; the reserved node retains one copy as the reserved copy, and the remaining copies are distributed to the plurality of cooperative nodes for storage; the true value of each non-numeric field is stored in the reserved copy, and the remaining copies store obfuscated values;
the synchronization node periodically synchronizes the hot data, calculates the change amount of each numeric field and synchronizes the change amount to the corresponding reserved copy, and writes the new value of any non-numeric field directly to the reserved copy;
a data demander submits a data calling request to the access node, the data calling request comprising a row number and a data processing model; the access node has a virtual account on the blockchain, and the data demander transfers a corresponding quantity of tokens to the virtual account of the access node; the access node sends the data calling request to the reserved node of the corresponding data source; the reserved node and the plurality of cooperative nodes establish a secure multi-party computation, obtain the result of the data processing model and send the result to the access node; and the access node sends the result to the data demander and transfers the corresponding quantity of tokens to the virtual account of the data source.
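The copy-making step of claim 1 can be illustrated with a minimal Python sketch. The function names, the bounded random range and the use of random hex strings as obfuscated values are assumptions made for illustration; the claim does not specify how addends or obfuscated values are generated, and a production scheme would normally draw addends uniformly from a finite ring rather than from a small bounded range.

```python
import secrets

# Illustrative sketch only; names and the randomisation scheme are assumptions.

def split_into_addends(true_value, copies, bound=1000):
    """Split an integer into `copies` addends whose sum equals true_value."""
    addends = [secrets.randbelow(2 * bound + 1) - bound for _ in range(copies - 1)]
    addends.append(true_value - sum(addends))
    return addends

def make_copies(row, numeric_fields, copies):
    """Return `copies` copies of a data row; index 0 is the reserved copy."""
    result = [dict() for _ in range(copies)]
    for field, value in row.items():
        if field in numeric_fields:
            # Numeric field: every copy receives one addend.
            for copy, addend in zip(result, split_into_addends(value, copies)):
                copy[field] = addend
        else:
            # Non-numeric field: true value only in the reserved copy.
            result[0][field] = value
            for copy in result[1:]:
                copy[field] = secrets.token_hex(4)  # obfuscated placeholder
    return result

row_copies = make_copies({"name": "Wang Wu", "balance": 10}, {"balance"}, copies=3)
reserved_copy, cooperative_copies = row_copies[0], row_copies[1:]
```

Summing the `balance` entries of all three copies recovers the true value 10, while no single cooperative copy reveals it.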
2. The platform of claim 1,
the synchronization node obtains a source number from the access node; the synchronization node sends the column structure of the data row to the access node and obtains a class number and field numbers; the synchronization node assigns a unique serial number to the data row, and the source number and the unique serial number together form the row number of the data row; the access node publishes the data column structure and the row number; each cooperative node allocates a storage area for each column structure and allocates a plurality of storage blocks within the storage area, the size of each storage block matching the maximum space occupied by a copy; the cooperative node stores each copy, in association with its row number, in a storage block; and when the storage blocks in a storage area are full, a new storage area and new storage blocks are allocated for the column structure of the copy.
3. The platform of claim 2,
each cooperative node sets a longitudinal displacement sequence for each storage area, the longitudinal displacement sequence recording a longitudinal offset for each column; the storage blocks in the storage area are numbered sequentially; when a copy is stored, the longitudinal offset corresponding to each column of the copy is obtained and the column is stored in the storage block at that longitudinal offset; when a copy is restored, the storage block corresponding to each column is located by searching downward according to the longitudinal displacement sequence, the value of each column of the copy is obtained, and the restoration of the copy is completed.
4. The blockchain-based hot data sharing platform according to any one of claims 1 to 3,
the synchronization node establishes a synchronization table in which change amounts are stored in association with row numbers; the reserved node periodically checks the synchronization table, synchronizes the change amounts stored in the synchronization table to the reserved copies corresponding to the row numbers, and deletes the change amounts from the synchronization table after synchronization;
the data calling request further comprises a synchronization check flag bit; if the synchronization check flag bit is 1, the reserved node first checks the synchronization table and, if a change amount exists, synchronizes it to the reserved copy, and then proceeds to establish the secure multi-party computation and obtain the result of the data processing model.
5. The blockchain-based hot data sharing platform according to any one of claims 1 to 3,
the synchronization node periodically synchronizes the hot data, calculates the difference between the substitute numbers corresponding to a non-numeric field before and after the change, and synchronizes the difference to the corresponding reserved copy.
6. The blockchain-based hot data sharing platform of claim 5,
the access node publishes a comparison table; the data processing model is a set of calculation formulas; and before the reserved node establishes the secure multi-party computation, the following steps are performed on the data processing model:
extracting the weighted-sum formulas in the data processing model;
sending each weighted-sum formula to the plurality of cooperative nodes, each cooperative node retrieving the corresponding copy, substituting the addends stored in the copy into the weighted-sum formula to obtain an intermediate sum, and sending the intermediate sum to the reserved node;
the reserved node substituting the addends stored in the reserved copy into the weighted-sum formula to obtain a reserved sum, and adding the reserved sum to all intermediate sums sent by the cooperative nodes to obtain the result of substituting the true values into the weighted-sum formula; and
substituting the result of the weighted-sum formula into the data processing model, and establishing secure multi-party computation to solve the formulas that are not weighted sums.
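The weighted-sum steps of claim 6 can be sketched in Python as below. The weights, field names and addend values are hypothetical; the sketch only illustrates that intermediate sums computed on addends add up to the weighted sum of the true values.

```python
# Illustrative sketch only; weights, field names and values are assumptions.

def weighted_sum(weights, addends):
    """Substitute one copy's addends into the weighted-sum formula."""
    return sum(w * addends[field] for field, w in weights.items())

weights = {"balance": 1.0, "education": 5.0}   # hypothetical weighted-sum formula

cooperative_copies = [
    {"balance": 2, "education": -9},
    {"balance": 3, "education": 5},
]
reserved_copy = {"balance": 5, "education": 7}

# Each cooperative node computes an intermediate sum on its own copy.
intermediate_sums = [weighted_sum(weights, copy) for copy in cooperative_copies]

# The reserved node computes the reserved sum and adds all intermediate sums.
reserved_sum = weighted_sum(weights, reserved_copy)
linear_result = reserved_sum + sum(intermediate_sums)

# Same value as applying the formula to the true values (10 and 3).
assert linear_result == weighted_sum(weights, {"balance": 10, "education": 3})
```

If a formula carried a constant term, only one party should add it; otherwise it would be counted once per copy.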
7. The blockchain-based hot data sharing platform of claim 5,
the synchronization node calculates the 2nd through Nth powers of the true value of each numeric field, and creates new 2nd-power through Nth-power columns in the copy for each numeric field;
each of the 1st through Nth powers of the true value of the numeric field is split into a plurality of addends, which are distributed to the reserved copy and the remaining copies for storage;
before the reserved node establishes the secure multi-party computation, the following steps are performed on the data processing model:
extracting each unary formula that takes a true value as input and admits a Taylor expansion, and expanding it into its Taylor expansion, which is a weighted sum of the 1st through Nth powers of the true value;
sending the weighted-sum formula to the plurality of cooperative nodes, each cooperative node retrieving the corresponding copy, substituting the 1st-power through Nth-power addends stored in the copy into the weighted-sum formula to obtain an intermediate sum, and sending the intermediate sum to the reserved node;
the reserved node substituting the 1st-power through Nth-power addends stored in the reserved copy into the weighted-sum formula to obtain a reserved sum, and adding the reserved sum to all intermediate sums sent by the cooperative nodes to obtain an approximate result of substituting the true value into the unary formula; and
substituting the approximate result of the unary formula into the data processing model, and establishing secure multi-party computation to solve the formulas that are not weighted sums.
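The Taylor-expansion steps of claim 7 can be sketched in Python for one concrete case. The choice of exp(x) as the unary formula, its Maclaurin coefficients 1/k!, N = 6, the true value 0.5 and the fixed masking values are all assumptions made for illustration; the sketch also assumes that the constant term of the expansion is added once, by the reserved node.

```python
import math

# Illustrative sketch only; the formula, N, the true value and masks are assumptions.

N = 6
x = 0.5  # true value, split into addends by the synchronization node
coeffs = [1 / math.factorial(k) for k in range(N + 1)]  # Maclaurin series of exp

def split(value):
    """Split a value into three addends (fixed masks here; random in practice)."""
    masks = [1.25, -0.75]
    return masks + [value - sum(masks)]

# power_shares[k - 1][j] is node j's addend of x**k, for k = 1..N.
power_shares = [split(x ** k) for k in range(1, N + 1)]

def node_sum(j):
    """Weighted sum of node j's 1st-power to Nth-power addends."""
    return sum(coeffs[k] * power_shares[k - 1][j] for k in range(1, N + 1))

intermediate_sums = [node_sum(j) for j in (1, 2)]   # cooperative nodes
reserved_sum = coeffs[0] + node_sum(0)              # reserved node adds the constant once
approx = reserved_sum + sum(intermediate_sums)

print(approx, math.exp(x))
```

The result is only as accurate as the truncated series, which is why a separate error check on the approximate result is useful.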
8. The blockchain-based hot data sharing platform of claim 7,
the reserved node judges the error of the approximate result obtained for the unary formula through the Taylor expansion; if the error exceeds a preset percentage, the approximate result is discarded and secure multi-party computation is re-established to solve the unary formula; if the error does not exceed the preset percentage, the approximate result is retained;
the reserved node performs the following steps to judge the error:
the reserved node back-calculates an approximate true value from the approximate result and the unary formula; and
the approximate true value is multiplied by a coefficient k, where k = 1 + δ and δ is the preset percentage; the approximate true value corrected by the coefficient k is substituted into the unary formula to obtain a corrected approximate result; and if the absolute value of the difference between the corrected approximate result and the approximate result exceeds the preset percentage, the error is judged to exceed the preset percentage.
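One possible reading of the error judgment in claim 8 is sketched below in Python. It assumes that "exceeds the preset percentage" means the difference exceeds δ times the magnitude of the approximate result, and that the unary formula has a known inverse; both points, and all names, are assumptions of this sketch rather than statements of the claim.

```python
import math

# Illustrative sketch only; the relative-difference reading is an assumption.

def error_exceeds(approx_result, f, f_inverse, delta):
    """Return True if the error is judged to exceed the preset percentage delta."""
    approx_true = f_inverse(approx_result)       # back-calculate approximate true value
    corrected = f(approx_true * (1 + delta))     # apply k = 1 + delta, re-evaluate formula
    return abs(corrected - approx_result) > delta * abs(approx_result)

# Hypothetical unary formula exp(x) with inverse log(y), delta = 5%.
keep = not error_exceeds(math.exp(0.5), math.exp, math.log, 0.05)
discard = error_exceeds(math.exp(3.0), math.exp, math.log, 0.05)
print(keep, discard)
```

Under this reading the test rejects results in regions where the formula amplifies a δ-sized change of its input into a larger relative change of its output.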
9. The blockchain-based hot data sharing platform of claim 7,
a privacy security check module runs on the reserved node; the privacy security check module enumerates the outputs of the data processing model, enumerates the input columns involved in each output, and treats each output together with its involved input columns as a sub-model; if any sub-model comprises only one input column, the privacy security check fails, and the reserved node rejects the data calling request and notifies the access node to cancel the data calling request.
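The check in claim 9 can be sketched in Python as follows. The sketch assumes the dependency of each output on input columns has already been extracted into a mapping; how the module derives that mapping from the model is not specified in the claim, and the names used are hypothetical.

```python
# Illustrative sketch only; the mapping format and names are assumptions.

def privacy_check(output_inputs):
    """Pass only if no output (sub-model) depends on exactly one input column.

    output_inputs maps each model output to the set of input columns it involves.
    """
    return all(len(columns) != 1 for columns in output_inputs.values())

safe_model = {"r": {"balance_A", "balance_B", "education_A", "education_B"}}
leaky_model = {"r": {"balance_A", "balance_B"}, "s": {"balance_A"}}

print(privacy_check(safe_model), privacy_check(leaky_model))
```

An output that depends on a single column could presumably be inverted by the data demander to recover that column, which is the situation the check is meant to rule out.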
10. The blockchain-based hot data sharing platform according to any one of claims 1 to 3,
the access node designates a plurality of columns as external primary keys; the access node and the synchronization node agree on a salt; the synchronization node salts the values of the external primary keys of each data row and computes a hash value; and the access node publishes the hash value in association with the row number.
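The salted hashing of claim 10 can be sketched with Python's standard `hashlib`. The claim names neither a hash function nor a way of combining salt and key values, so SHA-256, the salt-prefix layout, the separator byte, the key values and the row number string below are all assumptions.

```python
import hashlib

# Illustrative sketch only; hash function, encoding and values are assumptions.

def salted_key_hash(key_values, salt):
    """Hash the external-primary-key values of a data row with the agreed salt."""
    material = salt + b"\x1f" + "\x1f".join(key_values).encode("utf-8")
    return hashlib.sha256(material).hexdigest()

salt = b"agreed-salt"                    # agreed between access and synchronization node
key_values = ["Wang Wu", "ID-0001"]      # hypothetical external primary key columns

# The access node can publish the hash alongside the row number.
published_index = {salted_key_hash(key_values, salt): "A-0001"}
print(published_index)
```

Publishing only the salted hash lets a row be located by its key values without exposing the key values themselves to parties who do not hold the salt.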
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110918986.0A CN113377878A (en) | 2021-08-11 | 2021-08-11 | Block chain-based hot data sharing platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113377878A (en) | 2021-09-10 |
Family
ID=77576843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110918986.0A (Pending), CN113377878A (en) | Block chain-based hot data sharing platform | 2021-08-11 | 2021-08-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113377878A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210910 |