CN114386072B - Data sharing method, device and system - Google Patents

Info

Publication number
CN114386072B
Authority
CN
China
Prior art keywords
data
attribute
target
holder
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210036998.5A
Other languages
Chinese (zh)
Other versions
CN114386072A (en
Inventor
张信明
吴青龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202210036998.5A priority Critical patent/CN114386072B/en
Publication of CN114386072A publication Critical patent/CN114386072A/en
Application granted granted Critical
Publication of CN114386072B publication Critical patent/CN114386072B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data sharing method, device and system, wherein the method comprises the following steps: the cloud server obtains data requests sent by data requesters in the plurality of data holders, wherein the data requests indicate at least one target attribute; for each data holder, re-encrypting the data request by using a corresponding first re-encryption key, and sending the re-encrypted data request to the data holder; determining each candidate data holder for holding the data under the target attribute based on the holding condition table fed back by each data holder; for each target attribute, determining a data provider from candidate data holders of the target attribute, and requesting target data corresponding to the target attribute from the data provider; and after re-encrypting the target data returned by the data provider, returning the re-encrypted target data to the data demand party. According to the scheme, the privacy and the safety of the data can be improved in the data sharing process.

Description

Data sharing method, device and system
Technical Field
The present disclosure relates to the field of data integration technologies, and in particular, to a data sharing method, device, and system.
Background
Data integration is an important link in the data fusion process: specific multi-source data must be integrated to support target mining, analysis and prediction tasks, so it is widely applied in smart-city fields such as traffic, medical treatment, business and social activities. As a typical data integration scenario, data sharing requires multiple parties to act as data providers and a data demander, respectively, so that the demander can obtain enough public data to mine useful information from big data.
In the task of data sharing, it is common practice to introduce a cloud service party, through which the data interaction process between the demand party and the plurality of data providers is coordinated. However, cloud service parties are not necessarily honest, and data privacy and security issues remain.
Disclosure of Invention
In view of this, the present application provides a method, an apparatus and a system for sharing data, so as to improve the privacy and security of data in the data sharing process.
In order to achieve the above object, the present application provides a data sharing method, which is applied to a cloud server in a data sharing system, where the data sharing system includes the cloud server and a plurality of data holders, and the method includes:
obtaining a data request sent by a data demander, wherein the data request indicates at least one target attribute to which the data requested by the data demander belongs, the data request is encrypted with a private key of the data demander, and the data demander is one of the plurality of data holders;
for each data holder, determining a first re-encryption key set by the data demander for that data holder, re-encrypting the data request with the first re-encryption key, and sending the re-encrypted data request to the data holder, so that the data holder decrypts the re-encrypted data request with its own private key;
obtaining a holding condition table fed back by each data holder, wherein the holding condition table fed back by a data holder comprises the data holding condition of that data holder for each target attribute;
determining a holder list of each target attribute based on the holding condition table fed back by each data holder, wherein the holder list of each target attribute comprises each candidate data holder, among the plurality of data holders, that holds data under the target attribute;
for each target attribute, determining, from the candidate data holders contained in the holder list of the target attribute, a data provider for providing data under the target attribute to the data demander;
for each target attribute, requesting target data corresponding to the target attribute from the data provider corresponding to the target attribute;
and after obtaining the target data corresponding to the target attribute returned by the data provider, re-encrypting the target data with a second re-encryption key set by the data provider for the data demander, and returning the re-encrypted target data to the data demander.
In one possible implementation, each data holder stores a data table that includes a public classification attribute and a plurality of private attributes; the data table comprises a plurality of records, each record corresponds to a record identifier, and the target attribute is one of the plurality of private attributes;
the cloud server stores the plurality of record identifiers corresponding to the plurality of records in the data table;
the determining, from the candidate data holders contained in the holder list of the target attribute, a data provider for providing data under the target attribute to the data demander includes:
extracting at least one target record identifier from the plurality of record identifiers;
generating a sequence of pseudo attribute value pairs for the target attribute and the public classification attribute, the sequence comprising at least one pseudo attribute value pair, each pseudo attribute value pair comprising a pseudo attribute value generated for the target attribute and a pseudo attribute value generated for the public classification attribute;
transmitting the at least one target record identifier and the sequence of pseudo attribute value pairs to each candidate data holder in the holder list of the target attribute;
obtaining the information gain corresponding to the target attribute returned by each candidate data holder, wherein the information gain is determined by the candidate data holder based on at least one real attribute value pair corresponding to the at least one target record identifier and on the pseudo attribute value pairs obtained by converting the at least one pseudo attribute value pair, and each real attribute value pair comprises the attribute value of the target attribute and the attribute value of the public classification attribute in the record, held by the candidate data holder, that corresponds to a target record identifier;
determining at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by the candidate data holders;
determining, from the at least one candidate data provider, a data provider for providing data under the target attribute to the data demander.
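The information gain that a candidate data holder returns is not spelled out at this point in the text. As a minimal sketch, assuming the standard decision-tree definition (entropy of the class minus the conditional entropy given the attribute), the calculation over a list of (target attribute value, public classification attribute value) pairs could look like the following. The function names and the sample pairs are illustrative assumptions, and the sketch does not model how pseudo pairs are converted or mixed in.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(pairs):
    """Information gain of an attribute with respect to the class.

    `pairs` is a list of (attribute_value, class_value) tuples.
    This is the textbook definition, assumed here for illustration.
    """
    classes = [cls for _, cls in pairs]
    groups = {}
    for value, cls in pairs:
        groups.setdefault(value, []).append(cls)
    conditional = sum(len(g) / len(pairs) * entropy(g) for g in groups.values())
    return entropy(classes) - conditional


# Made-up pairs: the attribute fully determines the class in the first list
# and carries no information about it in the second.
informative = [("low", "no"), ("low", "no"), ("high", "yes"), ("high", "yes")]
uninformative = [("low", "no"), ("low", "yes"), ("high", "no"), ("high", "yes")]
```

With these inputs the first list yields a gain of 1 bit and the second a gain of 0, which is the kind of signal the cloud server could use to rank candidate data holders.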
In yet another aspect, the present application further provides a data sharing system, including:
a cloud server and a plurality of data holders;
the data demander among the plurality of data holders is configured to send a data request to the cloud server, wherein the data request indicates at least one target attribute to which the data requested by the data demander belongs, and the data request is encrypted with the private key of the data demander;
the cloud server is configured to execute the data sharing method described in any one embodiment of the present application.
In still another aspect, the present application further provides a data sharing device, which is applied to a cloud server in a data sharing system, where the data sharing system includes the cloud server and a plurality of data holders, and the device includes:
a request obtaining unit, configured to obtain a data request sent by a data demander, where the data request indicates at least one target attribute to which data requested by the data demander belongs, and the data request is a request encrypted by using a private key of the data demander, where the data demander belongs to the plurality of data holders;
a request re-encryption unit, configured to determine, for each data holder, a first re-encryption key set by the data demander for the data holder, re-encrypt the data request with the first re-encryption key, and send the re-encrypted data request to the data holder, so that the data holder decrypts the re-encrypted data request with its own private key;
a table obtaining unit, configured to obtain a holding condition table fed back by each data holder, where the holding condition table fed back by the data holder includes: the data holding condition of the data holding party for each target attribute;
a list determining unit, configured to determine, based on a holding condition table fed back by each data holder, a holder list of each target attribute, where the holder list of each target attribute includes: each candidate data holder holding data under the target attribute among the plurality of data holders;
a provider determination unit configured to determine, for each target attribute, a data provider for providing the data under the target attribute to the data demander from among candidate data holders included in a holder list of the target attribute;
a data request unit, configured to request, for each target attribute, target data corresponding to the target attribute from the data provider corresponding to the target attribute;
and a data re-encryption unit, configured to, after obtaining the target data corresponding to the target attribute returned by the data provider, re-encrypt the target data with a second re-encryption key set by the data provider for the data demander, and return the re-encrypted target data to the data demander.
As can be seen from the above, in the embodiments of the present application the data request that the data demander sends to the cloud server is encrypted with the data demander's private key, and the cloud server re-encrypts the data request using proxy re-encryption before forwarding it to each data holder, so the cloud server cannot learn the target attributes to which the requested data belongs. In addition, the holding condition table that each data holder returns to the cloud server contains only the data holder's holding condition for each target attribute, so the cloud server cannot obtain information about the target attributes. Together these improve the privacy and security of the requested data during data sharing.
In addition, after the cloud server requests the data under a target attribute from the determined data provider, the data provider also encrypts that data with its own private key, and the cloud server only needs to re-encrypt the data and forward it to the data demander without decrypting the target data. The cloud server therefore cannot learn the content of the data being shared, which improves the privacy and security of the shared data.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a data sharing method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of constructing a data sharing system according to an embodiment of the present application;
Fig. 3 is a schematic flow interaction diagram of a data sharing method according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of determining candidate data providers in the data sharing method according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of determining at least one pending data holder for a target attribute according to an embodiment of the present application;
Fig. 6 is a schematic flow chart of a data provider generalizing data under a target attribute in an embodiment of the present application;
Fig. 7 is a schematic diagram of the composition structure of a data sharing device according to an embodiment of the present application.
Detailed Description
The scheme of the embodiments of the present application is suited to data sharing among a plurality of data parties. It uses the cloud server as a proxy and implements data sharing with proxy re-encryption, so as to preserve privacy and improve data security during data sharing.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without undue burden, are within the scope of the present application.
The data sharing system comprises the cloud server and a plurality of data holders. Any data holder can act as a data demander and request data from the other data holders through the cloud server, so that it obtains data it does not itself hold and data sharing is achieved. The cloud server determines, from among the plurality of data holders, a data provider for providing data to the data demander; a data holder selected in this way supplies the corresponding data to the data demander as the data provider.
The data sharing method of the present application is described below with reference to flowcharts.
Fig. 1 shows a flow chart of a data sharing method provided in an embodiment of the present application, where the method of the embodiment may be applied to a cloud server in a data sharing system. The method of the present embodiment may include the steps of:
S101, obtaining a data request sent by a data demander.
The data demander is any one of the plurality of data holders.
The data request indicates at least one target attribute to which the data requested by the data demander belongs. If the data demander does not hold data under attribute A and attribute B but wishes to obtain that data from other data holders, the data request it sends may indicate attribute A and attribute B, thereby expressing the need for data under those two attributes.
Because data sharing is implemented with proxy re-encryption, the data request is encrypted with the private key of the data demander; that is, the data demander encrypts the data request using what is commonly called two-level encryption.
S102, for each data holder, determining a first re-encryption key set by the data demander for that data holder, re-encrypting the data request with the first re-encryption key, and sending the re-encrypted data request to the data holder.
In this application, to hide the identity of the data demander, the cloud server treats all data holders as recipients of the data request.
For each data holder, the cloud server stores the re-encryption key that the data demander set for that data holder; for ease of distinction, this key is referred to as the first re-encryption key. The cloud server can therefore re-encrypt the data request with the first re-encryption key corresponding to the data holder, and the resulting re-encrypted data request can be decrypted with the private key of that data holder.
Accordingly, after obtaining the re-encrypted data request, the data holder can decrypt it with its own private key.
The cloud server therefore never needs to decrypt the data request and cannot learn which attributes the data demander is asking for, which improves data security during sharing and preserves data privacy.
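The text does not name a concrete proxy re-encryption scheme. As a toy sketch only, the following uses an ElGamal-style construction in the spirit of the classic Blaze-Bleumer-Strauss scheme to show how a proxy can transform a ciphertext without ever seeing the plaintext. The group parameters, function names and message value are assumptions; the sketch encrypts under the sender's public key (the usual convention, which may differ from the wording above), derives the re-encryption key from both secret keys, and is far too small to be secure.

```python
import secrets

# Toy parameters (assumption): P = 2*Q + 1 is a safe prime and G generates
# the subgroup of prime order Q. Real deployments need much larger groups.
P, Q, G = 2039, 1019, 4


def keygen():
    """Return (secret_key, public_key) with the secret key in [1, Q-1]."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def encrypt(pk, message):
    """Encrypt an integer 0 < message < P under public key pk."""
    k = secrets.randbelow(Q - 1) + 1
    return (message * pow(G, k, P)) % P, pow(pk, k, P)


def rekey(sk_from, sk_to):
    """Re-encryption key sk_to / sk_from mod Q (needs both secrets in this toy)."""
    return (sk_to * pow(sk_from, -1, Q)) % Q


def reencrypt(rk, ciphertext):
    """Proxy step: turns a ciphertext for one key into one for another."""
    c1, c2 = ciphertext
    return c1, pow(c2, rk, P)


def decrypt(sk, ciphertext):
    """Recover the message with the secret key matching the ciphertext."""
    c1, c2 = ciphertext
    g_k = pow(c2, pow(sk, -1, Q), P)
    return (c1 * pow(g_k, -1, P)) % P


# Hypothetical roles: the demander encrypts, the cloud re-encrypts, a holder decrypts.
demander_sk, demander_pk = keygen()
holder_sk, holder_pk = keygen()
request = 1234  # stand-in for an encoded data request
ct = encrypt(demander_pk, request)
ct_for_holder = reencrypt(rekey(demander_sk, holder_sk), ct)
```

In this sketch the proxy only ever handles `ct` and the re-encryption key, neither of which reveals `request`, which mirrors the property the cloud server relies on in S102.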
S103, obtaining a holding condition table fed back by each data holder.
The holding condition table fed back by a data holder comprises the data holding condition of that data holder for each target attribute.
Specifically, there are two data holding conditions: holding data and not holding data. For each target attribute, the data holding condition of the data holder is either holding data or not holding data. If the condition for a target attribute is holding data, the data holder has data under that target attribute; if the condition is not holding data, the data holder has no data under that target attribute.
It will be appreciated that the holding condition table generated by a data holder contains only the data holding condition for each of the at least one target attribute, and no information about the target attributes themselves.
For example, the holding condition table generated by a data holder may take the form of a binary sequence whose number of bits equals the total number of target attributes. Each bit in the binary sequence corresponds to one target attribute, and the value of the bit represents the data holding condition of the data holder for that target attribute. The binary sequence form is described in more detail later and is not elaborated here.
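A minimal sketch of such a binary holding condition table follows. It assumes that bits follow the order in which the target attributes appear in the decrypted request and that "1" means holding data; the function name and attribute names are illustrative.

```python
def build_holding_table(target_attributes, held_attributes):
    """Return one bit per requested attribute: '1' if held, '0' if not.

    Bit order follows the order of `target_attributes` (an assumption).
    """
    return "".join("1" if attr in held_attributes else "0"
                   for attr in target_attributes)


# A holder that has "gender", "age" and "occupation" answers a request
# for "age", "education" and "occupation".
table = build_holding_table(["age", "education", "occupation"],
                            {"gender", "age", "occupation"})
```

Because the result carries only bit positions, a party that cannot decrypt the request learns which positions are held but not which attributes they stand for.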
S104, determining a holder list of each target attribute based on the holding condition table fed back by each data holder.
The holder list of each target attribute includes each candidate data holder, among the plurality of data holders, that holds data under the target attribute.
For example, suppose the holding condition table fed back by data holder A indicates that it holds data under the second target attribute, and the holding condition table fed back by data holder B also indicates that it holds data under the second target attribute. Then, for the second of the plurality of target attributes, the holder list includes data holder A and data holder B.
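Step S104 can be sketched as follows, assuming each holding condition table is a bit string with one bit per target attribute. The function name and the holder labels are hypothetical.

```python
def build_holder_lists(holding_tables, num_targets):
    """Map each target-attribute position to the holders whose bit is '1'.

    `holding_tables` maps a holder label to its bit string.
    """
    holder_lists = [[] for _ in range(num_targets)]
    for holder, bits in holding_tables.items():
        if len(bits) != num_targets:
            raise ValueError(f"holder {holder!r} returned {len(bits)} bits")
        for position, bit in enumerate(bits):
            if bit == "1":
                holder_lists[position].append(holder)
    return holder_lists


# Holders A and B both hold data under the second target attribute.
tables = {"A": "010", "B": "011", "C": "100"}
lists = build_holder_lists(tables, 3)
```

Here `lists[1]` contains A and B, matching the example in the text, and the cloud server works purely with positions rather than attribute names.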
S105, for each target attribute, a data provider for providing data under the target attribute to the data demander is determined from among the candidate data holders included in the holder list of the target attribute.
In the present application, the data provider that provides the data under the target attribute to the data demander can be determined from the plurality of candidate data holders in many ways, and this is not limited. One possible way of determining the data provider for a given target attribute is described later and is not elaborated here.
S106, for each target attribute, requesting target data corresponding to the target attribute from a data provider corresponding to the target attribute.
Only one data provider is determined for each target attribute; accordingly, that data provider provides only the target data corresponding to that target attribute.
S107, after obtaining the target data corresponding to the target attribute returned by the data provider, re-encrypting the target data with the second re-encryption key set by the data provider for the data demander, and returning the re-encrypted target data to the data demander.
It will be appreciated that each target attribute corresponds to one data provider, and the data provider returns the target data for the target attribute it corresponds to. In this application, the target data may be obtained by generalizing the data under the target attribute, so that the data provider provides only part of the data content under the target attribute to the data demander, which improves data privacy.
In the present application, the cloud server may store therein a re-encryption key set by each data holder for any one of the other data holders. Here, for convenience of distinction, the re-encryption key set by the data provider for the data demander is referred to as a second re-encryption key.
Correspondingly, after the cloud server re-encrypts the target data with the second re-encryption key and sends it to the data demander, the data demander can decrypt the re-encrypted target data with its own private key and thereby obtain the required data content associated with the target attribute.
As can be seen from the above, the data request that the data demander sends to the cloud server is encrypted with the data demander's private key, and the cloud server re-encrypts the data request using proxy re-encryption before forwarding it to each data holder, so the cloud server cannot learn the target attributes to which the requested data belongs. In addition, the holding condition table that each data holder returns to the cloud server contains only the data holder's holding condition for each target attribute, not information about the target attributes, so the cloud server cannot obtain information about the target attributes. Together these improve the privacy and security of the requested data during data sharing.
In addition, after the cloud server requests the data under a target attribute from the determined data provider, the data provider also encrypts that data with its own private key, and the cloud server only needs to re-encrypt the data and forward it to the data demander without decrypting the target data. The cloud server therefore cannot learn the content of the data being shared, which improves the privacy and security of the shared data.
In order to facilitate an understanding of the aspects of the present application, a brief description of the process of constructing a data sharing system follows.
Fig. 2 is a schematic flow chart of constructing a data sharing system in an embodiment of the present application. The method of this embodiment may include:
S201, based on an authentication mechanism that provides an electronic authentication service, performing identity authentication on the cloud server and the plurality of data holders that participate in data sharing.
The electronic authentication service, commonly called a CA (Certificate Authority) authentication service, verifies the authenticity and reliability of the parties involved in electronic signatures.
The authentication mechanism providing the electronic authentication service may be an authority responsible for issuing and managing digital certificates.
In this application, the CA authentication of the cloud server and the data holders may include identity authentication and related checks; the specific authentication content and process can be set as required and are not limited by this application.
S202, for each data holder, the data holder sends the re-encryption key between the data holder and each other data holder to the cloud server, respectively.
In this application, the cloud server, as a proxy in proxy re-encryption, may store a re-encryption key between each data holder and any one of the other data holders.
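One simple way to picture the key storage in S202 is a lookup keyed by the ordered pair (key setter, recipient). The sketch below is an assumption about layout only; `build_rekey_store` and the placeholder key strings are hypothetical, and real keys would come from the chosen proxy re-encryption scheme.

```python
from itertools import permutations


def build_rekey_store(holders, make_rekey):
    """Return {(setter, recipient): re-encryption key} for every ordered pair.

    `make_rekey` stands in for whatever the proxy re-encryption scheme
    uses to derive a key; here it is supplied by the caller.
    """
    return {(src, dst): make_rekey(src, dst)
            for src, dst in permutations(holders, 2)}


# Placeholder strings instead of real keys, for three hypothetical holders.
store = build_rekey_store(["A", "B", "C"], lambda s, d: f"rk_{s}_to_{d}")
```

For T data holders this layout stores T * (T - 1) keys, and the cloud server looks up the entry for (demander, holder) in S102 and for (provider, demander) in S107.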
S203, the data held by the data holders is unified into vertically aligned two-dimensional data tables based on a privacy-preserving set union algorithm.
The algorithm, rendered above as a privacy-preserving set union (also translated as privacy independent set merging), unifies the data held by each of the plurality of data holders into data tables of the same format, so that the data table held by each data holder comprises a public classification attribute and a plurality of private attributes.
In the two-dimensional data table, each column is an attribute and each row is a record, and each record corresponds to a record identifier (ID). For example, the set of attributes in the data table held by data holder $i$ may be represented as $D_{DP_i} = (A_{cls}, A_{i\_1}, A_{i\_2}, \ldots, A_{i\_d})$, where $A_{cls}$ represents the public classification attribute and $A_{i\_1}$ to $A_{i\_d}$ are all private attributes. $d$ is the total number of private attributes and is a natural number greater than 0; $i$ is a natural number greater than 1 and less than or equal to $T$, where $T$ is the total number of data holders.
The public classification attribute is an attribute for which every data holder has attribute values; that is, each data holder holds values belonging to the public classification attribute. It is therefore an attribute that all data holders have and whose values they can all see.
In addition, the public classification attribute is the attribute on which each data holder bases its data classification. For example, the category of a record, or of the entity the record describes, is determined by whether the value under the public classification attribute in the record is larger than a set value.
Unlike the public classification attribute, not every data holder holds values under a given private attribute. A data holder may be unable to view the values of certain private attributes, in which case it needs to request those values from other data holders.
For example, the public classification attribute in a two-dimensional data table may be a name, while the private attributes may include gender, age, education, occupation and so on. Every row in the data table held by a given data holder has a value for the name, but that data table may contain values only for the gender and age attributes and none for attributes such as education and occupation.
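The table layout described above can be pictured with a small data structure. This is only a sketch: the class name `HolderTable`, its methods and the sample rows are illustrative assumptions, with the name used as the public classification attribute as in the example.

```python
from dataclasses import dataclass, field


@dataclass
class HolderTable:
    """One data holder's table: record ID -> {attribute: value}."""
    class_attr: str                      # public classification attribute
    private_attrs: list                  # private attributes this holder has
    records: dict = field(default_factory=dict)

    def holds(self, attr):
        """True if this holder has values under the given private attribute."""
        return attr in self.private_attrs

    def column(self, attr):
        """Return {record_id: value} for one attribute."""
        return {rid: row[attr] for rid, row in self.records.items()}


# Made-up rows: this holder has gender and age but no occupation column.
table = HolderTable(
    class_attr="name",
    private_attrs=["gender", "age"],
    records={
        1: {"name": "Li", "gender": "F", "age": 34},
        2: {"name": "Wang", "gender": "M", "age": 41},
    },
)
```

Because the record IDs are shared across holders, a column obtained from another holder can be joined to this table by ID, which is what the vertical alignment makes possible.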
It is to be understood that step S203 is only one form of holding data by the data holder, and that in practical applications, the specific form of the data in the data table held by the data holder may be other, which is not limited.
S204, the cloud server initializes the trust degree of each data holder.
For example, the cloud server initializes the trust degree TS_i of each data holder i as TS_i = 5.0, and the trust degree of each data holder lies in the range TS_i ∈ [0.0, 10.0].
Determining the trust degree of the data holders is an optional operation whose purpose is to improve the reliability of data sharing.
In addition, in the present application, a plurality of record identifiers corresponding to a plurality of records in a data table maintained in a data holder are stored in a cloud server.
The foregoing introduces the data sharing system and its construction in order to facilitate understanding of the data sharing method of the present application. It will be appreciated that the composition of the data sharing system and the form in which the data holders hold data may vary; the above merely illustrates one possible form, and other forms are equally applicable to the present application.
In order to facilitate understanding of how the data provider of a target attribute is determined in the solution of the present application, one possible scenario is described below from the perspective of the interaction among the data demander, the data holders and the cloud server.
Fig. 3 is a schematic flow interaction diagram of a data sharing method provided in an embodiment of the present application. The method of this embodiment may include:
S301, the data demander sends a data request to the cloud server.
The data request indicates at least one target attribute to which the data requested by the data demander belongs. The target attributes indicated in the data request belong to the aforementioned plurality of private attributes.
For example, the data request may carry information of each target attribute for which shared data is requested, and the set of attributes to be shared in the data sharing task is ReqA = (ReqA_1, ReqA_2, ..., ReqA_m), where ReqA_x is any one of the target attributes indicated in the data request, x ranges from 1 to m, and m is the total number of target attributes indicated in the data request.
The data demander is any one of a plurality of data holders of the data sharing system.
S302, for each data holder in the data sharing system, the cloud server re-encrypts the data request based on a first re-encryption key set by the data demander for the data holder, and sends the re-encrypted data request to the data holder.
S303, the data holder decrypts the re-encrypted data request by using the private key of the data holder, and determines the information of the at least one target attribute based on the decrypted data request.
S304, the data holder sends a holding condition table for the at least one target attribute to the cloud server according to its data holding condition for each private attribute.
The holding condition table includes the data holding condition of the data holder for each target attribute.
For example, in one alternative, the holding condition table generated by the data holder may take the form of a binary sequence whose number of bits equals the total number of the at least one target attribute. Each bit in the binary sequence corresponds to one target attribute, and the value of each bit represents the data holding condition of the data holder for the target attribute at that bit.
For example, based on the data holding condition of its own D_{DP_i} = (A_{i_1}, A_{i_2}, ..., A_{i_d}), the data holder i generates, for the requested attribute set ReqA = (ReqA_1, ReqA_2, ..., ReqA_m), the binary sequence B_{DP_i} = (b_{i_1}, b_{i_2}, ..., b_{i_m}).
Here b_{i_x} represents the data holding condition of the data holder i for the target attribute ReqA_x, and b_{i_x} ∈ {0, 1}. If b_{i_x} is 0, the data is not held, that is, the data holder i does not hold data under the target attribute ReqA_x; if b_{i_x} is 1, the data is held, that is, the data holder i holds data under the target attribute ReqA_x.
Because the binary sequence only contains the holding condition of the data holder on each target attribute in different positions, the cloud server can only obtain the total number of the target attributes and the holding condition of the data holder on the target attributes in each position according to the binary sequence, and can not obtain the specific attribute information requested by the data demander.
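As a non-limiting illustration, the binary sequence described above could be produced as in the following minimal sketch. The function name holding_sequence and the example attribute names are hypothetical and are used only for illustration.

```python
def holding_sequence(held_attributes, requested_attributes):
    """Return one bit per requested target attribute: 1 if held, 0 otherwise."""
    held = set(held_attributes)
    return [1 if attr in held else 0 for attr in requested_attributes]


# Hypothetical example: data holder i holds values only for gender and age.
held_by_i = ["gender", "age"]
req_a = ["age", "occupation", "gender"]  # target attributes after decryption
b_dp_i = holding_sequence(held_by_i, req_a)
```

Only b_dp_i is sent to the cloud server; the attribute names in req_a stay with the data holder, so the server sees positions and bits but not what each position means.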
S305, the cloud server determines a holder list of each target attribute based on the holding condition table fed back by each data holder.
Wherein the holder list of the target attribute includes information of each data holder holding the data under the target attribute.
For example, for the x-th target attribute (that is, the target attribute ReqA_x), its holder list Cand_x can be expressed as follows:
Cand_x = (DP_{x_1}, DP_{x_2}, ..., DP_{x_k})
where DP_{x_t} represents the t-th data holder holding data under the x-th target attribute, t is any natural number from 1 to k, and k is the total number of data holders holding data under the x-th target attribute.
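The following sketch shows one way the cloud server could derive the holder lists from the binary sequences it receives. The function name build_holder_lists and the holder names are hypothetical; the server works with bit positions only and never needs the attribute names.

```python
def build_holder_lists(holding_tables, num_targets):
    """holding_tables maps a holder id to its binary sequence of num_targets bits."""
    holder_lists = [[] for _ in range(num_targets)]
    for holder_id, bits in sorted(holding_tables.items()):
        if len(bits) != num_targets:
            raise ValueError(f"holder {holder_id}: expected {num_targets} bits")
        for x, bit in enumerate(bits):
            if bit == 1:
                holder_lists[x].append(holder_id)  # holder is a candidate for attribute x
    return holder_lists


# Hypothetical feedback from three data holders for m = 3 target attributes.
tables = {"DP_1": [1, 0, 1], "DP_2": [1, 1, 0], "DP_3": [0, 1, 1]}
cand = build_holder_lists(tables, 3)
```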
S306, for each target attribute, the cloud server extracts at least one target record identifier from a plurality of record identifiers, generates a pseudo attribute value pair sequence for the target attribute and the public classification attribute, and sends the at least one target record identifier and the pseudo attribute value pair sequence to each candidate data holder in a holder list of the target attribute.
For example, the cloud server may randomly extract a first set proportion of record identifiers from the plurality of record identifiers as the target record identifiers.
For each target attribute, the pseudo attribute value pair sequence comprises at least one pseudo attribute value pair, and each pseudo attribute value pair comprises a pseudo attribute value generated by the cloud server for the target attribute and a pseudo attribute value generated by the cloud server for the common classification attribute.
The number of pseudo attribute value pairs in the pseudo attribute value pair sequence may be set as needed. For example, it may be a second set proportion of the total number of target record identifiers, where the first set proportion and the second set proportion are different.
The pseudo attribute value pairs that the cloud server generates for the target attribute and the common classification attribute are not real attribute values; they are values generated only to mask the real attribute values under the target attribute.
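A minimal sketch of step S306 on the cloud server side is given below. The function names, the use of random integers as pseudo attribute values, the bound upper and the two example proportions (0.2 and 0.1) are assumptions made for illustration and are not prescribed by the present application.

```python
import random


def sample_target_ids(record_ids, first_ratio, rng):
    """Randomly draw a first set proportion of the stored record identifiers."""
    count = min(len(record_ids), max(1, round(len(record_ids) * first_ratio)))
    return rng.sample(list(record_ids), count)


def make_pseudo_pairs(num_target_ids, second_ratio, rng, upper=1 << 16):
    """Generate (pseudo target value, pseudo classification value) pairs."""
    count = max(1, round(num_target_ids * second_ratio))
    return [(rng.randrange(upper), rng.randrange(upper)) for _ in range(count)]


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
record_ids = list(range(1, 101))
target_ids = sample_target_ids(record_ids, 0.2, rng)
pseudo_pairs = make_pseudo_pairs(len(target_ids), 0.1, rng)
```

The pseudo values carry no meaning by themselves; each candidate data holder later maps them into the value ranges of the two attributes.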
S307, for each target attribute, the candidate data holder obtains at least one real attribute value pair corresponding to at least one target record identifier, converts the at least one pseudo attribute value pair into at least one false attribute value pair, and determines the information gain generated by the target attribute according to the at least one real attribute value pair and the at least one false attribute value pair.
For each target attribute, the real attribute value pair includes an attribute value of the target attribute and an attribute value of the common classification attribute in a record corresponding to the target record identifier in the candidate data holder. That is, the attribute value in the real attribute value pair is the attribute value of the target attribute and the common classification attribute of the real record in each record corresponding to each target record identifier in the candidate data holder.
The purpose of converting a pseudo attribute value pair is to convert the pseudo attribute value of the target attribute in the pair into a false attribute value within the value range of the target attribute, and to convert the pseudo attribute value of the common classification attribute into a false attribute value within the value range of the common classification attribute.
The value range of the target attribute refers to the range of possible values of the data under the target attribute. For example, the value range of the attribute gender may include male and female. The value range of the common classification attribute is defined similarly and is not described again.
For example, in combination with the value range of the target attribute and according to a set conversion relation or conversion mode, the target value in the value range of the target attribute that corresponds to the pseudo attribute value of the target attribute in the pseudo attribute value pair can be determined, and this target value is taken as the false attribute value of the target attribute. The pseudo attribute value of the common classification attribute is converted in a similar way, which is not described in detail.
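The conversion relation is left open by the present application. The sketch below assumes one simple possibility, namely that the pseudo values are integers and are mapped into each value range by taking the remainder modulo the size of the range. The function name to_false_pair and the example value ranges are hypothetical.

```python
def to_false_pair(pseudo_pair, target_domain, cls_domain):
    """Map an integer pseudo pair into the two attributes' value ranges (assumed modulo rule)."""
    pseudo_target, pseudo_cls = pseudo_pair
    false_target = target_domain[pseudo_target % len(target_domain)]
    false_cls = cls_domain[pseudo_cls % len(cls_domain)]
    return false_target, false_cls


# Hypothetical value ranges held by a candidate data holder.
gender_domain = ["female", "male"]
class_domain = ["A", "B", "C"]
false_pair = to_false_pair((7, 4), gender_domain, class_domain)
```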
The specific manner of determining the information gain brought by the target attribute based on the mixed data of the at least one real attribute value pair and the at least one false attribute value pair can be any method for calculating the information gain at present, which is not limited.
For example, the data holder i may calculate the information gain generated by the target attribute ReqA_x by the following equation one:
information gain of ReqA_x = H(A_cls) - H<A_cls | ReqA_x>
where H(A_cls) is the information entropy of the common classification attribute, and H<A_cls | ReqA_x> is the information entropy of the common classification attribute given the target attribute ReqA_x. Any existing information entropy calculation method may be used to calculate the information entropy, which is not limited.
The present application takes into account that an information gain calculated from the real data under the target attribute may itself expose information about that data, so that sending such information gains directly to the cloud server may also carry a risk of data leakage. In the present method, some false data is mixed in when the information gain of the target attribute is calculated, which reduces the risk of exposing the data under the target attribute through the information gain.
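As an illustration of equation one applied to mixed data, the following sketch computes the information gain with the standard Shannon entropy. The function names and the example pairs are hypothetical; the application allows any existing entropy calculation method.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of classification values."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(pairs):
    """pairs: list of (target attribute value, common classification value)."""
    if not pairs:
        return 0.0
    groups = defaultdict(list)
    for target_value, cls_value in pairs:
        groups[target_value].append(cls_value)
    total = len(pairs)
    # H<A_cls | ReqA_x>: entropy within each target value, weighted by group size
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy([cls for _, cls in pairs]) - conditional


real_pairs = [("male", "A"), ("male", "A"), ("female", "B"), ("female", "B")]
false_pairs = [("male", "B")]  # converted from a pseudo attribute value pair
real_gain = information_gain(real_pairs)
mixed_gain = information_gain(real_pairs + false_pairs)
```

In this example the false pair pulls the gain away from the value that the real data alone would give, which is the masking effect described above.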
S308, the candidate data holder sends the information gain of the target attribute to the cloud server.
S309, for each target attribute, the cloud server determines at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by the candidate data holders.
For ease of distinction, a candidate data holder selected from among the candidate data holders is referred to as a candidate data provider.
It can be understood that the information gain partly reflects the gain of the real attribute values of the target attribute. By combining the information gains returned by the candidate data holders and following the principle that most of the returned gains are genuine, the small number of candidate data holders whose information gains deviate from the main body of the information gain distribution can be removed, so that the candidate data providers whose returned gains contain more information from real data are finally screened out, and reliable candidate data providers are thereby selected.
For ease of understanding, one implementation of determining the candidate data providers is described later with reference to the flow of fig. 4, taking one possible case as an example, and is not detailed here.
S310, for each target attribute, the cloud server determines, from at least one candidate data provider corresponding to the target attribute, a data provider for providing data under the target attribute to the data demander.
For example, the cloud server may randomly select one candidate data provider from at least one candidate data provider corresponding to the target attribute as the data provider of the target attribute.
In one alternative, the cloud server may also maintain a score of the trust level of the respective data holders. On this basis, for each target attribute, after each candidate data provider is determined, the trust level of the candidate data provider may be increased by a set value, and the trust level of each data holder not belonging to the candidate data provider in the holder list of the target attribute may be decreased by the set value.
Accordingly, one candidate data provider with the trust degree larger than the set threshold value can be selected from the at least one candidate data provider as the data provider for providing the data under the target attribute to the data demander. For example, a candidate data provider with a confidence level greater than a set threshold is randomly selected as the data provider.
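The trust-based selection described above can be sketched as follows. The function names, the step of 0.5 and the threshold are hypothetical; clamping to [0.0, 10.0] is an assumption that follows the trust range given for step S204.

```python
import random


def update_trust(trust, holder_list, candidate_providers, step=0.5, low=0.0, high=10.0):
    """Raise the trust of selected candidates and lower it for the other holders in the list."""
    chosen = set(candidate_providers)
    for holder in holder_list:
        delta = step if holder in chosen else -step
        trust[holder] = min(high, max(low, trust[holder] + delta))
    return trust


def pick_provider(trust, candidate_providers, threshold, rng):
    """Randomly pick one candidate whose trust exceeds the threshold, or None."""
    eligible = [p for p in candidate_providers if trust[p] > threshold]
    return rng.choice(eligible) if eligible else None


trust = {"DP_1": 5.0, "DP_2": 5.0, "DP_3": 5.0}
update_trust(trust, ["DP_1", "DP_2", "DP_3"], ["DP_1", "DP_2"])
provider = pick_provider(trust, ["DP_1", "DP_2"], 5.0, random.Random(0))
```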
S311, for each target attribute, the cloud server requests target data corresponding to the target attribute from a data provider corresponding to the target attribute.
S312, after obtaining the target data corresponding to the target attribute returned by the data provider, the cloud server re-encrypts the target data based on the second re-encryption key set by the data provider for the data demander, and returns the re-encrypted target data to the data demander.
The above steps S311 and S312 may be referred to the related description of the previous embodiments, and are not described herein.
In the present application, when candidate data providers are selected, the cloud server generates, for each target attribute requested by the data demander, pseudo attribute value pairs corresponding to the target attribute and the common classification attribute, in order to reduce the risk that the cloud server derives data of the target attribute from the information gain of the target attribute and thereby leaks it. On this basis, the candidate data holder generates false attribute value pairs corresponding to the target attribute and the common classification attribute from the pseudo attribute value pairs, and generates the information gain of the target attribute from both the false attribute value pairs and the real attribute value pairs. The information gain is therefore not solely the information gain of the real data of the target attribute, which reduces the possibility that the cloud server obtains the real data from the information gain and so reduces the risk of data leakage.
To facilitate an understanding of the implementation of determining candidate data providers based on information gain in the present application, the following is described in connection with one possible implementation:
Fig. 4 is a schematic diagram of an implementation flow of determining candidate data providers in the data sharing method provided in an embodiment of the present application. The method of this embodiment may be applied to the cloud server, and the flow determines the candidate data providers for one target attribute. The flow includes:
S401, for a target attribute, at least one target record identifier is extracted from a plurality of record identifiers, a pseudo attribute value pair sequence is generated for the target attribute and the common classification attribute, and the at least one target record identifier and the pseudo attribute value pair sequence are sent to each candidate data holder in the holder list of the target attribute.
For example, assuming that the set number of repetitions of determining the pending data holders (that is, the number of times steps S401 to S405 are repeated) is h, the first set proportion for extracting target record identifiers may be 2/h. Correspondingly, 2n/h target record identifiers may be randomly extracted from the plurality of record identifiers, where n is the total number of the plurality of record identifiers.
Similarly, 0.2n/h pseudo attribute value pairs may be included in the generated sequence of pseudo attribute value pairs.
S402, obtaining information gain of the target attribute returned by each candidate data holder in the holder list of the target attribute.
The candidate data holder determines the information gain of the target attribute based on at least one real attribute value pair corresponding to the target record identifiers and at least one false attribute value pair converted from the pseudo attribute value pairs.
The above two steps can be referred to the related description of the previous embodiments, and will not be repeated here.
S403, determining at least one pending data holder with information gain in a distribution section meeting a set condition based on the distribution characteristics of the information gain returned by each candidate data holder.
The number of information gains in the distribution interval meeting the set condition is larger than the number of information gains in the other distribution intervals. The information gains selected in this interval are those lying in the main body of the distribution of the information gains returned by the candidate data holders, that is, the information gains that can partly represent the real attribute values of the target attribute.
For example, in one possible implementation, fig. 5 shows a process for determining at least one pending data holder, the process including:
S51, determining the maximum information gain and the minimum information gain among the information gains returned by the candidate data holders.
S52, constructing a gain distribution total interval formed by the minimum information gain and the maximum information gain, and determining the gain distribution total interval as a target distribution interval to be processed.
S53, determining a first intermediate division point and a second intermediate division point that trisect the target distribution interval.
The information gain value corresponding to the first intermediate division point is smaller than the information gain value corresponding to the second intermediate division point.
S54, counting the number of information gains contained in the first gain distribution interval and in the second gain distribution interval respectively.
The first gain distribution section is a distribution section formed by the minimum information gain and the second intermediate division point, and the second gain distribution section is a distribution section formed by the first intermediate division point and the maximum information gain.
For example, taking the gain distribution total interval formed by the minimum information gain and the maximum information gain as an example, assume that the minimum information gain is ig_l and the maximum information gain is ig_r. The first intermediate division point that trisects the interval formed by ig_l and ig_r is ig_l + (ig_r - ig_l)/3, and the second intermediate division point is ig_l + 2(ig_r - ig_l)/3.
Correspondingly, the first gain distribution interval is [ig_l, ig_l + 2(ig_r - ig_l)/3], and the second gain distribution interval is [ig_l + (ig_r - ig_l)/3, ig_r].
S55, determining, from the first gain distribution interval and the second gain distribution interval, the candidate gain distribution interval that contains the larger number of information gains.
In the present application, of the first gain distribution interval and the second gain distribution interval, the one containing the larger number of information gains is retained, and for ease of distinction the retained interval is referred to as the candidate gain distribution interval.
S56, detecting whether the quantity of information gains contained in the candidate gain distribution interval is smaller than a set threshold value or the interval range of the candidate gain distribution interval is smaller than a set range interval, if so, determining that at least one candidate data holder with the information gain in the candidate gain distribution interval is at least one pending data holder; if not, the candidate gain distribution section is determined as the target distribution section, and the process returns to step S53.
For the holder list Cand_x of a target attribute ReqA_x, candidate data holders are progressively eliminated by repeating steps S53 to S56, and the candidate data holders that finally remain form the set of pending data holders.
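Steps S51 to S56 can be sketched as follows. The function name pending_holders, the closed interval bounds, the rule of keeping the first interval on a tie and the example values are assumptions made for illustration.

```python
def pending_holders(gains, min_count, min_width):
    """gains maps a candidate data holder id to the information gain it returned."""
    if min_width <= 0:
        raise ValueError("min_width must be positive so that the loop terminates")
    low, high = min(gains.values()), max(gains.values())         # S51, S52
    while True:
        third = (high - low) / 3
        p1, p2 = low + third, low + 2 * third                    # S53
        first = [h for h, g in gains.items() if low <= g <= p2]  # S54
        second = [h for h, g in gains.items() if p1 <= g <= high]
        if len(first) >= len(second):                            # S55
            kept, high = first, p2
        else:
            kept, low = second, p1
        # S56: stop when too few gains remain or the interval is narrow enough
        if len(kept) < min_count or (high - low) < min_width:
            return kept


# Hypothetical gains: four holders agree, one is far from the main body.
gains = {"DP_1": 0.40, "DP_2": 0.42, "DP_3": 0.41, "DP_4": 0.43, "DP_5": 0.95}
pending = pending_holders(gains, min_count=2, min_width=0.1)
```

Each pass keeps two thirds of the current interval, so the interval width shrinks geometrically and the width condition in S56 is eventually met.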
S404, adding one to the repetition number.
For example, the initial value of the repetition number may be set to 1, and the repetition number is then increased by one each time steps S401 to S403 are performed again.
S405, detecting whether the repetition number reaches the set number; if so, executing step S406; if not, returning to step S401.
The set number may be chosen according to actual needs, which is not limited.
S406, if the repetition number reaches the set number, determining, from the at least one pending data holder obtained each time, a preset number of candidate data providers with higher occurrence frequency.
The preset number can be set as desired.
After steps S401 to S403 above have been repeated the set number of times, with at least one pending data holder obtained each time, the number of occurrences of each pending data holder can be counted, and the pending data holders that occur more often and whose number of occurrences is not lower than 2/h are then selected as the candidate data providers.
It can be understood that, because the information gain includes the contribution of the false attribute value pairs, the information gains returned by different candidate data holders fluctuate around the information gain corresponding to the real attribute value pairs. By obtaining the information gains returned by the candidate data holders repeatedly and combining the pending data holders determined from the information gains each time, the candidate data providers whose returned information gains contain more of the information gain of real data can be selected more effectively.
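The frequency-based selection of step S406 can be sketched as follows. The function name and the min_occurrences parameter, which stands in for the lower bound on the number of occurrences, are hypothetical.

```python
from collections import Counter


def select_candidate_providers(rounds, preset_number, min_occurrences=1):
    """rounds: one list of pending data holders per repetition of S401 to S403."""
    counts = Counter()
    for pending in rounds:
        counts.update(sorted(set(pending)))  # count each holder at most once per round
    ranked = [h for h, c in counts.most_common() if c >= min_occurrences]
    return ranked[:preset_number]


# Hypothetical results of four repetitions.
rounds = [["DP_1", "DP_2"], ["DP_1", "DP_2", "DP_3"], ["DP_1", "DP_3"], ["DP_1", "DP_2"]]
providers = select_candidate_providers(rounds, preset_number=2)
```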
In the above embodiments of the present application, for each target attribute, the target data corresponding to the target attribute returned by the data provider may be generalized data obtained by generalizing the data under the target attribute. Wherein data generalization aggregates data by replacing relatively low-level values (e.g., values of attribute ages) with higher-level concepts (e.g., young, middle-aged, and elderly). In order to improve data security and enable a data demander to obtain data content required under a target attribute, the data provider in the application generalizes the data under the target attribute.
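As a small illustration of data generalization, the sketch below replaces an age with a higher-level concept taken from a two-level tree. The tree AGE_TREE, its age boundaries and the function name are hypothetical and serve only as an example.

```python
# Hypothetical classification tree for age: level 0 is the most general level.
AGE_TREE = [
    [(0, 200, "any age")],
    [(0, 40, "young"), (40, 60, "middle-aged"), (60, 200, "elderly")],
]


def generalize_age(age, level):
    """Return the concept at the given tree level whose range [low, high) contains the age."""
    for low, high, label in AGE_TREE[level]:
        if low <= age < high:
            return label
    raise ValueError(f"age {age} is outside the range covered by the tree")


generalized = [generalize_age(a, 1) for a in (25, 45, 70)]
```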
In the present application, a specific implementation manner of generalizing data under the target attribute by the data provider may not be limited.
In one possible implementation manner, after determining the data provider corresponding to the target attribute, the cloud server may send the classification tree and the differential privacy parameter uploaded by the data demander to the data provider. And the data provider may generalize the data under the target attribute based on the classification tree and the differential privacy parameters. The generalization process by combining the classification tree and the differential privacy parameter can be implemented in any currently common mode, which is not limited.
For ease of understanding, a schematic flow diagram of generalizing data under a target attribute by a data provider is shown in fig. 6, where the process of generalizing data under a target attribute by a data provider of the target attribute is described.
The process may include:
S601, the cloud server sends the classification tree and the differential privacy parameters to the data provider corresponding to the target attribute.
The classification tree classifies attribute values from general to specific in the top-down direction, and data generalization is performed top-down along the classification tree.
Prior to this step, the cloud server may also initialize the contribution value of any data provider DP_{Rand_x} as Con_x = 0.0.
The data provider DP_{Rand_x} represents the data provider corresponding to any one target attribute ReqA_x.
S602, the data provider DP_{Rand_x} initializes all record identifiers (IDs) in its data table as one ID block, and defines the original ID block set as consisting of this ID block.
That is, all IDs are initialized as one block [IDAll], the original ID block set is defined as OldBlock = {[IDAll]}, and the corresponding classification tree level is level_x = 0.
S603, the data provider DP_{Rand_x} refines the ID blocks in the original ID block set OldBlock one layer downward according to the classification tree and the current classification tree level level_x, obtaining the updated ID block set NewBlock_x.
S604, the data provider DP_{Rand_x} calculates the information gain of this refinement by combining the current updated ID block set and the original ID block set.
Specifically, if NewBlock_x satisfies k-anonymity, the information gain of this refinement is H<A_cls | OldBlock> - H<A_cls | NewBlock_x>; otherwise, the information gain of this refinement is 0.
Here H<A_cls | OldBlock> is the information entropy of the common classification attribute given the original ID block set, and H<A_cls | NewBlock_x> is the information entropy of the common classification attribute given the updated ID block set. Any existing information entropy calculation method may be used, which is not limited.
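A minimal sketch of the calculation in step S604 is given below. It assumes that the gain of a refinement is the drop in the conditional entropy of the common classification attribute from OldBlock to NewBlock_x, that k-anonymity means every block of NewBlock_x contains at least k record identifiers, and that the gain is taken as zero otherwise, which is consistent with the all-zero termination test of step S606. The function names and example data are hypothetical.

```python
from collections import Counter
from math import log2


def block_entropy(blocks, cls_of):
    """Entropy of the classification attribute given a partition of IDs into blocks."""
    total = sum(len(b) for b in blocks)
    result = 0.0
    for block in blocks:
        counts = Counter(cls_of[i] for i in block)
        h = -sum(c / len(block) * log2(c / len(block)) for c in counts.values())
        result += len(block) / total * h  # weight each block by its share of IDs
    return result


def refinement_gain(old_blocks, new_blocks, cls_of, k):
    """Gain of refining old_blocks into new_blocks; 0.0 if k-anonymity is violated."""
    if any(len(b) < k for b in new_blocks):
        return 0.0
    return block_entropy(old_blocks, cls_of) - block_entropy(new_blocks, cls_of)


cls_of = {1: "A", 2: "A", 3: "B", 4: "B"}   # record ID -> classification value
old_block = [[1, 2, 3, 4]]                  # OldBlock = {[IDAll]}
new_block = [[1, 2], [3, 4]]                # one layer further down the tree
gain_k2 = refinement_gain(old_block, new_block, cls_of, k=2)
gain_k3 = refinement_gain(old_block, new_block, cls_of, k=3)
```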
S605, the data provider DP_{Rand_x} sends the information gain of this refinement to the cloud server, and sends the updated ID block set NewBlock_x, encrypted with its private key, to the cloud server.
S606, the cloud server collects the information gains returned by all data providers DP_{Rand_x} into a total set and detects whether all values in the total set are zero. If so, the generalization iteration of the data providers is finished, and the data providers execute step S609; if not, step S607 is performed.
For example, the total set of information gains contains one element for each x from 1 to m, where m is the total number of target attributes indicated in the data request, and the x-th element represents the information gain of the refinement provided by the data provider corresponding to the target attribute ReqA_x.
S607, the cloud server uses an exponential mechanism algorithm to randomly select an element from the total set, determines the data provider to be processed DP_{ReqA_w} of the target attribute ReqA_w corresponding to the selected element, re-encrypts the updated ID block set NewBlock_w currently corresponding to the data provider to be processed, and sends the re-encrypted updated ID block set to the data providers corresponding to all the target attributes.
In addition, assuming that the current contribution value of the data provider to be processed DP_{ReqA_w} is Con_w, the cloud server also adjusts the contribution value of the data provider to be processed, that is, increases it by the corresponding information gain.
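The selection in step S607 can be sketched with the usual form of the exponential mechanism, in which an element is chosen with probability proportional to exp(epsilon * gain / (2 * sensitivity)). The parameter names epsilon and sensitivity, their example values and the function names are assumptions; the present application does not fix them.

```python
import math
import random


def exponential_mechanism(gains, epsilon, sensitivity, rng):
    """Return index w chosen with probability proportional to exp(eps * gain / (2 * sens))."""
    best = max(gains)  # subtracting the maximum avoids overflow and leaves the ratios unchanged
    weights = [math.exp(epsilon * (g - best) / (2 * sensitivity)) for g in gains]
    return rng.choices(range(len(gains)), weights=weights, k=1)[0]


def select_and_credit(gains, contributions, epsilon, sensitivity, rng):
    """Select the data provider to be processed and add its gain to its contribution value."""
    w = exponential_mechanism(gains, epsilon, sensitivity, rng)
    contributions[w] += gains[w]
    return w


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
gains = [0.10, 0.90, 0.30]          # hypothetical total set for m = 3 target attributes
contributions = [0.0, 0.0, 0.0]
w = select_and_credit(gains, contributions, epsilon=1.0, sensitivity=1.0, rng=rng)
```

A larger epsilon concentrates the choice on the largest gain, while a smaller epsilon makes the choice closer to uniform and reveals less about the gains.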
S608, after receiving the re-encrypted updated ID block set, the data provider to be processed DP_{ReqA_w} increments its classification tree level level_w, and the other data providers update their local original ID block set OldBlock to the updated ID block set NewBlock_w corresponding to the data provider to be processed; the process then returns to step S604.
It will be appreciated that steps S604 to S608 above are processes that iterate continuously to achieve data generalization.
S609, the data provider DP_{Rand_x} determines, according to the classification tree, the attribute values of the ID blocks in its current original ID block set OldBlock, subjects the ID blocks to noise-adding processing, encrypts them with its private key and sends them to the cloud server, so that the cloud server re-encrypts the encrypted ID blocks and sends them to the data demander.
It should be noted that fig. 6 is only one example of how the data provider generalizes the data under the target attribute. Other data generalization manners are also possible in practical applications, which is not limited.
It may be appreciated that in the above embodiments of the present application, after the cloud server returns the re-encrypted target data to the data demander, the data demander may further train the classification model based on the target data corresponding to each target attribute and the data under the common classification attribute, and determine the accuracy of the classification model.
For example, for each record identifier in the data table, the value of the common classification attribute is used as the label of the data under the target attributes corresponding to the record identifier, and the classification model is trained accordingly. The classification accuracy of the classification model is then tested using a test dataset.
Accordingly, the data demander returns the classification accuracy it has determined to the cloud server. On this basis, the cloud server can adjust the trust degree of the data provider corresponding to each target attribute based on the obtained classification accuracy.
For example, for the data provider DP_{Rand_x} corresponding to any target attribute ReqA_x, the trust degree TS_{Rand_x} of the data provider DP_{Rand_x} may be adjusted in the following manner:
where Accu_ReqA is the classification accuracy returned by the data demander.
As an alternative, the cloud server may also calculate a contribution share Share_{Rand_x} for each data provider and increase the contribution value of each data provider by its share Share_{Rand_x}. The share calculation formula is as follows:
wherein,
The foregoing is one possible implementation of updating the trust degree and the contribution value of each data provider. The present application may also update the trust degree and the contribution value of each data provider in other manners, which is not limited.
Corresponding to the data sharing method, the application also provides a data sharing device. Fig. 7 is a schematic diagram of a composition structure of a data sharing device according to an embodiment of the present application, where the device is applied to a cloud server in a data sharing system, and the data sharing system includes the cloud server and a plurality of data holders. The device comprises:
a request obtaining unit 701, configured to obtain a data request sent by a data demander, where the data request indicates at least one target attribute to which data requested by the data demander belongs, and the data request is a request encrypted by using a private key of the data demander, where the data demander belongs to the plurality of data holders;
A request re-encrypting unit 702, configured to determine, for each data holder, a first re-encryption key set by the data demander for the data holder, re-encrypt the data request with the first re-encryption key, and send the re-encrypted data request to the data holder, so that the data holder decrypts the re-encrypted data request based on its private key;
a table obtaining unit 703, configured to obtain a holding condition table fed back by each data holder, where the holding condition table fed back by the data holder includes: the data holding condition of the data holding party for each target attribute;
a list determining unit 704, configured to determine, based on a holding condition table fed back by each data holder, a holder list of each target attribute, where the holder list of each target attribute includes: each candidate data holder holding data under the target attribute among the plurality of data holders;
a provider determining unit 705 configured to determine, for each target attribute, a data provider for providing the data under the target attribute to the data demander from among candidate data holders included in a holder list of the target attribute;
A data request unit 706, configured to request, for each target attribute, target data corresponding to the target attribute from a data provider corresponding to the target attribute;
and the data re-encrypting unit 707 is configured to re-encrypt the target data based on a second re-encryption key set by the data provider for the data demander after obtaining the target data corresponding to the target attribute returned by the data provider, and return the re-encrypted target data to the data demander.
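As an illustration of what the list determining unit 704 does, the following minimal Python sketch builds a holder list per target attribute from the holding condition tables. The table shape (a mapping from attribute name to a boolean), the function name build_holder_lists, and the holder identifiers are assumptions made for this example only; the application does not prescribe a concrete data format.

```python
from typing import Dict, List

# Assumed shape of a holding condition table: attribute name -> True if the
# holder has data under that attribute. This format is hypothetical.
HoldingTable = Dict[str, bool]


def build_holder_lists(target_attributes: List[str],
                       tables: Dict[str, HoldingTable]) -> Dict[str, List[str]]:
    """For each target attribute, list the holders that report holding data for it."""
    holder_lists: Dict[str, List[str]] = {attr: [] for attr in target_attributes}
    for holder_id in sorted(tables):  # sorted only to make the output deterministic
        table = tables[holder_id]
        for attr in target_attributes:
            if table.get(attr, False):
                holder_lists[attr].append(holder_id)
    return holder_lists


# Hypothetical tables fed back by three data holders.
tables = {
    "holder_a": {"age": True, "income": False},
    "holder_b": {"age": True, "income": True},
    "holder_c": {"age": False, "income": True},
}
holder_lists = build_holder_lists(["age", "income"], tables)
```

Each entry of holder_lists then serves as the set of candidate data holders from which a data provider for that attribute is chosen.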
In one possible implementation manner, the data holder stores a data table, where the data table includes: a common classification attribute and a plurality of private attributes, the data table comprises a plurality of records, each record corresponds to a record identifier, and the target attribute belongs to the plurality of private attributes;
the cloud server stores a plurality of record identifiers corresponding to a plurality of records in the data table;
the provider determination unit includes:
an identifier extracting unit, configured to extract at least one target record identifier from the plurality of record identifiers;
a pseudo attribute generating unit, configured to generate a pseudo attribute value pair sequence for the target attribute and the common classification attribute, where the pseudo attribute value pair sequence includes: at least one pair of pseudo attribute values, each pseudo attribute value pair comprising: a pseudo attribute value generated for the target attribute and a pseudo attribute value generated for the common classification attribute;
A sequence transmitting unit, configured to transmit the at least one target record identifier and the pseudo attribute value pair sequence to each candidate data holder in the holder list of the target attribute;
a gain obtaining unit, configured to obtain an information gain corresponding to the target attribute returned by the candidate data holder, where the information gain is an information gain of the target attribute determined by the candidate data holder based on at least one real attribute value pair corresponding to the at least one target record identifier and a false attribute value pair converted from the at least one pseudo attribute value pair, where the real attribute value pair includes an attribute value of the target attribute and an attribute value of the common classification attribute in a record corresponding to the target record identifier in the candidate data holder;
a candidate determining unit, configured to determine at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by each candidate data holder;
and the provider selection unit is used for determining a data provider for providing the data under the target attribute to the data demander from the at least one candidate data provider.
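The information gain that each candidate data holder returns can be illustrated with the standard entropy-based definition: the entropy of the classification attribute minus its conditional entropy given the target attribute. The Python sketch below is only an assumption about how a holder might compute it; the function names are hypothetical, and the conversion of pseudo attribute value pairs into false attribute value pairs is not modelled, so the false pairs are simply passed in as already converted.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

# (value of the target attribute, value of the common classification attribute)
Pair = Tuple[Hashable, Hashable]


def entropy(labels: List[Hashable]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(pairs: List[Pair]) -> float:
    """Entropy of the class labels minus the entropy remaining after splitting on the target attribute."""
    if not pairs:
        return 0.0
    labels = [label for _, label in pairs]
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for value, label in pairs:
        groups[value].append(label)
    conditional = sum(len(g) / len(pairs) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical sample: real pairs looked up by target record identifier,
# plus false pairs assumed to be already converted from the pseudo pairs.
real_pairs = [("high", "yes"), ("high", "yes"), ("low", "no")]
false_pairs = [("low", "no")]
gain = information_gain(real_pairs + false_pairs)
```

In this toy sample the target attribute fully determines the class, so the gain equals the entropy of the class labels.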
In yet another possible implementation manner, the candidate determining unit includes:
a pending party determining subunit, configured to determine, based on distribution characteristics of information gains returned by each candidate data holder, at least one pending data holder whose information gain is in a distribution interval that satisfies a setting condition, where the number of information gains in the distribution interval that satisfies the setting condition is greater than the number of information gains in other distribution intervals;
a number increasing subunit for increasing the number of repetitions by one;
a repeated triggering subunit, configured to return to execute the operation of the identifier extraction unit if the repetition number does not reach the set number;
and the candidate party determining subunit is used for determining a preset number of candidate data providers with higher occurrence frequency from at least one pending data holder obtained each time if the repetition number reaches the set number.
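The repetition logic of these subunits can be sketched as a loop that runs a set number of rounds and then keeps the holders that appeared most often among the pending data holders. In the Python sketch below, run_round is a hypothetical callback standing in for one full round (identifier extraction, pseudo pair generation, gain collection, and pending holder determination), and breaking ties by holder name is an assumption of this example.

```python
from collections import Counter
from typing import Callable, List


def select_candidate_providers(run_round: Callable[[], List[str]],
                               set_rounds: int,
                               preset_count: int) -> List[str]:
    """Run set_rounds rounds and return the preset_count most frequent pending holders."""
    appearances: Counter = Counter()
    repetitions = 0
    while repetitions < set_rounds:
        # Count each holder at most once per round.
        appearances.update(set(run_round()))
        repetitions += 1
    # Higher frequency first; ties broken by name (an assumption of this sketch).
    ranked = sorted(appearances.items(), key=lambda item: (-item[1], item[0]))
    return [holder for holder, _ in ranked[:preset_count]]


# Hypothetical results of three rounds.
rounds = iter([["a", "b"], ["a", "c"], ["a", "b"]])
candidates = select_candidate_providers(lambda: next(rounds), set_rounds=3, preset_count=2)
```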
In an alternative, the apparatus further comprises:
the trust degree adjusting unit is used for increasing the trust degree of the candidate data provider by a set value after the candidate party determining unit determines at least one candidate data provider, and reducing the trust degree of each data holder which does not belong to the candidate data provider in the holder list of the target attribute by the set value;
The provider selection unit is specifically configured to select, from the at least one candidate data provider, one candidate data provider with a trust degree greater than a set threshold as a data provider for providing the data under the target attribute to the data demander.
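A minimal Python sketch of the trust degree adjustment and the threshold-based selection follows. The default trust degree of 0.0 for unknown holders and the choice of the highest-trust eligible candidate are assumptions of this example; the text above only requires selecting one candidate whose trust degree exceeds the set threshold.

```python
from typing import Dict, List, Optional


def adjust_trust(trust: Dict[str, float], holder_list: List[str],
                 candidates: List[str], set_value: float) -> None:
    """Raise the trust degree of candidate providers and lower it for the other listed holders."""
    chosen = set(candidates)
    for holder in holder_list:
        delta = set_value if holder in chosen else -set_value
        trust[holder] = trust.get(holder, 0.0) + delta


def select_provider(trust: Dict[str, float], candidates: List[str],
                    threshold: float) -> Optional[str]:
    """Pick one candidate whose trust degree exceeds the threshold, or None if there is none."""
    eligible = [c for c in candidates if trust.get(c, 0.0) > threshold]
    if not eligible:
        return None
    # Assumption: prefer the highest trust degree among the eligible candidates.
    return max(eligible, key=lambda c: trust.get(c, 0.0))


# Hypothetical trust degrees before the adjustment.
trust = {"a": 0.5, "b": 0.5, "c": 0.5}
adjust_trust(trust, holder_list=["a", "b", "c"], candidates=["a", "b"], set_value=0.25)
provider = select_provider(trust, candidates=["a", "b"], threshold=0.6)
```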
In yet another possible implementation, the pending party determination subunit includes:
an endpoint determining subunit, configured to determine a maximum information gain and a minimum information gain among information gains returned by each candidate data holder;
a total interval construction subunit, configured to construct a gain distribution total interval composed of the minimum information gain and the maximum information gain, and determine the gain distribution total interval as a target distribution interval to be processed;
the dividing point determining subunit is used for determining a first middle dividing point and a second middle dividing point which are used for trisecting the target distribution interval, and the information gain value corresponding to the first middle dividing point is smaller than the information gain value corresponding to the second middle dividing point;
a data statistics subunit, configured to separately count the number of information gains included in a first gain distribution interval and a second gain distribution interval, where the first gain distribution interval is a distribution interval formed by the minimum information gain and the second intermediate partition point, and the second gain distribution interval is a distribution interval formed by the first intermediate partition point and the maximum information gain;
a candidate interval determining subunit, configured to determine, from the first gain distribution interval and the second gain distribution interval, the candidate gain distribution interval that contains the larger number of information gains;
a division triggering subunit, configured to determine the candidate gain distribution interval as a target distribution interval if the number of information gains included in the candidate gain distribution interval is not less than a set threshold or an interval range of the candidate gain distribution interval is not less than a set range interval, and return to performing the operation of the dividing point determining subunit;
and the pending party confirmation subunit is used for determining at least one candidate data holder with information gain in the candidate gain distribution interval as at least one pending data holder if the quantity of the information gain contained in the candidate gain distribution interval is smaller than a set threshold value or the interval range of the candidate gain distribution interval is smaller than a set range interval.
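The interval narrowing performed by these subunits can be sketched in Python as follows. The sketch assumes one particular reading of the stop condition: narrowing ends as soon as the candidate interval either contains fewer information gains than the set threshold or is narrower than the set range, and otherwise continues. The function name, the parameter names, and the preference for the first interval on a tie are assumptions of this example.

```python
from typing import Dict, List


def find_pending_holders(gains: Dict[str, float],
                         count_threshold: int,
                         min_width: float) -> List[str]:
    """Repeatedly keep the denser of two overlapping two-thirds intervals of the gain range."""
    if min_width <= 0:
        raise ValueError("min_width must be positive so that the narrowing terminates")
    if not gains:
        return []
    low, high = min(gains.values()), max(gains.values())
    while True:
        third = (high - low) / 3.0
        first_point, second_point = low + third, high - third
        # First interval: [low, second_point]; second interval: [first_point, high].
        first = [h for h, g in gains.items() if low <= g <= second_point]
        second = [h for h, g in gains.items() if first_point <= g <= high]
        if len(first) >= len(second):  # tie goes to the first interval (assumption)
            members, cand_low, cand_high = first, low, second_point
        else:
            members, cand_low, cand_high = second, first_point, high
        # Assumed stop condition: too few gains left, or the interval is narrow enough.
        if len(members) < count_threshold or (cand_high - cand_low) < min_width:
            return sorted(members)
        low, high = cand_low, cand_high


# Hypothetical information gains returned by five candidate data holders.
gains = {"h1": 0.10, "h2": 0.50, "h3": 0.52, "h4": 0.55, "h5": 0.90}
pending = find_pending_holders(gains, count_threshold=3, min_width=0.1)
```

Because each pass keeps two thirds of the current interval, the width shrinks geometrically and the loop always ends for a positive min_width. In the sample, the outliers h1 and h5 are discarded and the cluster around 0.5 remains.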
In yet another possible implementation manner, the apparatus may further include:
the accuracy obtaining unit is used for obtaining the classification accuracy returned by the data demand party after the data re-encryption unit returns the re-encrypted target data to the data demand party, wherein the classification accuracy is the classification accuracy of a classification model trained by the data demand party based on the target data corresponding to each target attribute and the data under the common classification attribute;
And the trust level readjusting unit is used for adjusting the trust level of the data provider corresponding to each target attribute by combining the classification accuracy.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be referred to one another. Meanwhile, the features described in the embodiments of this specification may be replaced with or combined with each other to enable those skilled in the art to make or use the present application. Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant points can be found in the description of the method embodiments.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are also intended to fall within the scope of protection of the present application.

Claims (10)

1. A data sharing method, applied to a cloud server in a data sharing system, the data sharing system including the cloud server and a plurality of data holders, the method comprising:
obtaining a data request sent by a data demander, wherein the data request indicates at least one target attribute to which data requested by the data demander belongs, the data request is a request encrypted with a private key of the data demander, and the data demander belongs to the plurality of data holders;
for each data holder, determining a first re-encryption key set by the data demander for the data holder, re-encrypting the data request with the first re-encryption key, and sending the re-encrypted data request to the data holder, so that the data holder decrypts the re-encrypted data request based on a private key of the data holder;
obtaining a holding condition table fed back by each data holder, wherein the holding condition table fed back by the data holder comprises: the data holding condition of the data holder for each target attribute;
determining a holder list of each target attribute based on the holding condition table fed back by each data holder, wherein the holder list of each target attribute comprises: each candidate data holder holding data under the target attribute among the plurality of data holders;
for each target attribute, determining a data provider for providing data under the target attribute to the data demander from candidate data holders contained in a holder list of the target attribute;
for each target attribute, requesting target data corresponding to the target attribute from a data provider corresponding to the target attribute;
and after obtaining the target data corresponding to the target attribute returned by the data provider, re-encrypting the target data based on a second re-encryption key set by the data provider for the data demander, and returning the re-encrypted target data to the data demander.
2. The method of claim 1, wherein the data holder stores a data table comprising: a common classification attribute and a plurality of private attributes, wherein the data table comprises a plurality of records, each record corresponds to a record identifier, and the target attribute belongs to the plurality of private attributes;
the cloud server stores a plurality of record identifiers corresponding to a plurality of records in the data table;
the determining a data provider for providing the data under the target attribute to the data demander from the candidate data holders included in the holder list of the target attribute includes:
extracting at least one target record identifier from the plurality of record identifiers;
generating a sequence of pseudo attribute value pairs for the target attribute and the common classification attribute, the sequence of pseudo attribute value pairs comprising: at least one pair of pseudo attribute values, each pseudo attribute value pair comprising: a pseudo attribute value generated for the target attribute and a pseudo attribute value generated for the common classification attribute;
Transmitting the at least one target record identifier and the pseudo attribute value pair sequence to each candidate data holder in a holder list of the target attribute;
obtaining an information gain corresponding to the target attribute returned by the candidate data holder, wherein the information gain is determined by the candidate data holder based on at least one real attribute value pair corresponding to the at least one target record identifier and a false attribute value pair converted from the at least one pseudo attribute value pair, and the real attribute value pair comprises an attribute value of the target attribute and an attribute value of the common classification attribute in a record corresponding to the target record identifier in the candidate data holder;
determining at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by the candidate data holders;
determining, from the at least one candidate data provider, a data provider for providing the data under the target attribute to the data demander.
3. The method of claim 2, wherein the determining at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by the candidate data holders comprises:
Determining at least one pending data holder with information gain in a distribution interval meeting a set condition based on the distribution characteristics of the information gain returned by each candidate data holder, wherein the number of the information gain in the distribution interval meeting the set condition is larger than the number of the information gain in other distribution intervals;
adding one to the repetition number;
if the repetition number does not reach the set number, returning to execute the operation of extracting at least one target record identifier from the plurality of record identifiers;
and if the repetition times reach the set times, determining a preset number of candidate data providers with higher occurrence frequency from at least one pending data holder obtained each time.
4. The method of claim 2, further comprising, after determining at least one candidate data provider from among the candidate data holders corresponding to the target attribute:
increasing the trust degree of the candidate data provider by a set value, and reducing the trust degree of each data holder which does not belong to the candidate data provider in the holder list of the target attribute by the set value;
the determining, from the at least one candidate data provider, a data provider for providing the data under the target attribute to the data demander, including:
And selecting one candidate data provider with the trust degree larger than a set threshold value from the at least one candidate data provider as a data provider for providing the data under the target attribute to the data demander.
5. The method of claim 3, wherein the determining at least one pending data holder whose information gain is within a distribution interval satisfying a set condition based on the distribution characteristics of the information gain returned by each candidate data holder comprises:
determining the maximum information gain and the minimum information gain in the information gains returned by the candidate data holders;
constructing a gain distribution total interval formed by the minimum information gain and the maximum information gain, and determining the gain distribution total interval as a target distribution interval to be processed;
determining a first middle partition point and a second middle partition point which trisect the target distribution interval, wherein the information gain value corresponding to the first middle partition point is smaller than the information gain value corresponding to the second middle partition point;
respectively counting the quantity of information gains respectively contained in a first gain distribution interval and a second gain distribution interval, wherein the first gain distribution interval is a distribution interval formed by the minimum information gain and the second middle partition point, and the second gain distribution interval is a distribution interval formed by the first middle partition point and the maximum information gain;
determining, from the first gain distribution interval and the second gain distribution interval, the candidate gain distribution interval that contains the larger number of information gains;
if the number of the information gains contained in the candidate gain distribution section is not less than a set threshold value or the section range of the candidate gain distribution section is not less than a set range section, determining the candidate gain distribution section as a target distribution section, and returning to execute the operation of determining a first middle partition point and a second middle partition point for trisecting the target distribution section;
and if the quantity of the information gain contained in the candidate gain distribution interval is smaller than a set threshold value or the interval range of the candidate gain distribution interval is smaller than a set range interval, determining at least one candidate data holder with the information gain in the candidate gain distribution interval as at least one pending data holder.
6. The method of claim 4, further comprising, after said returning the re-encrypted target data to the data demander:
obtaining the classification accuracy returned by the data demander, wherein the classification accuracy is the classification accuracy of a classification model trained by the data demander based on the target data corresponding to each target attribute and the data under the common classification attribute;
And adjusting the trust degree of the data provider corresponding to each target attribute by combining the classification accuracy.
7. A data sharing system, comprising:
a cloud server and a plurality of data holders;
a data demander among the plurality of data holders is configured to send a data request to the cloud server, wherein the data request indicates at least one target attribute to which the data requested by the data demander belongs, and the data request is encrypted with a private key of the data demander;
the cloud server is configured to perform the data sharing method according to any one of claims 1 to 6.
8. A data sharing apparatus, for use in a cloud server in a data sharing system, the data sharing system including the cloud server and a plurality of data holders, the apparatus comprising:
a request obtaining unit, configured to obtain a data request sent by a data demander, where the data request indicates at least one target attribute to which data requested by the data demander belongs, and the data request is a request encrypted by using a private key of the data demander, where the data demander belongs to the plurality of data holders;
a request re-encryption unit, configured to determine, for each data holder, a first re-encryption key set by the data demander for the data holder, re-encrypt the data request with the first re-encryption key, and send the re-encrypted data request to the data holder, so that the data holder decrypts the re-encrypted data request based on its private key;
a table obtaining unit, configured to obtain a holding condition table fed back by each data holder, where the holding condition table fed back by the data holder includes: the data holding condition of the data holding party for each target attribute;
a list determining unit, configured to determine, based on a holding condition table fed back by each data holder, a holder list of each target attribute, where the holder list of each target attribute includes: each candidate data holder holding data under the target attribute among the plurality of data holders;
a provider determination unit configured to determine, for each target attribute, a data provider for providing the data under the target attribute to the data demander from among candidate data holders included in a holder list of the target attribute;
A data request unit, configured to request, for each target attribute, target data corresponding to the target attribute from a data provider corresponding to the target attribute;
and a data re-encryption unit, configured to re-encrypt the target data based on a second re-encryption key set by the data provider for the data demander after obtaining the target data corresponding to the target attribute returned by the data provider, and return the re-encrypted target data to the data demander.
9. The apparatus of claim 8, wherein the data holder stores a data table comprising: a common classification attribute and a plurality of private attributes, wherein the data table comprises a plurality of records, each record corresponds to a record identifier, and the target attribute belongs to the plurality of private attributes;
the cloud server stores a plurality of record identifiers corresponding to a plurality of records in the data table;
the provider determination unit includes:
an identifier extracting unit, configured to extract at least one target record identifier from the plurality of record identifiers;
a pseudo attribute generating unit, configured to generate a pseudo attribute value pair sequence for the target attribute and the common classification attribute, where the pseudo attribute value pair sequence includes: at least one pair of pseudo attribute values, each pseudo attribute value pair comprising: a pseudo attribute value generated for the target attribute and a pseudo attribute value generated for the common classification attribute;
A sequence transmitting unit, configured to transmit the at least one target record identifier and the pseudo attribute value pair sequence to each candidate data holder in the holder list of the target attribute;
a gain obtaining unit, configured to obtain an information gain corresponding to the target attribute returned by the candidate data holder, where the information gain is an information gain of the target attribute determined by the candidate data holder based on at least one real attribute value pair corresponding to the at least one target record identifier and a false attribute value pair converted from the at least one pseudo attribute value pair, where the real attribute value pair includes an attribute value of the target attribute and an attribute value of the common classification attribute in a record corresponding to the target record identifier in the candidate data holder;
a candidate determining unit, configured to determine at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by each candidate data holder;
and the provider selection unit is used for determining a data provider for providing the data under the target attribute to the data demander from the at least one candidate data provider.
10. The apparatus of claim 9, wherein the candidate determining unit includes:
a pending party determining subunit, configured to determine, based on distribution characteristics of information gains returned by each candidate data holder, at least one pending data holder whose information gain is in a distribution interval that satisfies a setting condition, where the number of information gains in the distribution interval that satisfies the setting condition is greater than the number of information gains in other distribution intervals;
a number increasing subunit for increasing the number of repetitions by one;
a repeated triggering subunit, configured to return to execute the operation of the identifier extraction unit if the repetition number does not reach the set number;
and the candidate party determining subunit is used for determining a preset number of candidate data providers with higher occurrence frequency from at least one pending data holder obtained each time if the repetition number reaches the set number.
CN202210036998.5A 2022-01-13 2022-01-13 Data sharing method, device and system Active CN114386072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210036998.5A CN114386072B (en) 2022-01-13 2022-01-13 Data sharing method, device and system

Publications (2)

Publication Number Publication Date
CN114386072A CN114386072A (en) 2022-04-22
CN114386072B true CN114386072B (en) 2024-04-02

Family

ID=81202676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210036998.5A Active CN114386072B (en) 2022-01-13 2022-01-13 Data sharing method, device and system

Country Status (1)

Country Link
CN (1) CN114386072B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117454434B (en) * 2023-12-22 2024-02-23 北京天润基业科技发展股份有限公司 Database attribute statistics method and system based on secret sharing and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106533650A (en) * 2016-11-17 2017-03-22 浙江工商大学 Cloud-oriented interactive privacy protection method and system
KR20180101870A (en) * 2017-03-06 2018-09-14 고려대학교 산학협력단 Method and system for data sharing using attribute-based encryption in cloud computing
KR20190063193A (en) * 2017-11-29 2019-06-07 고려대학교 산학협력단 METHOD AND SYSTEM FOR DATA SHARING FOR INTERNET OF THINGS(IoT) MANAGEMENT IN CLOUD COMPUTING
CN110636500A (en) * 2019-08-27 2019-12-31 西安电子科技大学 Access control system and method supporting cross-domain data sharing and wireless communication system
JP6803598B1 (en) * 2020-08-04 2020-12-23 Eaglys株式会社 Data sharing systems, data sharing methods, and data sharing programs
CN112364376A (en) * 2020-11-11 2021-02-12 贵州大学 Attribute agent re-encryption medical data sharing method
CN113901512A (en) * 2021-09-27 2022-01-07 北京邮电大学 Data sharing method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
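A cloud data sharing scheme based on identity-based proxy re-encryption; 李明富; 陈立伟; 湘潭大学自然科学学报; 20170915(03); full text *
An efficient and flexible attribute-based encryption scheme for cloud computing; 刘会梦; 张光华; 庞少博; 石晓朦; 信息安全研究; 20171105(11); full text *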

Also Published As

Publication number Publication date
CN114386072A (en) 2022-04-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant