CN114386072B - Data sharing method, device and system

Info

Publication number: CN114386072B
Application number: CN202210036998.5A
Authority: CN
Inventors: 张信明; 吴青龙
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2022-01-13
Filing date: 2022-01-13
Publication date: 2024-04-02
Anticipated expiration: 2042-01-13
Also published as: CN114386072A

Abstract

This application provides a data sharing method, device and system. In the method, the cloud server obtains a data request sent by a data demander among multiple data holders, the data request indicating at least one target attribute; for each data holder, the cloud server re-encrypts the data request with the corresponding first re-encryption key and sends the re-encrypted data request to that data holder; based on the holding condition table fed back by each data holder, it determines the candidate data holders that hold data under each target attribute; for each target attribute, it determines a data provider from the candidate data holders of the target attribute and requests the target data corresponding to the target attribute from the data provider; after re-encrypting the target data returned by the data provider, it returns the re-encrypted target data to the data demander. The solution of this application can improve the privacy and security of data during the data sharing process.

Description

Data sharing method, device and system

Technical Field

The present disclosure relates to the field of data integration technologies, and in particular, to a data sharing method, device, and system.

Background

Data integration is an important link in the data fusion process: specific multi-source data must be integrated to support target mining, analysis and prediction tasks, and it is therefore widely applied in smart-city fields such as traffic, medical care, business and social activities. As a typical data integration scenario, data sharing involves multiple parties acting as data providers and a data demander, so that the demander can obtain enough data to mine effective information from big data.

In the task of data sharing, it is common practice to introduce a cloud service party, through which the data interaction process between the demand party and the plurality of data providers is coordinated. However, cloud service parties are not necessarily honest, and data privacy and security issues remain.

Disclosure of Invention

In view of this, the present application provides a method, an apparatus and a system for sharing data, so as to improve the privacy and security of data in the data sharing process.

In order to achieve the above object, the present application provides a data sharing method, which is applied to a cloud server in a data sharing system, where the data sharing system includes the cloud server and a plurality of data holders, and the method includes:

Obtaining a data request sent by a data demander, wherein the data request indicates at least one target attribute to which the data requested by the data demander belongs, the data request is a request encrypted with a private key of the data demander, and the data demander belongs to the plurality of data holders;

for each data holder, determining a first re-encryption key set by the data demander for the data holder, re-encrypting the data request by using the first re-encryption key, and sending the re-encrypted data request to the data holder, so that the data holder decrypts the re-encrypted data request based on a private key of the data holder;

obtaining a holding condition table fed back by each data holder, wherein the holding condition table fed back by a data holder comprises the data holding condition of that data holder for each target attribute;

determining a holder list of each target attribute based on the holding condition table fed back by each data holder, wherein the holder list of each target attribute comprises each candidate data holder, among the plurality of data holders, that holds data under the target attribute;

For each target attribute, determining a data provider for providing data under the target attribute to the data demander from candidate data holders contained in a holder list of the target attribute;

for each target attribute, requesting target data corresponding to the target attribute from a data provider corresponding to the target attribute;

and after obtaining the target data corresponding to the target attribute returned by the data provider, re-encrypting the target data based on a second re-encryption key set by the data provider for the data demander, and returning the re-encrypted target data to the data demander.

In one possible implementation manner, the data holder stores a data table that includes a public classification attribute and a plurality of private attributes, the data table comprises a plurality of records, each record corresponds to a record identifier, and the target attribute belongs to the plurality of private attributes;

the cloud server stores a plurality of record identifiers corresponding to a plurality of records in the data table;

the determining a data provider for providing the data under the target attribute to the data demander from the candidate data holders included in the holder list of the target attribute includes:

Extracting at least one target record identifier from the plurality of record identifiers;

generating a sequence of pseudo attribute value pairs for the target attribute and the common classification attribute, the sequence of pseudo attribute value pairs comprising: at least one pair of pseudo attribute values, each pseudo attribute value pair comprising: a pseudo attribute value generated for the target attribute and a pseudo attribute value generated for the common classification attribute;

transmitting the at least one target record identifier and the pseudo attribute value pair sequence to each candidate data holder in a holder list of the target attribute;

obtaining the information gain corresponding to the target attribute returned by the candidate data holder, wherein the information gain is determined by the candidate data holder based on at least one real attribute value pair corresponding to the at least one target record identifier and on the fake attribute value pairs converted from the at least one pseudo attribute value pair, and each real attribute value pair comprises the attribute value of the target attribute and the attribute value of the public classification attribute in the record corresponding to a target record identifier in the candidate data holder;

determining at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by the candidate data holders;

From the at least one candidate data provider, a data provider for providing the data under the target attribute to the data demander is determined.
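As a non-limiting illustration of the information gain mentioned above, the following minimal sketch assumes the standard Shannon information gain of the target attribute with respect to the public classification attribute; the formula is not given at this point in the text. The function names, the sample values, and the choice to simply concatenate real and fake attribute value pairs are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(pairs):
    """Information gain of the attribute value about the class value.

    Each pair is (target_attribute_value, classification_value).
    """
    classes = [cls for _, cls in pairs]
    groups = defaultdict(list)
    for value, cls in pairs:
        groups[value].append(cls)
    conditional = sum(len(g) / len(pairs) * entropy(g)
                      for g in groups.values())
    return entropy(classes) - conditional


# Hypothetical pairs held by one candidate data holder.
real_pairs = [("low", "no"), ("low", "no"), ("high", "yes"), ("high", "yes")]
# Hypothetical fake pairs converted from the pseudo pairs sent by the server.
fake_pairs = [("low", "yes"), ("high", "no")]

gain_real = information_gain(real_pairs)
gain_mixed = information_gain(real_pairs + fake_pairs)
```

In this toy data the real pairs alone separate the classes perfectly, and mixing in the fake pairs lowers the reported gain.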

In yet another aspect, the present application further provides a data sharing system, including:

a cloud server and a plurality of data holders;

the data demander among the plurality of data holders is configured to send a data request to the cloud server, wherein the data request indicates at least one target attribute to which the data requested by the data demander belongs, and the data request is encrypted with a private key of the data demander;

the cloud server is configured to execute the data sharing method described in any one embodiment of the present application.

In still another aspect, the present application further provides a data sharing device, which is applied to a cloud server in a data sharing system, where the data sharing system includes the cloud server and a plurality of data holders, and the device includes:

a request obtaining unit, configured to obtain a data request sent by a data demander, where the data request indicates at least one target attribute to which data requested by the data demander belongs, and the data request is a request encrypted by using a private key of the data demander, where the data demander belongs to the plurality of data holders;

A request re-encryption unit, configured to determine, for each data holder, a first re-encryption key set by the data demander for the data holder, re-encrypt the data request with the first re-encryption key, and send the re-encrypted data request to the data holder, so that the data holder decrypts the re-encrypted data request based on its private key;

a table obtaining unit, configured to obtain a holding condition table fed back by each data holder, where the holding condition table fed back by the data holder includes: the data holding condition of the data holding party for each target attribute;

a list determining unit, configured to determine, based on a holding condition table fed back by each data holder, a holder list of each target attribute, where the holder list of each target attribute includes: each candidate data holder holding data under the target attribute among the plurality of data holders;

a provider determination unit configured to determine, for each target attribute, a data provider for providing the data under the target attribute to the data demander from among candidate data holders included in a holder list of the target attribute;

A data request unit, configured to request, for each target attribute, target data corresponding to the target attribute from a data provider corresponding to the target attribute;

and a data re-encryption unit, configured to re-encrypt the target data based on a second re-encryption key set by the data provider for the data demander after obtaining the target data corresponding to the target attribute returned by the data provider, and return the re-encrypted target data to the data demander.

As can be seen from the above, in the embodiment of the present application, the data request sent by the data demander to the cloud server is encrypted with the data demander's private key, and the cloud server re-encrypts the data request based on proxy re-encryption and forwards the re-encrypted data request to each data holder, so that the cloud server cannot obtain the target attribute to which the requested data belongs; in addition, the holding condition table returned by each data holder to the cloud server contains only the data holder's holding condition for each target attribute, so that the cloud server cannot acquire information about the target attributes, which improves the privacy and security of the requested data in the data sharing process.

In addition, after the cloud server requests the data under a target attribute from the determined data provider, the data provider likewise encrypts that data with its private key, and the cloud server only needs to re-encrypt the data and forward it to the data demander, without decrypting the target data, so that the cloud server cannot learn the data content that the data demander requested to share, which improves the privacy and security of the shared data.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a data sharing method according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of constructing a data sharing system according to an embodiment of the present application;

Fig. 3 is a schematic flow interaction diagram of a data sharing method according to an embodiment of the present application;

Fig. 4 is a schematic flow chart of determining candidate data providers in the data sharing method according to an embodiment of the present application;

Fig. 5 is a schematic flow chart of determining at least one pending data holder of a target attribute according to an embodiment of the present application;

Fig. 6 is a schematic flow chart of a data provider generalizing data under a target attribute according to an embodiment of the present application;

Fig. 7 is a schematic diagram of the composition structure of a data sharing device according to an embodiment of the present application.

Detailed Description

The scheme of the embodiment of the application is suitable for data sharing among a plurality of data parties. It uses the cloud server as the proxy and realizes data sharing based on proxy re-encryption, so as to protect privacy and improve data security during the data sharing process.

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without undue burden, are within the scope of the present application.

The data sharing system comprises the cloud server and a plurality of data holders. Any data holder can act as a data demander and request data from other data holders through the cloud server, so that the data demander can obtain data that it does not itself hold, thereby achieving data sharing. The cloud server may determine, from among the plurality of data holders, a data provider for providing data to the data demander; in that case the data holder provides the corresponding data to the data demander as the data provider.

The data sharing method of the present application is described below with reference to flowcharts.

Fig. 1 shows a flow chart of a data sharing method provided in an embodiment of the present application, where the method of the embodiment may be applied to a cloud server in a data sharing system. The method of the present embodiment may include the steps of:

S101, obtaining a data request sent by a data demander.

The data demander is any one of the plurality of data holders.

The data request indicates at least one target attribute to which the data requested by the data demander belongs. For example, if the data demander does not hold data under attribute A and attribute B but wishes to obtain that data from other data holders, the data request sent by the data demander may indicate attribute A and attribute B, thereby expressing the requirement for data under attribute A and attribute B.

Because data sharing is realized based on proxy re-encryption, the data request is encrypted with the private key of the data demander, i.e. the data demander encrypts the data request using what is commonly called second-level encryption.

S102, determining a first re-encryption key set by the data demand party for each data holder, re-encrypting the data request by using the first re-encryption key, and sending the re-encrypted data request to the data holder.

In this application, to hide the identity of the data demander, the cloud server treats all data holders as recipients of the data request.

For each data holder, the cloud server stores the re-encryption key set by the data demander for that data holder; for ease of distinction, this key is referred to as the first re-encryption key. Accordingly, the cloud server may re-encrypt the data request based on the first re-encryption key corresponding to the data holder, and the resulting re-encrypted data request can be decrypted with the private key of that data holder.

Accordingly, after obtaining the re-encrypted data request, the data holder may decrypt the re-encrypted data request based on the private key of the data holder.

Therefore, the cloud server does not need to decrypt the data request and cannot acquire the attribute corresponding to the data requested to be shared by the data demander, so that the data security in the data sharing process can be improved, and the data privacy is also ensured.
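The re-encrypt-and-forward step can be illustrated with a toy sketch. The application does not name a concrete proxy re-encryption scheme, so the code below assumes a textbook ElGamal-style bidirectional scheme (in the style of Blaze, Bleumer and Strauss), in which the sender encrypts under its own public key and the re-encryption key is derived from both private keys. The group parameters are deliberately tiny, all function names are hypothetical, and the sketch is not secure or suitable for real use.

```python
import secrets

# Toy group parameters (assumption, far too small for real use):
# P is a safe prime, Q = (P - 1) // 2 is prime, G generates the order-Q subgroup.
P = 2039
Q = 1019
G = 4


def keygen():
    """Return (private_key, public_key) for one party."""
    sk = secrets.randbelow(Q - 1) + 1          # 1 <= sk <= Q - 1
    return sk, pow(G, sk, P)


def encrypt(pk, message):
    """Produce a ciphertext that the proxy is able to re-encrypt."""
    k = secrets.randbelow(Q - 1) + 1
    return (message * pow(G, k, P)) % P, pow(pk, k, P)


def rekey(sk_from, sk_to):
    """Re-encryption key sk_to / sk_from mod Q (needs both keys in this toy scheme)."""
    return (sk_to * pow(sk_from, -1, Q)) % Q


def reencrypt(rk, ciphertext):
    """Step run by the cloud server; it only transforms the second component."""
    c1, c2 = ciphertext
    return c1, pow(c2, rk, P)


def decrypt(sk, ciphertext):
    c1, c2 = ciphertext
    g_k = pow(c2, pow(sk, -1, Q), P)           # recover G^k
    return (c1 * pow(g_k, -1, P)) % P


sk_demander, pk_demander = keygen()
sk_holder, pk_holder = keygen()

request = 1234                                  # hypothetical integer-encoded request, < P
ciphertext = encrypt(pk_demander, request)
first_rekey = rekey(sk_demander, sk_holder)     # stored on the cloud server
forwarded = reencrypt(first_rekey, ciphertext)  # cloud server never decrypts
recovered = decrypt(sk_holder, forwarded)
```

The data holder recovers the request with its own private key, while the cloud server handles only ciphertexts and the re-encryption key.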

S103, obtaining a holding condition table fed back by each data holder.

The holding condition table fed back by a data holder includes the data holding condition of that data holder for each target attribute.

Specifically, the data holding condition takes one of two values: holding data or not holding data. If the data holding condition of the data holder for a target attribute is holding data, the data holder has data under that target attribute; otherwise, if it is not holding data, the data holder has no data under that target attribute.

It will be appreciated that the holding condition table generated by the data holder contains only the data holding condition for the at least one target attribute and does not contain information about the at least one target attribute itself.

For example, the holding case table generated by the data holder may be in the form of a binary sequence having the same number of bits as the total number of the at least one target attribute. And each bit in the binary sequence corresponds to a target attribute, and the value on each bit is used for representing the data holding condition of the data holder for the target attribute on the bit. The description will be made later in connection with the form of binary sequences, and will not be repeated here.

S104, determining a holder list of each target attribute based on the holding condition table fed back by each data holder.

Wherein the holder list of each target attribute includes: each candidate data holder holding data under the target attribute among a plurality of data holders.

For example, assuming that the holding condition table fed back by data holder A indicates that it holds data under the second target attribute, and the holding condition table fed back by data holder B indicates that it also holds data under the second target attribute, then the holder list of the second target attribute includes data holder A and data holder B.

S105, for each target attribute, a data provider for providing data under the target attribute to the data demander is determined from among the candidate data holders included in the holder list of the target attribute.

In the present application, the data provider for providing the data under the target attribute to the data demander may be determined from the plurality of candidate data holders in many ways, without limitation. One implementation for determining the data provider corresponding to a target attribute is described later and is not detailed here.

S106, for each target attribute, requesting target data corresponding to the target attribute from a data provider corresponding to the target attribute.

Only one data provider is determined for each target attribute; accordingly, that data provider provides only the target data corresponding to the target attribute.

S107, after obtaining the target data corresponding to the target attribute returned by the data provider, re-encrypting the target data based on the second re-encryption key set by the data provider for the data demander, and returning the re-encrypted target data to the data demander.

It will be appreciated that each target attribute corresponds to one data provider, and the data provider returns the target data for the target attribute to which it corresponds. In the application, the target data may be data obtained by generalizing the data under the target attribute, so that the data provider provides only part of the data content under the target attribute to the data demander, thereby improving data privacy.

In the present application, the cloud server may store therein a re-encryption key set by each data holder for any one of the other data holders. Here, for convenience of distinction, the re-encryption key set by the data provider for the data demander is referred to as a second re-encryption key.

Correspondingly, after the cloud server re-encrypts the target data by using the second re-encryption key and sends it to the data demander, the data demander can decrypt the re-encrypted target data by using its own private key, thereby obtaining the required data content associated with the target attribute.

As can be seen from the above, the data request sent by the data demander to the cloud server is encrypted with the data demander's private key, and the cloud server re-encrypts the data request based on proxy re-encryption and forwards it to each data holder, so that the cloud server cannot obtain the target attribute to which the requested data belongs; in addition, the holding condition table returned by each data holder to the cloud server contains only the data holder's holding condition for each target attribute, not information about the target attributes themselves, so that the cloud server cannot acquire information about the target attributes, which improves the privacy and security of the requested data in the data sharing process.

In order to facilitate an understanding of the aspects of the present application, a brief description of the process of constructing a data sharing system follows.

Referring to fig. 2, which is a schematic flow chart illustrating a data sharing system construction in an embodiment of the present application, a method of the present embodiment may include:

S201, based on an authentication institution that provides an electronic authentication service, identity authentication is performed on the cloud server and the plurality of data holders that participate in data sharing.

The electronic authentication service, commonly called CA (Certificate Authority) authentication service, provides authenticity and reliability verification for the parties involved in electronic signatures.

The authentication institution providing the electronic authentication service may be an authority responsible for issuing and managing digital certificates.

In the application, the authentication of the cloud server and the data holders in the CA authentication process may include identity authentication and other related checks; the specific authentication content and process can be set as required and are not limited by the application.

S202, for each data holder, the data holder sends the re-encryption key between the data holder and each other data holder to the cloud server, respectively.

In this application, the cloud server, as a proxy in proxy re-encryption, may store a re-encryption key between each data holder and any one of the other data holders.

S203, the data held by the data holders is unified into vertically aligned two-dimensional data tables based on a privacy-preserving set union algorithm.

The privacy-preserving set union algorithm unifies the data held by each of the plurality of data holders into data tables with the same format, so that the data table held by each data holder comprises a public classification attribute and a plurality of private attributes.

In the two-dimensional data table, each column is an attribute and each row is a record, and each record corresponds to a record identifier (ID). For example, the set of attributes that data holder $i$ holds in its data table may be represented as $D_{DP_i} = (A_{cls}, A_{i_1}, A_{i_2}, \ldots, A_{i_d})$, where $A_{cls}$ represents the public classification attribute and $A_{i_1}$ to $A_{i_d}$ are all private attributes; $d$ is the total number of private attributes and is a natural number greater than 0; $i$ is a natural number not less than 1 and not greater than $T$; and $T$ is the total number of data holders.

The public classification attribute is an attribute for which every data holder has attribute values, i.e. each data holder holds values belonging to the public classification attribute. The public classification attribute is therefore an attribute that all data holders possess and whose values they can all see.

In addition, the public classification attribute is the attribute that each data holder uses as the basis for data classification. For example, the category to which a record, or the entity described by the record, belongs is determined according to whether the value under the public classification attribute in the record is larger than a set value.

Unlike the public classification attribute, not every data holder holds values under a given private attribute. Because the values under certain private attributes are not visible to every data holder, a data holder may need to request them from other data holders.

For example, the public classification attribute in a two-dimensional data table may be a name, while the private attributes may include gender, age, education, occupation, and so on. Each row in the data table held by a certain data holder has a name value, but that data table may contain values for only two attributes, gender and age, and no values for attributes such as education and occupation.

It is to be understood that step S203 is only one form of holding data by the data holder, and that in practical applications, the specific form of the data in the data table held by the data holder may be other, which is not limited.

S204, the cloud server initializes the trust degree of each data holder.

For example, the cloud server initializes the trust degree $TS_i$ of each data holder $i$ to $TS_i = 5.0$, where the trust degree of each data holder lies in the range $TS_i \in [0.0, 10.0]$.

Determining the trust degree of the data holders is an optional operation whose purpose is to improve the reliability of data sharing.
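A minimal sketch of the trust initialization in S204 follows. The initial value 5.0 and the range [0.0, 10.0] come from the text; the holder identifiers and the `clamp_trust` helper are hypothetical, since how the trust degree is later updated is not described at this point.

```python
TS_INITIAL = 5.0
TS_MIN, TS_MAX = 0.0, 10.0


def init_trust(holder_ids):
    """Give every data holder the same initial trust degree."""
    return {holder_id: TS_INITIAL for holder_id in holder_ids}


def clamp_trust(value):
    """Hypothetical helper: keep a trust degree inside [TS_MIN, TS_MAX]."""
    return min(TS_MAX, max(TS_MIN, value))


trust = init_trust(["DP_1", "DP_2", "DP_3"])
```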

In addition, in the present application, a plurality of record identifiers corresponding to a plurality of records in a data table maintained in a data holder are stored in a cloud server.

The foregoing introduces a data sharing system and its construction in order to facilitate understanding of the data sharing method of the present application. It will be appreciated that the composition of the data sharing system and the form in which the data holders hold data may vary; the above merely illustrates one possible form, and other forms are equally applicable to the present application.

To facilitate understanding of how the data provider of a certain target attribute is determined in the solution of the present application, the following describes one possible case from the perspective of the interaction among the data demander, each data holder and the cloud server.

As shown in fig. 3, which is a schematic flow interaction diagram of a data sharing method provided in the embodiment of the present application, the method of the present embodiment may include:

S301, the data demander sends a data request to the cloud server.

The data request indicates at least one target attribute to which the data requested by the data demander belongs. Each target attribute indicated by the data request belongs to the aforementioned plurality of private attributes.

For example, the data request may carry information on each target attribute for which data sharing is requested, and the set of attributes to be shared in the data sharing task is $ReqA = (ReqA_1, ReqA_2, \ldots, ReqA_m)$, where $ReqA_x$ is any one of the target attributes indicated in the data request, $x$ ranges from 1 to $m$, and $m$ is the total number of target attributes indicated in the data request.

The data demander is any one of a plurality of data holders of the data sharing system.

S302, for each data holder in the data sharing system, the cloud server re-encrypts the data request based on a first re-encryption key set by the data demand side for the data holder, and sends the re-encrypted data request to the data holder.

S303, the data holder decrypts the re-encrypted data request by using the private key of the data holder, and determines the information of the at least one target attribute based on the decrypted data request.

S304, the data holder sends a holding condition table for the at least one target attribute to the cloud server according to its data holding condition for each private attribute.

The holding condition table includes the data holding condition of the data holder for each target attribute.

For example, in one alternative, the holding condition table generated by the data holder may be in the form of a binary sequence having the same number of bits as the total number of the at least one target attribute. Each bit in the binary sequence corresponds to a target attribute, and the value of each bit represents the data holding condition of the data holder for the target attribute at that bit.

For example, based on its data holding condition for the private attributes $A_{i_1}, A_{i_2}, \ldots, A_{i_d}$ in its own $D_{DP_i}$, data holder $i$ generates, for the requested attribute set $ReqA = (ReqA_1, ReqA_2, \ldots, ReqA_m)$, the binary sequence $B_{DP_i} = (b_{i_1}, b_{i_2}, \ldots, b_{i_m})$.

Here $b_{i_x}$ represents the data holding condition of data holder $i$ for the target attribute $ReqA_x$, with $b_{i_x} \in \{0, 1\}$. If $b_{i_x} = 0$, the data is not held, i.e. data holder $i$ does not hold data under the target attribute $ReqA_x$; if $b_{i_x} = 1$, the data is held, i.e. data holder $i$ holds data under the target attribute $ReqA_x$.

Because the binary sequence only contains the holding condition of the data holder on each target attribute in different positions, the cloud server can only obtain the total number of the target attributes and the holding condition of the data holder on the target attributes in each position according to the binary sequence, and can not obtain the specific attribute information requested by the data demander.
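The binary-sequence form of the holding condition table described above can be sketched as follows. The function name and the sample attribute names are assumptions for illustration; only the bit semantics (1 for held, 0 for not held, one bit per requested attribute, in request order) come from the text.

```python
def holding_table(requested_attributes, held_attributes):
    """Return one bit per requested attribute: 1 if held, 0 otherwise."""
    held = set(held_attributes)
    return [1 if attribute in held else 0 for attribute in requested_attributes]


# Hypothetical request ReqA and the private attributes held by data holder i.
req_a = ["education", "occupation", "age"]
held_by_i = ["gender", "age"]

b_dp_i = holding_table(req_a, held_by_i)  # [0, 0, 1]
```

Only the bit pattern is sent to the cloud server; the attribute names stay with the data holder.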

S305, the cloud server determines a holder list of each target attribute based on the holding condition table fed back by each data holder.

Wherein the holder list of the target attribute includes information of each data holder holding the data under the target attribute.

For example, for the xth target attribute (or each target attribute ReqA _x ) Its holder list Cand _x Can be expressed as follows:

Cand _x ＝(DP _{x_1} ,DP _{x_2} ,...,DP _{x_k} )

where $DP_{x_t}$ denotes the t-th data holder holding data under the x-th target attribute, t is any natural number from 1 to k, and k is the total number of data holders holding data under the x-th target attribute.
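The construction of the holder lists from the returned binary sequences can be sketched as follows. The function name `build_holder_lists` and the holder identifiers are hypothetical; this is only one straightforward way to realize step S305.

```python
def build_holder_lists(status_tables, num_attributes):
    """status_tables maps a holder id to its binary sequence.

    Returns a list whose x-th entry is the holder list of the
    x-th target attribute (0-based position in the sequence).
    """
    holder_lists = [[] for _ in range(num_attributes)]
    for holder_id, bits in sorted(status_tables.items()):
        for x, bit in enumerate(bits):
            if bit == 1:
                holder_lists[x].append(holder_id)
    return holder_lists


tables = {"DP1": [1, 0], "DP2": [1, 1]}
print(build_holder_lists(tables, 2))  # [['DP1', 'DP2'], ['DP2']]
```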

S306, for each target attribute, the cloud server extracts at least one target record identifier from a plurality of record identifiers, generates a pseudo attribute value pair sequence for the target attribute and the public classification attribute, and sends the at least one target record identifier and the pseudo attribute value pair sequence to each candidate data holder in a holder list of the target attribute.

For example, the cloud server may randomly extract a first set proportion of record identifiers from the plurality of record identifiers as the target record identifiers.

Wherein for each target attribute, the sequence of pseudo attribute value pairs comprises: at least one pair of pseudo attribute values, each pseudo attribute value pair comprising: the cloud server generates a pseudo attribute value for the target attribute and a pseudo attribute value for the common classification attribute.

The number of pseudo attribute value pairs in the pseudo attribute value pair sequence may be set as needed. For example, it may be a second set proportion of the total number of target record identifiers, where the first set proportion and the second set proportion are different.

The pseudo attribute value pairs that the cloud server generates for the target attribute and the common classification attribute are not real attribute values; they are values generated solely to mask the real attribute values under the target attribute.
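A minimal sketch of the cloud-server side of step S306 follows. The function name `sample_round`, the use of 32-bit random integers as pseudo attribute values, and the example proportions are assumptions made for illustration; the application does not prescribe them.

```python
import random


def sample_round(record_ids, id_ratio, pair_ratio, rng):
    """Pick target record identifiers and generate pseudo attribute value pairs.

    id_ratio   -- first set proportion (fraction of record identifiers sampled)
    pair_ratio -- second set proportion (pseudo pairs per sampled identifier)
    """
    num_ids = min(len(record_ids), max(1, round(len(record_ids) * id_ratio)))
    target_ids = rng.sample(record_ids, num_ids)
    num_pairs = max(1, round(num_ids * pair_ratio))
    # Each pair holds an opaque value for the target attribute and one for
    # the common classification attribute (assumed: random 32-bit integers).
    pseudo_pairs = [(rng.getrandbits(32), rng.getrandbits(32))
                    for _ in range(num_pairs)]
    return target_ids, pseudo_pairs


ids, pairs = sample_round(list(range(100)), 0.2, 0.1, random.Random(7))
print(len(ids), len(pairs))  # 20 2
```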

S307, for each target attribute, the candidate data holder obtains at least one real attribute value pair corresponding to at least one target record identifier, converts the at least one pseudo attribute value pair into at least one false attribute value pair, and determines the information gain generated by the target attribute according to the at least one real attribute value pair and the at least one false attribute value pair.

For each target attribute, the real attribute value pair includes an attribute value of the target attribute and an attribute value of the common classification attribute in a record corresponding to the target record identifier in the candidate data holder. That is, the attribute value in the real attribute value pair is the attribute value of the target attribute and the common classification attribute of the real record in each record corresponding to each target record identifier in the candidate data holder.

The purpose of converting a pseudo attribute value pair is to convert the pseudo attribute value of the target attribute in the pair into a false attribute value within the value range of the target attribute, and to convert the pseudo attribute value of the common classification attribute into a false attribute value within the value range of the common classification attribute.

The value range of the target attribute refers to the set of values that data under the target attribute may take. For example, the value range of the attribute gender may include male and female. The value range of the common classification attribute is defined similarly and is not described again.

For example, in combination with the value range of the target attribute and according to a set conversion relation or conversion mode, the target value in the value range to which the pseudo attribute value of the target attribute corresponds can be determined, and that target value is taken as the false attribute value of the target attribute. The pseudo attribute value of the common classification attribute is converted similarly, which is not described in detail.
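One possible conversion relation is sketched below. Indexing the value range by the pseudo value modulo its size is an assumption chosen for illustration, since the application leaves the conversion mode open; `to_false_pair` and the example value ranges are hypothetical.

```python
def to_false_pair(pseudo_pair, target_domain, class_domain):
    """Map a pseudo attribute value pair into the attributes' value ranges.

    Assumed conversion: use each integer pseudo value modulo the size of
    the corresponding value range as an index into that range.
    """
    pseudo_target, pseudo_class = pseudo_pair
    false_target = target_domain[pseudo_target % len(target_domain)]
    false_class = class_domain[pseudo_class % len(class_domain)]
    return false_target, false_class


print(to_false_pair((5, 7), ["male", "female"], ["low", "mid", "high"]))
# ('female', 'mid')
```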

The specific manner of determining the information gain brought by the target attribute based on the mixed data of the at least one real attribute value pair and the at least one false attribute value pair can be any method for calculating the information gain at present, which is not limited.

For example, data holder i may calculate the information gain generated by the target attribute $ReqA_x$ using the following Equation One:

$$IG(ReqA_x) = H(A_{cls}) - H\langle A_{cls} \mid ReqA_x \rangle$$

where $H(A_{cls})$ is the information entropy of the common classification attribute, and $H\langle A_{cls} \mid ReqA_x \rangle$ is the information entropy of the common classification attribute conditioned on the target attribute $ReqA_x$. Any existing information entropy calculation method may be used, which is not limited.

In this application, it is considered that an information gain computed from the real data under the target attribute may itself expose information about that data, so sending such information gains directly to the cloud server would still carry a risk of data leakage. By mixing some false data into the calculation of the information gain of the target attribute, the risk of exposing the data under the target attribute through the information gain can be reduced.
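The calculation on mixed real and false pairs can be sketched as follows, using the standard Shannon entropy with base-2 logarithms. The function names and sample data are illustrative assumptions; the application allows any entropy calculation method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(pairs):
    """pairs: list of (target_value, class_value) tuples.

    Returns H(class) minus the entropy of class conditioned on target.
    """
    classes = [c for _, c in pairs]
    groups = {}
    for target_value, class_value in pairs:
        groups.setdefault(target_value, []).append(class_value)
    conditional = sum(len(g) / len(pairs) * entropy(g)
                      for g in groups.values())
    return entropy(classes) - conditional


real_pairs = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
false_pairs = [("a", "y")]  # hypothetical converted pseudo pair
print(information_gain(real_pairs))                # gain on real data only
print(information_gain(real_pairs + false_pairs))  # gain actually reported
```

In this toy input the real data alone yields a gain of 1 bit, while the mixed data yields a smaller, perturbed value, which is the masking effect the paragraph above describes.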

S308, the candidate data holder sends the information gain of the target attribute to the cloud server.

S309, for each target attribute, the cloud server determines at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by the candidate data holders.

Among them, for convenience of distinction, a candidate data holder selected from among the candidate data holders is referred to as a candidate data provider.

It can be understood that the information gain partially reflects the gain of the real attribute values of the target attribute. Combining the information gains returned by all candidate data holders, and following the principle that most of the returned information gains are genuine, the small number of candidate data holders whose information gains deviate from the main body of the gain distribution can be removed. The candidate data providers whose information gains reflect more of the real data are thereby screened out, so that reliable candidate data providers can be selected.

For ease of understanding, one implementation of determining candidate data providers is described later with reference to the flow of Fig. 4 and is not elaborated here.

S310, for each target attribute, the cloud server determines, from at least one candidate data provider corresponding to the target attribute, a data provider for providing data under the target attribute to the data demander.

For example, the cloud server may randomly select one candidate data provider from at least one candidate data provider corresponding to the target attribute as the data provider of the target attribute.

In one alternative, the cloud server may also maintain a score of the trust level of the respective data holders. On this basis, for each target attribute, after each candidate data provider is determined, the trust level of the candidate data provider may be increased by a set value, and the trust level of each data holder not belonging to the candidate data provider in the holder list of the target attribute may be decreased by the set value.

Accordingly, one candidate data provider with the trust degree larger than the set threshold value can be selected from the at least one candidate data provider as the data provider for providing the data under the target attribute to the data demander. For example, a candidate data provider with a confidence level greater than a set threshold is randomly selected as the data provider.
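The trust-based alternative can be sketched as follows. The function names, the default trust value of 0.0 for unseen holders, and the example numbers are assumptions for illustration only.

```python
import random


def update_trust(trust, holder_list, candidate_providers, step):
    """Raise trust of candidate providers, lower trust of the other holders."""
    for holder in holder_list:
        delta = step if holder in candidate_providers else -step
        trust[holder] = trust.get(holder, 0.0) + delta


def select_provider(trust, candidate_providers, threshold, rng):
    """Randomly pick one candidate whose trust exceeds the threshold."""
    eligible = [p for p in candidate_providers
                if trust.get(p, 0.0) > threshold]
    return rng.choice(eligible) if eligible else None


trust = {"DP1": 0.5, "DP2": 0.5, "DP3": 0.5}
update_trust(trust, ["DP1", "DP2", "DP3"], {"DP1", "DP2"}, 0.1)
chosen = select_provider(trust, ["DP1", "DP2"], 0.55, random.Random(0))
print(trust, chosen)
```

Returning `None` when no candidate exceeds the threshold is a design choice of this sketch; the application does not state what happens in that case.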

S311, for each target attribute, the cloud server requests target data corresponding to the target attribute from a data provider corresponding to the target attribute.

S312, after obtaining the target data corresponding to the target attribute returned by the data provider, the cloud server re-encrypts the target data based on the second re-encryption key set by the data provider for the data demander, and returns the re-encrypted target data to the data demander.

The above steps S311 and S312 may be referred to the related description of the previous embodiments, and are not described herein.

In the present application, when selecting candidate data providers, the cloud server could otherwise derive some data of a target attribute from the information gain of that attribute, creating a risk of data leakage. To reduce this risk, for each target attribute requested by the data demander, the cloud server generates pseudo attribute value pairs corresponding to the target attribute and the common classification attribute. On this basis, each candidate data holder converts the pseudo attribute value pairs into false attribute value pairs for the target attribute and the common classification attribute, and generates the information gain of the target attribute from both the false attribute value pairs and the real attribute value pairs. The information gain therefore no longer reflects only the real data of the target attribute, which reduces the possibility that the cloud server recovers the real data from the information gain and thus reduces the risk of data leakage.

To facilitate an understanding of the implementation of determining candidate data providers based on information gain in the present application, the following is described in connection with one possible implementation:

referring to fig. 4, a schematic diagram of an implementation flow of determining candidate data providers in the data sharing method provided in the embodiment of the present application is shown, where the method of the embodiment may be applied to a cloud server, and the flow is a flow of determining candidate data providers for a target attribute, and the flow includes:

S401, for a target attribute, at least one target record identifier is extracted from a plurality of record identifiers, a pseudo attribute value pair sequence is generated for the target attribute and the common classification attribute, and the at least one target record identifier and the pseudo attribute value pair sequence are sent to each candidate data holder in the holder list of the target attribute.

For example, assuming that the set number of repetitions for determining the pending data holders (i.e., the number of times steps S401 to S405 are repeated) is h, the first set proportion for extracting the target record identifiers may be 2/h; correspondingly, 2n/h target record identifiers may be randomly extracted from the plurality of record identifiers, where n is the total number of the plurality of record identifiers.

Similarly, 0.2n/h pseudo attribute value pairs may be included in the generated sequence of pseudo attribute value pairs.

S402, obtaining information gain of the target attribute returned by each candidate data holder in the holder list of the target attribute.

The candidate data holder determines the information gain of the target attribute based on at least one real attribute value pair corresponding to the at least one target record identifier and at least one false attribute value pair converted from the at least one pseudo attribute value pair.

The above two steps can be referred to the related description of the previous embodiments, and will not be repeated here.

S403, determining at least one pending data holder with information gain in a distribution section meeting a set condition based on the distribution characteristics of the information gain returned by each candidate data holder.

The distribution interval satisfying the set condition contains more information gains than any other distribution interval. The information gains in this interval are those lying in the main body of the distribution of the information gains returned by the candidate data holders, i.e., the information gains that best represent the real attribute values of the target attribute.

For example, in one possible implementation, as shown in fig. 5, there is shown a process for determining at least one pending data holder, the process including:

S51, determining the maximum information gain and the minimum information gain among the information gains returned by the candidate data holders.

S52, constructing a gain distribution total interval formed by the minimum information gain and the maximum information gain, and determining the gain distribution total interval as a target distribution interval to be processed.

S53, determining a first middle partition point and a second middle partition point which trisect the target distribution interval.

The information gain value corresponding to the first middle dividing point is smaller than the information gain value corresponding to the second middle dividing point.

S54, the numbers of information gains contained in the first gain distribution interval and in the second gain distribution interval are counted respectively.

The first gain distribution section is a distribution section formed by the minimum information gain and the second intermediate division point, and the second gain distribution section is a distribution section formed by the first intermediate division point and the maximum information gain.

For example, taking the gain distribution total interval formed by the minimum and maximum information gains as an example, assume that the minimum information gain is $ig_l$ and the maximum information gain is $ig_r$. The first middle partition point trisecting the interval $[ig_l, ig_r]$ is $ig_l + \frac{ig_r - ig_l}{3}$, and the second middle partition point is $ig_l + \frac{2(ig_r - ig_l)}{3}$.

Correspondingly, the first gain distribution interval is $\left[ig_l,\ ig_l + \frac{2(ig_r - ig_l)}{3}\right]$, and the second gain distribution interval is $\left[ig_l + \frac{ig_r - ig_l}{3},\ ig_r\right]$.

S55, the candidate gain distribution interval containing the larger number of information gains is determined from the first gain distribution interval and the second gain distribution interval.

In the present application, whichever of the first and second gain distribution intervals contains more information gains is retained; for ease of distinction, the retained interval is referred to as the candidate gain distribution interval.

S56, detecting whether the quantity of information gains contained in the candidate gain distribution interval is smaller than a set threshold value or the interval range of the candidate gain distribution interval is smaller than a set range interval, if so, determining that at least one candidate data holder with the information gain in the candidate gain distribution interval is at least one pending data holder; if not, the candidate gain distribution section is determined as the target distribution section, and the process returns to step S53.

For the holder list $Cand_x$ of a target attribute $ReqA_x$, repeating steps S53 to S56 progressively removes candidate data holders, finally yielding a set of pending data holders.
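Steps S51 to S56 can be sketched as the following loop. The function name `pending_holders`, the tie-breaking rule (keep the first interval on equal counts), and the sample gains are assumptions; the application only requires keeping the interval with more information gains.

```python
def pending_holders(gains, min_count, min_width):
    """Narrow the gain interval by repeated overlapping trisection.

    gains     -- maps holder id to its returned information gain
    min_count -- stop once the kept interval holds fewer gains than this
    min_width -- stop once the kept interval is narrower than this (> 0)
    """
    if min_width <= 0:
        raise ValueError("min_width must be positive to guarantee termination")
    low, high = min(gains.values()), max(gains.values())      # S51, S52
    while True:
        third = (high - low) / 3.0
        p1, p2 = low + third, low + 2.0 * third                # S53
        first = [h for h, g in gains.items() if low <= g <= p2]    # S54
        second = [h for h, g in gains.items() if p1 <= g <= high]
        if len(first) >= len(second):                          # S55
            kept, high = first, p2
        else:
            kept, low = second, p1
        if len(kept) < min_count or (high - low) < min_width:  # S56
            return sorted(kept)


gains = {"DP1": 0.50, "DP2": 0.52, "DP3": 0.51, "DP4": 0.95, "DP5": 0.05}
print(pending_holders(gains, 3, 0.1))  # ['DP1', 'DP2', 'DP3']
```

Each pass keeps two thirds of the current interval, so the width shrinks geometrically and the loop ends once it falls below `min_width`.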

S404, adding one to the repetition number.

For example, the initial value of the repetition number may be set to 1, and the repetition number is then incremented by one each time steps S401 to S403 are performed.

S405, detecting whether the repetition number reaches the set number, if so, executing step S406; if not, the process returns to step S401.

The number of times of setting may be set according to actual needs, which is not limited.

S406, if the repetition number reaches the set number, determining a preset number of candidate data providers with higher occurrence frequency from at least one pending data holder obtained each time.

The preset number can be set as needed.

If at least one pending data holder is obtained in each of the set number of repetitions of steps S401 to S403, the number of occurrences of each pending data holder can be counted, and the pending data holders with the highest numbers of occurrences, and with a number of occurrences not lower than 2/h, are then selected as the candidate data providers.

It can be understood that, because each returned information gain incorporates false attribute value pairs, the information gains returned by different candidate data holders fluctuate around the information gain corresponding to the real attribute value pairs. By obtaining the information gains from the candidate data holders over multiple repetitions and combining the pending data holders determined in each repetition, the candidate data providers whose information gains reflect more of the real data can be selected more effectively.
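The frequency-based selection in step S406 can be sketched as follows. The function name `candidate_providers`, the `min_occurrences` parameter, and the alphabetical tie-break are assumptions of this sketch; the occurrence threshold is left as a caller-supplied value rather than derived from h.

```python
from collections import Counter


def candidate_providers(pending_per_round, preset_number, min_occurrences=1):
    """Pick the holders that appear most often across repetition rounds.

    pending_per_round -- one list of pending data holders per repetition
    preset_number     -- maximum number of candidate data providers to return
    """
    counts = Counter(holder
                     for round_holders in pending_per_round
                     for holder in set(round_holders))
    # Sort by descending frequency, then by id for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [h for h, n in ranked if n >= min_occurrences][:preset_number]


rounds = [["DP1", "DP2"], ["DP1", "DP3"], ["DP1", "DP2"]]
print(candidate_providers(rounds, 2))  # ['DP1', 'DP2']
```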

In the above embodiments of the present application, for each target attribute, the target data corresponding to the target attribute returned by the data provider may be generalized data obtained by generalizing the data under the target attribute. Wherein data generalization aggregates data by replacing relatively low-level values (e.g., values of attribute ages) with higher-level concepts (e.g., young, middle-aged, and elderly). In order to improve data security and enable a data demander to obtain data content required under a target attribute, the data provider in the application generalizes the data under the target attribute.
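As a small illustration of data generalization, the sketch below replaces numeric ages with the higher-level concepts named above. The age boundaries (35 and 60) and the function name `generalize_age` are assumptions made for the example; the application does not specify them.

```python
def generalize_age(age):
    """Replace a numeric age with a higher-level concept (assumed cut-offs)."""
    if age < 35:
        return "young"
    if age < 60:
        return "middle-aged"
    return "elderly"


print([generalize_age(a) for a in (23, 47, 71)])
# ['young', 'middle-aged', 'elderly']
```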

In the present application, a specific implementation manner of generalizing data under the target attribute by the data provider may not be limited.

In one possible implementation manner, after determining the data provider corresponding to the target attribute, the cloud server may send the classification tree and the differential privacy parameter uploaded by the data demander to the data provider. And the data provider may generalize the data under the target attribute based on the classification tree and the differential privacy parameters. The generalization process by combining the classification tree and the differential privacy parameter can be implemented in any currently common mode, which is not limited.

For ease of understanding, a schematic flow diagram of generalizing data under a target attribute by a data provider is shown in fig. 6, where the process of generalizing data under a target attribute by a data provider of the target attribute is described.

The process may include:

S601, the cloud server sends the classification tree and the differential privacy parameter to the data provider corresponding to the target attribute.

From top to bottom, the classification tree classifies attribute values from general to specific, and data generalization proceeds top-down along the classification tree.

Before this step, the cloud server may also initialize the contribution value of each data provider $DP_{Rand_x}$ to $Con_x = 0.0$.

Data provider DP _{Rand_x} Representing any one target attribute ReqA _x A corresponding data provider.

S602, data provider DP _{Rand_x} All record identifications ID in the data table are initialized as one ID block, and the original ID block set is defined to be equal to the ID block.

That is, all IDs are initialized into one block [IDAll], the original ID block set is defined as OldBlock = {[IDAll]}, and the corresponding classification tree level is set to $level_x = 0$.

S603, the data provider $DP_{Rand_x}$ specializes the ID blocks in the original ID block set OldBlock one layer downward according to the classification tree and the current classification tree level $level_x$, obtaining an updated ID block set $NewBlock_x$.

S604, the data provider $DP_{Rand_x}$ calculates the information gain of this specialization, denoted here as $IG_x$, by combining the current updated ID block set and the original ID block set.

Specifically, if $NewBlock_x$ satisfies k-anonymity, then $IG_x = H\langle A_{cls} \mid OldBlock \rangle - H\langle A_{cls} \mid NewBlock_x \rangle$; otherwise, $IG_x = 0$.

Here $H\langle A_{cls} \mid OldBlock \rangle$ is the information entropy of the common classification attribute under the original ID block set, and $H\langle A_{cls} \mid NewBlock_x \rangle$ is the information entropy of the common classification attribute under the new ID block set. Any existing information entropy calculation method may be used, which is not limited.
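The gain of one specialization step can be sketched as follows. Interpreting k-anonymity as "every new ID block contains at least k record IDs" is an assumption of this sketch, as are the function names and the toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def block_entropy(blocks, class_of):
    """Entropy of the classification attribute conditioned on the ID blocks."""
    total = sum(len(block) for block in blocks)
    return sum(len(block) / total * entropy([class_of[i] for i in block])
               for block in blocks)


def specialization_gain(old_blocks, new_blocks, class_of, k):
    """Gain of refining old_blocks into new_blocks; zero if k-anonymity fails."""
    if any(len(block) < k for block in new_blocks):  # assumed k-anonymity test
        return 0.0
    return block_entropy(old_blocks, class_of) - block_entropy(new_blocks, class_of)


class_of = {1: "x", 2: "x", 3: "y", 4: "y"}  # record ID -> class value
old = [[1, 2, 3, 4]]
new = [[1, 2], [3, 4]]
print(specialization_gain(old, new, class_of, k=2))
print(specialization_gain(old, new, class_of, k=3))
```

With k = 2 the split perfectly separates the two classes, so the gain equals the 1 bit of entropy in the unsplit block; with k = 3 the blocks are too small and the gain is reported as zero.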

S605, the data provider $DP_{Rand_x}$ sends the specialization information gain to the cloud server, and also sends the updated ID block set $NewBlock_x$, encrypted with its private key, to the cloud server.

S606, the cloud server collects the information gains returned by all data providers $DP_{Rand_x}$ into a total set and detects whether all values in the total set are zero. If so, the generalization iteration has finished, and each data provider executes step S609; if not, step S607 is performed.

For example, the total set of information gains is $\{IG_x\}$, where x ranges from 1 to m, m is the total number of target attributes indicated in the data request, and $IG_x$ denotes the specialization information gain provided by the data provider corresponding to the target attribute $ReqA_x$.

S607, the cloud server uses an exponential mechanism algorithm to randomly select an element $IG_w$ from the total set, determines the data provider to be processed $DP_{ReqA_w}$ of the target attribute $ReqA_w$ corresponding to that element, re-encrypts the updated ID block set $NewBlock_w$ currently corresponding to the data provider to be processed $DP_{ReqA_w}$, and sends the re-encrypted updated ID block set to the data providers corresponding to all the target attributes.

In addition, assuming that the current contribution value of the data provider to be processed $DP_{ReqA_w}$ is $Con_w$, the cloud server also adjusts it to $Con_w = Con_w + IG_w$, i.e., the contribution value of the data provider to be processed is increased by the corresponding information gain.
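The selection in step S607 can be sketched with the standard exponential mechanism, in which an element with utility u is chosen with probability proportional to exp(ε·u / (2Δ)). Using the information gain as the utility, the parameter names `epsilon` and `sensitivity`, and the helper names are assumptions of this sketch.

```python
import math
import random


def exponential_mechanism(gains, epsilon, sensitivity, rng):
    """Randomly pick a key of `gains`, favoring larger information gains."""
    keys = list(gains)
    max_gain = max(gains.values())
    # Subtracting max_gain leaves the probabilities unchanged but avoids overflow.
    weights = [math.exp(epsilon * (gains[k] - max_gain) / (2.0 * sensitivity))
               for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def add_contribution(contributions, winner, gains):
    """Increase the selected provider's contribution by its information gain."""
    contributions[winner] = contributions.get(winner, 0.0) + gains[winner]


gains = {"ReqA_1": 0.2, "ReqA_2": 0.9, "ReqA_3": 0.0}
contributions = {}
winner = exponential_mechanism(gains, epsilon=1.0, sensitivity=1.0,
                               rng=random.Random(3))
add_contribution(contributions, winner, gains)
print(winner, contributions)
```

A smaller `epsilon` makes the choice closer to uniform (more privacy, less utility), while a larger one concentrates the choice on the attribute with the largest gain.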

S608, after receiving the re-encrypted updated ID block set, the data provider to be processed $DP_{ReqA_w}$ increments its classification tree level $level_w$ by one, the other data providers update their local original ID block set OldBlock to the updated ID block set $NewBlock_w$ corresponding to the data provider to be processed, and the process returns to step S604.

It will be appreciated that steps S604 to S608 above are processes that iterate continuously to achieve data generalization.

S609, the data provider $DP_{Rand_x}$ determines, according to the classification tree, the attribute values of the ID blocks in the current original ID block set OldBlock, applies noise-adding processing to the ID blocks, encrypts them with its private key, and sends them to the cloud server, so that the cloud server re-encrypts the encrypted ID blocks and sends them to the data demander.
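The application does not specify the noise-adding processing. One common choice under differential privacy, shown here purely as an assumption, is to add Laplace noise with scale 1/ε to the record count of each ID block. The function names and sample blocks are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_block_counts(blocks, epsilon, rng):
    """blocks maps a generalized attribute value to its list of record IDs.

    Returns the per-block record counts perturbed with Laplace noise.
    A count query has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    scale = 1.0 / epsilon
    return {value: len(ids) + laplace_noise(scale, rng)
            for value, ids in blocks.items()}


blocks = {"young": [1, 2, 3], "elderly": [4]}
print(noisy_block_counts(blocks, epsilon=0.5, rng=random.Random(11)))
```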

It should be noted that Fig. 6 is only one example of how the data provider generalizes the data under the target attribute; other data generalization manners are also possible in practical applications, which is not limited.

It may be appreciated that in the above embodiments of the present application, after the cloud server returns the re-encrypted target data to the data demander, the data demander may further train the classification model based on the target data corresponding to each target attribute and the data under the common classification attribute, and determine the accuracy of the classification model.

For example, for each record identifier in the data table, the value of the common classification attribute is used as the label of the data under the target attributes corresponding to that record identifier, and the classification model is trained accordingly. The classification accuracy of the classification model is then tested using a test dataset.

Accordingly, the data demander returns the classification accuracy it has determined to the cloud server. On this basis, the cloud server can adjust the trust degree of the data provider corresponding to each target attribute based on the obtained classification accuracy.

For example, for the data provider $DP_{Rand_x}$ corresponding to any target attribute $ReqA_x$, the corresponding trust degree $TS_{Rand_x}$ may be adjusted in the following manner:

where $Accu_{ReqA}$ is the classification accuracy returned by the data demander.

As an alternative, the cloud server may also calculate a contribution share $Share_{Rand_x}$ for each data provider and increase the contribution value of each data provider by $Share_{Rand_x}$. The share is calculated as follows:

wherein,

The foregoing is one possible implementation of updating the trust degree and contribution value of each data provider; the present application may also update them in other manners, which is not limited.

Corresponding to the data sharing method, the application also provides a data sharing device. Fig. 7 is a schematic diagram of a composition structure of a data sharing device according to an embodiment of the present application, where the device is applied to a cloud server in a data sharing system, and the data sharing system includes the cloud server and a plurality of data holders. The device comprises:

a request obtaining unit 701, configured to obtain a data request sent by a data demander, where the data request indicates at least one target attribute to which data requested by the data demander belongs, and the data request is a request encrypted by using a private key of the data demander, where the data demander belongs to the plurality of data holders;

A request re-encrypting unit 702, configured to determine, for each data holder, a first re-encrypting key set by the data consumer for the data holder, re-encrypt the data request with the first re-encrypting key, and send the re-encrypted data request to the data holder, so that the data holder decrypts the re-encrypted data request based on its private key;

a table obtaining unit 703, configured to obtain a holding condition table fed back by each data holder, where the holding condition table fed back by the data holder includes: the data holding condition of the data holding party for each target attribute;

a list determining unit 704, configured to determine, based on a holding condition table fed back by each data holder, a holder list of each target attribute, where the holder list of each target attribute includes: each candidate data holder holding data under the target attribute among the plurality of data holders;

a provider determining unit 705 configured to determine, for each target attribute, a data provider for providing the data under the target attribute to the data demander from among candidate data holders included in a holder list of the target attribute;

A data request unit 706, configured to request, for each target attribute, target data corresponding to the target attribute from a data provider corresponding to the target attribute;

and the data re-encrypting unit 707 is configured to re-encrypt the target data based on a second re-encryption key set by the data provider for the data demander after obtaining the target data corresponding to the target attribute returned by the data provider, and return the re-encrypted target data to the data demander.

In one possible implementation, the provider determining unit includes:

an identifier extracting unit, configured to extract at least one target record identifier from the plurality of record identifiers;

a pseudo attribute generating unit, configured to generate a pseudo attribute value pair sequence for the target attribute and the common classification attribute, where the pseudo attribute value pair sequence includes: at least one pair of pseudo attribute values, each pseudo attribute value pair comprising: a pseudo attribute value generated for the target attribute and a pseudo attribute value generated for the common classification attribute;

A sequence transmitting unit, configured to transmit the at least one target record identifier and the pseudo attribute value pair sequence to each candidate data holder in the holder list of the target attribute;

a gain obtaining unit, configured to obtain an information gain corresponding to the target attribute returned by the candidate data holder, where the information gain is an information gain of the target attribute determined by the candidate data holder based on at least one real attribute value pair corresponding to the at least one target record identifier and at least one false attribute value pair converted from the at least one pseudo attribute value pair, and where the real attribute value pair includes an attribute value of the target attribute and an attribute value of the common classification attribute in a record corresponding to the target record identifier in the candidate data holder;

a candidate determining unit, configured to determine at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by each candidate data holder;

and the provider selection unit is used for determining a data provider for providing the data under the target attribute to the data demander from the at least one candidate data provider.

In yet another possible implementation manner, the candidate determining unit includes:

a pending party determining subunit, configured to determine, based on distribution characteristics of information gains returned by each candidate data holder, at least one pending data holder whose information gain is in a distribution interval that satisfies a setting condition, where the number of information gains in the distribution interval that satisfies the setting condition is greater than the number of information gains in other distribution intervals;

a number increasing subunit for increasing the number of repetitions by one;

a repeated triggering subunit, configured to return to execute the operation of the identifier extraction unit if the repetition number does not reach the set number;

and the candidate party determining subunit is used for determining a preset number of candidate data providers with higher occurrence frequency from at least one pending data holder obtained each time if the repetition number reaches the set number.

In an alternative, the device further comprises:

a trust degree adjusting unit, configured to, after the candidate determining unit determines at least one candidate data provider, increase the trust degree of each candidate data provider by a set value and decrease the trust degree of each data holder in the holder list of the target attribute that is not a candidate data provider by the set value;

The provider selection unit is specifically configured to select, from the at least one candidate data provider, one candidate data provider with a trust degree greater than a set threshold as a data provider for providing the data under the target attribute to the data demander.

In yet another possible implementation, the pending party determination subunit includes:

an endpoint determining subunit, configured to determine a maximum information gain and a minimum information gain among information gains returned by each candidate data holder;

a total interval construction subunit, configured to construct a gain distribution total interval composed of the minimum information gain and the maximum information gain, and determine the gain distribution total interval as a target distribution interval to be processed;

the dividing point determining subunit is used for determining a first middle dividing point and a second middle dividing point which are used for trisecting the target distribution interval, and the information gain value corresponding to the first middle dividing point is smaller than the information gain value corresponding to the second middle dividing point;

a data statistics subunit, configured to separately count the number of information gains included in a first gain distribution interval and a second gain distribution interval, where the first gain distribution interval is the distribution interval formed by the minimum information gain and the second middle dividing point, and the second gain distribution interval is the distribution interval formed by the first middle dividing point and the maximum information gain;

a candidate interval determining subunit, configured to determine, from the first gain distribution interval and the second gain distribution interval, the candidate gain distribution interval that contains the larger number of information gains;

a division triggering subunit, configured to determine the candidate gain distribution interval as the target distribution interval if the number of information gains included in the candidate gain distribution interval is not less than a set threshold or the interval range of the candidate gain distribution interval is not less than a set range interval, and return to performing the operation of the dividing point determining subunit;

and the pending party confirmation subunit is used for determining at least one candidate data holder with information gain in the candidate gain distribution interval as at least one pending data holder if the quantity of the information gain contained in the candidate gain distribution interval is smaller than a set threshold value or the interval range of the candidate gain distribution interval is smaller than a set range interval.
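The interval narrowing performed by these subunits can be sketched as follows. This is an illustrative reading, not a definitive implementation: the function name `find_pending_holders` and its parameters are hypothetical, narrowing is assumed to stop as soon as either stopping condition holds, a tie between the two intervals is assumed to go to the first interval, and `max_iterations` is a safety bound that is not part of the described apparatus.

```python
from typing import Dict, List


def find_pending_holders(
    gains: Dict[str, float],
    count_threshold: int,
    range_threshold: float,
    max_iterations: int = 100,
) -> List[str]:
    """Narrow the gain interval by repeated trisection and return the holders in it."""
    if not gains:
        return []
    members = list(gains)
    # Total interval: from the minimum to the maximum returned information gain.
    low, high = min(gains.values()), max(gains.values())
    for _ in range(max_iterations):
        third = (high - low) / 3
        p1, p2 = low + third, high - third  # first and second middle dividing points
        first = [h for h, g in gains.items() if low <= g <= p2]
        second = [h for h, g in gains.items() if p1 <= g <= high]
        # Keep the interval holding more gains (tie: first interval, by assumption).
        if len(first) >= len(second):
            cand_low, cand_high, members = low, p2, first
        else:
            cand_low, cand_high, members = p1, high, second
        # Stop once the candidate interval is sparse enough or narrow enough.
        if len(members) < count_threshold or (cand_high - cand_low) < range_threshold:
            return members
        low, high = cand_low, cand_high
    return members
```

Because the two overlapping intervals each cover two thirds of the target interval, every pass discards the third that holds fewer gains, so outlying information gains are dropped while the densest region is retained.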

In yet another possible implementation manner, the apparatus may further include:

the accuracy obtaining unit is used for obtaining the classification accuracy returned by the data demander after the data re-encryption unit returns the re-encrypted target data to the data demander, wherein the classification accuracy is the classification accuracy of a classification model trained by the data demander based on the target data corresponding to each target attribute and the data under the public classification attribute;

and the trust degree readjusting unit is used for adjusting the trust degree of the data provider corresponding to each target attribute by combining the classification accuracy.

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Meanwhile, the features described in the embodiments of this specification may be replaced with or combined with each other to enable those skilled in the art to make or use the present application. Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant points may be found in the description of the method embodiments.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are also intended to fall within the scope of the present application.

Claims

1. A data sharing method, characterized in that it is applied to a cloud server in a data sharing system, the data sharing system including the cloud server and multiple data holders, and the method includes:

Obtain a data request sent by the data demander, where the data request indicates at least one target attribute to which the data requested by the data demander belongs, the data request is a request encrypted using the private key of the data demander, and the data demander belongs to the multiple data holders;

For each data holder, determine the first re-encryption key set by the data demander for the data holder, use the first re-encryption key to re-encrypt the data request, and send the re-encrypted data request to the data holder, so that the data holder can decrypt the re-encrypted data request based on its private key;

Obtain the holding status table fed back by each data holder. The holding status table fed back by the data holder includes: the data holding status of each target attribute by the data holder;

Based on the holding status table fed back by each data holder, determine a holder list for each target attribute, where the holder list of a target attribute includes each candidate data holder, among the plurality of data holders, that holds data under the target attribute;

For each target attribute, determine a data provider for providing data under the target attribute to the data demander from among the candidate data holders included in the holder list of the target attribute;

For each target attribute, request the target data corresponding to the target attribute from the data provider corresponding to the target attribute;

After obtaining the target data corresponding to the target attribute returned by the data provider, re-encrypt the target data based on the second re-encryption key set by the data provider for the data demander, and return the re-encrypted target data to the data demander.

2. The method according to claim 1, characterized in that the data holder stores a data table, the data table includes a public classification attribute and a plurality of private attributes, the data table contains multiple records, each record corresponds to a record identifier, and the target attribute belongs to the plurality of private attributes;

The cloud server stores multiple record identifiers corresponding to multiple records in the data table;

Determining a data provider for providing data under the target attribute to the data demander from among the candidate data holders included in the holder list of the target attribute includes:

Extract at least one target record identifier from the plurality of record identifiers;

Generate a pseudo attribute value pair sequence for the target attribute and the public classification attribute, the pseudo attribute value pair sequence comprising: at least one pseudo attribute value pair, each pseudo attribute value pair comprising: a pseudo attribute value generated for the target attribute and a pseudo attribute value generated for the public classification attribute;

Send the at least one target record identifier and the sequence of pseudo-attribute value pairs to each candidate data holder in the holder list of the target attribute;

Obtaining the information gain corresponding to the target attribute returned by the candidate data holder, the information gain being the information gain of the target attribute determined by the candidate data holder based on at least one real attribute value pair corresponding to the at least one target record identifier and a false attribute value pair converted from the at least one pseudo attribute value pair, the real attribute value pair comprising the attribute value of the target attribute and the attribute value of the public classification attribute in the record corresponding to the target record identifier in the candidate data holder;

Based on the information gain returned by each candidate data holder, determining at least one candidate data provider from each candidate data holder corresponding to the target attribute;

From the at least one candidate data provider, a data provider for providing data under the target attribute to the data demander is determined.

3. The method according to claim 2, characterized in that the step of determining at least one candidate data provider from the candidate data holders corresponding to the target attribute based on the information gain returned by the candidate data holders comprises:

Based on the distribution characteristics of the information gains returned by each candidate data holder, determine at least one pending data holder whose information gain is within a distribution interval that satisfies a set condition, wherein the number of information gains within the distribution interval that satisfies the set condition is greater than the number of information gains in other distribution intervals;

Add one to the number of repetitions;

If the number of repetitions does not reach the set number, return to performing the operation of extracting at least one target record identifier from the multiple record identifiers;

If the number of repetitions reaches the set number, the first set number of candidate data providers that appear more frequently are determined from at least one pending data holder obtained each time.

4. The method according to claim 2, characterized in that, after determining at least one candidate data provider from candidate data holders corresponding to the target attribute, it further includes:

Increase the trust of the candidate data provider by a set value, and reduce the trust of each data holder in the holder list of the target attribute that does not belong to the candidate data provider by the set value;

Determining a data provider for providing data under the target attribute to the data demander from the at least one candidate data provider includes:

From the at least one candidate data provider, a candidate data provider whose trustworthiness is greater than a set threshold is selected as a data provider for providing the data demander with the target attribute data.

5. The method according to claim 3, characterized in that determining, based on the distribution characteristics of the information gains returned by each candidate data holder, at least one pending data holder whose information gain is within a distribution interval that satisfies the set condition includes:

Determine the maximum information gain and the minimum information gain among the information gains returned by each candidate data holder;

Constructing a total gain distribution interval composed of the minimum information gain and the maximum information gain, and determining the total gain distribution interval as the target distribution interval to be processed;

Determine a first middle dividing point and a second middle dividing point that divide the target distribution interval into three equal parts, and the information gain value corresponding to the first middle dividing point is less than the information gain value corresponding to the second middle dividing point;

Count the number of information gains contained in each of a first gain distribution interval and a second gain distribution interval, where the first gain distribution interval is the distribution interval composed of the minimum information gain and the second middle dividing point, and the second gain distribution interval is the distribution interval composed of the first middle dividing point and the maximum information gain;

Determine a candidate gain distribution interval containing a larger number of information gains from the first gain distribution interval and the second gain distribution interval;

If the number of information gains contained in the candidate gain distribution interval is not less than the set threshold or the interval range of the candidate gain distribution interval is not less than the set range interval, determine the candidate gain distribution interval as the target distribution interval, and return to the operation of determining the first middle dividing point and the second middle dividing point that divide the target distribution interval into three equal parts;

If the number of information gains contained in the candidate gain distribution interval is less than a set threshold or the interval range of the candidate gain distribution interval is less than a set range interval, at least one candidate data holder whose information gain is within the candidate gain distribution interval is determined as at least one pending data holder.

6. The method according to claim 4, characterized in that, after returning the re-encrypted target data to the data demander, it further includes:

Obtain the classification accuracy returned by the data demander, where the classification accuracy is the classification accuracy of the classification model trained by the data demander based on the target data corresponding to each target attribute and the data under the public classification attribute;

In combination with the classification accuracy, the trustworthiness of the data provider corresponding to each target attribute is adjusted.

7. A data sharing system, comprising:

Cloud servers and multiple data holders;

Wherein, the data demander among the plurality of data holders is used to send a data request to the cloud server, the data request indicates at least one target attribute to which the data requested by the data demander belongs, and the data request is a request encrypted using the private key of the data demander;

The cloud server is used to execute the data sharing method described in any one of claims 1 to 6 above.

8. A data sharing device, characterized in that it is applied to a cloud server in a data sharing system, the data sharing system including the cloud server and multiple data holders, and the device includes:

A request obtaining unit, configured to obtain a data request sent by the data demander, where the data request indicates at least one target attribute to which the data requested by the data demander belongs, the data request is a request encrypted using the private key of the data demander, and the data demander belongs to the multiple data holders;

A request re-encryption unit, configured to determine, for each data holder, the first re-encryption key set by the data demander for the data holder, re-encrypt the data request using the first re-encryption key, and send the re-encrypted data request to the data holder, so that the data holder can decrypt the re-encrypted data request based on its private key;

A table obtaining unit, configured to obtain the holding status table fed back by each data holder, where the holding status table fed back by a data holder includes the data holding status of the data holder for each target attribute;

A list determination unit, configured to determine the holder list of each target attribute based on the holding status table fed back by each data holder, where the holder list of a target attribute includes each candidate data holder, among the multiple data holders, that holds data under the target attribute;

A provider determination unit, configured to determine, for each target attribute, from candidate data holders included in a holder list of the target attribute, a data provider for providing the data demander with the data under the target attribute;

A data request unit, configured to request target data corresponding to the target attribute from the data provider corresponding to the target attribute for each target attribute;

A data re-encryption unit, configured to, after obtaining the target data corresponding to the target attribute returned by the data provider, re-encrypt the target data based on the second re-encryption key set by the data provider for the data demander, and return the re-encrypted target data to the data demander.

9. The device according to claim 8, characterized in that the data holder stores a data table, the data table includes a public classification attribute and a plurality of private attributes, the data table contains multiple records, each record corresponds to a record identifier, and the target attribute belongs to the plurality of private attributes;

The provider determination unit includes:

an identifier extraction unit, configured to extract at least one target record identifier from the plurality of record identifiers;

A pseudo attribute generation unit, configured to generate a pseudo attribute value pair sequence for the target attribute and the public classification attribute, where the pseudo attribute value pair sequence includes at least one pseudo attribute value pair, and each pseudo attribute value pair includes a pseudo attribute value generated for the target attribute and a pseudo attribute value generated for the public classification attribute;

A sequence sending unit, configured to send the at least one target record identifier and the pseudo attribute value pair sequence to each candidate data holder in the holder list of the target attribute;

A gain obtaining unit, configured to obtain the information gain corresponding to the target attribute returned by the candidate data holder, where the information gain is the information gain of the target attribute determined by the candidate data holder based on at least one real attribute value pair corresponding to the at least one target record identifier and a false attribute value pair converted from the at least one pseudo attribute value pair, and the real attribute value pair includes the attribute value of the target attribute and the attribute value of the public classification attribute in the record corresponding to the target record identifier in the candidate data holder;

A candidate determination unit configured to determine at least one candidate data provider from each candidate data holder corresponding to the target attribute based on the information gain returned by each candidate data holder;

The provider selection unit is used to determine a data provider for providing the data with the target attribute to the data demander from the at least one candidate data provider.

10. The device according to claim 9, wherein the candidate determination unit comprises:

A pending party determination subunit, configured to determine, based on the distribution characteristics of the information gains returned by each candidate data holder, at least one pending data holder whose information gain is within a distribution interval that satisfies a set condition, where the number of information gains in the distribution interval that satisfies the set condition is greater than the number of information gains in other distribution intervals;

A number increasing subunit, configured to increase the number of repetitions by one;

A repetition trigger subunit, configured to return to performing the operation of the identifier extraction unit if the number of repetitions does not reach the set number;

The candidate determination subunit is used to determine the first set number of candidate data providers with a higher frequency of occurrence from at least one pending data holder obtained each time if the number of repetitions reaches a set number.