CN117574432A - Data security release method and device, computer equipment and storage medium

Info

Publication number: CN117574432A
Application number: CN202311586203.9A
Authority: CN (China)
Prior art keywords: privacy, data, discriminator, reconstructor, encoder
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 何阳 (He Yang), 郭骞 (Guo Qian), 梁飞 (Liang Fei)
Assignees: State Grid Smart Grid Research Institute Co., Ltd.; State Grid Corporation of China (SGCC); State Grid Jiangsu Electric Power Co., Ltd.
Application filed by State Grid Smart Grid Research Institute Co., Ltd., State Grid Corporation of China (SGCC), and State Grid Jiangsu Electric Power Co., Ltd.
Priority application: CN202311586203.9A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention relates to the technical field of power system security, and in particular to a data security publishing method and apparatus, a computer device, and a storage medium. The method comprises the following steps: receiving a data access request and determining the data to be published; and inputting the data to be published into a pre-constructed data processing model for feature extraction and privacy elimination to obtain publishable data for publication, wherein the data processing model is constructed based on an encoder, a utility discriminator, a privacy discriminator, and a privacy reconstructor. By implementing the invention, the data processing model built from these four components processes the data to be published so that the resulting publishable data retain the same characteristics as the original data and remain useful as business data, while privacy disclosure is prevented, achieving privacy protection against reverse analysis and inference attacks.

Description

Data security release method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of power system security, and in particular to a data security publishing method and apparatus, a computer device, and a storage medium.
Background
The electric power marketing 2.0 business system greatly expands its business services, making the service scenarios of online marketing partner channels complex and changeable; the business data are large in volume and diverse in type, and the partner channel side maintains a large number of channels for user service interaction. This creates serious security challenges for data during publication, including privacy disclosure caused by reverse analysis.
Because the partner channel side maintains a large number of channels for user service interaction, if effective privacy protection cannot be guaranteed during business data publication, an attacker can easily infer private user data from the published data and the exchanged data models. The data therefore need to be processed during business data publication so that effective data with the same characteristics as the original data can be published while sensitive information leakage through reverse analysis and inference attacks is prevented, thereby maintaining the security of business data interaction for marketing partner channels.
Disclosure of Invention
In view of the above, the present invention provides a data security publishing method and apparatus, a computer device, and a storage medium, to solve the problem of achieving effective privacy protection of data during data publication.
In a first aspect, the present invention provides a data security publishing method, including: receiving a data access request, and determining the data to be published according to the data access request; and inputting the data to be published into a pre-constructed data processing model for feature extraction and privacy elimination to obtain publishable data for publication, wherein the data processing model is constructed based on an encoder, a utility discriminator, a privacy discriminator, and a privacy reconstructor; the encoder is used to extract data features, the utility discriminator is used to simulate the expected classification task, and the privacy discriminator and the privacy reconstructor are used to measure the risk of privacy leakage, thereby achieving privacy elimination.
According to the data security publishing method provided by the embodiment of the invention, a data processing model is constructed from an encoder, a utility discriminator, a privacy discriminator, and a privacy reconstructor, and this model is used to process the data to be published; the resulting publishable data retain the same characteristics as the original data and remain useful as business data, while privacy disclosure is prevented, realizing privacy protection against reverse analysis and inference attacks.
In an alternative embodiment, the data processing model is constructed as follows: an encoder is constructed using an input layer, a convolution layer, a pooling layer, and a batch normalization layer; a utility discriminator is constructed using a multi-layer perceptron; a privacy discriminator is constructed using a multi-layer perceptron; a privacy reconstructor is constructed using an inverted encoder; and the encoder, the utility discriminator, the privacy discriminator, and the privacy reconstructor are trained to obtain the data processing model.
In an alternative embodiment, training the encoder, the utility discriminator, the privacy discriminator, and the privacy reconstructor to obtain the data processing model includes: initializing a first weight of the encoder; updating a second weight of the utility discriminator and the first weight of the encoder based on a first loss function; updating a third weight of the privacy discriminator based on a second loss function; updating a fourth weight of the privacy reconstructor based on a third loss function; updating the updated first weight and second weight based on a fourth loss function; and using the updated first, second, third, and fourth weights as the parameters of the encoder, the utility discriminator, the privacy discriminator, and the privacy reconstructor, respectively, to obtain the data processing model.
In this embodiment, the encoder, the utility discriminator, the privacy discriminator, and the privacy reconstructor are first trained separately, and the fourth loss function is then used to train the whole, so that the resulting data processing model achieves a utility-privacy balance.
In an alternative embodiment, the first loss function is determined based on the cross entropy between the prediction result of the utility discriminator and the real label, the second loss function is determined based on the cross entropy between the privacy class of the privacy discriminator and the privacy label, the third loss function is determined based on the square of the difference between the reconstructed data of the privacy reconstructor and the original data, and the fourth loss function is determined based on the first loss function, the second loss function, and the third loss function.
In an alternative embodiment, the fourth loss function is expressed by the following formula:

$$C_{sum}=\frac{1}{m}\sum_{i=1}^{m}\Big[\lambda_{1}\,\tau\big(y_{i},UD(E(I_{i}))\big)-\lambda_{2}\,\tau\big(z_{i},PD(E(I_{i}))\big)-\lambda_{3}\big(I_{i}-PR(E(I_{i}))\big)^{2}\Big]$$

wherein m denotes the batch size; λ1, λ2, and λ3 denote the training coefficients of the utility discriminator, the privacy discriminator, and the privacy reconstructor, respectively; y_i denotes the real label of data I_i; UD(E(I_i)) denotes the utility discriminator's class prediction for data I_i; τ(y_i, UD(E(I_i))) denotes the cross entropy between the class prediction and the real label; PR(E(I_i)) denotes the privacy reconstructor's reconstruction of data I_i; z_i denotes the privacy label of data I_i; PD(E(I_i)) denotes the privacy discriminator's privacy class prediction for data I_i; and τ(z_i, PD(E(I_i))) denotes the cross entropy between the privacy class prediction and the privacy label.
In an alternative embodiment, before the data to be published are input into the pre-constructed data processing model for feature extraction and privacy elimination, the method further comprises: performing format unification on the data to be published using preset rules to obtain a processing result; and binarizing the processing result to obtain binary sequence data.
In this embodiment, performing format unification and binarization on the data to be published yields a numerical form that the data processing model can easily process.
In an alternative embodiment, the method further comprises: acquiring the data to be interacted; and inputting the data to be interacted into the pre-constructed data processing model for feature extraction and privacy elimination to obtain a publishable data set.
In this embodiment, by performing feature extraction and privacy elimination on the data to be published in advance, a publishable data set is obtained; when a data access request is subsequently received, the publishable data can be queried directly from this set, providing as much data access volume and as many access attempts as possible while minimizing the privacy budget.
In an alternative embodiment, the privacy discriminator and the privacy reconstructor measure the risk of privacy leakage in the following manner to achieve privacy elimination: the privacy discriminator performs privacy class prediction on the data features extracted by the encoder; the privacy reconstructor reconstructs the extracted data features to obtain reconstructed data; the second loss function is used to calculate the cross entropy between the privacy class and the privacy label, and the third loss function is used to calculate the reconstruction error between the reconstructed data and the original data, the cross entropy being used to measure whether a privacy leakage risk exists; and the privacy discriminator and the privacy reconstructor are trained according to the cross entropy and the reconstruction error, so that the reconstructed data output by the trained privacy reconstructor achieve privacy elimination.
In a second aspect, the present invention provides a data security publishing apparatus, the apparatus comprising: a data determining module, configured to receive a data access request and determine the data to be published according to the data access request; and a data processing module, configured to input the data to be published into a pre-constructed data processing model for feature extraction and privacy elimination to obtain publishable data for publication, wherein the data processing model is constructed based on an encoder, a utility discriminator, a privacy discriminator, and a privacy reconstructor; the encoder is used to extract data features, the utility discriminator is used to simulate the expected classification task, and the privacy discriminator and the privacy reconstructor are used to measure the risk of privacy leakage and achieve privacy elimination.
In an alternative embodiment, the data processing model is constructed using the following modules: the encoder construction module is used for constructing an encoder by adopting an input layer, a convolution layer, a pooling layer and a batch normalization layer; a utility discriminator construction module for constructing a utility discriminator using a multi-layer perceptron; the privacy discriminator building module is used for building a privacy discriminator by adopting a multi-layer perceptron; a privacy reconstructor construction module for constructing a privacy reconstructor using an inverse encoder; and the training module is used for training the encoder, the utility discriminator, the privacy discriminator and the privacy reconstructor to obtain a data processing model.
In an alternative embodiment, the training module is specifically configured to: initializing a first weight of an encoder; updating the second weight of the utility discriminator and the first weight of the encoder based on the first loss function; updating a third weight of the privacy discriminator based on the second loss function; updating a fourth weight of the privacy reconstructor based on the third loss function; updating the updated first weight and second weight based on the fourth loss function; and respectively taking the updated first weight, second weight, third weight and fourth weight as parameters of an encoder, a utility discriminator, a privacy discriminator and a privacy reconstructor to obtain a data processing model.
In an alternative embodiment, the first loss function is determined based on the cross entropy between the prediction result of the utility discriminator and the real label, the second loss function is determined based on the cross entropy between the privacy class of the privacy discriminator and the privacy label, the third loss function is determined based on the square of the difference between the reconstructed data of the privacy reconstructor and the original data, and the fourth loss function is determined based on the first loss function, the second loss function, and the third loss function.
In an alternative embodiment, the fourth loss function is expressed by the following formula:

$$C_{sum}=\frac{1}{m}\sum_{i=1}^{m}\Big[\lambda_{1}\,\tau\big(y_{i},UD(E(I_{i}))\big)-\lambda_{2}\,\tau\big(z_{i},PD(E(I_{i}))\big)-\lambda_{3}\big(I_{i}-PR(E(I_{i}))\big)^{2}\Big]$$

wherein m denotes the batch size; λ1, λ2, and λ3 denote the training coefficients of the utility discriminator, the privacy discriminator, and the privacy reconstructor, respectively; y_i denotes the real label of data I_i; UD(E(I_i)) denotes the utility discriminator's class prediction for data I_i; τ(y_i, UD(E(I_i))) denotes the cross entropy between the class prediction and the real label; PR(E(I_i)) denotes the privacy reconstructor's reconstruction of data I_i; z_i denotes the privacy label of data I_i; PD(E(I_i)) denotes the privacy discriminator's privacy class prediction for data I_i; and τ(z_i, PD(E(I_i))) denotes the cross entropy between the privacy class prediction and the privacy label.
In an alternative embodiment, the apparatus further comprises: a unification processing module, configured to perform format unification on the data to be published using preset rules to obtain a processing result; and a binarization module, configured to binarize the processing result to obtain binary sequence data.
In an alternative embodiment, the apparatus further comprises: a data acquisition module, configured to acquire the data to be interacted; and a publishable data set determining module, configured to input the data to be interacted into the pre-constructed data processing model for feature extraction and privacy elimination to obtain the publishable data set.
In an alternative embodiment, the privacy discriminator and the privacy reconstructor measure the risk of privacy leakage in the following manner to achieve privacy elimination: the privacy discriminator performs privacy class prediction on the data features extracted by the encoder; the privacy reconstructor reconstructs the extracted data features to obtain reconstructed data; the second loss function is used to calculate the cross entropy between the privacy class and the privacy label, and the third loss function is used to calculate the reconstruction error between the reconstructed data and the original data, the cross entropy measuring whether a privacy leakage risk exists; and the privacy discriminator and the privacy reconstructor are trained according to the cross entropy and the reconstruction error, so that the reconstructed data output by the trained privacy reconstructor achieve privacy elimination.
In a third aspect, the present invention provides a computer device, including a memory and a processor that are communicatively connected, the memory storing computer instructions and the processor executing the computer instructions so as to perform the data security publishing method of the first aspect or any of its corresponding embodiments.
In a fourth aspect, the present invention provides a computer-readable storage medium having computer instructions stored thereon, the computer instructions being used to cause a computer to perform the data security publishing method of the first aspect or any of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show some embodiments of the present invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a data security publishing method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a marketing business data preprocessing flow, according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a business data feature extraction and privacy elimination flow in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data security publishing process according to an embodiment of the invention;
FIG. 5 is a block diagram of a data security publishing apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of another data security publishing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In accordance with an embodiment of the present invention, a data security publishing method embodiment is provided. It should be noted that the steps shown in the flowchart of the figures may be performed in a computer system, such as by a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order other than the one described here.
In this embodiment, a data security publishing method is provided, which may be used in an electronic device such as a computer, a mobile phone, or a tablet computer. FIG. 1 is a flowchart of a data security publishing method according to an embodiment of the present invention; as shown in FIG. 1, the flow includes the following steps:
step S101, a data access request is received, and data to be distributed is determined according to the data access request. The data may be service data in the electric power marketing service system, or may be other data that can be issued according to a data access request, and the type and source of the data are not specifically limited in the application. The data access request may be a request sent by the data requester to obtain data, where the data access request may include identity information of the data sender and a description of the data to be obtained. After receiving the data access request, the request may be parsed to determine the data that it is desired to request, i.e., the data to be distributed.
Step S102, inputting data to be issued into a pre-constructed data processing model for feature extraction and privacy elimination, obtaining issuable data for issuing, wherein the data processing model is constructed based on an encoder, a utility discriminator, a privacy discriminator and a privacy reconstructor, the encoder is used for extracting data features, the utility discriminator is used for simulating expected classification tasks, and the privacy discriminator and the privacy reconstructor are used for measuring privacy leakage risks and achieving privacy elimination. Specifically, a data processing model is constructed through an encoder, a utility discriminator, a privacy discriminator and a privacy reconstructor, so that the data processing model can realize feature extraction in data through the encoder, and the set utility discriminator can ensure that the utility of the obtained issuable data is unchanged, namely, the data output by the model can replace the original data for subsequent data mining, issuing and sharing; the privacy discriminator and the privacy reconstructor can ensure that the privacy information of the obtained issuable data meets the requirement or does not have privacy disclosure risk, so that the data processing model is adopted to process the data to be issued, and the obtained issuable data can be prevented from disclosure while the utility is ensured.
According to the data security release method provided by the embodiment of the invention, the encoder, the utility discriminator, the privacy discriminator and the privacy reconstructor are adopted to construct the data processing model, so that the data processing model is adopted to process the data to be distributed, the obtained distributable data can be prevented from privacy disclosure while the service data which has the same characteristics as the original data and has practicability, and the privacy protection for preventing reverse analysis and reasoning attack is realized.
In this embodiment, a data security publishing method is provided, and the process includes the following steps:
step S201, a data access request is received, and data to be distributed is determined.
Specifically, before inputting data to be distributed into a pre-constructed data processing model for feature extraction and privacy elimination, the method comprises the following steps:
and step 2011, carrying out format unification processing on the data to be distributed by adopting a preset rule to obtain a processing result. Specifically, when the data is business data in the electric power marketing business system, it may have various types of business data, for example, 23 business categories are included in a certain electric power marketing business system, and there are specific electric charge settlement, asset management, and the like. For easy processing of the subsequent data processing model, the data is subjected to format unification processing according to preset rules, wherein the preset rules can be preset according to business knowledge and requirements, for example, the preset rules can be that different customer types, electricity consumption categories and other sortable data are mapped to integer codes, for example, different customer types are mapped to integer codes, or different electricity consumption categories are mapped to standard electricity consumption category names. Data which is difficult to classify, such as an IP address, a use time length and the like, is converted into a byte string from a character string. Corresponding rules can be formulated for specific data, and the preset rules are not specifically limited in the application.
Step S2012, binarizing the processing result to obtain binary sequence data. Specifically, after format unification processing is performed, binarization operation is performed on the processing result, that is, the data which has been subjected to the unification processing is converted into a binary sequence. Thereby, it is possible to realize that different types of data are converted into numerical forms which are easy to process by the data processing model.
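As a concrete illustration of the two preprocessing steps above, the sketch below maps a categorical field to an integer code, converts a hard-to-classify string field to a byte string, and flattens the unified record into a binary sequence. The field names, category table, and fixed-width layout are illustrative assumptions, not specifics fixed by this application.

```python
# Hedged sketch of format unification and binarization; the schema
# (customer_type, ip_address, usage_hours) is a hypothetical example.

CUSTOMER_TYPES = {"residential": 0, "commercial": 1, "industrial": 2}  # assumed mapping


def unify_record(record: dict) -> bytes:
    """Apply preset rules: sortable fields -> integer codes, others -> byte strings."""
    type_code = CUSTOMER_TYPES[record["customer_type"]]    # categorical -> integer code
    usage = int(record["usage_hours"]).to_bytes(4, "big")  # numeric field, fixed width
    ip_bytes = record["ip_address"].encode("utf-8")        # string -> byte string
    return bytes([type_code]) + usage + ip_bytes


def binarize(payload: bytes) -> list:
    """Convert the unified byte payload into a 0/1 sequence for the model."""
    return [(byte >> bit) & 1 for byte in payload for bit in range(7, -1, -1)]


sample = {"customer_type": "residential", "ip_address": "10.0.0.1", "usage_hours": 42}
binary_sequence = binarize(unify_record(sample))  # 0/1 list ready as model input
```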
Step S202, the data to be published are input into a pre-constructed data processing model for feature extraction and privacy elimination to obtain publishable data for publication, wherein the data processing model is constructed based on an encoder, a utility discriminator, a privacy discriminator, and a privacy reconstructor; the encoder is used to extract data features, the utility discriminator is used to simulate the expected classification task, and the privacy discriminator and the privacy reconstructor are used to measure the risk of privacy leakage and achieve privacy elimination.
Specifically, the step S202 includes:
step S2021, adopting an input layer, a convolution layer, a pooling layer and a batch normalization layer to construct an encoder; specifically, to learn useful feature representations from the data, reduce dimensionality and remove redundancy, capture nonlinear relationships, an encoder can be constructed to perform feature extraction on the data. In this embodiment, the encoder consists of an input layer, multiple convolution layers, a pooling layer, and a batch normalization layer, the number of layers for each structure can be adjusted according to the data specifically processed. Wherein in the encoder, the convolutional layer performs a convolutional operation using a set of trainable filters to output an activation map. The batch normalization layer normalizes the activation map of the previous layer output by subtracting the batch mean and dividing by the batch standard deviation. Then, the pooling layer takes the maximum value or average value from the subarea of the previous layer to form a more compact characteristic, so that the calculation error is reduced, and the overfitting is avoided.
Step S2022, constructing a utility discriminator using a multi-layer perceptron. Specifically, the utility discriminator (UD) processes the features extracted by the encoder and simulates the expected classification task, which is determined by the specific data. For example, for grid customer electricity data, the classification task may divide users into four types according to their consumption patterns and habits: high-quality users, potential users, high-consumption users, and ordinary users. In this embodiment, the utility discriminator is constructed as a multi-layer perceptron (MLP), whose number of layers can be adjusted according to the specific data and task.
Step S2023, constructing a privacy discriminator using a multi-layer perceptron. Specifically, the privacy discriminator (PD) predicts the privacy class, or type, of the features extracted by the encoder, where a class refers to the sensitivity and privacy risk of the data. Different privacy protection measures are set for different privacy classes, so determining the privacy class allows the related data to be better managed and protected. For the privacy class predicted by the privacy discriminator, an error is computed against a preset privacy label, thereby measuring the specified privacy leakage risk. The privacy discriminator can likewise be constructed as a multi-layer perceptron, and its number of layers can also be adjusted as needed.
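Because both discriminators are multi-layer perceptrons over the encoder features E(I), one sketch can serve for both, differing only in the number of output classes (task classes for UD, privacy classes for PD). The hidden sizes below are assumptions.

```python
import torch.nn as nn


class MLPDiscriminator(nn.Module):
    """MLP usable as utility discriminator (UD) or privacy discriminator (PD)."""

    def __init__(self, feat_dim: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Flatten(),                    # flatten the encoder features E(I)
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),  # class logits
        )

    def forward(self, features):
        return self.mlp(features)
```

For instance, with the four utility classes mentioned above (high-quality, potential, high-consumption, and ordinary users), UD would be MLPDiscriminator(feat_dim, num_classes=4), while PD uses the number of privacy classes defined for the task.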
Step S2024, constructing a privacy reconstructor using an inverted encoder. Specifically, the privacy reconstructor (PR) simulates a malicious party and quantifies the intuitive privacy error. It is built by inverting the encoder, i.e., mirroring the encoder with a complete layer-by-layer reversed architecture. For example, the privacy reconstructor consists of several unpooling layers and deconvolution layers; the unpooling operation is realized by feature resizing or nearest-value filling, and the deconvolution layers densify the sparse unpooled activations through deconvolution operations. For the simulated data obtained by the privacy reconstructor, the reconstruction error with respect to the original data can be calculated to measure the leakage risk of unknown private information; this error may be determined, for example, as the Euclidean distance between the simulated data and the original data.
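A sketch of this mirrored reconstructor follows, with nearest-value upsampling standing in for the unpooling operation and ConvTranspose1d as the deconvolution; the widths mirror the hypothetical encoder sketch above and are likewise assumptions.

```python
import torch.nn as nn


class PrivacyReconstructor(nn.Module):
    """Inverted encoder PR: unpooling then deconvolution, mirroring the encoder."""

    def __init__(self, feat_channels: int = 64, out_channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            # Unpooling realized here as nearest-value filling (upsampling).
            nn.Upsample(scale_factor=2, mode="nearest"),
            # Deconvolution densifies the sparse unpooled activations.
            nn.ConvTranspose1d(feat_channels, feat_channels // 2,
                               kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.ConvTranspose1d(feat_channels // 2, out_channels,
                               kernel_size=5, padding=2),
            nn.Sigmoid(),  # outputs in [0, 1], matching binary sequence data
        )

    def forward(self, features):
        return self.net(features)
```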
Step S2025, training the encoder, the utility discriminator, the privacy discriminator, and the privacy reconstructor to obtain the data processing model. Specifically, during training, the encoder, utility discriminator, privacy discriminator, and privacy reconstructor can first be trained separately and then trained jointly, i.e., utility-privacy balance game training is performed, so that the final data processing model achieves a utility-privacy balance. When the trained model is used for data processing, its output is both useful and private: it has high inference accuracy for data processing and publication tasks, while an attacker performing malicious processing or reverse engineering obtains only low privacy-inference accuracy and high reconstruction error.
Specifically, before training, the parameters are set, including the model weights, the number of training rounds, and the batch size. During training, multiple iterations can be performed, with the following steps executed in each iteration:
step a1, initializing a first weight of an encoder.
Step a2, updating the second weight of the utility discriminator and the first weight of the encoder based on the first loss function. Specifically, the encoder weights θ_E are initialized first, and the utility discriminator (UD) is trained: θ_E and θ_UD are updated by gradient descent at learning rate l_1 to minimize the cross entropy between the predicted class UD(E(I)) and the real label y:

$$C_{u}=\frac{1}{m}\sum_{i=1}^{m}\tau\big(y_{i},UD(E(I_{i}))\big)$$

wherein C_u denotes the standard cross-entropy utility error between UD(E(I)) and the real label y, m denotes the batch size, y_i denotes the real label of data I_i, UD(E(I_i)) denotes the utility discriminator's class prediction for data I_i, and τ(y_i, UD(E(I_i))) denotes the cross entropy between the class prediction and the real label.
Step a3, updating the third weight of the privacy discriminator based on the second loss function. Specifically, when training the privacy discriminator, the second loss function is used to update θ_PD by gradient descent at learning rate l_2, minimizing the cross entropy between the predicted privacy class PD(E(I)) and the privacy label z:

$$C_{p1}=\frac{1}{m}\sum_{i=1}^{m}\tau\big(z_{i},PD(E(I_{i}))\big)$$

wherein C_p1 denotes the standard cross-entropy error between PD(E(I_i)) and the privacy label z, z_i denotes the privacy label of data I_i, PD(E(I_i)) denotes the privacy discriminator's privacy class prediction for data I_i, and τ(z_i, PD(E(I_i))) denotes the cross entropy between the privacy class prediction and the privacy label.
Step a4, updating the fourth weight of the privacy reconstructor based on the third loss function. Specifically, when training the privacy reconstructor, the third loss function is used to update θ_PR by gradient descent at learning rate l_3, minimizing the reconstruction error C_p2:

$$C_{p2}=\frac{1}{m}\sum_{i=1}^{m}\big(I_{i}-PR(E(I_{i}))\big)^{2}$$

wherein PR(E(I_i)) denotes the privacy reconstructor's reconstruction of data I_i.
Step a5, updating the updated first weight and second weight based on the fourth loss function. Specifically, after the weights of the encoder, the utility discriminator, the privacy discriminator, and the privacy reconstructor have each been updated, privacy-utility balance adversarial adjustment is performed: the fourth loss function is used to update θ_E and θ_UD by gradient descent at learning rate l_4, minimizing the sum error C_sum:

$$C_{sum}=\lambda_{1}C_{u}-\lambda_{2}C_{p1}-\lambda_{3}C_{p2}$$

wherein λ1, λ2, and λ3 denote the training coefficients of the utility discriminator, the privacy discriminator, and the privacy reconstructor, respectively.
Step a6, using the updated first, second, third, and fourth weights as the parameters of the encoder, the utility discriminator, the privacy discriminator, and the privacy reconstructor, respectively, to obtain the data processing model.
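Steps a1 to a6 can be sketched as the per-batch loop below. This is a hedged illustration, not the patent's reference implementation: the Adam optimizers, learning rates l1 to l4, coefficients lam1 to lam3, dummy data loader, and model sizes are all assumptions, and the signs in c_sum follow the adversarial form of the fourth loss function reconstructed above.

```python
import torch
import torch.nn.functional as F

# Assumed setup reusing the earlier sketches (input length 64 -> feat_dim 64*16).
E = Encoder()
UD = MLPDiscriminator(feat_dim=64 * 16, num_classes=4)  # utility classes
PD = MLPDiscriminator(feat_dim=64 * 16, num_classes=2)  # privacy classes (assumed)
PR = PrivacyReconstructor()
l1 = l2 = l3 = l4 = 1e-3                  # assumed learning rates
lam1, lam2, lam3 = 1.0, 0.5, 0.5          # assumed training coefficients
loader = [(torch.rand(8, 1, 64),          # dummy batch: data I,
           torch.randint(0, 4, (8,)),     # real labels y,
           torch.randint(0, 2, (8,)))]    # privacy labels z

opt_e_ud = torch.optim.Adam(list(E.parameters()) + list(UD.parameters()), lr=l1)
opt_pd = torch.optim.Adam(PD.parameters(), lr=l2)
opt_pr = torch.optim.Adam(PR.parameters(), lr=l3)
opt_adv = torch.optim.Adam(list(E.parameters()) + list(UD.parameters()), lr=l4)

for x, y, z in loader:
    # Steps a1-a2: update theta_E and theta_UD to minimize C_u.
    c_u = F.cross_entropy(UD(E(x)), y)
    opt_e_ud.zero_grad()
    c_u.backward()
    opt_e_ud.step()

    # Step a3: update theta_PD to minimize C_p1 (features detached so this
    # step does not update the encoder).
    feats = E(x).detach()
    c_p1 = F.cross_entropy(PD(feats), z)
    opt_pd.zero_grad()
    c_p1.backward()
    opt_pd.step()

    # Step a4: update theta_PR to minimize the reconstruction error C_p2.
    c_p2 = F.mse_loss(PR(feats), x)
    opt_pr.zero_grad()
    c_p2.backward()
    opt_pr.step()

    # Step a5: privacy-utility balance adjustment of theta_E and theta_UD,
    # minimizing C_sum = lam1*C_u - lam2*C_p1 - lam3*C_p2.
    enc = E(x)
    c_sum = (lam1 * F.cross_entropy(UD(enc), y)
             - lam2 * F.cross_entropy(PD(enc), z)
             - lam3 * F.mse_loss(PR(enc), x))
    opt_adv.zero_grad()
    c_sum.backward()
    opt_adv.step()
```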
In an alternative embodiment, the privacy discriminator and the privacy reconstructor measure the risk of privacy leakage in the following manner to achieve privacy elimination: the privacy discriminator performs privacy class prediction on the data features extracted by the encoder; the privacy reconstructor reconstructs the extracted data features to obtain reconstructed data; the second loss function is used to calculate the cross entropy between the privacy class and the privacy label, and the third loss function is used to calculate the reconstruction error between the reconstructed data and the original data, the cross entropy measuring whether a privacy leakage risk exists; and the privacy discriminator and the privacy reconstructor are trained according to the cross entropy and the reconstruction error, so that the reconstructed data output by the trained privacy reconstructor achieve privacy elimination.
It should be noted that because the utility discriminator keeps the utility of the reconstructed data consistent with that of the original data, the privacy discriminator determines the privacy leakage risk present in the reconstructed data, and the privacy reconstructor reconstructs the original data, training the utility discriminator, the privacy discriminator, and the privacy reconstructor ensures that the reconstructed data output by the privacy reconstructor carry no privacy leakage risk, i.e., privacy elimination is achieved, while the reconstructed data retain the utility of the original data.
In an alternative embodiment, the method further comprises: acquiring the data to be interacted; and inputting the data to be interacted into the pre-constructed data processing model for feature extraction and privacy elimination to obtain a publishable data set. Specifically, when the data are electric power marketing business data, data generated by the electric power marketing system can be treated as data to be published and input into the pre-constructed data processing model for feature extraction and privacy elimination to obtain a publishable data set. Later, when the data access interface receives a data access request, the publishable data set is queried directly and the publishable data are determined and published. Performing feature extraction and privacy elimination in advance provides as much data access volume and as many access attempts as possible while minimizing the privacy budget, and also reduces publication time. This amounts to a non-interactive mode of data publication.
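As a minimal sketch of this non-interactive flow, assuming that the publishable form of a record is the privacy-eliminated reconstruction PR(E(x)) and that the preprocessing and model sketches above are in scope (all names here are illustrative):

```python
import torch

publishable_set = {}  # record_id -> privacy-eliminated, publishable data

E.eval()
PR.eval()  # inference mode for the trained modules


def publish_ahead_of_time(record_id, record):
    """Feature extraction and privacy elimination before any access request."""
    bits = binarize(unify_record(record))                       # preprocessing sketch
    x = torch.tensor(bits, dtype=torch.float32).view(1, 1, -1)  # (batch, channel, length)
    with torch.no_grad():
        publishable_set[record_id] = PR(E(x))                   # privacy-eliminated data


def handle_access_request(record_id):
    """Serve a data access request directly from the publishable data set."""
    return publishable_set.get(record_id)
```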
The data security publishing method provided by the embodiment of the invention can be applied to many business types in electric power marketing. It improves the security of business data interaction over online channels, meets the data interaction needs of online partner channels, and publishes data with the same characteristics as the original data, ensuring the practicality of the data while effectively preventing privacy leakage caused by reverse analysis and inference attacks.
As a specific application embodiment of the invention, the data security publishing method can be implemented using the following flow:
1. A data processing model is constructed using an encoder, a utility discriminator, a privacy discriminator, and a privacy reconstructor.
Before construction, sample data are first acquired and preprocessed so that different types of business data are converted into a numerical form that a machine learning algorithm can easily process. As shown in FIG. 2, the preprocessing specifically includes format unification of the marketing business data, i.e., unification rules are formulated according to business knowledge and requirements and applied to the marketing business data: sortable data such as customer types and electricity consumption categories are mapped to integer codes, or different electricity consumption categories are mapped to standard category names, while data that are difficult to classify, such as IP addresses and usage durations, are converted from character strings to byte strings. The unified data are then converted into a binary sequence to obtain formatted data.
As shown in FIG. 3, when constructing the model, the encoder, utility discriminator, privacy discriminator, and privacy reconstructor must first be built separately. The encoder is used to extract depth features of the formatted business data: the encoder (E) consists of an input layer, several convolution layers, pooling layers, and batch normalization layers, and the number of layers can be adjusted for the specific data. The convolution layers perform convolution operations with sets of trainable filters to output activation maps. The batch normalization layers normalize the activation maps of the previous layer by subtracting the batch mean and dividing by the batch standard deviation. The pooling layers then take the maximum or average value over sub-regions of the previous layer to form more compact features, reducing computational error and avoiding overfitting.
The utility discriminator is used to process the depth features and simulate the expected classification task. The utility discriminator (UD) is constructed as a multi-layer perceptron (MLP) that simulates the expected classification task, processes the depth features E(I), and outputs the task classification result through several fully connected layers.
The privacy discriminator and the privacy reconstructor are used to measure the risk of private information leakage. The privacy discriminator (PD) is constructed as a multi-layer perceptron (MLP) that predicts the user-specified privacy class z' from the depth features E(I); the error between the privacy class z' and the privacy label z is used to measure the specified privacy leakage risk C_p1. The privacy reconstructor (PR) is an inverted encoder that mirrors the encoder with a complete layer-by-layer reversed architecture, emulating a powerful adversarial reconstructor composed of several unpooling layers and deconvolution layers. The unpooling operation is realized by feature resizing or nearest-value filling, and the deconvolution layers densify the sparse unpooled activations through deconvolution operations. PR simulates a malicious party and quantifies the intuitive privacy error: the leakage risk C_p2 of unknown private information can be measured through the reconstruction error between the original data I and the simulated data.
After the encoder, the utility discriminator, the privacy discriminator, and the privacy reconstructor are constructed, utility-privacy balance game training is performed. Before training, the initial quantities are set: the model weights θ_E, θ_UD, θ_PD, and θ_PR; the number of training rounds n; and the batch size m. The data set to be processed is then input, and discrimination and generation are performed: in each epoch, the encoder weights θ_E are first initialized and the utility discriminator (UD) is trained, updating θ_E and θ_UD by gradient descent at learning rate l_1 to minimize the cross entropy C_u between the predicted class UD(E(I)) and the real label y.
The privacy discriminator (PD) is then trained, updating θ_PD by gradient descent at learning rate l_2 to minimize the cross entropy C_p1 between the predicted privacy class PD(E(I)) and the privacy label z.
The privacy reconstructor (PR) is trained, updating θ_PR by gradient descent at learning rate l_3 to minimize the reconstruction error C_p2.
Finally, privacy-utility balance adversarial adjustment is performed, updating θ_E and θ_UD by gradient descent at learning rate l_4 to minimize the sum error C_sum.
Training minimizes the sum error to find the privacy-utility trade-off, completing model training.
2. Secure data publication. In order to provide as much data access volume and as many access attempts as possible while keeping the privacy budget as small as possible, the system adopts non-interactive data publication; the data security publishing flow is shown in FIG. 4.
The data to be published are input to the data processing module to obtain a publishable data set: the data to be published are preprocessed, and the data processing module performs feature extraction and privacy elimination on the business data in advance to obtain the publishable data set.
A data access interface then receives data access requests; after a request is received, the publishable data set is queried and the relevant data are extracted and published.
With this data security publishing method, during the publication of marketing business data, the original data features can be extracted and the privacy features removed through the utility-privacy balance game, so that data with the same feature distribution as the original data are generated, achieving both data availability and secure data publication.
This embodiment also provides a data security publishing apparatus, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
This embodiment provides a data security publishing apparatus, as shown in FIG. 5, including:
a data determining module 51, configured to receive a data access request and determine the data to be published according to the data access request; and
a data processing module 52, configured to input the data to be published into a pre-constructed data processing model for feature extraction and privacy elimination to obtain publishable data for publication, wherein the data processing model is constructed based on an encoder, a utility discriminator, a privacy discriminator, and a privacy reconstructor; the encoder is used to extract data features, the utility discriminator is used to simulate the expected classification task, and the privacy discriminator and the privacy reconstructor are used to measure the risk of privacy leakage and achieve privacy elimination.
In an alternative embodiment, the data processing model is constructed using the following modules: the encoder construction module is used for constructing an encoder by adopting an input layer, a convolution layer, a pooling layer and a batch normalization layer; a utility discriminator construction module for constructing a utility discriminator using a multi-layer perceptron; the privacy discriminator building module is used for building a privacy discriminator by adopting a multi-layer perceptron; a privacy reconstructor construction module for constructing a privacy reconstructor using an inverse encoder; and the training module is used for training the encoder, the utility discriminator, the privacy discriminator and the privacy reconstructor to obtain a data processing model.
In an alternative embodiment, the training module is specifically configured to: initializing a first weight of an encoder; updating the second weight of the utility discriminator and the first weight of the encoder based on the first loss function; updating a third weight of the privacy discriminator based on the second loss function; updating a fourth weight of the privacy reconstructor based on the third loss function; updating the updated first weight and second weight based on the fourth loss function; and respectively taking the updated first weight, second weight, third weight and fourth weight as parameters of an encoder, a utility discriminator, a privacy discriminator and a privacy reconstructor to obtain a data processing model.
In an alternative embodiment, the first loss function is determined based on the cross entropy between the prediction result of the utility discriminator and the real label, the second loss function is determined based on the cross entropy between the privacy class of the privacy discriminator and the privacy label, the third loss function is determined based on the square of the difference between the reconstructed data of the privacy reconstructor and the original data, and the fourth loss function is determined based on the first loss function, the second loss function, and the third loss function.
In an alternative embodiment, the fourth loss function is expressed by the following formula:

$$C_{sum}=\frac{1}{m}\sum_{i=1}^{m}\Big[\lambda_{1}\,\tau\big(y_{i},UD(E(I_{i}))\big)-\lambda_{2}\,\tau\big(z_{i},PD(E(I_{i}))\big)-\lambda_{3}\big(I_{i}-PR(E(I_{i}))\big)^{2}\Big]$$

wherein m denotes the batch size; λ1, λ2, and λ3 denote the training coefficients of the utility discriminator, the privacy discriminator, and the privacy reconstructor, respectively; y_i denotes the real label of data I_i; UD(E(I_i)) denotes the utility discriminator's class prediction for data I_i; PR(E(I_i)) denotes the privacy reconstructor's reconstruction of data I_i; z_i denotes the privacy label of data I_i; and PD(E(I_i)) denotes the privacy discriminator's privacy class prediction for data I_i.
In an alternative embodiment, the apparatus further comprises: a unification processing module, configured to perform format unification on the data to be published using preset rules to obtain a processing result; and a binarization module, configured to binarize the processing result to obtain binary sequence data.
In an alternative embodiment, the apparatus further comprises: a data acquisition module, configured to acquire the data to be interacted; and a publishable data set determining module, configured to input the data to be interacted into the pre-constructed data processing model for feature extraction and privacy elimination to obtain the publishable data set.
In an alternative embodiment, the privacy discriminator and the privacy reconstructor measure the risk of privacy leakage in the following manner to achieve privacy elimination: the privacy discriminator performs privacy class prediction on the data features extracted by the encoder; the privacy reconstructor reconstructs the extracted data features to obtain reconstructed data; the second loss function is used to calculate the cross entropy between the privacy class and the privacy label, and the third loss function is used to calculate the reconstruction error between the reconstructed data and the original data, the cross entropy measuring whether a privacy leakage risk exists; and the privacy discriminator and the privacy reconstructor are trained according to the cross entropy and the reconstruction error, so that the reconstructed data output by the trained privacy reconstructor achieve privacy elimination.
In an alternative embodiment, as shown in FIG. 6, the data security publishing apparatus includes:
a data preprocessing module 61, configured to formulate unification rules according to business knowledge and requirements, unify the marketing business data based on these rules, and convert the unified marketing business data into a binary sequence to obtain formatted data;
a data processing model construction module 62, configured to receive the formatted data sent by the data preprocessing module, construct the encoder to extract depth features of the formatted data, construct the utility discriminator to process the depth features and simulate the expected classification task, construct the privacy discriminator and the adversarial privacy reconstructor to measure the risk of private information leakage, and perform utility-privacy balance game training to obtain a data processing module for securely publishing business data; and
a data security publishing module 63, configured to input the data to be published to the data processing module to obtain a publishable data set, receive data access requests through the data access interface, and query the publishable data set, extracting and publishing the data.
further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The embodiment of the invention also provides a computer device equipped with the data security publishing apparatus shown in FIG. 6.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in FIG. 7, the computer device includes one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed and low-speed interfaces. The components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). One processor 10 is taken as an example in FIG. 7.
The processor 10 may be a central processing unit, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, which may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.
The memory 20 may include a program storage area and a data storage area; the program storage area may store an operating system and at least one application program required for functions, and the data storage area may store data created according to the use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine-readable storage medium and downloaded over a network to be stored on a local storage medium, so that the method described herein can be processed by software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state disk, or the like; further, the storage medium may also comprise a combination of the above types of memory. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (18)

1. A data security distribution method, the method comprising:
receiving a data access request, and determining data to be distributed according to the data access request;
inputting the data to be distributed into a pre-constructed data processing model for feature extraction and privacy elimination to obtain issuable data for distribution, wherein the data processing model is constructed based on an encoder, a utility discriminator, a privacy discriminator and a privacy reconstructor; the encoder is used for extracting data features, the utility discriminator is used for simulating an expected classification task, and the privacy discriminator and the privacy reconstructor are used for measuring privacy leakage risks and achieving privacy elimination.
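By way of illustration only (not part of the claims), the flow of claim 1 might be sketched in Python/PyTorch as follows; `data_store.lookup`, the attribute names on `model`, and the use of the reconstructor output as the releasable data (motivated by claim 8 below) are assumptions of this sketch:

```python
import torch

def handle_access_request(request, model, data_store):
    """Sketch of claim 1: resolve an access request to the data to be
    distributed, then extract features and eliminate privacy attributes
    with the pre-constructed data processing model before release."""
    raw = data_store.lookup(request)  # hypothetical helper resolving the request
    batch = torch.as_tensor(raw, dtype=torch.float32)
    with torch.no_grad():
        features = model.encoder(batch)                     # feature extraction
        releasable = model.privacy_reconstructor(features)  # privacy elimination
    return releasable  # issuable data for distribution
```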
2. The method of claim 1, wherein the data processing model is constructed by:
constructing an encoder by adopting an input layer, a convolution layer, a pooling layer and a batch normalization layer;
constructing a utility discriminator by adopting a multi-layer perceptron;
constructing a privacy discriminator by adopting a multi-layer perceptron;
constructing a privacy reconstructor by adopting an inverted encoder;
and training the encoder, the utility discriminator, the privacy discriminator and the privacy reconstructor to obtain a data processing model.
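For concreteness, the four components of claim 2 could be realized in PyTorch roughly as below; the layer sizes, 1-D input shape, and class names are assumptions of this sketch, not specified by the patent:

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Input, convolution, pooling and batch normalization layers (claim 2)."""
    def __init__(self, in_channels=1, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, feat_dim, kernel_size=3, padding=1),
            nn.BatchNorm1d(feat_dim),
            nn.ReLU(),
            nn.MaxPool1d(2),  # halves the sequence length
        )

    def forward(self, x):
        return self.net(x)

def make_mlp(in_dim, out_dim):
    """Multi-layer perceptron, used for both the utility discriminator and
    the privacy discriminator (claim 2)."""
    return nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 128), nn.ReLU(),
                         nn.Linear(128, out_dim))

class PrivacyReconstructor(nn.Module):
    """'Inverted encoder': mirrors the encoder, mapping features back to the
    input space via upsampling and transposed convolution (claim 2)."""
    def __init__(self, feat_dim=64, out_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2),  # undoes the pooling
            nn.ConvTranspose1d(feat_dim, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, z):
        return self.net(z)
```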
3. The method of claim 2, wherein training the encoder, utility discriminator, privacy discriminator, and privacy reconstructor to obtain a data processing model comprises:
initializing a first weight of the encoder;
updating the second weight of the utility discriminator and the first weight of the encoder based on a first loss function;
updating a third weight of the privacy discriminator based on a second loss function;
updating a fourth weight of the privacy reconstructor based on a third loss function;
updating the updated first weight and the updated second weight based on a fourth loss function;
and taking the updated first weight, second weight, third weight and fourth weight respectively as parameters of the encoder, the utility discriminator, the privacy discriminator and the privacy reconstructor to obtain the data processing model.
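One way the staged updates of claim 3 could be arranged per training batch is sketched below; the optimizers, the use of standard cross entropy and mean squared error, and the minus signs in the fourth loss (an adversarial convention, see the formula reconstructed under claim 5) are assumptions of this sketch:

```python
import torch.nn.functional as F

def train_step(E, UD, PD, PR, opt_e, opt_ud, opt_pd, opt_pr,
               x, y, z, lam1, lam2, lam3):
    """x: batch of data, y: real labels, z: privacy labels."""
    # First loss: update the utility discriminator (UD) and the encoder (E).
    loss1 = F.cross_entropy(UD(E(x)), y)
    opt_e.zero_grad(); opt_ud.zero_grad()
    loss1.backward()
    opt_e.step(); opt_ud.step()

    # Second loss: update the privacy discriminator (PD) only.
    feats = E(x).detach()  # encoder is frozen for these two steps
    loss2 = F.cross_entropy(PD(feats), z)
    opt_pd.zero_grad(); loss2.backward(); opt_pd.step()

    # Third loss: update the privacy reconstructor (PR) only.
    loss3 = F.mse_loss(PR(feats), x)
    opt_pr.zero_grad(); loss3.backward(); opt_pr.step()

    # Fourth loss: update the (already updated) encoder and utility
    # discriminator weights against the two privacy adversaries.
    feats = E(x)
    loss4 = (lam1 * F.cross_entropy(UD(feats), y)
             - lam3 * F.mse_loss(PR(feats), x)
             - lam2 * F.cross_entropy(PD(feats), z))
    opt_e.zero_grad(); opt_ud.zero_grad()
    loss4.backward()
    opt_e.step(); opt_ud.step()
```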
4. The method of claim 3, wherein the first loss function is determined based on the cross entropy between the prediction result of the utility discriminator and the real label, the second loss function is determined based on the cross entropy between the privacy class of the privacy discriminator and the privacy label, the third loss function is determined based on the square of the difference between the reconstructed data of the privacy reconstructor and the original data, and the fourth loss function is determined based on the first, second and third loss functions.
5. The method of claim 4, wherein the fourth loss function is expressed by the following formula:

$$L_4=\frac{1}{m}\sum_{i=1}^{m}\Big[\lambda_1\,\tau\big(y_i,\,UD(E(I_i))\big)-\lambda_3\,\big(PR(E(I_i))-I_i\big)^2-\lambda_2\,\tau\big(z_i,\,PD(E(I_i))\big)\Big]$$

wherein $m$ represents the batch size; $\lambda_1$, $\lambda_2$ and $\lambda_3$ represent the training coefficients of the utility discriminator, the privacy discriminator and the privacy reconstructor, respectively; $y_i$ represents the real label of data $I_i$; $UD(E(I_i))$ represents the class prediction result of the utility discriminator for data $I_i$; $\tau(y_i, UD(E(I_i)))$ represents the cross entropy between the class prediction result and the real label; $PR(E(I_i))$ represents the reconstruction of data $I_i$ by the privacy reconstructor; $z_i$ represents the privacy label of data $I_i$; $PD(E(I_i))$ represents the privacy class prediction result of the privacy discriminator for data $I_i$; and $\tau(z_i, PD(E(I_i)))$ represents the cross entropy between the privacy class prediction result and the privacy label.
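A minimal computational sketch of this combined loss follows; `F.cross_entropy` and `F.mse_loss` already average over the batch, standing in for the $1/m$ sum, and the sign convention is the assumed adversarial one from the reconstruction above:

```python
import torch.nn.functional as F

def fourth_loss(E, UD, PD, PR, x, y, z, lam1, lam2, lam3):
    """Batch-averaged fourth loss of claim 5 (assumed sign convention)."""
    feats = E(x)
    utility_ce = F.cross_entropy(UD(feats), y)  # tau(y_i, UD(E(I_i)))
    recon_err = F.mse_loss(PR(feats), x)        # (PR(E(I_i)) - I_i)^2
    privacy_ce = F.cross_entropy(PD(feats), z)  # tau(z_i, PD(E(I_i)))
    return lam1 * utility_ce - lam3 * recon_err - lam2 * privacy_ce
```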
6. The method of claim 1, wherein before inputting the data to be distributed into the pre-constructed data processing model for feature extraction and privacy elimination, the method further comprises:
carrying out format unification processing on the data to be distributed by adopting a preset rule to obtain a processing result;
and binarizing the processing result to obtain binary sequence data.
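The patent leaves the "preset rule" unspecified; under the assumption of fixed-width serialized records, the preprocessing of claim 6 might look like this sketch:

```python
import numpy as np

def preprocess(records, width=128):
    """Unify record format under an assumed preset rule (serialize, then
    pad or truncate to a fixed byte width), then binarize the result into
    binary sequence data (claim 6)."""
    unified = [str(r).encode("ascii", "replace").ljust(width)[:width]
               for r in records]
    raw = np.frombuffer(b"".join(unified), dtype=np.uint8)
    return np.unpackbits(raw).astype(np.float32)  # 0/1 bit sequence
```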
7. The method according to claim 1, wherein the method further comprises:
acquiring data to be interacted;
inputting the data to be interacted into the pre-constructed data processing model for feature extraction and privacy elimination to obtain an issuable data set.
8. The method of claim 1, wherein the privacy discriminator and privacy reconstructor measure privacy exposure risk by:
carrying out privacy class prediction on the data features extracted by the encoder by adopting a privacy discriminator;
reconstructing the extracted data features by adopting a privacy reconstructor to obtain reconstructed data;
calculating the cross entropy between the privacy class and the privacy label by adopting a second loss function, and calculating the reconstruction error between the reconstructed data and the original data by adopting a third loss function, wherein the cross entropy is used for measuring whether a privacy leakage risk exists;
and training the privacy discriminator and the privacy reconstructor according to the cross entropy and the reconstruction error, so that the reconstructed data output by the trained privacy reconstructor achieves privacy elimination.
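As an evaluation-time sketch of this measurement (names and the returned dictionary are illustrative): low cross entropy or low reconstruction error on the encoded features signals that private information still leaks through them.

```python
import torch
import torch.nn.functional as F

def privacy_leakage_risk(E, PD, PR, x, z):
    """Measure leakage risk per claim 8: the second-loss cross entropy
    between predicted privacy class and privacy label, and the third-loss
    reconstruction error between reconstructed and original data."""
    with torch.no_grad():
        feats = E(x)
        ce = F.cross_entropy(PD(feats), z).item()
        mse = F.mse_loss(PR(feats), x).item()
    return {"privacy_cross_entropy": ce, "reconstruction_error": mse}
```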
9. A data security issuing apparatus, the apparatus comprising:
the data determining module is used for receiving a data access request and determining data to be distributed according to the data access request;
the data processing module is used for inputting the data to be distributed into a pre-constructed data processing model for feature extraction and privacy elimination to obtain issuable data for distribution, wherein the data processing model is constructed based on an encoder, a utility discriminator, a privacy discriminator and a privacy reconstructor; the encoder is used for extracting data features, the utility discriminator is used for simulating an expected classification task, and the privacy discriminator and the privacy reconstructor are used for measuring privacy leakage risks.
10. The apparatus of claim 9, wherein the data processing model is constructed using the following modules: an encoder construction module for constructing an encoder by adopting an input layer, a convolution layer, a pooling layer and a batch normalization layer; a utility discriminator construction module for constructing a utility discriminator by adopting a multi-layer perceptron; a privacy discriminator construction module for constructing a privacy discriminator by adopting a multi-layer perceptron; a privacy reconstructor construction module for constructing a privacy reconstructor by adopting an inverted encoder; and a training module for training the encoder, the utility discriminator, the privacy discriminator and the privacy reconstructor to obtain the data processing model.
11. The apparatus of claim 10, wherein the training module is specifically configured to: initialize a first weight of the encoder; update a second weight of the utility discriminator and the first weight of the encoder based on a first loss function; update a third weight of the privacy discriminator based on a second loss function; update a fourth weight of the privacy reconstructor based on a third loss function; update the updated first weight and the updated second weight based on a fourth loss function; and take the updated first weight, second weight, third weight and fourth weight respectively as parameters of the encoder, the utility discriminator, the privacy discriminator and the privacy reconstructor to obtain the data processing model.
12. The apparatus of claim 11, wherein the first loss function is determined based on the cross entropy between the prediction result of the utility discriminator and the real label, the second loss function is determined based on the cross entropy between the privacy class of the privacy discriminator and the privacy label, the third loss function is determined based on the square of the difference between the reconstructed data of the privacy reconstructor and the original data, and the fourth loss function is determined based on the first loss function, the second loss function and the third loss function.
13. The apparatus of claim 12, wherein the fourth loss function is expressed by the following formula:

$$L_4=\frac{1}{m}\sum_{i=1}^{m}\Big[\lambda_1\,\tau\big(y_i,\,UD(E(I_i))\big)-\lambda_3\,\big(PR(E(I_i))-I_i\big)^2-\lambda_2\,\tau\big(z_i,\,PD(E(I_i))\big)\Big]$$

wherein $m$ represents the batch size; $\lambda_1$, $\lambda_2$ and $\lambda_3$ represent the training coefficients of the utility discriminator, the privacy discriminator and the privacy reconstructor, respectively; $y_i$ represents the real label of data $I_i$; $UD(E(I_i))$ represents the class prediction result of the utility discriminator for data $I_i$; $\tau(y_i, UD(E(I_i)))$ represents the cross entropy between the class prediction result and the real label; $PR(E(I_i))$ represents the reconstruction of data $I_i$ by the privacy reconstructor; $z_i$ represents the privacy label of data $I_i$; $PD(E(I_i))$ represents the privacy class prediction result of the privacy discriminator for data $I_i$; and $\tau(z_i, PD(E(I_i)))$ represents the cross entropy between the privacy class prediction result and the privacy label.
14. The apparatus of claim 9, wherein the apparatus further comprises: a unified processing module for carrying out format unification processing on the data to be distributed by adopting a preset rule to obtain a processing result; and a binary processing module for binarizing the processing result to obtain binary sequence data.
15. The apparatus of claim 9, wherein the apparatus further comprises: the data acquisition module is used for acquiring data to be interacted; and the issuable data set determining module is used for inputting the data to be interacted into the pre-constructed data processing model to perform feature extraction and privacy elimination to obtain the issuable data set.
16. The apparatus of claim 9, wherein the privacy discriminator and privacy reconstructor measure privacy exposure risk by:
carrying out privacy class prediction on the data features extracted by the encoder by adopting a privacy discriminator;
reconstructing the extracted data features by adopting a privacy reconstructor to obtain reconstructed data;
calculating the cross entropy between the privacy class and the privacy label by adopting a second loss function, and calculating the reconstruction error between the reconstructed data and the original data by adopting a third loss function, wherein the cross entropy is used for measuring whether a privacy leakage risk exists;
and training the privacy discriminator and the privacy reconstructor according to the cross entropy and the reconstruction error, so that the reconstructed data output by the trained privacy reconstructor achieves privacy elimination.
17. A computer device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the data security distribution method of any one of claims 1 to 8.
18. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the data security distribution method according to any one of claims 1 to 8.
CN202311586203.9A 2023-11-24 2023-11-24 Data security release method and device, computer equipment and storage medium Pending CN117574432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311586203.9A 2023-11-24 2023-11-24 Data security release method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117574432A 2024-02-20

Family

ID=89887867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311586203.9A Pending CN117574432A (en) 2023-11-24 2023-11-24 Data security release method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117574432A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination