CN117708868A

CN117708868A - Information protection method and system based on queue data desensitization and differential privacy protection

Info

Publication number: CN117708868A
Application number: CN202311443296.XA
Authority: CN
Inventors: 刘婉姮; 李建涛; 沈庭艳; 王萌; 马雪琦; 赵子欣; 刘影; 李美睿; 唐佩福
Original assignee: Fourth Medical Center General Hospital of Chinese PLA
Current assignee: Fourth Medical Center General Hospital of Chinese PLA
Priority date: 2023-11-01
Filing date: 2023-11-01
Publication date: 2024-03-15

Abstract

The invention provides an information protection method and system based on queue data desensitization and differential privacy protection. To protect the security of sensitive data, keep its use legally compliant, and maximize data availability and mining value, a real-world data platform desensitizes data by combining a differential privacy technique with a Swarm learning based federated analysis and computation technique for queue data, thereby protecting personal privacy and data security. In addition, while keeping personal sensitive data secure, if published data needs to be destroyed, the user identity is verified remotely and the sensitive data in the storage medium is securely destroyed, which prevents unauthorized users from recovering the original data from residual data and thereby protects key data.

Description

Information protection method and system based on queue data desensitization and differential privacy protection

Technical Field

The invention relates to the technical field of computers, in particular to an information protection method and system based on queue data desensitization and differential privacy protection.

Background

A real-world data platform stores a large amount of sensitive medical data. Once such data is leaked or used illegally, the loss can be irreparable. A scheme that can protect sensitive data is therefore needed.

Disclosure of Invention

The present invention aims to provide an information protection method and system based on queue data desensitization and differential privacy protection that overcomes or at least partially solves the above-mentioned problems.

In order to achieve the above purpose, the technical scheme of the invention is specifically realized as follows:

one aspect of the present invention provides an information protection method based on queue data desensitization and differential privacy protection, comprising:

the client performs local model training by using a gradient descent strategy according to a local database and the received global model parameters sent by the server side, to obtain a client model;

the client generates a random factor, and model disturbance is carried out on the client model by utilizing the random factor to obtain a disturbed client model;

the server side aggregates the disturbed client side models received from the client sides to obtain new global model parameters;

the server side broadcasts the new global model parameters to each client side;

each client receives the new global model parameters, and performs local model training by using a gradient descent strategy according to the local database and the new global model parameters to obtain a new client model;

the method further comprises the steps of:

the server side sends an identity verification instruction to a storage side, wherein the identity verification instruction comprises identity data;

the storage terminal receives and processes the identity verification instruction, retrieves the identity verification data according to the identity data, compares the identity verification data with the identity data and generates a comparison result; matching key management authorities according to the comparison result, and generating a destruction verification code and a random number according to the key management authorities; generating a destroying key according to the destroying verification code and the random number and a preset algorithm, and sending a destroying confirmation instruction to the server;

the server receives the destroying confirmation instruction and sends a destroying confirmation key to the storage end according to the destroying confirmation instruction;

after the storage end receives the destroying confirmation key, verifying whether the destroying key is consistent with the destroying confirmation key, and acquiring an external key when verifying that the destroying key is consistent with the destroying confirmation key; and verifying the one-to-one correspondence between the external key and the verification key, and after determining the one-to-one correspondence between the external key and the verification key, fusing to generate a master key, and destroying data through the master key.

Wherein the server side aggregating the perturbed client models received from the clients to obtain new global model parameters includes: the server side uses the FedAVG algorithm to aggregate the perturbed client models received from the clients to obtain the new global model parameters.

Wherein the random factor conforms to a Gaussian distribution.

Wherein, before the client performs local model training by using a gradient descent strategy according to the local database and the received global model parameters sent by the server side to obtain the client model, the method further comprises: dividing the swarm learning task into a plurality of subtasks, so that the clients jointly participate in training the machine learning model.

Wherein, before the client generates the random factor and uses the random factor to perform model perturbation on the client model to obtain the perturbed client model, the method further comprises: adding noise to the model by using a differential privacy technique.

Another aspect of the present invention provides an information protection system based on queue data desensitization and differential privacy protection, comprising:

the client is used for performing local model training by using a gradient descent strategy according to the local database and the received global model parameters sent by the server side, to obtain a client model; and for generating a random factor and using the random factor to perform model perturbation on the client model, to obtain a perturbed client model;

the server is used for aggregating the disturbed client models received from the clients to obtain new global model parameters; broadcasting the new global model parameters to each client;

each client is further configured to receive the new global model parameter, and perform local model training by using a gradient descent strategy according to the local database and the new global model parameter to obtain a new client model;

the system further comprises: a storage end;

the server side is further configured to send an authentication instruction to the storage side, where the authentication instruction includes identity data;

the storage end is used for receiving and processing the identity verification instruction, calling identity verification data according to the identity data, comparing the identity verification data with the identity data and generating a comparison result; matching key management authorities according to the comparison result, and generating a destruction verification code and a random number according to the key management authorities; generating a destroying key according to the destroying verification code and the random number and a preset algorithm, and sending a destroying confirmation instruction to the server;

the server is further configured to receive the destruction confirmation instruction, and send a destruction confirmation key to the storage end according to the destruction confirmation instruction;

the storage end is further used for verifying whether the destroying key is consistent with the destroying confirmation key after receiving the destroying confirmation key, and acquiring an external key when verifying that the destroying key is consistent with the destroying confirmation key; and verifying the one-to-one correspondence between the external key and the verification key, and after determining the one-to-one correspondence between the external key and the verification key, fusing to generate a master key, and destroying data through the master key.

Wherein the server side aggregates the perturbed client models received from the clients to obtain new global model parameters in the following manner: the server side uses the FedAVG algorithm to aggregate the perturbed client models received from the clients to obtain the new global model parameters.

Wherein the random factor conforms to a Gaussian distribution.

Wherein the system further comprises: a division module, used for dividing the swarm learning task into a plurality of subtasks before the client performs local model training by using a gradient descent strategy according to the local database and the received global model parameters sent by the server side to obtain the client model, so that the clients jointly participate in training the machine learning model.

Wherein the system further comprises: a noise adding processing module, used for adding noise to the model by using a differential privacy technique before the client generates the random factor and uses the random factor to perform model perturbation on the client model to obtain the perturbed client model.

Therefore, with the information protection method and system based on queue data desensitization and differential privacy protection provided by the invention, in order to protect the security of sensitive data, keep its use legally compliant, and maximize data availability and mining value, the real-world data platform desensitizes data by combining a differential privacy technique with a Swarm learning based federated analysis and computation technique for queue data, thereby protecting personal privacy and data security. In addition, while keeping personal sensitive data secure, if published data needs to be destroyed, the user identity is verified remotely and the sensitive data in the storage medium is securely destroyed, which prevents unauthorized users from recovering the original data from residual data and thereby protects key data.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of data desensitization in an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of data desensitization according to an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention;

Fig. 3 is a flowchart of data destruction performed by an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of data destruction performed by an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an information protection system based on queue data desensitization and differential privacy protection according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Fig. 1 shows a flow chart of data desensitization in an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention, fig. 2 shows a schematic diagram of data desensitization in an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention, fig. 3 shows a flow chart of data destruction in an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention, and fig. 4 shows a schematic diagram of data destruction in an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention; referring to fig. 1 to fig. 4, an information protection method based on queue data desensitization and differential privacy protection according to an embodiment of the present invention includes:

S1, the client performs local model training by using a gradient descent strategy according to a local database and the received global model parameters sent by the server side, to obtain a client model.

This step is the local calculation step shown in Fig. 2. In implementation, client i takes the global model parameters $w^{t}$ received from the server side as its local parameters, i.e. $w_i^{t} = w^{t}$, and runs a gradient descent strategy on its local database $D_i$ for local model training to obtain $w_i^{t+1}$, where t represents the current round.
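The local calculation step can be illustrated with a minimal sketch. The least-squares loss, the learning rate, the epoch count, and the names `local_train` and `local_db` are assumptions made for illustration; the source only states that the client starts from the received global parameters and applies gradient descent on its local database.

```python
from typing import List, Tuple

def local_train(global_w: List[float],
                data: List[Tuple[List[float], float]],
                lr: float = 0.1,
                epochs: int = 50) -> List[float]:
    """Full-batch gradient descent on a mean squared error loss (assumed loss)."""
    w = list(global_w)  # local parameters start as a copy of the global ones
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy local database of one client, generated from y = 2 * x (hypothetical data).
local_db = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
client_model = local_train([0.0], local_db)
```

On this toy database the client model converges towards the slope 2.0; the raw records never leave the client, only the resulting parameters do.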

As an optional implementation manner of the embodiment of the invention, before the client performs local model training by using a gradient descent strategy according to the local database and the received global model parameters sent by the server side to obtain the client model, the information protection method based on queue data desensitization and differential privacy protection provided by the embodiment of the invention further comprises: dividing the swarm learning task into a plurality of subtasks, so that the clients jointly participate in training the machine learning model.

S2, the client generates a random factor, and model disturbance is carried out on the client model by using the random factor, so that the disturbed client model is obtained.

This step is the model perturbation step shown in Fig. 2. In implementation, each client generates a random noise n that follows a Gaussian distribution and perturbs the local model as $\tilde{w}_i^{t+1} = w_i^{t+1} + n$, where w is a matrix and n adds noise to each element of the matrix.
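A minimal sketch of this perturbation, assuming zero-mean noise with a hypothetical standard deviation `sigma` (the source does not give the noise scale) and a model stored as a list of rows:

```python
import random
from typing import List

def perturb_model(w: List[List[float]], sigma: float,
                  rng: random.Random) -> List[List[float]]:
    """Add an independent Gaussian sample to every element of the matrix w."""
    return [[wij + rng.gauss(0.0, sigma) for wij in row] for row in w]

rng = random.Random(0)          # fixed seed only to make the sketch repeatable
w = [[0.5, -1.0], [2.0, 0.0]]   # toy client model
w_perturbed = perturb_model(w, sigma=0.01, rng=rng)
```

The original matrix is left unchanged, so the client can keep its exact model locally and upload only the perturbed copy.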

As an optional implementation manner of the embodiment of the invention, before the client generates the random factor and uses the random factor to perform model perturbation on the client model to obtain the perturbed client model, the information protection method based on queue data desensitization and differential privacy protection provided by the embodiment of the invention further comprises: adding noise to the model by using a differential privacy technique.

S3, the server side aggregates the disturbed client side models received from the clients, using Swarm Learning technology to aggregate the models through collective intelligence, to obtain new global model parameters.

As an optional implementation manner of the embodiment of the present invention, the server side aggregates the perturbed client side models received from the respective client sides, and the obtaining of new global model parameters includes: and the server uses FedAVG algorithm to aggregate the disturbed client model received from each client to obtain new global model parameters.

This step is the model aggregation step shown in Fig. 2. In implementation, the server side uses the FedAVG algorithm to aggregate the perturbed models $\tilde{w}_i^{t+1}$ received from the clients to obtain the new global model parameters $w^{t+1}$, i.e. the perturbed model parameters.
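The aggregation can be sketched as a weighted average. Weighting each client by the size of its local database follows the usual FedAvg formulation and is an assumption here, since the source does not state the weights; `fedavg` and the toy values are illustrative.

```python
from typing import List

def fedavg(models: List[List[float]], num_samples: List[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(num_samples)
    dim = len(models[0])
    return [sum(m[j] * n for m, n in zip(models, num_samples)) / total
            for j in range(dim)]

# Two perturbed client models; the second client holds three times more data.
perturbed_models = [[1.0, 2.0], [3.0, 4.0]]
new_global = fedavg(perturbed_models, num_samples=[1, 3])
```

With these inputs the new global parameters are [2.5, 3.5], closer to the model of the client with more data.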

S4, the server side broadcasts and sends the new global model parameters to each client side.

This step is a step of model broadcasting shown in fig. 2, and in implementation, the server broadcasts new model parameters to each client.

S5, each client receives the new global model parameters, and performs local model training by using a gradient descent strategy according to the local database and the new global model parameters to obtain a new client model.

This step is a step of updating the local model shown in fig. 2, and when the method is implemented, each client receives new model parameters and performs local calculation again.
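Steps S1 to S5 together form one training round that is repeated. The sketch below strings them together for a scalar model, assuming a squared-error loss, equal-sized client databases (so the average uses equal weights), and hypothetical values for the learning rate, step count, and noise scale.

```python
import random
from typing import List

def local_update(w: float, data: List[float], lr: float = 0.1, steps: int = 5) -> float:
    """S1/S5: a few gradient descent steps on mean((w - x)^2) over the local data."""
    mean = sum(data) / len(data)
    for _ in range(steps):
        w -= lr * 2.0 * (w - mean)
    return w

def run_round(global_w: float, databases: List[List[float]],
              sigma: float, rng: random.Random) -> float:
    perturbed = []
    for data in databases:
        w_i = local_update(global_w, data)             # S1: local training
        perturbed.append(w_i + rng.gauss(0.0, sigma))  # S2: Gaussian perturbation
    return sum(perturbed) / len(perturbed)             # S3: aggregation

rng = random.Random(42)
databases = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
global_w = 0.0
for _ in range(20):
    # S4: the returned value is what the server broadcasts for the next round.
    global_w = run_round(global_w, databases, sigma=0.01, rng=rng)
```

After 20 rounds the global parameter settles near 5.0, the mean over all three databases, although no client ever shared its records or its unperturbed model.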

It can be seen that the invention performs data desensitization by combining a differential privacy technique with the Swarm learning based federated analysis and computation technique for queue data.

In a concrete implementation, the invention adopts federated learning together with differential privacy. In federated learning the data itself is not transmitted; only gradient information is transmitted. However, gradient information can itself leak private information. Differential privacy is therefore used to prevent leakage through the gradients: federated learning based on differential privacy mainly adds noise to the gradient information, so it does not incur high communication or computation cost.
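One common way to add noise to gradient information is the Gaussian mechanism: bound each gradient's L2 norm by clipping, then add Gaussian noise scaled to that bound. The sketch below is an illustration under that assumption; the clipping step and the parameters `clip_norm` and `noise_multiplier` are not taken from the source.

```python
import math
import random
from typing import List

def privatize_gradient(grad: List[float], clip_norm: float,
                       noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip the gradient to L2 norm <= clip_norm, then add Gaussian noise."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in grad]
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    return [g + rng.gauss(0.0, sigma) for g in clipped]

rng = random.Random(0)
noisy_grad = privatize_gradient([3.0, 4.0], clip_norm=1.0,
                                noise_multiplier=1.1, rng=rng)
```

The extra work per gradient is a norm computation and one random draw per element, which is consistent with the statement that this approach avoids high communication or computation cost.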

At the same time, the invention uses Swarm Learning (SL) to optimize federated learning. In federated learning, model parameters are handled by a central coordinator, which concentrates authority in one party, and the star architecture also reduces fault tolerance. Introducing swarm learning can alleviate these problems to some extent.

Unlike traditional federated learning (client-server architecture), swarm learning has no central parameter server: parameters do not need to be uploaded to a central server for aggregation, and both data and model parameters are stored locally, which gives higher privacy and security. Swarm learning is a new paradigm of decentralized federated learning that combines decentralized hardware infrastructure, distributed machine learning, and blockchain to securely onboard member nodes, dynamically elect a leader, and merge model parameters.

Swarm learning combines distributed machine learning and blockchain and has the advantages of both: the nodes are peers with equal rights, which gives higher security and fault tolerance. The use of blockchain ensures that model updates are traceable and tamper-resistant, and data security is ensured by recording model updates on the blockchain as transactions and relying on the blockchain consensus mechanism.
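The traceability and tamper resistance of recorded model updates can be illustrated with a simple hash chain. This is only a sketch of the linking idea: it models neither a real blockchain platform nor a consensus mechanism, and the field names and the `add_block` and `verify_chain` helpers are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64

def block_digest(index: int, prev_hash: str, update: Dict) -> str:
    body = {"index": index, "prev_hash": prev_hash, "update": update}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def add_block(chain: List[Dict], update: Dict) -> None:
    """Record a model update as a transaction linked to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({"index": index, "prev_hash": prev_hash, "update": update,
                  "hash": block_digest(index, prev_hash, update)})

def verify_chain(chain: List[Dict]) -> bool:
    """Return False if any block was altered or re-linked after being recorded."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        expected = block_digest(block["index"], block["prev_hash"], block["update"])
        if block["index"] != i or block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

ledger: List[Dict] = []
add_block(ledger, {"node": "hospital-a", "round": 1, "model_digest": "ab12"})
add_block(ledger, {"node": "hospital-b", "round": 1, "model_digest": "cd34"})
```

Changing any recorded update changes its digest and breaks the link to every later block, so `verify_chain` detects the modification.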

Main components of the SL framework (FLforBC blockchain network):

1. Swarm edge node

SL node (Swarm Learning node): runs a user-defined deep learning algorithm and iteratively updates the local model.

2. Swarm coordinator nodes

SN node (Swarm Network node): SN nodes communicate with each other to maintain global state information about the model and track training progress through the Ethereum blockchain platform.

SWCI node (Swarm Learning Command Interface node): connects securely to an SN node to view the status of the SL framework and to control and manage it.

SPIFFE SPIRE server node: both the SN node and the SL node contain a SPIRE Agent Workload Attestor plugin that communicates with the SPIRE server node to prove the node's identity and to obtain and manage a SPIFFE Verifiable Identity Document (SVID).

LS node (License Server node): downloads and manages the licenses needed to run the SL framework.

The simple process comprises the following steps:

SL trains on distributed nodes and dynamically elects a leader. A new node registers through a blockchain smart contract, obtains the model, and trains the local model until a defined synchronization condition (SI: synchronization time) is satisfied. The model parameters are then exchanged through the Swarm application programming interface (API) and merged to build a model with updated parameters before a new round of training starts.
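The process can be sketched as a loop without a central parameter server. The sketch is a toy under several assumptions: registration is a plain list instead of a smart contract, the leader is picked at random each round, the synchronization condition is a fixed number of local steps, and merging is an unweighted average. None of these names come from the Swarm Learning API.

```python
import random
from typing import List

class SwarmNode:
    def __init__(self, name: str, data: List[float]) -> None:
        self.name = name
        self.target = sum(data) / len(data)  # optimum of the local squared-error loss
        self.w = 0.0                         # scalar toy model

    def train_until_sync(self, sync_interval: int, lr: float = 0.25) -> None:
        """Train locally until the (assumed) synchronization condition is met."""
        for _ in range(sync_interval):
            self.w -= lr * 2.0 * (self.w - self.target)

def swarm_round(registry: List[SwarmNode], sync_interval: int,
                rng: random.Random) -> str:
    for node in registry:
        node.train_until_sync(sync_interval)
    leader = rng.choice(registry)  # dynamically elected leader for this round
    merged = sum(node.w for node in registry) / len(registry)  # leader merges parameters
    for node in registry:
        node.w = merged            # every node starts the next round from the merged model
    return leader.name

rng = random.Random(7)
registry = [SwarmNode("hospital-a", [0.5, 1.5]), SwarmNode("hospital-b", [2.5, 3.5])]
leaders = [swarm_round(registry, sync_interval=2, rng=rng) for _ in range(30)]
```

Because the merging role moves between nodes from round to round, no single node permanently acts as the coordinator, which is the property that distinguishes this scheme from the star architecture.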

It should be noted that, in the information protection method based on queue data desensitization and differential privacy protection provided in the embodiment of the present invention, steps S1 to S5 and steps S6 to S9 have no required order: steps S1 to S5 may be executed before steps S6 to S9, steps S6 to S9 may be executed before steps S1 to S5, or the two groups of steps may be executed simultaneously, which is not limited in the present invention. The step numbers serve only as labels for the steps.

Referring to fig. 1, fig. 3, and fig. 4, the information protection method based on queue data desensitization and differential privacy protection provided by the embodiment of the present invention further includes:

S6, the server side sends an identity verification instruction to the storage side, wherein the identity verification instruction comprises identity data;

S7, the storage end receives and processes the identity verification instruction, retrieves the identity verification data according to the identity data, compares the identity verification data with the identity data and generates a comparison result; matches a key management authority according to the comparison result, and generates a destruction verification code and a random number according to the key management authority; generates a destroying key from the destruction verification code and the random number according to a preset algorithm, and sends a destroying confirmation instruction to the server;

S8, the server receives the destroying confirmation instruction and sends a destroying confirmation key to the storage end according to the destroying confirmation instruction;

S9, after the storage end receives the destroying confirmation key, it verifies whether the destroying key is consistent with the destroying confirmation key, and acquires an external key when the destroying key is verified to be consistent with the destroying confirmation key; it then verifies the one-to-one correspondence between the external key and the verification key and, after determining that correspondence, fuses them to generate a master key and destroys the data through the master key.
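Steps S6 to S9 can be illustrated with the following sketch. The source leaves the cryptographic details open, so every concrete choice here is an assumption: HMAC-SHA256 over a shared secret stands in for the preset algorithm, the verification key is taken to be the SHA-256 digest of the external key, fusing means hashing the two keys together, and destruction means overwriting the stored bytes with a keystream derived from the master key.

```python
import hashlib
import hmac
import secrets

def derive_key(secret: bytes, code: bytes, nonce: bytes) -> bytes:
    """Stand-in for the preset algorithm: HMAC-SHA256 over code || nonce."""
    return hmac.new(secret, code + nonce, hashlib.sha256).digest()

class StorageEnd:
    def __init__(self, credentials, secret, verification_key, data):
        self.credentials = credentials        # identity -> (credential, authority)
        self.secret = secret                  # shared with the server (assumed)
        self.verification_key = verification_key
        self.data = bytearray(data)
        self._destroy_key = None

    def handle_auth(self, identity, credential):
        """S7: compare identity data, match authority, issue code and random number."""
        stored = self.credentials.get(identity)
        if stored is None or not hmac.compare_digest(stored[0], credential):
            return None
        if stored[1] != "destroy":
            return None
        code, nonce = secrets.token_bytes(16), secrets.token_bytes(16)
        self._destroy_key = derive_key(self.secret, code, nonce)
        return code, nonce                    # destruction confirmation instruction

    def confirm_and_destroy(self, confirmation_key, external_key):
        """S9: check both keys, fuse a master key, overwrite the stored data."""
        if self._destroy_key is None:
            return False
        if not hmac.compare_digest(self._destroy_key, confirmation_key):
            return False
        if not hmac.compare_digest(hashlib.sha256(external_key).digest(),
                                   self.verification_key):
            return False
        master_key = hashlib.sha256(external_key + self.verification_key).digest()
        for start in range(0, len(self.data), 32):
            pad = hashlib.sha256(master_key + start.to_bytes(8, "big")).digest()
            end = min(start + 32, len(self.data))
            self.data[start:end] = pad[:end - start]
        self._destroy_key = None
        return True

external_key = b"hardware-token-key"          # hypothetical external key
storage = StorageEnd({"admin": (b"pw", "destroy")}, b"shared-secret",
                     hashlib.sha256(external_key).digest(), b"patient record 001")
instruction = storage.handle_auth("admin", b"pw")             # S6 and S7
confirmation_key = derive_key(b"shared-secret", *instruction)  # S8, on the server
destroyed = storage.confirm_and_destroy(confirmation_key, external_key)  # S9
```

Destruction only proceeds when the identity check, the confirmation key, and the external key all pass; a failure at any stage leaves the stored data untouched.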

Therefore, with the information protection method based on queue data desensitization and differential privacy protection provided by the embodiment of the invention, in order to protect the security of sensitive data, keep its use legally compliant, and maximize data availability and mining value, the real-world data platform desensitizes data by combining a differential privacy technique with a Swarm learning based federated analysis and computation technique for queue data, thereby protecting personal privacy and data security. In addition, while keeping personal sensitive data secure, if published data needs to be destroyed, the user identity is verified remotely and the sensitive data in the storage medium is securely destroyed, which prevents unauthorized users from recovering the original data from residual data and thereby protects key data.

Therefore, the invention performs data desensitization by combining a differential privacy technique with a Swarm learning based federated analysis and computation technique for queue data, and can also perform remote data destruction, providing information security protection across the whole process.

Fig. 5 is a schematic structural diagram of an information protection system based on queue data desensitization and differential privacy protection according to an embodiment of the present invention. The system applies the above method, so only its structure is briefly described below; for other details, refer to the related description of the information protection method based on queue data desensitization and differential privacy protection above. Referring to Fig. 5, the information protection system based on queue data desensitization and differential privacy protection according to an embodiment of the present invention includes:

the server side is used for aggregating the disturbed client side models received from the client sides to obtain new global model parameters, and for broadcasting the new global model parameters to each client;

and each client is also used for receiving new global model parameters, and performing local model training by utilizing a gradient descent strategy according to the local database and the new global model parameters to obtain a new client model.

As an optional implementation manner of the embodiment of the invention, the server side aggregates the perturbed client model received from each client side to obtain new global model parameters by the following way: and the server uses FedAVG algorithm to aggregate the disturbed client model received from each client to obtain new global model parameters.

As an alternative implementation of the embodiment of the present invention, the random factor conforms to a Gaussian distribution.

As an optional implementation manner of the embodiment of the present invention, the information protection system based on queue data desensitization and differential privacy protection provided by the embodiment of the present invention further includes: a storage end;

the server side is further used for sending an identity verification instruction to the storage side, wherein the identity verification instruction comprises identity data;

the storage end is used for receiving and processing the identity verification instruction, retrieving the identity verification data according to the identity data, comparing the identity verification data with the identity data and generating a comparison result; matching key management authority according to the comparison result, and generating a destruction verification code and a random number according to the key management authority; generating a destroying key according to a destroying verification code and a random number and a preset algorithm, and sending a destroying confirmation instruction to a server;

the server is also used for receiving the destroying confirmation instruction and sending a destroying confirmation key to the storage end according to the destroying confirmation instruction;

the storage end is also used for verifying whether the destroying key is consistent with the destroying confirmation key after receiving the destroying confirmation key, and acquiring an external key when verifying that the destroying key is consistent with the destroying confirmation key; and verifying the one-to-one correspondence between the external key and the verification key, after determining the one-to-one correspondence between the external key and the verification key, fusing to generate a master key, and destroying data through the master key.

Therefore, with the information protection system based on queue data desensitization and differential privacy protection provided by the embodiment of the invention, in order to protect the security of sensitive data, keep its use legally compliant, and maximize data availability and mining value, the real-world data platform desensitizes data by combining a differential privacy technique with a Swarm learning based federated analysis and computation technique for queue data, thereby protecting personal privacy and data security. In addition, while keeping personal sensitive data secure, if published data needs to be destroyed, the user identity is verified remotely and the sensitive data in the storage medium is securely destroyed, which prevents unauthorized users from recovering the original data from residual data and thereby protects key data.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. An information protection method based on queue data desensitization and differential privacy protection is characterized by comprising the following steps:

further comprises:

2. The method of claim 1, wherein the server side aggregating the perturbed client models received from the respective clients to obtain new global model parameters comprises:

and the server uses FedAVG algorithm to aggregate the perturbed client model received from each client to obtain new global model parameters.

3. The method of claim 1, wherein the random factor conforms to a Gaussian distribution.

4. The method according to claim 1, wherein before the client performs local model training by using a gradient descent strategy according to the local database and the global model parameters sent by the received server, the method further comprises:

dividing the swarm learning task into a plurality of subtasks, so that the clients jointly participate in training the machine learning model.

5. The method of claim 1, wherein generating a random factor at the client, and using the random factor to model perturb the client model, and before obtaining the perturbed client model, further comprises:

adding noise to the model by using a differential privacy technique.

6. An information protection system based on queue data desensitization and differential privacy protection, comprising:

further comprises: a storage end;

7. The system of claim 6, wherein the server aggregates the perturbed client model received from each client to obtain new global model parameters by:

8. The system of claim 6, wherein the random factor conforms to a Gaussian distribution.

9. The system of claim 6, further comprising: a division module, used for dividing the swarm learning task into a plurality of subtasks before the client performs local model training by using a gradient descent strategy according to the local database and the received global model parameters sent by the server side to obtain the client model, so that the clients jointly participate in training the machine learning model.

10. The system of claim 6, further comprising: a noise adding processing module, used for adding noise to the model by using a differential privacy technique before the client generates the random factor and uses the random factor to perform model perturbation on the client model to obtain the perturbed client model.