WO2023134076A1

WO2023134076A1 - Data protection method and system, and storage medium

Info

Publication number: WO2023134076A1
Application number: PCT/CN2022/090192
Authority: WO
Inventors: 李泽远; 王健宗
Original assignee: 平安科技（深圳）有限公司
Priority date: 2022-01-12
Filing date: 2022-04-29
Publication date: 2023-07-20
Also published as: CN114357521A

Abstract

The present application is applicable to the technical field of data security. Provided are a data protection method and system, and a storage medium. In the method, locally stored first data is acquired by means of a client and then encoded to obtain a plurality of groups of second data, such that an attacker can be prevented from acquiring, by cracking the client, plaintext corresponding to the first data, thereby improving the security of the first data; client information in the second data is removed by means of a shuffle server, such that an attacker can be prevented from further stealing, by acquiring the client information in the second data, data stored in the client, thereby improving the security of the client information and the security of the data stored in the client; and time sequence information of the second data is removed, such that an attacker can be prevented from inferring a correspondence between the second data and the client on the basis of the time sequence information of the second data, thereby improving the privacy of the client participating in federated learning and improving the defense capability of federated learning against output attacks.

Description

Data protection method, system and storage medium

This application claims the priority of a Chinese patent application with application number 202210031150.3 titled "Data Protection Method and System" filed with the China Patent Office on January 12, 2022, the entire contents of which are incorporated herein by reference.

Technical field

The application belongs to the technical field of data security, and in particular relates to a data protection method, system and storage medium.

Background

Federated learning is a distributed learning paradigm. Model parameters are shared between multiple clients that store data locally and a server equipped with a deep learning model, so that data from the multiple clients is used to train the deep learning model. Federated learning makes efficient use of data, which can improve model performance across different data sets, and it can protect the data privacy of clients. Therefore, more and more deep learning models are trained using federated learning.

To ensure data security, a defense mechanism against network attacks must be set up during training. Traditional defense mechanisms can identify and defend against poisoning attacks, which paralyze the training process of federated learning. There is now also an output attack, which obtains model parameters and inverts them to infer client data. The inventors found that, because an output attack follows the normal training process of federated learning, it can bypass traditional defense mechanisms and easily leak the identity information and data of other clients participating in the training, which poses a potential security risk. Therefore, how to improve the defense capability of federated learning against output attacks has become an urgent problem to be solved.

Summary of the invention

In view of this, the embodiments of the present application provide a data protection method, system, and storage medium to solve the problem of poor defense against output attacks in existing federated learning.

The first aspect of the present application provides a data protection method applied to a data protection system, the data protection system includes a shuffling server and a client, the shuffling server is connected to the client, and the method includes:

obtaining, through the client, multiple sets of first data stored locally;

encoding, by the client, the multiple sets of first data to obtain multiple sets of second data, wherein the multiple sets of second data correspond to the multiple sets of first data;

shuffling, by the shuffling server, the multiple sets of second data to eliminate client information in the multiple sets of second data and to eliminate timing information of the multiple sets of second data, wherein the timing information reflects the time and order in which the shuffling server acquires the multiple sets of second data.

A second aspect of the present application provides a data protection system, the system comprising:

a shuffling server and a client, the shuffling server being connected to the client;

The client is used to acquire multiple sets of first data stored locally;

The client is configured to encode the multiple sets of first data to obtain multiple sets of second data, and the multiple sets of second data are in one-to-one correspondence with the multiple sets of first data;

The shuffling server is used for shuffling the multiple sets of second data, so as to eliminate the client information in the multiple sets of second data and to eliminate the timing information of the multiple sets of second data, wherein the timing information reflects the time and order in which the shuffling server acquires the multiple sets of second data.

A third aspect of the present application provides a computer-readable storage medium, the computer-readable storage medium stores at least one computer-readable instruction, and when the at least one computer-readable instruction is executed by a processor, the following steps are implemented:

Obtain multiple sets of first data stored locally through the client;

The data protection method, system, and storage medium described in this application obtain and encode, through the client, the first data stored locally to obtain multiple sets of second data, which prevents an attacker from obtaining the plaintext corresponding to the first data by cracking the client and thus improves the security of the first data. Eliminating the client information of the second data through the shuffling server prevents an attacker from using that client information to further steal the data stored in the client, thereby improving the security of the client information and of the data stored in the client. Eliminating the timing information of the second data prevents an attacker from inferring the correspondence between the second data and the client from that timing information, which improves the privacy of clients participating in federated learning and improves the defense capability of federated learning against output attacks. This application can promote the construction of smart cities and can be applied in fields such as smart buildings, smart security, smart communities, smart life, and the Internet of Things.

Description of drawings

FIG. 1 is a first structural schematic diagram of a data protection system provided by an embodiment of the present application;

FIG. 2 is a second structural schematic diagram of the data protection system provided by the embodiment of the present application;

FIG. 3 is a schematic flow chart of the first data protection method provided by the embodiment of the present application;

FIG. 4 is a schematic diagram of a third structure of a data protection system provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of a second data protection method provided by an embodiment of the present application.

Detailed description

In the following description, specific details such as specific system structures and technologies are presented for the purpose of illustration rather than limitation, so as to thoroughly understand the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

It should also be understood that the term "and/or" used in the description of the present application and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in this specification and the appended claims, the term "if" may be construed, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if determined" or "if [the described condition or event] is detected" may be construed, depending on the context, to mean "once determined", "in response to the determination", "once [the described condition or event] is detected" or "in response to detection of [the described condition or event]".

In addition, in the description of the specification and the appended claims of the present application, the terms "first", "second", "third" and so on are only used to distinguish descriptions, and should not be understood as indicating or implying relative importance.

Reference to "one embodiment" or "some embodiments" or the like in the specification of the present application means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "including", "comprising", "having" and variations thereof mean "including but not limited to", unless specifically stated otherwise.

In the application, to ensure data security, a defense mechanism against network attacks must be set up during training. Traditional defense mechanisms can identify and defend against poisoning attacks, which paralyze the training process of federated learning. There is now also an output attack, which obtains model parameters and inverts them to infer client data. Because an output attack follows the normal training process of federated learning, it can bypass traditional defense mechanisms and easily leak the identity information and data of other clients participating in the training, which poses a potential security risk. Therefore, how to improve the defense capability of federated learning against output attacks has become an urgent problem to be solved.

In view of the above technical problems, the embodiment of the present application provides a data protection method. The client obtains and encodes the first data stored locally to obtain multiple sets of second data, which prevents an attacker from obtaining the plaintext corresponding to the first data by cracking the client and thus improves the security of the first data. Eliminating the client information of the second data through the shuffling server prevents an attacker from using that client information to further steal the data stored in the client, which improves the security of the client information and of the data stored in the client. Eliminating the timing information of the second data prevents an attacker from inferring the correspondence between the second data and the client from that timing information, which improves the privacy of clients participating in federated learning and improves the defense capability of federated learning against output attacks.

The data protection method provided in the embodiment of the present application can be applied to a data protection system, and the data protection system can be installed in a federated learning system or any other type of distributed learning system.

FIG. 1 exemplarily shows a schematic structural diagram of a data protection system 100. The data protection system 100 includes a client 110 and a shuffling (Shuffle) server 120, and the shuffling server 120 is connected to the client 110:

The client 110 is used to acquire multiple sets of first data stored locally;

The client 110 is configured to encode multiple sets of first data to obtain multiple sets of second data, and multiple sets of second data correspond to multiple sets of first data;

The shuffling server 120 is used for shuffling the multiple sets of second data, so as to eliminate the information of the client 110 in the multiple sets of second data and to eliminate the timing information of the multiple sets of second data, wherein the timing information reflects the time and order in which the shuffling server 120 acquires the multiple sets of second data.

In the application, the client 110 can be a terminal device with data storage capability, and the data of the client 110 can be stored in at least one database. The database types supported by the client 110 are as follows: in terms of data storage structure, the client 110 can support both relational and non-relational databases; in terms of system architecture, the client 110 can support both distributed and centralized databases. Specifically, it can support different types of databases such as Oracle, MySQL, MongoDB, SQL Server, IBM Db2 and Dameng.

In the application, the terminal device can be a mobile phone, tablet computer, wearable device, vehicle-mounted device, augmented reality (AR)/virtual reality (VR) device, notebook computer, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), etc. The embodiment of the present application does not impose any limitation on the specific type of the terminal device.

In the application, the client 110 may have a built-in encoding (Encode) module 130, which encodes the multiple sets of first data to obtain the multiple sets of second data. Before the encoding module 130 encodes the first data, the first data may be encrypted, so as to improve the data security of the first data during the encoding process.

In the application, the shuffling server 120 can be deployed on an independent server. A third-party user other than the client must pass identity authentication when accessing the shuffling server 120, and the identity authentication can be based on an encryption algorithm such as the RSA (Rivest-Shamir-Adleman) public-key cryptosystem. The shuffling server 120 can complete the shuffling of the second data without reading the second data. For a specific shuffling method, refer to the data protection method corresponding to FIG. 3 or FIG. 5 below.

FIG. 2 exemplarily shows a schematic structural diagram of a data protection system 100, including a sequentially connected client 110, a shuffling server 120, and an analysis (Analyze) server 140, and the shuffling server 120 is connected to the analysis server 140;

The analysis server 140 is configured to integrate multiple sets of second data after shuffling according to the shuffling rules of the shuffling server 120 to obtain third data;

The analysis server 140 is also used to decode the third data according to the encoding rule of the client 110, and the decoded third data is used for training the deep learning model;

In the application, the analysis server 140 can be set in an independent server, and when the client 110 is set in an independent server, the client 110, the shuffling server 120 and the analysis server 140 can be set in three mutually independent servers . The analysis server 140 is configured to integrate and decode the shuffled second data to obtain third data, and use the third data for training a deep learning model.

In the application, the analysis server 140 includes a deep learning model. In the data protection system 100, the analysis server 140 is configured to receive the second data obtained through encoding and shuffling, and to train the deep learning model based on the second data. To achieve federated learning, the analysis server 140 needs to aggregate and analyze all the data, and the server on which the analysis server 140 is deployed is highly open and weakly protected against attacks. An attacker will therefore usually attack the analysis server 140 and try to determine, from the second data or the third data in the analysis server 140, the identity of the clients participating in the federated learning, in order to further crack those clients and obtain the local information stored in them.

In the application, the client 110 and the shuffling server 120 constitute the data protection system 100 built into the federated learning system. The second data is obtained by encoding and shuffling the acquired first data, so that an attacker cannot obtain the identity information and data of the client that sent the first data corresponding to the second data. This improves the defense capability of federated learning against output attacks and improves the security of the identity information and data of the clients participating in the training. The specific encoding method of the client 110 and the specific shuffling method of the shuffling server 120 are described below.

It can be understood that the structure shown in the embodiment of the present application does not constitute a specific limitation on the data protection system 100. In other embodiments of the present application, the data protection system 100 may include more or fewer components than shown in the figures, combine certain components, or use different components; for example, it may also include input and output devices, network access devices, etc. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

As shown in Figure 3, the data protection method provided by the embodiment of the present application is applied to a data protection system, including the following steps S301 to S303:

Step S301. Obtain multiple sets of first data stored locally through the client.

In the application, the first data of the client is part of the plaintext data generated or obtained locally by the client. The client can select the first data from the locally stored plaintext data according to actual needs, so as to use the first data for federated learning. After obtaining the first data, the client may input the first data into the encoding module; the encoding module may be built into the client, or deployed on a server and connected to the client.

In one embodiment, after step S301, the method further includes:

The first data is encrypted by the client.

In the application, after the client obtains the first data, it can encrypt the first data to obtain the ciphertext (Cipher Text) corresponding to the first data, so as to prevent an attacker from obtaining the plaintext (Plain Text) corresponding to the first data by cracking the client or the encoding module, which improves the security of the first data both while it is stored on the client and during the encoding process. The attacker can be a user who participates in the federated learning, or a third-party user who does not.

In the application, the encryption algorithm for encrypting the first data may include a symmetric encryption algorithm (Symmetric Encryption Algorithm) or an asymmetric encryption algorithm (Asymmetric Cryptographic Algorithm). Specifically, the symmetric encryption algorithm may include RC4 (Rivest Cipher 4, a stream cipher), RC2 (Rivest Cipher 2, a block cipher), DES (Data Encryption Standard) or AES (Advanced Encryption Standard), etc.; the asymmetric encryption algorithm may include RSA, ECC (Elliptic Curve Cryptography), DSA (Digital Signature Algorithm), etc. The embodiment of the present application does not impose any limitation on the specific type of the encryption algorithm used to encrypt the first data.

Step S302: Encoding the multiple sets of first data by the client to obtain multiple sets of second data, and the multiple sets of second data correspond to the multiple sets of first data one-to-one.

In the application, the client can encode the multiple sets of first data through the encoding module to obtain the multiple sets of second data, and the multiple sets of second data correspond to the multiple sets of first data one-to-one. The multiple sets of first data may come from multiple clients, and each client may provide one or more sets of first data. The encoding module can encode the first data into the second data in a specified encoding format, and the specified encoding format can be ASCII (American Standard Code for Information Interchange), ANSI (an extended ASCII code), Unicode or another type of encoding format. The embodiment of this application does not impose any restrictions on the specific type of the specified encoding format.

In the application, encoding the multiple sets of first data through the encoding module to obtain the multiple sets of second data can prevent an attacker from obtaining the plaintext corresponding to the first data by cracking the encoding module, which improves the security of the first data. Encoding also compresses the data size of the first data, so that the data protection system has a lower processing load and a higher processing speed when it subsequently processes the second data.
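As a non-limiting illustration of step S302, the following Python sketch encodes each set of first data into exactly one set of second data. The function names encode_sets and decode_sets, the choice of UTF-8 as the specified encoding format, and the use of zlib to reduce the data size are assumptions made for illustration only; the embodiment does not prescribe them.

```python
import zlib


def encode_sets(first_data_sets):
    """Encode each set of first data into one set of second data (one-to-one)."""
    second_data_sets = []
    for first_data in first_data_sets:
        # Assumed encoding format: UTF-8 (a Unicode encoding).
        raw = first_data.encode("utf-8")
        # Assumed compression step: zlib, to reduce the size of the encoded data.
        second_data_sets.append(zlib.compress(raw))
    return second_data_sets


def decode_sets(second_data_sets):
    """Inverse of encode_sets, following the same (assumed) encoding rule."""
    return [zlib.decompress(s).decode("utf-8") for s in second_data_sets]


# Two illustrative sets of first data (repetitive on purpose, so they compress well).
first = ["temperature=21.5;" * 50, "humidity=40;" * 50]
second = encode_sets(first)
restored = decode_sets(second)
```

In this sketch the order of the output list matches the order of the input list, which is what keeps the second data in one-to-one correspondence with the first data.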

In one embodiment, step S302 includes:

Each sub-client encodes the locally stored first data to obtain corresponding second data.

In the application, a client may consist of multiple sub-clients, each with a built-in sub-encoding unit. Each sub-encoding unit encodes the first data of its corresponding sub-client to obtain a set of encoded second data. Because the multiple sub-encoding units process the first data of their corresponding sub-clients in parallel, the speed at which the encoding module acquires and encodes the first data can be improved.

FIG. 4 exemplarily shows a schematic structural diagram of the data protection system 100 when the client 110 includes multiple sub-clients 111 , wherein each sub-client 111 has a built-in sub-encoding unit 131 .
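As a non-limiting illustration of the parallel sub-encoding described above, the following Python sketch runs one hypothetical sub-encoding unit per sub-client using a thread pool. The names sub_encode and encode_in_parallel, the UTF-8 encoding, and the use of threads as the parallel mechanism are assumptions for illustration; the embodiment does not specify how the parallelism is realized.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_encode(first_data):
    """Hypothetical sub-encoding unit: encodes one sub-client's first data."""
    return first_data.encode("utf-8")


def encode_in_parallel(sub_client_data):
    """Run one sub-encoding unit per sub-client and keep the one-to-one order."""
    with ThreadPoolExecutor(max_workers=len(sub_client_data)) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(sub_encode, sub_client_data))


sub_client_data = [
    "records of sub-client A",
    "records of sub-client B",
    "records of sub-client C",
]
second_data = encode_in_parallel(sub_client_data)
```

Because the results are collected in input order, each set of second data can still be matched to the set of first data it was produced from.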

Step S303: shuffling the multiple sets of second data through the shuffling server to eliminate client information in the multiple sets of second data and to eliminate timing information of the multiple sets of second data, wherein the timing information reflects the time and order in which the shuffling server acquires the multiple sets of second data.

In the application, shuffling the multiple sets of second data through the shuffling server eliminates the client information in the multiple sets of second data. The client information can be metadata of the second data, and the metadata can include the data source address of the client, the physical topology of the client, the system version information of the client, the domain name (Domain Name) of the client, the name of the database used by the client to store the second data, etc. The data source address can specifically include the client's IP address (Internet Protocol Address), interface address, MAC address (Media Access Control Address), etc. The physical topology of the client reflects all the devices included in the client or all the devices connected to the client.

In the application, by shuffling multiple sets of second data through the shuffling server, timing information of multiple sets of second data can also be eliminated, and the timing information is used to reflect the time and order in which the shuffling server acquires each set of second data.

In the application, eliminating the client information of the second data prevents an attacker from obtaining that client information and from using it to further steal the data stored in the client, thereby improving the security of the client information and of the client data. Eliminating the timing information of the second data likewise prevents an attacker from inferring the correspondence between the second data and the client from that timing information, which improves the privacy of clients participating in federated learning and improves the defense capability of federated learning against attacks.
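As a non-limiting illustration of step S303, the following Python sketch drops the metadata attached to each received set of second data and randomizes the output order. The record layout (the keys "payload", "client_ip" and "received_at") and the function name shuffle_second_data are hypothetical; the payload is treated as opaque bytes, consistent with the shuffling server not reading the second data.

```python
import random


def shuffle_second_data(received, rng=None):
    """Drop client metadata and arrival order from the received second data.

    `received` is a list of dicts with hypothetical keys: "payload" (the
    encoded second data), "client_ip" and "received_at" (metadata).
    """
    rng = rng or random.SystemRandom()
    # Keep only the payload, so no client information is forwarded.
    payloads = [item["payload"] for item in received]
    # Randomize the order, so the output no longer reflects arrival order.
    rng.shuffle(payloads)
    return payloads


received = [
    {"payload": b"set-1", "client_ip": "10.0.0.1", "received_at": 1.0},
    {"payload": b"set-2", "client_ip": "10.0.0.2", "received_at": 2.0},
    {"payload": b"set-3", "client_ip": "10.0.0.3", "received_at": 3.0},
]
shuffled = shuffle_second_data(received)
```

The sketch uses random.SystemRandom by default so that the permutation is drawn from the operating system's randomness source; this is a design choice of the sketch, not a requirement stated in the embodiment.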

In one embodiment, step S303 includes:

obtaining the data volume of the multiple sets of second data through the shuffling server;

Adding noise to the amount of data through the shuffling server, so that the amount of data satisfies differential privacy; or, deleting the preset amount of data in multiple sets of second data through the shuffling server, so that the amount of data satisfies differential privacy.

In the application, the data volume of the multiple sets of second data can be obtained through the shuffling server. Specifically, when the second data consists of SQL (Structured Query Language) statements, the Count function can be used to determine the number of SQL statements in the multiple sets of second data, and thereby the data volume of the multiple sets of second data.
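As a non-limiting illustration of determining the data volume with a count function, the following Python sketch stores placeholder statements in an in-memory SQLite table and counts them. The table name second_data, its columns, and the use of SQLite are assumptions for illustration only; the embodiment does not state where or how the statements are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per encoded statement, tagged with its set.
conn.execute("CREATE TABLE second_data (set_id INTEGER, statement TEXT)")
conn.executemany(
    "INSERT INTO second_data VALUES (?, ?)",
    [(1, "encoded-statement-1"), (1, "encoded-statement-2"), (2, "encoded-statement-3")],
)
# COUNT(*) gives the number of stored statements, i.e. the data volume.
(data_volume,) = conn.execute("SELECT COUNT(*) FROM second_data").fetchone()
# Per-set volumes, grouped by the hypothetical set_id column.
per_set = dict(conn.execute("SELECT set_id, COUNT(*) FROM second_data GROUP BY set_id"))
conn.close()
```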

In the application, after determining the data volume of multiple sets of second data, noise can be added to the data volume by shuffling the server, and the noise type can include Laplace noise (Laplace Noise) or Gaussian noise (Gaussian Noise). Disturbing the amount of data so that the amount of data satisfies differential privacy (Differential Privacy) can prevent the attacker from obtaining the data amount of the second data, or allow the attacker to obtain the wrong amount of the second data.

In the application, after the data volume of the multiple sets of second data is determined, a preset amount of data in the multiple sets of second data can also be deleted through the shuffling server. The preset amount can be generated randomly by a randomized algorithm (Randomized Algorithm), so that the amount deleted from the multiple sets of second data is different each time and the data volume satisfies differential privacy.

In the application, adding noise to the data volume through the shuffling server, or deleting a preset amount of data from the multiple sets of second data through the shuffling server, makes the data volume of the second data satisfy differential privacy. This prevents an attacker from obtaining the true data volume of the second data, and from estimating the number of clients and the correspondence between the second data and the clients from that data volume, which improves the privacy of clients participating in federated learning and improves the defense capability of federated learning against output attacks.
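As a non-limiting illustration of the two alternatives above, the following Python sketch perturbs a data volume with Laplace noise and, separately, deletes a randomly chosen number of sets. The function names, the privacy parameter epsilon, the assumed sensitivity of 1, and the upper bound max_delete are all hypothetical values chosen for illustration. Whether a particular deletion scheme meets a formal differential privacy bound depends on the distribution of the deleted amount, which this sketch does not analyze.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Perturb a count with Laplace noise of scale 1/epsilon (sensitivity 1 assumed)."""
    scale = 1.0 / epsilon
    # The difference of two independent Exp(1) samples is Laplace(0, 1) distributed.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_count + noise


def delete_random_amount(second_data_sets, max_delete, rng):
    """Delete a randomly chosen number of sets (between 0 and max_delete)."""
    k = rng.randint(0, min(max_delete, len(second_data_sets)))
    doomed = set(rng.sample(range(len(second_data_sets)), k))
    return [s for i, s in enumerate(second_data_sets) if i not in doomed]


rng = random.Random(0)  # seeded only to make the sketch reproducible
sets = [f"set-{i}".encode() for i in range(100)]
reported = noisy_count(len(sets), epsilon=0.5, rng=rng)
remaining = delete_random_amount(sets, max_delete=10, rng=rng)
```

A smaller epsilon gives a larger noise scale and therefore a less accurate reported volume; the seeded generator is used here only for reproducibility and would likely be replaced by a secure randomness source in practice.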

In one embodiment, after step S303, it also includes:

integrating multiple sets of shuffled second data through the analysis server to obtain third data;

The analysis server decodes the third data according to the coding rules of the client, and the decoded third data is used for training the deep learning model.

In the application, after the shuffling server completes the shuffling of the multiple sets of second data, the analysis server can first obtain and integrate the multiple sets of shuffled second data to obtain the third data, and then decode the third data; alternatively, the analysis server can first obtain and decode the multiple sets of shuffled second data, and then integrate the decoded second data. The embodiment of the present application does not impose any limitation on the order in which the analysis server integrates and decodes the second data after acquiring it.
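As a non-limiting illustration of why the order of integration and decoding is not limited, the following Python sketch applies both orders to the same shuffled second data. The integration rule (concatenation) and the decoding rule (UTF-8) are hypothetical stand-ins for the shuffling rule and the client's encoding rule, which the embodiment leaves open.

```python
def integrate(sets):
    """Hypothetical integration rule: concatenate the sets into one list."""
    return [item for s in sets for item in s]


def decode(items):
    """Hypothetical decoding rule: the client encoded text as UTF-8 bytes."""
    return [b.decode("utf-8") for b in items]


shuffled_second_data = [[b"row-3"], [b"row-1", b"row-2"]]

# Order 1: integrate into third data, then decode the third data.
third_a = decode(integrate(shuffled_second_data))
# Order 2: decode each set first, then integrate the decoded sets.
third_b = integrate([decode(s) for s in shuffled_second_data])
```

Under these assumed rules both orders yield the same decoded third data, because decoding acts on each item independently of how the items are grouped.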

In the application, the decoded third data can be used to train the deep learning model. At this point, the client information, and the timing information with which the shuffling server acquired each set of second data included in the third data, have been eliminated from the decoded third data. Even after the analysis server is cracked by an attacker, the attacker therefore cannot determine the identity of the clients participating in the federated learning from the third data, and cannot go on to obtain the local information stored in a client based on its identity, which improves the security of client identity information and data.

In the data protection method provided by the embodiment of the present application, the client obtains and encodes the first data stored locally to obtain multiple sets of second data, which prevents an attacker from obtaining the plaintext corresponding to the first data by cracking the client and thus improves the security of the first data. Eliminating the client information of the second data through the shuffling server prevents an attacker from using that client information to further steal the data stored in the client, which improves the security of the client information and of the data stored in the client. Eliminating the timing information of the second data prevents an attacker from inferring the correspondence between the second data and the client from that timing information, which improves the privacy of clients participating in federated learning and improves the defense capability of federated learning against attacks.

As shown in FIG. 5, in one embodiment, based on the embodiment corresponding to FIG. 3, the following steps S501 to S510 are included:

Step S501. Obtain multiple sets of first data stored locally through the client.

In the application, the data protection method provided in step S501 is consistent with the above step S301, and will not be repeated here.

Step S502: each sub-client encodes each piece of sub-data of the locally stored first data to obtain a corresponding set of second data, wherein the second data includes multiple pieces of sub-data that correspond one-to-one to the multiple pieces of sub-data of the first data.

In the application, the sub-encoding unit built into the sub-client can support a first encoding mode and a second encoding mode. When the sub-encoding unit adopts the first encoding mode, the pieces of sub-data of the first data can be encoded one by one, with each encoding generating one corresponding piece of sub-data of the second data; alternatively, multiple pieces of sub-data of the first data can be encoded in parallel to obtain a set of encoded second data. Each piece of sub-data included in the second data has a corresponding piece of sub-data of the first data, and the number of pieces of sub-data of the second data is the same as that of the first data. The difference is that each piece of sub-data of the first data is unencoded plaintext data, whereas each piece of sub-data of the second data is encoded ciphertext data.

In application, by encoding each piece of sub-data of the first data in the first encoding mode, the sub-encoding unit encodes every piece of sub-data without distinction, thereby improving the discreteness of the second data obtained after encoding.
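The first encoding mode can be illustrated with a minimal sketch. The actual coding rule is not specified in this passage, so `encode_item` below is a hypothetical placeholder (JSON plus Base64, which is reversible but is not encryption); the function names are likewise assumptions made for illustration. The sketch only shows that every piece of sub-data is encoded independently and that the one-to-one correspondence is preserved.

```python
import base64
import json


def encode_item(item):
    """Hypothetical reversible encoding of one piece of sub-data.

    Assumption: JSON + Base64 stands in for the unspecified coding rule.
    """
    raw = json.dumps(item).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def encode_first_mode(first_data):
    """Encode every piece of sub-data independently, in order."""
    return [encode_item(item) for item in first_data]


first_data = [3.14, 42, "abc"]          # plaintext sub-data held by one sub-client
second_data = encode_first_mode(first_data)
```

Because the list comprehension maps each input to exactly one output, the second data has the same number of pieces of sub-data as the first data.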

Step S503, using the sub-client to obtain multiple groups of sub-data of the first data according to the data attribute of each piece of locally stored sub-data, where each group of sub-data of the first data includes at least one piece of sub-data with the same data attribute.

In application, when the sub-encoding unit adopts the second encoding mode, multiple groups of sub-data of the first data can be obtained by determining the data attribute of each piece of sub-data of the first data, and each group of sub-data includes at least one piece of sub-data with the same data attribute. The data attribute may specifically be a data type (such as a character type, an integer type, or a floating-point type), a record (Tuple), a field (Field), a primary key (Primary Key), a foreign key (Foreign Key), and so on. By obtaining multiple groups of sub-data of the first data, the sub-data with the same data attribute in the first data is classified together, so that each group of sub-data of the first data consists of sub-data with the same data attribute.
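The grouping in step S503 can be sketched as follows, under the assumption that the data attribute is the data type of each piece of sub-data (one of the options listed above). The function name `group_by_attribute` is hypothetical.

```python
from collections import defaultdict


def group_by_attribute(first_data):
    """Group pieces of sub-data by data attribute.

    Assumption: the Python type name serves as the data attribute.
    """
    groups = defaultdict(list)
    for item in first_data:
        groups[type(item).__name__].append(item)
    return dict(groups)


groups = group_by_attribute([1.5, 7, "a", 2.5, 9])
```

Each resulting group contains at least one piece of sub-data, and all pieces within a group share the same attribute.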

Step S504: Encoding each group of sub-data of the first data through the sub-client to obtain a corresponding group of second data, the second data includes multiple groups of sub-data and is in one-to-one correspondence with the multiple groups of sub-data of the first data.

In application, after the sub-encoding unit adopts the second encoding mode, completes the grouping of the first data, and obtains multiple groups of sub-data, each group of sub-data of the first data can be encoded. Specifically, the groups of sub-data of the first data may be encoded group by group, with each encoding generating a corresponding group of sub-data of the second data; alternatively, multiple groups of sub-data of the first data may be encoded in parallel to obtain a set of encoded second data. Each group of sub-data in the second data has a corresponding group of sub-data in the first data, and the number of groups of sub-data in the second data is the same as in the first data. The difference is that each group of sub-data of the first data is unencoded plaintext data, whereas each group of sub-data of the second data is encoded ciphertext data.

In application, when the sub-encoding unit adopts the second encoding mode, the sub-data with the same data attribute in the first data is classified into groups and each group of sub-data of the first data is encoded, yielding encoded second data that includes multiple groups of sub-data. This gathers the sub-data of the first data that shares a data attribute and improves the aggregation of each group of sub-data in the encoded second data.
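A minimal sketch of the second encoding mode follows. It assumes the first data has already been grouped by data attribute and encodes each group as a unit. The JSON plus Base64 encoding is a hypothetical stand-in for the unspecified coding rule, and keeping the attribute as a readable key is also an assumption, made so that a later stage can regroup by attribute.

```python
import base64
import json


def encode_group(items):
    """Hypothetical reversible encoding of one whole group of sub-data."""
    raw = json.dumps(items).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def encode_second_mode(grouped_first_data):
    """Encode group by group; one encoded group per plaintext group."""
    return {attr: encode_group(items) for attr, items in grouped_first_data.items()}


grouped_first_data = {"float": [1.5, 2.5], "int": [7, 9]}
second_data = encode_second_mode(grouped_first_data)
```

The number of encoded groups equals the number of plaintext groups, matching the one-to-one correspondence described in step S504.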

In the application, the user can select the first encoding mode or the second encoding mode of the sub-coding unit to encode the first data according to actual needs, and the embodiment of the present application does not impose any restrictions on the encoding mode of the encoding module and its sub-coding units.

Step S505, using the shuffling server to shuffle each piece of sub-data of multiple sets of second data.

In application, the shuffling server may include a first shuffling mode and a second shuffling mode. The first shuffling mode may be used to shuffle second data encoded in either the first encoding mode or the second encoding mode. Specifically, when the sub-encoding unit encodes the first data in the first encoding mode and the resulting second data includes multiple pieces of sub-data, the shuffling server may obtain the multiple sets of second data output by all sub-encoding units and shuffle each piece of sub-data of the multiple sets of second data indiscriminately; when the sub-encoding unit encodes the first data in the second encoding mode and the resulting second data includes multiple groups of sub-data, the shuffling server may likewise obtain the multiple sets of second data output by all sub-encoding units and shuffle each piece of sub-data of the multiple sets of second data indiscriminately.

In an application, the second shuffling mode may be used for shuffling the second data encoded by the second encoding mode. The embodiment of the present application does not impose any limitation on the shuffling mode of the shuffling server. The shuffling method in the first shuffling mode is consistent with the shuffling method provided in step S303 above, and will not be repeated here. The shuffling method in the second shuffling mode will be described below based on steps S506 and S507.
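The first shuffling mode can be sketched as pooling the sub-data from every submission, discarding the client identifier and arrival time, and permuting the pool. The submission layout (`client_id`, `received_at`, `items`) and the function name are assumptions made for illustration; the source does not define a message format.

```python
import random


def shuffle_first_mode(submissions, rng=None):
    """Pool all sub-data, drop client and timing metadata, then shuffle.

    Assumption: each submission is a dict with 'client_id',
    'received_at' and 'items' keys (hypothetical layout).
    """
    rng = rng or random.SystemRandom()  # OS-backed randomness by default
    pool = [item for sub in submissions for item in sub["items"]]
    rng.shuffle(pool)  # arrival order no longer recoverable from position
    return pool


submissions = [
    {"client_id": "A", "received_at": 1, "items": ["a1", "a2"]},
    {"client_id": "B", "received_at": 2, "items": ["b1"]},
]
shuffled = shuffle_first_mode(submissions)
```

Only the encoded sub-data leaves the function; the output carries neither the client identifier nor the receive time.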

Step S506, reorganizing multiple sets of second data through the shuffling server to obtain multiple sets of fourth data, each set of fourth data includes multiple sets of sub-data with the same data attribute.

In application, after the sub-encoding unit encodes the first data in the second encoding mode, the shuffling server may adopt the second shuffling mode. Specifically, the shuffling server may determine the data attributes of the multiple groups of sub-data included in each set of second data and, according to the data attribute of each group of sub-data, generate multiple sets of fourth data, each of which includes multiple groups of sub-data with the same data attribute. In this way the multiple sets of second data are integrated, with the sub-data sharing a data attribute across the sets of second data merged into one set of fourth data.

For example, the shuffling server receives two sets of second data. The first set of second data includes a first group of sub-data and a second group of sub-data, and the second set of second data includes a third group of sub-data and a fourth group of sub-data. The data attribute of the first group of sub-data is floating-point type, that of the second group is integer type, that of the third group is floating-point type, and that of the fourth group is integer type. The shuffling server can then generate two sets of fourth data: the first set of fourth data includes the first and third groups of sub-data, whose data attributes are both floating-point type, and the second set of fourth data includes the second and fourth groups of sub-data, whose data attributes are both integer type.

In application, by reorganizing the multiple sets of second data through the shuffling server, multiple groups of sub-data with the same data attribute can be integrated, so that each set of fourth data includes multiple pieces of sub-data with the same data attribute, which improves the identifiability and data availability of the fourth data.
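The example above can be sketched directly. Each set of second data is modelled as a mapping from data attribute to an encoded group; that representation, the placeholder group names, and the function name `reorganize` are assumptions for illustration.

```python
from collections import defaultdict


def reorganize(second_data_sets):
    """Merge groups that share a data attribute into one set of fourth data."""
    fourth = defaultdict(list)
    for second_data in second_data_sets:
        for attribute, group in second_data.items():
            fourth[attribute].append(group)
    return dict(fourth)


second_data_sets = [
    {"float": "group1", "int": "group2"},   # first set of second data
    {"float": "group3", "int": "group4"},   # second set of second data
]
fourth_data = reorganize(second_data_sets)
```

The floating-point set collects the first and third groups, and the integer set collects the second and fourth groups.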

Step S507, shuffling each group of fourth data through the shuffling server.

In application, the second shuffling mode differs from the first shuffling mode as follows. The first shuffling mode shuffles the multiple sets of second data together, and the shuffled sets of second data are output to the analysis server. The second shuffling mode shuffles each set of fourth data separately, and each shuffled set of fourth data is output to the analysis server as a data set. Because each shuffled set of fourth data includes sub-data with the same data attribute, the identifiability and data availability of the fourth data are improved, which helps the analysis server train the deep learning model in a targeted manner according to the data attribute. Compared with the second shuffling mode, the multiple sets of second data shuffled in the first shuffling mode have no correlation with one another and therefore offer higher privacy.
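A minimal sketch of the second shuffling mode, assuming the fourth data is a mapping from data attribute to a list of encoded sub-data (a hypothetical representation). Each set is permuted on its own, so sub-data never crosses attribute boundaries.

```python
import random


def shuffle_second_mode(fourth_data, rng=None):
    """Shuffle each set of fourth data separately, keeping attributes apart."""
    rng = rng or random.SystemRandom()
    shuffled = {}
    for attribute, items in fourth_data.items():
        items = list(items)   # copy so the caller's data is not mutated
        rng.shuffle(items)
        shuffled[attribute] = items
    return shuffled


fourth_data = {"float": ["f1", "f2", "f3"], "int": ["i1", "i2"]}
shuffled_fourth = shuffle_second_mode(fourth_data)
```

In contrast to a single pooled shuffle, the output here remains partitioned by attribute, which is what lets a downstream consumer handle each attribute as its own data set.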

Step S508, integrate the multiple sets of shuffled second data through the analysis server to obtain third data;

Step S509, the analysis server decodes the third data according to the coding rules of the client, and the decoded third data is used for training the deep learning model.

In application, the analysis methods in step S508 and step S509 are consistent with the analysis method provided in step S303 above, and will not be repeated here.

In step S510, the analysis server decodes each set of shuffled fourth data according to the encoding rules of the client, and each decoded set of fourth data is used for training a deep learning model.

In application, the decoding method in step S510 is consistent with the decoding method provided in step S303 above, and will not be repeated here. The difference is that the analysis server can obtain the data attributes of the sub-data included in each group of fourth data, so that the analysis server can perform targeted training on the deep learning model according to the fourth data of different data attributes, which improves the training efficiency of the analysis server.
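Step S510 can be sketched as decoding each shuffled set of fourth data while keeping the attribute label, so that each decoded set can feed its own training run. The JSON plus Base64 coding rule is a hypothetical placeholder for the client's unspecified encoding rule, and all function names are assumptions.

```python
import base64
import json


def encode_item(item):
    """Hypothetical client-side coding rule (JSON + Base64)."""
    return base64.b64encode(json.dumps(item).encode("utf-8")).decode("ascii")


def decode_item(token):
    """Inverse of the hypothetical coding rule above."""
    return json.loads(base64.b64decode(token).decode("utf-8"))


def decode_fourth_data(shuffled_fourth_data):
    """Decode each set of fourth data; the attribute key is preserved."""
    return {
        attribute: [decode_item(token) for token in tokens]
        for attribute, tokens in shuffled_fourth_data.items()
    }


shuffled = {
    "float": [encode_item(2.5), encode_item(1.5)],
    "int": [encode_item(7)],
}
training_sets = decode_fourth_data(shuffled)
```

Each entry of `training_sets` holds plaintext sub-data of a single attribute, in the shuffled order received.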

In application, the encoding module of the client, or the sub-encoding unit of a sub-client, may select either the first encoding mode or the second encoding mode to encode the first data. The first encoding mode encodes the first data without distinction, which improves the discreteness of the encoded second data; the second encoding mode encodes according to the data attributes of the sub-data of the first data, which improves the aggregation of each group of sub-data of the second data. The shuffling server may select either the first shuffling mode or the second shuffling mode to shuffle the second data. The first shuffling mode shuffles the second data indiscriminately, which improves the privacy of the shuffled second data; the second shuffling mode reorganizes multiple sets of second data according to data attributes to obtain fourth data that includes multiple groups of sub-data with the same data attribute, which improves the identifiability and data availability of the fourth data. With selectable encoding and shuffling modes, users can, according to actual needs, improve either the discreteness or the aggregation of the encoded second data, and improve either the privacy of the shuffled second data or the data availability of the shuffled fourth data, thereby increasing the flexibility of data processing.

It should be understood that the sequence numbers of the steps in the above embodiments do not mean the order of execution, and the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiment of the present application.

In the above-mentioned embodiments, the descriptions of each embodiment have their own emphases, and for parts that are not detailed or recorded in a certain embodiment, refer to the relevant descriptions of other embodiments.

Those skilled in the art can appreciate that the modules and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.

In the embodiments provided in this application, it should be understood that the disclosed terminal device and method may be implemented in other ways. For example, the terminal device embodiments described above are merely illustrative: the division of modules is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or modules may be electrical, mechanical, or in other forms.

In the embodiments provided in the present application, the computer-readable storage medium may be non-volatile or volatile.

In the embodiments provided in this application, the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created according to the use of blockchain nodes, and the like.

The above-described embodiments are only used to illustrate the technical solutions of the present application, rather than to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims

A data protection method applied to a data protection system, the data protection system includes a shuffling server and a client, the shuffling server is connected to the client, wherein the method includes:

Obtain multiple sets of first data stored locally through the client;

Encoding the multiple sets of first data by the client to obtain multiple sets of second data, wherein the multiple sets of second data correspond to the multiple sets of first data;

The multiple sets of second data are shuffled by the shuffling server to eliminate client information in the multiple sets of second data and to eliminate timing information of the multiple sets of second data, wherein the timing information is used to reflect the time and order in which the shuffling server acquires the multiple sets of second data.
The data protection method according to claim 1, wherein the data protection system further comprises an analysis server connected to the shuffling server;

After the shuffling of the multiple sets of second data by the shuffling server, it further includes:

integrating the multiple sets of shuffled second data by the analysis server to obtain third data;

The analysis server decodes the third data according to the encoding rule of the client, and the decoded third data is used for training a deep learning model.
The data protection method according to claim 1, wherein the client includes a plurality of sub-clients, each of the sub-clients is respectively connected to the shuffling server, and each of the sub-clients locally stores a set of first data;

Encoding the multiple sets of first data by the client to obtain multiple sets of second data includes:

Each of the sub-clients encodes the locally stored first data to obtain corresponding second data.
The data protection method according to claim 3, wherein said encoding the locally stored first data by said sub-client to obtain corresponding second data comprises:

Each piece of sub-data of the locally stored first data is encoded by each of the sub-clients to obtain a corresponding set of second data, wherein the second data includes a plurality of pieces of sub-data in one-to-one correspondence with the plurality of pieces of sub-data of the first data.
The data protection method according to claim 3, wherein said encoding the locally stored first data by said sub-client to obtain corresponding second data comprises:

Acquiring, by the sub-client, multiple groups of sub-data of the first data according to the data attribute of each piece of locally stored sub-data, wherein each group of sub-data of the first data includes at least one piece of sub-data with the same data attribute;

Encoding, by the sub-client, each group of sub-data of the first data to obtain a corresponding set of second data, wherein the second data includes multiple groups of sub-data in one-to-one correspondence with the multiple groups of sub-data of the first data.
The data protection method according to claim 1, wherein said shuffling said plurality of sets of second data by said shuffling server comprises:

Obtain the data volume of the plurality of sets of second data through the shuffling server;

Adding noise to the data volume through the shuffling server, so that the data volume satisfies differential privacy; or, deleting a preset amount of data from the multiple sets of second data through the shuffling server, so that the data volume satisfies differential privacy.
The data protection method according to claim 1, wherein said shuffling said plurality of sets of second data by said shuffling server comprises:

Shuffling, by the shuffling server, each piece of sub-data of the multiple sets of second data.
The data protection method according to claim 1, wherein said shuffling said plurality of sets of second data by said shuffling server comprises:

Reorganizing the multiple sets of second data by the shuffling server to obtain multiple sets of fourth data, each set of fourth data includes multiple sets of sub-data with the same data attribute;

Each group of fourth data is shuffled by the shuffling server.
The data protection method according to claim 8, wherein, after shuffling each group of fourth data by the shuffling server, the method further comprises:

The analysis server decodes each group of shuffled fourth data according to the encoding rules of the client, and each decoded group of fourth data is used for training a deep learning model.
A data protection system, wherein the system includes:

a client and a shuffling server, the shuffling server is connected to the client;

The client is used to acquire multiple sets of first data stored locally;

The client is configured to encode the multiple sets of first data to obtain multiple sets of second data, and the multiple sets of second data are in one-to-one correspondence with the multiple sets of first data;

The shuffling server is used for shuffling the multiple sets of second data, so as to eliminate the client information in the multiple sets of second data and to eliminate the timing information of the multiple sets of second data, wherein the timing information is used to reflect the time and order in which the shuffling server acquires the multiple sets of second data.
The data protection system according to claim 10, wherein the data protection system further comprises an analysis server connected to the shuffling server;

After the shuffling server shuffles the multiple sets of second data, it further includes:

The analysis server integrates the shuffled sets of second data to obtain third data;

The analysis server decodes the third data according to the encoding rule of the client, and the decoded third data is used for training a deep learning model.
The data protection system according to claim 10, wherein the client includes a plurality of sub-clients, each of the sub-clients is respectively connected to the shuffling server, and each of the sub-clients locally stores a set of first data;

The client encodes the multiple sets of first data to obtain multiple sets of second data, including:

Each of the sub-clients encodes the locally stored first data to obtain corresponding second data.
The data protection system according to claim 12, wherein the sub-client encodes the locally stored first data to obtain the corresponding second data, comprising:

Each of the sub-clients encodes each piece of sub-data of the locally stored first data to obtain a corresponding set of second data, wherein the second data includes a plurality of pieces of sub-data in one-to-one correspondence with the plurality of pieces of sub-data of the first data.
The data protection system according to claim 12, wherein the sub-client encodes the locally stored first data to obtain the corresponding second data, comprising:

The sub-client obtains multiple groups of sub-data of the first data according to the data attribute of each piece of locally stored sub-data, wherein each group of sub-data of the first data includes at least one piece of sub-data with the same data attribute;

The sub-client encodes each group of sub-data of the first data to obtain a corresponding set of second data, wherein the second data includes multiple groups of sub-data in one-to-one correspondence with the multiple groups of sub-data of the first data.
The data protection system according to claim 10, wherein the shuffling server shuffling the multiple sets of second data comprises:

The shuffling server acquires the data volume of the plurality of sets of second data;

The shuffling server adds noise to the data volume, so that the data volume satisfies differential privacy; or, the shuffling server deletes a preset amount of data from the multiple sets of second data, so that the data volume satisfies differential privacy.
A computer-readable storage medium, wherein the computer-readable storage medium stores at least one computer-readable instruction, and when the at least one computer-readable instruction is executed by a processor, the following steps are implemented:

Obtain multiple sets of first data stored locally through the client;

Encoding the multiple sets of first data by the client to obtain multiple sets of second data, wherein the multiple sets of second data correspond to the multiple sets of first data;

The multiple sets of second data are shuffled by the shuffling server to eliminate the client information in the multiple sets of second data and to eliminate the timing information of the multiple sets of second data, wherein the timing information is used to reflect the time and order in which the shuffling server acquires the multiple sets of second data.
The storage medium according to claim 16, wherein the data protection system further comprises an analysis server connected to the shuffling server;

After the multiple sets of second data are shuffled by the shuffling server, the at least one computer-readable instruction is further used to implement the following steps when executed by the processor:

integrating the multiple sets of shuffled second data by the analysis server to obtain third data;

The analysis server decodes the third data according to the encoding rule of the client, and the decoded third data is used for training a deep learning model.
The storage medium according to claim 16, wherein the client includes a plurality of sub-clients, each of which is connected to the shuffling server, and each of the sub-clients locally stores a set of first data;

When the at least one computer-readable instruction is executed by the processor to implement encoding the multiple sets of first data through the client to obtain multiple sets of second data, it specifically includes:

Each of the sub-clients encodes the locally stored first data to obtain corresponding second data.
The storage medium according to claim 18, wherein, when the at least one computer-readable instruction is executed by the processor to implement the encoding of the locally stored first data through the sub-client to obtain the corresponding second data, the following is specifically included:

Acquiring, by the sub-client, multiple groups of sub-data of the first data according to the data attribute of each piece of locally stored sub-data, wherein each group of sub-data of the first data includes at least one piece of sub-data with the same data attribute;

Encoding, by the sub-client, each group of sub-data of the first data to obtain a corresponding set of second data, wherein the second data includes multiple groups of sub-data in one-to-one correspondence with the multiple groups of sub-data of the first data.
The storage medium according to claim 16, wherein, when the at least one computer-readable instruction is executed by the processor to implement the shuffling of the multiple sets of second data by the shuffling server, the following is specifically included:

Obtain the data volume of the plurality of sets of second data through the shuffling server;

Adding noise to the data volume through the shuffling server, so that the data volume satisfies differential privacy; or, deleting a preset amount of data from the multiple sets of second data through the shuffling server, so that the data volume satisfies differential privacy.