CN111324911B - Privacy data protection method, system and device - Google Patents


Info

Publication number
CN111324911B
CN111324911B (application CN202010409778.3A)
Authority
CN
China
Prior art keywords
data
processing
private
processing operation
private data
Prior art date
Legal status: Active (assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202010409778.3A
Other languages
Chinese (zh)
Other versions
CN111324911A (en)
Inventor
王力
周俊
Current Assignee (the listed assignees may be inaccurate)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010409778.3A
Publication of CN111324911A
Application granted
Publication of CN111324911B
Priority to PCT/CN2021/093372 (WO2021228149A1)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 — Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 — Protecting data
    • G06F 21/62 — Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 — Protecting access to data via a platform, to a system of files or objects, e.g. a local or distributed file system or database
    • G06F 21/6245 — Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Storage Device Security (AREA)

Abstract

The embodiments of this specification disclose a method, system, and device for protecting private data. The method includes the following steps: private data is obtained from each of at least two data sources. A first processing operation is performed on the private data of the at least two data sources to obtain at least two pieces of first processed data, where the first processing operation is used to hide at least a portion of the private data. A second processing operation is performed on the first processed data to obtain second processed data, where the second processing operation is used to mix the first processed data from the at least two data sources and divide it into at least two batches. At least two batches of the second processed data are transmitted in sequence to a trusted execution environment to perform a third processing operation, where the third processing operation is used to recover at least a portion of at least one piece of private data based on the second processed data and to process that portion.

Description

Privacy data protection method, system and device
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a method, a system, and an apparatus for protecting private data.
Background
With the development of information technology, data has become an important network resource. For example, when a user of a smart terminal such as a smartphone uses various applications, those applications generate user interaction data, software running data, and other private data. By analyzing and processing such private data, better services (such as software bug fixes and personalized recommendations) can be provided to the user. Currently, one way to obtain such private data is to monitor the user's client (e.g., a smartphone) with the user's permission. However, user data is highly sensitive, and protecting its privacy during collection is crucial.
Therefore, there is a need to provide a method for protecting private data to improve the security of data collection and processing.
Disclosure of Invention
One aspect of the embodiments of this specification provides a method of protecting private data. The method includes the following steps: private data may be obtained from each of at least two data sources. A first processing operation may be performed on the private data of the at least two data sources to obtain at least two pieces of first processed data, where the first processing operation is used to hide at least a portion of the private data. A second processing operation may be performed on the first processed data to obtain second processed data, where the second processing operation is configured to mix the first processed data from the at least two data sources and divide it into at least two batches. At least two batches of the second processed data may be transmitted in sequence to a trusted execution environment to perform a third processing operation, where the third processing operation is configured to recover at least a portion of at least one piece of private data based on the second processed data and to perform data processing on that portion.
Another aspect of the embodiments of this specification provides a private data protection system, including: a first obtaining module, configured to obtain private data from each of at least two data sources; a first processing module, configured to perform a first processing operation on the private data of the at least two data sources to obtain at least two pieces of first processed data, where the first processing operation is configured to hide at least a portion of the private data; a second processing module, configured to perform a second processing operation on the first processed data to obtain second processed data, where the second processing operation is configured to mix the first processed data originating from the at least two data sources and divide it into at least two batches; and a third processing module, configured to transmit at least two batches of the second processed data in sequence to a trusted execution environment to perform a third processing operation, where the third processing operation is configured to recover at least a portion of at least one piece of private data based on the second processed data and to perform data processing on that portion.
Another aspect of the embodiments of this specification provides a method of protecting private data, including: obtaining at least two pieces of first processed data originating from two or more data sources, where the first processed data includes private data of the data sources processed by a first processing operation, and the first processing operation is configured to hide at least a portion of the private data. A second processing operation may be performed on the first processed data to obtain second processed data, where the second processing operation is configured to mix the first processed data from the at least two data sources and divide it into at least two batches. At least two batches of the second processed data may be transmitted in sequence to a trusted execution environment to perform a third processing operation, where the third processing operation is configured to recover at least a portion of at least one piece of private data based on the second processed data and to perform data processing on that portion.
Another aspect of the embodiments of this specification provides a private data protection system, including: a second obtaining module, configured to obtain at least two pieces of first processed data from two or more data sources, where the first processed data includes private data of the data sources processed by a first processing operation, and the first processing operation is configured to hide at least a portion of the private data; a fourth processing module, configured to perform a second processing operation on the first processed data to obtain second processed data, where the second processing operation is configured to mix the first processed data originating from the at least two data sources and divide it into at least two batches; and a fifth processing module, configured to transmit at least two batches of the second processed data in sequence to a trusted execution environment to perform a third processing operation, where the third processing operation is configured to recover at least a portion of at least one piece of private data based on the second processed data and to perform data processing on that portion.
Another aspect of the embodiments of this specification provides a private data protection apparatus comprising at least one storage medium and at least one processor, the at least one storage medium being configured to store computer instructions; the at least one processor is configured to execute the computer instructions to implement the private data protection method described herein.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a diagram of an exemplary application scenario of a private data protection system in accordance with some embodiments of the present description;
FIG. 2 is an exemplary flow diagram of a method of protecting private data, according to some embodiments of the present description;
FIG. 3 is an exemplary flow diagram illustrating isolation of portions of data according to some embodiments of the present description;
FIG. 4 is an exemplary block diagram of a private data protection system in accordance with some embodiments of the present description;
FIG. 5 is an exemplary flow diagram of another method of protecting private data, according to some embodiments of the present description;
FIG. 6 is an exemplary block diagram of another privacy data protection system in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that the terms "system", "device", "unit", and/or "module" as used herein are ways of distinguishing different components, elements, parts, portions, or assemblies at different levels. However, other expressions may be substituted if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a", "an", and "the" include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this specification to illustrate operations performed by a system according to embodiments of this specification. It should be understood that these operations are not necessarily performed in the exact order shown. Rather, various steps may be processed in reverse order or simultaneously. Moreover, other operations may be added to the flows, or one or more steps may be removed from them.
In some embodiments, private data (e.g., data obtained by monitoring a program running on the front end) may be protected by erasing sensitive information in the data (e.g., the data source) and then processing it with a method similar to data desensitization. Such a scheme is simple, easy to attack, and can hardly provide genuinely effective protection for private data. Therefore, further embodiments of this specification disclose a private data protection method that protects data throughout its collection, transmission, and processing, and can effectively improve the security of a user's private data. The technical solutions disclosed in this specification are explained in detail below with reference to the drawings.
FIG. 1 is a diagram of an exemplary application scenario of a private data protection system, shown in some embodiments in accordance with the present description. As shown in fig. 1, the application scenario 100 may include a private data protection system 110, a network 120, a terminal 130, and a storage device 140.
The private data protection system 110 may perform one or more of the functions described herein. For example, the private data protection system 110 may process the private data multiple times to protect its security. In some embodiments, the private data protection system 110 may communicate with other components in the application scenario 100 for data transmission. For example, the private data protection system 110 may communicate with other components in the application scenario 100, such as the terminal 130 and/or the storage device 140, to obtain data and/or information. In some embodiments, the private data protection system 110 may include program modules with one or more functions, such as a first processing module, a second processing module, and a third processing module. In some embodiments, the various program modules may be implemented on different hardware devices, such as the servers 110-1, 110-2, and 110-3 shown in FIG. 1. These servers may be distributed. In some embodiments, the first processing module may be integrated in the terminal 130 and configured to directly process the private data collected on the terminal 130, for example by the first processing operation mentioned in this specification. The second processing module and the third processing module may be disposed at the back end, such as on the servers 110-2/110-3, to further process the received data, for example by the second and third processing operations mentioned in this specification. In some embodiments, each of the servers can perform one or more of the functions disclosed herein, either alone or in combination. For example, the first, second, and third processing operations may each be performed by a separate server.
As another example, server 110-1 may perform a first processing operation, and server 110-2 may perform a second processing operation as well as a third processing operation. Various modifications are also within the scope of this disclosure. In some embodiments, some or all of the servers included in the private data protection system 110 may be implemented on a cloud platform. For example, the cloud platform may include one or any combination of a private cloud, a public cloud, a hybrid cloud, a community cloud, a decentralized cloud, an internal cloud, and the like.
In some embodiments, the servers in the private data protection system 110 may include one or more processing devices (e.g., single core processing devices or multi-core processing devices). By way of example only, the processing device may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), an Application Specific Instruction Processor (ASIP), a Graphics Processing Unit (GPU), a Physical Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a programmable logic circuit (PLD), a controller, a micro-controller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.
The network 120 may facilitate the exchange of data and/or information between the components of the application scenario 100. For example, the private data protection system 110 may obtain the private data of the user of the terminal 130 from the terminal 130 through the network 120. In some embodiments, one or more components in the application scenario 100 (e.g., the private data protection system 110, the terminal 130, the storage device 140) may send data and/or information to other components via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network. For example, the network 120 may include a wireline network, a fiber optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, a global system for mobile communications (GSM) network, a code division multiple access (CDMA) network, a time division multiple access (TDMA) network, a general packet radio service (GPRS) network, an enhanced data rates for GSM evolution (EDGE) network, a wideband code division multiple access (WCDMA) network, a high speed downlink packet access (HSDPA) network, a long term evolution (LTE) network, a user datagram protocol (UDP) network, a transmission control protocol/Internet protocol (TCP/IP) network, a short message service (SMS) network, a wireless application protocol (WAP) network, an ultra-wideband (UWB) network, a mobile communication (1G, 2G, 3G, 4G, 5G) network, Wi-Fi, Li-Fi, narrowband Internet of things (NB-IoT), infrared communication, or the like, or any combination thereof. In some embodiments, the network 120 may include one or more network access points.
For example, the network 120 may include wired or wireless network access points. Through these access points, one or more components in the application scenario 100 may connect to the network 120 to exchange data and/or information.
In some embodiments, the terminal 130 may be a computing device or a group of computing devices. The computing device may include a smartphone 130-1, a tablet 130-2, a laptop 130-3, a desktop computer 130-4, or the like, or any combination thereof. The group of computing devices may be centralized or distributed. In some embodiments, the terminal 130 may send data and/or information to the private data protection system 110. For example, the terminal 130 may be a smartphone that transmits data generated while running the applications installed on it to the private data protection system 110, either directly or after processing such as the first processing operation. Accordingly, the private data protection system 110 may transmit processing results back to the terminal 130. For example, the private data protection system 110 may send a new version of an application with its bugs fixed to the terminal 130 for updating.
The storage device 140 may be used to store data and/or instructions. The data may include data generated by the terminal 130 (e.g., private data that has or has not undergone the first processing operation), results of processing of the private data by the private data protection system 110, and so on. The instructions include the instructions needed by the private data protection system 110 to implement the functionality disclosed herein. In some embodiments, the storage device 140 may be implemented on a single central server, on multiple servers connected by communication links, on multiple personal devices, or on a combination of personal devices and cloud servers. In some embodiments, the storage device 140 may include mass storage, removable storage, volatile read-and-write memory (e.g., random access memory, RAM), read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage devices may include magnetic disks, optical disks, solid state disks, and the like. Exemplary removable memory may include flash drives, floppy disks, optical disks, memory cards, compact disks, magnetic tape, and the like. Exemplary volatile read-and-write memory may include random access memory (RAM). Exemplary RAM may include dynamic random access memory (DRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), static random access memory (SRAM), thyristor random access memory (T-RAM), zero-capacitor random access memory (Z-RAM), and the like. Exemplary read-only memory may include mask read-only memory (MROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), digital versatile disc read-only memory, and the like. In some embodiments, the storage device 140 may be implemented on a cloud platform.
For example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a decentralized cloud, an internal cloud, and the like, or any combination thereof.
In some embodiments, the storage device 140 may be connected with the network 120 to communicate with one or more components (e.g., the private data protection system 110, the terminal 130, etc.) in the application scenario 100. One or more components in the application scenario 100 may access data or instructions stored in the storage device 140 through the network 120. In some embodiments, the storage device 140 may be directly connected or in communication with one or more components (e.g., the private data protection system 110, the terminal 130, etc.) in the application scenario 100. In some embodiments, the storage device 140 may be part of the private data protection system 110.
It should be noted that the above description of the various components in the application scenario 100 is for illustration and description only and does not limit the scope of applicability of the present description. It will be apparent to those skilled in the art, given the benefit of this disclosure, that additions or subtractions of components in the application scenario 100 may be made. However, such variations are still within the scope of the present description.
FIG. 2 is an exemplary flow diagram of a method of protecting private data, according to some embodiments of the present description. In some embodiments, flow 200 may be performed by a processing device (e.g., private data protection system 400, or private data protection system 110 as shown in fig. 1). For example, the process 200 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 200. As shown in fig. 2, the process 200 may include the following operations.
In step 202, private data is obtained from at least two data sources respectively. Step 202 may be performed by the first obtaining module 410.
In some embodiments, a data source refers to a source that generates private data. For example, the terminal 130, such as a smartphone, has mobile applications installed on it, and those applications generate private data during use; the terminal 130 may therefore be called a data source. Private data is a general term for data that must be protected, since its disclosure or theft could lead to harmful consequences. Continuing with the previous example, an individual (referred to herein as a user) using mobile software on the terminal 130 generates various interaction data (such as personal information entered at registration, or asset data submitted when using a network payment platform), software operational data, software error reports, and so on. If such data is leaked, the consequences can be severe, such as theft of user assets or exploitation of software vulnerabilities. These data may therefore be called private data. Private data may take any form, including but not limited to text, pictures, audio, video, and the like, or any combination thereof.
It will be appreciated that different users of the terminals 130 generate different private data. Therefore, the private data acquired by the first obtaining module 410 may consist of multiple pieces, each with a corresponding data source. For example, different terminals 130 may be regarded as different data sources, and the private data from one terminal 130 is treated as one piece.
In some embodiments, the first obtaining module 410 may be a plurality of modules, respectively integrated at different data sources, and directly obtain the private data from the data sources. For example, the first obtaining module 410 may be installed on the terminal 130 along with the application software, and obtain the privacy data by monitoring the terminal 130 in real time if the user allows. In some embodiments, the private data may be previously dumped into a storage, such as storage device 140, or in the cloud. The first obtaining module 410 may obtain the private data by communicating with a storage device.
Step 204, a first processing operation is performed on the private data of the at least two data sources to obtain at least two pieces of first processing data. Step 204 may be performed by the first processing module 420.
In some embodiments, the first processing operation refers to an operation that changes the content and/or form of the private data so that the changed data is distinguished from its original form. For example, the first processing operation may convert or delete certain information in the private data to hide it as a whole. In some embodiments, the first processing operation is used to hide at least a portion of the private data. Hiding can be understood to mean that the processed data cannot explicitly reveal the original data, and a specific recovery method is required. Illustratively, the first processing operation may be selected from one or more of data erasure, data coarsening, data segmentation, secret sharing, and differential privacy.
In some embodiments, data erasure may be used to delete at least a portion of the private data. For example, assuming the private data includes the user's identity information, such as an identification number and contact details, which are privacy-sensitive, data erasure may delete the identity information from the private data, or replace all or part of it with meaningless symbols (e.g., '*').
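As an illustrative sketch of the data-erasure step described above (the field names and masking rule here are hypothetical, not prescribed by the patent), sensitive fields of a record might be masked like this:

```python
def erase_identity_fields(record: dict, sensitive_keys=("id_number", "contact")) -> dict:
    """Return a copy of the record with sensitive fields masked by '*'."""
    masked = dict(record)
    for key in sensitive_keys:
        if key in masked:
            value = str(masked[key])
            # Keep only the last 4 characters; replace the rest with '*'
            masked[key] = "*" * max(len(value) - 4, 0) + value[-4:]
    return masked
```

Masking rather than deleting keeps the record's shape intact, which simplifies downstream processing while still hiding the sensitive content.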
In some embodiments, data coarsening may be used to reduce the accuracy of the private data. For example, assume the private data includes information such as the user's login time, the time the private data was generated, and the user's online duration. Such time information is accurate to the second when acquired, and data coarsening may reduce it to minute, hour, or day accuracy.
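For instance, timestamp coarsening could be sketched as follows (the unit names are an assumption for illustration; the patent does not fix a specific interface):

```python
from datetime import datetime

def coarsen_timestamp(ts: datetime, unit: str = "hour") -> datetime:
    """Reduce a second-accurate timestamp to minute, hour, or day accuracy."""
    if unit == "minute":
        return ts.replace(second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")
```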
In some embodiments, data segmentation may be used to divide the private data into at least two segments by data amount. Here, a segment refers to a part of the data stripped from the private data. For example, private data that is a 100K string may be divided into two 50K strings, five 20K strings, ten 10K strings, and so on.
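A minimal sketch of such segmentation by data amount (the equal-size split is one possible choice, not the patent's required one):

```python
def segment_data(data: str, num_segments: int) -> list:
    """Split data into num_segments contiguous pieces of near-equal size."""
    size, remainder = divmod(len(data), num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        # Earlier segments absorb the remainder, one extra character each
        end = start + size + (1 if i < remainder else 0)
        segments.append(data[start:end])
        start = end
    return segments
```

Concatenating the segments in order recovers the original data, matching the "100K into five 20K strings" example above.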
In some embodiments, secret sharing may be configured to divide the private data into at least two operation fragments, where the private data can be recovered by combining the operation fragments according to a preset algorithm. An operation fragment refers to a data fragment obtained by splitting the private data with a specific algorithm. For example, assuming a secret (e.g., the private data) is hidden in a polynomial (e.g., as its constant term), the results obtained by evaluating the polynomial at several values may be regarded as operation fragments. As another example, the private data may be divided into several additive shares of random numbers with no individual meaning. The preset algorithm used to recover the private data from the operation fragments may include addition, subtraction, multiplication, square roots, and the like. For example, to recover the private data in the polynomial example above, a system of equations may be constructed from the operation fragments, or an interpolation formula may be used, to reconstruct the polynomial.
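As one deliberately simple concrete instance of this idea, additive secret sharing splits a value into random shares whose modular sum recovers it; this sketch is illustrative (the modulus and share count are arbitrary choices), not the patent's prescribed scheme:

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic

def share_secret(secret: int, n: int) -> list:
    """Split `secret` into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    # The final share makes the modular sum equal the secret
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def recover_secret(shares: list) -> int:
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME
```

Each share is uniformly random on its own; only the full set of shares, combined by the preset algorithm (here, modular addition), reveals the private data.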
In some embodiments, differential privacy is configured to divide the private data into at least two operation fragments and add noise to them; after all the operation fragments are combined according to a preset algorithm, the added noise cancels out and the private data can be recovered. The noise may include Laplacian noise, Gaussian noise, or the like. For example, assume the private data is divided into two operation fragments A and B. Positive noise may be added to A and negative noise to B; the noise then cancels through addition during data recovery.
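The noise-cancellation idea can be sketched as follows (two fragments with paired Gaussian noise; a simplified illustration of the cancellation step, not a full differential-privacy mechanism):

```python
import random

def split_with_paired_noise(secret: float, scale: float = 1.0) -> tuple:
    """Split a value into two fragments carrying opposite noise terms.

    Fragment A receives +noise and fragment B receives -noise, so each
    fragment alone is perturbed, but A + B recovers the value exactly.
    """
    noise = random.gauss(0.0, scale)
    return secret / 2.0 + noise, secret / 2.0 - noise

def recover(fragment_a: float, fragment_b: float) -> float:
    """The opposite noise terms cancel under addition."""
    return fragment_a + fragment_b
```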
In some embodiments, the first processed data refers to data resulting from hiding at least a portion of the private data, for example: data obtained by deleting a user account from the private data, data obtained by coarsening timestamps in the user data, data fragments obtained by dividing the private data by data amount, operation fragments obtained by splitting the private data, or operation fragments with added noise.
In some embodiments, the first processing module 420 may perform the first processing operation on the private data of the at least two data sources in a variety of ways. As an example, the first processing module 420 may obtain the privacy level of the data source corresponding to a piece of private data before processing it. The privacy level is a security classification, for example level 1, 2, or 3; the higher the level, the stronger the protection the private data requires and the finer the processing. The level may be determined by user requirements (for example, a user may designate level 2) or by user type; for example, data sources in industries with stricter privacy needs, such as finance or medical care, may be assigned a higher level. The first processing module 420 may determine the type and/or degree of the first processing operation to perform on the piece of private data based on this classification. The type refers to the kind of first processing operation, such as one or more of data erasure, data coarsening, data segmentation, secret sharing, and differential privacy. The degree refers to how finely the private data is processed. For example, the degree of data erasure may include partial erasure, total erasure, recoverable erasure, or unrecoverable erasure; the degree of data coarsening may include reducing second-accurate times to minute, hour, or day accuracy; and the degree of data segmentation may include dividing the data into several, tens, hundreds, or thousands of fragments.
After the private data are processed to different degrees of fineness by the first processing operation, the sensitive information in the original private data is concealed, achieving privacy protection of the private data.
In some embodiments, the first processing module 420 may assign an identifier to the processed private data when performing the first processing operation, so as to distinguish which piece of original data the processed data belongs to. For example, if the first processing module 420 performs secret sharing on a piece of private data, the resulting multiple operation fragments are assigned the same identifier, so that subsequent operations, such as the third processing operation mentioned in this specification, can identify that these operation fragments belong to the same piece of private data.
In some embodiments, the first processing module 420 may be integrated at the data source together with the first acquisition module 410. After the first obtaining module 410 obtains the private data directly from the data source, the first processing module 420 may perform the first processing operation on it. In this way, the original private data can be protected from attack during transmission.
At step 206, a second processing operation is performed on the first processed data to obtain second processed data. Step 206 may be performed by the second processing module 430.
In some embodiments, the second processing operation may mix the first processed data from the at least two data sources and then divide it into at least two batches, so as to obfuscate and conceal the first processed data. It can be understood that when a specific piece of information is mixed with other information, the specific information is hidden to some extent. As an example, assume that the first processed data obtained after performing the first processing operation on the private data from two data sources are the data fragments abcdef and the data fragments 123456, respectively. The second processing operation may mix and scramble the 12 data fragments and then randomly divide them. Data fragments that are divided together are referred to as a batch. For example, the above 12 data fragments may be divided into two batches, a3b2c1 and d5f6e4.
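The mixing-and-batching step can be sketched as below. This is a minimal Python illustration under stated assumptions: the function name, the use of a uniform random shuffle, and the near-equal batch split are illustrative choices, not the patent's prescribed algorithm.

```python
import random

def mix_into_batches(fragments_per_source, num_batches, seed=None):
    """Pool the first processed data (fragments) from every source,
    shuffle the pool, then split it into the requested number of batches."""
    rng = random.Random(seed)
    # Flatten: one pool containing every fragment from every data source.
    pool = [frag for frags in fragments_per_source for frag in frags]
    rng.shuffle(pool)
    size = -(-len(pool) // num_batches)  # ceiling division
    return [pool[i:i + size] for i in range(0, len(pool), size)]
```

Calling `mix_into_batches([list("abcdef"), list("123456")], 2)` yields two batches of six fragments each, with the two sources' fragments scrambled together, mirroring the a3b2c1 / d5f6e4 example in the text.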
In some embodiments, the second processing operation may also be used for data anonymization. Data anonymization may refer to hiding certain information about the first processed data, including its source, its arrival time, its arrival order, and the like. The source of the first processed data may refer to the address of the data source of the corresponding private data, e.g., the original IP address of the data source. The arrival time of the first processed data may refer to the acquisition time of the corresponding private data, e.g., the specific collection time. The arrival order of the first processed data may refer to the acquisition order of the corresponding private data, e.g., the order in which each piece of private data was collected when private data was acquired from three data sources. As an example, assume that the private data corresponding to a certain copy of the first processed data originates from data source A, its arrival time is xxxx, and it was the 3rd to be obtained among all copies of private data; the second processing operation can achieve data anonymization by modifying or eliminating this information. It can be understood that there is no restriction on the execution order of batch division and data anonymization: the two may be performed simultaneously, or either may be performed first, which is not limited in this specification.
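A sketch of the anonymization step, assuming each copy of first processed data travels with metadata fields named `source_ip`, `arrival_time`, and `arrival_order` (the field names are illustrative, not from the original):

```python
def anonymize(record):
    """Data anonymization: strip the source address, arrival time, and
    arrival order, keeping only the fragment payload and the identifier
    that ties fragments of the same private data together."""
    return {k: v for k, v in record.items() if k in ("payload", "identifier")}
```

Eliminating the metadata outright is the simplest choice; per the text, modifying it (e.g., replacing the real IP with a fixed placeholder) would serve the same purpose.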
In some embodiments, the second processing operation may also be used to isolate a portion of the first processed data. Isolation may be understood as restricting that portion of the data from participating in subsequent processing, such as the third processing operation mentioned in this specification. After the first processed data is divided into batches, the second processing module 430 may compute statistics on the information contained in each batch and determine, based on the statistics, whether the batch needs to be isolated. For example, assuming that the private data obtained from the plurality of data sources is consumption transaction data, the second processing module 430 may count the number of times data related to the consumption amount appears in a batch; if the count is less than a predetermined number, the second processing module 430 may isolate the first processed data of that batch. For further description of isolating a portion of the first processed data, reference may be made to fig. 3 of the present specification.
Based on the above description, the second processed data may refer to data obtained by performing the second processing operation on the first processed data. For example, the first processed data after being divided into batches and anonymized is referred to as the second processed data. By further processing the first processed data through the second processing operation, sensitive information in the first processed data can be removed; at the same time, because the data is scrambled into batches, some key data can be hidden across a wide range (for example, spread over multiple batches), further protecting the security of the private data.
Step 208, sequentially transmitting at least two batches of the second processed data to the trusted execution environment to perform a third processing operation. Step 208 may be performed by the third processing module 440.
In some embodiments, the third processing module 440 may transmit the second processed data to the trusted execution environment and perform the third processing operation on it there. Because the at least two batches of second processed data are transmitted in sequence, even if an attacker intercepts one batch or some number of batches, the original private data cannot be recovered (the key information is spread across multiple batches, so recovery cannot rely on the data of a single batch). This increases security during data transmission. In some embodiments, the third processing operation may be configured to recover at least a portion of the at least one piece of private data based on the second processed data and to perform data processing on that portion. Since the second processed data has undergone two rounds of processing (the first processing operation and the second processing operation), the original private data can only be obtained through recovery. The third processing operation may be understood as the reverse of the first processing operation and/or the second processing operation. The third processing module 440 may know the processing rules of the first processing operation and/or the second processing operation, e.g., the sharding rules used for secret sharing. After receiving the second processed data, the third processing module 440 may collect the data corresponding to the same data source from multiple batches and restore it. Data recovery requires part and/or all of the processed data derived from the same piece of private data.
For example, when the first processing operation includes data segmentation of a piece of private data, the third processing module 440 needs to collect, from all batches, every fragment belonging to that piece of private data, based on the identifier assigned to each data fragment at segmentation time, before recovery is possible. For another example, when the first processing operation includes secret sharing of a piece of private data, according to the principle of secret sharing the third processing module 440 only needs to obtain a predetermined number of operation fragments from all batches, based on the identifier assigned to each operation fragment during secret sharing, and can then perform data recovery using a preset algorithm.
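The secret-sharing case can be illustrated with an additive n-of-n scheme, a simple variant in which the predetermined number of fragments is all n of them (a threshold scheme such as Shamir's would allow fewer). The modulus, function names, and identifier tagging below are assumptions for illustration:

```python
import random

PRIME = 2**61 - 1  # shared modulus; any sufficiently large prime works

def split_secret(secret, n, identifier, rng=random):
    """Split `secret` into n operation fragments that sum to the secret
    modulo PRIME. Each fragment is tagged with the identifier assigned
    at sharing time so it can be located across batches later."""
    shares = [rng.randrange(PRIME) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % PRIME)  # force the sum to match
    return [(identifier, s) for s in shares]

def recover_secret(tagged_shares, identifier):
    """Gather the fragments carrying `identifier` (possibly spread over
    several batches) and reconstruct the original value."""
    return sum(s for tag, s in tagged_shares if tag == identifier) % PRIME
```

Any single fragment is uniformly random and reveals nothing about the secret; only after the trusted execution environment has collected all fragments bearing the same identifier can the value be restored.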
A Trusted Execution Environment (TEE) may provide a secure computing environment isolated from an untrusted environment. Executing the third processing operation in the trusted execution environment ensures, based on the characteristics of the trusted execution environment, that the data processing flow is not tampered with and the recovered private data is not intercepted, so as to guarantee the security of the private data. Trusted execution environments include SGX (Software Guard Extensions), SEV (Secure Encrypted Virtualization), TrustZone, and the like. In some embodiments, a trusted execution environment may be deployed in a processing device (e.g., server 110-1/110-2/110-3 in the private data protection system 110) and used to perform the third processing operation. In some embodiments, the third processing module 440 may be integrated into the trusted execution environment.
In some embodiments, the data processing performed on at least a part of the recovered private data may include analyzing, combining, and counting that part of the data, or using it for model training. For example, user interaction data may be analyzed to determine user preferences, and software operating data may be analyzed to improve the program. The embodiments of this specification do not limit how the restored data is processed.
In the technical solution disclosed in the embodiments of this specification, the private data undergoes protective data conversion at the point of collection, and the converted data is further mixed, scrambled, and source-concealed in subsequent stages to ensure the security of the collected private data, effectively preventing the real information in the private data from being leaked. The converted and scrambled data can be restored to private data only in a trusted execution environment, preventing leakage during data recovery. Compared with a scheme that encrypts and decrypts the private data, the private data protection method disclosed in this specification may change some of the data during processing. For example, if the collected private data contains personal incomes, the processing may leave little difference between these incomes, thereby hiding certain specific high-income data. Even if the data is stolen, it cannot be restored the way encrypted data can be decrypted. This further enhances the protection of the data.
It should be noted that the above description related to the flow 200 is only for illustration and description, and does not limit the applicable scope of the present specification. Various modifications and alterations to flow 200 will be apparent to those skilled in the art in light of this description. However, such modifications and variations are intended to be within the scope of the present description. For example, changes to the flow steps described herein, such as the addition of pre-processing steps and storage steps, may be made.
FIG. 3 is an exemplary flow diagram illustrating the isolation of a portion of data according to some embodiments of the present description. In some embodiments, flow 300 may be performed by a processing device (e.g., the private data protection system 400, or the private data protection system 110 shown in fig. 1). For example, the process 300 may be stored in a storage device (e.g., an onboard storage unit of the processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 300. In some embodiments, the process 300 may be performed by the second processing module 430 located on the processing device for each of the batches resulting from step 206. As shown in fig. 3, the process 300 may include the following operations.
Step 302 determines the statistical number of times the target information appears in the data contained in the batch.
In some embodiments, the target information may be information associated with a certain keyword or keywords. The keywords may be predetermined. For example, assuming that the data source corresponding to the private data is a terminal installed with a network payment platform, the user may perform various operations such as online payment, financing, and the like on the terminal using the network payment platform. The keyword may be a user account number, an account password, an account amount, a database, etc., and the target information may be a user account number, an account amount, a transaction amount, an account password, a database access/query number, etc.
The statistical number may refer to the number of times the target information appears in the batch. For example, if the account amount of the user appears 10 times, 20 times, or 50 times in the first processing data of one batch, the statistical number may be 10, 20, or 50 times.
In some embodiments, the second processing module 430 may determine the statistical number of times the target information appears in the batch by way of information matching.
Step 304, comparing the counted times with a preset threshold value to obtain a comparison result.
In some embodiments, the preset threshold may be set according to experience or actual requirements, and the specification is not particularly limited. For example, the preset threshold may be 5, 10, 15, 20, etc.
The comparison result may be a magnitude relationship between the statistical number and a preset threshold. For example, the counted number is greater than a preset threshold, the counted number is less than the preset threshold, and the counted number is equal to the preset threshold.
And step 306, determining whether the batch is isolated based on the comparison result.
In some embodiments, isolation may be understood as separating the data of the batch from the data of other batches, the isolated data being restricted from participating in the third processing operation. Restricted participation may be understood as being prohibited from participating in the third processing operation unless allowed. For example, the isolated data may participate in the third processing operation only upon obtaining the permission of an operator of the private data protection system 400 (or the private data protection system 110 shown in fig. 1).
In some embodiments, the second processing module 430 may mark for isolation the data of any batch whose statistical count is smaller than the preset threshold. For example, if the preset threshold is 50 and the statistical count is 30, the data of that batch is isolated. It can be understood that the target information is a relatively critical part of the collected private data; when its number of occurrences does not meet the requirement (for example, the occurrences are too few for effective analysis, or the target information appears in a certain batch far less often than in other batches), the influence of that batch's target information on the whole may be considered small and negligible. Moreover, not processing it saves computational resources.
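Steps 302 to 306 can be sketched as follows. This is a minimal Python illustration in which a batch is a list of text records and the target information is matched by keyword substring; the matching rule and all names are assumptions, not the patent's prescribed matching method.

```python
def count_target_info(batch, keywords):
    """Step 302: count how many times the target information appears
    in the data contained in the batch (simple substring matching)."""
    return sum(rec.count(kw) for rec in batch for kw in keywords)

def should_isolate(batch, keywords, threshold):
    """Steps 304 and 306: compare the count with the preset threshold
    and isolate the batch when the count falls below it."""
    return count_target_info(batch, keywords) < threshold
```

A batch flagged by `should_isolate` would then be set aside rather than transmitted to the trusted execution environment, unless an operator grants permission as described above.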
It should be noted that the above description of the process 300 is for illustration and description only and is not intended to limit the scope of the present disclosure. Various modifications and changes to flow 300 will be apparent to those skilled in the art in light of this description. However, such modifications and variations are intended to be within the scope of the present description. For example, changes to the flow steps described herein, such as the addition of pre-processing steps and storage steps, may be made.
FIG. 4 is an exemplary block diagram of a private data protection system shown in accordance with some embodiments of the present description. As shown in fig. 4, the system may include a first acquisition module 410, a first processing module 420, a second processing module 430, and a third processing module 440.
The first obtaining module 410 may obtain the private data from at least two data sources, respectively.
In some embodiments, a data source may refer to a generation source of private data. Private data may be a general term for data that needs to be protected against disclosure and theft, which could otherwise lead to harmful consequences. Since the data sources differ, the private data they generate also differs, and the first obtaining module 410 may obtain multiple pieces of private data, each with a corresponding data source. In some embodiments, the first obtaining module 410 may comprise a plurality of modules, each integrated at a different data source and obtaining the private data directly from that source. In some embodiments, the private data may be dumped in advance into storage, such as the storage device 140 or the cloud. The first obtaining module 410 may then obtain the private data by communicating with the storage device.
The first processing module 420 may perform a first processing operation on the private data of the at least two data sources to obtain at least one piece of first processed data.
In some embodiments, the first processing operation may refer to an operation that changes the content and/or form of the private data to distinguish the changed private data from its original form. The first processing operation may be used to conceal at least a portion of the private data. Concealment can be understood to mean that the processed data cannot explicitly reflect the original data and that a specific recovery method is required. Illustratively, the first processing operation may be selected from one or more of data erasure, data coarsening, data segmentation, secret sharing, and differential privacy.
In some embodiments, the first processing module 420 may perform the first processing operation on each piece of private data in a variety of ways. As an example, the first processing module 420 may obtain a security level of a data source corresponding to the piece of private data before processing the private data, and determine a type and/or degree of performing the first operation processing on the piece of private data based on the security level.
The second processing module 430 may perform a second processing operation on the first processed data to obtain second processed data.
In some embodiments, the second processing operation may mix the first processed data from the at least two data sources and divide it into at least two batches, so as to obfuscate and conceal the first processed data. For example, the second processing module 430 may scramble two copies of the first processed data, the data fragments abcdef and the data fragments 123456, and randomly divide them into two batches, a3b2c1 and d5f6e4. In some embodiments, the second processing operation may also be used for data anonymization, which may refer to hiding certain information about the first processed data, including its source, arrival time, arrival order, and the like. The second processing module 430 may achieve data anonymization by altering or eliminating this information. In some embodiments, the second processing operation may also be used to isolate a portion of the first processed data; isolation may be understood as restricting that portion from participating in subsequent processing, such as the third processing operation mentioned in this specification. After the first processed data is divided into batches, the second processing module 430 may compute statistics on the information contained in each batch and determine, based on the statistics, whether the batch needs to be isolated.
The third processing module 440 may sequentially transfer at least two batches of the second processing data to the trusted execution environment to perform a third processing operation.
In some embodiments, the third processing module 440 may transmit the second processed data to the trusted execution environment and perform the third processing operation on it there. The third processing operation may be configured to recover at least a portion of the at least one piece of private data based on the second processed data and to perform data processing on that portion. The third processing operation may be understood as the reverse of the first processing operation and/or the second processing operation. The third processing module 440 may know the processing rules of the first processing operation and/or the second processing operation, e.g., the sharding rules used for secret sharing. After receiving the second processed data, the third processing module 440 may collect the data corresponding to the same data source from multiple batches and restore it. Data recovery requires part and/or all of the processed data derived from the same piece of private data.
For a detailed description of the modules of the private data protection system, reference may be made to the flowchart sections of this specification, for example, the related descriptions of fig. 2 and fig. 3.
FIG. 5 is an exemplary flow diagram of a method of protecting private data according to some embodiments of the present description. It will be appreciated that in some embodiments the first processing module may be deployed at the data source, and a backend device may directly obtain from the data source the data that has undergone the first processing operation for subsequent processing. In other words, the flow 500 directly processes first processed data from the data sources and may be performed by a processing device deployed on a backend or in the cloud (e.g., the private data protection system 600, or the private data protection system 110 shown in fig. 1). For example, the process 500 may be stored in a storage device (e.g., an onboard storage unit of the processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 500. As shown in fig. 5, the flow 500 may include the following operations.
At step 502, at least two first processed data from more than two data sources are obtained. Step 502 may be performed by the second obtaining module 610.
In some embodiments, a data source may refer to a generation source of private data. For example, the terminal 130, such as a smart phone, has mobile application software installed on it, and the mobile application software can generate private data during use; the terminal 130 may therefore be referred to as a data source. Private data may be a general term for data that needs to be protected against disclosure and theft, which could otherwise lead to harmful consequences, such as the operational data generated by the terminal 130 in use in the above example, as well as human-machine interaction data.
In some embodiments, the first processing operation may refer to an operation that changes the content and/or form of the private data to distinguish the changed private data from its original form, and may be selected from one or more of data erasure, data coarsening, data segmentation, secret sharing, and differential privacy. For more description of the first processing operation and the private data, reference may be made to fig. 2 and its related description in this specification, which is not repeated here.
In some embodiments, the first processed data may refer to data obtained by concealing at least a portion of the private data. A data source may perform the first processing operation on the private data after generating it, so as to obtain the first processed data. The second obtaining module 610 may communicate with the data sources to obtain the first processed data. In some embodiments, the second obtaining module 610 may obtain a copy of the first processed data from each data source. Each data source may also transmit its first processed data to a storage device, such as the cloud, and the second obtaining module 610 may communicate with the cloud to obtain all copies of the first processed data.
Step 504, performing a second processing operation on the first processed data to obtain second processed data. Step 504 may be performed by the fourth processing module 620.
In some embodiments, the second processing operation may mix the first processed data from the at least two data sources and divide it into at least two batches, so as to obfuscate and conceal the first processed data. The second processed data may be the data obtained by performing the second processing operation on the first processed data. In some embodiments, the second processing operation may also be used for data anonymization, which may refer to hiding certain information about the first processed data, including its source, arrival time, arrival order, and the like.
For more description of the second processing operation and the second processing data, reference may be made to fig. 2 and the related description thereof in this specification, and details are not repeated here.
Step 506, sequentially transmitting at least two batches of the second processed data to the trusted execution environment to perform a third processing operation. Step 506 may be performed by the fifth processing module 630.
In some embodiments, the fifth processing module 630 may transmit the second processing data to the trusted execution environment and perform a third processing operation on the second processing data in the trusted execution environment. The third processing operation may be configured to recover at least a portion of the at least one private data based on the second processed data and perform data processing on the at least a portion. The third processing operation may be understood as a reverse operation to the first processing operation and/or the second processing operation. The fifth processing module 630 may be aware of the processing rules of the first processing operation and/or the second processing operation, and may recover the data based on the processing rules after receiving the second processed data.
In some embodiments, the data processing performed on at least a part of the recovered private data may include analyzing, combining, and counting that part of the data, or using it for model training. For example, user interaction data may be analyzed to determine user preferences, and software operating data may be analyzed to improve the program. The embodiments of this specification do not limit how the restored data is processed.
For more description of the third processing operation, reference may be made to fig. 2 to fig. 3 and the related description thereof, and details are not repeated here.
It should be noted that the above description related to the flow 500 is only for illustration and explanation, and does not limit the applicable scope of the present application. Various modifications and changes to flow 500 may occur to those skilled in the art upon review of the present application. However, such modifications and variations are intended to be within the scope of the present application.
FIG. 6 is a block diagram of a private data protection system in accordance with some embodiments of the present description. As shown in fig. 6, the system may include a second acquisition module 610, a fourth processing module 620, and a fifth processing module 630.
The second obtaining module 610 may obtain at least two first processed data originating from more than two data sources.
In some embodiments, the first processed data may refer to data obtained by concealing at least a portion of the private data. A data source may perform the first processing operation on the private data after generating it, so as to obtain the first processed data. The second obtaining module 610 may communicate with the data sources to obtain the first processed data. In some embodiments, the second obtaining module 610 may obtain a copy of the first processed data from each data source. Each data source may also transmit its first processed data to a storage device, such as the cloud, and the second obtaining module 610 may communicate with the cloud to obtain all copies of the first processed data.
The fourth processing module 620 may perform a second processing operation on the first processed data to obtain second processed data.
In some embodiments, the fourth processing module 620 may obtain the second processed data by performing the second processing operation on the first processed data. The second processing operation may mix the first processed data from the at least two data sources and divide it into at least two batches, so as to obfuscate and conceal the first processed data. The second processed data may be the data obtained by performing the second processing operation on the first processed data. In some embodiments, the second processing operation may also be used for data anonymization, which may refer to hiding certain information about the first processed data, including its source, arrival time, arrival order, and the like.
The fifth processing module 630 may sequentially transfer at least two batches of the second processing data to the trusted execution environment to perform the third processing operation.
In some embodiments, the fifth processing module 630 may transmit the second processing data to the trusted execution environment and perform a third processing operation on the second processing data in the trusted execution environment. The third processing operation may be configured to recover at least a portion of the at least one private data based on the second processed data and perform data processing on the at least a portion. The third processing operation may be understood as a reverse operation to the first processing operation and/or the second processing operation. The fifth processing module 630 may be aware of the processing rules of the first processing operation and/or the second processing operation, and may recover the data based on the processing rules after receiving the second processed data.
For a detailed description of the modules of the private data protection system, reference may be made to the flow chart section of this specification, e.g., the associated description of fig. 5.
It should be understood that the systems shown in figs. 4 and 6 and their modules may be implemented in a variety of ways. For example, in some embodiments the system and its modules may be implemented in hardware, software, or a combination of the two. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the private data protection system and its modules is provided for convenience of description only and does not limit this specification to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given an understanding of the principle of the system, modules may be combined arbitrarily or connected to other modules as sub-systems without departing from that principle. For example, the first processing module 420, the second processing module 430, and the third processing module 440 disclosed in fig. 4 may be different modules in one system, or a single module may implement the functions of two or more of them. As another example, the second processing module 430 and the third processing module 440 may be separate modules each performing its own processing operation, or may be integrated into one module capable of performing both kinds of processing. Likewise, the modules may share one storage module, or each module may have its own storage module. All such variations are within the protection scope of this specification.
The beneficial effects that may be brought by the embodiments of this specification include, but are not limited to, the following: the private data are protected by conversion processing from the moment they are collected, and the converted data are further protected in subsequent links by means such as mixing, shuffling, and source hiding, so that the security of the collected private data is ensured and the real information in the private data is prevented from leaking. Moreover, the converted and shuffled data can be restored to the private data by the third processing operation, which facilitates data analysis and other applications of the private data; the scheme is therefore flexible, easy to implement, and practical. It should be noted that different embodiments may produce different advantages; in different embodiments, any one or a combination of the above advantages, or any other advantage, may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, this specification uses specific words to describe its embodiments. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of this specification. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, particular features, structures, or characteristics of one or more embodiments of this specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of this specification may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufactures, or materials, or any new and useful improvement thereof. Accordingly, aspects of this specification may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of this specification may be embodied as a computer product, including computer-readable program code, on one or more computer-readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or to an external computer (for example, through the Internet), or the code may run in a cloud computing environment or be offered as a service, such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of the embodiments of this specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more embodiments. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.
Some embodiments use numerals to describe quantities of components, attributes, and the like; it should be understood that such numerals used in the description of the embodiments are, in some instances, modified by the terms "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending on the desired properties of the individual embodiment. In some embodiments, a numerical parameter should be read in light of the number of significant digits reported and interpreted using ordinary rounding. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments are approximations, in specific examples such numerical values are set forth as precisely as practicable.
The entire contents of each patent, patent application, patent application publication, and other material, such as articles, books, specifications, publications, and documents, cited in this specification are hereby incorporated by reference. Excluded are any application history documents that are inconsistent with or conflict with the contents of this specification, and any documents (currently or later attached to this specification) that would limit the broadest scope of the claims of this specification. It should be noted that, where the descriptions, definitions, and/or use of terms in material accompanying this specification are inconsistent with or contrary to the contents of this specification, the descriptions, definitions, and/or use of terms in this specification shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (16)

1. A method of privacy data protection, wherein the method comprises:
respectively acquiring privacy data from at least two data sources;
performing a first processing operation on the private data of the at least two data sources to obtain at least two pieces of first processing data, wherein the first processing operation is used for hiding at least one part of the private data;
performing a second processing operation on the first processing data to obtain second processing data, wherein the second processing operation is used for mixing the first processing data from at least two data sources and dividing the first processing data into at least two batches for hiding the source of the first processing data, hiding the arrival time of the first processing data or hiding the arrival sequence of the first processing data;
and sequentially transmitting at least two batches of the second processing data to a trusted execution environment to execute a third processing operation, wherein the third processing operation is used for recovering at least one part of at least one piece of private data based on the second processing data and processing the at least one part of private data.
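By way of non-limiting illustration (a sketch offered for understanding only, not part of any claim), the mixing and batching of the second processing operation described above might look as follows in Python; the function name and the batch size are assumptions made for demonstration.

```python
import random

def mix_and_batch(first_processing_data, batch_size=4):
    # Second processing operation sketch: pool first processing data from
    # all sources, shuffle the pool to hide each item's source and arrival
    # order, then split it into batches for sequential transmission to the
    # trusted execution environment.
    pooled = list(first_processing_data)
    random.shuffle(pooled)
    return [pooled[i:i + batch_size]
            for i in range(0, len(pooled), batch_size)]
```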
2. The method of claim 1, wherein the first processing operation is selected from one or more of:
data erasure, data coarsening, data segmentation, secret sharing, or differential privacy.
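As a non-limiting illustration of two of the listed options, data coarsening and differential privacy, consider the following sketch; the bucket width, sensitivity, and epsilon values are assumptions for demonstration, not values taken from this specification.

```python
import random

def coarsen(value, bucket=10):
    # Data coarsening: report only the bucket the value falls in,
    # hiding the exact figure (e.g., age 37 becomes 30).
    return (value // bucket) * bucket

def add_laplace_noise(value, sensitivity=1.0, epsilon=1.0):
    # Differential privacy: perturb the value with Laplace noise of scale
    # sensitivity/epsilon, sampled here as the difference of two
    # exponential variates (which is Laplace-distributed).
    scale = sensitivity / epsilon
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return value + noise
```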
3. The method of claim 1, wherein the performing a first processing operation on the private data of the at least two data sources comprises:
obtaining a security level of the data source corresponding to the private data;
determining, based on the security level, a type and/or a degree of processing of the first processing operation to be performed on the private data.
4. The method of claim 1, wherein the second processing operation is further to isolate a portion of the first processed data.
5. The method of claim 4, wherein the isolating the portion of the first processed data comprises:
for each of the batches to be processed,
determining a statistical count of occurrences of target information in the data contained in the batch;
comparing the statistical count with a preset threshold to obtain a comparison result;
determining, based on the comparison result, whether the batch is isolated, wherein isolation limits the batch's participation in the third processing operation.
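A non-limiting sketch of this isolation check follows; the target-information predicate and the threshold value are illustrative assumptions, not values from the claims.

```python
def should_isolate(batch, contains_target, threshold):
    # Count how many records in the batch carry the target information,
    # then compare the count against the preset threshold; a batch that
    # exceeds the threshold is kept out of the third processing operation.
    count = sum(1 for record in batch if contains_target(record))
    return count > threshold
```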
6. The method of claim 1, wherein the trusted execution environment comprises at least one of SGX, SEV, or TrustZone.
7. A method of privacy data protection, wherein the method comprises:
acquiring at least two pieces of first processing data from two or more data sources, wherein the first processing data comprise private data of the data sources processed by a first processing operation, and the first processing operation is used for hiding at least one part of the private data;
performing a second processing operation on the first processing data to obtain second processing data, wherein the second processing operation is used for mixing the first processing data from at least two data sources and dividing the first processing data into at least two batches for hiding the source of the first processing data, hiding the arrival time of the first processing data or hiding the arrival sequence of the first processing data;
and sequentially transmitting at least two batches of the second processing data to a trusted execution environment to execute a third processing operation, wherein the third processing operation is used for recovering at least one part of at least one piece of private data based on the second processing data and processing the at least one part of private data.
8. A private data protection system, wherein the system comprises:
the first acquisition module is used for respectively acquiring privacy data from at least two data sources;
the first processing module is used for executing a first processing operation on the private data of the at least two data sources to obtain at least one piece of first processing data, wherein the first processing operation is used for hiding at least one part of the private data;
the second processing module is used for executing a second processing operation on the first processing data to acquire second processing data, wherein the second processing operation is used for mixing the first processing data from at least two data sources and dividing the first processing data into at least two batches so as to hide the source of the first processing data, hide the arrival time of the first processing data or hide the arrival sequence of the first processing data;
and the third processing module is used for sequentially transmitting at least two batches of the second processing data to the trusted execution environment to execute a third processing operation, wherein the third processing operation is used for recovering at least one part of at least one piece of private data based on the second processing data and processing the at least one part of private data.
9. The system of claim 8, wherein the first processing operation is selected from one or more of:
data erasure, data coarsening, data segmentation, secret sharing, or differential privacy.
10. The system of claim 8, wherein to perform a first processing operation on each piece of private data, the first processing module is to:
obtaining a security level of the data source corresponding to the private data;
determining, based on the security level, a type and/or a degree of processing of the first processing operation to be performed on the private data.
11. The system of claim 8, wherein the second processing operation is further to isolate a portion of the first processed data.
12. The system of claim 11, wherein to isolate a portion of the first processed data, the second processing module is to:
for each of the batches to be processed,
determining a statistical count of occurrences of target information in the data contained in the batch;
comparing the statistical count with a preset threshold to obtain a comparison result;
determining, based on the comparison result, whether the batch is isolated, wherein isolation limits the batch's participation in the third processing operation.
13. The system of claim 8, wherein the trusted execution environment comprises at least one of SGX, SEV, or TrustZone.
14. A private data protection system, wherein the system comprises:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring at least two pieces of first processing data from more than two data sources, the first processing data comprises private data of the data sources after being processed by a first processing operation, and the first processing operation is used for hiding at least one part of the private data;
the fourth processing module is used for executing a second processing operation on the first processing data to acquire second processing data, wherein the second processing operation is used for mixing the first processing data from at least two data sources and dividing the first processing data into at least two batches so as to hide the source of the first processing data, hide the arrival time of the first processing data or hide the arrival sequence of the first processing data;
and the fifth processing module is configured to sequentially transmit at least two batches of the second processing data to the trusted execution environment to execute a third processing operation, where the third processing operation is configured to recover at least one part of the at least one private data based on the second processing data, and perform data processing on the at least one part.
15. An apparatus for protecting private data, comprising at least one storage medium and at least one processor, the at least one storage medium for storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of any of claims 1-6.
16. An apparatus for protecting private data, comprising at least one storage medium and at least one processor, the at least one storage medium for storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of claim 7.
CN202010409778.3A 2020-05-15 2020-05-15 Privacy data protection method, system and device Active CN111324911B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010409778.3A CN111324911B (en) 2020-05-15 2020-05-15 Privacy data protection method, system and device
PCT/CN2021/093372 WO2021228149A1 (en) 2020-05-15 2021-05-12 Private data protection method, system, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010409778.3A CN111324911B (en) 2020-05-15 2020-05-15 Privacy data protection method, system and device

Publications (2)

Publication Number Publication Date
CN111324911A CN111324911A (en) 2020-06-23
CN111324911B true CN111324911B (en) 2021-01-01

Family

ID=71169982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010409778.3A Active CN111324911B (en) 2020-05-15 2020-05-15 Privacy data protection method, system and device

Country Status (2)

Country Link
CN (1) CN111324911B (en)
WO (1) WO2021228149A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324911B (en) * 2020-05-15 2021-01-01 支付宝(杭州)信息技术有限公司 Privacy data protection method, system and device
CN111984987B (en) * 2020-09-01 2024-04-02 上海梅斯医药科技有限公司 Method, device, system and medium for desensitizing and restoring electronic medical records
CN114936650A (en) * 2020-12-06 2022-08-23 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model based on privacy protection
CN113065162B (en) * 2021-04-25 2022-05-17 支付宝(杭州)信息技术有限公司 Method and device for processing private data in shared form
CN113282959A (en) * 2021-06-09 2021-08-20 支付宝(杭州)信息技术有限公司 Service data processing method and device and electronic equipment
CN113379042B (en) * 2021-07-23 2022-05-17 支付宝(杭州)信息技术有限公司 Business prediction model training method and device for protecting data privacy
CN114153854B (en) * 2022-02-09 2022-05-10 支付宝(杭州)信息技术有限公司 Secret sharing-based multi-key grouping information acquisition method and system
CN114338017B (en) * 2022-03-04 2022-06-10 支付宝(杭州)信息技术有限公司 Sorting method and system based on secret sharing
CN116436704B (en) * 2023-06-13 2023-08-18 深存科技(无锡)有限公司 Data processing method and data processing equipment for user privacy data
CN117061189B (en) * 2023-08-26 2024-01-30 上海六坊信息科技有限公司 Data packet transmission method and system based on data encryption
CN117113421B (en) * 2023-10-24 2024-02-09 北京三特信息技术有限公司 Sensitive data protection system and method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107908732A (en) * 2017-11-14 2018-04-13 北京恺思睿思信息技术有限公司 A kind of mutually isolated multi-source big data convergence analysis method and system
CN108984733A (en) * 2018-07-13 2018-12-11 北京京东金融科技控股有限公司 cross-domain data fusion method, system and storage medium
CN109726758A (en) * 2018-12-28 2019-05-07 辽宁工业大学 A kind of data fusion publication algorithm based on difference privacy

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US10642992B2 (en) * 2013-01-04 2020-05-05 Pure Storage, Inc. Password augmented all-or-nothin transform
CN109426732B (en) * 2017-08-22 2021-09-21 创新先进技术有限公司 Data processing method and device
CN108959958A (en) * 2018-06-14 2018-12-07 中国人民解放军战略支援部队航天工程大学 A kind of method for secret protection and system being associated with big data
CN109871698B (en) * 2019-01-14 2021-10-26 深圳市奥特尔软件技术有限公司 Data processing method, data processing device, computer equipment and storage medium
CN110399746B (en) * 2019-07-15 2021-06-18 北京邮电大学 Anonymous data publishing method and device based on sensitivity grading
CN110633575A (en) * 2019-09-19 2019-12-31 腾讯云计算(北京)有限责任公司 Data encryption method, device, equipment and storage medium
CN115409198A (en) * 2019-12-11 2022-11-29 支付宝(杭州)信息技术有限公司 Distributed prediction method and system thereof
CN110866284A (en) * 2020-01-16 2020-03-06 支付宝(杭州)信息技术有限公司 Data fusion processing method, device and system based on privacy data protection
CN111324911B (en) * 2020-05-15 2021-01-01 支付宝(杭州)信息技术有限公司 Privacy data protection method, system and device

Also Published As

Publication number Publication date
WO2021228149A1 (en) 2021-11-18
CN111324911A (en) 2020-06-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40030789

Country of ref document: HK