CN111723394B - Privacy protection distributed computing method and system for dynamically loading code base - Google Patents

Publication number
CN111723394B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010321903.5A
Other languages
Chinese (zh)
Other versions
CN111723394A (en)
Inventor
吴鹏飞
沈晴霓
吴中海
杨雅辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN202010321903.5A
Publication of CN111723394A
Application granted
Publication of CN111723394B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Abstract

The invention discloses a privacy-preserving distributed computing method and system for dynamically loading a code base, relating to the fields of cloud platform distributed computing and cloud user privacy protection. The invention protects the confidentiality of the data, job code, and code base that a cloud platform cluster relies on when executing a job, and ensures that a malicious attacker inside the cloud cannot obtain or tamper with the user's code and input data during job execution, thereby protecting the user's private information during cloud computing and resisting such attacks. By introducing verifiable secret sharing and an oblivious transfer protocol, the invention avoids frequent page swapping during task execution, loads the dynamic code base on demand, and keeps the trusted area minimal.

Description

Privacy protection distributed computing method and system for dynamically loading code base
Technical Field
The invention relates to the fields of cloud platform distributed computing and cloud user privacy protection, and in particular to a distributed computing method and system for a cloud platform that allow tasks to dynamically load the code bases they require while protecting the confidentiality of user data and code during job execution.
Background
Cloud computing is currently a mainstream way to process massive data. A user deploys a distributed system, i.e., a cluster, by renting resources from a cloud service provider, and uploads input data to the cloud to execute distributed jobs. After a job finishes, the user downloads the results from the cloud. However, in some scenarios, such as online medical services or genetic testing, the user's input data is often private. If it is uploaded to the cloud in plaintext, it can be obtained by a malicious attacker inside the cloud (such as a cloud administrator or the cloud service provider). Traditional privacy-preserving distributed computing systems built on trusted processors require all dependent code libraries to be loaded into the trusted area during the job deployment phase. Because the trusted area is limited in size, this causes significant overhead from frequent page swapping.
Distributed computing
Distributed computing runs on distributed clusters, where nodes cooperate to complete a computing goal (generally called a job). During execution, the system divides the job into tasks organized in several stages, and each task is assigned to one node. When a task finishes, its output is either sent to other nodes as the input of a next-stage task or returned to the user as the final output. Compared with traditional computation on a single node, distributed computing is fast and highly fault tolerant, and is well suited to processing big data. Mainstream frameworks deployed on many cloud platform clusters today include Hadoop MapReduce, Apache Spark, and Storm.
These clusters were originally designed for execution efficiency and include no security measures. If computation is carried out directly on plaintext, the data it depends on is inevitably leaked. If a common encryption algorithm such as AES or RSA is used instead, the ciphertext must be decrypted before computation, which again exposes the plaintext to an attacker inside the cloud. Further work has found that privacy is also leaked through the access patterns of the ciphertexts transmitted between servers during the shuffle (see O. Ohrimenko, M. Costa, C. Fournet, C. Gkantsidis, M. Kohlweiss, and D. Sharma. Observing and preventing leakage in MapReduce. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2015, pp. 1570-1581). New methods are needed to protect data privacy while distributed computing jobs execute.
Trusted processor
The trusted processor technology Intel SGX (Software Guard eXtensions) is a CPU extension that protects code execution on Intel processors. By creating a hardware-isolated execution environment in memory (called an enclave), it prevents a malicious attacker who controls the operating system or the virtual machine monitor from obtaining or tampering with an application's data. Intel SGX also provides additional security functions: through remote attestation, a user can verify that code has been loaded into a trusted area safely and correctly; and through data sealing, data that was encrypted and stored externally can be reloaded into a new trusted area after the old one is destroyed, provided it has not been tampered with. However, the trusted area of Intel SGX is limited in size, and current versions support only 92 MB of data access. Accessing data that is not in the trusted area requires swapping the memory page into the trusted area and reading it into the cache, a process called EPC paging. Research has shown that this overhead is very high, roughly thousands of times that of an ordinary memory access (see S. Arnautov, B. Trach, F. Gregor, T. Knauth, A. Martin, C. Priebe, J. Lind, D. Muthukumaran, D. O'Keeffe, M. Stillwell, D. Goltzsche, D. M. Eyers, R. Kapitza, P. R. Pietzuch, and C. Fetzer. SCONE: Secure Linux containers with Intel SGX. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 2016, pp. 689-703), and frequent page swapping can make a system 5 times slower or more (see M. Taassori, A. Shafiee, and R. Balasubramonian. VAULT: Reducing paging overheads in SGX with efficient integrity verification structures. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2018, pp. 665-678).
In a cloud scenario, cloud service providers typically provide code bases, such as machine learning or SSL libraries, to help users develop SGX applications. To be called by a user, these code bases must be compiled into dynamic link libraries. To ensure that a code base is not tampered with during job execution, the usual approach is to load it into the trusted area of every job node of the distributed system during the job deployment phase. However, these code bases are not in use all the time, and keeping them in the trusted area permanently can trigger EPC paging early, which severely degrades the performance of the task currently executing.
Disclosure of Invention
In existing cloud outsourced computing scenarios, a user rents a cloud service and uploads data to the cloud for computation, and no information about the user's input should be revealed at any point in the computation. To achieve efficient distributed computation while protecting the user's private information during job execution, the invention provides a privacy-preserving distributed computing method for dynamically loading a code base, and designs and implements a complete system. With this method and system, a malicious attacker inside the cloud cannot obtain or tamper with the user's code and input data during job execution, so the user's private information is protected during cloud computing and the attacks described above are resisted. When a cloud platform cluster executes a job, the confidentiality of the data it relies on (including job input, intermediate results, and output), the job code, and the code base is protected. In addition, by introducing verifiable secret sharing and an oblivious transfer protocol, the invention avoids frequent page swapping during task execution, so that the dynamic code base is loaded on demand and the trusted area is kept minimal. The system therefore achieves higher performance than privacy protection systems designed in the prior art.
The technical scheme of the invention is as follows:
A privacy-preserving distributed computing method for dynamically loading a code base comprises the following steps:
1) Job deployment:
the user encrypts the job input and the job code with a symmetric key, uploads the job input to the cluster, sends the job code to the trusted area of each node in the cluster, and transmits the symmetric key through a secure channel;
the user verifies the signature of the trusted processor; if verification passes, the node can decrypt inside the trusted area to obtain the symmetric key, which is then used to decrypt the job input and the job code;
2) Code base reconstruction:
before the cluster executes a task, secret shares are transmitted through an oblivious transfer protocol and their correctness is verified; once the shares pass verification, the code base on which the task depends is reconstructed inside the trusted area;
3) Task execution:
cluster tasks are executed inside the trusted area; when a task of the previous stage finishes, the intermediate key-value pairs it produced are processed according to the number of next-stage tasks and the symmetric key to generate new, confidential key-value pairs;
4) Returning the result:
after the tasks finish, the user downloads the output data generated by the job from the cluster and decrypts it with the symmetric key to obtain the final result.
Further, the code base on which the job depends is shared to each node of the cluster using a verifiable secret sharing mechanism.
Further, the verifiable secret sharing mechanism employs the Feldman secret sharing algorithm, whose output includes the secret shares sent to each node as well as public commitments broadcast to all nodes, which are used to verify the correctness of the shares when the code base is reconstructed.
Further, the method for transmitting the secret shares through the oblivious transfer protocol is as follows: the node performing the task sends requests for secret shares to a threshold number of other nodes, and the corresponding shares are transmitted via the oblivious transfer protocol.
Further, the security of the oblivious transfer protocol is based on a key derivation function and the hardness of the decisional Diffie-Hellman (DDH) assumption.
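To make the share transfer concrete, the following is a minimal sketch of a Diffie-Hellman-style 1-out-of-2 oblivious transfer that derives its keys with a hash-based key derivation function. It is an illustration under stated assumptions, not necessarily the construction used by the invention: the group parameters `P` and `G`, the XOR-based encryption, and all function names are hypothetical, and the parameters are far too small for real use.

```python
import hashlib
import secrets

# Toy group parameters (assumption): a Mersenne prime modulus and base 3.
# A real deployment would use a standardized prime-order group.
P = 2**127 - 1
G = 3

def kdf(element: int) -> bytes:
    """Derive a 32-byte key from a group element with SHA-256."""
    return hashlib.sha256(element.to_bytes(16, "big")).digest()

def xor(data: bytes, key: bytes) -> bytes:
    """One-time-pad style masking; data must not be longer than the key."""
    assert len(data) <= len(key)
    return bytes(d ^ k for d, k in zip(data, key))

def sender_setup():
    """Share holder picks a secret exponent a and publishes A = G^a."""
    a = secrets.randbelow(P - 2) + 1
    return a, pow(G, a, P)

def receiver_choose(A: int, choice: int):
    """Requesting node hides its choice bit inside B and derives its one key."""
    b = secrets.randbelow(P - 2) + 1
    B = pow(G, b, P)
    if choice == 1:
        B = (A * B) % P
    return B, kdf(pow(A, b, P))

def sender_encrypt(a: int, A: int, B: int, m0: bytes, m1: bytes):
    """Share holder masks both messages; it cannot tell which key the receiver has."""
    k0 = kdf(pow(B, a, P))
    A_inv = pow(A, P - 2, P)                 # modular inverse via Fermat
    k1 = kdf(pow((B * A_inv) % P, a, P))
    return xor(m0, k0), xor(m1, k1)
```

A receiver that chose index 1 recovers only the second message with `xor(c1, key)`; unmasking `c0` with the same key yields unrelated bytes.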
Further, the method for verifying the correctness of the secret shares is as follows: after the node has received the required number of secret shares, it verifies the correctness of each share; if all shares pass verification, the code base is reconstructed inside the trusted area; otherwise, execution of the job is cancelled.
Further, the code base is reconstructed using Lagrange interpolation.
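The sharing, verification, and reconstruction steps can be sketched as follows. This is a toy instance of Feldman verifiable secret sharing over a deliberately tiny group (the constants `P`, `Q`, `G` and the function names are assumptions for illustration); the secret here is a small integer, whereas a real system would presumably split a library into many chunks over a much larger field.

```python
import secrets

Q = 1019          # prime order of the subgroup (toy size, assumption)
P = 2 * Q + 1     # 2039, a safe prime
G = 4             # generator of the order-Q subgroup of Z_P*

def share(secret: int, threshold: int, n: int):
    """Split `secret` into n shares; any `threshold` of them reconstruct it."""
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(threshold - 1)]
    # Public commitments G^coeff, broadcast to all nodes.
    commitments = [pow(G, c, P) for c in coeffs]
    shares = {
        i: sum(c * pow(i, j, Q) for j, c in enumerate(coeffs)) % Q
        for i in range(1, n + 1)
    }
    return shares, commitments

def verify(i: int, s_i: int, commitments) -> bool:
    """Check G^s_i == prod_j C_j^(i^j) without learning the polynomial."""
    rhs = 1
    for j, c_j in enumerate(commitments):
        rhs = rhs * pow(c_j, pow(i, j, Q), P) % P
    return pow(G, s_i, P) == rhs

def reconstruct(points: dict) -> int:
    """Lagrange interpolation at x = 0 over the field Z_Q."""
    secret = 0
    for i, s_i in points.items():
        num, den = 1, 1
        for m in points:
            if m != i:
                num = num * m % Q
                den = den * (m - i) % Q
        secret = (secret + s_i * num * pow(den, Q - 2, Q)) % Q
    return secret
```

A node would call `verify` on every received share and run `reconstruct` only if all checks pass, mirroring the rule that a failed check cancels the job.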
Further, the symmetric key is transmitted over a secure channel established through remote attestation, comprising the following steps:
1) The user first sends public code (such as encryption, decryption, and key generation code) to the trusted area;
2) The trusted area loads the code, calls the key generation algorithm to generate a public-private key pair, signs data inside the trusted area with the private key, and returns the signed data and the public key to the user;
3) The user first verifies the signature with the public key; if verification passes, the user encrypts the symmetric key with the returned public key and sends it to the trusted area; if not, the user restarts remote attestation;
4) After receiving the user's message, the trusted area decrypts it with the private key to obtain the user's symmetric key.
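The four steps above can be illustrated with a toy message flow. This sketch uses unpadded textbook RSA over two known Mersenne primes purely so that the flow is executable; it is not secure, and real SGX remote attestation additionally involves a hardware-signed quote that is not modeled here. All names (`enclave_sign`, `user_wrap_key`, `enclave_unwrap`) are hypothetical.

```python
import hashlib
import secrets

# Toy RSA parameters (assumption): far too small and unpadded for real use.
P1, P2 = 2**61 - 1, 2**89 - 1
N = P1 * P2                                # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P1 - 1) * (P2 - 1))        # private exponent, stays in the trusted area

def digest(data: bytes) -> int:
    """Hash data to an integer below the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def enclave_sign(report: bytes) -> int:
    """Step 2: the trusted area signs its data with the private key."""
    return pow(digest(report), D, N)

def user_wrap_key(report: bytes, signature: int, sym_key: bytes):
    """Step 3: verify the signature, then encrypt the 16-byte symmetric key."""
    if pow(signature, E, N) != digest(report):
        return None                        # caller should restart attestation
    return pow(int.from_bytes(sym_key, "big"), E, N)

def enclave_unwrap(wrapped: int) -> bytes:
    """Step 4: the trusted area recovers the symmetric key."""
    return pow(wrapped, D, N).to_bytes(16, "big")
```

For example, `enclave_unwrap(user_wrap_key(report, enclave_sign(report), key))` returns the original 16-byte `key`, while a tampered signature makes `user_wrap_key` return `None`.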
Further, each generated intermediate key-value pair is treated as an original key-value pair: the key of the original pair is hashed, and the hash value modulo the number of next-stage tasks becomes the key of the new key-value pair; the key and value of the original pair are encrypted together with the symmetric key to form the value of the new key-value pair.
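A minimal sketch of this re-keying step follows. Because the Python standard library has no AES, a SHA-256 keystream cipher stands in for the symmetric encryption; that substitution, the JSON serialization, and the function names are assumptions made only to keep the example self-contained.

```python
import hashlib
import json
import secrets

def keystream(sym_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (stand-in for AES)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(sym_key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(sym_key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)        # fresh nonce per record
    ks = keystream(sym_key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(sym_key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    ks = keystream(sym_key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

def protect_pair(key: str, value, num_next_tasks: int, sym_key: bytes):
    """New key = H(key) mod number of next-stage tasks; new value = Enc(key, value)."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    new_key = h % num_next_tasks
    new_value = encrypt(sym_key, json.dumps([key, value]).encode())
    return new_key, new_value
```

Pairs with the same original key always map to the same next-stage task, yet an observer of the shuffle sees only a task index and a ciphertext.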
Further, to hide the access pattern, the cluster presets a threshold equal to the maximum point-to-point transmission volume in the network; if the intermediate results transmitted between two nodes fall short of the threshold, the system generates random noise data inside the trusted area and encrypts it with the symmetric key.
Further, before the next-stage task executes, the encrypted noise data is decrypted and then deleted.
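The padding idea can be sketched as below. Encryption is deliberately omitted to keep the example short: in the described system every record, real or noise, would be encrypted with the symmetric key before leaving the trusted area, so the `dummy` flag would not be visible on the wire. The record layout and function names are assumptions.

```python
import secrets

def pad_partition(records, threshold: int):
    """Pad one point-to-point transfer with noise records up to `threshold`."""
    if len(records) > threshold:
        raise ValueError("threshold must be the maximum transfer size")
    padded = [{"dummy": False, "payload": r} for r in records]
    padded += [
        {"dummy": True, "payload": secrets.token_hex(8)}
        for _ in range(threshold - len(records))
    ]
    # Shuffle so that position does not reveal which records are real.
    secrets.SystemRandom().shuffle(padded)
    return padded

def strip_noise(padded):
    """Run inside the trusted area of the next-stage task, after decryption."""
    return [r["payload"] for r in padded if not r["dummy"]]
```

Every transfer then carries exactly `threshold` records, so traffic volume no longer depends on the data.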
Further, after a task finishes, the trusted area is destroyed, and the sensitive data held in it is encrypted with the data sealing function of the trusted processor and stored in external storage, so that it can be reloaded when a new trusted area is created.
A privacy-preserving distributed computing system for dynamically loading a code base comprises a distributed code library management system, a secure shuffle monitor, and a trusted-processor-based job application, wherein:
the distributed code library management system comprises a control unit that runs on the cluster master node and a storage unit that runs on each worker node; the control unit records the metadata of each share (such as the library identifier and timestamp), and the storage unit stores and transmits the secret shares;
the secure shuffle monitor monitors the shuffle process between nodes; when, during the transmission of intermediate results, the network traffic between nodes falls short of a preset threshold, it generates the necessary amount of noise data and encrypts it with the symmetric key so as to hide the network access pattern;
the trusted-processor-based job application: the user implements the outer-layer application in any language and implements the core task functions in a language supported by the trusted processor; to execute jobs with privacy protection, the outer-layer application calls routines that run the trusted code, restore the code library, and create and destroy the trusted area; the user encrypts the job input and job code with the symmetric key and transmits them to the cluster, verifies the signature of the trusted processor, downloads the output data generated by the job from the cluster, and decrypts it with the symmetric key to obtain the final result.
The beneficial effects of the invention are as follows:
Privacy protection in cloud computing is currently a major concern in both academia and industry. Existing research typically relies on complex cryptographic computation, such as homomorphic encryption, which incurs significant computation and storage overhead. Work that implements secure cloud computing with trusted processors is still very limited, and in practice it commonly suffers from the overhead of frequent page swapping. Building on an analysis of existing research, the invention provides a new distributed computing method and system suited to cloud platforms that, on the one hand, protects the user's private information during job execution and, on the other hand, eliminates the system bottleneck caused by frequent page swapping and greatly improves system performance. The invention aims to provide a useful reference for research and product development in the cloud security field.
The code added by the method amounts to only 0.12% of the code of the native Hadoop MapReduce system and has little impact on it. As a pluggable module, the system offers privacy-preserving job execution as an optional function, which users are free to enable or not according to their data confidentiality requirements. Performance testing shows that the system designed by the invention improves execution efficiency by 15.2% to 31.3% compared with the most advanced existing Intel SGX-based cluster. The invention is also highly portable: other distributed systems, such as Apache Spark and Storm, can achieve efficient privacy-preserving computation by implementing the method.
Drawings
FIG. 1 is a hierarchy diagram of a privacy preserving distributed computing system with dynamically loaded codebases, under an embodiment.
FIG. 2 is a flow diagram of a privacy preserving distributed computing system operational lifecycle for dynamically loading codebases, under an embodiment.
Detailed Description
To further illustrate the features and advantages of the present invention, the following detailed description is given by way of example only and with reference to the accompanying drawings.
This embodiment provides a privacy-preserving distributed computing method and system for dynamically loading a code base. The system is fully compatible with the native MapReduce programming framework. A user who needs to run a job in a privacy-preserving manner can do so by calling the provided API in the job code. The system is also portable and can be applied to other clusters, such as Apache Spark and Storm, by implementing the corresponding computing interfaces.
The system must be deployed before use. In addition to the native Hadoop deployment procedure, the corresponding scripts need to be modified, and some file paths and code libraries added, to enable the new functions. The user then writes and compiles a program according to the programming framework provided by the system, and finally uploads the code and the input data to the cluster to execute the job. After the job finishes, the user downloads the ciphertext output from the cluster and decrypts it locally with the symmetric key.
In this embodiment, the system is implemented on the source code of Hadoop 2.7.2, with SGX SDK 2.1, JDK 1.8.0, and the SGX GMP code library, and runs on Ubuntu 16.04 LTS. The implementation mainly modifies the Hadoop startup scripts and the shuffle-related code, and adds new functions such as the distributed code library management system, for a total of 2257 lines of code. System deployment, the system programming framework, the job lifecycle, and the implementation are described in detail below with reference to the drawings.
System deployment
The system of this embodiment depends on the native Hadoop MapReduce computing framework, so the MapReduce system must be configured and running first. The SGX SDK and the SGX GMP code library must also be configured and installed. Once these run successfully, the distributed code library management system implemented by the invention is configured and started. The hierarchy of the complete system is shown in FIG. 1, where gray rectangles represent the modules added by the system. The specific deployment steps are as follows:
Step 1: Compile and run the distributed code library management system, and copy the compiled jar file to a path from which the cluster loads Hadoop code libraries.
Step 2: Add the class file run by each node, contained in the jar file of the distributed code library management system, to the startup script of the hadoop-daemon, so that the distributed code library management system starts and stops together with the native cluster.
Step 3: Add the path of the jar file of the distributed code library management system to the hadoop-env.sh script, so that the path is listed by the hadoop classpath command. This allows users to call the API when writing applications.
Step 4: Create the path declared in the distributed code library management system on each node. This path stores the secret shares of the code library received by the node, the recovered code library, and node-related information (public and private keys, logs, etc.).
Step 5: In Hadoop's XML configuration file, add a property MASTER_IP_PORT with value 8555. This lets the worker nodes know the port on which the master node's code library management module runs, which later operations need to access.
Step 6: Add the dependency library in the SGX SDK to the environment variable LD_LIBRARY_PATH and also to the /etc/LD. This lets the task execution process access the APIs it depends on and avoids errors caused by inaccessible code.
Step 7: Restart the cluster and run the jps command; if the PID of the running code library management system is listed, the configuration succeeded.
System programming framework
The invention is compatible with the native MapReduce programming framework. For each of the two classes, map and reduce, native MapReduce provides three methods:
setup(): executed only once, before the map or reduce method runs, to complete job initialization;
map()/reduce(): the concrete implementation of the map or reduce logic;
cleanup(): executed only once, after the map or reduce method has run, to clean up and delete job variables.
On this basis, the user calls the API provided by the invention to implement privacy-preserving computation. The specific requirements are as follows:
In the setup() method of each class, the user needs to call three APIs:
1) Code base recovery, SGX _ library recons (libName): calls the code library management system of the current node to restore the specified code base. On receiving the instruction, the code library management system requests the IP addresses and ports of the other nodes from the master node and runs the oblivious transfer protocol with them to transfer the required secret shares. Once enough shares have been received, the node reconstructs the code base inside the trusted area.
2) Load the code library of the map task, system (lib.so): because the outer-layer job code is written in Java, the concrete map and reduce implementations written in C in the inner layer are called through JNI. At compile time, the inner-layer bridging code is built into a dynamic link library (a .so file). In the setup phase, this bridging code therefore has to be loaded first so that the trusted area can be created afterwards.
3) Create the trusted area, JNICALL_mainClass_startEnclave(): after the dynamic link library of the map or reduce task is loaded, the compiled methods that execute inside the trusted area must be loaded into it. This call invokes the code in the bridging code that creates the trusted area. At this point, job initialization is complete.
In the map() or reduce() method of each class, the user needs to call one API:
The map or reduce task method, JNICALL_mainClass_Task(): the task implementation must be defined in C inside the trusted area, so it is again called through JNI. During execution, the corresponding method in the bridging code is called first, and it in turn calls the sensitive code inside the trusted area.
In the cleanup() method of each class, the user needs to call one API:
Destroy the trusted area, JNICALL_mainClass_stopEnclave(): after the map or reduce task finishes, the trusted area created on the node must be destroyed. This call invokes the code in the bridging code that deletes the trusted area. Once the trusted area is destroyed, the task no longer occupies the node's EPC resources, which is how dynamic loading and unloading are achieved.
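The call order described above can be summarized with a small stand-in. In the described system these would be Java methods of a Hadoop task class that call native code through JNI; the Python class below, its method bodies, and the `run_task` driver are hypothetical placeholders that only record the order of operations.

```python
class PrivateMapTask:
    """Stand-in for a map task class; every body is a placeholder (assumption)."""

    def __init__(self):
        self.log = []

    def setup(self, lib_name: str):
        self.log.append(f"reconstruct:{lib_name}")  # code base recovery
        self.log.append("load_bridge")              # load the bridging .so
        self.log.append("start_enclave")            # create the trusted area

    def map(self, record: str) -> str:
        self.log.append("task")                     # task body runs in the trusted area
        return record.upper()

    def cleanup(self):
        self.log.append("stop_enclave")             # destroy the trusted area, free EPC

def run_task(task: PrivateMapTask, records, lib_name: str):
    """Drive one task the way the framework would: setup once, map per record, cleanup once."""
    task.setup(lib_name)
    try:
        return [task.map(r) for r in records]
    finally:
        task.cleanup()                              # always release the trusted area
```

Wrapping the records loop in `try`/`finally` is a design choice of this sketch, so that the trusted area is released even if a record fails.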
Job lifecycle
After writing and compiling the job code according to the provided programming framework, the user can submit it to the cluster to run. Typically, the compiled job code contains 5 parts: the jar package of the outer-layer project code, the bridging code (.so files) of map and reduce, and the task code (enclave.signed.so files) that each of them executes inside the trusted area. The lifecycle of a job in the cluster is illustrated in FIG. 2, where operations inside the gray area are performed within the SGX trusted area. The process comprises the following five stages:
1) Job deployment: the user distributes the map and reduce bridging code and the task code executed in the trusted area to the designated path on each node. The user then generates the job key, performs remote attestation to verify integrity, encrypts the input, secret-shares the code base, and so on, as required by the privacy-preserving distributed computing method described above.
2) Code base reconstruction: before executing map and reduce tasks, the nodes first transmit secret shares by running the oblivious transfer protocol and reconstruct the code base required by the tasks inside the trusted area.
3) Map task execution: the map function is called inside an enclave.
4) Reduce task execution: after the output of the map tasks is received, the reduce function is called inside an enclave.
5) Returning the result: after all tasks finish, the user downloads the ciphertext from the cluster and decrypts it with the symmetric key.
Implementation
This embodiment tests system performance by implementing two applications, WordCount and matrix calculation. The system is deployed on a cluster of 1 master node and 5 worker nodes. Each server has a 3.00 GHz Intel Xeon E3-1220 v6 CPU, 16 GB of memory, and a 100 GB disk.
Experimental method: the two jobs (the latter involving frequent page swapping) are run on the system of the invention and on the most advanced existing privacy-preserving distributed system, ObliDC (see P. Wu, Q. Shen, R. H. Deng, X. Liu, Y. Zhang, and Z. Wu. ObliDC: An SGX-based oblivious distributed computing framework with formal proof. In Proceedings of the 14th ACM Asia Conference on Computer and Communications Security (AsiaCCS). ACM, 2019, pp. 86-99); their running times and per-stage overheads are measured separately and compared.
Experimental results: the overhead of each stage and the total system running time are shown in Table 1.
TABLE 1 comparison of two system run times (units: seconds)
[Table 1 is an image in the original publication (Figure BDA0002461739430000081); its values are not reproduced here.]
Table 1 shows that when a job is executed by the system of the invention, the job execution overhead is much lower than that of a system using the conventional job deployment protocol (in which all dependent code libraries are loaded into the trusted area), because keeping the code library inside the trusted area at all times incurs additional performance loss from frequent page swapping. By adopting secret sharing and an oblivious transfer protocol, the system can keep this code outside the trusted area until it is needed, which leaves more memory for task execution and avoids this overhead. The time spent in map and reduce tasks is reduced by 83 seconds for WordCount and by 39 seconds for matrix calculation. The oblivious transfer and code base reconstruction adopted by the invention take very little time, far less than the overhead caused by EPC paging. In complex cloud computing environments, cloud service providers need to provide users with third-party code libraries that are stored in the trusted area as dynamic link libraries. Experimental verification and analysis show that the system is well suited to this scenario.
The above embodiments are given only as examples of the technical solution of the invention and do not limit it. Various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims. The scope of the invention is determined by the appended claims.

Claims (7)

1. A privacy-preserving distributed computing method with dynamic loading of a code library, characterized by comprising the following steps:
a user encrypts a job input and a job code with a symmetric key, uploads the job input to a cluster, sends the job code to the trusted region of each node in the cluster, and transmits the symmetric key through a secure channel; the code library on which the job depends is shared among the nodes of the cluster using a verifiable secret sharing mechanism; the verifiable secret sharing mechanism employs the Feldman secret sharing algorithm, whose output includes the secret shares sent to each node and the public commitments broadcast to all nodes, the commitments being used to verify the correctness of the shares when the code library is reconstructed;
the user verifies the signature of the trusted processor; if the signature passes verification, the node is allowed to decrypt within the trusted region to obtain the symmetric key, and the symmetric key is used to decrypt the job input and the job code;
before the cluster executes a task, secret shares are transmitted through an oblivious transfer protocol and their correctness is verified, and once the shares pass verification, the code library on which the task depends is reconstructed in the trusted region; the secret shares are transmitted through the oblivious transfer protocol as follows: the node executing the task sends a request for secret shares to a number of other nodes that meets a given threshold, and the corresponding shares are transmitted through the oblivious transfer protocol; the oblivious transfer protocol is based on a key generation function and the hardness of the DDH assumption; the correctness of the secret shares is verified as follows: after the node receives a given number of secret shares, it verifies the correctness of each share; if all shares pass verification, the code library is reconstructed in the trusted region using Lagrange interpolation; otherwise, execution of the job is cancelled;
the cluster tasks are executed in the trusted region; after a task of the previous stage finishes, each generated intermediate-result key-value pair is processed according to the number of tasks in the next stage and the symmetric key to produce a new key-value pair with confidentiality;
after the tasks finish, the user downloads the output data generated by the job from the cluster and decrypts it with the symmetric key to obtain the final result.
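The sharing, verification, and reconstruction steps recited in claim 1 can be illustrated with a minimal Python sketch of Feldman verifiable secret sharing followed by Lagrange interpolation. This is not the patented implementation: the group parameters `P`, `Q`, `G` are toy values chosen for readability, the function names `share`, `verify`, `reconstruct`, and `recover` are hypothetical, and the secret is a single small integer, whereas a real code library would have to be split into many field elements (or protected by a shared key). The oblivious transfer of shares is omitted.

```python
import secrets

# Toy group parameters (assumption, far too small for real use):
# P = 2*Q + 1 with P and Q prime; G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4

def share(secret, n, t):
    """Split `secret` (an int in [0, Q)) into n shares with threshold t."""
    # Random polynomial f of degree t-1 over Z_Q with f(0) = secret.
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t - 1)]
    # Public commitments C_j = G^(a_j) mod P, broadcast to all nodes.
    commitments = [pow(G, a, P) for a in coeffs]
    shares = {}
    for i in range(1, n + 1):
        y = 0
        for a in reversed(coeffs):      # Horner evaluation of f(i) mod Q
            y = (y * i + a) % Q
        shares[i] = y
    return shares, commitments

def verify(i, y, commitments):
    """Check G^y == prod_j C_j^(i^j) (mod P) for the share (i, y)."""
    expected = 1
    for j, c in enumerate(commitments):
        expected = expected * pow(c, pow(i, j, Q), P) % P
    return pow(G, y, P) == expected

def reconstruct(points):
    """Lagrange interpolation at x = 0 over Z_Q."""
    secret = 0
    for i, y in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        secret = (secret + y * num * pow(den, -1, Q)) % Q
    return secret

def recover(points, commitments):
    """Verify every received share; cancel (raise) if any check fails."""
    for i, y in points.items():
        if not verify(i, y, commitments):
            raise ValueError(f"share {i} failed verification; job cancelled")
    return reconstruct(points)
```

In this sketch, `share(123, n=5, t=3)` yields five shares, any three of which let `recover` return 123, while a single altered share makes `recover` raise instead of returning a wrong value. The modular inverse `pow(den, -1, Q)` requires Python 3.8 or later.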
2. The method of claim 1, wherein transmitting the symmetric key through the secure channel established by remote attestation comprises the following steps:
the user sends public code to the trusted region, the code comprising encryption and decryption code and key generation code;
the code is loaded in the trusted region, a key generation algorithm is called to generate a public/private key pair, data in the trusted region is signed with the private key, and the data and the public key are returned to the user;
the user verifies the correctness of the signature using the public key; if verification passes, the user encrypts the symmetric key with the returned public key and sends it to the trusted region; otherwise, remote attestation is initiated again;
after receiving the user's message, the trusted region decrypts it with the private key to obtain the user's symmetric key.
3. The method of claim 1, wherein the generated intermediate-result key-value pair is taken as an original key-value pair; the key of the original key-value pair is hashed with a hash function, and the hash value is reduced modulo the number of tasks in the next stage to obtain the key of the new key-value pair; and the key and value of the original key-value pair are encrypted with the symmetric key to obtain the value of the new key-value pair.
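The key-value transformation of claim 3 can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 stands in for the unspecified hash function, and `toy_encrypt` / `toy_decrypt` are a hypothetical HMAC-based stream construction used solely so the example runs with the Python standard library; a real deployment would use an authenticated cipher such as AES-GCM. The names `protect_pair` and `recover_pair` are likewise hypothetical.

```python
import hashlib
import hmac
import json
import secrets

def _keystream(key, nonce, length):
    # Counter-mode keystream derived from HMAC-SHA256 (toy stand-in only).
    out, counter = b"", 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]

def toy_encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

def toy_decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, ks))

def protect_pair(k, v, sym_key, num_next_tasks):
    """Map an original (k, v) pair to a confidential (new_key, new_value) pair."""
    digest = hashlib.sha256(k.encode("utf-8")).digest()
    # New key: hash of the original key, modulo the number of next-stage tasks.
    new_key = int.from_bytes(digest, "big") % num_next_tasks
    # New value: the original key and value, encrypted together.
    new_value = toy_encrypt(sym_key, json.dumps([k, v]).encode("utf-8"))
    return new_key, new_value

def recover_pair(new_value, sym_key):
    """Inside the trusted region of the next stage: recover the original pair."""
    k, v = json.loads(toy_decrypt(sym_key, new_value).decode("utf-8"))
    return k, v
```

Because the new key depends only on the original key, all pairs with the same original key still reach the same next-stage task, while an observer outside the trusted region sees only a task index and ciphertext.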
4. The method of claim 1, wherein, in order to hide access patterns, the cluster presets a threshold as the maximum volume of point-to-point transmission in the network, and if the intermediate results do not reach the threshold at transmission time, the system generates random noise data in the trusted region and encrypts it with the symmetric key.
5. The method of claim 1, wherein the encrypted noise data is decrypted and then deleted before the next-stage task is executed.
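The padding and noise removal described in claims 4 and 5 can be sketched in Python as below. The sketch makes several assumptions that the claims do not state: the threshold is counted in records per point-to-point transfer, each plaintext record carries a one-byte flag (`REAL` or `DUMMY`) so that noise can be recognized after decryption, and the `encrypt` / `decrypt` callables are supplied by the caller as stand-ins for encryption under the symmetric key. The names `pad_batch` and `strip_noise` are hypothetical.

```python
import secrets

REAL, DUMMY = b"\x01", b"\x00"   # hypothetical one-byte markers inside the plaintext

def pad_batch(records, threshold, encrypt, record_len=32):
    """Encrypt real records and add encrypted noise until the batch holds `threshold` items."""
    if len(records) > threshold:
        raise ValueError("batch exceeds the preset threshold")
    batch = [encrypt(REAL + r) for r in records]
    while len(batch) < threshold:
        # Random noise generated inside the trusted region, then encrypted.
        batch.append(encrypt(DUMMY + secrets.token_bytes(record_len)))
    # Shuffle so that position in the batch does not reveal which items are noise.
    secrets.SystemRandom().shuffle(batch)
    return batch

def strip_noise(batch, decrypt):
    """Before the next-stage task runs: decrypt every item and delete the noise."""
    plain = [decrypt(c) for c in batch]
    return [p[1:] for p in plain if p[:1] == REAL]
```

For the padding to hide anything in practice, real and dummy ciphertexts would also need to have the same length, so real records would have to be padded to a fixed size before encryption; the sketch leaves that out for brevity.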
6. The method of claim 1, wherein after a task finishes, the trusted region is destroyed, and the sensitive data held in the trusted region is encrypted and stored in external memory using the data sealing function of the trusted processor, so that the sensitive data can be reloaded when a new trusted region is created.
7. A distributed computing system operable to perform the privacy-preserving distributed computing method of claim 1, comprising a distributed code library management system, a secure shuffle monitor, and a trusted-processor-based job application, wherein:
distributed code library management system: comprises a control unit that runs on the cluster master node and a storage unit that runs on each worker node, wherein the control unit records the metadata of each share, and the storage unit stores and transmits the secret shares;
secure shuffle monitor: monitors the shuffle process between nodes; when, during transmission of intermediate results, the network traffic between nodes is found not to reach a preset threshold, it generates a certain amount of noise data and encrypts it with the symmetric key so as to hide the network access pattern;
trusted-processor-based job application: comprises an outer application program implemented in any language and core task functions implemented in a language supported by the trusted processor; the outer application program is responsible for invoking the trusted execution program, the code library recovery program, and the programs that create and destroy the trusted region; a user encrypts the job input and the job code with a symmetric key and transmits them to the cluster, verifies the signature of the trusted processor, downloads the output data generated by the job from the cluster, and decrypts it with the symmetric key to obtain the final result.
CN202010321903.5A 2020-04-22 2020-04-22 Privacy protection distributed computing method and system for dynamically loading code base Active CN111723394B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010321903.5A CN111723394B (en) 2020-04-22 2020-04-22 Privacy protection distributed computing method and system for dynamically loading code base

Publications (2)

Publication Number Publication Date
CN111723394A CN111723394A (en) 2020-09-29
CN111723394B true CN111723394B (en) 2022-10-11

Family

ID=72564165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010321903.5A Active CN111723394B (en) 2020-04-22 2020-04-22 Privacy protection distributed computing method and system for dynamically loading code base

Country Status (1)

Country Link
CN (1) CN111723394B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11822675B2 (en) * 2021-06-24 2023-11-21 International Business Machines Corporation Securing customer data and internal register data during hardware checkstops in a multi-tenant environment
CN113568755B (en) * 2021-08-04 2023-11-17 上海易景信息科技有限公司 Distributed compiling system and distributed compiling method
CN114297700B (en) * 2021-11-11 2022-09-23 北京邮电大学 Dynamic and static combined mobile application privacy protocol extraction method and related equipment
CN114462047B (en) * 2022-01-25 2024-03-29 北京工业大学 Cloud outsourcing calculation safety method based on SGX technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102769615A (en) * 2012-07-02 2012-11-07 北京大学 Task scheduling method and system based on MapReduce mechanism
CN107347096A (en) * 2017-07-07 2017-11-14 安徽大学 A kind of location privacy protection method based on Cloud Server
CN107851167A (en) * 2015-07-31 2018-03-27 微软技术许可有限责任公司 Protection calculates the technology of data in a computing environment
CN110555933A (en) * 2019-07-31 2019-12-10 中钞信用卡产业发展有限公司杭州区块链技术研究院 Electronic voting method, device, equipment and computer storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10460234B2 (en) * 2018-01-19 2019-10-29 Microsoft Technology Licensing, Llc Private deep neural network training

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ObliComm: Towards Building an Efficient Oblivious Communication System; Pengfei Wu et al.; IEEE Transactions on Dependable and Secure Computing; 20191023; pp. 2333-2348 *
ObliDC: An SGX-based Oblivious Distributed Computing Framework with Formal Proof; Pengfei Wu et al.; Asia CCS '19: Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security; 20190702; pp. 86-99 *
An enhanced oblivious transfer protocol based on digital signatures; 赵春明 et al.; 《电子与信息学报》; 20060228; pp. 303-306 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant