WO2023109246A1 - Breakpoint-oriented privacy protection method, apparatus, device and medium - Google Patents

Breakpoint-oriented privacy protection method, apparatus, device and medium

Info

Publication number
WO2023109246A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
privacy
privacy loss
participants
participant
Prior art date
Application number
PCT/CN2022/121482
Other languages
English (en)
French (fr)
Inventor
赵蕾
Original Assignee
新智我来网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 新智我来网络科技有限公司
Publication of WO2023109246A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/602 Providing cryptographic facilities or services
    • G06F 21/604 Tools and structures for managing or administering access control systems
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a breakpoint-oriented privacy protection method, apparatus, device, and medium.
  • Federated learning is a distributed learning paradigm for distributed datasets that preserves data privacy.
  • the classic federated learning framework includes a central node and different local participants; training proceeds by each participant uploading, downloading, and updating model parameters, which requires the participants to stay consistent during training.
  • through joint learning, each participant can train an accurate global model from local private data, connecting the participants while protecting each participant's private data.
  • the embodiments of the present disclosure provide a method, apparatus, device, and medium for breakpoint privacy protection, to solve the problem that the prior art cannot control the privacy loss during model training and cannot balance the privacy and usability of the model when a certain participant does not take part in the current federated learning model training.
  • the first aspect of the embodiments of the present disclosure provides a method for breakpoint privacy protection, including:
  • the privacy loss incurred by the central node accessing the model parameters corresponding to each participant is counted, and the total privacy loss corresponding to the participants is obtained;
  • noise corresponding to the number of non-participating parties is added to the corresponding model parameters.
  • the second aspect of the embodiments of the present disclosure provides a breakpoint-oriented privacy protection device, including:
  • a training module, a noise addition module, an aggregation module, a statistics module, a calculation module, and a noise supplement module;
  • the training module is used to obtain the data of each participant, so as to respectively train the corresponding local models according to the data of each participant, and respectively obtain the model parameters corresponding to the local models;
  • the noise adding module is used to add noise to the corresponding model parameters through a preset encryption algorithm to obtain the corresponding encryption value of each model parameter;
  • the aggregation module is used to upload each encrypted value to the central node, and perform an aggregation operation on each encrypted value through the central node to obtain corresponding aggregation parameters, so as to calculate the mean value of the model parameters corresponding to each participant according to the aggregation parameters;
  • the statistics module is used to use the preset privacy loss calculation mechanism, according to the mean value of the model parameters, to count the privacy loss generated by the central node accessing the corresponding model parameters of each participant, and obtain the total value of the privacy loss corresponding to each participant;
  • the calculation module is used to compare the total privacy loss with the preset privacy loss threshold; if the total privacy loss is less than the preset privacy loss threshold, the number of non-participants that did not take part in the current joint learning model training is computed from the total privacy loss and the preset privacy loss threshold; here the status of a non-participant is a breakpoint;
  • the noise supplement module is used to supplement and add noise corresponding to the number of non-participants to the corresponding model parameters according to the preset encryption algorithm.
  • a third aspect of the embodiments of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor, where the processor implements the steps of the above method when executing the computer program.
  • a fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above method are implemented.
  • the embodiments of the present disclosure have the following beneficial effects: the obtained data of each participant is used to train the local model corresponding to each participant, so the corresponding model parameters can be obtained; adding noise interference to the model parameters through the preset encryption algorithm yields the corresponding encrypted values, masking the difference between an encrypted value and the model parameters, so that an attacker cannot obtain the real model parameters from query results and the attacker's background knowledge need not be considered; aggregating the encrypted values at the central node determines the mean of the model parameters corresponding to each participant, from which each participant's current model can be updated; counting the total privacy loss of the model training process through the preset privacy loss accounting mechanism and comparing it with the preset privacy loss threshold effectively controls the model's privacy protection; and when there are non-participants that did not take part in the current joint learning model training, the noise corresponding to the breakpoint non-participants is added to the model parameters corresponding to each participant, so as to balance the privacy and usability of the model.
  • FIG. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for breakpoint privacy protection provided by an embodiment of the present disclosure
  • Fig. 3 is a schematic flowchart of another method for breakpoint privacy protection provided by an embodiment of the present disclosure
  • Fig. 4 is a schematic structural diagram of a device for breakpoint privacy protection provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Fig. 6 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
  • Federated learning refers to comprehensively using multiple AI (Artificial Intelligence) technologies on the premise of ensuring data security and user privacy, cooperating with multiple parties to jointly mine data value and to spawn new intelligent business forms and models based on joint modeling.
  • Federated learning has at least the following characteristics:
  • Participating nodes control the weakly centralized joint training mode of their own data to ensure data privacy and security in the process of co-creating intelligence.
  • FIG. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure.
  • the architecture of joint learning may include a server (central node) 101 , and participants 102 , 103 , and 104 .
  • the basic model can be established by the server 101, and the server 101 sends the model to the participant 102, the participant 103 and the participant 104 with which a communication connection is established.
  • the basic model can also be uploaded to the server 101 after being created by any participant, and the server 101 sends the model to other participants that have established communication connections with it.
  • Participant 102, participant 103 and participant 104 build a model according to the downloaded basic structure and model parameters, use local data for model training, obtain updated model parameters, and encrypt and upload the updated model parameters to the server 101.
  • the server 101 aggregates the model parameters sent by the participant 102 , the participant 103 and the participant 104 to obtain the global model parameters, and returns the global model parameters to the participant 102 , the participant 103 and the participant 104 .
  • the participant 102, the participant 103 and the participant 104 iterate their models according to the received global model parameters until the models finally converge, thereby realizing the training of the models.
  • the data uploaded by participant 102, participant 103, and participant 104 are model parameters; local data will not be uploaded to server 101, and all participants can share the final model parameters, so joint modeling is achieved while data privacy is guaranteed.
  • the number of participants is not limited to the above three, but can be set as needed, which is not limited in this embodiment of the present disclosure. Since the existing technology cannot control the privacy loss in the model training process and cannot balance the privacy and usability of the model when a participant does not take part in the current joint learning model training, a method is needed that controls the privacy loss and achieves privacy protection when a participant misses federated learning model training.
  • FIG. 2 is a schematic flowchart of a method for breakpoint privacy protection provided by an embodiment of the present disclosure.
  • the breakpoint-oriented privacy protection method of FIG. 2 may be executed by the server of FIG. 1.
  • the method for breakpoint privacy protection includes:
  • the privacy loss incurred by the central node accessing the model parameters corresponding to each participant is counted, and the total privacy loss corresponding to the participants is obtained.
  • specifically, for breakpoint privacy protection, the server obtains each participant's private data and trains the corresponding local model on it, thereby obtaining the model parameters corresponding to each local model.
  • the server adds noise to the obtained model parameters through the preset encryption algorithm, yielding a noised encrypted value for each model parameter; the encrypted values are then uploaded to the central node, which aggregates them into the aggregation parameters corresponding to each model parameter, from which the mean of each participant's model parameters is computed.
  • using the preset privacy loss accounting mechanism, the server can count the privacy loss incurred by the central node accessing the model parameters corresponding to each participant, and obtain the total privacy loss corresponding to the participants, on the basis of which the iterations of the joint learning training process are controlled.
  • This application introduces a preset privacy loss calculation mechanism to calculate the privacy loss caused by the central node's access to training data, so as to control and guide the whole process of the entire analysis activity.
  • the server compares the total privacy loss calculated by the preset privacy loss calculation mechanism with the preset privacy loss threshold, and judges in real time whether the total privacy loss during the joint learning model training process exceeds the preset privacy loss threshold.
  • the preset privacy loss calculation mechanism in the embodiment of the present application is the Moments Accountant (MA) mechanism.
  • if the total privacy loss is less than the preset privacy loss threshold, this indicates that some parties did not take part in the training of the current joint learning model and that the current total privacy loss cannot balance the privacy and usability of the model; the number of non-participants that did not take part in the current joint learning model training must therefore be computed from the total privacy loss and the preset privacy loss threshold. It should be noted that the status of a non-participating party in this application is a breakpoint.
  • after computing the number of non-participants that did not take part in the training of the current joint learning model, the server adds noise corresponding to that number to the model parameters corresponding to each participant according to the preset encryption algorithm, to ensure that the privacy and usability of the model stay balanced when some parties miss the current joint learning model training.
  • the obtained data of each participant is used to train the corresponding local model of each participant, so the corresponding model parameters can be obtained; noise interference is added to the model parameters through the preset encryption algorithm to obtain the corresponding encrypted values, which masks the difference between the encrypted values and the model parameters, so that an attacker cannot obtain the real model parameters from query results and the attacker's background knowledge need not be considered; aggregating the encrypted values at the central node determines the mean of the model parameters corresponding to each participant, from which each participant's current model can be updated; the preset privacy loss accounting mechanism counts the total privacy loss of the model training process, and comparing it with the preset privacy loss threshold effectively controls the model's privacy protection; when there are non-participants that did not take part in the current joint learning model training, the noise corresponding to the breakpoint non-participants is added to the model parameters corresponding to each participant, so as to balance the privacy and usability of the model.
  • noise is added to the model parameters corresponding to each participant through a preset encryption algorithm to obtain an encrypted value corresponding to each model parameter, specifically including:
  • noise is added, through the noise-adding sub-protocol, to the clipped total gradient corresponding to each model parameter;
  • the encrypted value corresponding to each model parameter is obtained from the noised total gradient.
  • specifically, the server computes the gradient corresponding to each model parameter from each participant's model parameters and clips it; then, according to the preset encryption algorithm and through the noise-adding sub-protocol, it adds noise to the clipped total gradient corresponding to each model parameter, and obtains each model parameter's encrypted value from the noised total gradient.
  • the preset encryption algorithm is used to add noise to the model parameters corresponding to each participant, encrypting each participant's private data; this masks the difference between the encrypted values and the model parameters, so that an attacker cannot obtain the real model parameters from query results and the attacker's background knowledge need not be considered.
  • before the noise is added, according to the preset encryption algorithm and through the noise-adding sub-protocol, to the clipped total gradient corresponding to each model parameter, the method also includes:
  • specifically, when adding noise to the model parameters of each participant, the server counts the total noise added during joint learning model training and determines the total number of parties in the current joint learning model training; it then computes the noise to be added for each participant's model parameters from the ratio of the counted total added noise to the total number of parties.
  • computing each participant's noise-to-add from the ratio of the total added noise to the total number of parties makes it convenient, when later joint learning model training has absent parties, to determine the noise-to-add corresponding to each non-participating party, so that the privacy loss can be controlled.
  • the number of non-participants among the participants that have not participated in the current joint learning model training is calculated, specifically including:
  • the number of non-participants that did not take part in the current joint learning model training is determined from the ratio between the privacy loss difference and each party's noise-to-add.
  • specifically, from the difference between the preset privacy loss threshold and the total privacy loss, the server can determine the privacy loss difference corresponding to the current joint learning model training, and then compute, from the ratio between that difference and each non-participating party's noise-to-add, the number of non-participants that did not take part in the current joint learning model training.
  • when the total privacy loss of the current joint learning model training has not reached the preset privacy loss threshold, the number of non-participating parties is determined from the ratio of the privacy loss difference to each non-participating party's noise-to-add; the privacy loss difference of the current joint learning model training can then be made up from the computed number of non-participating parties and each one's noise-to-add, ensuring the balance between the privacy and usability of the model and guaranteeing its quality of service.
  • after the noise corresponding to the number of non-participants is added to the corresponding model parameters according to the preset encryption algorithm, the method also includes:
  • the encrypted value after supplementing the noise is aggregated to obtain the aggregation parameter after supplementing the noise;
  • the local model corresponding to each participant is updated according to the mean value of the model parameters corresponding to each participant after adding noise.
  • specifically, after the server adds the noise corresponding to the number of non-participants to the corresponding model parameters according to the preset encryption algorithm, it obtains the encrypted value corresponding to each model parameter after the supplementary noise and uploads these encrypted values to the central node, which re-aggregates the noise-supplemented encrypted values to obtain the corresponding noise-supplemented aggregation parameters; the number of participating parties is computed from the difference between the total number of parties and the number of non-participants, the noise-supplemented mean of each participant's model parameters is computed from the ratio between the noise-supplemented aggregation parameters and the number of participating parties, and each participant's local model is updated from that mean.
  • aggregating the noise-supplemented model parameters determines the mean of the model parameters corresponding to each participant, so the local model corresponding to each participant can be updated from the mean of the model parameters, completing one training update of the model and improving the performance of each participant's local model.
  • after the total privacy loss is compared with the preset privacy loss threshold, the method also includes:
  • if the total privacy loss is greater than or equal to the preset privacy loss threshold, the current joint learning model training is stopped.
  • specifically, after comparing the total privacy loss with the preset privacy loss threshold, if the server determines that the total privacy loss is greater than or equal to the preset privacy loss threshold, the total privacy loss of joint learning model training has reached expectations and the privacy and usability of the model are already balanced; the current joint model training is therefore stopped, avoiding adding so much noise that the model becomes unusable.
  • the preset encryption algorithm is a differential privacy algorithm.
  • the differential privacy algorithm combined with the privacy loss accounting mechanism can ensure that the privacy loss reaches the preset privacy loss threshold; differential privacy masks the differences between real data by adding noise interference, a differentially private query ensures the result stays essentially unchanged when data are added or deleted, and an attacker cannot obtain the real data from the query results and the attacker's background knowledge need not be considered.
  • Fig. 3 is a schematic flowchart of another method for breakpoint privacy protection provided by an embodiment of the present disclosure.
  • the server trains the local models corresponding to each participant from the obtained participant data and obtains the model parameters corresponding to each local model; it then adds noise to the obtained model parameters through the differential privacy algorithm to obtain the encrypted value corresponding to each model parameter; the encrypted values are uploaded to the central node, which aggregates them, obtains the corresponding aggregation parameters, computes the mean of the model parameters corresponding to each participant, and sends that mean to each participant so that each participant updates its corresponding local model from it.
  • the server uses the MA mechanism to count the privacy loss during the model training process and judges whether the total privacy loss exceeds the preset privacy loss threshold; if not, the result is sent through the central node to the participants to determine whether the number of participating parties equals the total number of parties. When the number of participating parties is less than the total, the privacy loss difference between the preset privacy loss threshold and the total privacy loss is computed, the number of non-participants is determined from the ratio of that difference to each participant's noise-to-add, and the noise corresponding to the number of non-participants is added to the model parameters corresponding to each participant, so that the privacy and usability of the model can be balanced.
  • Fig. 4 is a schematic structural diagram of a device for breakpoint privacy protection provided by an embodiment of the present disclosure. As shown in Figure 4, the device for breakpoint privacy protection includes:
  • the training module 401 is configured to obtain the data of each participant, so as to respectively train the corresponding local models according to the data of each participant, and respectively obtain the model parameters corresponding to the local models;
  • the noise adding module 402 is configured to add noise to the corresponding model parameters through a preset encryption algorithm to obtain the corresponding encryption value of each model parameter;
  • the aggregation module 403 is configured to upload each encrypted value to the central node, and perform an aggregation operation on each encrypted value through the central node to obtain corresponding aggregation parameters, so as to calculate the mean value of the model parameters corresponding to each participant according to the aggregation parameters;
  • the statistics module 404 is configured to use the preset privacy loss calculation mechanism to calculate the privacy loss caused by the central node accessing the model parameters corresponding to each participant according to the mean value of the model parameters, and obtain the total value of the privacy loss corresponding to each participant;
  • the calculation module 405 is configured to compare the total privacy loss with the preset privacy loss threshold and, if the total privacy loss is less than the preset privacy loss threshold, to compute, from the total privacy loss and the preset privacy loss threshold, the number of non-participants that did not take part in the training of the current joint learning model; among them, the status of a non-participant is a breakpoint;
  • the noise supplement module 406 is configured to supplement and add noise corresponding to the number of non-participants to corresponding model parameters according to a preset encryption algorithm.
  • adding noise interference to the model parameters corresponding to each participant through the noise addition module can mask the difference between the encrypted values and the model parameters, so that an attacker cannot obtain the real model parameters from query results and the attacker's background knowledge need not be considered; counting the total privacy loss of the model training process through the MA mechanism can effectively control the model's privacy protection; and the noise supplement module can add the noise corresponding to the non-participants into the model parameters corresponding to each participant, so as to balance the privacy and usability of the model and improve its quality of service.
  • Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device 5 of this embodiment includes: a processor 501 , a memory 502 , and a computer program 503 stored in the memory 502 and operable on the processor 501 .
  • when the processor 501 executes the computer program 503, the steps in the foregoing method embodiments are implemented.
  • alternatively, when the processor 501 executes the computer program 503, the functions of the modules/units in the foregoing device embodiments are realized.
  • the computer program 503 can be divided into one or more modules/units, and one or more modules/units are stored in the memory 502 and executed by the processor 501 to complete the present disclosure.
  • One or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 503 in the electronic device 5 .
  • the electronic equipment 5 may be electronic equipment such as desktop computers, notebooks, palmtop computers, and cloud servers.
  • the electronic device 5 may include but not limited to a processor 501 and a memory 502 .
  • FIG. 5 is only an example of the electronic device 5 and does not constitute a limitation on it; the device may include more or fewer components than shown, or combine certain components, or use different components.
  • an electronic device may also include an input and output device, a network access device, a bus, and the like.
  • the processor 501 can be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 502 may be an internal storage unit of the electronic device 5, for example, a hard disk or memory of the electronic device 5.
  • the memory 502 can also be an external storage device of the electronic device 5, for example, a plug-in hard disk equipped on the electronic device 5, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, etc.
  • the memory 502 may also include both an internal storage unit of the electronic device 5 and an external storage device.
  • the memory 502 is used to store computer programs and other programs and data required by the electronic device.
  • the memory 502 can also be used to temporarily store data that has been output or will be output.
  • Fig. 6 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure. As shown in FIG. 6 , the computer-readable storage medium stores a computer program 601 , and when the computer program 601 is executed by a processor, the steps of the above method are realized.
  • the disclosed device/electronic equipment and method may be implemented in other ways.
  • the device/electronic device embodiments described above are only illustrative.
  • the division of modules or units is only a division by logical function; in actual implementation there may be other division methods, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if an integrated module/unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the present disclosure realizes all or part of the processes in the methods of the above embodiments, which can also be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium and, when executed by a processor, can realize the steps in the above method embodiments.
  • a computer program may include computer program code, which may be in source code form, object code form, executable file, or some intermediate form or the like.
  • the computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in computer-readable media may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals, according to legislation and patent practice.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Automation & Control Theory (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present disclosure provides a breakpoint-oriented privacy protection method, apparatus, device, and medium. The method includes: obtaining the data of each participant to train the corresponding local models, and obtaining the corresponding model parameters; adding noise to the corresponding model parameters through a preset encryption algorithm to obtain corresponding encrypted values; uploading the encrypted values to a central node for aggregation to obtain corresponding aggregation parameters, and computing the mean of the model parameters corresponding to each participant; counting, through a preset privacy loss accounting mechanism, the privacy loss incurred on the corresponding model parameters to obtain the corresponding total privacy loss; if the total privacy loss is below the preset privacy loss threshold, computing, from the total privacy loss and the preset privacy loss threshold, the number of non-participants that did not take part in the current joint learning model training; and, according to the preset encryption algorithm, supplementing the corresponding model parameters with noise corresponding to the number of non-participants.

Description

Breakpoint-oriented privacy protection method, apparatus, device, and medium

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a breakpoint-oriented privacy protection method, apparatus, device, and medium.

Background

With the rapid development of deep learning theory and technology, the advantages of artificial intelligence are becoming apparent across industries, and privacy protection for deep learning is growing ever more important. Joint learning is a distributed learning paradigm over distributed datasets that preserves data privacy. A classic joint learning framework consists of a central node and different local participants; training proceeds by each participant uploading, downloading, and updating model parameters, which requires the participants to stay consistent during training. Through joint learning, the participants can train an accurate global model from local private data, connecting the participants while protecting each participant's private data.

In practice, however, with many participants it is easy for one of them to miss the current training round for some reason. At present, to avoid leaking the participants' information, joint learning is mostly encrypted by means of differential privacy. Although the added noise can mask the differences between the real data, the encryption process cannot control the privacy loss, nor can it handle the case where a participant misses the current training round.
Summary

In view of this, embodiments of the present disclosure provide a breakpoint-oriented privacy protection method, apparatus, device, and medium, to solve the problem that the prior art cannot control the privacy loss during model training and cannot balance the privacy and usability of the model when a participant does not take part in the current joint learning model training.

A first aspect of the embodiments of the present disclosure provides a breakpoint-oriented privacy protection method, including:

obtaining the data of each participant, so as to train the corresponding local models respectively from the data of each participant, and obtaining the model parameters corresponding to each local model;

adding noise to the corresponding model parameters through a preset encryption algorithm to obtain an encrypted value corresponding to each model parameter;

uploading the encrypted values to a central node and aggregating them at the central node to obtain corresponding aggregation parameters, so as to compute, from the aggregation parameters, the mean of the model parameters corresponding to each participant;

counting, through a preset privacy loss accounting mechanism and from the mean of the model parameters, the privacy loss incurred by the central node accessing the model parameters corresponding to each participant, to obtain the total privacy loss corresponding to the participants;

comparing the total privacy loss with a preset privacy loss threshold and, if the total privacy loss is below the preset privacy loss threshold, computing, from the total privacy loss and the preset privacy loss threshold, the number of non-participants that did not take part in the current joint learning model training, where the state of a non-participant is a breakpoint;

supplementing the corresponding model parameters, according to the preset encryption algorithm, with noise corresponding to the number of non-participants.
A second aspect of the embodiments of the present disclosure provides a breakpoint-oriented privacy protection apparatus, including:

a training module, a noise addition module, an aggregation module, a statistics module, a calculation module, and a noise supplement module;

the training module is used to obtain the data of each participant, so as to train the corresponding local models respectively from the data of each participant, and to obtain the model parameters corresponding to each local model;

the noise addition module is used to add noise to the corresponding model parameters through a preset encryption algorithm, obtaining an encrypted value corresponding to each model parameter;

the aggregation module is used to upload the encrypted values to a central node and to aggregate them at the central node into corresponding aggregation parameters, so as to compute, from the aggregation parameters, the mean of the model parameters corresponding to each participant;

the statistics module is used to count, through a preset privacy loss accounting mechanism and from the mean of the model parameters, the privacy loss incurred by the central node accessing the model parameters corresponding to each participant, obtaining the total privacy loss corresponding to the participants;

the calculation module is used to compare the total privacy loss with a preset privacy loss threshold and, if the total privacy loss is below the preset privacy loss threshold, to compute, from the two values, the number of non-participants that did not take part in the current joint learning model training, where the state of a non-participant is a breakpoint;

the noise supplement module is used to supplement the corresponding model parameters, according to the preset encryption algorithm, with noise corresponding to the number of non-participants.
A third aspect of the embodiments of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of the above method when executing the computer program.

A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.

Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: training each participant's local model on the obtained participant data yields the corresponding model parameters; adding noise to the model parameters through the preset encryption algorithm yields the corresponding encrypted values, which masks the difference between an encrypted value and the model parameters, so that an attacker cannot recover the real model parameters from query results, regardless of the attacker's background knowledge; aggregating the encrypted values at the central node determines the mean of the model parameters corresponding to each participant, from which each participant's current model can be updated; counting the total privacy loss during training through the preset privacy loss accounting mechanism and comparing it with the preset threshold effectively controls the model's privacy protection; and when some parties have not taken part in the current joint learning model training, the noise corresponding to those breakpoint non-participants is added to the model parameters of each participant, balancing the privacy and usability of the model. This not only provides strong privacy protection for the participants' private data, but also saves privacy budget while guaranteeing the model's quality of service.
Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present disclosure; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of a breakpoint-oriented privacy protection method provided by an embodiment of the present disclosure;

Fig. 3 is a schematic flowchart of another breakpoint-oriented privacy protection method provided by an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of a breakpoint-oriented privacy protection apparatus provided by an embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;

Fig. 6 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
Detailed Description

In the following description, specific details such as particular system structures and techniques are set forth for explanation rather than limitation, so as to provide a thorough understanding of the embodiments of the present disclosure. However, it should be clear to those skilled in the art that the present disclosure can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description.

Joint learning means comprehensively using multiple AI (Artificial Intelligence) technologies on the premise of ensuring data security and user privacy, cooperating with multiple parties to jointly mine data value and to spawn new intelligent business forms and models based on joint modeling. Joint learning has at least the following characteristics:

(1) Participating nodes control their own data in a weakly centralized joint training mode, ensuring data privacy and security while co-creating intelligence.

(2) Under different application scenarios, multiple model aggregation and optimization strategies are established using screened and/or combined AI algorithms and privacy-preserving computation, to obtain high-level, high-quality models.

(3) On the premise of ensuring data security and user privacy, methods for improving the efficiency of the joint learning engine are derived from the various model aggregation and optimization strategies; such methods may improve the overall efficiency of the joint learning engine by addressing parallel computing architectures, information exchange across large-scale cross-domain networks, intelligent perception, exception handling mechanisms, and so on.

(4) The requirements of multi-party users under each scenario are gathered, and through a mutual-trust mechanism the real contribution of each joint participant is reasonably evaluated and incentives are distributed accordingly.

On this basis, an AI technology ecosystem based on joint learning can be established, giving full play to the value of industry data and promoting the landing of scenarios in vertical fields.

A breakpoint-oriented privacy protection method, apparatus, device, and medium according to embodiments of the present disclosure are described in detail below with reference to the drawings.
Fig. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure. As shown in Fig. 1, the joint learning architecture may include a server (central node) 101 and participants 102, 103, and 104.

During joint learning, a basic model may be built by the server 101, which sends it to the participants 102, 103, and 104 with which it has established communication connections. The basic model may also be built by any participant and uploaded to the server 101, which then sends it to the other participants connected to it. Participants 102, 103, and 104 build models from the downloaded basic structure and model parameters, train them on local data to obtain updated model parameters, and upload the updated parameters, encrypted, to the server 101. The server 101 aggregates the model parameters sent by participants 102, 103, and 104 into global model parameters and returns these to the participants, who iterate their models with the received global parameters until the models converge, completing the training. Throughout this process, the data uploaded by participants 102, 103, and 104 are model parameters; local data are never uploaded to the server 101, and all participants can share the final model parameters, so joint modeling is achieved while data privacy is guaranteed. Note that the number of participants is not limited to the three above and can be set as needed; the embodiments of the present disclosure place no restriction on this. Because the prior art cannot control the privacy loss during model training and cannot balance the privacy and usability of the model when a participant misses the current joint learning model training, a method is needed that controls the privacy loss and achieves privacy protection when a participant does not take part in joint learning training.
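To make one round of this architecture concrete, the following is a minimal Python sketch; `local_train` and the parameter vectors are placeholders introduced here for illustration, not elements of the disclosure.

```python
import numpy as np

def federated_round(global_params, local_datasets, local_train):
    """One joint learning round over the Fig. 1 architecture.

    `local_train(params, data)` stands in for whatever training a
    participant runs locally and returns updated parameters. The server
    sees only the uploaded parameters, never the local data, and returns
    their mean as the new global model.
    """
    uploads = [local_train(global_params.copy(), data) for data in local_datasets]
    return np.mean(uploads, axis=0)
```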
Fig. 2 is a schematic flowchart of a breakpoint-oriented privacy protection method provided by an embodiment of the present disclosure. The method of Fig. 2 may be executed by the server of Fig. 1. As shown in Fig. 2, the breakpoint-oriented privacy protection method includes:

S201: obtain the data of each participant, train the corresponding local models respectively from that data, and obtain the model parameters corresponding to each local model.

S202: add noise to the corresponding model parameters through a preset encryption algorithm to obtain an encrypted value corresponding to each model parameter.

S203: upload the encrypted values to the central node and aggregate them at the central node to obtain corresponding aggregation parameters, so as to compute, from the aggregation parameters, the mean of the model parameters corresponding to each participant.

S204: count, through a preset privacy loss accounting mechanism and from the mean of the model parameters, the privacy loss incurred by the central node accessing the model parameters corresponding to each participant, to obtain the total privacy loss corresponding to the participants.

S205: compare the total privacy loss with a preset privacy loss threshold and, if the total privacy loss is below the threshold, compute from the two values the number of non-participants that did not take part in the current joint learning model training.

S206: supplement the corresponding model parameters, according to the preset encryption algorithm, with noise corresponding to the number of non-participants.
Specifically, for breakpoint-oriented privacy protection, the server obtains each participant's private data from that participant and trains the participant's local model on it, thereby obtaining the model parameters of each local model. Through the preset encryption algorithm, the server adds noise to the obtained model parameters, yielding a noised encrypted value for each model parameter; the encrypted values are uploaded to the central node, which aggregates them into the aggregation parameters corresponding to each model parameter, from which the mean of each participant's model parameters can be computed. Using the preset privacy loss accounting mechanism and the computed parameter means, the server can count the privacy loss incurred by the central node accessing each participant's model parameters, obtain the total privacy loss corresponding to the participants, and use it to control the iterations of the joint learning training process. By introducing a preset privacy loss accounting mechanism, this application accounts for the privacy loss incurred by the central node's access to the training data, and thereby controls and guides the entire analysis activity.

The server compares the total privacy loss produced by the preset privacy loss accounting mechanism with the preset privacy loss threshold, judging in real time whether the total privacy loss of joint learning model training exceeds the threshold. It should be noted that the preset privacy loss accounting mechanism in the embodiments of this application is the Moments Accountant (MA) mechanism.
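For illustration, the sketch below computes an accountant-style upper bound on the total privacy loss. It is a simplified stand-in for the MA mechanism: it composes the Renyi-DP guarantee of the Gaussian mechanism over the accesses made so far and converts the result to (epsilon, delta)-DP, while the real Moments Accountant additionally exploits subsampling; the function and its bound are therefore assumptions for exposition, not the disclosure's own computation.

```python
import math

def gaussian_epsilon(sigma, steps, delta, orders=range(2, 64)):
    """Upper bound on the total privacy loss after `steps` accesses.

    Each access is a Gaussian mechanism with noise multiplier `sigma`,
    whose Renyi-DP at order alpha is alpha / (2 * sigma**2); composition
    sums this over the steps, and the standard conversion adds
    log(1/delta) / (alpha - 1). The minimum over the orders is reported.
    """
    def eps_at(alpha):
        rdp = steps * alpha / (2.0 * sigma ** 2)
        return rdp + math.log(1.0 / delta) / (alpha - 1)
    return min(eps_at(a) for a in orders)
```

The server would stop training once `gaussian_epsilon(sigma, step, delta)` reaches the preset privacy loss threshold, mirroring the comparison described above.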
If the total privacy loss is determined to be below the preset privacy loss threshold, some parties did not take part in the current joint learning model training and the current total privacy loss cannot balance the privacy and usability of the model; the number of non-participants that did not take part in the current joint learning model training must therefore be computed from the total privacy loss and the preset privacy loss threshold. It should be noted that in this application the state of a non-participant is a breakpoint. After computing the number of non-participants, the server supplements the model parameters of each participant, according to the preset encryption algorithm, with noise corresponding to the number of non-participants, ensuring that the privacy and usability of the model stay balanced when some parties miss the current joint learning model training.

According to the technical solution provided by the embodiments of the present disclosure, training each participant's local model on the obtained participant data yields the corresponding model parameters; adding noise to the model parameters through the preset encryption algorithm yields the corresponding encrypted values, which masks the difference between an encrypted value and the model parameters, so that an attacker cannot recover the real model parameters from query results, regardless of the attacker's background knowledge; aggregating the encrypted values at the central node determines the mean of the model parameters corresponding to each participant, from which each participant's current model can be updated; counting the total privacy loss during training through the preset privacy loss accounting mechanism and comparing it with the preset threshold effectively controls the model's privacy protection; and when some parties have not taken part in the current joint learning model training, the noise corresponding to those breakpoint non-participants is added to the model parameters of each participant, balancing the privacy and usability of the model. This not only provides strong privacy protection for the participants' private data, but also saves privacy budget while guaranteeing the model's quality of service.
In some embodiments, adding noise to the model parameters corresponding to each participant through the preset encryption algorithm to obtain the encrypted value corresponding to each model parameter specifically includes:

computing the gradient corresponding to each model parameter and clipping the gradients;

adding noise, according to the preset encryption algorithm and through a noise-adding sub-protocol, to the clipped total gradient corresponding to each model parameter;

obtaining, from the noised total gradient, the encrypted value corresponding to each model parameter.

Specifically, having obtained the model parameters corresponding to each participant, the server computes the gradient corresponding to each model parameter and clips it; then, according to the preset encryption algorithm and through the noise-adding sub-protocol, it adds noise to the clipped total gradient corresponding to each model parameter and derives each model parameter's encrypted value from the noised total gradient.
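A minimal sketch of this clipping-and-noising step follows. The disclosure leaves the noise-adding sub-protocol abstract, so Gaussian noise with standard deviation `sigma * clip_norm` (the usual DP-SGD convention) stands in for it here; the function name and signature are illustrative assumptions.

```python
import numpy as np

def clip_and_noise(gradients, clip_norm, sigma, rng=np.random.default_rng()):
    """Clip each gradient to norm `clip_norm`, sum, and add Gaussian noise.

    The noised total is the "encrypted value" that would be uploaded to
    the central node in place of the raw model parameters.
    """
    clipped = [g / max(1.0, np.linalg.norm(g) / clip_norm) for g in gradients]
    total = np.sum(clipped, axis=0)
    return total + rng.normal(0.0, sigma * clip_norm, size=total.shape)
```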
According to the technical solution provided by the embodiments of the present disclosure, noise is added to each participant's model parameters through the preset encryption algorithm, encrypting each participant's private data; this masks the difference between the encrypted values and the model parameters, so that an attacker cannot recover the real model parameters from query results, regardless of the attacker's background knowledge.

In some embodiments, before the noise is added, according to the preset encryption algorithm and through the noise-adding sub-protocol, to the clipped total gradient corresponding to each model parameter, the method further includes:

determining the total number of parties in the current joint learning model training and the corresponding total added noise;

computing, from the total number of parties and the total added noise, the noise to be added for each party's model parameters.

Specifically, while adding noise to the parties' model parameters, the server tallies the total noise added during joint learning model training and determines the total number of parties in the current round, then computes the noise to be added for each party's model parameters as the ratio of the tallied total noise to the total number of parties.
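Expressed as code, the ratio is a single line; `total_noise_added` is whatever scalar measure of added noise the deployment tallies, an assumption made here for illustration.

```python
def noise_share(total_noise_added, total_parties):
    """Each party's noise-to-add: the tallied total noise divided by the
    number of parties registered for the current training round."""
    return total_noise_added / total_parties
```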
According to the technical solution provided by the embodiments of the present disclosure, computing each party's noise-to-add as the ratio of the total added noise to the total number of parties makes it convenient, whenever later rounds of joint learning model training have absent parties, to determine the noise-to-add of each non-participant and thereby control the privacy loss.

In some embodiments, computing, from the total privacy loss and the preset privacy loss threshold, the number of non-participants that did not take part in the current joint learning model training specifically includes:

determining, from the total privacy loss and the preset privacy loss threshold, the privacy loss difference corresponding to the current joint learning model training;

determining, from the ratio between the privacy loss difference and each party's noise-to-add, the number of non-participants that did not take part in the current joint learning model training.

Specifically, from the difference between the preset privacy loss threshold and the total privacy loss, the server can determine the privacy loss difference corresponding to the current joint learning model training, and then compute the number of non-participants from the ratio between that difference and each non-participant's noise-to-add.
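A sketch of this count follows, reusing `noise_share` from the previous example; rounding the ratio to an integer is an added assumption, since the text does not say how fractional values are handled.

```python
def missing_party_count(eps_threshold, eps_total, per_party_noise):
    """Number of breakpoint parties whose noise must be topped up.

    The shortfall in privacy loss (threshold minus the accounted total)
    divided by one party's noise-to-add gives the count described above.
    """
    shortfall = eps_threshold - eps_total
    return max(0, round(shortfall / per_party_noise))
```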
According to the technical solution provided by the embodiments of the present disclosure, when the total privacy loss of the current joint learning model training has not reached the preset threshold, the number of absent parties is determined from the ratio of the privacy loss difference to each non-participant's noise-to-add, so that the privacy loss difference of the current round can be made up from the computed number of non-participants and each one's noise-to-add, ensuring the balance between the model's privacy and usability and guaranteeing its quality of service.

In some embodiments, after the corresponding model parameters are supplemented, according to the preset encryption algorithm, with the noise corresponding to the number of non-participants, the method further includes:

obtaining the encrypted value of each model parameter after the supplementary noise, and uploading the noise-supplemented encrypted values to the central node;

aggregating the noise-supplemented encrypted values at the central node to obtain noise-supplemented aggregation parameters;

computing, from the noise-supplemented aggregation parameters, the mean of the model parameters corresponding to each participant after the supplementary noise;

updating each participant's local model from the noise-supplemented mean of the model parameters corresponding to that participant.

Specifically, after supplementing the corresponding model parameters, according to the preset encryption algorithm, with noise corresponding to the number of non-participants, the server obtains the encrypted value of each model parameter after the supplementary noise and uploads these values to the central node, which re-aggregates them into noise-supplemented aggregation parameters. The number of participating parties is computed as the difference between the total number of parties and the number of non-participants; the noise-supplemented mean of each participant's model parameters is then computed as the ratio of the noise-supplemented aggregation parameters to the number of participating parties, and each participant's local model is updated from this mean.
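The re-aggregation can be sketched as follows; note that the divisor is the number of parties that actually took part, not the full roster.

```python
import numpy as np

def reaggregate(noised_uploads, total_parties, absent_parties):
    """Mean of the model parameters after the supplementary noise.

    Assumes at least one party participated in the round, so the
    divisor (total minus absent) is positive.
    """
    present = total_parties - absent_parties
    return np.sum(noised_uploads, axis=0) / present
```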
According to the technical solution provided by the embodiments of the present disclosure, aggregating the noise-supplemented model parameters determines the mean of each participant's model parameters, so each participant's local model can be updated from that mean, completing one training update of the model and improving the performance of each participant's local model.

In some embodiments, after the total privacy loss is compared with the preset privacy loss threshold, the method further includes:

stopping the current joint learning model training if the total privacy loss is determined to be greater than or equal to the preset privacy loss threshold.

Specifically, if, after comparing the total privacy loss with the preset threshold, the server determines that the total privacy loss is greater than or equal to the threshold, the total privacy loss of joint learning model training has reached expectations and the model's privacy and usability are already balanced; the current joint training is therefore stopped, avoiding adding so much noise that the model becomes unusable.

According to the technical solution provided by the embodiments of the present disclosure, comparing the total privacy loss with the preset threshold in real time allows the balance between the model's privacy and usability to be controlled accurately, improving the model's quality of service.

In some embodiments, the preset encryption algorithm is a differential privacy algorithm.

According to the technical solution provided by the embodiments of the present disclosure, combining a differential privacy algorithm with the privacy loss accounting mechanism ensures that the privacy loss reaches the preset threshold. Differential privacy masks the differences between real data by adding noise; a differentially private query keeps its result essentially unchanged when data are added or removed, so an attacker cannot recover the real data from query results, regardless of the attacker's background knowledge.
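As one concrete instance of such an algorithm, the classical Laplace mechanism below perturbs a numeric query with noise of scale sensitivity/epsilon; it is the textbook epsilon-DP mechanism, given for orientation, not the specific sub-protocol of this disclosure.

```python
import numpy as np

def laplace_mechanism(query_value, sensitivity, epsilon,
                      rng=np.random.default_rng()):
    """Classical epsilon-DP Laplace mechanism."""
    return query_value + rng.laplace(0.0, sensitivity / epsilon)
```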
All of the optional technical solutions above may be combined arbitrarily to form optional embodiments of this application, and are not repeated here one by one.

Fig. 3 is a schematic flowchart of another breakpoint-oriented privacy protection method provided by an embodiment of the present disclosure. As shown in Fig. 3, the server trains each participant's local model on the obtained participant data and obtains each local model's parameters; it then adds noise to the obtained model parameters through a differential privacy algorithm, yielding the encrypted value of each model parameter. The encrypted values are uploaded to the central node, which aggregates them into the corresponding aggregation parameters, computes the mean of each participant's model parameters, and sends that mean to each participant so that each participant updates its own local model from it.

The server counts the privacy loss during model training through the MA mechanism and judges whether the total privacy loss exceeds the preset threshold; if not, the result is sent through the central node to the participants to judge whether the number of participating parties equals the total number of parties. When the number of participating parties is below the total, the server computes the privacy loss difference between the preset threshold and the total privacy loss, determines the number of non-participants from the ratio of that difference to each party's noise-to-add, and adds the noise corresponding to the number of non-participants to each participant's model parameters, so that the privacy and usability of the model stay balanced.
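Putting the pieces together, the sketch below follows the Fig. 3 control flow for one round, reusing `gaussian_epsilon` and `reaggregate` from the earlier examples. The scale chosen for the top-up noise is one plausible reading of the text, not the disclosure's specified sub-protocol.

```python
import numpy as np

def global_update(uploads, total_parties, sigma, clip_norm, step,
                  eps_threshold, delta=1e-5, rng=np.random.default_rng()):
    """One pass of the Fig. 3 flow over the noised uploads of a round.

    Returns the new global mean, or None once the accounted privacy
    loss reaches the preset threshold and training must stop.
    """
    eps_total = gaussian_epsilon(sigma, step, delta)
    if eps_total >= eps_threshold:
        return None  # privacy budget spent: stop training
    absent = total_parties - len(uploads)  # breakpoint parties this round
    if absent > 0:
        # Top up the noise the absent parties would have contributed;
        # a stddev growing with sqrt(absent) is an assumption here.
        for i in range(len(uploads)):
            uploads[i] = uploads[i] + rng.normal(
                0.0, sigma * clip_norm * np.sqrt(absent) / len(uploads),
                size=uploads[i].shape)
    return reaggregate(uploads, total_parties, absent)
```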
The following are apparatus embodiments of the present disclosure, which may be used to execute the method embodiments of the present disclosure. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present disclosure.

Fig. 4 is a schematic structural diagram of a breakpoint-oriented privacy protection apparatus provided by an embodiment of the present disclosure. As shown in Fig. 4, the breakpoint-oriented privacy protection apparatus includes:

a training module 401, configured to obtain the data of each participant, so as to train the corresponding local models respectively from that data and obtain the model parameters corresponding to each local model;

a noise addition module 402, configured to add noise to the corresponding model parameters through a preset encryption algorithm, obtaining an encrypted value corresponding to each model parameter;

an aggregation module 403, configured to upload the encrypted values to the central node and aggregate them at the central node into corresponding aggregation parameters, so as to compute, from the aggregation parameters, the mean of the model parameters corresponding to each participant;

a statistics module 404, configured to count, through a preset privacy loss accounting mechanism and from the mean of the model parameters, the privacy loss incurred by the central node accessing the model parameters corresponding to each participant, obtaining the total privacy loss corresponding to the participants;

a calculation module 405, configured to compare the total privacy loss with a preset privacy loss threshold and, if the total privacy loss is below the threshold, compute from the two values the number of non-participants that did not take part in the current joint learning model training, where the state of a non-participant is a breakpoint;

a noise supplement module 406, configured to supplement the corresponding model parameters, according to the preset encryption algorithm, with noise corresponding to the number of non-participants.

According to the technical solution provided by the embodiments of the present disclosure, adding noise to each participant's model parameters through the noise addition module masks the difference between the encrypted values and the model parameters, so that an attacker cannot recover the real model parameters from query results, regardless of the attacker's background knowledge; counting the total privacy loss during training through the MA mechanism effectively controls the model's privacy protection; and the noise supplement module can add the noise corresponding to the non-participants into each participant's model parameters, balancing the privacy and usability of the model and improving its quality of service.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present disclosure in any way.

Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in Fig. 5, the electronic device 5 of this embodiment includes a processor 501, a memory 502, and a computer program 503 stored in the memory 502 and runnable on the processor 501. When the processor 501 executes the computer program 503, the steps in each of the above method embodiments are implemented; alternatively, when the processor 501 executes the computer program 503, the functions of the modules/units in each of the above apparatus embodiments are realized.

Illustratively, the computer program 503 may be divided into one or more modules/units, which are stored in the memory 502 and executed by the processor 501 to carry out the present disclosure. The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, the segments describing the execution of the computer program 503 in the electronic device 5.

The electronic device 5 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or other electronic device. The electronic device 5 may include, but is not limited to, the processor 501 and the memory 502. Those skilled in the art will understand that Fig. 5 is only an example of the electronic device 5 and does not limit it; the device may include more or fewer components than shown, or combine certain components, or use different components; for example, the electronic device may also include input/output devices, network access devices, buses, and so on.

The processor 501 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 502 may be an internal storage unit of the electronic device 5, for example a hard disk or memory of the electronic device 5. The memory 502 may also be an external storage device of the electronic device 5, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the electronic device 5. Further, the memory 502 may include both an internal storage unit and an external storage device of the electronic device 5. The memory 502 stores the computer program and the other programs and data required by the electronic device, and may also temporarily store data that has been or will be output.

Fig. 6 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present disclosure. As shown in Fig. 6, the computer-readable storage medium stores a computer program 601 that, when executed by a processor, implements the steps of the above method.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions may be assigned to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or of software functional units. In addition, the specific names of the functional units and modules are only for ease of mutual distinction and do not limit the protection scope of this application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed or recorded in one embodiment, reference may be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.

In the embodiments provided by the present disclosure, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the apparatus/electronic device embodiments described above are merely illustrative; for example, the division into modules or units is only a division by logical function, and there may be other divisions in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware or of software functional units.

If the integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, the present disclosure may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, can implement the steps of each of the above method embodiments. The computer program may include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. Note that the content a computer-readable medium may contain can be increased or decreased as appropriate to the requirements of legislation and patent practice within a jurisdiction; for example, in some jurisdictions, computer-readable media exclude electrical carrier signals and telecommunication signals according to legislation and patent practice.

The above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or equivalently replace some of the technical features therein; such modifications or replacements do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure.

Claims (10)

  1. A breakpoint-oriented privacy protection method, characterized by comprising:
    obtaining the data of each participant, so as to train the corresponding local models respectively from the data of each participant, and obtaining the model parameters corresponding to each local model;
    adding noise to the corresponding model parameters through a preset encryption algorithm to obtain an encrypted value corresponding to each model parameter;
    uploading the encrypted values to a central node and aggregating the encrypted values at the central node to obtain corresponding aggregation parameters, so as to compute, from the aggregation parameters, the mean of the model parameters corresponding to each participant;
    counting, through a preset privacy loss accounting mechanism and from the mean of the model parameters, the privacy loss incurred by the central node accessing the model parameters corresponding to each participant, to obtain the total privacy loss corresponding to the participants;
    comparing the total privacy loss with a preset privacy loss threshold and, if the total privacy loss is below the preset privacy loss threshold, computing, from the total privacy loss and the preset privacy loss threshold, the number of non-participants that did not take part in the current joint learning model training, wherein the state of a non-participant is a breakpoint;
    supplementing the corresponding model parameters, according to the preset encryption algorithm, with noise corresponding to the number of non-participants.
  2. The method according to claim 1, characterized in that adding noise to the corresponding model parameters through the preset encryption algorithm to obtain the encrypted value corresponding to each model parameter specifically comprises:
    computing the gradient corresponding to each model parameter and clipping the gradients;
    adding noise, according to the preset encryption algorithm and through a noise-adding sub-protocol, to the clipped total gradient corresponding to each model parameter;
    obtaining, from the noised total gradient, the encrypted value corresponding to each model parameter.
  3. The method according to claim 2, characterized in that, before the noise is added, according to the preset encryption algorithm and through the noise-adding sub-protocol, to the clipped total gradient corresponding to each model parameter, the method further comprises:
    determining the total number of parties in the current joint learning model training and the corresponding total added noise;
    computing, from the total number of parties and the total added noise, the noise to be added for each party's model parameters.
  4. The method according to claim 1, characterized in that computing, from the total privacy loss and the preset privacy loss threshold, the number of non-participants that did not take part in the current joint learning model training specifically comprises:
    determining, from the total privacy loss and the preset privacy loss threshold, the privacy loss difference corresponding to the current joint learning model training;
    determining, from the ratio between the privacy loss difference and each party's noise to be added, the number of non-participants that did not take part in the current joint learning model training.
  5. The method according to claim 1, characterized in that, after the corresponding model parameters are supplemented, according to the preset encryption algorithm, with the noise corresponding to the number of non-participants, the method further comprises:
    obtaining the encrypted value of each model parameter after the supplementary noise, and uploading the noise-supplemented encrypted values to the central node;
    aggregating the noise-supplemented encrypted values at the central node to obtain noise-supplemented aggregation parameters;
    computing, from the noise-supplemented aggregation parameters, the mean of the model parameters corresponding to each participant after the supplementary noise;
    updating each participant's local model from the noise-supplemented mean of the model parameters corresponding to that participant.
  6. The method according to claim 1, characterized in that, after the total privacy loss is compared with the preset privacy loss threshold, the method further comprises:
    stopping the current joint learning model training if the total privacy loss is determined to be greater than or equal to the preset privacy loss threshold.
  7. The method according to claim 1, characterized in that the preset encryption algorithm is a differential privacy algorithm.
  8. A breakpoint-oriented privacy protection apparatus, characterized by comprising: a training module, a noise addition module, an aggregation module, a statistics module, a calculation module, and a noise supplement module;
    the training module is configured to obtain the data of each participant, so as to train the corresponding local models respectively from the data of each participant, and to obtain the model parameters corresponding to each local model;
    the noise addition module is configured to add noise to the corresponding model parameters through a preset encryption algorithm, obtaining an encrypted value corresponding to each model parameter;
    the aggregation module is configured to upload the encrypted values to a central node and to aggregate the encrypted values at the central node into corresponding aggregation parameters, so as to compute, from the aggregation parameters, the mean of the model parameters corresponding to each participant;
    the statistics module is configured to count, through a preset privacy loss accounting mechanism and from the mean of the model parameters, the privacy loss incurred by the central node accessing the model parameters corresponding to each participant, obtaining the total privacy loss corresponding to the participants;
    the calculation module is configured to compare the total privacy loss with a preset privacy loss threshold and, if the total privacy loss is below the preset privacy loss threshold, to compute, from the total privacy loss and the preset privacy loss threshold, the number of non-participants that did not take part in the current joint learning model training, wherein the state of a non-participant is a breakpoint;
    the noise supplement module is configured to supplement the corresponding model parameters, according to the preset encryption algorithm, with noise corresponding to the number of non-participants.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to claim 1.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to claim 1.
PCT/CN2022/121482 2021-12-17 2022-09-26 Breakpoint-oriented privacy protection method, apparatus, device and medium WO2023109246A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111555029.2A CN116340959A (zh) 2021-12-17 2021-12-17 Breakpoint-oriented privacy protection method, apparatus, device and medium
CN202111555029.2 2021-12-17

Publications (1)

Publication Number Publication Date
WO2023109246A1

Family

ID=86774797

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/121482 WO2023109246A1 (zh) 2021-12-17 2022-09-26 一种面向断点隐私保护的方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN116340959A (zh)
WO (1) WO2023109246A1 (zh)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210360010A1 (en) * 2020-05-12 2021-11-18 Sharecare AI, Inc. Privacy Interface for Data Loss Prevention via Artificial Intelligence Models
CN111737743A (zh) * 2020-06-22 2020-10-02 安徽工业大学 一种深度学习差分隐私保护方法
CN112668726A (zh) * 2020-12-25 2021-04-16 中山大学 一种高效通信且保护隐私的个性化联邦学习方法
CN112818394A (zh) * 2021-01-29 2021-05-18 西安交通大学 具有本地隐私保护的自适应异步联邦学习方法
CN113094758A (zh) * 2021-06-08 2021-07-09 华中科技大学 一种基于梯度扰动的联邦学习数据隐私保护方法及系统
CN113591145A (zh) * 2021-07-28 2021-11-02 西安电子科技大学 基于差分隐私和量化的联邦学习全局模型训练方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117371558A (zh) * 2023-12-04 2024-01-09 环球数科集团有限公司 一种用于隐私保护环境下执行机器学习的系统
CN117371558B (zh) * 2023-12-04 2024-03-08 环球数科集团有限公司 一种用于隐私保护环境下执行机器学习的系统

Also Published As

Publication number Publication date
CN116340959A (zh) 2023-06-27

Similar Documents

Publication Publication Date Title
WO2023124296A1 Knowledge distillation-based joint learning training method and apparatus, device, and medium
CN113609521A Federated learning privacy protection method and system based on adversarial training
Roth et al. Nvidia flare: Federated learning from simulation to real-world
CN113469373B Model training method, system, device, and storage medium based on federated learning
WO2023109246A1 (zh) Breakpoint-oriented privacy protection method, apparatus, device, and medium
US20240176906A1 Methods, apparatuses, and systems for collaboratively updating model by multiple parties for implementing privacy protection
CN113988310A Deep learning model selection method and apparatus, computer device, and medium
CN113902122A Federated model collaborative training method and apparatus, computer device, and storage medium
CN114116705A Method and apparatus for determining the contribution value of participants in joint learning
CN114116707A Method and apparatus for determining the contribution of participants in joint learning
WO2023124219A1 Joint learning model iterative update method, apparatus, and system, and storage medium
CN115510472A Multiple differential privacy protection method and system for cloud-edge aggregation systems
CN116402366A Data contribution evaluation method and apparatus based on joint learning
CN113887746A Method and apparatus for reducing communication pressure based on joint learning
CN116050557A Power load forecasting method and apparatus, computer device, and medium
CN113887495A Video annotation method and apparatus based on transfer learning
WO2023071529A1 Device data cleaning method and apparatus, computer device, and medium
WO2023093229A1 Joint learning parameter aggregation method, apparatus, and system
CN113887745A Joint learning method and apparatus for heterogeneous data
WO2023082787A1 Method for determining participant contribution in joint learning, and joint learning training method and apparatus
CN116502513A Regulation method, apparatus, and device for establishing data contributors based on joint learning
CN116502512A Regulation method, apparatus, and device for establishing model demanders based on joint learning
CN116362102A Targeting-based joint learning method and apparatus, electronic device, and storage medium
Yang Luo et al. Collaborative Modeling of Medical Image Segmentation Based on Blockchain Network
CN116432010A Joint learning model training method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22905998

Country of ref document: EP

Kind code of ref document: A1