WO2024082515A1 - A decentralized federated clustering learning method, apparatus, device and medium - Google Patents

A decentralized federated clustering learning method, apparatus, device and medium

Info

Publication number
WO2024082515A1
WO2024082515A1 (PCT/CN2023/079371, CN2023079371W)
Authority
WO
WIPO (PCT)
Prior art keywords
cluster center
cluster
clustering
sample
center
Prior art date
Application number
PCT/CN2023/079371
Other languages
English (en)
French (fr)
Inventor
孙银银
李仲平
Original Assignee
上海零数众合信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海零数众合信息科技有限公司 filed Critical 上海零数众合信息科技有限公司
Publication of WO2024082515A1 publication Critical patent/WO2024082515A1/zh

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning

Definitions

  • the present application relates to the field of federated learning, and in particular to a decentralized federated clustering learning method, apparatus, device and medium.
  • Vertical federated learning is a federated learning technology in which a task initiator and data parties perform data mining on common samples that have different features and carry out cluster analysis on the fused data set. It has been applied in scenarios where data privacy and security must be protected. For example, in a common application scenario, a bank performs vertical federated learning with data parties that provide samples with different features, achieving data analysis and fusion while protecting data privacy.
  • the present application provides a decentralized federated clustering learning method, apparatus, device and medium, which can achieve efficient federated clustering learning while ensuring the privacy security of the task initiator and the data party.
  • a decentralized federated clustering learning method which is performed by a task initiator and includes:
  • the samples in the data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set, but different sample features;
  • the distance from each sample in the joint data set composed of the task initiator and at least two data parties to each cluster is determined, and based on a preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster center is obtained, and the clustering result is generated according to the total distance;
  • the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party are obtained, and the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is determined;
  • the cluster center after the last iteration update is determined as the final cluster center.
  • a decentralized federated clustering learning device configured in a task initiator and includes:
  • An initial determination module used to interact with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters; the samples in the data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set, but different sample features;
  • a generation module is used to determine the distance from each sample in the joint data set composed of the task initiator and at least two data parties to each cluster during the iterative update of the optimal initial cluster center and the initial cluster, and obtain the total distance of each sample in the joint data set relative to the current cluster center based on a preset encryption algorithm, and generate the current clustering result according to the total distance;
  • a sending module used to send the clustering result to at least two data parties, and to instruct each data party to update the locally stored cluster center according to the clustering result, and calculate the distance between the cluster center updated this time and the cluster center of the previous iteration;
  • a determination module is used to obtain, based on a preset encryption algorithm, the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party, and to determine the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration;
  • the judgment module is used to determine whether the preset iteration termination condition is met according to the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration; if so, the cluster centers after the last iterative update are determined as the final cluster centers.
  • an electronic device comprising:
  • the memory stores a computer program that can be executed by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can execute the decentralized federated clustering learning method described in any embodiment of the present application.
  • a computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a processor, when executing them, to implement the decentralized federated clustering learning method described in any embodiment of the present application.
  • the task initiator interacts with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters; during the iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in the joint data set composed of the task initiator and the at least two data parties to each cluster is determined, the total distance of each sample in the joint data set relative to the current cluster centers is obtained based on the preset encryption algorithm, and the current clustering result is generated according to the total distance; the current clustering result is sent to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the clustering result and to calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration; based on the preset encryption algorithm, the distances calculated by each data party are obtained, and the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is determined; according to this total distance, it is determined whether the preset iteration termination condition is met, and if so, the cluster centers after the last iterative update are determined as the final cluster centers.
  • Cluster learning is realized through the interaction between the task initiator and the data party, and centralization is removed, which can avoid the participation of third parties and ensure that data is not leaked by third parties. Further combined with the preset encryption algorithm, the data privacy security of the task initiator and the data party can be effectively guaranteed. In addition, the learning efficiency of federated clustering can be improved by determining the optimal initial cluster center through the interaction between the two parties.
  • FIG. 1 is a flow chart of a decentralized federated clustering learning method provided in Embodiment 1 of the present application.
  • FIG. 2 is a flow chart of a decentralized federated clustering learning method provided in Embodiment 2 of the present application.
  • FIG. 3 is a structural block diagram of a decentralized federated clustering learning device provided in Embodiment 3 of the present application.
  • FIG. 4 is a schematic diagram of the structure of an electronic device provided in Embodiment 4 of the present application.
  • the initialized clustering center is obtained randomly, which can easily lead to the clustering result falling into the local optimum; in addition, the task initiator and the data party are often deployed on the client, and the third party deployed on the server participates in the clustering learning of the task initiator and the data party.
  • This deployment method of the server and the client has great security risks; further, the security of the interaction between the task initiator and the data party is not guaranteed, and it cannot resist malicious attacks against the task initiator and the data party.
  • The technical solution of the present application takes the above problems into account. Clustering is performed through interaction between the task initiator and the data parties, which avoids the participation of a third party and achieves decentralization; the optimal initial cluster center is determined through this interaction, which can effectively improve the efficiency of clustering learning; and by using privacy computing technology so that the task initiator and the data parties interact based on a preset encryption algorithm, the data privacy security of all parties can be effectively protected.
  • This application proposes a decentralized multi-party vertical federated learning scheme that is suitable for end-to-end deployment and resists malicious attacks by participants. The specific implementation process is described in detail in the subsequent embodiments.
  • FIG. 1 is a flowchart of a decentralized federated clustering learning method provided in Embodiment 1 of the present application. This embodiment is applicable to the case where the task initiator and the data parties interact to implement federated clustering learning on the premise of guaranteeing the data privacy security of all participants.
  • the method can be executed by a decentralized federated clustering learning device, which can be implemented in software and/or hardware and can be integrated into an electronic device with a decentralized federated clustering learning function. As shown in FIG1, the method includes:
  • S101 Based on a preset clustering algorithm and a preset encryption algorithm, interact with at least two data parties to determine an optimal initial clustering center and at least two initial clusters.
  • the preset clustering algorithm can be a k-means++ clustering algorithm.
  • the preset encryption algorithm can be a Verifiable Secret Sharing (VSS) algorithm.
  • the task initiator is the initiator of the federated learning task.
  • the data party refers to the participant who provides the private data required for the federated learning task.
  • the samples in the data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set, but different sample features.
  • the optimal initial clustering center refers to the initial clustering center determined by the interaction between the task initiator and the data party.
  • the number of optimal initial clustering centers is at least two.
  • the initial cluster is a cluster associated with the initial clustering center.
  • One optimal initial clustering center corresponds to one initial cluster.
  • Optionally, interacting with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine the optimal initial cluster center and at least two initial clusters includes: based on the preset clustering algorithm, randomly obtaining a sample number as a target number and sending the target number to at least two data parties, to instruct each data party to use the target sample corresponding to the target number as the first cluster center and to calculate the distance from each sample to the first cluster center; based on the preset encryption algorithm, obtaining the total distance from each sample in the joint data set to the first cluster center, selecting the sample with the maximum distance as the second cluster center, calculating the total distance from all samples in the joint data set to the second cluster center, and determining the third cluster center; based on the third cluster center, interacting with at least two data parties to perform iterative updates, and determining that the iteration terminates once a preset number of clusters is detected; and determining the optimal initial cluster center and at least two initial clusters according to the total distances of the preset number of clusters.
  • Optionally, interacting with at least two data parties based on the third cluster center and performing iterative updates includes: based on the third cluster center, performing operations similar to those performed after determining the first and second cluster centers, that is, obtaining, based on the preset encryption algorithm, the total distance from each sample in the joint data set to the cluster centers. If the total distance is a matrix with n rows and 1 column, the sample corresponding to the maximum entry in the matrix is taken as the new cluster center; if the total distance is a matrix with n rows and k columns, the minimum is first taken along each row (over the k existing centers), and the sample corresponding to the maximum of these row minima along the column direction is taken as the new cluster center. Here n is the number of samples in the task initiator's local data set, k is the number of cluster centers, and k ranges from 2 to n. (A sketch of this selection rule is given after this list.)
  • the process of obtaining the total distance from each sample in the joint data set to the newly added cluster center and determining the new cluster center based on the preset encryption algorithm is repeated until a preset number of clusters (such as 10) are determined, at which time the iteration is terminated.
  • Each of the preset number of determined clusters has a corresponding total distance; the preset number of clusters with the smallest total distance can be selected as the optimal clusters, and the corresponding optimal cluster center points are thereby also found, that is, the optimal initial cluster center and at least two initial clusters are determined.
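  • The following Python sketch illustrates the new-center selection rule described above; it is an illustrative reconstruction rather than part of the patent, and it assumes the task initiator already holds the aggregated total distance matrix (the name dist_total is taken from Embodiment 2 below):

    import numpy as np

    def select_new_center(dist_total: np.ndarray) -> int:
        """Return the index of the sample to use as the next cluster center.
        dist_total has shape (n, 1) or (n, k): total SSE distances from the
        n joined samples to the cluster centers chosen so far."""
        if dist_total.shape[1] == 1:
            # Only one center exists: pick the sample farthest from it.
            return int(np.argmax(dist_total[:, 0]))
        # Several centers exist: for each sample (row) take the minimum
        # distance over all centers, then pick the sample whose minimum
        # distance is largest (the maximum along the column direction).
        per_sample_min = dist_total.min(axis=1)
        return int(np.argmax(per_sample_min))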
  • Optionally, determining the optimal initial cluster center and at least two initial clusters includes: plotting, according to the total distances of the corresponding preset number of clusters, a curve with the number of clusters as the independent variable and the sum of squared distances of each clustering as the dependent variable; determining the clusters corresponding to the inflection point of the curve as the initial clusters, and determining the optimal initial cluster center according to the cluster centers of the initial clusters.
  • The sum of squared distances of a clustering refers to the sum of squares due to error (SSE) of each sample in the joint data set relative to its corresponding cluster center.
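  • As an illustration of the inflection-point rule, the sketch below locates the elbow of the SSE-versus-clusters curve numerically using a simple second-difference heuristic; the patent only requires that the inflection point be found, so this particular heuristic is an assumption:

    import numpy as np

    def find_elbow(sse_per_k: dict) -> int:
        """sse_per_k maps a number of clusters k to the total SSE of that
        clustering; returns the k at which the curve bends (the elbow)."""
        ks = sorted(sse_per_k)
        sse = np.array([sse_per_k[k] for k in ks], dtype=float)
        # The elbow is where the decrease in SSE slows down the most,
        # i.e. where the discrete second difference is largest.
        second_diff = sse[:-2] - 2 * sse[1:-1] + sse[2:]
        return ks[int(np.argmax(second_diff)) + 1]

    # SSE drops steeply until k = 3, then flattens, so the elbow is 3.
    print(find_elbow({2: 90.0, 3: 30.0, 4: 25.0, 5: 22.0}))  # -> 3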
  • the current cluster center refers to the cluster center determined in any iterative update process during the iteration process.
  • the total distance can represent the distance of each sample in the joint data set relative to the current cluster center.
  • Optionally, determining the distance from each sample in the joint data set consisting of the task initiator and at least two data parties to each cluster, and obtaining, based on the preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster centers, includes: determining the distance from each sample in the joint data set to each cluster according to the sum-of-squared-errors (SSE) matrix of the samples relative to the cluster centers; and interacting with at least two data parties based on the preset encryption algorithm to determine, according to the total distance matrix of the samples relative to the cluster centers, the total distance of each sample in the joint data set relative to the current cluster centers.
  • the closest cluster center of each sample in the joint data set can be determined according to the total distance, and each sample can be divided into the cluster to which the closest cluster center belongs, that is, the clustering result is generated according to the total distance.
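  • A minimal sketch of how the clustering result follows from the total distance matrix (an illustrative reconstruction, assuming dist_total is the aggregated (n, k) matrix the task initiator obtains):

    import numpy as np

    def assign_clusters(dist_total: np.ndarray) -> np.ndarray:
        """Each of the n samples joins the cluster of its nearest center;
        the returned (n,) label vector is the clustering result that the
        task initiator sends to the data parties."""
        return dist_total.argmin(axis=1)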
  • S103 Send the clustering result to at least two data parties, instructing each data party to update the locally stored cluster center according to the clustering result, and calculate the distance between the cluster center updated this time and the cluster center of the previous iteration.
  • Optionally, after each data party receives the clustering result sent by the task initiator, it can update its locally stored previous cluster centers according to the clustering result. Specifically, the average value of the samples in each cluster can be taken as the updated cluster center.
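  • The local center update each party performs can be sketched as follows (illustrative only; it assumes every cluster received at least one sample, and each party averages only over its own feature columns, since the features are vertically partitioned):

    import numpy as np

    def update_centers(local_x: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
        """local_x: (n, d_party) local feature matrix held by one party;
        labels: (n,) clustering result broadcast by the task initiator.
        Returns the (k, d_party) slice of the updated cluster centers."""
        return np.stack([local_x[labels == j].mean(axis=0) for j in range(k)])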
  • Optionally, if this clustering is the first clustering, the initial total distance matrix of each sample in the joint data set composed of the task initiator and at least two data parties relative to the optimal initial cluster center can be determined accordingly; based on the initial total distance matrix, an initial clustering result is generated and sent to each data party to instruct each data party to calculate the average value of the samples of each cluster based on the initial clustering result and update the locally stored cluster centers.
  • Optionally, the task initiator calculates the distance between the cluster centers of this update and the cluster centers of the previous iteration, and each data party calculates the same distance locally; finally, based on the preset encryption algorithm, the sum of the locally calculated distance and the distances calculated by the data parties is determined, that is, the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration.
  • the task initiator can directly obtain the total distance determined by the task initiator and each data party, but does not know the specific data of the distance determined by the data party. The same is true for the data party. In this way, the data privacy security of each participant can be effectively guaranteed.
  • S105 Determine whether the preset iteration termination condition is met based on the total distance, corresponding to the joint data set, between the cluster centers of the current update and the cluster centers of the previous iteration; if so, determine the cluster centers after the last iterative update as the final cluster centers.
  • Optionally, if the preset iteration termination condition is not met, the iterative update process described in S102-S104 above is continued until the condition is met.
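  • Put together, the iterative phase S102-S105 can be sketched as the loop below; the helper functions are hypothetical stand-ins for the secure computations described above, and the threshold mirrors the 10E-6 example used in Embodiment 2:

    def run_iterations(max_iters: int = 100, eps: float = 1e-6) -> None:
        for _ in range(max_iters):
            dist_total = secure_total_distances()  # S102: VSS-aggregated (n, k) matrix (hypothetical helper)
            labels = dist_total.argmin(axis=1)     # S102: generate the current clustering result
            broadcast_result(labels)               # S103: parties update centers locally (hypothetical helper)
            shift = secure_total_center_shift()    # S104: VSS-aggregated center movement (hypothetical helper)
            if shift < eps:                        # S105: preset termination condition met
                break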
  • In the technical solution of this embodiment, the task initiator interacts with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters; during the iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in the joint data set composed of the task initiator and the at least two data parties to each cluster is determined, the total distance of each sample in the joint data set relative to the current cluster centers is obtained based on the preset encryption algorithm, and the current clustering result is generated according to the total distance; the current clustering result is sent to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the clustering result and to calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration; based on the preset encryption algorithm, the distances calculated by each data party are obtained, and the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is determined; according to this total distance, it is determined whether the preset iteration termination condition is met, and if so, the cluster centers after the last iterative update are determined as the final cluster centers.
  • Cluster learning is achieved through interaction between the task initiator and the data party, which realizes decentralization and avoids the participation of a third party. It ensures that the data is not leaked by a third party. Further combined with the preset encryption algorithm, it can effectively guarantee the privacy security of the task initiator and the data party and prevent malicious behavior of the participants.
  • In addition, the initialization idea of the preset k-means++ algorithm, combined with the preset encryption algorithm, is used to determine the optimal initial cluster center, which can improve the learning efficiency of federated clustering. At the same time, in each iteration the total distance of each sample relative to the current cluster centers is calculated and the cluster centers are updated, thereby avoiding local optima.
  • FIG. 2 is a flow chart of a decentralized federated clustering learning method provided in Embodiment 2 of the present application. Based on the above embodiment, this embodiment gives an example in which task initiator A interacts with data parties B and C to implement federated clustering learning, as shown in FIG. 2.
  • the method includes:
  • S1: Task initiator A interacts with data parties B and C to determine the optimal initial cluster centers; the initial number of clusters is k, and the sample numbers (IDs) of the k determined optimal initial cluster centers are sent to data parties B and C.
  • The sample IDs of the optimal initial cluster centers can be stored in id_list, a list storing the IDs (unique identifiers) of the initial cluster center samples.
  • k can be a value greater than or equal to 2.
  • the task initiator A can randomly obtain a sample as a cluster center and send the sample number to the data parties B and C.
  • S1 may include the following steps:
  • A, B and C each calculate the sum of squared errors (SSE) from their samples to the cluster, obtaining dist_A, dist_B and dist_C.
  • The total distance dist_total from each sample to the cluster is then computed by VSS joint addition, so the SSEs that the three parties calculate on their own samples are not leaked.
  • A calculates the total SSE from dist_total, selects the sample id with the maximum distance in dist_total as the second cluster center c2, and sends c2 to B and C.
  • A, B and C obtain the cluster centers from id_list and each calculate the SSE from their samples to those cluster centers, obtaining the updated dist_A, dist_B and dist_C. From these, the updated total distance dist_total is computed by verifiable secret sharing joint addition; the minimum is taken along each row of dist_total, the sample whose id corresponds to the maximum of these values along the column direction is added as a new cluster center, and these steps repeat until the preset number of clusters is determined. A curve is then plotted with the number of clusters as the independent variable and the total SSE of each clustering as the dependent variable.
  • Task initiator A takes the inflection point of this curve as the number of cluster centers, determines the optimal initial cluster centers and at least two initial clusters, and sends them to data parties B and C.
  • Task initiator A, data party B and data party C each calculate, based on the optimal initial cluster centers, the Euclidean distance from the samples in their local data sets to the optimal initial cluster centers.
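  • Because the features are vertically partitioned, the squared Euclidean distance decomposes feature-wise: each party can compute the squared distance over its own columns, and the secure sum of the three partial results equals the squared distance over all features. A minimal sketch of the per-party computation (illustrative, with hypothetical names):

    import numpy as np

    def partial_sq_dist(local_x: np.ndarray, local_centers: np.ndarray) -> np.ndarray:
        """local_x: (n, d_party) local samples; local_centers: (k, d_party)
        slice of the cluster centers over this party's features.
        Returns the (n, k) partial squared Euclidean distances."""
        diff = local_x[:, None, :] - local_centers[None, :, :]
        return (diff ** 2).sum(axis=2)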
  • Task initiator A uses a verifiable secret sharing algorithm to calculate the total Euclidean distance from each sample to the cluster center.
  • Task initiator A calculates the clustering result based on the total Euclidean distance and sends it to data party B and data party C.
  • Task initiator A, data party B and data party C update the cluster center point based on the clustering result, calculate the distance between the updated cluster center point and the previous center point, and use a verifiable secret sharing algorithm to calculate the total distance between A, B and C.
  • Task initiator A, data party B and data party C obtain the latest clustering results based on the final clustering center.
  • The technical solution of this embodiment gives a practicable way for task initiator A to interact with data parties B and C to implement federated clustering learning; it removes the third party and is easy to deploy end to end.
  • The k-means++ clustering method is used when determining the initial cluster centers, which can avoid falling into a local optimum during algorithm training.
  • By using verifiable secret sharing joint addition to calculate the SSE of the multi-party fused data to each cluster and the distances between cluster centers, data privacy and the intermediate parameters of the training process can be effectively protected.
  • Figure 3 is a structural block diagram of a decentralized federated clustering learning device provided in Example 3 of the present application; a decentralized federated clustering learning device provided in an embodiment of the present application can execute the decentralized federated clustering learning method provided in any embodiment of the present application, and has functional modules and beneficial effects corresponding to the execution method, and the device can be configured in the task initiator.
  • the device comprises:
  • the initial determination module 301 is used to interact with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine the optimal initial cluster center and at least two initial clusters; the samples in the data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set, but different sample features;
  • the generation module 302 is used to determine the distance from each sample in the joint data set composed of the task initiator and at least two data parties to each cluster during the iterative update of the optimal initial cluster center and the initial cluster, and obtain the total distance of each sample in the joint data set relative to the current cluster center based on a preset encryption algorithm, and generate the current clustering result according to the total distance;
  • a sending module 303 is used to send the clustering result to at least two data parties, and to instruct each data party to update the locally stored cluster center according to the clustering result, and calculate the distance between the cluster center updated this time and the cluster center of the previous iteration;
  • the determination module 304 is used to obtain, based on a preset encryption algorithm, the distances between the cluster centers of the current update and the cluster centers of the previous iteration calculated by each data party, and to determine the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration;
  • the judgment module 305 is used to determine whether the preset iteration termination condition is met based on the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration; if so, the cluster centers after the last iterative update are determined as the final cluster centers.
  • In the technical solution of this embodiment, the task initiator interacts with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters; during the iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in the joint data set composed of the task initiator and the at least two data parties to each cluster is determined, the total distance of each sample in the joint data set relative to the current cluster centers is obtained based on the preset encryption algorithm, and the current clustering result is generated according to the total distance; the current clustering result is sent to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the clustering result and to calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration; based on the preset encryption algorithm, the distances calculated by each data party are obtained, and the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is determined; according to this total distance, it is determined whether the preset iteration termination condition is met, and if so, the cluster centers after the last iterative update are determined as the final cluster centers.
  • Cluster learning is realized through the interaction between the task initiator and the data party, which can avoid the participation of a third party and ensure that the data is not leaked by a third party. Further combined with the preset encryption algorithm, the privacy security of the task initiator and the data party can be effectively guaranteed. In addition, the learning efficiency of federated clustering can be improved by determining the optimal initial cluster center through the interaction between the two parties.
  • the initial determination module 301 may include:
  • a sending unit configured to randomly obtain a sample number as a target number based on a preset clustering algorithm, and send the target number to at least two data parties, to instruct each data party to use the target sample corresponding to the target number as the first cluster center and calculate the distance between each sample and the first cluster center;
  • a calculation unit configured to obtain the total distance from each sample in the joint data set to the first cluster center based on a preset encryption algorithm, select the sample with the maximum distance as the second cluster center, calculate the total distance from all samples in the joint data set to the second cluster center and determine the third cluster center;
  • a judgment unit configured to interact with at least two data parties based on the third cluster center to perform iterative updates, and to determine that the iteration terminates if a preset number of clusters is detected;
  • the determination unit is used to determine the optimal initial cluster center and at least two initial clusters according to the total distance of the determined preset number of clusters.
  • the determination unit is specifically used for:
  • a curve is plotted, according to the total distances of the corresponding preset number of clusters, with the number of clusters as the independent variable and the sum of squared distances of each clustering as the dependent variable;
  • the cluster corresponding to the inflection point in the curve is determined as the initial cluster, and the optimal initial clustering center is determined based on the cluster center of each initial cluster.
  • the generation module 302 is specifically used for: determining the distance from each sample in the joint data set composed of the task initiator and at least two data parties to each cluster according to the SSE matrix of the samples relative to the cluster centers; and interacting with at least two data parties based on the preset encryption algorithm to determine, according to the distance matrix of the samples relative to the cluster centers, the total distance of each sample in the joint data set relative to the current cluster centers.
  • Further, the above device is also used for: determining the initial total distance matrix of each sample in the joint data set relative to the optimal initial cluster center; generating an initial clustering result according to the initial total distance matrix, and sending the initial clustering result to each data party to instruct each data party to calculate the average value of the samples of each cluster according to the initial clustering result and update the locally stored cluster centers.
  • the judgment module 305 is specifically used for: determining that the preset iteration termination condition is met if the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is detected to be less than a preset distance threshold, or the number of iterations is greater than a preset maximum number of iterations;
  • determining the cluster centers after the last iterative update as the final cluster centers.
  • Fig. 4 is a schematic diagram of the structure of an electronic device provided in Embodiment 4 of the present application.
  • Fig. 4 shows a schematic diagram of the structure of an electronic device 10 that can be used to implement an embodiment of the present application.
  • the electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • the electronic device can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (such as helmets, glasses, watches, etc.) and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the present application described and/or required herein.
  • the electronic device 10 includes at least one processor 11, and a memory connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13, wherein the memory stores a computer program executable by the at least one processor; the processor 11 can perform various appropriate actions and processes according to a computer program stored in the ROM 12 or a computer program loaded from a storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10.
  • the processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14.
  • An input/output (I/O) interface 15 is also connected to the bus 14.
  • a number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16, such as a keyboard, a mouse, etc.; an output unit 17, such as various types of displays, speakers, etc.; a storage unit 18, such as a disk, an optical disk, etc.; and a communication unit 19, such as a network card, a modem, a wireless communication transceiver, etc.
  • the communication unit 19 allows the electronic device 10 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • the processor 11 may be a variety of general and/or special processing components with processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the processor 11 executes the various methods and processes described above, such as a decentralized federated clustering learning method.
  • the decentralized federated clustering learning method may be implemented as a computer program, which is tangibly contained in a computer-readable storage medium, such as a storage unit 18.
  • part or all of the computer program may be loaded and/or installed on the electronic device 10 via the ROM 12 and/or the communication unit 19.
  • the processor 11 may be configured to perform the decentralized federated clustering learning method in any other appropriate manner (e.g., by means of firmware).
  • Various implementations of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • Various implementations can include: being implemented in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which can be a special purpose or general purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable data processing device, so that when the computer program is executed by the processor, the functions/operations specified in the flowchart and/or block diagram are implemented.
  • the computer program may execute entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
  • a computer-readable storage medium may be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or equipment, or any suitable combination of the foregoing.
  • alternatively, a computer-readable storage medium may be a machine-readable signal medium.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the systems and techniques described herein may be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the electronic device.
  • Other types of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes backend components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes frontend components (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such backend components, middleware components, or frontend components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.
  • a computing system may include a client and a server.
  • the client and the server are generally remote from each other and usually interact through a communication network.
  • the client and server relationship is generated by computer programs running on the corresponding computers and having a client-server relationship with each other.
  • the server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak business scalability found in traditional physical hosts and VPS services.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a decentralized federated clustering learning method, apparatus, device and medium. The method, performed by a task initiator, includes: interacting with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters; during the iterative update of the optimal initial cluster center and the initial clusters, determining the distance from each sample in the joint data set to each cluster, obtaining, based on the preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster centers, and interacting with the at least two data parties according to the total distance to determine the total distance across the three parties; and determining, according to the total distance, whether a preset iteration termination condition is met, and if so, taking the cluster centers after the last iterative update as the final cluster centers. Efficient joint clustering learning can thus be achieved while guaranteeing the privacy security of the task initiator and the data parties.

Description

A decentralized federated clustering learning method, apparatus, device and medium
This application claims priority to Chinese Patent Application No. 202211274810.7, filed with the Chinese Patent Office on October 18, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of federated learning, and in particular to a decentralized federated clustering learning method, apparatus, device and medium.
Background
Vertical federated learning is a federated learning technology in which a task initiator and data parties perform data mining on common samples that have different features and carry out cluster analysis on the fused data set; it has been applied in scenarios where data privacy and security must be protected. For example, in a common application scenario, a bank performs vertical federated learning with data parties that provide samples with different features, achieving data analysis and fusion while protecting data privacy.
How to improve the efficiency of vertical federated clustering learning and jointly perform cluster analysis on data while guaranteeing data security is a problem in urgent need of a solution.
Summary
The present application provides a decentralized federated clustering learning method, apparatus, device and medium, which can achieve efficient joint clustering learning while guaranteeing the privacy security of the task initiator and the data parties.
According to one aspect of the present application, a decentralized federated clustering learning method is provided, performed by a task initiator and including:
interacting with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters, where the samples in each data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set but different sample features;
during the iterative update of the optimal initial cluster center and the initial clusters, determining the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster, obtaining, based on the preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster centers, and generating the current clustering result according to the total distance;
sending the current clustering result to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the current clustering result and to calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration;
obtaining, based on the preset encryption algorithm, the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party, and determining the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration;
determining, according to the total distance corresponding to the joint data set between the cluster centers of this update and the cluster centers of the previous iteration, whether a preset iteration termination condition is met, and if so, taking the cluster centers after the last iterative update as the final cluster centers.
According to another aspect of the present application, a decentralized federated clustering learning device is provided, the device being configured in a task initiator and including:
an initial determination module, used to interact with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters, where the samples in each data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set but different sample features;
a generation module, used to determine, during the iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster, obtain, based on the preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster centers, and generate the current clustering result according to the total distance;
a sending module, used to send the current clustering result to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the current clustering result and calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration;
a determination module, used to obtain, based on the preset encryption algorithm, the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party, and determine the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration;
a judgment module, used to determine, according to the total distance corresponding to the joint data set between the cluster centers of this update and the cluster centers of the previous iteration, whether a preset iteration termination condition is met, and if so, take the cluster centers after the last iterative update as the final cluster centers.
According to another aspect of the present application, an electronic device is provided, the electronic device including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the decentralized federated clustering learning method described in any embodiment of the present application.
According to another aspect of the present application, a computer-readable storage medium is provided, the computer-readable storage medium storing computer instructions, the computer instructions being used to cause a processor, when executing them, to implement the decentralized federated clustering learning method described in any embodiment of the present application.
In the technical solution of the embodiments of the present application, the task initiator interacts with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters; during the iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster is determined, the total distance of each sample in the joint data set relative to the current cluster centers is obtained based on the preset encryption algorithm, and the current clustering result is generated according to the total distance; the current clustering result is sent to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the current clustering result and to calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration; based on the preset encryption algorithm, the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party are obtained, and the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is determined; according to this total distance, it is determined whether a preset iteration termination condition is met, and if so, the cluster centers after the last iterative update are taken as the final cluster centers. By implementing clustering learning through two-party interaction between the task initiator and the data parties, centralization is removed, the participation of a third party can be avoided, and it is ensured that data is not leaked by a third party; further combined with the preset encryption algorithm, the data privacy security of the task initiator and the data parties can be effectively guaranteed; in addition, determining the optimal initial cluster center through two-party interaction can improve the learning efficiency of federated clustering.
Brief Description of the Drawings
FIG. 1 is a flowchart of a decentralized federated clustering learning method provided in Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of a decentralized federated clustering learning method provided in Embodiment 2 of the present application;
FIG. 3 is a structural block diagram of a decentralized federated clustering learning device provided in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
Detailed Description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the present application.
It should be noted that the terms "first", "second", "target", "candidate" and the like in the specification, claims and above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described herein can be implemented in an order other than those illustrated or described here. Furthermore, the terms "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product or device.
In the related art, the initial cluster centers are obtained randomly, which easily causes the clustering result to fall into a local optimum. In addition, the task initiator and the data parties are often deployed on clients, while a third party deployed on a server participates in the clustering learning of the task initiator and the data parties; such a server-client deployment carries considerable security risks. Further, the security of the interaction between the task initiator and the data parties is not guaranteed, and malicious attacks against them cannot be resisted. The technical solution of the present application takes these problems into account: clustering is performed through interaction between the task initiator and the data parties, which avoids the participation of a third party and achieves decentralization; the optimal initial cluster center is determined through interaction between the task initiator and the data parties, which can effectively improve the efficiency of clustering learning; and by using privacy computing technology so that the task initiator and the data parties interact based on a preset encryption algorithm, the data privacy security of all parties can be effectively protected. The present application proposes a decentralized multi-party vertical federated learning scheme that is suitable for end-to-end deployment and resists malicious attacks by participants; the specific implementation process is described in detail in the following embodiments.
Embodiment 1
FIG. 1 is a flowchart of a decentralized federated clustering learning method provided in Embodiment 1 of the present application. This embodiment is applicable to the case where, on the premise of guaranteeing the data privacy security of all participants, the task initiator interacts with the data parties to implement federated clustering learning. The method can be executed by a decentralized federated clustering learning device, which can be implemented in software and/or hardware and can be integrated into an electronic device with decentralized federated clustering learning functionality. As shown in FIG. 1, the method includes:
S101: Based on a preset clustering algorithm and a preset encryption algorithm, interact with at least two data parties to determine an optimal initial cluster center and at least two initial clusters.
Here, the preset clustering algorithm may be the k-means++ clustering algorithm, and the preset encryption algorithm may be a Verifiable Secret Sharing (VSS) algorithm. The task initiator is the party that initiates and executes the federated learning task; a data party is a participant that provides the private data required for the federated learning task. The samples in each data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set, but different sample features. The optimal initial cluster centers are the initial cluster centers determined through interaction between the task initiator and the data parties, and there are at least two of them. An initial cluster is the cluster associated with an initial cluster center; one optimal initial cluster center corresponds to one initial cluster.
Optionally, interacting with at least two data parties based on the preset clustering algorithm and the preset encryption algorithm to determine the optimal initial cluster center and at least two initial clusters includes: based on the preset clustering algorithm, randomly obtaining the number (ID) of one sample as a target number and sending the target number to the at least two data parties to instruct each data party to use the target sample corresponding to the target number as the first cluster center and calculate the distance from each sample to the first cluster center; based on the preset encryption algorithm, obtaining the total distance from each sample in the joint data set to the first cluster center, selecting the sample with the maximum distance as the second cluster center, calculating the total distance from all samples in the joint data set to the second cluster center, and determining the third cluster center; based on the third cluster center, interacting with the at least two data parties to perform iterative updates, and determining that the iteration terminates if a preset number of clusters is detected; and determining the optimal initial cluster center and at least two initial clusters according to the total distances of the determined preset number of clusters.
Optionally, interacting with the at least two data parties based on the third cluster center to perform iterative updates includes: based on the third cluster center, performing operations similar to those performed after determining the first and second cluster centers, that is, obtaining, based on the preset encryption algorithm, the total distance from each sample in the joint data set to the cluster centers; if the total distance is a matrix with n rows and 1 column, the sample corresponding to the maximum entry in the matrix is taken as the new cluster center; if the total distance is a matrix with n rows and k columns, the minimum is first taken along each row, and the sample corresponding to the maximum of these row minima along the column direction is taken as the new cluster center, where n is the number of samples in the task initiator's local data set, k is the number of cluster centers, and k ranges from 2 to n.
Optionally, the process of obtaining, based on the preset encryption algorithm, the total distance from each sample in the joint data set to the newly added cluster center and determining a new cluster center is repeated until a preset number of clusters (for example, 10) has been determined, at which point the iteration terminates.
Optionally, each of the determined preset number of clusters has a corresponding total distance; the preset number of clusters with the smallest total distance can be selected as the optimal clusters, and the corresponding optimal cluster center points are thereby also found, that is, the optimal initial cluster centers and at least two initial clusters are determined.
Optionally, determining the optimal initial cluster center and at least two initial clusters according to the total distances of the determined preset number of clusters includes: plotting, according to the total distances corresponding to the preset number of clusters, a curve with the number of clusters as the independent variable and the sum of squared distances of each clustering as the dependent variable; determining the clusters corresponding to the inflection point of the curve as the initial clusters, and determining the optimal initial cluster centers according to the cluster centers of the initial clusters.
Here, the sum of squared distances of a clustering refers to the sum of squares due to error (SSE) of each sample in the joint data set relative to its corresponding cluster center. When the number of clusters is larger than that at the inflection point, the curve flattens; when it is smaller, the curve drops steeply.
S102: During the iterative update of the optimal initial cluster center and the initial clusters, determine the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster, obtain, based on the preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster centers, and generate the current clustering result according to the total distance.
Here, the current cluster centers are the cluster centers determined in any given round of the iterative update; the total distance characterizes the distance of each sample in the joint data set relative to the current cluster centers.
Optionally, determining the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster and obtaining, based on the preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster centers includes: determining, according to the SSE matrix of the samples relative to the cluster centers, the distance from each sample in the joint data set to each cluster; and interacting with the at least two data parties based on the preset encryption algorithm to determine, according to the total distance matrix of the samples relative to the cluster centers, the total distance of each sample in the joint data set relative to the current cluster centers.
Optionally, after the total distance is determined, the cluster center closest to each sample in the joint data set can be determined from the total distance, and each sample can be assigned to the cluster of its closest cluster center; that is, the current clustering result is generated according to the total distance.
S103: Send the current clustering result to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the current clustering result and to calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration.
Optionally, after receiving the current clustering result from the task initiator, each data party can update its locally stored previous cluster centers according to the clustering result; specifically, the average value of the samples in each cluster can be taken as the updated cluster center.
Optionally, if the current clustering is the first clustering, the initial total distance matrix of each sample in the joint data set formed by the task initiator and the at least two data parties relative to the optimal initial cluster centers can be determined accordingly; an initial clustering result is generated from the initial total distance matrix and sent to each data party to instruct each data party to calculate the average value of the samples of each cluster according to the initial clustering result and update its locally stored cluster centers.
S104: Based on the preset encryption algorithm, obtain the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party, and determine the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration.
Optionally, the task initiator calculates the distance between the cluster centers of this update and the cluster centers of the previous iteration, and each data party calculates the same distance locally; finally, based on the preset encryption algorithm, the sum of the locally calculated distance and the distances calculated by the data parties is determined, that is, the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration.
It should be noted that, through the preset encryption algorithm, the task initiator can directly obtain the total of the distances determined by the task initiator and all data parties without learning the specific distance values determined by any data party; the same holds for the data parties. In this way, the data privacy security of every participant can be effectively guaranteed.
S105: According to the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration, determine whether a preset iteration termination condition is met; if so, take the cluster centers after the last iterative update as the final cluster centers.
Optionally, this includes: if it is detected that the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is less than a preset distance threshold, or the number of iterations is greater than a preset maximum number of iterations, determining that the preset iteration termination condition is met; and taking the cluster centers after the last iterative update as the final cluster centers.
Optionally, if it is determined from the total distance that the preset iteration termination condition is not met, the iterative update process of S102 to S104 continues until the condition is met.
In the technical solution of this embodiment, the task initiator interacts with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters; during the iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster is determined, the total distance of each sample in the joint data set relative to the current cluster centers is obtained based on the preset encryption algorithm, and the current clustering result is generated according to the total distance; the current clustering result is sent to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the clustering result and to calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration; based on the preset encryption algorithm, the distances calculated by each data party are obtained, and the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is determined; according to this total distance, it is determined whether the preset iteration termination condition is met, and if so, the cluster centers after the last iterative update are taken as the final cluster centers.
By implementing clustering learning through two-party interaction between the task initiator and the data parties, decentralization is achieved: the participation of a third party is avoided and data is not leaked by a third party; further combined with the preset encryption algorithm, the privacy security of the task initiator and the data parties can be effectively guaranteed and malicious behavior of participants can be prevented. In addition, the initialization idea of the preset k-means++ algorithm, combined with the preset encryption algorithm, is used to determine the optimal initial cluster centers, which can improve the learning efficiency of federated clustering; and because the total distance of each sample relative to the current cluster centers is calculated and the cluster centers are updated in each iteration, local optima can be avoided.
Embodiment 2
FIG. 2 is a schematic flowchart of a decentralized federated clustering learning method provided in Embodiment 2 of the present application. Building on the above embodiment, this embodiment gives an example in which task initiator A interacts with data parties B and C to implement federated clustering learning. As shown in FIG. 2, the method includes:
S1: Task initiator A interacts with data parties B and C to determine the optimal initial cluster centers; the initial number of clusters is k, and the sample numbers (IDs) of the k determined optimal initial cluster centers are sent to data parties B and C.
Here, the sample IDs of the optimal initial cluster centers can be stored in id_list, a list storing the IDs (unique identifiers) of the initial cluster center samples; k can be a value greater than or equal to 2.
Optionally, task initiator A can randomly obtain one sample as a cluster center and send its sample ID to data parties B and C.
Specifically, S1 can include the following steps:
S1.1: Task initiator A randomly obtains a sample with id = i as a cluster center and sends the sample id = i to data parties B and C. A, B and C each calculate the sum of squared errors (SSE) from their samples to the cluster, obtaining dist_A, dist_B and dist_C.
S1.2: The total distance dist_total from each sample to the cluster is computed using VSS joint addition.
Exemplarily, the SSEs of the sample with id = i in dist_A, dist_B and dist_C are ai, bi and ci, respectively. Through VSS joint addition, party A can obtain the result ri = ai + bi + ci, where ri is the total distance of the i-th sample, while the SSEs that each of the three parties computes on its own samples are not leaked.
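A much simplified sketch of the joint addition used in S1.2 is given below: each party splits its private value into random additive shares modulo a public prime, so only the sum ri = ai + bi + ci can be reconstructed and no individual input is revealed. A real VSS scheme additionally lets the parties verify that the shares are well formed; that verification step is omitted here, so this illustrates only the additive part, under assumed parameters:

    import secrets

    P = 2**61 - 1  # public modulus, assumed to be agreed upon by A, B and C

    def share(value: int, n_parties: int = 3) -> list:
        """Split value into n_parties random additive shares modulo P."""
        shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % P)
        return shares

    # Each party secret-shares its local SSE for sample i; every party then
    # sums the shares it holds, and adding those partial sums reveals only
    # the total a_i + b_i + c_i, never the individual inputs.
    a_i, b_i, c_i = 12, 7, 30
    per_party = list(zip(share(a_i), share(b_i), share(c_i)))
    r_i = sum(sum(col) for col in per_party) % P
    assert r_i == (a_i + b_i + c_i) % P  # r_i == 49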
S1.3: A computes the total SSE from dist_total, selects the sample id with the maximum distance in dist_total as the second cluster center c2, and sends c2 to B and C; at this point id_list = [c1, c2]. Exemplarily, when k = 1, the total SSE = r1 + r2 + ... + rn.
S1.4: A, B and C obtain the cluster centers from id_list and each calculate the SSE from their samples to the cluster centers in id_list, obtaining the updated dist_A, dist_B and dist_C.
S1.5: From the updated dist_A, dist_B and dist_C, the updated total distance dist_total from each sample to the clusters is computed using verifiable secret sharing joint addition. For the updated dist_total, the minimum is first taken along each row, and the sample whose id corresponds to the maximum of these values along the column direction is taken as the newly added cluster center; the total SSE corresponding to the updated dist_total is then computed. Steps S1.4 and S1.5 are repeated until the preset number of clusters has been determined. A curve is plotted with the number of clusters as the independent variable and the total SSE of each clustering as the dependent variable.
S1.6: Task initiator A takes the inflection point of the curve as the number of cluster centers, determines the optimal initial cluster centers and at least two initial clusters, and sends them to data parties B and C.
S2: Task initiator A, data party B and data party C each calculate, based on the optimal initial cluster centers, the Euclidean distance from the samples in their local data sets to the optimal initial cluster centers.
S3: Task initiator A uses the verifiable secret sharing algorithm to calculate the total Euclidean distance from each sample to the cluster centers.
S4: Task initiator A computes the clustering result from the total Euclidean distances and sends it to data parties B and C; task initiator A, data party B and data party C update the cluster center points according to the clustering result, calculate the distance between the updated cluster center points and the previous center points, and use the verifiable secret sharing algorithm to calculate the total distance across A, B and C.
S5: Judge whether convergence is reached: if the total distance is less than 10E-6, or the number of iterations is greater than the set maximum number of iterations, execute S6; otherwise, execute S2.
S6: Task initiator A, data party B and data party C obtain the final clustering result according to the final cluster centers.
The technical solution of this embodiment gives a practicable way for task initiator A to interact with data parties B and C to implement federated clustering learning; it removes the third party and is easy to deploy end to end. In addition, the k-means++ clustering method is used when determining the initial cluster centers, which can avoid falling into a local optimum during algorithm training; and by using verifiable secret sharing joint addition to compute the SSE of the multi-party fused data to each cluster and the distances between cluster centers, data privacy and the intermediate parameters of the training process can be effectively protected.
Embodiment 3
FIG. 3 is a structural block diagram of a decentralized federated clustering learning device provided in Embodiment 3 of the present application. The decentralized federated clustering learning device provided in this embodiment can execute the decentralized federated clustering learning method provided in any embodiment of the present application and has the functional modules and beneficial effects corresponding to the executed method; the device can be configured in the task initiator.
As shown in FIG. 3, the device includes:
the initial determination module 301, used to interact with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters, where the samples in each data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set but different sample features;
the generation module 302, used to determine, during the iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster, obtain, based on the preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster centers, and generate the current clustering result according to the total distance;
the sending module 303, used to send the current clustering result to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the current clustering result and calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration;
the determination module 304, used to obtain, based on the preset encryption algorithm, the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party, and determine the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration;
the judgment module 305, used to determine, according to the total distance corresponding to the joint data set between the cluster centers of this update and the cluster centers of the previous iteration, whether a preset iteration termination condition is met, and if so, take the cluster centers after the last iterative update as the final cluster centers.
In the technical solution of this embodiment, the task initiator interacts with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters; during the iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster is determined, the total distance of each sample in the joint data set relative to the current cluster centers is obtained based on the preset encryption algorithm, and the current clustering result is generated according to the total distance; the current clustering result is sent to the at least two data parties to instruct each data party to update its locally stored cluster centers according to the clustering result and to calculate the distance between the cluster centers of this update and the cluster centers of the previous iteration; based on the preset encryption algorithm, the distances calculated by each data party are obtained, and the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is determined; according to this total distance, it is determined whether the preset iteration termination condition is met, and if so, the cluster centers after the last iterative update are taken as the final cluster centers. By implementing clustering learning through two-party interaction between the task initiator and the data parties, the participation of a third party can be avoided and it is ensured that data is not leaked by a third party; further combined with the preset encryption algorithm, the privacy security of the task initiator and the data parties can be effectively guaranteed; in addition, determining the optimal initial cluster center through two-party interaction can improve the learning efficiency of federated clustering.
Further, the initial determination module 301 may include:
a sending unit, configured to randomly obtain, based on the preset clustering algorithm, the number (ID) of one sample as a target number and send the target number to the at least two data parties to instruct each data party to use the target sample corresponding to the target number as the first cluster center and calculate the distance from each sample to the first cluster center;
a calculation unit, configured to obtain, based on the preset encryption algorithm, the total distance from each sample in the joint data set to the first cluster center, select the sample with the maximum distance as the second cluster center, calculate the total distance from all samples in the joint data set to the second cluster center, and determine the third cluster center;
a judgment unit, configured to interact with the at least two data parties based on the third cluster center to perform iterative updates, and determine that the iteration terminates if a preset number of clusters is detected;
a determination unit, configured to determine the optimal initial cluster center and at least two initial clusters according to the total distances of the determined preset number of clusters.
Further, the determination unit is specifically configured to:
plot, according to the total distances corresponding to the preset number of clusters, a curve with the number of clusters as the independent variable and the sum of squared distances of each clustering as the dependent variable;
determine the clusters corresponding to the inflection point of the curve as the initial clusters, and determine the optimal initial cluster centers according to the cluster centers of the initial clusters.
Further, the generation module 302 is specifically configured to:
determine, according to the SSE matrix of the samples relative to the cluster centers, the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster;
interact with the at least two data parties based on the preset encryption algorithm, and determine, according to the distance matrix of the samples relative to the cluster centers, the total distance of each sample in the joint data set relative to the current cluster centers.
Further, the above device is also configured to:
determine the initial total distance matrix of each sample in the joint data set formed by the task initiator and the at least two data parties relative to the optimal initial cluster centers;
generate an initial clustering result according to the initial total distance matrix, and send the initial clustering result to each data party to instruct each data party to calculate the average value of the samples of each cluster according to the initial clustering result and update its locally stored cluster centers.
Further, the judgment module 305 is specifically configured to:
determine that the preset iteration termination condition is met if it is detected that the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is less than a preset distance threshold, or that the number of iterations is greater than a preset maximum number of iterations;
take the cluster centers after the last iterative update as the final cluster centers.
Embodiment 4
FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application. FIG. 4 shows a schematic structural diagram of an electronic device 10 that can be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (such as helmets, glasses, watches, etc.) and other similar computing devices. The components shown here, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in FIG. 4, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13, wherein the memory stores a computer program executable by the at least one processor. The processor 11 can perform various appropriate actions and processes according to a computer program stored in the ROM 12 or a computer program loaded from a storage unit 18 into the RAM 13. Various programs and data required for the operation of the electronic device 10 can also be stored in the RAM 13. The processor 11, the ROM 12 and the RAM 13 are connected to one another via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard or a mouse; an output unit 17 such as various types of displays and speakers; the storage unit 18 such as a magnetic disk or an optical disc; and a communication unit 19 such as a network card, a modem or a wireless communication transceiver. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The processor 11 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and the like. The processor 11 performs the various methods and processes described above, such as the decentralized federated clustering learning method.
In some embodiments, the decentralized federated clustering learning method may be implemented as a computer program tangibly contained in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the decentralized federated clustering learning method described above can be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the decentralized federated clustering learning method in any other appropriate manner (for example, by means of firmware).
Various implementations of the systems and techniques described above herein can be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
The computer programs for implementing the methods of the present application can be written in any combination of one or more programming languages. These computer programs can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, so that the functions/operations specified in the flowcharts and/or block diagrams are implemented when the computer programs are executed by the processor. A computer program can execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present application, a computer-readable storage medium can be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable storage medium can include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium can be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the electronic device. Other kinds of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described here can be implemented in a computing system that includes back-end components (for example, as a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.
A computing system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak business scalability in traditional physical hosts and VPS services.
It should be understood that steps may be reordered, added, or removed using the various forms of flow shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of the present application can be achieved; no limitation is imposed herein.

Claims (10)

  1. A decentralized federated clustering learning method, performed by a task initiator, comprising:
    interacting with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters, wherein the samples in each data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set but different sample features;
    during iterative update of the optimal initial cluster center and the initial clusters, determining the distance from each sample in a joint data set formed by the task initiator and the at least two data parties to each cluster, obtaining, based on the preset encryption algorithm, a total distance of each sample in the joint data set relative to current cluster centers, and generating a current clustering result according to the total distance;
    sending the current clustering result to the at least two data parties to instruct each data party to update locally stored cluster centers according to the current clustering result and calculate a distance between cluster centers of this update and cluster centers of a previous iteration;
    obtaining, based on the preset encryption algorithm, the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party, and determining a total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration;
    determining, according to the total distance corresponding to the joint data set between the cluster centers of this update and the cluster centers of the previous iteration, whether a preset iteration termination condition is met, and if so, taking the cluster centers after the last iterative update as final cluster centers.
  2. The method according to claim 1, wherein interacting with at least two data parties based on the preset clustering algorithm and the preset encryption algorithm to determine the optimal initial cluster center and at least two initial clusters comprises:
    randomly obtaining, based on the preset clustering algorithm, the number (ID) of one sample as a target number, and sending the target number to the at least two data parties to instruct each data party to use a target sample corresponding to the target number as a first cluster center and calculate the distance from each sample to the first cluster center;
    obtaining, based on the preset encryption algorithm, the total distance from each sample in the joint data set to the first cluster center, selecting the sample with the maximum distance as a second cluster center, calculating the total distance from all samples in the joint data set to the second cluster center, and determining a third cluster center;
    interacting with the at least two data parties based on the third cluster center to perform iterative updates, and determining that the iteration terminates if a preset number of clusters is detected;
    determining the optimal initial cluster center and at least two initial clusters according to the total distances of the determined preset number of clusters.
  3. The method according to claim 2, wherein determining the optimal initial cluster center and at least two initial clusters according to the total distances of the determined preset number of clusters comprises:
    plotting, according to the total distances corresponding to the preset number of clusters, a curve with the number of clusters as the independent variable and the sum of squared distances of each clustering as the dependent variable;
    determining the clusters corresponding to the inflection point of the curve as the initial clusters, and determining the optimal initial cluster centers according to the cluster centers of the initial clusters.
  4. The method according to claim 1, wherein determining the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster, and obtaining, based on the preset encryption algorithm, the total distance of each sample in the joint data set relative to the current cluster centers, comprises:
    determining, according to a sum-of-squared-errors (SSE) matrix of the samples relative to the cluster centers, the distance from each sample in the joint data set formed by the task initiator and the at least two data parties to each cluster;
    interacting with the at least two data parties based on the preset encryption algorithm, and determining, according to a distance matrix of the samples relative to the cluster centers, the total distance of each sample in the joint data set relative to the current cluster centers.
  5. The method according to claim 1, further comprising:
    determining an initial total distance matrix of each sample in the joint data set formed by the task initiator and the at least two data parties relative to the optimal initial cluster centers;
    generating an initial clustering result according to the initial total distance matrix, and sending the initial clustering result to each data party to instruct each data party to calculate the average value of the samples of each cluster according to the initial clustering result and update its locally stored cluster centers.
  6. The method according to claim 1, wherein determining, according to the total distance corresponding to the joint data set between the cluster centers of this update and the cluster centers of the previous iteration, whether the preset iteration termination condition is met, and if so, taking the cluster centers after the last iterative update as the final cluster centers, comprises:
    determining that the preset iteration termination condition is met if it is detected that the total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration is less than a preset distance threshold, or that the number of iterations is greater than a preset maximum number of iterations;
    taking the cluster centers after the last iterative update as the final cluster centers.
  7. A decentralized federated clustering learning device, wherein the device is configured in a task initiator and comprises:
    an initial determination module, configured to interact with at least two data parties based on a preset clustering algorithm and a preset encryption algorithm to determine an optimal initial cluster center and at least two initial clusters, wherein the samples in each data party's local data set have the same sample numbers (IDs) as the samples in the task initiator's local data set but different sample features;
    a generation module, configured to determine, during iterative update of the optimal initial cluster center and the initial clusters, the distance from each sample in a joint data set formed by the task initiator and the at least two data parties to each cluster, obtain, based on the preset encryption algorithm, a total distance of each sample in the joint data set relative to current cluster centers, and generate a current clustering result according to the total distance;
    a sending module, configured to send the current clustering result to the at least two data parties to instruct each data party to update locally stored cluster centers according to the current clustering result and calculate a distance between cluster centers of this update and cluster centers of a previous iteration;
    a determination module, configured to obtain, based on the preset encryption algorithm, the distances between the cluster centers of this update and the cluster centers of the previous iteration calculated by each data party, and determine a total distance, corresponding to the joint data set, between the cluster centers of this update and the cluster centers of the previous iteration;
    a judgment module, configured to determine, according to the total distance corresponding to the joint data set between the cluster centers of this update and the cluster centers of the previous iteration, whether a preset iteration termination condition is met, and if so, take the cluster centers after the last iterative update as final cluster centers.
  8. The device according to claim 7, wherein the initial determination module comprises:
    a sending unit, configured to randomly obtain, based on the preset clustering algorithm, the number (ID) of one sample as a target number and send the target number to the at least two data parties to instruct each data party to use a target sample corresponding to the target number as a first cluster center and calculate the distance from each sample to the first cluster center;
    a calculation unit, configured to obtain, based on the preset encryption algorithm, the total distance from each sample in the joint data set to the first cluster center, select the sample with the maximum distance as a second cluster center, calculate the total distance from all samples in the joint data set to the second cluster center, and determine a third cluster center;
    a judgment unit, configured to interact with the at least two data parties based on the third cluster center to perform iterative updates, and determine that the iteration terminates if a preset number of clusters is detected;
    a determination unit, configured to determine the optimal initial cluster center and at least two initial clusters according to the total distances of the determined preset number of clusters.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the decentralized federated clustering learning method according to any one of claims 1 to 6.
  10. A computer-readable storage medium, storing computer instructions, the computer instructions being used to cause a processor, when executing them, to implement the decentralized federated clustering learning method according to any one of claims 1 to 6.
PCT/CN2023/079371 2022-10-18 2023-03-02 A decentralized federated clustering learning method, apparatus, device and medium WO2024082515A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211274810.7A CN115545215B (zh) 2022-10-18 2022-10-18 A decentralized federated clustering learning method, apparatus, device and medium
CN202211274810.7 2022-10-18

Publications (1)

Publication Number Publication Date
WO2024082515A1 true WO2024082515A1 (zh) 2024-04-25

Family

ID=84734602

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079371 WO2024082515A1 (zh) 2022-10-18 2023-03-02 A decentralized federated clustering learning method, apparatus, device and medium

Country Status (2)

Country Link
CN (1) CN115545215B (zh)
WO (1) WO2024082515A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115545215B (zh) 2022-10-18 2023-10-27 A decentralized federated clustering learning method, apparatus, device and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231760A (zh) 2021-01-15 Privacy-preserving distributed vertical K-means clustering
CN113657525A (zh) 2021-11-16 KMeans-based cross-feature federated clustering method and related device
WO2021249500A1 (zh) 2021-12-16 Method and apparatus for clustering private data of multiple parties
CN114386071A (zh) 2022-04-22 Decentralized federated clustering method and apparatus, electronic device and storage medium
CN115545215A (zh) 2022-12-30 A decentralized federated clustering learning method, apparatus, device and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210174257A1 (en) * 2019-12-04 2021-06-10 Cerebri AI Inc. Federated machine-Learning platform leveraging engineered features based on statistical tests
CN112101579B (zh) 2021-02-09 Machine learning method based on federated learning, electronic apparatus and storage medium
CN113344220B (zh) 2022-11-11 User screening method, system, device and storage medium based on local model gradients in federated learning


Also Published As

Publication number Publication date
CN115545215B (zh) 2023-10-27
CN115545215A (zh) 2022-12-30

Similar Documents

Publication Publication Date Title
WO2021120676A1 (zh) Model training method under a federated learning network and related devices
EP3559891B1 (en) Executing multi-party transactions using smart contracts
WO2022262183A1 (zh) Processing method and apparatus for federated computing, electronic device, and storage medium
AU2021204543B2 (en) Digital signature method, signature information verification method, related apparatus and electronic device
WO2019153491A1 (zh) Student information storage method, readable storage medium, terminal device and apparatus
US20220131707A1 (en) Digital Signature Method, Signature Information Verification Method, Related Apparatus and Electronic Device
CN114186256B (zh) Training method, apparatus, device and storage medium for a neural network model
AU2022203072B2 (en) Node grouping method, apparatus and electronic device
WO2024082515A1 (zh) A decentralized federated clustering learning method, apparatus, device and medium
EP4195111A1 (en) Method and apparatus for training longitudinal federated learning model
EP3934168A2 (en) Group service implementation method and device, equipment and storage medium
EP4195084A1 (en) Method and device for adjusting model parameters, and storage medium and program product
US20220263663A1 (en) Digital Signature Method, Signature Information Authentication Method, And Relevant Electronic Devices
CN112615852A (zh) Data processing method, related apparatus and computer program product
CN113037489A (zh) Data processing method, apparatus, device and storage medium
US11734455B2 (en) Blockchain-based data processing method and apparatus, device, and storage medium
CN114186669B (zh) Training method, apparatus, device and storage medium for a neural network model
CN112737777A (zh) Key-based threshold signature and signature verification method, apparatus, device and medium
US20230419118A1 (en) Intelligent scaling factors for use with evolutionary strategies-based artificial intelligence (ai)
CN113824546B (zh) Method and apparatus for generating information
CN115664839A (zh) Security monitoring method, apparatus, device and medium for privacy computing processes
WO2024098589A1 (zh) Transaction supervision method and apparatus, electronic device and storage medium
CN117094417A (zh) Method, apparatus, system and medium for establishing a security assessment model for federated learning
CN104426764A (zh) Method and system for dual IP address recovery
CN114154978A (zh) Key management method, transaction method and apparatus for digital currency on a blockchain