CN114491615A - Asynchronous longitudinal federal learning fair incentive mechanism method based on block chain - Google Patents

Asynchronous longitudinal federal learning fair incentive mechanism method based on block chain

Info

Publication number
CN114491615A
CN114491615A (application CN202111488605.6A)
Authority
CN
China
Prior art keywords
participants
information
model
local
participant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111488605.6A
Other languages
Chinese (zh)
Inventor
张延楠
尚璇
张帅
李伟
蔡亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Qulian Technology Co Ltd
Original Assignee
Hangzhou Qulian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Qulian Technology Co Ltd filed Critical Hangzhou Qulian Technology Co Ltd
Priority to CN202111488605.6A
Publication of CN114491615A
Legal status: Pending

Classifications

    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes (under G06F 21/62: protecting access to data via a platform, e.g. using keys or access control rules)
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F 21/64: Protecting data integrity, e.g. using checksums, certificates or signatures
    • G06N 20/00: Machine learning
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06Q 30/0251: Targeted advertisements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Bioethics (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Strategic Management (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an asynchronous longitudinal federal learning fair incentive mechanism method based on a block chain, comprising the following steps: participants with advertisement recommendation requirements register and apply, and each participant deploys, according to the longitudinal federal learning task, a local model for extracting embedded representation information of sample data; each participant trains its local model with local data and uploads the embedded representation information to the block chain; the main participant collects, according to timestamps, complete embedded representation information from a preset number of participants on the block chain, aggregates the embedded representation information, and updates the top model for advertisement recommendation with the aggregated embedded representation; the verification committee verifies the validity of the new block, broadcasts the verified new block, and synchronously updates the ledger information in the block chain, from which the participants download the new top model for the next round of local training; the verification committee scores the participants' data quality contributions and assigns incentive values to the participants according to the contribution scores.

Description

Asynchronous longitudinal federal learning fair incentive mechanism method based on block chain
Technical Field
The invention belongs to the technical field of deep learning and privacy security, and particularly relates to an asynchronous longitudinal federal learning fair incentive mechanism method based on a block chain.
Background
In recent years, thanks to massive data resources and abundant computing resources, deep learning has been widely applied in many fields, such as face recognition, autonomous driving, and machine translation, with excellent performance. Data privacy and security have become a topic of concern for many countries and regions, and numerous data security protection regulations have been enacted. The data-island phenomenon among enterprises is increasingly aggravated by the limitations of data privacy protection regulations, so exploring privacy-preserving machine learning techniques has become a focus of both academia and industry. To solve the data-island problem, privacy-preserving distributed machine learning has been proposed, realized in a mode where the data stays local while the model leaves the local site.
Privacy-preserving federal learning is widely applied in real-world precise advertisement delivery business. In advertisement delivery, federal learning is divided, according to the distribution of the participants' data, into horizontal federal learning, longitudinal federal learning, and federal transfer learning. Horizontal federal learning suits scenarios where the sample id spaces differ but the feature spaces are the same among the enterprises participating in advertisement delivery, for example, e-commerce companies with the same business from different regions cooperating to train a machine learning model. Longitudinal federal learning suits scenarios where the sample id spaces are the same but the feature spaces differ, for example, two e-commerce platforms with different functions from the same region cooperating to train a machine learning model for precise advertisement delivery. Federal transfer learning suits scenarios where both the sample space and the feature space overlap little between enterprises. Owing to the increasingly close cooperation between enterprises from different domains (with the same sample space but different feature spaces), longitudinal federal learning is currently receiving great attention from both academia and industry.
Multiple enterprises jointly perform longitudinal federal learning to achieve precise advertisement delivery, and the process mainly comprises three steps: first, member alignment is performed between the parties (including the participants and the main participant) during initialization, and each participant uses its local model to extract embedded features from its local dataset; the embedded features are then sent to the coordinating party, which aggregates them and completes the remaining forward propagation with the top model; finally, according to the cross-entropy loss function between the solved probability vector and the real labels of the main task, back propagation is performed and the top model parameters and the participants' local model parameters are updated. Among the enterprises participating in the joint training, the party holding the labels is defined as the main participant, and a party holding only feature information is defined as a participant.
However, the existing longitudinal federal learning methods applicable to precise advertisement delivery business have three problems: 1. the algorithms in longitudinal federal learning are designed synchronously, so it is difficult to keep the advertisement delivery service running normally when some devices go down; 2. in longitudinal federal learning, the participants' local models are maintained locally by each participant, so it is difficult for a single participant to initiate a test task in the test stage, which restricts the advertisement delivery service; 3. owing to the lack of an incentive mechanism for advertisement delivery accuracy in longitudinal federal learning methods, the participants' enthusiasm can affect the overall performance of the advertisement delivery business.
Decentralized block chain technology provides a solution to the problem of precise advertisement delivery based on longitudinal federal learning. A block chain is a distributed ledger that records data processing and query processes transparently, and is decentralized, traceable, and tamper-proof. Block chains can be classified into public chains, alliance chains, and private chains. An alliance chain adopts a hybrid networking mechanism and has partial control over the nodes in the network. The alliance chain retains the public chain's partial transparency, openness, and tamper resistance, adds features such as authority management and identity authentication, and has attracted wide attention, mainly focused on applying the block chain to data security, trusted authentication, and similar aspects.
Given the wide application of longitudinal federal learning in the field of precise advertisement delivery, which involves commercial cooperation among enterprises and edge computing on IoT devices, the privacy security of the algorithm needs to be strengthened, and an incentive mechanism with clear rewards and penalties needs to be designed to encourage the participants in longitudinal federal learning to provide higher-quality data for the joint training. Therefore, an asynchronous longitudinal federal learning method with a fair incentive mechanism, combined with block chain technology, is of great value.
Disclosure of Invention
In view of the foregoing, the present invention provides an asynchronous longitudinal federal learning fair incentive mechanism method based on a block chain, so as to improve the accuracy of advertisement delivery.
In order to achieve the purpose of the invention, the technical scheme provided by the invention is as follows:
an asynchronous longitudinal federal learning fair incentive mechanism method based on a block chain comprises the following steps:
step 1, participants with advertisement recommendation requirements register and apply, a task coordinator deploys a longitudinal federal learning task through the block chain, and the participants align their data and deploy local models for extracting embedded representation information of sample data according to the longitudinal federal learning task;
step 2, the participator trains a local model by using local data and uploads the embedded representation information of the sample data to a block chain;
step 3, one of all participants is selected as the main participant; after collecting, according to timestamps, complete embedded representation information from a preset number of participants on the block chain, the main participant aggregates the embedded representation information, updates the top model for advertisement recommendation and the gradient information of the local models with the aggregated embedded representation, and generates a new block to store the local models' gradient information;
step 4, the verification committee verifies the validity of the new block, broadcasts the new block that passes verification, and synchronously updates the ledger information in the block chain; the participants download the local models' gradient information from the ledger information for the next round of local training;
step 5, when the longitudinal federal learning task is finished, the participants and the main participant upload the local models and the top model to a new block for storage;
step 6, the verification committee scores the participants' data quality contributions and assigns incentive values to the participants according to the contribution scores.
In one embodiment, when a participant registers and applies, its dataset size, computing capability, and network communication rate are uploaded as registration information;
when the task coordinator deploys a longitudinal federal learning task through the block chain, the task coordinator distributes the public and private keys for signatures, so that each participant holds a digital signature for spending incentive values, and the task coordinator creates a genesis block in the block chain containing the longitudinal federal learning task information;
the participants perform matching communication with nearby computing nodes, one computing node can simultaneously match multiple participants with the longitudinal federal learning task, and the participants download the longitudinal federal learning task information from the genesis block through the computing nodes for local deployment.
In one embodiment, secret data alignment and matching between the participants is completed through RSA encryption and a hash algorithm.
In one embodiment, a participant trains the deployed local model with local data, completes one round of forward propagation of the sample data to obtain its embedded representation information, and uploads the signed embedded representation information, together with the local resource consumption, to the associated computing node in the form of a block chain transaction.
In one embodiment, a computing node receives the signed embedded representation information, verifies the validity of the signature, and issues a transaction; the main participant, through the computing nodes, listens for the transactions issued by the participants and obtains the embedded representation information they publish; the main participant's listening mechanism covers the timestamps of the participants' upload transactions and the integrity of the embedded representation information; when the main participant observes that the first n participants have put their embedded representation information on chain, it aggregates and splices the n participants' embedded representation information into the aggregated embedded representation information; meanwhile, the listening counter is reset to 0 and is incremented by 1 each time a new on-chain upload is observed.
In one embodiment, when the main participant updates the top model for advertisement recommendation and the gradient information of the local models with the aggregated embedded representation, the aggregated embedded representation is input into the top model, an output vector is obtained through calculation and mapped by a classifier into a prediction probability vector, the top model is updated with the cross entropy of the prediction probability vector and the real labels as the loss function, and gradient information for each local model is generated at the same time.
In one embodiment, a participant downloads the updated gradient information of the new top model from the ledger information, runs the back propagation algorithm on its local model, and updates the local model parameters with the gradient information to complete local training.
In one embodiment, the verification committee employs the following two scoring mechanisms for the participants' data quality contributions:
a mutual information evaluation scoring mechanism, which computes the mutual information value between the real labels and the embedded representation information uploaded by a participant, the mutual information value serving as one part of the data quality contribution score;
an embedding-missing verification scoring mechanism, which in turn sets each participant's embedded representation information to an all-zero vector, splices it with the embedded representation information of all other participants, inputs the result into the top model to obtain a prediction, computes the participant's accuracy from the prediction and the real labels, and scores the participant such that a lower accuracy yields a higher score, this score serving as the other part of the data quality contribution score;
the sum of the two partial data quality contribution scores is taken as a participant's total score.
In one embodiment, scoring the participants such that a lower accuracy yields a higher score comprises:
ranking the participants' accuracies, and then calculating each participant's score according to the following formula:
[Equation image in the original: Score2 as a function of x]
where Score2 represents the score and x represents the participant's rank when the accuracies are sorted from low to high.
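The embedding-missing mechanism above can be sketched as follows. The stand-in top model, the toy embeddings, and the rank-to-score rule are all assumptions of this sketch (the patent's Score2 formula appears only as an image, so the placeholder rule "score = number of participants minus rank position" is not the patent's formula); only the ablate-and-rank procedure follows the text.

```python
def top_predict(e_concat):
    # Stand-in for the trained top model (assumed): sign of a weighted sum.
    w = [1.0, 1.0, 0.5, 0.5, 0.1, 0.1]
    return 1 if sum(a * b for a, b in zip(w, e_concat)) > 0 else 0

# Embedded representations (dim 2) of three participants over four aligned
# samples, plus the real labels held by the main participant (all made up).
embeddings = {
    "p1": [[1, 1], [1, 1], [-1, -1], [-1, -1]],
    "p2": [[1, 0], [0, -1], [-1, 0], [0, 1]],
    "p3": [[0, 0], [0, 0], [0, 0], [0, 0]],   # contributes nothing
}
labels = [1, 1, 0, 0]
order = ["p1", "p2", "p3"]

def accuracy_without(missing):
    # Embedding-missing verification: zero out one participant's slot,
    # splice the rest, and measure prediction accuracy.
    hits = 0
    for s, y in enumerate(labels):
        concat = []
        for p in order:
            concat.extend([0, 0] if p == missing else embeddings[p][s])
        hits += top_predict(concat) == y
    return hits / len(labels)

acc = {p: accuracy_without(p) for p in order}
# Lower accuracy without a participant means it mattered more, so it gets
# a higher score; the rank-to-score rule below is an assumed placeholder.
ranked = sorted(order, key=lambda p: acc[p])
score2 = {p: len(order) - i for i, p in enumerate(ranked)}
print(acc, score2)
```

Here removing p1 hurts accuracy the most, so p1 is ranked first (lowest accuracy) and receives the highest Score2.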
In one embodiment, the method further comprises verification inference with the top model obtained by the participants through longitudinal federal learning:
in single-mode verification inference, one participant downloads the top model from the new block, inputs its local data into its local model to obtain embedded representation information, inputs that embedded representation information into the top model to obtain the top model's prediction, and performs verification inference on the local model and the top model according to the prediction;
in multi-party-mode verification inference, multiple participants download the top model from the new block at the same time, each participant inputs its local data into its own local model to obtain embedded representation information, the respective embedded representation information is input into the top model to obtain the top model's prediction, and verification inference is performed on the local models and the top model according to the prediction.
The technical conception of the invention is as follows: 1. A block chain ledger storage technology is introduced, and the participants' embedded representations are uploaded to a block in real time; once embedded representations from a preset threshold number of participants have been collected, the local model and top model update protocols of the current round are executed; this asynchronous processing mechanism reduces the participants' queuing and waiting time and avoids training termination caused by device downtime. 2. After the model is trained in longitudinal federal learning, the participants put their local models on chain and the main participant puts the top model on chain for safekeeping, preventing the model information from being maliciously tampered with; two test modes are developed, namely single mode and multi-party mode, to facilitate the participants' inference of the main-task labels with the trained models. 3. The incentive values of the incentive mechanism in longitudinal federal learning are assigned after review by the verification committee. The incentive mechanism combines a mutual information evaluation mechanism and an embedding-missing verification mechanism to measure the quality of the participants' data. The mutual information evaluation scores the participants' data quality with an existing mutual information evaluation mechanism.
Compared with the prior art, the invention has the beneficial effects that at least:
when the asynchronous longitudinal federal learning method with the fair incentive mechanism is used for accurate advertisement recommendation, the account book storage technology of the block chain technology is introduced, and the elastic aggregation mechanism is provided, so that the longitudinal federal learning supports asynchronous updating, the calculation operation efficiency is improved, and the equipment downtime is effectively avoided. By chaining the trained local model and the trained top model, on one hand, malicious tampering or loss of model information is effectively avoided, and on the other hand, a participant can call the model in real time in an inference stage to test. To this end, the present invention develops two test modes: single mode and multi-party mode. In addition, the invention provides a mutual information evaluation mechanism and an embedding loss verification mechanism to effectively evaluate the data quality of the longitudinal federal learning participants, thereby promoting the fairness of the distribution mechanism, and improving the accuracy of advertisement putting when the finally obtained top model is used for advertisement recommendation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic diagram of an asynchronous longitudinal federal learning method with a fair incentive mechanism according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of asynchronous update of an asynchronous longitudinal federal learning method with a fair incentive mechanism according to an embodiment of the present invention.
Fig. 3 is a flowchart of an asynchronous vertical federal learning method with a fair incentive mechanism according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Longitudinal federal learning is widely applied in commercial scenarios, particularly the existing precise advertisement delivery business, but that business lacks a fair and effective incentive mechanism and cannot guarantee security. This embodiment therefore provides a block chain-based asynchronous longitudinal federal learning fair incentive mechanism method, to promote fairness and credibility when enterprises participate in advertisement recommendation services and to improve advertisement delivery accuracy. The method mainly comprises three technical solutions: 1. introducing decentralized block chain technology to change the synchronous training mechanism of longitudinal federal learning and provide an elastic aggregation mechanism, so as to support asynchronous updating; 2. introducing the block chain to put the participants' models on chain, providing a solution for a single user to call the models at any time in the inference stage; 3. an incentive mechanism that combines a mutual information evaluation mechanism and an embedding-missing verification mechanism to measure the quality of the participants' data, providing a fair incentive mechanism for longitudinal federal learning and promoting fair distribution among the participants.
Fig. 1 is a schematic diagram of an asynchronous longitudinal federal learning method with a fair incentive mechanism according to an embodiment of the present invention. Fig. 3 is a flowchart of an asynchronous longitudinal federated learning method with a fair incentive mechanism according to an embodiment of the present invention. As shown in fig. 1 and fig. 3, an embodiment of the method for block chain-based asynchronous longitudinal federal learning fair incentive mechanism includes the following steps:
1) Initialization phase.
The initialization stage comprises: participants with precise advertisement recommendation requirements register and apply, the task coordinator distributes the public and private keys, and the task coordinator creates the local model structure and parameter information and the top model information. In addition, data alignment and model deployment are performed between the participants.
1.1) the participant applies for registration.
A participant holding user information resources performs identity registration and verification, and uploads its dataset size, computing capability, and network communication rate as registration information. One of all participants is selected as the main participant, generally the task initiator; the main participant must train not only a local model but also the top model.
1.2) The task coordinator distributes the public and private keys for signatures and creates the genesis block.
A task coordinator in a block chain is often served by a trusted authority, such as a government credit department. The task coordinator distributes the public and private keys for signatures, so that each participant holds a digital signature for spending incentive values. The task coordinator creates the genesis block, which contains the participants' local model structures f_local(·), the main participant's top model structure f_top(·) and its initialization parameter information {θ_1, θ_2, …, θ_m, θ_t}, the number of training rounds E, the initialization incentive values {C_1, C_2, …, C_m, C_t}, and the learning rate η of model learning.
1.3) the participants are matched with the computing nodes.
The participants perform matching communication with nearby computing nodes. The computing nodes are usually edge nodes with certain computing and communication resources, and one computing node can be matched with multiple participants in the longitudinal federal learning training.
1.4) the participants deploy local models.
Participants download the local model f_local(·) from the genesis block through the computing nodes and initialize the local model parameters {θ_1, θ_2, …, θ_m}. The local model is deployed on each participant's device, which has certain computing and communication resources.
1.5) Secret data alignment and matching between the participants.
The secret data alignment between the participants is completed based on RSA encryption and a hash algorithm.
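The RSA-plus-hash alignment can be illustrated with a toy RSA blind-signature private set intersection, a common construction for this step. The key size, helper names, and sample IDs below are illustrative assumptions, not the patent's concrete protocol.

```python
import hashlib
import math
import random

# Toy RSA key for illustration only; real deployments use proper key sizes.
P, Q = 1000003, 1000033            # small primes (assumed toy parameters)
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def h1(item: str) -> int:
    # First hash: map a sample ID into Z_N.
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % N

def h2(x: int) -> str:
    # Second hash: fingerprint of an RSA signature value.
    return hashlib.sha256(str(x).encode()).hexdigest()

def server_fingerprints(ids):
    # The key-holding party signs the hashes of its own sample IDs.
    return {h2(pow(h1(y), D, N)) for y in ids}

def client_blind(item: str):
    # The other party blinds H(x) with r^E so the signer learns nothing about x.
    while True:
        r = random.randrange(2, N - 1)
        if math.gcd(r, N) == 1:
            break
    return (pow(r, E, N) * h1(item)) % N, r

def client_unblind(signed: int, r: int) -> str:
    # (H(x)^D * r) / r = H(x)^D, the same value the signer fingerprints.
    return h2((signed * pow(r, -1, N)) % N)

a_ids = ["u1", "u2", "u3"]           # one participant's sample IDs
b_ids = ["u2", "u3", "u4"]           # the main participant's sample IDs
b_fps = server_fingerprints(b_ids)
shared = []
for x in a_ids:
    blinded, r = client_blind(x)
    signed = pow(blinded, D, N)      # signer's reply: blinded^D mod N
    if client_unblind(signed, r) in b_fps:
        shared.append(x)
print(shared)  # only the common IDs are learned
```

The signer only ever sees blinded hashes, so neither party learns the other's non-shared IDs; both sides end up with the common IDs u2 and u3.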
2) Participant local model training and uploading stage.
2.1) participant local model forward propagation.
Participant i uses its local data X_i and the deployed model f_i(·) to complete one round of forward propagation and obtain the embedded representation information E_i, where E_i = f_i(X_i).
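As a toy illustration of E_i = f_i(X_i), the local model below is a single linear layer; the feature values and weights are made-up placeholders, not the patent's model.

```python
def linear_embed(x, weights):
    # f_i(X_i): map raw features to an embedded representation (one sample).
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

x_i = [1.0, 0.5, -2.0]            # participant i's raw features (assumed)
W_i = [[0.2, -0.1, 0.0],          # assumed 2x3 local-model weight matrix
       [0.5, 0.5, 0.1]]
e_i = linear_embed(x_i, W_i)      # embedded representation E_i = f_i(X_i)
print(e_i)
```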
2.2) Participant i uploads the embedded representation E_i, the local resource consumption T_con, and its digital signature to the associated computing node in the form of a block chain transaction.
3) Forward propagation stage of the participants' top model.
3.1) After receiving the signed embedded representation E_i, the computing node verifies the validity of the signature and issues the transaction.
3.2) the main participant collects the transactions issued by the computing nodes to realize soft aggregation.
The main participant, through the computing nodes, listens for the transactions issued by the participants and obtains the embedded representations they publish. The main participant's listening mechanism covers the timestamps of the participants' upload transactions and the integrity of the embedded representations: when the main participant observes that the first n participants have put their embedded representations on chain, it aggregates and splices the n participants' embedded representations into the initial embedded representation E_init. Meanwhile, the listening counter is reset to 0 and is incremented by 1 each time a new on-chain embedding is observed.
Fig. 2 is a schematic diagram of asynchronous updating in the asynchronous longitudinal federal learning method with a fair incentive mechanism according to an embodiment of the present invention. As shown in Fig. 2, because the main participant monitors the timestamps of the uploaded transactions and the integrity of the embedded representations, the embeddings of the first n participants to complete their uploads are aggregated and concatenated; this process realizes asynchronous aggregated updating of the local-model embeddings of multiple participants within the same training round.
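The first-n asynchronous aggregation described above can be sketched as follows; the class name, the in-memory buffer, and the timestamp ordering are assumptions standing in for the on-chain transaction listener:

```python
import numpy as np

class AsyncAggregator:
    """Main-participant listener (minimal sketch): collects embedding
    transactions, and once the first n participants of the round have
    uploaded, concatenates their embeddings into E_init."""

    def __init__(self, n):
        self.n = n           # number of uploads to wait for per round
        self.buffer = []     # pending (timestamp, participant_id, embedding)

    def on_transaction(self, ts, pid, emb):
        """Called for each verified on-chain embedding transaction.
        Returns E_init when the first n uploads are in, else None."""
        self.buffer.append((ts, pid, emb))
        if len(self.buffer) < self.n:
            return None                                    # keep listening
        first_n = sorted(self.buffer, key=lambda t: t[0])[: self.n]
        self.buffer = []                                   # counter reset to 0
        return np.concatenate([e for _, _, e in first_n], axis=1)

agg = AsyncAggregator(n=2)
e1 = np.ones((4, 8))
assert agg.on_transaction(ts=1.0, pid="A", emb=e1) is None  # still waiting
e_init = agg.on_transaction(ts=2.0, pid="B", emb=2 * e1)    # second upload
print(e_init.shape)   # (4, 16)
```

Because only the first n uploads of a round are consumed, slow participants never block the round, which is what makes the update asynchronous.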
3.3) the main participant utilizes the top model to complete the forward propagation.
The main participant inputs the locally aggregated initial embedded representation E_init into the top model f_top(·) to complete the forward propagation process and obtain an output vector L_init, and then obtains the final model prediction probability vector L_predict through one Softmax layer, where L_init = f_top(E_init),
L_predict,j = exp(z_j) / Σ_k exp(z_k),
where z_j denotes the top model's output for the j-th class label.
3.4) The main participant uses the prediction vector L_predict obtained from the top model and the real label L_right to calculate the cross-entropy loss function L_utility:
L_utility = −Σ_i Σ_j y_{i,j} · log(y'_{i,j}),
where y_{i,j} is the real label and y'_{i,j} is the predicted label.
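The Softmax layer and the cross-entropy loss L_utility can be sketched numerically as follows (the max-shift inside the softmax is a standard numerical-stability trick, not part of the source):

```python
import numpy as np

def softmax(z):
    """L_predict: the Softmax layer applied to the top model's output L_init."""
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L_utility = -sum_i sum_j y_ij * log(y'_ij), averaged over the batch."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])  # L_init for 2 samples
y = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])       # one-hot real labels
p = softmax(logits)
loss = cross_entropy(y, p)
print(round(loss, 3))   # 0.385
```

Each row of `p` sums to 1, and the loss shrinks as the probability mass assigned to the true class grows.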
3.5) The main participant runs the back-propagation algorithm on the top model, updates the top model, generates gradient information for the local models, and uploads the gradient information to form a new block.
The main participant runs the back-propagation algorithm and solves for the top-model gradient g_t from the main-task loss function, where
g_t = ∂L_utility / ∂θ_t.
Subsequently, the main participant uses the top-model gradient to update the parameter information θ_t; the updated parameters can be expressed as θ_t = θ_t − η·g_t, where η is the learning rate.
3.6) the validation committee validates the validity of the new block, broadcasts the validated new block, and synchronously updates the ledger information in the block chain.
3.7) The participant loads the transaction through its computing node and updates the local model.
After participant i observes the update transaction issued by the main participant, it downloads the update gradient information g_t, runs the back-propagation algorithm in its local model to obtain the gradient information g_i, and updates the local model parameters θ_i, where
g_i = ∂L_utility / ∂θ_i = (∂L_utility / ∂E_i) · (∂E_i / ∂θ_i),
and the local model parameters are updated as θ_i = θ_i − η·g_i.
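Steps 3.5 through 3.7 can be sketched end to end as follows, assuming linear top and local models so that the chain rule is explicit; the function names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def top_step(E_init, W_t, y_onehot, lr=0.1):
    """One main-participant step: forward through a linear top model,
    back-propagate the cross-entropy loss, update theta_t = theta_t - eta*g_t,
    and return the gradient dL/dE that the participants download."""
    p = softmax(E_init @ W_t)
    dz = (p - y_onehot) / len(y_onehot)   # dL_utility/dz for softmax + CE
    g_t = E_init.T @ dz                   # top-model gradient g_t
    dE = dz @ W_t.T                       # gradient forwarded to participants
    return W_t - lr * g_t, dE

def local_step(X_i, W_i, dE_i, lr=0.1):
    """Participant i: back-propagate the downloaded gradient through its
    (here linear) local model E_i = X_i @ W_i and update theta_i."""
    g_i = X_i.T @ dE_i                    # chain rule: dL/dW_i = X_i^T dL/dE_i
    return W_i - lr * g_i

X = rng.normal(size=(6, 5))
W_local = rng.normal(size=(5, 4))
W_top = rng.normal(size=(4, 3))
y = np.eye(3)[[0, 1, 2, 0, 1, 2]]         # one-hot labels for 6 samples

E = X @ W_local                            # participant forward pass
W_top, dE = top_step(E, W_top, y)
W_local = local_step(X, W_local, dE)
print(W_top.shape, W_local.shape)          # (4, 3) (5, 4)
```

Note that the participant only ever receives dL/dE for its own embedding slice, never the labels or the other parties' features.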
3.8) The above steps repeat until the model converges, completing the longitudinal federal learning training.
Training of the local and top models typically follows the default number of training rounds set during the participant registration phase.
4) Longitudinal federal learning model uploading stage.
4.1) Participants that have completed training upload their local models to the computing nodes to form transactions.
The main participant and the other participants upload their respectively trained models through their matched computing nodes. The issued transactions include the complete model structure and model parameter information.
4.2) the main participant collects the transaction and uploads the top model to the block chain to form a new block.
4.3) The verification committee verifies the validity of the new block and broadcasts the block that passes verification; this block records the fully trained longitudinal federal learning model and updates the ledger information of the block chain.
5) Incentive mechanism stage.
After the participants finish the longitudinal federal learning training, the verification committee scores the data quality contribution of each participant and assigns incentive values according to the contribution scores. The scoring mechanism consists of two parts: a mutual information evaluation mechanism and an embedding-missing verification mechanism.
5.1) Scoring by the mutual information evaluation mechanism.
Using a mutual information testing tool, the verification committee computes, sample by sample, the mutual information between the real label Y and the embedded representation E_i uploaded by each participant, forming the contribution score Score1. A larger mutual information value indicates a tighter correspondence between the participant's embedding and the real label, and therefore a higher contribution of that participant's data.
Score1=H(Y|Ei),
where H(x|y) denotes the conditional entropy of x given y, and Score1 denotes the score obtained by the mutual information evaluation mechanism.
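A minimal discrete sketch of the quantity H(Y|E_i) used in Score1 follows; it assumes the embeddings have already been discretized to symbols, since the mutual information I(Y; E_i) = H(Y) − H(Y|E_i) is estimated here on discrete values:

```python
import numpy as np
from collections import Counter

def conditional_entropy(y, e):
    """H(Y | E): average entropy of the label within each (discretized)
    embedding value -- the quantity the committee uses for Score1."""
    n = len(y)
    h = 0.0
    for ev, cnt in Counter(e).items():
        ys = [yi for yi, ei in zip(y, e) if ei == ev]
        for yv, c in Counter(ys).items():
            p = c / cnt
            h -= (cnt / n) * p * np.log2(p)
    return h

# An embedding that perfectly determines the label -> H(Y|E) = 0 bits.
print(conditional_entropy([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 0.0
# An uninformative embedding -> H(Y|E) = H(Y) = 1 bit.
print(conditional_entropy([0, 1, 0, 1], ["a", "a", "a", "a"]))  # 1.0
```

Lower conditional entropy means the embedding carries more information about the label, i.e. higher mutual information.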
5.2) Scoring by the embedding-missing verification mechanism.
The data contribution of each participant is judged from the prediction accuracy of the combined local and top models under a random embedding-missing scheme. For each participant i, that participant's embedded representation information is set to an all-zero vector, concatenated with the embedded representation information of all other participants, and input to the top model to obtain a prediction result; the accuracy for participant i is then calculated from the prediction result and the real labels, and participant i's ID information and current accuracy are stored in a dictionary.
The dictionary entries are sorted by accuracy in ascending order, and the lowest accuracy is given the highest score:
Score2 = 1/x,
where Score2 represents the score and x represents the participant's rank when the dictionary is sorted by accuracy from low to high.
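The embedding-missing verification can be sketched as follows; the 1/rank scoring is an assumption consistent with "lowest accuracy receives the highest score", and the toy top-model predictor is purely illustrative:

```python
import numpy as np

def ablation_scores(embeddings, predict_fn, y_true):
    """Embedding-missing verification: zero out each participant's embedding
    in turn, re-run the top model's prediction, and rank participants so the
    lowest resulting accuracy earns the highest score (Score2 = 1/rank here)."""
    accs = {}
    for pid in embeddings:
        ablated = {p: (np.zeros_like(e) if p == pid else e)
                   for p, e in embeddings.items()}
        e_cat = np.concatenate([ablated[p] for p in sorted(ablated)], axis=1)
        accs[pid] = float(np.mean(predict_fn(e_cat) == y_true))
    ranked = sorted(accs, key=lambda p: accs[p])      # lowest accuracy first
    return {pid: 1.0 / (x + 1) for x, pid in enumerate(ranked)}

y = np.array([0, 1, 0, 1])
emb = {"A": np.array([[-1.0], [1.0], [-1.0], [1.0]]),   # informative party
       "B": np.array([[0.1], [0.1], [0.1], [0.1]])}     # uninformative party
predict = lambda E: (E.sum(axis=1) > 0).astype(int)      # toy top model
scores = ablation_scores(emb, predict, y)
print(scores)   # ablating A hurts most, so A scores highest: {'A': 1.0, 'B': 0.5}
```

The intuition: removing an informative participant's embedding degrades accuracy the most, so that participant earns the largest Score2.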
5.3) Comprehensive scoring and incentive value distribution.
The contribution score of participant i is expressed as Score_i = Score1 + Score2, and the verification committee assigns incentive values according to this score.
6) Inference stage of longitudinal federal learning.
6.1) Selecting the participation mode: single mode or multi-party mode.
First, the participation mode is selected according to the number of participants in the inference stage: when only 1 participant takes part in inference, the single mode is selected; when more than 1 participant takes part, the multi-party mode is selected.
6.2) In the single mode, the participant downloads the trained top model from the new block, inputs its local data into the local model to obtain the embedded representation information, inputs that embedded representation into the top model to obtain the top model's prediction result, and verifies and performs inference with the local and top models according to the prediction result.
In the multi-party mode, multiple participants download the top model from the new block simultaneously; each participant inputs its local data into its own local model to obtain embedded representation information, the respective embedded representations are then input into the top model to obtain the top model's prediction result, and the local and top models are verified and used for inference according to the prediction result.
In the single mode, participant j holds test sample features X_j, downloads the test model information from the block chain, and inputs the test sample features into the test model to obtain the prediction label P_j. This process can be expressed as:
P_j = f_top(concat(f_1(X_j), …, f_n(X_j))),
where concat(·) denotes direct concatenation along a given dimension, X_j denotes the participant's test sample features, f_1, …, f_n denote the participants' local models, and f_top denotes the main participant's top model.
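The single-mode inference expression can be sketched as follows, assuming tanh local models and a linear top model (all names and shapes are illustrative):

```python
import numpy as np

def single_mode_infer(X, local_weights, top_w):
    """Single-mode inference: run every downloaded local model on the test
    features, concatenate the embeddings, and feed the result to the top
    model to obtain the prediction label P_j."""
    embs = [np.tanh(X @ W) for W in local_weights]   # f_1(X_j), ..., f_n(X_j)
    e_cat = np.concatenate(embs, axis=1)              # concat(.)
    logits = e_cat @ top_w                            # f_top(.)
    return logits.argmax(axis=1)                      # predicted class labels

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 5))                           # 3 test samples
locals_ = [rng.normal(size=(5, 4)) for _ in range(2)] # two downloaded local models
top_w = rng.normal(size=(8, 3))                       # top input 4*2, 3 classes
preds = single_mode_infer(X, locals_, top_w)
print(preds.shape)                                    # (3,)
```

In multi-party mode the only difference is that each embedding in the concatenation is computed by a different party on its own features.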
In an advertisement recommendation service, the top model constructed by the block chain-based asynchronous longitudinal federal learning fair incentive mechanism method is used for advertisement recommendation, and the local datasets of the advertisement recommendation participants are partial advertisement media information datasets, such as the Criteo and Avazu datasets. In a longitudinal federated learning system for advertisement recommendation, the local model usually adopts 5 fully connected layers, each followed by a nonlinear ReLU activation function; the input dimension of the local model corresponds to the feature dimension of the participant's dataset, and the output dimension is usually set to 64. The top model usually adopts 2 nonlinear fully connected layers; its input dimension is 64 × n, where n is the number of participants, and its output dimension corresponds to the number of label classes in the dataset. During training in the actual advertisement inference scenario, the participants complete forward propagation over the local advertisement media dataset with their local models and output embedded features, the main participant aggregates the embedded features and completes forward propagation in the top model, and then performs back propagation after computing the loss function and completes the model parameter update. Throughout this process, the model information forms transactions that are issued on the block chain. During testing in the actual advertisement inference scenario, the two test modes described above exist: single mode and multi-party mode. In the single mode, one participant downloads the model for testing; in the multi-party mode, multiple participants download the model together for testing.
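The advertisement-recommendation architecture described above (5 fully connected local layers with ReLU, a 64-dimensional embedding, and a 2-layer top model with input dimension 64 × n) can be sketched as follows; the hidden width of 128, the feature dimension, the batch size, and the reuse of one local model for every party are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Weight list for a stack of fully connected layers with the given dims."""
    return [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(dims[:-1], dims[1:])]

def forward(ws, x, relu_last=False):
    """Forward pass with ReLU between layers (and optionally after the last)."""
    for i, W in enumerate(ws):
        x = x @ W
        if i < len(ws) - 1 or relu_last:
            x = np.maximum(x, 0.0)
    return x

n_parties, feat_dim, n_classes = 3, 39, 2        # feature dim is illustrative
local = mlp([feat_dim, 128, 128, 128, 128, 64])  # 5 FC layers -> 64-dim embedding
top = mlp([64 * n_parties, 128, n_classes])      # 2 FC layers, input 64 x n

X = rng.normal(size=(8, feat_dim))               # a batch of 8 samples
# One local model is reused for every party here purely for brevity.
E = [forward(local, X, relu_last=True) for _ in range(n_parties)]
logits = forward(top, np.concatenate(E, axis=1))
print(logits.shape)                              # (8, 2)
```

The top-model input width grows linearly with the number of participants, which is why the text fixes the per-party embedding dimension at 64.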
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. An asynchronous longitudinal federal learning fair incentive mechanism method based on a block chain is characterized by comprising the following steps:
step 1, registering and applying for participants with advertisement recommendation requirements, deploying a longitudinal federal learning task by a task coordinator through a block chain, aligning data among the participants and deploying a local model for extracting embedded representation information of sample data according to the longitudinal federal learning task;
step 2, the participator trains a local model by using local data and uploads the embedded representation information of the sample data to a block chain;
step 3, selecting one of all participants as a main participant, performing embedded representation information aggregation after the main participant collects complete embedded representation information meeting the preset number from the block chain according to the time stamp, updating the gradient information of a top model and a local model for advertisement recommendation by using the aggregated embedded representation, and generating a new block for the gradient information of the local model to store;
step 4, the verification committee verifies the validity of the new block, broadcasts the new block that passes verification, and synchronously updates the ledger information in the block chain; the participants download the gradient information of the local model from the ledger information for the next round of local training;
step 5, when the longitudinal federal learning task is finished, the participants and the main participants upload the local model and the top model to a new block for storage;
and 6, the verification committee scores the data quality contribution of the participants and assigns incentive values to the participants according to the contribution scores.
2. The block chain-based asynchronous longitudinal federal learning fair incentive mechanism method as claimed in claim 1, wherein in step 1, when a participant registers for application, the size of its own data set, the calculation power condition and the network communication rate are uploaded as registration information;
when the task coordinator deploys a longitudinal federal learning task through the block chain, the task coordinator distributes a public key and a private key for signing, at which point each participant holds a digital signature for spending incentive values, and the task coordinator creates a genesis block in the block chain, wherein the genesis block contains the longitudinal federal learning task information;
the participants carry out matching communication with nearby computing nodes, 1 computing node simultaneously matches a plurality of participants to participate in the longitudinal federal learning task, and the participants download the longitudinal federal learning task information from the genesis block through the computing nodes for local deployment.
3. The block chain-based asynchronous longitudinal federal learning fair incentive method as claimed in claim 1, wherein in step 1, the participants complete data secret alignment matching by RSA-based encryption technology and hash algorithm.
4. The block chain-based asynchronous longitudinal federal learning fair incentive method of claim 1, wherein in step 2, the participating party trains the deployed local model by using local data, completes 1 round of forward propagation of sample data to obtain the embedded representation information of the sample data, and uploads the embedded representation information with signature, the local resource operation consumption and the block chain transaction form to the associated computing node.
5. The block chain-based asynchronous longitudinal federal learning fair incentive mechanism method of claim 1, wherein in step 3, the computing node receives the signed embedded representation information, verifies the validity of the signature and issues the transaction; the main participant monitors the transactions collected by the computing nodes and obtains the embedded representation information issued by the participants; the main participant's monitoring mechanism comprises checking the timestamp of each uploaded transaction and the integrity of the embedded representation information; when the main participant observes that the first n participants have completed uploading their embedded representation information, the embedded representation information of the n participants is aggregated and concatenated to form the aggregated embedded representation information; at the same time, the listening counter is reset to 0 and is incremented by 1 each time a newly uploaded embedding is observed.
6. The block chain-based asynchronous longitudinal federal learning fair incentive method as claimed in claim 1, wherein in step 3, when the main participant updates the gradient information of the top model and the local model for advertisement recommendation by using the aggregated embedded representation, the aggregated embedded representation is updated and input into the top model, an output vector is obtained through calculation, the output vector is mapped by a classifier to output a prediction probability vector, the top model is updated by using the cross entropy of the prediction probability vector and the real label as a loss function, and simultaneously the gradient information of each local model is generated.
7. The block chain-based asynchronous longitudinal federal learning fair incentive mechanism method as claimed in claim 1, wherein in step 4, the participants download the updated gradient information of the new top model from the ledger information, run the back-propagation algorithm in the local model, and update the local model parameters with the updated gradient information to complete local training.
8. The block chain-based asynchronous longitudinal federal learning fair incentive method in accordance with claim 1, wherein in step 5, the following two scoring mechanisms are adopted for the data quality contribution degree of the validation committee to the participants:
a mutual information evaluation scoring mechanism, which is used for calculating a mutual information value between the real label and the embedded representation information uploaded by the participant, wherein the mutual information value is used as a part of data quality contribution scoring;
an embedding missing verification scoring mechanism, which is used for assigning the embedding representation information of each participant to be a full 0 vector, splicing the full 0 vector with the embedding representation information of all other participants, inputting the spliced full 0 vector and the embedding representation information of all other participants to a top model to obtain a prediction result, calculating the accuracy of the participants according to the prediction result and a real label, scoring the participants in a mode of giving a higher score according to the lower accuracy, and taking the score as the other part of the data quality contribution score;
the sum of the data quality contribution scores of the two parts is taken as the total score of the participants.
9. The block chain-based asynchronous longitudinal federal learning fair incentive scheme method of claim 8, wherein said scoring participants by assigning higher scores with lower accuracy comprises:
the accuracy of each participant is ranked and then the score for each participant is calculated according to the following formula:
Score2 = 1/x,
where Score2 represents the Score and x represents the rank of the accuracy of the participant in order from low to high.
10. The block chain based asynchronous longitudinal federal learning fair incentive scheme method of claim 1, further comprising: the verification reasoning of the top model obtained by the longitudinal federal learning of the participators comprises the following steps:
in the single-mode verification reasoning, a participant downloads a top model from a new block, local data is input into a local model to obtain embedded representation information, then the embedded representation information is input into the top model to obtain a prediction result of the top model, and the local model and the top model are verified and reasoned according to the prediction result;
in the multi-party mode verification reasoning, a plurality of participants download the top model from the new block at the same time, each participant inputs local data into a respective local model to obtain embedded representation information, then the respective embedded representation information is input into the top model to obtain a prediction result of the top model, and the local model and the top model are verified and reasoned according to the prediction result.
CN202111488605.6A 2021-12-08 2021-12-08 Asynchronous longitudinal federal learning fair incentive mechanism method based on block chain Pending CN114491615A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111488605.6A CN114491615A (en) 2021-12-08 2021-12-08 Asynchronous longitudinal federal learning fair incentive mechanism method based on block chain

Publications (1)

Publication Number Publication Date
CN114491615A true CN114491615A (en) 2022-05-13


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114819197A (en) * 2022-06-27 2022-07-29 杭州同花顺数据开发有限公司 Block chain alliance-based federal learning method, system, device and storage medium
CN114819197B (en) * 2022-06-27 2023-07-04 杭州同花顺数据开发有限公司 Federal learning method, system, device and storage medium based on blockchain alliance
CN115345317A (en) * 2022-08-05 2022-11-15 北京交通大学 Fair reward distribution method based on fairness theory and oriented to federal learning
CN115660114A (en) * 2022-11-11 2023-01-31 湖北文理学院 Asynchronous federal learning architecture system and method based on block chain


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination