CN111160572B

CN111160572B - Multi-label-based federal learning method, device and system

Info

Publication number: CN111160572B
Application number: CN202010251416.6A
Authority: CN
Inventors: 陆梦倩; 汲小溪; 王维强
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2020-04-01
Filing date: 2020-04-01
Publication date: 2020-07-17
Anticipated expiration: 2040-04-01
Also published as: CN111160572A; CN111723943A; CN111723943B

Abstract

Embodiments of this specification provide a multi-label-based federated learning method, device, and system. When a plurality of organizations perform federated learning, a trusted execution environment acquires a plurality of label data sets provided by the organizations, where each label data set includes original labels of a plurality of users and at least one of the users has inconsistent original labels across the label data sets. The trusted execution environment then trains on the label data sets with a preset weakly supervised learning algorithm to obtain a unified target label data set, which includes target labels of the plurality of users, and sends the target label data set to the organizations so that they can perform federated learning based on it.

Description

Multi-label-based federated learning method, device and system

Technical Field

This document relates to the field of federated learning, and in particular to a multi-label-based federated learning method, device, and system.

Background

Federated learning (also called joint learning or alliance learning) is a machine learning framework that can effectively help a plurality of organizations use data and build machine learning models while meeting the requirements of user privacy protection and data security.

In general, when a plurality of organizations perform federated learning, training can be carried out based on the label data and feature data of samples, where the label data may be provided by the organizations. However, because the organizations have different business scenarios or define labels differently, the label data they provide are usually inconsistent, so that during federated learning it cannot be determined which organization's label data the training should be based on.

Disclosure of Invention

Embodiments of this specification provide a multi-label-based federated learning method, device, and system, which address the problem that, in federated learning, when the label data provided by a plurality of organizations are inconsistent, it cannot be determined which organization's label data the training should be based on.

In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:

In a first aspect, a multi-label-based federated learning method is provided, which is applied to a trusted execution environment and includes:

acquiring a plurality of label data sets provided by a plurality of organizations, wherein the label data sets include original labels of a plurality of users, and at least one of the plurality of users has inconsistent original labels across the plurality of label data sets;

training on the plurality of label data sets by using a preset weakly supervised learning algorithm to obtain a target label data set, wherein the target label data set includes target labels of the plurality of users;

and sending the target label data set to the plurality of organizations, so that the plurality of organizations perform federated learning based on the target label data set.

In a second aspect, a multi-label-based federated learning device is provided, which is applied to a trusted execution environment and includes:

an acquisition unit, configured to acquire a plurality of label data sets provided by a plurality of organizations, wherein the label data sets include original labels of a plurality of users, and at least one of the plurality of users has inconsistent original labels across the plurality of label data sets;

a processing unit, configured to train on the plurality of label data sets by using a preset weakly supervised learning algorithm to obtain a target label data set, wherein the target label data set includes target labels of the plurality of users;

and a sending unit, configured to send the target label data set to the plurality of organizations, so that the plurality of organizations perform federated learning based on the target label data set.

In a third aspect, an electronic device is provided, which includes:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the steps of:

In a fourth aspect, a computer-readable storage medium is presented, the computer-readable storage medium storing one or more programs which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the steps of:

In a fifth aspect, a multi-label-based federated learning method is provided, which is applied to an organization and includes:

acquiring a label data set including original labels of a plurality of users;

sending the label data set to a trusted execution environment, so that the trusted execution environment trains on the label data set together with other label data sets from other organizations by using a preset weakly supervised learning algorithm to obtain a target label data set, wherein at least one of the plurality of users has inconsistent original labels across the label data sets provided by the organizations, and the target label data set includes target labels of the plurality of users;

receiving part of the target label data in the target label data set returned by the trusted execution environment;

performing federated learning based on the received part of the target label data.

In a sixth aspect, a multi-label-based federated learning device is provided, which is applied to an organization and includes:

an acquisition unit, configured to acquire a label data set including original labels of a plurality of users;

a sending unit, configured to send the label data set to a trusted execution environment, so that the trusted execution environment trains on the label data set together with other label data sets from other organizations by using a preset weakly supervised learning algorithm to obtain a target label data set, wherein at least one of the plurality of users has inconsistent original labels across the label data sets provided by the organizations, and the target label data set includes target labels of the plurality of users;

a receiving unit, configured to receive part of the target label data in the target label data set returned by the trusted execution environment;

and a federated learning unit, configured to perform federated learning based on the received part of the target label data.

In a seventh aspect, an electronic device is provided, which includes:

a processor; and

acquiring a tag data group including original tags of a plurality of users;

performing federated learning based on the portion of the target tag data.

In an eighth aspect, a computer-readable storage medium is provided, which stores one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the steps of:

acquiring a tag data group including original tags of a plurality of users;

performing federated learning based on the portion of the target tag data.

In a ninth aspect, a multi-label-based federated learning system is provided, comprising a trusted execution environment and a plurality of organizations, wherein:

the plurality of organizations acquire a plurality of label data sets, wherein the label data sets include original labels of a plurality of users, and at least one of the plurality of users has inconsistent original labels across the label data sets; and send the plurality of label data sets to the trusted execution environment;

the trusted execution environment trains on the plurality of label data sets by using a preset weakly supervised learning algorithm to obtain a target label data set, wherein the target label data set includes target labels of the plurality of users; and sends the target label data set to the plurality of organizations;

the plurality of organizations receive the target label data set returned by the trusted execution environment, each organization receiving part of the target label data in the target label data set; and perform federated learning based on the received part of the target label data.

At least one technical solution adopted by one or more embodiments of this specification can achieve the following technical effects:

In the technical solution provided by one or more embodiments of this specification, when the label data provided by a plurality of organizations in federated learning are inconsistent, a trusted execution environment may acquire the plurality of label data sets provided by the organizations, where each label data set includes original labels of a plurality of users. The trusted execution environment may then train on the label data sets with a weakly supervised learning algorithm to obtain a unified target label data set that includes target labels of the plurality of users. The organizations can therefore perform federated learning based on the unified target label data set, which effectively solves the problem that, when the label data provided by the organizations are inconsistent, it cannot be determined which organization's label data the training should be based on.

In addition, because the trusted execution environment has a high level of trustworthiness and security, having it acquire the label data sets provided by the organizations and carry out the training protects the privacy and security of those label data sets.

Drawings

In order to illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments of this specification, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an exemplary system architecture provided by an embodiment of the present disclosure;

FIG. 2 is a flow diagram of a multi-label based federated learning method of one embodiment of the present description;

FIG. 3 is a flow diagram of a multi-label based federated learning method of one embodiment of the present description;

FIG. 4 is a scenario diagram of a multi-label based federated learning approach of one embodiment of the present description;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present description;

FIG. 6 is a schematic structural diagram of a multi-tag based federated learning device in one embodiment of the present description;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present description;

FIG. 8 is a schematic structural diagram of a multi-tag based federated learning device in one embodiment of the present description;

FIG. 9 is a schematic structural diagram of a multi-label based federated learning system in one embodiment of the present description.

Detailed Description

In order to help those skilled in the art better understand the technical solutions in the embodiments of this specification, these technical solutions are described clearly and completely below with reference to the drawings in one or more embodiments. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person skilled in the art based on the embodiments in this description without creative effort shall fall within the protection scope of this document.

In a federated learning scenario, a plurality of organizations generally need to provide label data of a plurality of users and train based on that label data. However, the label data provided by the organizations are not always fully consistent, and the inconsistency may have several causes. For example, the organizations may define labels slightly differently because their business scenarios differ; some organizations cannot collect blacklist labels completely, so that the same user is a blacklisted user at organization A but not at organization B; and some organizations provide label data of poor quality that contains dirty data. When the label data provided by the organizations are inconsistent, the organizations may be unable to determine which organization's label data the federated learning should be based on.

To solve this problem, embodiments of this specification provide a multi-label-based federated learning method, device, and system. When a plurality of organizations perform federated learning, a trusted execution environment may acquire the plurality of label data sets provided by the organizations and train on them with a weakly supervised learning algorithm to obtain a unified target label data set. The organizations can then perform federated learning based on the unified target label data set, which effectively solves the problem that, when the label data provided by the organizations are inconsistent, it cannot be determined which organization's label data the training should be based on.

A possible application scenario of the technical solution provided in the embodiment of the present specification is described below with reference to fig. 1.

As shown in FIG. 1, a system architecture provided in an embodiment of this specification includes a trusted execution environment 11 and organizations 12, 13, ..., 1N (N is an integer greater than 2 and may be determined according to the actual number of organizations). The organizations may be connected to the trusted execution environment 11 through a network for data interaction. The trusted execution environment 11 has a high level of trustworthiness and security and can protect the privacy and security of the organizations' data. The organizations may also be connected to one another through a network (not shown in the figure). Any organization can provide personal data and label data of a plurality of users, but the organizations do not exchange personal data or label data with one another, so as to protect privacy.

In the application scenario shown in FIG. 1, the trusted execution environment 11 and the organizations 12, 13, ..., 1N may serve as execution subjects of the multi-label-based federated learning method provided in the embodiments of this specification. The trusted execution environment 11 may be constructed at the hardware level using SGX technology, and the flow of the multi-label-based federated learning method is executed in a secure environment (enclave) generated by the trusted execution environment 11. Optionally, the trusted execution environment 11 may be constructed by any one of the organizations 12, 13, ..., 1N, or may be constructed independently of them, which is not specifically limited here.

It should be noted that the label data described in this specification may be understood as data formed by a user identifier and a label. The user identifier may be the user name used when the user registered with an organization, the user's mobile phone number, or the user's identity number. The label has a label value that represents the user's label. For example, in a risk identification scenario, a label value of 0 may indicate that the user is on the whitelist, and a label value of 1 may indicate that the user is on the blacklist.
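As a minimal sketch of the label data just described, one entry could be modelled as a user identifier paired with a label value. The class name `LabelRecord`, the constant names, and the use of `None` for a user without a label are illustrative assumptions, not part of the specification.

```python
from dataclasses import dataclass
from typing import Optional

WHITELIST = 0  # label value 0: whitelisted (normal) user in the risk example
BLACKLIST = 1  # label value 1: blacklisted (risky) user in the risk example


@dataclass(frozen=True)
class LabelRecord:
    """Hypothetical container for one entry of a label data set."""
    user_id: str          # e.g. a user name, phone number, or identity number
    label: Optional[int]  # WHITELIST, BLACKLIST, or None when no label exists


def describe(record: LabelRecord) -> str:
    """Translate a label value into the meaning used in the risk example."""
    if record.label is None:
        return "unlabeled"
    return "blacklist" if record.label == BLACKLIST else "whitelist"


records = [
    LabelRecord("user_a", WHITELIST),
    LabelRecord("user_b", BLACKLIST),
    LabelRecord("user_c", None),
]
summary = [describe(r) for r in records]
```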

The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.

FIG. 2 is a flow diagram of a multi-tag based federated learning method that may be performed by the trusted execution environment 11 shown in FIG. 1 according to one embodiment of the present description. The method is as follows.

S202: Acquire a plurality of label data sets provided by a plurality of organizations, wherein the label data sets include original labels of a plurality of users, and at least one of the plurality of users has inconsistent original labels across the plurality of label data sets.

In S202, when the plurality of organizations perform federated learning, a plurality of users for the federated learning may be determined; these may be users common to the plurality of organizations. After the users are determined, the organizations may generate or collect a plurality of label data sets for them.
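The specification does not say how the common users are determined. Purely as an illustration, the idea of "users common to the plurality of organizations" corresponds to a set intersection over each organization's user identifiers. The function name and sample identifiers below are hypothetical, and a plain intersection like this would expose identifiers to whoever computes it, so it only shows the logic.

```python
def common_users(user_sets):
    """Return, sorted, the user identifiers present in every organization's set."""
    sets = [set(s) for s in user_sets]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical user identifiers held by three organizations.
org_users = {
    "org_a": ["u1", "u2", "u3"],
    "org_b": ["u2", "u3", "u4"],
    "org_c": ["u2", "u3", "u5"],
}
shared = common_users(org_users.values())
```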

In this embodiment, each organization may generate or collect one label data set, which may include original labels of the plurality of users, and at least one of the users may have inconsistent original labels across the plurality of label data sets.

Note that, because the organizations share a single federated learning objective, the label data sets they generate or collect should share the same meaning. For example, if the objective of federated learning is to obtain a target model for risk identification, the meaning of the label data sets may be whether each of the users is a risky user or a normal user.

In one possible implementation, if an organization has not defined an original label for a certain user, or cannot generate or collect that user's original label because of insufficient data, the user's original label may be regarded as empty in the label data set provided by that organization.

It should be further noted that, for the same user, inconsistency of the original labels across the plurality of label data sets means specifically that the user's original labels in N of the label data sets differ from the user's original labels in the remaining label data sets, where N is an integer greater than 0 and less than M, and M is the total number of label data sets.
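The inconsistency condition can be sketched as a check over the label data sets. The layout below (one dictionary per organization, mapping user identifier to label value) is an assumption. Treating an empty label (`None`) as neither agreeing nor conflicting is also a choice made for this sketch, not something the text specifies.

```python
def inconsistent_users(label_sets):
    """Return, sorted, the users whose non-empty original labels differ
    between at least two organizations.

    label_sets: dict mapping organization name -> {user_id: label or None}.
    """
    seen = {}
    for labels in label_sets.values():
        for user, label in labels.items():
            if label is not None:  # assumption: empty labels are ignored
                seen.setdefault(user, set()).add(label)
    return sorted(user for user, values in seen.items() if len(values) > 1)


# Hypothetical label data sets from three organizations.
label_sets = {
    "org_a": {"u1": 1, "u2": 0, "u3": None},
    "org_b": {"u1": 0, "u2": 0, "u3": 1},
    "org_c": {"u1": 1, "u2": 0, "u3": 1},
}
conflicts = inconsistent_users(label_sets)
```

Here only `u1` is reported: its label in one of the three data sets differs from the other two, which matches the case N = 1, M = 3 in the text.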

After preparing the plurality of label data sets, the plurality of organizations may send them to the trusted execution environment, which thereby acquires them. To secure the transmission, the organizations may encrypt the label data sets and send the encrypted label data sets to the trusted execution environment. The encryption may use a public key and private key scheme, or another encryption method.

S204: Train on the plurality of label data sets by using a preset weakly supervised learning algorithm to obtain a target label data set, wherein the target label data set includes target labels of the plurality of users.

In S204, after the trusted execution environment acquires the plurality of label data sets, it may train on them with the weakly supervised learning algorithm to obtain one target label data set that includes target labels of the plurality of users.

In this embodiment of the specification, the trusted execution environment may train on the plurality of label data sets by using the Snorkel weakly supervised learning framework. The specific implementation is as follows:

First, a first characteristic variable, a second characteristic variable, and a third characteristic variable are determined according to the plurality of label data sets and an initial target label data set.

The initial target label data set may be defined by the trusted execution environment and may include initial target labels of the plurality of users. It should be understood that the initial target label data set is not the final target label data set; it is defined here so that the first, second, and third characteristic variables can be obtained and used to solve for the final target label data set.

The first characteristic variable may represent whether any given user has an original label in any given label data set, and may be expressed by the following formula:

$$\phi^{(1)}_{i,j}(L, Y) = \mathbb{1}\{L_{i,j} \neq \varnothing\}$$

where $L$ denotes the plurality of label data sets, $Y$ denotes the initial target label data set, and $L_{i,j}$ denotes the original label of user $i$ in the label data set provided by organization $j$. If $L_{i,j}$ is not empty, $\phi^{(1)}_{i,j}(L, Y)$ is 1; if $L_{i,j}$ is empty, $\phi^{(1)}_{i,j}(L, Y)$ is 0.

The second characteristic variable may represent whether the target label of any given user (the initial target label in the initial target label data set) is consistent with that user's original label in any given label data set, and may be expressed by the following formula:

$$\phi^{(2)}_{i,j}(L, Y) = \mathbb{1}\{L_{i,j} = y_i\}$$

where $L$, $Y$ and $L_{i,j}$ have the same meaning as in the first characteristic variable described above, and $y_i$ is the initial target label of user $i$ in the initial target label data set. If $L_{i,j}$ and $y_i$ are consistent, $\phi^{(2)}_{i,j}(L, Y)$ is 1; otherwise it is 0.

The third characteristic variable may represent whether the original labels of any given user in any two label data sets are consistent, and may be expressed by the following formula:

$$\phi^{(3)}_{i,j,k}(L, Y) = \mathbb{1}\{L_{i,j} = L_{i,k}\}, \quad (j, k) \in C$$

where $L$, $Y$ and $L_{i,j}$ have the same meaning as in the first characteristic variable described above, $L_{i,k}$ denotes the original label of user $i$ at organization $k$, and $C$ denotes the set of organization pairs $(j, k)$ among the plurality of organizations, with $j \neq k$. If the user's original labels $L_{i,j}$ and $L_{i,k}$ in the two label data sets are consistent, $\phi^{(3)}_{i,j,k}(L, Y)$ is 1; otherwise it is 0.
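The three characteristic variables described above can be sketched as indicator computations over a label matrix. The layout is an assumption: `L[i][j]` is the original label of user `i` at organization `j` (`None` when empty), and `Y[i]` is the initial target label of user `i`. Taking `C` to be every unordered pair of organizations is also an assumption. Under the literal equality test used here, two empty labels would count as consistent.

```python
from itertools import combinations


def characteristic_variables(L, Y):
    """Compute the three indicator features for every user.

    Returns (exists, agrees, same, pairs) where:
      exists[i][j] = 1 if user i has an original label at organization j
      agrees[i][j] = 1 if that original label equals the target label Y[i]
      same[i][p]   = 1 if the labels at the p-th organization pair are equal
    """
    n_orgs = len(L[0])
    pairs = list(combinations(range(n_orgs), 2))  # assumed set C, j < k
    exists = [[int(label is not None) for label in row] for row in L]
    agrees = [[int(label == y) for label in row] for row, y in zip(L, Y)]
    same = [[int(row[j] == row[k]) for j, k in pairs] for row in L]
    return exists, agrees, same, pairs


# Hypothetical data: 3 users (rows) and 3 organizations (columns).
L = [[1, 0, 1],
     [0, None, 0],
     [1, 1, None]]
Y = [1, 0, 1]
exists, agrees, same, pairs = characteristic_variables(L, Y)
```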

Secondly, a generative model and an objective function are obtained based on the first characteristic variable, the second characteristic variable, and the third characteristic variable.

The generative model may be used to solve for the final target label data set according to the target parameter, and may be expressed by the following formula:

$$p_w(L, Y) = Z_w^{-1} \exp\Big(\sum_{i} w^{\top} \phi_i(L, y_i)\Big),$$

where $\phi_i(L, y_i)$ stacks the first, second, and third characteristic variables of user $i$ into one vector, $w$ is the target parameter, and $Z_w^{-1}$ is the probability normalization coefficient.

The target parameter w is an unknown quantity, and the objective function can be used to solve for the target parameter w according to the target label data set. The objective function may be expressed by the following formula:

。

Finally, the final target label data set is solved for according to the generative model and the objective function.

Specifically, the target label data set may first be fixed by setting it equal to the initial target label data set, and the objective function is solved with stochastic gradient descent to obtain the target parameter. The target parameter is then substituted into the generative model, and the generative model is solved with Gibbs sampling based on the target parameter to obtain a target label data set. This target label data set may be regarded as an update of the initial target label data set.

After the target label data set is obtained, it can be substituted back into the objective function, and the steps of solving for the target parameter and the target label data set are repeated until convergence. The target label data set at convergence is the target label data set that is finally required.
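The label-sampling half of this loop can be sketched as follows. In a model of this form, only the second characteristic variable depends on the target label $y_i$. For binary labels, the conditional probability of $y_i = 1$ given the original labels and the parameters therefore reduces to a logistic function of the weighted votes. The sketch assumes binary labels and one agreement weight per organization (`acc_w`). The weight values are hypothetical, and the stochastic gradient descent step that would learn them is omitted.

```python
import math
import random


def posterior_positive(row, acc_w):
    """P(y_i = 1 | original labels of user i, weights) under the assumptions above."""
    score = 0.0
    for label, weight in zip(row, acc_w):
        if label == 1:
            score += weight  # this organization votes for y_i = 1
        elif label == 0:
            score -= weight  # this organization votes for y_i = 0
        # an empty label (None) does not affect the conditional
    return 1.0 / (1.0 + math.exp(-score))


def sample_target_labels(L, acc_w, rng):
    """Draw one target label per user from its conditional distribution."""
    return [1 if rng.random() < posterior_positive(row, acc_w) else 0 for row in L]


# Hypothetical label matrix (users x organizations) and agreement weights.
L = [[1, 0, 1],
     [0, None, 0],
     [1, 1, None]]
acc_w = [1.0, 0.5, 2.0]
probs = [posterior_positive(row, acc_w) for row in L]
Y = sample_target_labels(L, acc_w, random.Random(0))
```

In the full procedure this sampling step would alternate with re-estimating the weights from the sampled labels until the labels stop changing.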

The above generative model takes into account the consistency, existence, and accuracy of the label data from the plurality of organizations. The resulting target label data set is therefore a composite that integrates the organizations' label data, so that no labels are lost and accuracy is improved.

S206: Send the target label data set to the plurality of organizations, so that the plurality of organizations perform federated learning based on the target label data set.

In S206, after obtaining the target label data set, the trusted execution environment may send it to the plurality of organizations, which store it in a distributed manner. This avoids the label data leakage that could result if any single organization held the whole target label data set.

When the trusted execution environment sends the target label data set to the plurality of organizations, the procedure may specifically include the following:

firstly, for any target user among the plurality of users, determining the target label of the target user in the target label data set;

secondly, determining the plurality of original labels of the target user in the plurality of label data sets, wherein each original label corresponds to one organization, that is, each original label is provided by one organization;

thirdly, determining candidate organizations according to the target label and the plurality of original labels, wherein the original label corresponding to a candidate organization is consistent with the target label;

Specifically, the original labels of the target user at the plurality of organizations may be compared with the target label of the target user to determine at which organizations the original label is consistent with the target label. The organizations so determined are the candidate organizations, and there may be one or more of them.

And finally, sending the target label of the target user to the candidate organization.

Specifically, when the number of candidate organizations is 1, the target label of the target user may be sent to that one candidate organization.

When the number of candidate organizations is greater than 1 (say M), N target organizations are preferably selected from the M candidate organizations, and the target label of the target user is sent to the N target organizations. N is an integer greater than 1 and less than M, and the N target organizations may be selected at random or in another manner.
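The distribution steps above can be sketched for a single target user. The function name, the dictionary layout, and the random selection via `random.sample` are assumptions. The text asks for 1 < N < M. This sketch clamps N into the range 1 to M - 1 so that the case M = 2 still yields one recipient, which is an assumption rather than something the text states. The text also does not cover the case where no organization's original label matches the target label; the sketch returns an empty list there.

```python
import random


def choose_recipients(target_label, original_labels, n_targets, rng):
    """Pick the organizations that receive one user's target label.

    original_labels: dict mapping organization name -> original label or None.
    """
    # Candidate organizations: original label consistent with the target label.
    candidates = sorted(
        org for org, label in original_labels.items()
        if label is not None and label == target_label
    )
    m = len(candidates)
    if m <= 1:
        return candidates  # zero or exactly one candidate
    n = max(1, min(n_targets, m - 1))  # assumption: clamp N into [1, M - 1]
    return sorted(rng.sample(candidates, n))


# Hypothetical original labels of one target user at four organizations.
originals = {"org_a": 1, "org_b": 0, "org_c": 1, "org_d": 1}
recipients = choose_recipients(1, originals, 2, random.Random(7))
single = choose_recipients(0, originals, 2, random.Random(7))
```

Because only a strict subset of the candidates is chosen, an organization that does not receive a user's target label cannot tell whether its own original label for that user matched.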

In this embodiment of the specification, after the trusted execution environment sends the target label data to the plurality of organizations in the above manner, at least the following technical effects can be achieved:

(1) The target label of any user is held by at least one organization, so that the user's target label is not lost.

(2) When several candidate organizations exist for a target user, the target label is sent only to some of them (the N target organizations) rather than to all of them. Thus, for any organization, even if the target label of a certain target user is consistent with that user's original label at the organization, the trusted execution environment may not send the target label to that organization. The target labels an organization finally receives therefore cover only part of its correct original labels (that is, the original labels it provided that are consistent with the target labels), and it cannot learn whether its remaining original labels are correct. This prevents any organization from inferring the target label data set from the target labels it received and the original labels it provided, and every target label an organization receives is information it already knew.

(3) The target label of the same user can be held by several organizations, which can reduce the time spent on cross-party data computation in federated learning to a certain extent.

In this embodiment of the specification, when the target label of the target user is sent to a target organization, the target label may be encrypted before sending in order to secure it during transmission. Various encryption methods may be used and are not enumerated here.

Optionally, to prevent the target label from being leaked during transmission, the user identifier of the target user may be determined and sent to the target organization in place of the target label. After receiving the user identifier, the target organization can look up its own original label for the target user according to the identifier. Because that original label is the same as the target user's target label, the purpose of sending the target label to the target organization is achieved. The user identifier may be the user's name, mobile phone number, identity card number, or the like, and may be determined according to the actual business scenario of the target organization.

In this way, sending the user identifier of the target user in place of the target label prevents the target label from being leaked during transmission.
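The identifier-only transmission can be sketched as a pair of helpers, one on the trusted execution environment side and one on the organization side. The function names and message layout are hypothetical, and encryption of the identifier is left out.

```python
def build_message(user_id):
    """Trusted-execution-environment side: the message carries only the
    user identifier, never the label value itself."""
    return {"user_id": user_id}


def recover_target_label(message, own_original_labels):
    """Organization side: look up the organization's own original label.

    This equals the target label because the message is only sent to
    organizations whose original label is consistent with the target label.
    """
    return own_original_labels[message["user_id"]]


# Hypothetical original labels held by the receiving organization.
org_labels = {"u1": 1, "u2": 0}
msg = build_message("u1")
recovered = recover_target_label(msg, org_labels)
```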

When the user identifier of the target user is sent to the target mechanism, in order to protect the user identifier of the target user from being leaked in the transmission process, the user identifier may be encrypted and then sent, and the encryption methods may be various, which is not illustrated here.

After the trusted execution environment sends the target tag data group to the multiple organizations by the above-mentioned method, any organization may receive part of the target tag data in the target tag data group, and then the multiple organizations may perform federal learning based on the received part of the target tag data, and the specific implementation manner may be referred to the embodiment shown in fig. 3 below, which is not described in detail here.

In this way, because the plurality of organizations perform federated learning based on a unified target tag data set, the problem of being unable to determine which organization's tag data the learning training should be based on, when the tag data provided by the organizations are inconsistent, is effectively solved. In addition, because the trusted execution environment has high credibility and a high security level, having it acquire the plurality of tag data sets provided by the organizations and perform the learning training effectively protects the privacy and security of those tag data sets.

FIG. 3 is a flow diagram of a multi-tag based federated learning method, which may be performed by any of the entities shown in FIG. 1, according to one embodiment of the present description. The method is as follows.

S302: a tag data set including original tags for a plurality of users is obtained.

In S302, when an organization performs federal learning with other organizations, a plurality of users for federal learning, which may be users common to the organization and other organizations, may be determined. After determining the plurality of users, the organization may generate or collect raw tags for the plurality of users, resulting in a tag data set comprising a plurality of raw tags corresponding to the plurality of users.

S304: sending the tag data set to a trusted execution environment, where the trusted execution environment performs learning training on the tag data set and other tag data sets from other organizations by using a preset weakly supervised learning algorithm to obtain a target tag data set, wherein at least one of the plurality of users has inconsistent original tags across the plurality of tag data sets provided by the plurality of organizations, and the target tag data set includes target tags of the plurality of users.

In S304, after generating or collecting a tag data set that encapsulates the original tags of the multiple users, the organization may send the tag data set to the trusted execution environment.

The trusted execution environment, upon receiving the tag data set of the organization, may also receive other tag data sets for a plurality of users from other organizations, where at least one of the plurality of users has an original tag inconsistency among the plurality of tag data sets provided by the plurality of organizations. And then, the trusted execution environment can utilize a preset weak supervised learning algorithm to perform learning training on a plurality of label data sets from a plurality of organizations to obtain a uniform target label data set. The specific implementation manner may refer to corresponding contents recorded in the embodiment shown in fig. 2, and a description thereof is not repeated here.

S306: receiving a portion of the target tag data in the set of target tag data returned by the trusted execution environment.

In S306, after obtaining the target tag data set, the trusted execution environment may send the target tag data set to a plurality of organizations, and the specific implementation manner may refer to corresponding contents described in the embodiment shown in fig. 2, which is not described repeatedly here.

After the trusted execution environment sends the target tag data set to the plurality of organizations, any one of the organizations may receive a portion of the target tag data in the target tag data set.

In one implementation, the part of the target tag data received by the organization may include a target tag of a part of users among the plurality of users, where the target tag of the part of users is consistent with an original tag of the part of users in a tag data set provided by the organization.

In another implementation, the part of the target tag data received by the organization may include a user identifier of a part of users among the plurality of users, and the target tag of the part of users is consistent with the original tag of the part of users in the tag data set provided by the organization.

S308: performing federated learning based on the portion of the target tag data.

In S308, after receiving the partial target tag data, the organization may perform federated learning based on the partial target tag data.

In one implementation, if the target tags of a part of users are included in the part of target tag data received by the organization, when performing federal learning, the personal data of the part of users in the organization may be acquired, and the learning training is performed based on the personal data of the part of users and the received target tags.

In another implementation manner, if part of target tag data received by an organization includes user identifiers of part of users, when performing federal learning, an original tag corresponding to the part of users may be first searched in a set of tag data provided by the organization according to the user identifiers of the part of users, then personal data of the part of users in the organization is acquired, and finally, learning training is performed on the personal data of the part of users based on the original tag of the part of users.
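The two implementations above differ only in how the labels are obtained before training. A minimal sketch follows, assuming tags and personal data are stored in dictionaries keyed by user identifier; `build_training_set` is a hypothetical helper, not a function named in the specification.

```python
def build_training_set(received, own_original_tags, personal_data):
    """Return (features, label) pairs for the users named in `received`.

    `received` is either a dict {user_id: target_tag} (first implementation)
    or a list of user identifiers (second implementation), in which case the
    label is looked up in the organization's own tag data set.
    """
    if isinstance(received, dict):
        labels = dict(received)
    else:
        labels = {uid: own_original_tags[uid] for uid in received}
    return [(personal_data[uid], label) for uid, label in labels.items()]


own_original_tags = {"u1": 1, "u2": 0, "u3": 1}
personal_data = {"u1": [0.2, 5.0], "u2": [0.9, 1.0], "u3": [0.4, 3.0]}

from_tags = build_training_set({"u1": 1, "u3": 1}, own_original_tags, personal_data)
from_ids = build_training_set(["u1", "u3"], own_original_tags, personal_data)
```

Both paths yield the same training pairs, since an identifier is only sent for a user whose original tag equals the target tag.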

In the embodiment of the present specification, the target tags of the users are distributed among, and held by, the organizations. For example, organization A holds the tags of users 1, 2, and 3, organization B holds the tags of users 2, 3, and 4, and organization C holds the tags of users 2, 4, and 5. In federated learning, therefore, the computing roles of the organizations are not fixed: sometimes organization A acts as the tag holder and sometimes organization B does. In this case, when the loss function is calculated, each organization may calculate the loss based on the target tags it holds, and the losses calculated by the organizations are finally added together.
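The summation of per-organization losses can be illustrated with three organizations that hold overlapping sets of users. The sketch below assumes binary tags and a binary cross-entropy loss; the specification does not name a particular loss function, and `local_log_loss` and the sample scores are hypothetical.

```python
import math


def local_log_loss(held_tags, scores):
    """Sum of binary cross-entropy terms over the users whose target tags
    this organization holds. `scores` maps user -> predicted probability."""
    total = 0.0
    for user, tag in held_tags.items():
        p = min(max(scores[user], 1e-12), 1 - 1e-12)  # avoid log(0)
        total += -(tag * math.log(p) + (1 - tag) * math.log(1 - p))
    return total


# Target tags held by each organization (users may appear more than once).
held = {
    "A": {1: 1, 2: 0, 3: 1},
    "B": {2: 0, 3: 1, 4: 0},
    "C": {2: 0, 4: 0, 5: 1},
}
scores = {1: 0.9, 2: 0.2, 3: 0.7, 4: 0.1, 5: 0.6}

per_org = {org: local_log_loss(tags, scores) for org, tags in held.items()}
total_loss = sum(per_org.values())
```

In this sketch a user whose tag is held by several organizations contributes to the total once per holder, which follows directly from adding the per-organization losses.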

Optionally, after the organization performs federated learning and obtains a target model, it may use the target model to predict on the plurality of users and obtain their model scores. It may then compare the model scores with the original tag values of the plurality of users to determine the AUC (area under the ROC curve) between them. If the AUC is smaller than a set value, the organization may fine-tune the target model according to its own situation so that the model fits its own business scenario. The set value may be any value between 0.7 and 0.9, may be determined according to actual conditions, and is not specifically limited here.
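The AUC check can be sketched in a few lines. The example assumes binary original tags (0 or 1) and computes the AUC as the fraction of positive/negative pairs that the scores rank correctly; `auc`, `needs_fine_tuning`, and the default threshold of 0.8 are illustrative choices within the 0.7 to 0.9 range mentioned above.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def needs_fine_tuning(scores, original_tags, threshold=0.8):
    """True when the target model fits the organization's own tags poorly."""
    return auc(scores, original_tags) < threshold


model_scores = [0.9, 0.8, 0.35, 0.6, 0.2]
original_tags = [1, 1, 1, 0, 0]
value = auc(model_scores, original_tags)
```

Here five of the six positive/negative pairs are ordered correctly, so whether fine-tuning is triggered depends on where the set value is placed within its range.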

When the target model is fine-tuned, one option is to calculate the differences between the model scores of the plurality of users and their original tag values, train a new model on the personal data of the users to fit those differences, and finally add the score predicted by the new model to the score predicted by the target model to obtain the final predicted value.
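A minimal sketch of this residual-style fine-tuning follows. It assumes the difference is taken as original tag value minus model score, uses a single personal-data feature, and uses ordinary least squares as the "new model"; none of these choices is prescribed by the specification, and `fit_line` and `mse` are hypothetical helpers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


feature = [0.0, 1.0, 2.0, 3.0]        # one personal-data feature per user
target_scores = [0.5, 0.5, 0.5, 0.5]  # scores from the federated target model
original_tags = [0, 0, 1, 1]          # the organization's own tag values

# Differences that the new model is trained to fit.
residuals = [y - s for y, s in zip(original_tags, target_scores)]
a, b = fit_line(feature, residuals)

# Final prediction = target-model score + new-model prediction.
final_scores = [s + (a * x + b) for s, x in zip(target_scores, feature)]
```

Adding the fitted correction lowers the squared error against the organization's own tags while leaving the shared target model untouched.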

FIG. 4 is a scenario diagram of a multi-label based federated learning approach of one embodiment of the present description.

In fig. 4, when organization 41, organization 42, ……, and organization 4N perform federated learning, each organization may collect one tag data set for the plurality of users that the organizations have in common, so that a plurality of tag data sets are obtained. Each tag data set includes original tags of the plurality of users, and at least one of the plurality of users has inconsistent original tags across the plurality of tag data sets. The organizations then send the plurality of tag data sets to the trusted execution environment.

In fig. 4, the tag data set sent by organization 41 is denoted tag data set A1, the tag data set sent by organization 42 is denoted tag data set A2, ……, and the tag data set sent by organization 4N is denoted tag data set AN.

After the trusted execution environment receives the plurality of label data sets, learning training is carried out on the plurality of label data sets by using a preset weak supervision learning algorithm, and a unified target label data set is obtained. The target tag data set includes target tags of a plurality of users.

The trusted execution environment may then send the target tag data set to the plurality of organizations. As can be seen from the description of the embodiment of fig. 2, the trusted execution environment sends part of the target tag data in the target tag data set to each of organization 41, organization 42, ……, and organization 4N; that is, it sends the target tags of part of the users to each organization, and, for every user, that user's target tag is sent to at least one of the organizations.

When the trusted execution environment sends the target tags of the partial users to any mechanism, in order to avoid the target tags from being leaked in the transmission process, the user identifications of the partial users can be sent instead of the target tags of the partial users. As shown in FIG. 4, the trusted execution environment sends the user identification of partial user 1 to organization 41, the user identification of partial user 2 to organization 42, … …, and the user identification of partial user N to organization 4N. Any one of the partial users 1 to N may be one user, or may be multiple users, and any two of the partial users 1 to N may include the same user.

After receiving the user identifiers of the partial users, organization 41, organization 42, ……, and organization 4N may look up the original tags of the partial users according to those user identifiers, and perform federated learning based on the original tags and the personal data of the partial users to obtain the target model. After the target model is obtained, the organizations may further fine-tune the target model according to their own service scenarios; the specific implementation may refer to the corresponding contents of the embodiment shown in fig. 3 and is not described in detail here.

It should be noted that, in a possible implementation manner, after the trusted execution environment is trained to obtain a set of target tag data, the trusted execution environment may obtain personal data of multiple users from multiple organizations without sending the target tag data to the multiple organizations, and perform model training according to the obtained personal data and the determined target tag data to obtain a target model, that is, the trusted execution environment may execute a step of federal learning of multiple organizations and train to obtain the target model.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to fig. 5, at a hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (peripheral component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.

The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.

And the processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the multi-tag-based federal learning device on a logic level. The processor executes the program stored in the memory and is specifically used for executing the following steps:

The method performed by the multi-tag-based federated learning apparatus as disclosed in the embodiment illustrated in fig. 5 of the present specification may be implemented in or by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present specification may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present specification may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.

The electronic device may further execute the method in fig. 2, and implement the function of the multi-tag-based federal learning apparatus in the embodiment shown in fig. 2, which is not described herein again in this specification.

Of course, besides the software implementation, the electronic device of the embodiment of the present disclosure does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.

Embodiments of the present specification also propose a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 2, and in particular for performing the steps of:

Fig. 6 is a schematic structural diagram of a multi-tag based federal learning device 60 according to an embodiment of the present disclosure. Referring to fig. 6, in one software implementation, the multi-tag based federal learning device 60 can include: an acquisition unit 61, a processing unit 62 and a transmission unit 63, wherein:

an acquisition unit 61 configured to acquire a plurality of tag data sets provided by a plurality of organizations, the tag data sets including original tags of a plurality of users, at least one of the plurality of users having a plurality of original tags in the plurality of tag data sets that are inconsistent;

the processing unit 62 is configured to perform learning training on the plurality of tag data sets by using a preset weak supervised learning algorithm to obtain a target tag data set, where the target tag data set includes target tags of the plurality of users;

and a transmitting unit 63 configured to transmit the target tag data set to the plurality of organizations, and the plurality of organizations perform federal learning based on the target tag data set.

Optionally, the processing unit 62 performs learning training on the plurality of tag data sets by using a preset weak supervised learning algorithm to obtain a target tag data set, including:

determining a first characteristic variable, a second characteristic variable and a third characteristic variable according to the plurality of label data groups and an initial target label data group, wherein the first characteristic variable represents whether an original label exists in any one label data group of any user, the second characteristic variable represents whether a target label of any user is consistent with the original label of any one label data group of the user, and the third characteristic variable represents whether the original label of any user is consistent in any two label data groups;

obtaining a generating model and an objective function based on the first characteristic variable, the second characteristic variable and the third characteristic variable, wherein the generating model is used for solving the target label data group according to target parameters, and the objective function is used for solving the target parameters according to the target label data group;

and obtaining the target label data group according to the generated model and the target function.
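The three characteristic variables can be computed directly from the tag data sets and the current target tags. The sketch below assumes each tag data set is a dictionary that omits users without an original tag, and it treats a pair of tag data sets as inconsistent when either tag is missing; both conventions, and the name `feature_variables`, are assumptions made for illustration.

```python
from itertools import combinations


def feature_variables(tag_groups, target_tags):
    """Compute the three indicator variables for every user.

    tag_groups  : list of dicts {user: original_tag}, one per organization
    target_tags : dict {user: current target tag}
    """
    has_tag, agrees_with_target, pair_agrees = {}, {}, {}
    for user, target in target_tags.items():
        for j, group in enumerate(tag_groups):
            # First variable: does tag data set j contain a tag for this user?
            has_tag[user, j] = int(user in group)
            # Second variable: does that tag equal the user's target tag?
            agrees_with_target[user, j] = int(group.get(user) == target)
        for j, k in combinations(range(len(tag_groups)), 2):
            # Third variable: do tag data sets j and k give the same tag?
            both = user in tag_groups[j] and user in tag_groups[k]
            pair_agrees[user, j, k] = int(
                both and tag_groups[j][user] == tag_groups[k][user]
            )
    return has_tag, agrees_with_target, pair_agrees


groups = [{"u1": 1, "u2": 0}, {"u1": 1, "u2": 1}, {"u1": 0}]
initial_target = {"u1": 1, "u2": 0}
has_tag, agrees, pairs = feature_variables(groups, initial_target)
```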

Optionally, the obtaining, by the processing unit 62, the target tag data set according to the generated model and the target function includes:

solving the objective function by using a stochastic gradient descent method based on the initial target tag data group to obtain the target parameter;

solving the generation model by using a Gibbs sampling method based on the target parameters obtained by solving, and updating the initial target tag data group;

and circularly executing the steps until convergence, and obtaining the target tag data group.
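The alternating procedure above can be illustrated with a deliberately simplified model. This sketch does not reproduce the generation model of the specification: it assumes tags of +1 or -1, uses only the agreement between target tag and original tag, gives each organization a single accuracy parameter, starts those parameters at 1.0 (that is, assumes organizations are initially better than chance), and runs a fixed number of rounds instead of testing convergence. All names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_target_tags(votes, rounds=20, lr=0.1, seed=0):
    """votes[i][j] is the tag (+1 or -1) that organization j gave user i.

    Simplifying assumption: organization j agrees with the true tag with
    probability sigmoid(theta[j]), independently of the others.
    """
    rng = random.Random(seed)
    n_orgs = len(votes[0])
    theta = [1.0] * n_orgs  # assumed starting point: better than chance
    # Initial target tags: majority vote (ties resolved as +1).
    tags = [1 if sum(row) >= 0 else -1 for row in votes]

    for _ in range(rounds):
        # Step 1: stochastic gradient ascent on the log-likelihood of theta,
        # one user at a time, with the current target tags held fixed.
        for i in rng.sample(range(len(votes)), len(votes)):
            for j in range(n_orgs):
                agree = 1.0 if votes[i][j] == tags[i] else 0.0
                theta[j] += lr * (agree - sigmoid(theta[j]))
        # Step 2: Gibbs step, resampling each target tag from its
        # conditional distribution given theta and the original tags.
        for i, row in enumerate(votes):
            log_odds = sum(v * t for v, t in zip(row, theta))
            tags[i] = 1 if rng.random() < sigmoid(log_odds) else -1

    # Report the most probable tag under the final parameters.
    final = [
        1 if sum(v * t for v, t in zip(row, theta)) >= 0 else -1
        for row in votes
    ]
    return final, theta


votes = [[1, 1, -1], [1, 1, 1], [-1, -1, 1], [-1, -1, -1], [1, 1, -1]]
final_tags, theta = fit_target_tags(votes)
```

The result depends on the random seed, because both the update order and the Gibbs step are sampled; a fixed seed is used here only to make the run repeatable.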

Optionally, the sending unit 63 sends the target tag data set to the plurality of mechanisms, including:

for any target user of the plurality of users, performing the steps of:

determining a target tag of the target user in the target tag data set;

determining a plurality of original tags of the target user in the plurality of tag data sets, wherein one original tag corresponds to one mechanism;

determining a candidate mechanism according to the target label and the plurality of original labels, wherein the original labels corresponding to the candidate mechanism are consistent with the target label;

and sending the target label of the target user to the candidate organization.
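The candidate-organization step above amounts to a filter over the original tags. A minimal sketch, assuming each organization's tag data set is a dictionary keyed by user; `candidate_organizations` and the sample data are hypothetical.

```python
def candidate_organizations(user, target_tags, tag_groups):
    """Return the organizations whose original tag for `user` equals the
    user's target tag. `tag_groups` maps organization -> {user: original_tag}."""
    target = target_tags[user]
    return [org for org, tags in tag_groups.items() if tags.get(user) == target]


tag_groups = {
    "A": {"u1": 1, "u2": 0},
    "B": {"u1": 0, "u2": 0},
    "C": {"u1": 1},
}
target_tags = {"u1": 1, "u2": 0}

candidates_u1 = candidate_organizations("u1", target_tags, tag_groups)
candidates_u2 = candidate_organizations("u2", target_tags, tag_groups)
```

An organization that supplied no tag for the user, such as organization C for u2 here, is never a candidate.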

Optionally, the sending unit 63 sends the target tag of the target user to the candidate organization, including:

if the number M of the candidate mechanisms is larger than 1, N target mechanisms are selected from the candidate mechanisms, wherein N is larger than 0 and smaller than M;

and sending the target label of the target user to the N target mechanisms.
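Selecting N target organizations from M candidates can be sketched with uniform random sampling. The specification does not say how the N organizations are chosen, so random selection is an assumption, as is the behavior for a single candidate (it is returned as-is); `select_targets` is a hypothetical name.

```python
import random


def select_targets(candidates, n, rng=random):
    """Pick n of the m candidate organizations, with 0 < n < m when m > 1."""
    m = len(candidates)
    if m <= 1:
        return list(candidates)  # assumption: a lone candidate is used as-is
    if not 0 < n < m:
        raise ValueError("n must satisfy 0 < n < m")
    return rng.sample(candidates, n)


chosen = select_targets(["A", "B", "C"], 2, random.Random(0))
single = select_targets(["A"], 1)
```

Sending the target tag to fewer than all candidates limits how many organizations learn that their original tag for the user was confirmed.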

Optionally, the sending unit 63 sends the target tag of the target user to the N target organizations, including:

determining a user identification of the target user;

and replacing the target label with the user identification of the target user and sending the user identification of the target user to the N target organizations.

Optionally, the sending unit 63 sends the user identifier of the target user to the N target organizations instead of the target tag, including:

encrypting the user identification of the target user to obtain the encrypted user identification;

and sending the encrypted user identification to the N target organizations.

The multi-tag-based federal learning device 60 provided in the embodiments of the present specification can also execute the method in fig. 2, and implement the functions of the multi-tag-based federal learning device in the embodiments shown in fig. 2, which are not described herein again.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to fig. 7, at a hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (peripheral component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 7, but this does not indicate only one bus or one type of bus.

acquiring a tag data group including original tags of a plurality of users;

performing federated learning based on the portion of the target tag data.

The method performed by the multi-tag-based federated learning apparatus as disclosed in the embodiment illustrated in fig. 7 of the present specification may be implemented in or by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present specification may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present specification may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.

The electronic device may further execute the method in fig. 3, and implement the function of the multi-tag-based federal learning apparatus in the embodiment shown in fig. 3, which is not described herein again in this specification.

Embodiments of the present specification also propose a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 3, and in particular for performing the steps of:

acquiring a tag data group including original tags of a plurality of users;

performing federated learning based on the portion of the target tag data.

Fig. 8 is a schematic structural diagram of a multi-tag based federal learning device 80 according to an embodiment of the present disclosure. Referring to fig. 8, in one software implementation, the multi-tag based federal learning device 80 can include: an obtaining unit 81, a sending unit 82, a receiving unit 83 and a federal learning unit 84, wherein:

an acquisition unit 81 that acquires a tag data group including original tags of a plurality of users;

the sending unit 82 is configured to send the tag data set to a trusted execution environment, and the trusted execution environment performs learning training on the tag data set and other tag data sets from other organizations by using a preset weak supervised learning algorithm to obtain a target tag data set, where at least one of the multiple users has inconsistency of multiple original tags in the multiple tag data sets provided by the multiple organizations, and the target tag data set includes target tags of the multiple users;

a receiving unit 83, configured to receive a part of the target tag data in the target tag data set returned by the trusted execution environment;

and a federal learning unit 84 for federal learning based on the partial target tag data.

Optionally, the partial target tag data includes a target tag of a partial user among the multiple users, and the target tag of the partial user is consistent with an original tag of the partial user in the tag data set.

Optionally, the partial target tag data includes a user identifier of a partial user among the multiple users, and the target tag of the partial user is consistent with the original tag of the partial user in the tag data set.

Optionally, the federal learning unit 84 performs federal learning based on the part of the target tag data, including:

according to the user identification of the part of users, original tags corresponding to the part of users are searched in the tag data;

acquiring personal data of the part of users;

and performing learning training on the personal data of the part of users based on the original labels of the part of users.

Optionally, the federal learning unit 84 further obtains model scores of the multiple users after federal learning is performed based on the partial target tag data, where the model scores are obtained by predicting the multiple users based on a target model obtained by federal learning;

and if the AUC between the model scores of the users and the original tag values of the users is smaller than a set value, adjusting the target model.

The multi-tag-based federal learning device 80 provided in the embodiments of the present specification can also execute the method in fig. 3, and implement the functions of the multi-tag-based federal learning device in the embodiments shown in fig. 3, which are not described herein again.

Fig. 9 is a schematic structural diagram of a multi-tag based federated learning system 90 in one embodiment of the present description. The system 90 comprises a trusted execution environment 91 and a plurality of mechanisms 92, wherein:

the multiple mechanisms 92 acquire multiple tag data sets, wherein the tag data sets comprise original tags of multiple users, and at least one of the multiple users has multiple original tags in the multiple tag data sets that are inconsistent; sending the plurality of tag data sets to the trusted execution environment 91;

the trusted execution environment 91 performs learning training on the plurality of label data sets by using a preset weak supervised learning algorithm to obtain a target label data set, wherein the target label data set comprises target labels of the plurality of users; sending the target tag data set to the plurality of agencies 92;

the plurality of entities 92 receiving the target tag data set returned by the trusted execution environment 91, any one of the entities receiving a portion of the target tag data in the target tag data set; and performing federal learning based on the received part of the target tag data.

The plurality of mechanisms 92 may include a mechanism 1, a mechanism 2, … …, and a mechanism N, where N is an integer greater than or equal to 2 and may be determined according to the actual number of mechanisms.

The specific implementation of the above steps can refer to corresponding contents described in the embodiments shown in fig. 2 and fig. 3, and a repeated description is omitted here.

The trusted execution environment 91 shown in fig. 9 may also execute the methods shown in fig. 2 and fig. 4, and implement the functions of the trusted execution environment in the embodiments shown in fig. 2 and fig. 4, which are not described herein again in this specification. The plurality of organizations 92 shown in fig. 9 may also perform the methods shown in fig. 3 and fig. 4, and implement the functions of the organizations in the embodiments shown in fig. 3 and fig. 4, which are not described herein again in this specification.

In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of protection of this document. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present specification shall be included in the scope of protection of this document.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment description.

Claims

1. A multi-label-based federated learning method, applied to a trusted execution environment, comprising:

sending the target tag data set to the plurality of organizations, so that the plurality of organizations perform federated learning based on the target tag data set;

wherein sending the target tag data set to the plurality of institutions comprises:

for any target user of the plurality of users, performing the steps of:

determining a target tag of the target user in the target tag data set;

and sending the target label of the target user to the candidate organization.

2. The method of claim 1, wherein training on the plurality of tag data sets by using a preset weakly supervised learning algorithm to obtain the target tag data set comprises:

3. The method of claim 2, wherein deriving the target tag data set from the generative model and the objective function comprises:

4. The method of claim 1, wherein sending the target tag of the target user to the candidate organization comprises:

and sending the target tag of the target user to the N target organizations.

5. The method of claim 4, wherein sending the target tag of the target user to the N target organizations comprises:

determining a user identification of the target user;

6. The method of claim 5, wherein sending the user identification of the target user to the N target organizations in place of the target tag comprises:

and sending the encrypted user identification to the N target organizations.

7. A multi-label-based federated learning method, applied to an organization, comprising:

acquiring a tag data group including original tags of a plurality of users;

receiving partial target tag data in the target tag data group returned by the trusted execution environment, wherein the partial target tag data comprises target tags of some of the plurality of users, and the target tags of those users are consistent with the original tags of those users in the tag data group;

performing federated learning based on the portion of the target tag data.

8. The method of claim 7, wherein

the partial target tag data comprises user identifications of some of the plurality of users, and the target tags of those users are consistent with the original tags of those users in the tag data group.

9. The method of claim 8, wherein performing federated learning based on the portion of the target tag data comprises:

acquiring personal data of the part of users;

10. The method of claim 7, wherein, after performing federated learning based on the portion of the target tag data, the method further comprises:

obtaining model scores of the users, wherein the model scores are obtained by predicting the users based on a target model obtained by federal learning;

and if the area under the curve (AUC) computed from the model scores of the users and the original label values of the users is less than a set value, adjusting the target model.
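
By way of illustration only, the AUC check described in claim 10 can be sketched in Python as follows. The sketch assumes binary original label values (1 for positive, 0 for negative) and computes the AUC by pairwise comparison of scores (the Mann-Whitney formulation). The function names `auc` and `needs_adjustment` and the default threshold of 0.7 are hypothetical; the claims do not specify how the set value is chosen or how the target model is adjusted.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive user receives
    a higher model score than a randomly chosen negative user, with ties
    counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def needs_adjustment(labels: Sequence[int],
                     scores: Sequence[float],
                     set_value: float = 0.7) -> bool:
    """Return True when the AUC falls below the (assumed) set value."""
    return auc(labels, scores) < set_value
```

For example, with original labels `[1, 0, 1, 0]` and model scores `[0.9, 0.1, 0.4, 0.6]`, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75; the model would be flagged for adjustment under a set value of 0.8 but not under 0.7.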

11. A multi-label-based federated learning apparatus, applied to a trusted execution environment, comprising:

a sending unit, configured to send the target tag data set to the plurality of organizations, and the plurality of organizations perform federal learning based on the target tag data set;

wherein the sending unit sending the target tag data set to the plurality of organizations comprises:

for any target user of the plurality of users, performing the steps of:

determining a target tag of the target user in the target tag data set;

and sending the target label of the target user to the candidate organization.

12. An electronic device, comprising:

a processor; and

for any target user of the plurality of users, performing the steps of:

determining a target tag of the target user in the target tag data set;

and sending the target label of the target user to the candidate organization.

13. A computer readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the steps of:

for any target user of the plurality of users, performing the steps of:

determining a target tag of the target user in the target tag data set;

and sending the target label of the target user to the candidate organization.

14. A multi-label-based federated learning apparatus, applied to an organization, comprising:

a receiving unit, configured to receive partial target tag data in the target tag data set returned by the trusted execution environment, wherein the partial target tag data comprises target tags of some of the plurality of users, and the target tags of those users are consistent with the original tags of those users in the tag data set;

15. An electronic device, comprising:

a processor; and

acquiring a tag data group including original tags of a plurality of users;

performing federated learning based on the portion of the target tag data.

16. A computer readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the steps of:

acquiring a tag data group including original tags of a plurality of users;

performing federated learning based on the portion of the target tag data.

17. A multi-tag based federated learning system comprising a trusted execution environment and a plurality of institutions, wherein:

the trusted execution environment sending the target tag data set to the plurality of institutions comprises: for any target user of the plurality of users, performing the steps of: determining a target tag of the target user in the target tag data set; determining a plurality of original tags of the target user in the plurality of tag data sets, wherein each original tag corresponds to one institution; determining a candidate institution according to the target tag and the plurality of original tags, wherein the original tag corresponding to the candidate institution is consistent with the target tag; and sending the target tag of the target user to the candidate institution;

the plurality of institutions receive the target tag data set returned by the trusted execution environment, wherein each institution receives part of the target tag data in the target tag data set, the part comprising target tags of some of the plurality of users, and the target tags of those users being consistent with the original tags of those users in the tag data set; and each institution performs federated learning based on the received part of the target tag data.
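
By way of illustration only, the per-user routing step recited in claim 17 can be sketched in Python as follows. The function name `route_target_tags`, the dictionary layout, and the sample identifiers are assumptions made for this sketch and do not appear in the specification. How the target tags are learned, and how they are transmitted or encrypted, is outside its scope.

```python
from collections import defaultdict
from typing import Dict, Hashable

Tags = Dict[Hashable, int]  # user identifier -> tag value (hypothetical layout)


def route_target_tags(target_tags: Tags,
                      original_tags_by_institution: Dict[str, Tags]
                      ) -> Dict[str, Tags]:
    """Decide which target tags each institution should receive.

    For every target user, an institution is a candidate only if the
    original tag it supplied for that user is consistent with (here:
    equal to) the unified target tag.
    """
    outbox: Dict[str, Tags] = defaultdict(dict)
    for user, target in target_tags.items():
        for institution, original in original_tags_by_institution.items():
            # Institutions that never labelled this user are skipped.
            if user in original and original[user] == target:
                outbox[institution][user] = target
    return dict(outbox)
```

For example, with target tags `{"u1": 1, "u2": 0, "u3": 1}`, institution `"A"` holding original tags `{"u1": 1, "u2": 1, "u3": 1}` and institution `"B"` holding `{"u1": 0, "u2": 0, "u3": 1}`, institution A would receive the target tags of `u1` and `u3`, and institution B those of `u2` and `u3`. Each institution thus receives only the part of the target tag data that agrees with its own original tags.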