CN113723621B - Longitudinal federal learning modeling method, device, equipment and computer medium - Google Patents

Longitudinal federal learning modeling method, device, equipment and computer medium

Info

Publication number
CN113723621B
CN113723621B
Authority
CN
China
Prior art keywords
target
label
sub
labels
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110417898.2A
Other languages
Chinese (zh)
Other versions
CN113723621A (en)
Inventor
韩雨锦
李怡欣
陈忠
王虎
黄志翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202110417898.2A priority Critical patent/CN113723621B/en
Publication of CN113723621A publication Critical patent/CN113723621A/en
Application granted granted Critical
Publication of CN113723621B publication Critical patent/CN113723621B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a longitudinal federal learning modeling method, apparatus, device and computer-readable storage medium. Applied to a business party, the method acquires the target label required for longitudinal federal learning modeling; decomposes the target label to obtain target sub-labels; and distributes the target sub-labels to each data party corresponding to the target label, so that each data party carries out longitudinal federal learning modeling based on its allocated target sub-labels and its own local data. The business party stores labels; the data parties do not. Because only the full set of target sub-labels describes the target label comprehensively, each data party can obtain target-label information from one specific angle only and can never recover the complete target label. This avoids the heavy consumption of computing resources and time caused by transmitting an encrypted target label, and thus improves modeling efficiency.

Description

Longitudinal federal learning modeling method, device, equipment and computer medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a longitudinal federal learning modeling method, apparatus, device, and computer medium.
Background
With the advent of the big-data age, barriers between data sources are hard to break. For example, in an artificial-intelligence-based product recommendation service, the product seller holds data about its products and about the users who purchase them, but no data on those users' purchasing power or payment habits. In most industries, data exists in island form: because of industry competition, privacy and security concerns, and complex administrative procedures, data integration faces heavy resistance even between departments of the same company, and in practice consolidating data scattered across regions and institutions is nearly impossible or prohibitively expensive. To address the data-island problem in the setting where two data sets share many users but few user features, the data sets can be split longitudinally (that is, along the feature dimension), and the portion where the users are the same but the user features are not entirely the same can be taken out for training, yielding a model that uses all the features to process the data; this is the longitudinal federal learning method.
However, longitudinal federal learning guarantees privacy during modeling through secure multi-party computation and cryptography, so every piece of information to be protected must be encrypted before transmission during the parties' interaction. This consumes large amounts of computing resources and time, and modeling efficiency is therefore low.
In summary, how to improve the efficiency of longitudinal federal learning modeling is a problem to be solved by those skilled in the art.
Disclosure of Invention
The purpose of the application is to provide a longitudinal federal learning modeling method, which can solve the technical problem of how to improve the efficiency of longitudinal federal learning modeling to a certain extent. The application also provides a longitudinal federal learning modeling device, electronic equipment and a computer readable storage medium.
In a first aspect, the present application provides a longitudinal federal learning modeling method, applied to a business party, including:
acquiring a target label required by longitudinal federal learning modeling;
decomposing the target label to obtain a target sub-label;
distributing the target sub-label to each data party corresponding to the target label, so that each data party carries out longitudinal federal learning modeling based on the distributed target sub-label and local data of the data party;
Wherein, the business side stores the label, and the data side does not store the label.
Optionally, the decomposing the target tag to obtain a target sub-tag includes:
decomposing the target label to obtain a sub-label;
sequencing the sub-labels according to the arrangement mode of the normalized amplitude descending order to obtain sequencing sub-labels;
determining a sub-tag number value that causes the target sub-tag to be similar to the target tag based on an energy loss calculation method;
and selecting the sub-label with the sub-label quantity value as the target sub-label from the sorting sub-labels.
Optionally, the determining, based on the energy loss calculation method, a sub-tag number value that makes the target sub-tag similar to the target tag includes:
calculating a first loss value of the sequencing sub-tag and the target tag through a first calculation formula based on an MSE loss calculation method;
determining the number of sub-tags such that the first loss value is less than a first preset value, the first preset value being a critical value for determining that the target sub-tag is similar to the target tag;
the first calculation formula includes:
Wherein β represents the first loss value; y is Y i Representing an ith tag in the target tags; n represents the total number of tags in the target tag; y'. i And k represents the quantity value of the sub-label obtained by decomposing the ith label in the target label.
Optionally, the assigning the target sub-label to each data party corresponding to the target label includes:
determining an allocation method for enabling the target sub-label obtained by allocation of each data party to be dissimilar to the target label based on the energy loss calculation method;
and distributing the target sub-label to each data party according to the distribution method.
Optionally, the decomposing the target tag to obtain a sub-tag includes:
and carrying out Fourier decomposition on the target label to obtain the sub-label.
Optionally, the decomposing the target tag to obtain a target sub-tag includes:
sorting the target labels to obtain stable sorting labels;
and decomposing the sorting labels to obtain the target sub-labels.
Optionally, the sorting the target labels to obtain stable sorting labels includes:
sorting the target labels based on a target sorting method to obtain the stable sorted labels, wherein the target sorting method includes: descending-order sorting, ascending-order sorting, and rectangular-wave sorting.
Optionally, after the target sub-label is allocated to each data party corresponding to the target label, the method further includes:
obtaining fitting sub-labels obtained after longitudinal federal learning modeling is carried out on each data party;
and carrying out longitudinal federal learning modeling based on the fitting sub-label and the target label.
Optionally, the performing longitudinal federal learning modeling based on the fitted sub-tag and the target tag includes:
and performing longitudinal federal learning modeling based on the fitting sub-label and the target label according to a gradient lifting method.
In a second aspect, the present application provides a longitudinal federal learning modeling method, applied to a data party, including:
receiving a target sub-label distributed by a service party;
performing longitudinal federal learning modeling based on the distributed target sub-tags and the local data of the data party;
wherein, the business side stores a label, and the data side does not store the label; the business party obtains a target label required by longitudinal federal learning modeling, decomposes the target label to obtain the target sub-label, and distributes the target sub-label to each data party corresponding to the target label.
In a third aspect, the present application provides a longitudinal federal learning modeling apparatus, applied to a business party, including:
the label acquisition module is used for acquiring a target label required by longitudinal federal learning modeling;
the label decomposing module is used for decomposing the target label to obtain a target sub-label;
the label distribution module is used for distributing the target sub-label to each data party corresponding to the target label so that each data party carries out longitudinal federal learning modeling based on the distributed target sub-label and local data of the data party;
wherein, the business side stores the label, and the data side does not store the label.
In a fourth aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the longitudinal federal learning modeling method as described in any one of the above when executing the computer program.
In a fifth aspect, the present application provides a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of the longitudinal federal learning modeling method as described in any of the above.
In the method, after the business party obtains the target label required for longitudinal federal learning modeling, it does not encrypt the target label; instead it decomposes the label into target sub-labels that each describe the target label from a different angle, and then distributes the target sub-labels to the data parties corresponding to the target label. Because only the complete set of target sub-labels describes the target label comprehensively, each data party can obtain target-label information from a specific angle only and cannot recover the full label, so the data parties carry out longitudinal federal learning modeling without ever learning the target label. This avoids the heavy computation and time consumption caused by transmitting an encrypted target label and improves modeling efficiency. The longitudinal federal learning modeling apparatus, electronic device and computer-readable storage medium provided by the application solve the corresponding technical problems in the same way.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic diagram of a system framework to which the longitudinal federal learning modeling scheme provided herein is applicable;
FIG. 2 is a flowchart of a method for longitudinal federal learning modeling according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of data interactions in the present application;
FIG. 4 is a flowchart of a specific longitudinal federal learning modeling method according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a specific longitudinal federal learning modeling method according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of a specific longitudinal federal learning modeling method according to an embodiment of the present disclosure;
FIG. 7 is another schematic diagram of data interactions in the present application;
FIG. 8 is a schematic structural diagram of a longitudinal federal learning modeling apparatus provided herein;
fig. 9 is a block diagram of an electronic device provided in the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
As described in the Background, data sources are separated by barriers that are hard to break, data exists in island form in most industries, and longitudinal federal learning addresses the data-island problem by splitting data sets along the feature dimension when two data sets share many users but few user features. Because existing longitudinal federal learning protects privacy through secure multi-party computation and cryptography, all information to be protected must be encrypted before transmission during the parties' interaction, which consumes large amounts of computing resources and time and keeps modeling efficiency low. To overcome these technical problems, the application provides a longitudinal federal learning modeling scheme that can improve the efficiency of longitudinal federal learning modeling.
In the longitudinal federal learning modeling scheme of the present application, the system framework adopted may be as shown in fig. 1 and may specifically include a business party 01 and a number of data parties 02 that establish communication connections with the business party 01, where the business party stores labels and the data parties do not.
In the application, when the business party 01 executes the steps of the longitudinal federal learning modeling method, it may obtain the target label required for longitudinal federal learning modeling, decompose the target label to obtain target sub-labels, and distribute the target sub-labels to each data party corresponding to the target label, so that each data party carries out longitudinal federal learning modeling based on its allocated target sub-labels and its local data. Furthermore, the business party 01 may also maintain a target-label database and a target-sub-label database. The target-label database stores various labels, such as labels of goods or labels of goods purchased by users; the target-sub-label database stores the target sub-labels obtained after decomposition. In addition, the business party 01 may respond to longitudinal federal learning modeling requests from one or more data parties 02.
Fig. 2 is a flowchart of a longitudinal federal learning modeling method according to an embodiment of the present application. Referring to fig. 2, the longitudinal federal learning modeling method may include:
step S101: and obtaining target labels required by longitudinal federal learning modeling.
In this embodiment, the business party stores labels, and the labels needed for a given round of longitudinal federal learning modeling with the data parties may be only a subset of them. The business party therefore first obtains the target labels it needs, for example by exchanging information with the data parties to determine which labels are required. The types of business party, data party and label can be chosen according to actual needs: the data party can be a user client, the business party a server providing the corresponding service to the user, and the label a label tied to that service. Taking a server that handles commodity transactions as an example, the label can be a commodity attribute, a transaction amount of the commodity, and so on; taking a server that provides information browsing as an example, the label can be an abstract, a keyword of the information, and so on.
Step S102: and decomposing the target label to obtain a target sub-label.
In this embodiment, after obtaining the target label, the business party does not encrypt it and transmit the encrypted label to the data parties. Instead, it decomposes the target label into target sub-labels, each of which carries only part of the target label's information and describes the target label from a specific angle; a single target sub-label reflects only partial information of the target label, while the full set of target sub-labels collectively reflects its complete information.
It can be appreciated that, in the process of decomposing the target label into sub-labels, Fourier decomposition may be applied to the target label. Because Fourier decomposition requires the data to be as stationary as possible, the target label can first be sorted to obtain a smooth, ordered label, and the ordered label is then decomposed into the target sub-labels. The sorting methods applied to the target label may include descending-order sorting, ascending-order sorting, rectangular-wave sorting and the like. Of course, the target label may also be decomposed by other methods, which are not specifically limited here.
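As an illustration of this sort-then-decompose step, the following is a minimal Python sketch under stated assumptions: it is not the patented implementation, NumPy's FFT is chosen only as one concrete Fourier decomposition, and the convention of one sub-label per frequency bin (and the function name decompose_target_label) is a hypothetical choice for illustration.

```python
import numpy as np

def decompose_target_label(labels):
    """Sort the target label vector (descending here, for stationarity),
    then Fourier-decompose the sorted sequence so that each frequency
    bin yields one real-valued sub-label signal."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(labels)[::-1]           # descending-order sort
    sorted_labels = labels[order]
    coeffs = np.fft.fft(sorted_labels)
    n = len(sorted_labels)
    sub_labels = []
    for j in range(n):
        one_bin = np.zeros(n, dtype=complex)
        one_bin[j] = coeffs[j]                 # keep a single frequency bin
        sub_labels.append(np.fft.ifft(one_bin).real)
    # Summing all rows of sub_labels reconstructs sorted_labels exactly.
    return order, np.stack(sub_labels)
```

Each row of the returned array describes the sorted label curve from one frequency "angle" only; no single row reveals the label values themselves.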
Step S103: and distributing the target sub-label to each data party corresponding to the target label, so that each data party carries out longitudinal federal learning modeling based on the distributed target sub-label and the local data of the data party.
In this embodiment, once the target label has been decomposed into target sub-labels, the target sub-labels can be distributed to each data party corresponding to the target label. Because only the complete collection of target sub-labels reflects the full information of the target label, a data party cannot recover the target label from the sub-labels allocated to it. The target label therefore needs no encryption, yet no data party can obtain all of its information: in the subsequent longitudinal federal learning modeling based on the allocated target sub-labels and the data party's local data, each data party works with partial target-label information only.
It should be noted that, when performing longitudinal federal learning modeling based on the allocated target sub-labels and its local data, a data party may build one model for each target sub-label and the corresponding local data, and train each model with its target sub-label as the fitting direction; reference may be made to fig. 3, and the application is not specifically limited here.
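A hedged sketch of this per-sub-label training on the data-party side follows; the use of scikit-learn's GradientBoostingRegressor is an assumption, since the patent does not fix the local model family, and the helper name fit_local_models is hypothetical.

```python
from sklearn.ensemble import GradientBoostingRegressor

def fit_local_models(X_local, received_sub_labels):
    """Train one regression model per received target sub-label on the
    data party's local features only, and return the fitted sub-labels
    (in-sample predictions) to send back to the business party."""
    models, fitted = [], []
    for sub_label in received_sub_labels:      # each entry: one sub-label signal
        model = GradientBoostingRegressor()
        model.fit(X_local, sub_label)          # the sub-label is the training target
        models.append(model)
        fitted.append(model.predict(X_local))
    return models, fitted
```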
In this method, after obtaining the target label required for longitudinal federal learning modeling, the business party does not encrypt it; instead it decomposes the label into target sub-labels describing the target label from different angles and distributes them to the data parties corresponding to the target label. Since only the full set of target sub-labels describes the target label comprehensively, each data party obtains target-label information from one specific angle and never the complete label, so the computation and time cost of transmitting an encrypted target label is avoided and modeling efficiency is improved.
Fig. 4 is a flowchart of a specific longitudinal federal learning modeling method according to an embodiment of the present application. Referring to fig. 4, the longitudinal federal learning modeling method may include:
step S201: and obtaining target labels required by longitudinal federal learning modeling.
Step S202: and decomposing the target label to obtain a sub-label.
In this embodiment, when decomposing the target label to obtain the target sub-labels, the business party first decomposes the target label into candidate sub-labels. Decomposition can produce a large number of sub-labels, and if all of them were used as target sub-labels, each data party would later receive many sub-labels, which hinders fast execution of longitudinal federal learning modeling. To avoid this situation and keep the longitudinal federal learning modeling method as efficient as possible, only some of the sub-labels are selected as target sub-labels to participate in the subsequent modeling.
Step S203: and sequencing the sub-labels according to the arrangement mode of the normalized amplitude descending order to obtain sequencing sub-labels.
In this embodiment, a single sub-label carries only part of the target label's information, and a sub-label's amplitude reflects how much target-label information it carries. So that the target sub-labels selected later carry as much target-label information as possible, the sub-labels are sorted in descending order of normalized amplitude to obtain the sorted sub-labels, and the target sub-labels are then determined from the sorted sub-labels.
Step S204: based on the energy loss calculation method, a sub-tag number value that causes the target sub-tag to be similar to the target tag is determined.
Step S205: and selecting the sub-label with the previous sub-label quantity value from the ordered sub-labels as a target sub-label.
In this embodiment, because the sub-labels are sorted in descending order of normalized amplitude, sub-labels near the front of the sorted list carry more target-label information than those near the back, so a certain number of leading sub-labels can be determined as the target sub-labels. Since the target sub-labels are not all of the sub-labels, part of the target label's information is lost; to avoid losing too much, the number of sub-labels in the target sub-labels must be limited so that the target sub-labels still carry enough target-label information, that is, so that the target sub-labels remain similar to the target label. Because energy loss can characterize the difference between two pieces of data, the application determines, based on an energy-loss calculation, the sub-label quantity value for which the target sub-labels are similar to the target label, and then selects that number of leading sub-labels from the sorted sub-labels as the target sub-labels.
It can be understood that, in determining the number of sub-tags value that makes the target sub-tag similar to the target tag based on the energy loss calculation method, in order to quickly determine the number of sub-tags value, a first loss value of ordering the sub-tags and the target tag may be calculated by a first calculation formula based on an MSE (mean-square error) loss calculation method; determining a sub-tag number value which enables the first loss value to be smaller than a first preset value, wherein the first preset value is a critical value for judging that the target sub-tag is similar to the target tag;
wherein the first calculation formula includes:

$$\beta=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-a\,Y_i'\right)^2,\qquad a=\frac{\sum_{i=1}^{n}Y_i\,Y_i'}{\sum_{i=1}^{n}\left(Y_i'\right)^2}$$

where β represents the first loss value; Y_i represents the i-th tag in the target tags; n represents the total number of tags in the target label; Y'_i represents the value reconstructed for the i-th tag from its sub-labels; and k represents the number of sub-labels obtained by decomposing the i-th tag, so that Y'_i is the sum of those k sub-labels.

It should be noted that a preliminary MSE calculation of the loss between the sorted sub-labels and the target label gives

$$E(a)=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-a\,Y_i'\right)^2$$

This expression reflects the loss caused by the specific values of the label. However, since the target label may take special values such as 0 and 1, the arrangement trend of those values also contributes to the loss: viewed as signals, the two sequences can be identical in shape but different in amplitude. The amplitude parameter a therefore needs to be constrained, namely by setting dE(a)/da = 0, which yields the value of a given above; substituting it back into E(a) produces the first calculation formula for the first loss between the sorted sub-labels and the target label.
It can be understood that this embodiment assumes each tag is decomposed into the same number of sub-labels and that each tag contributes the same number of target sub-labels, so the total number of sub-labels is n×k. In practical applications, the decomposition of each tag may be determined according to actual needs; for example, different tags may be decomposed into different numbers of sub-labels and contribute different numbers of target sub-labels.
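As a worked illustration of steps S203 to S205, the sketch below ranks the sub-labels by normalized amplitude and grows k until the first loss β falls below the first preset value. The amplitude-matching coefficient a follows the dE(a)/da = 0 derivation above; the helper name and the incremental thresholding loop are assumptions for illustration, not the patented procedure.

```python
import numpy as np

def select_target_sub_labels(sub_labels, sorted_labels, first_preset_value):
    """Rank sub-label signals by normalized amplitude (descending) and keep
    the smallest k whose amplitude-matched partial reconstruction has a
    first loss beta below the preset threshold."""
    amplitudes = np.linalg.norm(sub_labels, axis=1)
    amplitudes = amplitudes / amplitudes.sum()         # normalized amplitudes
    ranked = sub_labels[np.argsort(amplitudes)[::-1]]  # descending order
    partial = np.zeros_like(sorted_labels)
    for k in range(1, len(ranked) + 1):
        partial = partial + ranked[k - 1]
        a = (sorted_labels @ partial) / (partial @ partial)   # dE(a)/da = 0
        beta = np.mean((sorted_labels - a * partial) ** 2)    # first loss value
        if beta < first_preset_value:
            return ranked[:k], k
    return ranked, len(ranked)                         # fall back to all sub-labels
```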
Step S206: and distributing the target sub-label to each data party corresponding to the target label, so that each data party carries out longitudinal federal learning modeling based on the distributed target sub-label and the local data of the data party.
Fig. 5 is a flowchart of a specific longitudinal federal learning modeling method according to an embodiment of the present application. Referring to fig. 5, the longitudinal federal learning modeling method may include:
Step S301: and obtaining target labels required by longitudinal federal learning modeling.
Step S302: and decomposing the target label to obtain a sub-label.
Step S303: and sequencing the sub-labels according to the arrangement mode of the normalized amplitude descending order to obtain sequencing sub-labels.
Step S304: based on the energy loss calculation method, a sub-tag number value that causes the target sub-tag to be similar to the target tag is determined.
Step S305: and selecting the sub-label with the previous sub-label quantity value from the ordered sub-labels as a target sub-label.
Step S306: and determining an allocation method for enabling the target sub-label allocated by each data party to be dissimilar to the target label based on the energy loss calculation method.
Step S307: and distributing the target sub-labels to all the data parties according to the distribution method, so that all the data parties perform longitudinal federal learning modeling based on the distributed target sub-labels and the local data of the data parties.
In this embodiment, to prevent a data party from maliciously recovering the target label, it must be guaranteed that no data party can infer the target label from the target sub-labels allocated to it; that is, each party's allocated target sub-labels must be dissimilar to the target label. Therefore, when distributing the target sub-labels to the data parties corresponding to the target label, the business party may first determine, based on the energy-loss calculation method, an allocation method under which every data party's allocated target sub-labels are dissimilar to the target label, and then allocate the target sub-labels to the data parties according to that method.
When determining, based on the energy-loss calculation method, an allocation method that keeps each data party's allocated target sub-labels dissimilar to the target label, a second loss value between a data party's allocated target sub-labels and the target label can be calculated by a second calculation formula based on the MSE loss calculation method, and the target sub-label quantity value is determined so that the second loss value is larger than a second preset value, the second preset value being the critical value for judging that the target sub-labels allocated to a data party are similar to the target label;

wherein the second calculation formula includes:

$$\alpha=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-a\,Y_i'\right)^2$$

where α represents the second loss value; Y_i represents the i-th tag in the target tags; n represents the total number of tags in the target label; Y'_i represents the value reconstructed for the i-th tag from the target sub-labels allocated to the data party; m represents the number of target sub-labels allocated to the data party for the i-th tag (Y'_i is the sum of those m sub-labels); and a is the amplitude-matching coefficient defined as before.
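The following is a sketch of this allocation check under the assumption of a simple round-robin deal (the patent leaves the concrete allocation strategy open): every party's share must keep its second loss α above the second preset value, i.e. remain dissimilar to the target label. The helper name and the round-robin split are hypothetical.

```python
import numpy as np

def allocate_sub_labels(target_sub_labels, sorted_labels,
                        num_parties, second_preset_value):
    """Deal the target sub-labels across the data parties (round-robin as an
    illustrative choice) and verify that no single party's share can
    approximate the target label."""
    shares = [target_sub_labels[p::num_parties] for p in range(num_parties)]
    for p, share in enumerate(shares):
        recon = share.sum(axis=0)                          # best guess from the share
        a = (sorted_labels @ recon) / (recon @ recon)      # amplitude matching
        alpha = np.mean((sorted_labels - a * recon) ** 2)  # second loss value
        if alpha <= second_preset_value:
            raise ValueError(f"party {p} share too similar to the target label")
    return shares
```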
Fig. 6 is a flowchart of a specific longitudinal federal learning modeling method according to an embodiment of the present application. Referring to fig. 6, the longitudinal federal learning modeling method may include:
step S401: and obtaining target labels required by longitudinal federal learning modeling.
Step S402: and decomposing the target label to obtain a sub-label.
Step S403: and sequencing the sub-labels according to the arrangement mode of the normalized amplitude descending order to obtain sequencing sub-labels.
Step S404: based on the energy loss calculation method, a sub-tag number value that causes the target sub-tag to be similar to the target tag is determined.
Step S405: and selecting the sub-label with the previous sub-label quantity value from the ordered sub-labels as a target sub-label.
Step S406: and determining an allocation method for enabling the target sub-label allocated by each data party to be dissimilar to the target label based on the energy loss calculation method.
Step S407: and distributing the target sub-labels to all the data parties according to the distribution method, so that all the data parties perform longitudinal federal learning modeling based on the distributed target sub-labels and the local data of the data parties.
Step S408: and obtaining fitting sub-labels obtained after longitudinal federal learning modeling is carried out on each data party.
Step S409: and performing longitudinal federal learning modeling based on the fitting sub-label and the target label.
In this embodiment, the business party can perform longitudinal federal learning modeling neither with the data parties' local data, which it never sees, nor with the target sub-labels once they have been allocated according to the allocation method. Therefore, after each data party has performed longitudinal federal learning modeling based on its allocated target sub-labels and local data, the business party obtains the fitting sub-labels, which carry the data parties' local data information, and performs longitudinal federal learning modeling based on the fitting sub-labels and the target label; the process may refer to fig. 7.
When performing longitudinal federal learning modeling based on the fitting sub-labels and the target label, the business party may, to ensure that the resulting model meets the business requirements, carry out the modeling according to the gradient lifting (gradient boosting) method.
The calculation formula of gradient lifting is as follows:

$$y_i'=y_i-F_{m-1}(x_i)$$

wherein y'_i represents the next gradient-lifting direction of the model built by the business party; y_i represents the corresponding value of the fitting sub-label; and F_{m-1}(x_i) represents the output value of the business party's model before the m-th round, i.e. after m−1 rounds.
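The residual rule above is ordinary gradient boosting with squared loss. A minimal sketch of the business-party aggregation might look as follows; the choice of shallow decision trees and of stacking the fitting sub-labels column-wise as input features are assumptions for illustration, not the patented configuration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_on_fitted_sub_labels(fitted_sub_labels, target, rounds=50, lr=0.1):
    """Business-party gradient lifting: each round fits a shallow tree to the
    residual y_i - F_{m-1}(x_i), using the data parties' fitted sub-labels
    (stacked column-wise) as input features."""
    X = np.column_stack(fitted_sub_labels)
    target = np.asarray(target, dtype=float)
    F = np.full(len(target), target.mean())   # F_0: constant initial model
    trees = []
    for _ in range(rounds):
        residual = target - F                 # next gradient-lifting direction
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residual)
        F = F + lr * tree.predict(X)
        trees.append(tree)
    return trees, F
```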
The longitudinal federal learning modeling method provided in this embodiment may include the following steps when applied to a data party:
receiving a target sub-label distributed by a service party;
performing longitudinal federal learning modeling based on the distributed target sub-labels and the local data of the data party;
wherein, the business side stores the label, the data side does not store the label; the business side obtains target labels required by longitudinal federal learning modeling, decomposes the target labels to obtain target sub-labels, and distributes the target sub-labels to all data sides corresponding to the target labels.
In this embodiment, the corresponding steps of the longitudinal federal learning modeling method performed by the data side may refer to the above embodiments, and are not described herein.
The technical scheme in the application is described below by taking a service party as a server for providing commodity transaction and a data party as a user client as an example. The process of data interaction between the server and the user client may be as follows:
the server acquires a target label required by longitudinal federal learning modeling;
the server decomposes the target label to obtain sub-labels;
the server sorts the sub-labels according to the arrangement mode of the normalized amplitude descending order to obtain sorted sub-labels;
the server determines a sub-label quantity value which enables the target sub-label to be similar to the target label based on an energy loss calculation method;
the server selects the sub-label with the front sub-label quantity value from the sorting sub-labels as a target sub-label;
the server determines an allocation method for enabling target sub-labels allocated by each user client to be dissimilar to the target labels based on an energy loss calculation method;
the server distributes the target sub-labels to each user client according to a distribution method;
each user client performs longitudinal federal learning modeling based on the distributed target sub-labels and the local data of the user client;
each user client transmits the fitting sub-label obtained after the user client performs longitudinal federal learning modeling to a server;
And the server carries out longitudinal federal learning modeling based on the fitting sub-label and the target label.
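Under the same assumptions as the earlier sketches, this server/user-client exchange can be simulated end to end in a single process. All helper names (decompose_target_label, select_target_sub_labels, allocate_sub_labels, fit_local_models, boost_on_fitted_sub_labels) are the hypothetical functions defined above, and the data and thresholds are arbitrary demo values.

```python
import numpy as np

# Single-process simulation of the server/user-client interaction,
# reusing the hypothetical helpers sketched in the preceding sections.
rng = np.random.default_rng(0)
X_parties = [rng.normal(size=(256, 5)) for _ in range(2)]  # local features per client
target = rng.normal(size=256)                              # business party's label

order, subs = decompose_target_label(target)
sorted_target = np.asarray(target, dtype=float)[order]
kept, k = select_target_sub_labels(subs, sorted_target, first_preset_value=0.05)
shares = allocate_sub_labels(kept, sorted_target, num_parties=2,
                             second_preset_value=1e-3)
fitted = []
for X_local, share in zip(X_parties, shares):
    _, fits = fit_local_models(X_local[order], share)      # align rows to sort order
    fitted.extend(fits)
trees, F = boost_on_fitted_sub_labels(fitted, sorted_target)
```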
Referring to fig. 8, the embodiment of the present application further discloses a longitudinal federal learning modeling apparatus, which is applied to a service party, and includes:
the tag acquisition module 101 is used for acquiring a target tag required by longitudinal federal learning modeling;
the tag decomposition module 102 is configured to decompose the target tag to obtain a target sub-tag;
the tag allocation module 103 is configured to allocate the target sub-tag to each data party corresponding to the target tag, so that each data party performs longitudinal federal learning modeling based on the allocated target sub-tag and local data of the data party;
wherein, the business side stores the label, and the data side does not store the label.
Therefore, in the application, after the business party obtains the target label required for longitudinal federal learning modeling, it does not encrypt the target label; it decomposes the label into target sub-labels that describe the target label from different angles and distributes them to the data parties corresponding to the target label. Since only the full set of target sub-labels describes the target label comprehensively, each data party can obtain target-label information from a specific angle only and cannot obtain all of the target label's information; that is, the data parties perform longitudinal federal learning modeling without being able to learn the target label. This avoids the heavy computation and time consumption caused by transmitting an encrypted target label and improves modeling efficiency.
In some embodiments, the tag resolution module may be specifically configured to: decomposing the target label to obtain a sub-label; sequencing the sub-labels according to the arrangement mode of the normalized amplitude descending order to obtain sequencing sub-labels; determining a sub-tag number value that makes the target sub-tag similar to the target tag based on an energy loss calculation method; and selecting the sub-label with the previous sub-label quantity value from the ordered sub-labels as a target sub-label.
In some embodiments, the tag resolution module may be specifically configured to: calculating a first loss value of the sequencing sub-tag and the target tag through a first calculation formula based on an MSE loss calculation method; determining a sub-tag number value which enables the first loss value to be smaller than a first preset value, wherein the first preset value is a critical value for judging that the target sub-tag is similar to the target tag;
the first calculation formula includes:

$$\beta=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-a\,Y_i'\right)^2,\qquad a=\frac{\sum_{i=1}^{n}Y_i\,Y_i'}{\sum_{i=1}^{n}\left(Y_i'\right)^2}$$

where β represents the first loss value; Y_i represents the i-th tag in the target tags; n represents the total number of tags in the target label; Y'_i represents the value reconstructed for the i-th tag from its sub-labels; and k represents the number of sub-labels obtained by decomposing the i-th tag in the target label.
In some embodiments, the tag allocation module may be specifically configured to allocate the target sub-tag to each data party corresponding to the target tag, including: determining an allocation method for enabling target sub-labels obtained by allocation of all data parties to be dissimilar to the target labels based on an energy loss calculation method; and distributing the target sub-labels to each data party according to a distribution method.
In some embodiments, the tag resolution module may be specifically configured to: and carrying out Fourier decomposition on the target label to obtain a sub-label.
In some embodiments, the tag resolution module may be specifically configured to: sorting the target labels to obtain stable sorting labels; and decomposing the ordered labels to obtain target sub-labels.
In some embodiments, the tag resolution module may be specifically configured to: sorting the target labels based on a target sorting method to obtain stable sorting labels, wherein the target sorting method comprises the following steps: descending order sorting, ascending order sorting, rectangular wave sorting.
In some embodiments, the longitudinal federal learning modeling apparatus may further include:
the fitting sub-label obtaining module is used for obtaining fitting sub-labels obtained after longitudinal federal learning modeling is carried out on each data party after the target sub-labels are distributed to each data party corresponding to the target labels by the label distribution module; and performing longitudinal federal learning modeling based on the fitting sub-label and the target label.
In some embodiments, the fit sub-tag acquisition module may be specifically configured to: and performing longitudinal federal learning modeling based on the fitting sub-label and the target label according to the gradient lifting method.
Further, the embodiment of the application also provides electronic equipment. Fig. 9 is a block diagram of an electronic device 20, according to an exemplary embodiment, and the contents of the diagram should not be construed as limiting the scope of use of the present application in any way.
Fig. 9 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement the relevant steps in the longitudinal federal learning modeling method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be a server.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk or an optical disk; the resources stored thereon may include an operating system 221, a computer program 222, data 223 and the like, and the storage may be temporary or permanent.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, so that the processor 21 can operate on and process the data 223 in the memory 22; it may be Windows Server, Netware, Unix, Linux and the like. The computer program 222 may further include, in addition to the computer program capable of performing the longitudinal federal learning modeling method executed by the electronic device 20 as disclosed in any of the preceding embodiments, a computer program capable of performing other specific tasks. The data 223 may include various tag data collected by the electronic device 20.
Further, the embodiment of the application also discloses a storage medium, wherein the storage medium stores a computer program, and when the computer program is loaded and executed by a processor, the steps of the longitudinal federal learning modeling method disclosed in any one of the previous embodiments are realized.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A longitudinal federal learning modeling method, applied to a business party, comprising:
acquiring a target label required by longitudinal federal learning modeling;
decomposing the target label to obtain a target sub-label;
distributing the target sub-label to each data party corresponding to the target label, so that each data party carries out longitudinal federal learning modeling based on the distributed target sub-label and local data of the data party;
wherein, the business side stores a label, and the data side does not store the label;
the decomposing the target label to obtain a target sub-label includes:
Decomposing the target label to obtain a sub-label;
sequencing the sub-labels according to the arrangement mode of the normalized amplitude descending order to obtain sequencing sub-labels;
determining a sub-tag number value that causes the target sub-tag to be similar to the target tag based on an energy loss calculation method;
and selecting the sub-label with the sub-label quantity value as the target sub-label from the sorting sub-labels.
2. The method of claim 1, wherein the determining a sub-tag number value that causes the target sub-tag to be similar to the target tag based on the energy loss calculation method comprises:
calculating a first loss value of the sequencing sub-tag and the target tag through a first calculation formula based on an MSE loss calculation method;
determining the number of sub-tags such that the first loss value is less than a first preset value, the first preset value being a critical value for determining that the target sub-tag is similar to the target tag;
the first calculation formula includes:

$$\beta=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-a\,Y_i'\right)^2,\qquad a=\frac{\sum_{i=1}^{n}Y_i\,Y_i'}{\sum_{i=1}^{n}\left(Y_i'\right)^2}$$

wherein β represents the first loss value; Y_i represents the i-th tag in the target tags; n represents the total number of tags in the target label; Y'_i represents the value reconstructed for the i-th tag from the sub-labels obtained by decomposing it; and k represents the number of sub-labels obtained by decomposing the i-th tag in the target label.
3. The method of claim 2, wherein the assigning the target sub-label to each data party corresponding to the target label comprises:
determining an allocation method for enabling the target sub-label obtained by allocation of each data party to be dissimilar to the target label based on the energy loss calculation method;
and distributing the target sub-label to each data party according to the distribution method.
4. A method according to any one of claims 1 to 3, wherein said decomposing the target tag to obtain sub-tags comprises:
and carrying out Fourier decomposition on the target label to obtain the sub-label.
5. The method of claim 4, wherein the decomposing the target tag to obtain a target sub-tag comprises:
sorting the target labels to obtain stable sorting labels;
and decomposing the sorting labels to obtain the target sub-labels.
6. The method of claim 5, wherein the sorting the target tags to obtain smooth sorted tags comprises:
sorting the target labels based on a target sorting method to obtain the stable sorting labels, wherein the target sorting method comprises the following steps: descending order sorting, ascending order sorting, rectangular wave sorting.
7. The method of claim 4, wherein after the assigning the target sub-label to each data party corresponding to the target label, further comprising:
obtaining fitting sub-labels obtained after longitudinal federal learning modeling is carried out on each data party;
and carrying out longitudinal federal learning modeling based on the fitting sub-label and the target label.
8. The method of claim 7, wherein the performing longitudinal federal learning modeling based on the fitted sub-tags and the target tags comprises:
and performing longitudinal federal learning modeling based on the fitting sub-label and the target label according to a gradient lifting method.
9. A longitudinal federal learning modeling method, applied to a data party, comprising:
receiving a target sub-label distributed by a service party;
Performing longitudinal federal learning modeling based on the distributed target sub-tags and the local data of the data party;
wherein, the business side stores a label, and the data side does not store the label; the business side obtains target labels required by longitudinal federal learning modeling, decomposes the target labels to obtain sub-labels, sorts the sub-labels according to a descending order of normalized amplitude, obtains sorted sub-labels, determines sub-label quantity values which enable the target sub-labels to be similar to the target labels based on an energy loss calculation method, selects the sub-labels with the former sub-label quantity values as the target sub-labels in the sorted sub-labels, and distributes the target sub-labels to all data sides corresponding to the target labels.
10. A longitudinal federal learning modeling apparatus, for use with a business party, comprising:
the label acquisition module is used for acquiring a target label required by longitudinal federal learning modeling;
the label decomposing module is used for decomposing the target label to obtain a target sub-label;
the label distribution module is used for distributing the target sub-label to each data party corresponding to the target label so that each data party carries out longitudinal federal learning modeling based on the distributed target sub-label and local data of the data party;
Wherein, the business side stores a label, and the data side does not store the label;
the tag decomposition module is specifically configured to: decomposing the target label to obtain a sub-label; sequencing the sub-labels according to the arrangement mode of the normalized amplitude descending order to obtain sequencing sub-labels; determining a sub-tag number value that causes the target sub-tag to be similar to the target tag based on an energy loss calculation method; and selecting the sub-label with the sub-label quantity value as the target sub-label from the sorting sub-labels.
11. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the longitudinal federal learning modeling method according to any of claims 1 to 9 when executing the computer program.
12. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the steps of the longitudinal federal learning modeling method according to any of claims 1 to 9.
CN202110417898.2A 2021-04-19 2021-04-19 Longitudinal federal learning modeling method, device, equipment and computer medium Active CN113723621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110417898.2A CN113723621B (en) 2021-04-19 2021-04-19 Longitudinal federal learning modeling method, device, equipment and computer medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110417898.2A CN113723621B (en) 2021-04-19 2021-04-19 Longitudinal federal learning modeling method, device, equipment and computer medium

Publications (2)

Publication Number Publication Date
CN113723621A CN113723621A (en) 2021-11-30
CN113723621B 2024-02-06

Family

ID=78672617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110417898.2A Active CN113723621B (en) 2021-04-19 2021-04-19 Longitudinal federal learning modeling method, device, equipment and computer medium

Country Status (1)

Country Link
CN (1) CN113723621B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062493A (en) * 2019-12-20 2020-04-24 深圳前海微众银行股份有限公司 Longitudinal federation method, device, equipment and medium based on public data
CN111241567A (en) * 2020-01-16 2020-06-05 深圳前海微众银行股份有限公司 Longitudinal federal learning method, system and storage medium based on secret sharing
CN111428884A (en) * 2020-03-30 2020-07-17 深圳前海微众银行股份有限公司 Federal modeling method, device and readable storage medium based on forward law
CN112070240A (en) * 2020-09-07 2020-12-11 清华大学 Layered federal learning framework for efficient communication and optimization method and system thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11836583B2 (en) * 2019-09-09 2023-12-05 Huawei Cloud Computing Technologies Co., Ltd. Method, apparatus and system for secure vertical federated learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062493A (en) * 2019-12-20 2020-04-24 深圳前海微众银行股份有限公司 Longitudinal federation method, device, equipment and medium based on public data
CN111241567A (en) * 2020-01-16 2020-06-05 深圳前海微众银行股份有限公司 Longitudinal federal learning method, system and storage medium based on secret sharing
CN111428884A (en) * 2020-03-30 2020-07-17 深圳前海微众银行股份有限公司 Federal modeling method, device and readable storage medium based on forward law
CN112070240A (en) * 2020-09-07 2020-12-11 清华大学 Layered federal learning framework for efficient communication and optimization method and system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
深度学习模型的中毒攻击与防御综述 (A survey of poisoning attacks and defenses for deep learning models); 陈晋音, 邹健飞, 苏蒙蒙, 张龙源; 信息安全学报 (Journal of Cyber Security), Issue 04; full text *

Also Published As

Publication number Publication date
CN113723621A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
US9576246B2 (en) Predictive modeling and data analysis in a secure shared system
CN106844372B (en) Logistics information query method and device
CN109472524B (en) Information processing method and device
EP1789893A2 (en) Methods, systems, and apparatuses for extended enterprise commerce
US10540703B2 (en) Visualizations for aiding in co-locating products based upon associations
CN112016796B (en) Comprehensive risk score request processing method and device and electronic equipment
CN107451785B (en) Method and apparatus for outputting information
Fazel-Zarandi et al. Solving a stochastic facility location/fleet management problem with logic-based Benders' decomposition
Laalaoui et al. A binary multiple knapsack model for single machine scheduling with machine unavailability
CN110516984B (en) Method and apparatus for generating delivery path information
CN108985784A (en) Method and apparatus for storing information
WO2023216494A1 (en) Federated learning-based user service strategy determination method and apparatus
CN109934427B (en) Method and device for generating item distribution scheme
Laramee et al. Challenges and unsolved problems
CN109447674A (en) Electronic device, insurance agent target service area determine method and storage medium
CN112749323A (en) Method and device for constructing user portrait
CN114398553A (en) Object recommendation method and device, electronic equipment and storage medium
CN112085378B (en) Resource allocation method, device, computer equipment and storage medium
CN113723621B (en) Longitudinal federal learning modeling method, device, equipment and computer medium
CN110619400A (en) Method and device for generating order information
CN112017062A (en) Resource limit distribution method and device based on guest group subdivision and electronic equipment
CN113379177A (en) Task scheduling system and method
CN110119784A (en) A kind of order recommended method and device
Ogiela et al. Classification of cognitive systems dedicated to data sharing
Kulkarni et al. Optimal allocation of effort to software maintenance: A queuing theory approach

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: 100000 room 221, floor 2, block C, No. 18, Kechuang 11th Street, economic and Technological Development Zone, Daxing District, Beijing

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant