WO2023005133A1 - Federated learning modeling optimization method and device, and readable storage medium and program product - Google Patents


Info

Publication number
WO2023005133A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature extraction
sample
public
target
extraction model
Application number
PCT/CN2021/141481
Other languages
French (fr)
Chinese (zh)
Inventor
何元钦
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023005133A1 publication Critical patent/WO2023005133A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning
    • G06N 20/20 — Ensemble learning

Definitions

  • the present application relates to the technical field of artificial intelligence in financial technology (Fintech), and in particular to a federated learning modeling optimization method, device, readable storage medium, and program product.
  • one of the prerequisites for horizontal federated learning is that the samples of the participants must be aligned in feature space.
  • such feature alignment usually requires the participants' samples to share the same data modality, for example that every participant's samples are images, or that they are all text. If the modalities differ, horizontal federated learning cannot be performed among those participants. Existing horizontal federated learning can therefore only be performed by combining samples of the same data modality held by different participants, which makes it strongly limited.
  • the main purpose of this application is to provide a federated learning modeling optimization method, device, readable storage medium and program product, aiming to solve the technical problem that horizontal federated learning in the prior art is strongly limited.
  • the present application provides a federated learning modeling optimization method, which is applied to a federated server, and the federated learning modeling optimization method includes:
  • the present application also provides a federated learning modeling optimization method, the federated learning modeling optimization method is applied to participant devices, and the federated learning modeling optimization method includes:
  • each of the initial global feature extraction models is trained by knowledge distillation against the corresponding target-modality aggregated sample representation, and by contrastive learning between the initial global feature extraction models, to obtain a target global feature extraction model corresponding to each initial global feature extraction model.
  • the present application also provides a federated learning modeling optimization device, the federated learning modeling optimization device is a virtual device, and the federated learning modeling optimization device is applied to a federated server, and the federated learning modeling optimization device includes:
  • the model distribution module is used to distribute the initial global feature extraction model corresponding to each data modality to the participant device corresponding to that data modality, so that the participant device, based on its local private training samples, performs contrastive learning training between the initial global feature extraction model and its local feature extraction model, optimizes the local feature extraction model to obtain a globally optimized local feature extraction model, and then performs feature extraction on the corresponding target-modality public samples in the federated public dataset based on the globally optimized local feature extraction model, obtaining target-modality public sample representations;
  • a selective aggregation module configured to receive the target-modality public sample representations sent by each participant device, and to perform selective aggregation based on data modality over those representations, obtaining the target-modality aggregated sample representation corresponding to each data modality;
  • the training module is used to obtain the public training samples corresponding to each data modality in the federated public dataset and, based on those public training samples, to train each initial global feature extraction model by knowledge distillation against the corresponding target-modality aggregated sample representation and by contrastive learning between the initial global feature extraction models, obtaining a target global feature extraction model corresponding to each initial global feature extraction model.
  • the present application also provides a federated learning modeling optimization device; the federated learning modeling optimization device is a virtual device applied to a participant device, and it includes:
  • the receiving module is used to receive the initial global feature extraction model issued by the federated server, and to obtain local private training samples;
  • a contrastive learning training module configured to optimize the local feature extraction model by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model based on the local private training samples, obtaining a globally optimized local feature extraction model;
  • the feature extraction module is used to extract, from the federated public dataset, the target-modality public samples belonging to the data modality corresponding to the globally optimized local feature extraction model, and to perform feature extraction on those target-modality public samples based on the globally optimized local feature extraction model, obtaining target-modality public sample representations;
  • a sending module configured to send the target-modality public sample representations to the federated server, so that the federated server selectively aggregates the target-modality public sample representations of the participants based on data modality to obtain the target-modality aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality in the federated public dataset and, based on those public training samples, trains each initial global feature extraction model by knowledge distillation against the corresponding target-modality aggregated sample representation and by contrastive learning between the initial global feature extraction models, obtaining a target global feature extraction model corresponding to each initial global feature extraction model.
  • the present application also provides a federated learning modeling optimization device; the federated learning modeling optimization device is a physical device and includes a memory, a processor, and a program of the federated learning modeling optimization method that is stored in the memory and can run on the processor, where the steps of the above federated learning modeling optimization method are realized when the program is executed by the processor.
  • the present application also provides a readable storage medium on which a program for realizing the federated learning modeling optimization method is stored; when the program is executed by a processor, the steps of the above federated learning modeling optimization method are realized.
  • the present application also provides a computer program product including a computer program; when the computer program is executed by a processor, the steps of the above federated learning modeling optimization method are implemented.
  • This application provides a federated learning modeling optimization method, device, readable storage medium, and program product. Compared with the prior-art technique of performing horizontal federated learning by aligning the features of each participant, the present application first distributes the initial global feature extraction model corresponding to each data modality to the participant device corresponding to that data modality, so that the participant device, based on its local private training samples, performs contrastive learning training between the initial global feature extraction model and its local feature extraction model and optimizes the local feature extraction model, letting the local feature extraction model learn the model knowledge of the global model and yielding a globally optimized local feature extraction model. Based on the globally optimized local feature extraction model, feature extraction is then performed on the corresponding target-modality public samples in the federated public dataset to obtain target-modality public sample representations. The federated server receives the target-modality public sample representations sent by each participant device and performs selective aggregation based on data modality over them, obtaining the target-modality aggregated sample representation corresponding to each data modality, and finally trains each initial global feature extraction model by knowledge distillation against the corresponding aggregated representation and by contrastive learning between the models, so that participants holding samples of different data modalities can jointly perform horizontal federated learning, overcoming the strong limitations of horizontal federated learning in the prior art.
  • Fig. 1 is a schematic flow chart of the first embodiment of the federated learning modeling optimization method of the present application;
  • Fig. 2 is a schematic flow chart of the second embodiment of the federated learning modeling optimization method of the present application;
  • Fig. 3 is a schematic diagram of the interaction process when performing horizontal federated learning modeling in the federated learning modeling optimization method of the present application;
  • Fig. 4 is a schematic diagram of the device structure of the hardware operating environment involved in the federated learning modeling optimization method in the embodiment of the present application.
  • the embodiment of the present application provides a federated learning modeling optimization method, which is applied to a federated server.
  • the federated learning modeling optimization method includes:
  • Step S10: distributing the initial global feature extraction model corresponding to each data modality to the participant device corresponding to that data modality, so that the participant device, based on its local private training samples, performs contrastive learning training between the initial global feature extraction model and its local feature extraction model, optimizes the local feature extraction model to obtain a globally optimized local feature extraction model, and performs feature extraction on the corresponding target-modality public samples in the federated public dataset based on the globally optimized local feature extraction model, obtaining target-modality public sample representations;
  • the federated learning modeling optimization method is applied to horizontal federated learning, where the framework of horizontal federated learning includes a federated server and multiple participant devices. The federated server maintains a global model for each data modality, while the participant devices maintain their own local models; the participant devices together cover multiple data modalities, and each data modality corresponds to at least one participant device.
  • for example, data modality A corresponds to participant device a and participant device b, that is, participant devices a and b hold samples belonging to data modality A;
  • data modality B corresponds to participant device c, that is, participant device c holds samples belonging to data modality B.
  • each participant device holds its own local private training samples, where a local private training sample may have a corresponding local sample label, and the data modality of the local private training samples is the data modality of that participant device.
  • each participant device and the federated server hold the same federated public dataset, which contains public training samples belonging to the data modality of each participant device. The federated server maintains a corresponding initial global feature extraction model for each data modality covered by the participant devices, and each participant device maintains a local feature extraction model for its own data modality.
  • the initial global feature extraction model corresponding to each data modality is distributed to the participant devices of that data modality. Each participant device then obtains its locally held local private training samples and local feature extraction model and, by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model, prompts the local feature extraction model to learn the model knowledge of the initial global feature extraction model, optimizing the local feature extraction model into a globally optimized local feature extraction model. The participant device then extracts, from the federated public dataset, the target-modality public samples belonging to its own data modality, and uses the globally optimized local feature extraction model to perform feature extraction on those samples, mapping them to target-modality public sample representations. For the specific process by which the participant device obtains the globally optimized local feature extraction model and the target-modality public sample representations, refer to steps A10 to A30 and their refinement steps, which are not repeated here.
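The participant-side feature-extraction step above can be sketched as follows. This is a minimal illustration: the linear map standing in for the real local feature extraction model, the L2 normalization of the representations, and the helper name `extract_public_representations` are illustrative assumptions, not details fixed by the application.

```python
import numpy as np

def extract_public_representations(local_weights, public_samples, modality):
    """Encode the target-modality public samples with the (globally
    optimized) local feature extractor and package the result for upload
    to the federated server. A linear map stands in for the real model;
    each row of `public_samples` is one sample."""
    reps = public_samples @ local_weights  # (n_samples, rep_dim)
    # L2-normalize so representations live on the unit sphere, a common
    # choice when they are later compared by dot-product similarity.
    reps = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    return {"modality": modality, "representations": reps}

# A participant with 4 public samples of a 3-dim modality, 2-dim representations.
rng = np.random.default_rng(0)
payload = extract_public_representations(
    local_weights=rng.normal(size=(3, 2)),
    public_samples=rng.normal(size=(4, 3)),
    modality="image",
)
```

The payload carries both the representations and the data modality, which is what lets the federated server later group the uploads by modality.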
  • Step S20: receiving the target-modality public sample representations sent by each participant device, and performing selective aggregation based on data modality over those representations, obtaining the target-modality aggregated sample representation corresponding to each data modality;
  • the target-modality public sample representations sent by each participant device are received, and selective aggregation based on data modality is performed over them to obtain the target-modality aggregated sample representation of each data modality. Specifically, after receiving the target-modality public sample representations sent by each participant device, the representations corresponding to the same data modality are aggregated separately, yielding the target-modality aggregated sample representation corresponding to each data modality.
  • the step of performing selective aggregation based on data modality over the target-modality public sample representations, to obtain the target-modality aggregated sample representation corresponding to each data modality, includes:
  • Step S21: based on the correspondence between the participant devices and the data modalities, determining, among the target-modality public sample representations, the sample representations to be aggregated corresponding to each data modality;
  • one data modality corresponds to at least one sample representation to be aggregated. Since the target-modality public sample representations are sent to the federated server by the participant devices, they correspond one-to-one with the participant devices; each participant device has a data modality, and the data modalities of different participant devices may be the same or different.
  • based on the correspondence between the participant devices and the data modalities, the sample representations to be aggregated corresponding to each data modality are determined among the target-modality public sample representations of the participant devices. For example, suppose participant devices a1 and a2 correspond to data modality A, and participant devices b1 and b2 correspond to data modality B; then the target-modality public sample representations sent to the federated server by a1 and a2 are the sample representations to be aggregated for data modality A, and those sent by b1 and b2 are the sample representations to be aggregated for data modality B.
  • Step S22: aggregating the sample representations to be aggregated corresponding to each data modality, obtaining the target-modality aggregated sample representation corresponding to each data modality.
  • the sample representations to be aggregated corresponding to each data modality are aggregated separately to obtain the target-modality aggregated sample representation corresponding to each data modality.
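The selective aggregation can be sketched by grouping the uploaded representations by data modality and combining each group. Mean aggregation is used here purely as an illustrative choice, since the application does not fix a particular aggregation operator, and the helper name `selective_aggregate` is hypothetical.

```python
import numpy as np

def selective_aggregate(uploads):
    """Group uploaded public-sample representations by data modality and
    mean-aggregate each group into the target-modality aggregated sample
    representation. `uploads` is a list of (modality, representations)
    pairs, one per participant device; participants sharing a modality
    must cover the same public samples in the same order."""
    by_modality = {}
    for modality, reps in uploads:
        by_modality.setdefault(modality, []).append(np.asarray(reps))
    # Element-wise mean across the participants of each modality.
    return {m: np.mean(np.stack(group), axis=0)
            for m, group in by_modality.items()}

# Participants a1, a2 share modality "A"; participant c has modality "B".
uploads = [
    ("A", [[1.0, 0.0], [0.0, 1.0]]),
    ("A", [[3.0, 0.0], [0.0, 3.0]]),
    ("B", [[5.0, 5.0]]),
]
agg = selective_aggregate(uploads)
# agg["A"] → [[2., 0.], [0., 2.]]; agg["B"] → [[5., 5.]]
```

Only representations of the same modality are combined, so each data modality ends up with exactly one aggregated representation per public sample, matching the "selective" aggregation described above.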
  • Step S30: obtaining the public training samples corresponding to each data modality in the federated public dataset and, based on those public training samples, training each initial global feature extraction model by knowledge distillation against the corresponding target-modality aggregated sample representation and by contrastive learning between the initial global feature extraction models, obtaining a target global feature extraction model corresponding to each initial global feature extraction model.
  • the target-modality aggregated sample representation is used to guide the optimization of the corresponding initial global feature extraction model, so that the output of the initial global feature extraction model becomes as similar as possible to the corresponding target-modality aggregated sample representation.
  • the federated server can directly use the target-modality public samples selected by each participant device in the federated public dataset as public training samples.
  • for example, suppose participant device A selects multiple target-modality public samples of data modality a, denoted X1; participant device B selects multiple target-modality public samples of data modality a, denoted X2; and participant device C selects multiple target-modality public samples of data modality b, denoted X3. Then X1 and X2 can be directly used as the public training samples of the initial global feature extraction model corresponding to data modality a, and X3 can be directly used as the public training samples of the initial global feature extraction model corresponding to data modality b.
  • the target-modality aggregated sample representation is the aggregation result of the target-modality public sample representations corresponding to several target-modality public samples, so to a certain extent it represents the corresponding data modality rather than a single sample. Therefore, samples belonging to the data modality corresponding to an initial global feature extraction model can also be reselected from the federated public dataset as that model's public training samples, instead of necessarily using the target-modality public samples that each participant device selected in the federated public dataset as the public training samples.
  • the step of training each initial global feature extraction model by knowledge distillation against the corresponding target-modality aggregated sample representation and by contrastive learning between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model, includes:
  • Step S31: mapping the public training samples corresponding to each data modality into predicted sample representations through the corresponding initial global feature extraction models;
  • the public training samples corresponding to each data modality are respectively mapped to predicted sample representations through the corresponding initial global feature extraction models. Specifically, each public training sample is input into the initial global feature extraction model corresponding to its data modality for feature extraction, so that each public training sample is mapped into a preset sample representation space, yielding the predicted sample representation corresponding to each public training sample.
  • Step S32: calculating the knowledge distillation loss between each predicted sample representation and the corresponding target-modality aggregated sample representation, and calculating the contrastive learning loss between the predicted sample representations;
  • specifically, the knowledge distillation loss is calculated based on the similarity between each predicted sample representation and the corresponding target-modality aggregated sample representation, and the contrastive learning loss is calculated based on the similarity between the predicted sample representations.
  • the step of calculating the knowledge distillation loss between each predicted sample representation and the corresponding target-modality aggregated sample representation, and calculating the contrastive learning loss between the predicted sample representations, includes:
  • Step S321: based on the sample labels corresponding to the public training samples, selecting, among the predicted sample representations, the positive sample representation and the corresponding negative sample representations for each predicted sample representation;
  • public training samples of different data modalities that represent the same thing have the same sample label and depict different aspects of that thing. Such public training samples of different modalities constitute a public training sample group; that is, samples belonging to the same public training sample group have the same sample label, and samples not belonging to the same group have different sample labels.
  • the positive sample representation and the corresponding negative sample representations for each predicted sample representation are selected among the predicted sample representations. Specifically, based on the sample labels corresponding to the public training samples, the public training sample group of each public training sample is determined, and then for each public training sample:
  • suppose the public training samples include first public training samples belonging to a first data modality and second public training samples belonging to a second data modality.
  • for each first public training sample, among the second public training samples, the sample having the same sample label as the first public training sample is determined as the first positive sample corresponding to that first public training sample, and the samples not having the same sample label are determined as its corresponding first negative samples; the predicted sample representation corresponding to the first positive sample is taken as the positive sample representation of the predicted sample representation of that first public training sample, and the predicted sample representations corresponding to the first negative samples are taken as its negative sample representations.
  • symmetrically, for each second public training sample, among the first public training samples, the sample having the same sample label as the second public training sample is determined as the second positive sample corresponding to that second public training sample, and the samples not having the same sample label are determined as its corresponding second negative samples; the predicted sample representation corresponding to the second positive sample is taken as the positive sample representation of the predicted sample representation of that second public training sample, and the predicted sample representations corresponding to the second negative samples are taken as its negative sample representations.
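The label-based selection of positive and negative samples described above can be sketched in a few lines. The helper name `select_pairs` is hypothetical, and for simplicity it returns all same-label samples of the other modality as positives rather than a single one.

```python
def select_pairs(first_labels, second_labels):
    """For each first-modality public training sample, pick as positives
    the second-modality samples sharing its sample label (same thing,
    different modality) and as negatives those with a different label.
    Returns, per first-modality sample, the index lists
    (positives, negatives) into the second modality."""
    pairs = []
    for label in first_labels:
        positives = [j for j, l in enumerate(second_labels) if l == label]
        negatives = [j for j, l in enumerate(second_labels) if l != label]
        pairs.append((positives, negatives))
    return pairs

# Two "things" (labels 0 and 1), each present in both modalities.
pairs = select_pairs(first_labels=[0, 1], second_labels=[0, 1, 1])
# pairs[0] → ([0], [1, 2]); pairs[1] → ([1, 2], [0])
```

Applying the same function with the modalities swapped gives the symmetric second-modality selection described in the following bullet.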
  • Step S322: calculating the contrastive learning loss corresponding to each initial global feature extraction model based on the predicted sample representations and the positive and negative sample representations corresponding to each predicted sample representation;
  • specifically, for each initial global feature extraction model the following steps are performed:
  • the contrastive learning loss takes the form
  • $L_N = -\log \frac{\exp\left(f(x)^\top f(x^+)\right)}{\exp\left(f(x)^\top f(x^+)\right) + \sum_{j=1}^{N-1} \exp\left(f(x)^\top f(x_j^-)\right)}$
  • where $L_N$ is the contrastive learning loss, $N-1$ is the number of negative sample representations, $f(x)$ is the predicted sample representation, $f(x^+)$ is the positive sample representation corresponding to the predicted sample representation, and $f(x_j^-)$ is the $j$-th negative sample representation corresponding to the predicted sample representation.
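A minimal numpy version of a contrastive learning loss built from these quantities is sketched below, scoring the positive pair against the negatives with exponentiated dot products; the exact similarity function and any temperature scaling are assumptions here, not specified by the application.

```python
import numpy as np

def contrastive_loss(pred, positive, negatives):
    """InfoNCE-style contrastive learning loss for one predicted sample
    representation: the positive pair's exponentiated dot-product score
    competes against the scores of the N-1 negative representations."""
    pos_score = np.exp(pred @ positive)
    neg_scores = np.exp(np.asarray(negatives) @ pred)
    return float(-np.log(pos_score / (pos_score + neg_scores.sum())))

pred = np.array([1.0, 0.0])
# Loss is low when the positive aligns with the prediction...
loss_aligned = contrastive_loss(pred, np.array([1.0, 0.0]),
                                [np.array([0.0, 1.0])])
# ...and high when a negative aligns with it instead.
loss_mismatch = contrastive_loss(pred, np.array([0.0, 1.0]),
                                 [np.array([1.0, 0.0])])
```

Minimizing this loss therefore pulls the predicted representation toward its positive and pushes it away from its negatives, which is exactly the behavior the following steps rely on.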
  • Step S323: calculating the knowledge distillation loss corresponding to each initial global feature extraction model based on the similarity between each predicted sample representation and the corresponding target-modality aggregated sample representation.
  • specifically, the knowledge distillation loss corresponding to each initial global feature extraction model is calculated from the similarity between the predicted sample representations and the corresponding target-modality aggregated sample representations.
  • in one embodiment, the cross entropy between each predicted sample representation and the corresponding target-modality aggregated sample representation is calculated to obtain the knowledge distillation loss corresponding to each initial global feature extraction model.
  • in another embodiment, the knowledge distillation loss corresponding to each initial global feature extraction model is calculated through an L2 loss function.
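Both variants of the knowledge distillation loss mentioned above (cross entropy and L2) can be sketched as follows. Softmaxing the representations before taking the cross entropy is an assumption made here so that the cross-entropy term is well defined on arbitrary real-valued representations; the function names are hypothetical.

```python
import numpy as np

def l2_distillation_loss(pred_reps, target_reps):
    """L2 variant: mean squared distance between each predicted sample
    representation and the corresponding target-modality aggregated one."""
    diff = np.asarray(pred_reps) - np.asarray(target_reps)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def ce_distillation_loss(pred_reps, target_reps):
    """Cross-entropy variant: softmax both representations and let the
    global model match the aggregated representation's soft distribution."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(np.asarray(target_reps))   # teacher distribution
    q = softmax(np.asarray(pred_reps))     # student distribution
    return float(np.mean(-np.sum(p * np.log(q + 1e-12), axis=-1)))

pred = [[0.9, 0.1], [0.2, 0.8]]
target = [[1.0, 0.0], [0.0, 1.0]]
l2 = l2_distillation_loss(pred, target)
ce = ce_distillation_loss(pred, target)
```

Either way, the loss shrinks as the predicted representations approach the aggregated representations, which is the guidance effect described above.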
  • Step S33: optimizing each initial global feature extraction model based on its corresponding knowledge distillation loss and contrastive learning loss, obtaining each target global feature extraction model.
  • when optimizing with the contrastive learning loss, for public training samples of different data modalities that represent the same thing, each initial global feature extraction model is prompted to output sample representations whose mutual similarity is greater than a first preset similarity threshold, that is, representations that are as similar as possible; for public training samples of different data modalities that represent different things, each initial global feature extraction model is prompted to output sample representations whose mutual similarity is less than a second preset similarity threshold, that is, representations that are as dissimilar as possible. In other words, optimizing the initial global feature extraction models with the contrastive learning loss shortens the distance between each predicted sample representation and its corresponding positive sample representation, and enlarges the distance between each predicted sample representation and its corresponding negative sample representations, where the first preset similarity threshold is greater than the second preset similarity threshold.
  • the knowledge distillation loss is a cross-entropy loss or an L2 loss; optimizing with it prompts the predicted sample representation output by each initial global feature extraction model to be as close as possible to the target-modality aggregated sample representation corresponding to that model, so that the similarity between the predicted sample representation and the corresponding target-modality aggregated sample representation is greater than a preset third similarity threshold, which in turn prompts the initial global feature extraction model to learn the knowledge of its corresponding data modality.
  • each initial global feature extraction model is optimized to obtain each target global feature extraction model. Specifically, the knowledge distillation loss and the contrastive learning loss corresponding to each initial global feature extraction model are aggregated into the total model loss of that model; it is then judged whether each initial global feature extraction model meets a preset training end condition. If the condition is met, each initial global feature extraction model is taken as the corresponding target global feature extraction model; if not, each initial global feature extraction model is updated based on the model gradient computed from its total model loss, and execution returns to the step of distributing the initial global feature extraction model corresponding to each data modality to the participant devices of that data modality. The preset training end condition includes the convergence of each total model loss, or the number of iterations of each initial global feature extraction model reaching a preset iteration threshold.
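The aggregation of the two losses and the preset training end condition (loss convergence or an iteration budget) can be sketched as a small server-side loop. `train_until_done` and its toy total loss are hypothetical stand-ins for the real models, losses, and gradients.

```python
import numpy as np

def train_until_done(init_weights, loss_and_grad,
                     loss_tol=1e-6, max_iters=100, lr=0.5):
    """Server-side loop sketch: compute the total model loss (distillation
    plus contrastive terms), check the end-of-training conditions
    (loss converged, or iteration budget reached), otherwise take a
    gradient step and repeat."""
    w = np.asarray(init_weights, dtype=float)
    prev = np.inf
    for it in range(max_iters):
        loss, grad = loss_and_grad(w)
        if abs(prev - loss) < loss_tol:   # total model loss has converged
            break
        prev = loss
        w = w - lr * grad                 # update the global model
    return w, it + 1

# Toy total loss: an L2 distillation term plus a fixed contrastive term.
target = np.array([1.0, 2.0])
def loss_and_grad(w):
    distill = np.sum((w - target) ** 2)   # pulls w toward the aggregate
    contrastive = 0.1                     # stands in for the InfoNCE part
    return distill + contrastive, 2 * (w - target)

w, iters = train_until_done(np.zeros(2), loss_and_grad)
```

When neither condition is met, the real method additionally redistributes the updated models to the participant devices before the next round, which this single-process sketch omits.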
• after step S30, the method also includes:
• each of the target global feature extraction models is sent to the participant device having the data modality corresponding to that target global feature extraction model, whereupon the participant device, based on its local private training samples, performs contrastive learning training between the target global feature extraction model and its local feature extraction model and optimizes the local feature extraction model to obtain the target local feature extraction model.
• for the specific implementation process, refer to the specific content of step A10 to step A20 and the detailed steps thereof, which will not be repeated here.
• the embodiment of the present application provides a federated learning modeling optimization method. Compared with the technical means in the prior art of aligning the features of each participant to perform horizontal federated learning, the embodiment of the present application first distributes the initial global feature extraction model corresponding to each data modality to the participant devices corresponding to that data modality, so that each participant device, based on its local private training samples, performs contrastive learning training between the initial global feature extraction model and its local feature extraction model to obtain a globally optimized local feature extraction model, and uses the globally optimized local feature extraction model to extract features from the corresponding target modal public samples in the federal public data set to obtain target modal public sample representations.
• the federation server receives the target modal public sample representations sent by each participant device and selectively aggregates them by data modality to obtain the target modal aggregated sample representation corresponding to each data modality, so that the target modal public sample representations belonging to the same data modality are aggregated separately.
• each initial global feature extraction model then undergoes knowledge distillation learning training based on each target modal aggregated sample representation, together with contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model. Because each target modal public sample representation is output by a globally optimized local feature extraction model that has been optimized on the local private training samples of a participant device, each initial global feature extraction model can, through knowledge distillation, indirectly combine the samples of multiple participants of the corresponding data modality for horizontal federated learning; at the same time, the initial global feature extraction models corresponding to the different data modalities can use contrastive learning to align samples of different data modalities in the feature space, thereby achieving the purpose of indirectly combining samples of different data modalities for horizontal federated learning.
• horizontal federated learning is thus no longer limited to samples of the same data modality held by different participants, which overcomes the technical defect in the prior art that horizontal federated learning can only be performed by combining samples of the same data modality across participants, so the limitations of horizontal federated learning are reduced.
  • the federated learning modeling optimization method is applied to participant devices, and the federated learning modeling optimization method includes:
  • Step A10 receiving the initial global feature extraction model issued by the federation server, and extracting local private training samples
• the initial global feature extraction model is the target global feature extraction model in the federation server before optimization.
• the local private training samples are training sample data privately owned by the participant device, that is, the private data of the participant device.
• Step A20: based on the local private training samples, performing contrastive learning training between the initial global feature extraction model and the local feature extraction model, and optimizing the local feature extraction model to obtain a globally optimized local feature extraction model;
  • the local feature extraction model is a feature extraction model locally maintained by the participant device, and the number of local private training samples is at least 1.
• based on the local private training samples, contrastive learning training is performed between the initial global feature extraction model and the local feature extraction model to optimize the local feature extraction model and obtain a globally optimized local feature extraction model. Specifically, the initial global feature extraction model is used to perform feature extraction on all local private training samples to obtain each first sample representation, and the local feature extraction model is used to perform feature extraction on all local private training samples to obtain each second sample representation; a contrastive learning loss is then calculated based on the similarity between each first sample representation and each second sample representation, and the local feature extraction model is optimized based on the contrastive learning loss to obtain the globally optimized local feature extraction model.
  • Step A21 mapping all the local private training samples to a first sample representation through the initial global feature extraction model, and mapping all the local private training samples to a second sample representation through the local feature extraction model;
• feature extraction is performed on all local private training samples through the initial global feature extraction model, so that all local private training samples are respectively mapped to corresponding first sample representations; likewise, feature extraction is performed on all local private training samples through the local feature extraction model, so that all local private training samples are respectively mapped to corresponding second sample representations.
  • Step A22 calculating a contrastive learning loss based on the similarity between each of the first sample representations and each of the second sample representations;
  • the contrastive learning loss is calculated. Specifically, the following steps are performed for each second sample representation:
  • the step of calculating the contrastive learning loss based on the similarity between each of the first sample representations and each of the second sample representations includes:
• Step A221: taking, among the first sample representations, the sample representation that corresponds to the same local private training sample as the second sample representation as the local positive sample representation corresponding to that second sample representation;
• Step A222: taking, among the first sample representations, the sample representations that do not correspond to the same local private training sample as the second sample representation as the local negative sample representations corresponding to that second sample representation;
• Step A223: calculating the contrastive learning loss based on the similarity between each second sample representation and its corresponding local positive sample representation, and the similarity between each second sample representation and its corresponding local negative sample representations.
• the contrastive learning loss may be calculated by the following formula:
• L_N = -log [ exp(f(x)^T f(x^+)) / ( exp(f(x)^T f(x^+)) + Σ_{j=1}^{N-1} exp(f(x)^T f(x_j^-)) ) ]
• where L_N is the contrastive learning loss, N-1 is the number of local negative sample representations, f(x) is the second sample representation, f(x^+) is the corresponding local positive sample representation, and f(x_j^-) is the j-th local negative sample representation.
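A minimal numeric sketch of this contrastive loss, assuming the dot product as the similarity measure (the standard InfoNCE form; function and variable names are illustrative):

```python
import math

def contrastive_loss(second_repr, positive_repr, negative_reprs):
    """InfoNCE-style loss for one second sample representation f(x):
    the numerator uses the local positive sample representation f(x+),
    the denominator additionally sums over the N-1 local negatives."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    pos = math.exp(dot(second_repr, positive_repr))
    neg = sum(math.exp(dot(second_repr, n)) for n in negative_reprs)
    return -math.log(pos / (pos + neg))

f_x  = [1.0, 0.0]    # second sample representation
f_xp = [1.0, 0.0]    # local positive sample representation
negs = [[0.0, 1.0]]  # local negative sample representations
print(contrastive_loss(f_x, f_xp, negs))  # ≈ 0.3133
```

The loss shrinks as the second sample representation moves closer to its positive and away from its negatives, which is exactly the alignment behavior step A223 describes.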
• Step A23: optimizing the local feature extraction model based on the contrastive learning loss to obtain a globally optimized local feature extraction model.
• based on the contrastive learning loss, the local feature extraction model is optimized to obtain a globally optimized local feature extraction model. Specifically, it is judged whether the contrastive learning loss converges; if it converges, the local feature extraction model is used as the globally optimized local feature extraction model, and if it does not converge, the local feature extraction model is updated based on the model gradient calculated from the contrastive learning loss, and execution returns to the step of extracting local private training samples. This promotes the local feature extraction model to learn the model knowledge of the initial global feature extraction model, so that the outputs of the local feature extraction model and the initial global feature extraction model for the same sample are as similar as possible, realizing the global optimization of the corresponding local feature extraction model.
• the step of optimizing the local feature extraction model based on the contrastive learning loss to obtain a globally optimized local feature extraction model includes:
  • Step A231 converting each of the second sample representations into output classification labels corresponding to each of the local private training samples through a preset classification model
• the local private training samples have corresponding preset real labels, where a preset real label is an identifier of a local private training sample and can be used to represent information of the local private training sample such as category, attribute, and identity.
• each of the second sample representations is converted into the output classification label corresponding to each local private training sample. Specifically, each second sample representation is input into the preset classification model and fully connected to obtain a fully connected vector corresponding to each second sample representation; then, based on a preset activation function, each fully connected vector is converted into the output classification label corresponding to each local private training sample.
  • Step A232 calculating a classification loss based on each of the output classification labels and the preset real labels corresponding to each of the local private training samples;
• the classification loss is calculated. Specifically, the cross-entropy loss between each output classification label and the preset real label corresponding to the corresponding local private training sample is calculated, and the cross-entropy losses are then accumulated to obtain the classification loss.
• in another embodiment, step A232 includes: calculating the L2 loss between each output classification label and the preset real label corresponding to the corresponding local private training sample, and then accumulating the L2 losses to obtain the classification loss.
  • Step A233 calculating the total model loss based on the contrastive learning loss and the classification loss
• the contrastive learning loss and the classification loss are aggregated according to a preset aggregation rule to obtain the total model loss, where the preset aggregation rule includes summing or averaging.
  • Step A234 Optimizing the local feature extraction model based on the total model loss to obtain the globally optimized local feature extraction model.
• it is judged whether the total model loss converges; if it converges, the local feature extraction model is used as the globally optimized local feature extraction model, and if it does not converge, the model gradient is calculated based on the total model loss, the local feature extraction model is updated, and execution returns to the step of extracting local private training samples. This promotes the local feature extraction model to learn the model knowledge of the initial global feature extraction model, makes the outputs of the local feature extraction model and the initial global feature extraction model for the same sample as close as possible (realizing the global optimization of the corresponding local feature extraction model), and also makes the output of the local feature extraction model as close as possible to the preset real labels, which improves the performance of the local feature extraction model.
• Step A30: extracting, from the federal public data set, the target modal public samples belonging to the data modality corresponding to the globally optimized local feature extraction model, and performing feature extraction on the target modal public samples based on the globally optimized local feature extraction model to obtain target modal public sample representations;
• the target modal public samples belonging to the data modality corresponding to the globally optimized local feature extraction model are extracted from the federal public data set; that is, the public samples whose data modality matches that of the participant device are extracted from the federal public data set.
• Step A40: sending the target modal public sample representations to the federated server, so that the federated server selectively aggregates the target modal public sample representations by data modality to obtain the target modal aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality in the federal public data set, and, based on each public training sample, performs knowledge distillation learning training of each initial global feature extraction model based on each target modal aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model.
• the federated server selectively aggregates the target modal public sample representations by data modality to obtain the target modal aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality in the federal public data set, and, based on each public training sample, performs knowledge distillation learning training of each initial global feature extraction model based on each target modal aggregated sample representation. For the specific implementation process of obtaining the target global feature extraction model corresponding to each initial global feature extraction model, refer to the specific content in steps S10 to S30, which will not be repeated here.
• the participant device receives the target global feature extraction model issued by the federated server and extracts local private training samples; then, based on the local private training samples, contrastive learning training is performed between the target global feature extraction model and the local feature extraction model to optimize the local feature extraction model, so that the local feature extraction model learns the model knowledge of the target global feature extraction model, thereby obtaining the target local feature extraction model. The specific implementation process of obtaining the target local feature extraction model is the same as that of obtaining the globally optimized local feature extraction model; for details, refer to the specific content in step A10 to step A20, which will not be repeated here.
• as shown in FIG. 3, which is a schematic diagram of the interaction process when performing horizontal federated learning modeling in the federated learning optimization method of the present application: server is the federated server; Client is a participant device and N is the number of participant devices; base 1 and head 1 make up the local feature extraction model of Client 1, and base N and head N make up the local feature extraction model of Client N; classifier is the preset classification model; base ga and head ga make up the initial global feature extraction model in Client 1, that is, Model ga; base gb and head gb make up the initial global feature extraction model in Client N, that is, Model gb; Y 1 and Y N are preset real labels; X 1 is the local private training sample in Client 1 and X N is the local private training sample in Client N; X pub.a is the target modal public sample in Client 1 and X pub.b is the target modal public sample in Client N; and Z agg.a is the predicted sample representation.
• the embodiment of the present application provides a federated learning modeling optimization method: first, the initial global feature extraction model issued by the federated server is received and local private training samples are extracted; then, based on the local private training samples, contrastive learning training is performed between the initial global feature extraction model and the local feature extraction model to optimize the local feature extraction model and obtain a globally optimized local feature extraction model, which has the local feature extraction model learn the model knowledge of the global model, that is, realizes the global optimization of the local feature extraction model.
• next, the target modal public samples belonging to the data modality corresponding to the globally optimized local feature extraction model are extracted from the federal public data set, feature extraction is performed on the target modal public samples based on the globally optimized local feature extraction model to obtain target modal public sample representations, and the target modal public sample representations are sent to the federated server. The federated server selectively aggregates the target modal public sample representations by data modality to obtain the target modal aggregated sample representation corresponding to each data modality.
• in this way, each initial global feature extraction model can, through knowledge distillation, indirectly combine the samples of multiple participants of the corresponding data modality for horizontal federated learning; at the same time, the initial global feature extraction models corresponding to the different data modalities can use contrastive learning to align samples of different data modalities in the feature space, thereby achieving the purpose of indirectly combining samples of different data modalities for horizontal federated learning. Horizontal federated learning is thus no longer limited to samples of the same data modality held by different participants, which overcomes the technical defect in the prior art that horizontal federated learning can only be performed by combining samples of the same data modality across participants, so the limitations of horizontal federated learning are reduced.
  • FIG. 4 is a schematic diagram of a device structure of a hardware operating environment involved in the solution of the embodiment of the present application.
  • the federated learning modeling optimization device may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to realize connection and communication between the processor 1001 and the memory 1005 .
  • the memory 1005 can be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001 .
• optionally, the federated learning modeling optimization device may also include a user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like.
• the user interface may include a display screen (Display) and an input sub-module such as a keyboard (Keyboard); the optional user interface may also include a standard wired interface and a wireless interface.
  • the network interface may include a standard wired interface and a wireless interface (such as a WI-FI interface).
• those skilled in the art can understand that the federated learning modeling optimization device structure shown in FIG. 4 does not constitute a limitation on the federated learning modeling optimization device, and the device may include more or fewer components than shown, or combine some components, or adopt a different component arrangement.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, and a federated learning modeling optimization program.
  • the operating system is a program that manages and controls the hardware and software resources of the federated learning modeling optimization device, and supports the operation of the federated learning modeling optimization program and other software and/or programs.
  • the network communication module is used to realize the communication between various components inside the memory 1005, and communicate with other hardware and software in the federated learning modeling optimization system.
  • the processor 1001 is configured to execute the federated learning modeling optimization program stored in the memory 1005 to implement the steps of the federated learning modeling optimization method described in any one of the above.
  • the embodiment of the present application also provides a federated learning modeling optimization device, the federated learning modeling optimization device is applied to a federated server, and the federated learning modeling optimization device includes:
• the model distribution module is used to distribute the initial global feature extraction model corresponding to each data modality to the participant device corresponding to each data modality, so that the participant device, based on local private training samples, performs contrastive learning training between the initial global feature extraction model and the local feature extraction model, optimizes the local feature extraction model to obtain a globally optimized local feature extraction model, and, based on the globally optimized local feature extraction model, performs feature extraction on the corresponding target modal public samples in the federal public data set to obtain target modal public sample representations;
• a selective aggregation module configured to receive the target modal public sample representations sent by each participant device, and to selectively aggregate the target modal public sample representations by data modality to obtain the target modal aggregated sample representation corresponding to each data modality;
• the training module is used to obtain the public training samples corresponding to each data modality in the federal public data set and, based on each public training sample, to perform knowledge distillation learning training of each initial global feature extraction model based on each target modal aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model.
  • the training module is also used for:
  • each of the initial global feature extraction models is optimized to obtain each target global feature extraction model.
  • the training module is also used for:
  • the knowledge distillation losses corresponding to each of the initial global feature extraction models are calculated respectively.
  • the selective aggregation module is also used for:
• based on the correspondence between each participant device and each data modality, the sample representations to be aggregated corresponding to each data modality are determined among the target modal public sample representations sent by the participant devices;
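The modality-based selective aggregation can be sketched as follows, assuming averaging within each data modality as the aggregation rule (all names are illustrative):

```python
def selective_aggregate(representations, device_modality):
    """Group each participant device's target modal public sample
    representation by that device's data modality, then average within
    each modality to obtain the target modal aggregated sample
    representation per data modality."""
    by_modality = {}
    for device, repr_vec in representations.items():
        by_modality.setdefault(device_modality[device], []).append(repr_vec)
    aggregated = {}
    for modality, vecs in by_modality.items():
        dim = len(vecs[0])
        aggregated[modality] = [sum(v[i] for v in vecs) / len(vecs)
                                for i in range(dim)]
    return aggregated

reps = {"client1": [1.0, 3.0], "client2": [3.0, 1.0], "client3": [0.5, 0.5]}
mods = {"client1": "image", "client2": "image", "client3": "text"}
print(selective_aggregate(reps, mods))  # image → [2.0, 2.0], text → [0.5, 0.5]
```

Only representations of the same data modality are combined, which matches the "selective" aspect: each modality's aggregated representation is built exclusively from devices holding that modality.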
  • the specific implementation of the federated learning modeling optimization device of the present application is basically the same as the above embodiments of the federated learning modeling optimization method, and will not be repeated here.
  • the embodiment of the present application also provides a federated learning modeling optimization device, the federated learning modeling optimization device is applied to participant equipment, and the federated learning modeling optimization device includes:
  • the receiving module is used to receive the initial global feature extraction model issued by the federation server, and extract local private training samples;
• a contrastive learning training module configured to optimize the local feature extraction model by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model based on the local private training samples, to obtain a globally optimized local feature extraction model;
• a feature extraction module used to extract, from the federal public data set, the target modal public samples belonging to the data modality corresponding to the globally optimized local feature extraction model, and to perform feature extraction on the target modal public samples based on the globally optimized local feature extraction model to obtain target modal public sample representations;
• a sending module configured to send the target modal public sample representations to the federated server, so that the federated server selectively aggregates the target modal public sample representations by data modality to obtain the target modal aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality in the federal public data set, and, based on each public training sample, performs knowledge distillation learning training of each initial global feature extraction model based on each target modal aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model.
  • the contrastive learning training module is also used for:
  • the local feature extraction model is optimized to obtain a globally optimized local feature extraction model.
  • the contrastive learning training module is also used for:
  • the local feature extraction model is optimized to obtain the globally optimized local feature extraction model.
  • the specific implementation of the federated learning modeling optimization device of the present application is basically the same as the above embodiments of the federated learning modeling optimization method, and will not be repeated here.
  • the embodiment of the present application provides a readable storage medium, and the readable storage medium stores one or more programs, and the one or more programs can also be executed by one or more processors to implement The steps of the federated learning modeling optimization method described in any one of the above.
  • the specific implementation manner of the readable storage medium of the present application is basically the same as the above embodiments of the federated learning modeling optimization method, and will not be repeated here.
  • the embodiment of the present application provides a computer program product, and the computer program product includes one or more computer programs, and the one or more computer programs can also be executed by one or more processors to implement The steps of the federated learning modeling optimization method described in any one of the above.

Abstract

Disclosed in the present application are a federated learning modeling optimization method and device, and a readable storage medium and a program product, which are applied to a federated server. The federated learning modeling optimization method comprises: distributing initial global feature extraction models corresponding to data modalities to participant devices corresponding to those data modalities, so that the participant devices obtain globally optimized local feature extraction models on the basis of local private training samples and by means of contrastive learning training, and generate target modal public sample representations according to the globally optimized local feature extraction models; receiving the target modal public sample representations sent by the participant devices, and aggregating the target modal public sample representations into target modal aggregated sample representations; and, according to training samples selected from a public data set, optimizing each initial global feature extraction model into a corresponding target global feature extraction model by means of knowledge distillation and contrastive learning.

Description

联邦学习建模优化方法、设备、可读存储介质及程序产品Federated learning modeling optimization method, device, readable storage medium and program product
优先权信息priority information
本申请要求于2021年7月28日申请的、申请号为202110860096.9的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims priority to a Chinese patent application with application number 202110860096.9 filed on July 28, 2021, the entire contents of which are incorporated herein by reference.
技术领域technical field
本申请涉及金融科技(Fintech)的人工智能技术领域,尤其涉及一种联邦学习建模优化方法、设备、可读存储介质及程序产品。The present application relates to the technical field of artificial intelligence in financial technology (Fintech), and in particular to a federated learning modeling optimization method, equipment, readable storage medium and program product.
背景技术Background technique
随着金融科技,尤其是互联网科技金融的不断发展,越来越多的技术(如分布式、人工智能等)应用在金融领域,但金融业也对技术提出了更高的要求,如对金融业对应待办事项的分发也有更高的要求。With the continuous development of financial technology, especially Internet technology finance, more and more technologies (such as distributed, artificial intelligence, etc.) The industry also has higher requirements for the distribution of to-do items.
随着计算机软件和人工智能、大数据云服务应用的不断发展,目前,在现有的横向联邦学习框架中,横向联邦学习的前提之一为各参与方的样本需要在特征本身上进行对齐,而在特征本身上进行对齐的前提通常要求各参与方的样本处于同一数据模态,例如,各参与方的样本均为图像或者均为文字等,但是,当各参与方的样本的数据模态不同时,则各参与方之间无法进行横向联邦学习,所以,现有的横向联邦学习只能联合不同参与方中同一数据模态的样本进行,现有的横向联邦学习的局限性较强。With the continuous development of computer software, artificial intelligence, and big data cloud service applications, at present, in the existing horizontal federated learning framework, one of the prerequisites for horizontal federated learning is that the samples of each participant need to be aligned on the features themselves. The premise of alignment on the feature itself usually requires the samples of each participant to be in the same data mode, for example, the samples of each participant are all images or text, etc. If it is different, horizontal federated learning cannot be performed among the participants. Therefore, the existing horizontal federated learning can only be performed by combining samples of the same data mode in different participants, and the existing horizontal federated learning has strong limitations.
Summary
The main purpose of the present application is to provide a federated learning modeling optimization method, device, readable storage medium, and program product, aiming to solve the technical problem in the prior art that horizontal federated learning is strongly limited.
To achieve the above purpose, the present application provides a federated learning modeling optimization method. The method is applied to a federation server and includes:
distributing the initial global feature extraction model corresponding to each data modality to the participant devices corresponding to that data modality, so that each participant device, based on its local private training samples, performs contrastive learning training between the initial global feature extraction model and its local feature extraction model to optimize the local feature extraction model and obtain a globally optimized local feature extraction model, and, based on the globally optimized local feature extraction model, performs feature extraction on the corresponding target-modality public samples in the federated public dataset to obtain target-modality public sample representations;
receiving the target-modality public sample representations sent by each participant device, and performing selective aggregation based on data modality on the target-modality public sample representations to obtain a target-modality aggregated sample representation corresponding to each data modality;
obtaining public training samples corresponding to each data modality from the federated public dataset, and, based on the public training samples, performing knowledge distillation learning training on each initial global feature extraction model guided by the corresponding target-modality aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain a target global feature extraction model corresponding to each initial global feature extraction model.
The present application further provides a federated learning modeling optimization method. The method is applied to a participant device and includes:
receiving the initial global feature extraction model issued by the federation server, and obtaining local private training samples;
based on the local private training samples, performing contrastive learning training between the initial global feature extraction model and a local feature extraction model to optimize the local feature extraction model and obtain a globally optimized local feature extraction model;
extracting, from the federated public dataset, target-modality public samples belonging to the data modality corresponding to the globally optimized local feature extraction model, and performing feature extraction on the target-modality public samples based on the globally optimized local feature extraction model to obtain target-modality public sample representations;
sending the target-modality public sample representations to the federation server, so that the federation server performs selective aggregation based on data modality on the target-modality public sample representations of all participants to obtain a target-modality aggregated sample representation corresponding to each data modality, obtains public training samples corresponding to each data modality from the federated public dataset, and, based on the public training samples, performs knowledge distillation learning training on each initial global feature extraction model guided by the corresponding target-modality aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain a target global feature extraction model corresponding to each initial global feature extraction model.
The present application further provides a federated learning modeling optimization apparatus. The apparatus is a virtual apparatus applied to a federation server and includes:
a model distribution module, configured to distribute the initial global feature extraction model corresponding to each data modality to the participant devices corresponding to that data modality, so that each participant device, based on its local private training samples, performs contrastive learning training between the initial global feature extraction model and its local feature extraction model to optimize the local feature extraction model and obtain a globally optimized local feature extraction model, and, based on the globally optimized local feature extraction model, performs feature extraction on the corresponding target-modality public samples in the federated public dataset to obtain target-modality public sample representations;
a selective aggregation module, configured to receive the target-modality public sample representations sent by each participant device and perform selective aggregation based on data modality on them to obtain a target-modality aggregated sample representation corresponding to each data modality;
a training module, configured to obtain public training samples corresponding to each data modality from the federated public dataset and, based on the public training samples, perform knowledge distillation learning training on each initial global feature extraction model guided by the corresponding target-modality aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain a target global feature extraction model corresponding to each initial global feature extraction model.
The present application further provides a federated learning modeling optimization apparatus. The apparatus is a virtual apparatus applied to a participant device and includes:
a receiving module, configured to receive the initial global feature extraction model issued by the federation server and obtain local private training samples;
a contrastive learning training module, configured to optimize the local feature extraction model by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model based on the local private training samples, to obtain a globally optimized local feature extraction model;
a feature extraction module, configured to extract, from the federated public dataset, target-modality public samples belonging to the data modality corresponding to the globally optimized local feature extraction model, and to perform feature extraction on the target-modality public samples based on the globally optimized local feature extraction model to obtain target-modality public sample representations;
a sending module, configured to send the target-modality public sample representations to the federation server, so that the federation server performs selective aggregation based on data modality on the target-modality public sample representations of all participants to obtain a target-modality aggregated sample representation corresponding to each data modality, obtains public training samples corresponding to each data modality from the federated public dataset, and, based on the public training samples, performs knowledge distillation learning training on each initial global feature extraction model guided by the corresponding target-modality aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain a target global feature extraction model corresponding to each initial global feature extraction model.
The present application further provides a federated learning modeling optimization device. The device is a physical device including a memory, a processor, and a program of the federated learning modeling optimization method stored on the memory and executable on the processor. When the program of the federated learning modeling optimization method is executed by the processor, the steps of the federated learning modeling optimization method described above are implemented.
The present application further provides a readable storage medium storing a program implementing the federated learning modeling optimization method. When the program is executed by a processor, the steps of the federated learning modeling optimization method described above are implemented.
The present application further provides a computer program product, including a computer program. When the computer program is executed by a processor, the steps of the federated learning modeling optimization method described above are implemented.
The present application provides a federated learning modeling optimization method, device, readable storage medium, and program product. Compared with the prior-art approach of aligning participants' samples on the features themselves for horizontal federated learning, the present application first distributes the initial global feature extraction model corresponding to each data modality to the participant devices corresponding to that data modality. Based on its local private training samples, each participant device performs contrastive learning training between the initial global feature extraction model and its local feature extraction model to optimize the local feature extraction model, so that the local feature extraction model learns the model knowledge of the global model, yielding a globally optimized local feature extraction model. Based on the globally optimized local feature extraction model, the participant device performs feature extraction on the corresponding target-modality public samples in the federated public dataset to obtain target-modality public sample representations. The federation server receives the target-modality public sample representations sent by each participant device and performs selective aggregation based on data modality on them to obtain a target-modality aggregated sample representation corresponding to each data modality, thereby aggregating the representations belonging to the same data modality. The server then obtains public training samples corresponding to each data modality from the federated public dataset and, based on these public training samples, performs knowledge distillation learning training on each initial global feature extraction model guided by the corresponding target-modality aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain a target global feature extraction model corresponding to each initial global feature extraction model. Since each target-modality public sample representation is output by a globally optimized local feature extraction model obtained through optimization on a participant device's local private training samples, each initial global feature extraction model can, through knowledge distillation, indirectly combine the samples of multiple participants of the corresponding data modality for horizontal federated learning. At the same time, the initial global feature extraction models corresponding to the different data modalities can use contrastive learning to align samples of different data modalities in the feature space, thereby achieving the purpose of indirectly combining samples of different data modalities for horizontal federated learning. Horizontal federated learning is thus no longer limited to samples of the same data modality across different participants, overcoming the technical defect in the prior art that horizontal federated learning can only combine samples of the same data modality from different participants and is therefore strongly limited. The limitations of horizontal federated learning are accordingly reduced.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a first embodiment of the federated learning modeling optimization method of the present application;
FIG. 2 is a schematic flowchart of a second embodiment of the federated learning modeling optimization method of the present application;
FIG. 3 is a schematic diagram of the interaction flow during horizontal federated learning modeling in the federated learning modeling optimization method of the present application;
FIG. 4 is a schematic structural diagram of a device in the hardware operating environment involved in the federated learning modeling optimization method in the embodiments of the present application.
The realization of the purpose, functional features, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.
An embodiment of the present application provides a federated learning modeling optimization method applied to a federation server. In a first embodiment of the federated learning modeling optimization method of the present application, referring to FIG. 1, the method includes:
Step S10: distributing the initial global feature extraction model corresponding to each data modality to the participant devices corresponding to that data modality, so that each participant device, based on its local private training samples, performs contrastive learning training between the initial global feature extraction model and its local feature extraction model to optimize the local feature extraction model and obtain a globally optimized local feature extraction model, and, based on the globally optimized local feature extraction model, performs feature extraction on the corresponding target-modality public samples in the federated public dataset to obtain target-modality public sample representations;
In this embodiment, it should be noted that the federated learning modeling optimization method is applied to horizontal federated learning. The horizontal federated learning framework includes a federation server and multiple participant devices, where the federation server maintains the global models and the participant devices maintain their own local models. The participant devices collectively correspond to multiple data modalities, and each data modality corresponds to at least one participant device. For example, suppose there are data modalities A and B: data modality A corresponds to participant devices a and b, that is, participant devices a and b hold samples belonging to data modality A, while data modality B corresponds to participant device c, that is, participant device c holds samples belonging to data modality B.
It should further be noted that each participant device owns its own local private training samples, which may have corresponding local sample labels and whose data modality is the data modality corresponding to that participant device. All participant devices and the federation server share the same federated public dataset, which holds public training samples belonging to each of the data modalities corresponding to the participant devices. The federation server maintains one initial global feature extraction model for each data modality corresponding to the participant devices, while each participant device maintains one local feature extraction model for its own data modality.
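The bookkeeping described above can be sketched as follows. This is a minimal illustrative example, not taken from the patent: the names `modality_to_participants`, `global_models`, and `distribute_models` are hypothetical, and plain strings stand in for real model objects.

```python
# Each data modality maps to the participant devices holding samples of that
# modality; the federation server keeps one global feature extraction model
# per modality and distributes it to exactly those devices (step S10 dispatch).
modality_to_participants = {
    "A": ["device_a", "device_b"],  # devices a and b hold samples of modality A
    "B": ["device_c"],              # device c holds samples of modality B
}

# One initial global feature extraction model per modality (strings as stand-ins).
global_models = {m: f"init_model_{m}" for m in modality_to_participants}

def distribute_models(modality_to_participants, global_models):
    """Return which model each participant device receives."""
    dispatch = {}
    for modality, devices in modality_to_participants.items():
        for device in devices:
            dispatch[device] = global_models[modality]
    return dispatch

dispatch = distribute_models(modality_to_participants, global_models)
```

Each device thus receives only the global model of its own modality, which is the precondition for the per-modality training and aggregation in the later steps.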
The initial global feature extraction model corresponding to each data modality is distributed to the participant devices corresponding to that data modality, so that each participant device, based on its local private training samples, performs contrastive learning training between the initial global feature extraction model and its local feature extraction model, optimizes the local feature extraction model, obtains a globally optimized local feature extraction model, and, based on the globally optimized local feature extraction model, performs feature extraction on the corresponding target-modality public samples in the federated public dataset to obtain target-modality public sample representations. Specifically, the initial global feature extraction model corresponding to each data modality is distributed to the participant devices of that data modality. Each participant device then obtains its locally held local private training samples and local feature extraction model and, by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model, prompts the local feature extraction model to learn the model knowledge of the initial global feature extraction model, thereby optimizing the local feature extraction model to obtain a globally optimized local feature extraction model. The participant device then extracts, from the federated public dataset, the target-modality public samples belonging to its data modality and uses the globally optimized local feature extraction model to perform feature extraction on them, mapping the target-modality public samples to target-modality public sample representations. For the specific implementation process by which a participant device obtains the globally optimized local feature extraction model and the target-modality public sample representations, reference may be made to steps A10 to A30 and their refinement steps, which will not be repeated here.
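One common way to realize the contrastive learning training described above is an InfoNCE-style loss that treats the global model's representation of the same private sample as the positive and its representations of other samples as negatives. The sketch below is an illustrative assumption, not the patent's prescribed loss; `contrastive_loss` and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def contrastive_loss(local_reps, global_reps, temperature=0.5):
    """InfoNCE-style loss: for sample i, global_reps[i] is the positive and
    the other global representations are negatives. A low loss means the
    local model's representations agree, per sample, with the global model's."""
    loss = 0.0
    for i, z in enumerate(local_reps):
        sims = [math.exp(cosine(z, g) / temperature) for g in global_reps]
        loss += -math.log(sims[i] / sum(sims))
    return loss / len(local_reps)

# Toy representations of two private samples under the local and global models.
local = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]     # global reps close to the local ones
misaligned = [[0.0, 1.0], [1.0, 0.0]]  # positives swapped with negatives
loss_aligned = contrastive_loss(local, aligned)
loss_misaligned = contrastive_loss(local, misaligned)
```

Minimizing such a loss with respect to the local model's parameters pulls the local representations toward the global model's, which is how the local model "learns the model knowledge" of the global model.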
Step S20: receiving the target-modality public sample representations sent by each participant device, and performing selective aggregation based on data modality on the target-modality public sample representations to obtain a target-modality aggregated sample representation corresponding to each data modality;
In this embodiment, the target-modality public sample representations sent by each participant device are received, and the representations corresponding to the same data modality among them are aggregated respectively to obtain the target-modality aggregated sample representation corresponding to each data modality. For example, suppose three target-modality public sample representations X1, X2, and X3 correspond to data modality A, and three target-modality public sample representations X4, X5, and X6 correspond to data modality B; then X1, X2, and X3 are selected and aggregated to obtain the target-modality aggregated sample representation Z1 corresponding to data modality A, and X4, X5, and X6 are selected and aggregated to obtain the target-modality aggregated sample representation Z2 corresponding to data modality B.
The step of performing selective aggregation based on data modality on the target-modality public sample representations to obtain the target-modality aggregated sample representation corresponding to each data modality includes:
Step S21: determining, based on the correspondence between the participant devices and the data modalities, the to-be-aggregated sample representations corresponding to each data modality among the target-modality public sample representations;
In this embodiment, it should be noted that each data modality corresponds to at least one to-be-aggregated sample representation. Since the target-modality public sample representations are sent to the federation server by the participant devices, there is a one-to-one correspondence between the target-modality public sample representations and the participant devices; each participant device has one data modality, and the data modalities of different participant devices may be the same or different. Specifically, based on the correspondence between the participant devices and the data modalities, the to-be-aggregated sample representations corresponding to each data modality are determined among the target-modality public sample representations of the participant devices. For example, suppose participant devices a1 and a2 correspond to data modality A, and participant devices b1 and b2 correspond to data modality B; then the target-modality public sample representations sent by participant devices a1 and a2 to the federation server are all to-be-aggregated sample representations corresponding to data modality A, and those sent by participant devices b1 and b2 are all to-be-aggregated sample representations corresponding to data modality B.
Step S22: aggregating the to-be-aggregated sample representations corresponding to each data modality respectively to obtain the target-modality aggregated sample representation corresponding to each data modality.
In this embodiment, the to-be-aggregated sample representations corresponding to each data modality are aggregated respectively to obtain the target-modality aggregated sample representation corresponding to each data modality. Specifically, based on a preset aggregation rule, the to-be-aggregated sample representations corresponding to each data modality are aggregated respectively, where the preset aggregation rule includes summation, averaging, and the like. For example, suppose the two to-be-aggregated sample representations corresponding to data modality A are a1 and a2, and the two corresponding to data modality B are b1 and b2; then a1 and a2 are aggregated into the target-modality aggregated sample representation corresponding to data modality A, and b1 and b2 are aggregated into the target-modality aggregated sample representation corresponding to data modality B.
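Steps S21 and S22 can be sketched together: group each representation by the sending device's data modality, then apply the aggregation rule per group. The sketch below assumes the averaging rule and uses hypothetical names (`selective_aggregate`, `participant_modality`); real representations would be model-output vectors rather than short lists.

```python
def selective_aggregate(representations, participant_modality):
    """Selective aggregation based on data modality (steps S21 + S22):
    group each participant's public-sample representation by the modality of
    the sending device, then average element-wise within each group."""
    groups = {}
    for device, rep in representations.items():
        groups.setdefault(participant_modality[device], []).append(rep)
    aggregated = {}
    for modality, reps in groups.items():
        n = len(reps)
        aggregated[modality] = [sum(vals) / n for vals in zip(*reps)]
    return aggregated

# Devices a1/a2 hold modality A, b1/b2 hold modality B (as in the example above).
participant_modality = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
representations = {
    "a1": [1.0, 3.0], "a2": [3.0, 1.0],   # modality A group
    "b1": [0.0, 4.0], "b2": [2.0, 0.0],   # modality B group
}
aggregated = selective_aggregate(representations, participant_modality)
```

Swapping the averaging line for a plain element-wise sum gives the summation variant of the preset aggregation rule.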
Step S30: obtaining public training samples corresponding to each data modality from the federated public dataset, and, based on the public training samples, performing knowledge distillation learning training on each initial global feature extraction model guided by the corresponding target-modality aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain a target global feature extraction model corresponding to each initial global feature extraction model.
In this embodiment, it should be noted that the target-modality aggregated sample representation is used to guide the optimization of the corresponding initial global feature extraction model, so that the output of the initial global feature extraction model is as close as possible to the corresponding target-modality aggregated sample representation.
The public training samples corresponding to each data modality are obtained from the federated public data set, and based on these public training samples, knowledge distillation learning training guided by the target modality aggregated sample representations is performed on each initial global feature extraction model, together with contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model. Specifically, the public training samples corresponding to each data modality are obtained from the federated public data set; knowledge distillation learning training guided by the target modality aggregated sample representations is performed on each initial global feature extraction model, so as to transfer the model knowledge of the globally optimized local feature extraction model of each data modality on each participant device into the initial global feature extraction model of the corresponding data modality; and contrastive learning training is performed between the initial global feature extraction models, so as to encourage samples of different data modalities to be aligned in the feature space, thereby obtaining the target global feature extraction model corresponding to each initial global feature extraction model.
In addition, it should be noted that the federated server may directly use the target modality public samples selected by each participant device from the federated public data set as public training samples. For example, suppose participant device A selects multiple target modality public samples of data modality a, denoted X1; participant device B selects multiple target modality public samples of data modality a, denoted X2; and participant device C selects multiple target modality public samples of data modality b, denoted X3. Then X1 and X2 can be used directly as public training samples for the initial global feature extraction model corresponding to data modality a, and X3 can be used directly as public training samples for the initial global feature extraction model corresponding to data modality b.
In addition, it should be noted that although the target modality aggregated sample representation is used to guide the optimization of the corresponding initial global feature extraction model, so that the model's output is as close as possible to that representation, the target modality aggregated sample representation is the aggregation result of the target modality public sample representations of several target modality public samples. It therefore characterizes, to a certain extent, the corresponding data modality rather than any single sample. Accordingly, samples belonging to the data modality of an initial global feature extraction model may also be re-selected from the federated public data set as that model's public training samples; it is not necessary to use the target modality public samples selected by the participant devices from the federated public data set as the public training samples.
The step of performing, based on the public training samples, knowledge distillation learning training guided by the target modality aggregated sample representations on each initial global feature extraction model, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model, includes:
Step S31: map the public training samples corresponding to each data modality into predicted sample representations through the corresponding initial global feature extraction model;
In this embodiment, the public training samples corresponding to each data modality are mapped into predicted sample representations through the corresponding initial global feature extraction model. Specifically, each public training sample is input into the initial global feature extraction model corresponding to that sample's data modality, which performs feature extraction on the sample so as to map it into a preset sample representation space, thereby obtaining the predicted sample representation corresponding to each public training sample; that is, each public training sample is mapped to a corresponding predicted sample representation.
Step S32: calculate the knowledge distillation loss between each predicted sample representation and the corresponding target modality aggregated sample representation, and calculate the contrastive learning loss between the predicted sample representations;
In this embodiment, the knowledge distillation loss between each predicted sample representation and the corresponding target modality aggregated sample representation is calculated, as is the contrastive learning loss between the predicted sample representations. Specifically, the knowledge distillation loss is calculated based on the similarity between each predicted sample representation and the corresponding target modality aggregated sample representation, and the contrastive learning loss is calculated based on the similarity between the predicted sample representations.
The step of calculating the knowledge distillation loss between each predicted sample representation and the corresponding target modality aggregated sample representation, and calculating the contrastive learning loss between the predicted sample representations, includes:
Step S321: based on the sample label corresponding to each public training sample, select, from among the predicted sample representations, the positive sample representations and negative sample representations corresponding to each predicted sample representation;
In this embodiment, it should be noted that when public training samples of different modality data represent the same object, these public training samples of different modality data share the same sample label, and together they constitute a public training sample group. That is, samples belonging to the same public training sample group have the same sample label, and samples not belonging to the same public training sample group have different sample labels.
Based on the sample label corresponding to each public training sample, the positive sample representations and negative sample representations corresponding to each predicted sample representation are selected from among the predicted sample representations. Specifically, the public training sample group corresponding to each public training sample is determined based on its sample label, and then, for each public training sample:
the predicted sample representations of the other public training samples within the public training sample group of that public training sample are taken as the positive sample representations corresponding to the predicted sample representation of that public training sample, and the predicted sample representations of the samples outside that public training sample group are taken as its negative sample representations, thereby obtaining the positive sample representations and negative sample representations corresponding to each predicted sample representation, where the number of positive sample representations and the number of negative sample representations are each at least 1.
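The label-based selection rule above can be sketched as follows. This is a minimal example assuming each sample's group membership is encoded as a plain label value; the function name `split_pos_neg` is hypothetical and not taken from the application.

```python
def split_pos_neg(labels, index):
    """For the sample at `index`, return (positive, negative) index lists.

    Positives: other samples sharing the same label, i.e. members of the same
    public training sample group. Negatives: samples with a different label.
    """
    anchor_label = labels[index]
    positives = [i for i, lab in enumerate(labels)
                 if i != index and lab == anchor_label]
    negatives = [i for i, lab in enumerate(labels) if lab != anchor_label]
    return positives, negatives

# Samples 0 and 2 describe the same object in different modalities,
# so they share a label and form one public training sample group.
labels = ["cat", "dog", "cat", "dog"]
print(split_pos_neg(labels, 0))  # ([2], [1, 3])
```

Indexing by position rather than copying vectors keeps the selection step independent of the representation dimensionality.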
In another implementation, if the number of participant devices is 2 and the two participant devices correspond to different data modalities, the public training samples include first public training samples belonging to a first data modality and second public training samples belonging to a second data modality, and the step of selecting, based on the sample label corresponding to each public training sample, the positive sample representations and negative sample representations corresponding to each predicted sample representation includes:
determining, among the second public training samples, the samples having the same sample label as a first public training sample as the first positive samples corresponding to that first public training sample, and determining the samples not having the same sample label as its first negative samples; taking the predicted sample representations corresponding to the first positive samples as the positive sample representations of the predicted sample representation of the first public training sample, and taking the predicted sample representations corresponding to the first negative samples as its negative sample representations.
Likewise, determining, among the first public training samples, the samples having the same sample label as a second public training sample as the second positive samples corresponding to that second public training sample, and determining the samples not having the same sample label as its second negative samples; taking the predicted sample representations corresponding to the second positive samples as the positive sample representations of the predicted sample representation of the second public training sample, and taking the predicted sample representations corresponding to the second negative samples as its negative sample representations.
Step S322: based on each predicted sample representation and its corresponding positive sample representations and negative sample representations, calculate the contrastive learning loss corresponding to each initial global feature extraction model;
In this embodiment, the contrastive learning loss corresponding to each initial global feature extraction model is calculated based on each predicted sample representation and its corresponding positive sample representations and negative sample representations. Specifically, the following steps are performed for each initial global feature extraction model:
based on the similarity between each predicted sample representation output by the initial global feature extraction model and its corresponding positive sample representations, and the similarity between each predicted sample representation output by the initial global feature extraction model and its corresponding negative sample representations, the contrastive learning loss corresponding to that initial global feature extraction model is calculated, thereby obtaining the contrastive learning loss corresponding to each initial global feature extraction model. In one implementable manner, the contrastive learning loss is calculated as follows:
$$L_N = -\log\frac{\exp\left(f(x)^{T} f(x^{+})\right)}{\exp\left(f(x)^{T} f(x^{+})\right) + \sum_{j=1}^{N-1}\exp\left(f(x)^{T} f(x_{j}^{-})\right)}$$

where $L_N$ is the contrastive learning loss, $N-1$ is the number of negative sample representations, $f(x)$ is the predicted sample representation (the superscript $T$ denotes transposition), $f(x^{+})$ is the positive sample representation corresponding to the predicted sample representation, and $f(x_{j}^{-})$ is the $j$-th negative sample representation corresponding to the predicted sample representation.
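A direct numerical transcription of this loss can be sketched as follows, assuming each representation is a plain vector and the inner product measures similarity. The function names are hypothetical; this is an illustrative sketch, not the application's implementation.

```python
import math

def dot(u, v):
    """Inner product f(x)^T f(y) of two representation vectors."""
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(anchor, positive, negatives):
    """Contrastive learning loss for one predicted sample representation.

    anchor: predicted sample representation f(x)
    positive: its positive sample representation f(x+)
    negatives: the N-1 negative sample representations f(x_j^-)
    """
    pos_score = math.exp(dot(anchor, positive))
    neg_scores = sum(math.exp(dot(anchor, n)) for n in negatives)
    return -math.log(pos_score / (pos_score + neg_scores))

anchor = [1.0, 0.0]
loss_easy = contrastive_loss(anchor, [1.0, 0.0], [[-1.0, 0.0]])
loss_hard = contrastive_loss(anchor, [-1.0, 0.0], [[1.0, 0.0]])
assert loss_easy < loss_hard  # loss falls as the positive aligns with the anchor
```

The loss is minimized when the anchor is similar to its positive and dissimilar to its negatives, which is exactly the alignment behavior the training step relies on.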
Step S323: based on the similarity between each predicted sample representation and the corresponding target modality aggregated sample representation, calculate the knowledge distillation loss corresponding to each initial global feature extraction model.
In this embodiment, the knowledge distillation loss corresponding to each initial global feature extraction model is calculated based on the similarity between each predicted sample representation and the corresponding target modality aggregated sample representation. Specifically, based on the similarity between each predicted sample representation and the corresponding target modality aggregated sample representation, the cross entropy between each predicted sample representation and the corresponding target modality aggregated sample representation is calculated, yielding the knowledge distillation loss corresponding to each initial global feature extraction model.
In another implementation, the knowledge distillation loss corresponding to each initial global feature extraction model is calculated through an L2 loss function, based on the similarity between each predicted sample representation and the corresponding target modality aggregated sample representation.
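The L2 variant of the distillation loss admits a very short sketch. This assumes both representations are vectors of equal length; the function name `l2_distillation_loss` and the mean normalization are illustrative choices, not specified by the application.

```python
def l2_distillation_loss(predicted, target):
    """Mean squared (L2) distance between a predicted sample representation
    and its target modality aggregated sample representation."""
    assert len(predicted) == len(target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

print(l2_distillation_loss([1.0, 2.0], [1.0, 4.0]))  # 2.0
```

Minimizing this loss pulls the model's output toward the aggregated representation, which is the guidance behavior described above.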
Step S33: based on the knowledge distillation loss and the contrastive learning loss corresponding to each initial global feature extraction model, optimize each initial global feature extraction model to obtain each target global feature extraction model.
In this embodiment, it should be noted that optimizing an initial global feature extraction model based on the contrastive learning loss drives the model, for public training samples of different data modalities that carry the same sample label, to output sample representations whose similarity is greater than a first preset similarity threshold, that is, representations that are as similar as possible, and, for public training samples of different data modalities that carry different sample labels, to output sample representations whose similarity is less than a second preset similarity threshold, that is, representations that are as dissimilar as possible. Optimization based on the contrastive learning loss therefore pulls the predicted sample representation output by the initial global feature extraction model closer to its corresponding positive sample representations and pushes it away from its corresponding negative sample representations, where the first preset similarity threshold is greater than the second preset similarity threshold.
Further, since the knowledge distillation loss is a cross-entropy loss or an L2 loss, it drives the predicted sample representation output by the initial global feature extraction model to be as close as possible to the target modality aggregated sample representation corresponding to that model, so that the similarity between the two is greater than a preset third similarity threshold. This achieves the purpose of driving the initial global feature extraction model to learn the model knowledge of the globally optimized local feature extraction models, for the corresponding data modality, on the participant devices.
Based on the knowledge distillation loss and the contrastive learning loss corresponding to each initial global feature extraction model, each initial global feature extraction model is optimized to obtain each target global feature extraction model. Specifically, the knowledge distillation loss and the contrastive learning loss corresponding to each initial global feature extraction model are aggregated to obtain the total global model loss corresponding to that model. It is then judged whether each initial global feature extraction model satisfies a preset training end condition. If so, each initial global feature extraction model is taken as the corresponding target global feature extraction model; if not, each initial global feature extraction model is updated based on the model gradient calculated from its total model loss, and execution returns to the step of distributing the initial global feature extraction model of each data modality to the participant devices of that data modality. The preset training end condition includes conditions such as the convergence of each total model loss, or each initial global feature extraction model reaching a preset iteration-count threshold. In this way, based on the federated public data set, data belonging to the same data modality are jointly used, through knowledge distillation, to construct the global model corresponding to each data modality, while contrastive learning simultaneously aligns, in the feature space, the sample representations generated by the global models for samples of different data modalities. This achieves the purpose of performing horizontal federated learning on samples of different data modalities, realizing the leap from horizontal federated learning based on samples of a single data modality to horizontal federated learning based on samples of multiple data modalities. It solves the data island problem among participant devices with different data modalities, further enriches the samples available for horizontal federated learning, and improves the effect of horizontal federated learning, so that the models built by horizontal federated learning achieve higher prediction accuracy.
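The outer optimization loop described above (aggregate the two losses for each model, stop on convergence or on an iteration-count threshold, otherwise update and repeat) can be sketched as follows. All names are hypothetical, and the model update is abstracted behind `step_fn`; this illustrates only the control flow, not the gradient computation.

```python
def train_until_done(models, step_fn, max_iters=100, tol=1e-4):
    """Optimize each model until its total loss converges or the preset
    iteration-count threshold is reached.

    models: dict name -> model state (opaque to this loop)
    step_fn(model) -> (updated_model, total_loss), where total_loss stands
    for the aggregated distillation loss plus contrastive loss.
    """
    prev = {name: float("inf") for name in models}
    for _ in range(max_iters):
        done = True
        for name, model in models.items():
            models[name], loss = step_fn(model)
            if abs(prev[name] - loss) > tol:  # this model has not converged
                done = False
            prev[name] = loss
        if done:  # every model's total loss has converged
            break
    return models

# Toy step: "model" is a scalar whose total loss decays geometrically.
models = {"image": 8.0, "text": 4.0}
trained = train_until_done(models, lambda m: (m * 0.5, m * 0.5))
```

In the application's setting, `step_fn` would also cover redistributing the updated models to the participant devices before the next round.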
Further, after step S30, the method also includes:
distributing each target global feature extraction model, according to its corresponding data modality, to the participant devices having that data modality, so that each participant device, based on its local private training samples, optimizes its local feature extraction model through contrastive learning training between the target global feature extraction model and the local feature extraction model, thereby obtaining a target local feature extraction model. For the specific implementation process by which the participant device optimizes the local feature extraction model in this way, reference may be made to steps A10 to A20 and their refinement steps, which will not be repeated here.
An embodiment of the present application provides a federated learning modeling optimization method. Compared with the technical means adopted in the prior art, which perform horizontal federated learning by aligning on the features of the participants themselves, this embodiment first distributes the initial global feature extraction model of each data modality to the participant devices of that data modality, so that each participant device, based on its local private training samples, optimizes its local feature extraction model through contrastive learning training between the initial global feature extraction model and the local feature extraction model. The local feature extraction model thereby learns the model knowledge of the global model, yielding a globally optimized local feature extraction model, and, based on this globally optimized local feature extraction model, feature extraction is performed on the corresponding target modality public samples in the federated public data set to obtain target modality public sample representations. The federated server receives the target modality public sample representations sent by the participant devices and performs selective, data-modality-based aggregation on them to obtain the target modality aggregated sample representation of each data modality, thereby achieving the purpose of separately aggregating the target modality public sample representations that belong to the same data modality. The server then obtains the public training samples of each data modality from the federated public data set and, based on these public training samples, performs knowledge distillation learning training guided by the target modality aggregated sample representations on each initial global feature extraction model, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model. Since every target modality public sample representation is output by a globally optimized local feature extraction model obtained through optimization on a participant device's local private training samples, each initial global feature extraction model can, through knowledge distillation, indirectly perform horizontal federated learning jointly over the samples of the multiple participants of the corresponding data modality; at the same time, the initial global feature extraction models of the data modalities can use contrastive learning to align samples of different data modalities in the feature space. This achieves the purpose of indirectly performing horizontal federated learning jointly over samples of different data modalities, so that horizontal federated learning is no longer limited to samples of the same data modality across different participants. It overcomes the technical defect in the prior art that horizontal federated learning can only be performed jointly over samples of the same data modality among different participants, which makes existing horizontal federated learning strongly limited, and therefore reduces the limitations of horizontal federated learning.
Further, referring to FIG. 2, in another embodiment of the present application, the federated learning modeling optimization method is applied to a participant device, and the federated learning modeling optimization method includes:
Step A10: receive the initial global feature extraction model issued by the federated server, and extract local private training samples;
In this embodiment, it should be noted that the initial global feature extraction model is the not-yet-optimized target global feature extraction model on the federated server, and the local private training samples are training sample data privately owned by the participant device, that is, private data of the participant device.
Step A20: based on the local private training samples, optimize the local feature extraction model through contrastive learning training between the initial global feature extraction model and the local feature extraction model, to obtain a globally optimized local feature extraction model;
In this embodiment, it should be noted that the local feature extraction model is a feature extraction model maintained locally by the participant device, and the number of local private training samples is at least 1.
Based on the local private training samples, the local feature extraction model is optimized through contrastive learning training between the initial global feature extraction model and the local feature extraction model, to obtain a globally optimized local feature extraction model. Specifically, the initial global feature extraction model is used to perform feature extraction on all local private training samples to obtain the first sample representations, and the local feature extraction model is used to perform feature extraction on all local private training samples to obtain the second sample representations. A contrastive learning loss is then calculated based on the similarity between each first sample representation and each second sample representation, and the local feature extraction model is optimized based on this contrastive learning loss to obtain the globally optimized local feature extraction model.
其中,所述基于所述本地私有训练样本,通过在所述初始全局特征提取模型和本地特征提取模型之间进行对比学习训练,优化所述本地特征提取模型,获得全局优化后的本地特征提取模型的步骤包括:Wherein, based on the local private training samples, by performing comparative learning and training between the initial global feature extraction model and the local feature extraction model, optimizing the local feature extraction model, and obtaining a globally optimized local feature extraction model The steps include:
步骤A21,通过所述初始全局特征提取模型将所有所述本地私有训练样本映射为第一样本表征,以及通过所述本地特征提取模型将所有所述本地私有训练样本映射为第二样本表征;Step A21, mapping all the local private training samples to a first sample representation through the initial global feature extraction model, and mapping all the local private training samples to a second sample representation through the local feature extraction model;
在本实施例中，通过所述初始全局特征提取模型对所有本地私有训练样本进行特征提取，将所有本地私有训练样本分别映射为对应的第一样本表征，以及通过所述本地特征提取模型对所有本地私有训练样本进行特征提取，将所有本地私有训练样本分别映射为对应的第二样本表征。In this embodiment, the initial global feature extraction model performs feature extraction on all local private training samples, mapping each local private training sample to its corresponding first sample representation, and the local feature extraction model performs feature extraction on all local private training samples, mapping each local private training sample to its corresponding second sample representation.
步骤A22,基于各所述第一样本表征和各所述第二样本表征之间的相似度,计算对比学习损失;Step A22, calculating a contrastive learning loss based on the similarity between each of the first sample representations and each of the second sample representations;
在本实施例中,基于各所述第一样本表征和各所述第二样本表征之间的相似度,计算对比学习损失,具体地,对于每一第二样本表征均执行以下步骤:In this embodiment, based on the similarity between each of the first sample representations and each of the second sample representations, the contrastive learning loss is calculated. Specifically, the following steps are performed for each second sample representation:
在各所述第一样本表征中确定与所述第二样本表征对应同一本地私有训练样本的目标样本表征，进而将所述目标样本表征作为所述第二样本表征对应的本地正样本表征，进而除所述目标样本表征之外的其他第一样本表征均作为所述第二样本表征对应的本地负样本表征，进而基于每一第二样本表征对应的本地正样本表征以及对应的本地负样本表征，计算对比学习损失。A target sample representation corresponding to the same local private training sample as the second sample representation is determined among the first sample representations and taken as the local positive sample representation corresponding to the second sample representation; the remaining first sample representations other than the target sample representation are taken as the local negative sample representations corresponding to the second sample representation. The contrastive learning loss is then calculated based on the local positive sample representation and the local negative sample representations corresponding to each second sample representation.
其中,所述基于各所述第一样本表征和各所述第二样本表征之间的相似度,计算对比学习损失的步骤包括:Wherein, the step of calculating the contrastive learning loss based on the similarity between each of the first sample representations and each of the second sample representations includes:
步骤A221,将各所述第一样本表征中与所述第二样本表征对应同一所述本地私有训练样本的样本表征作为所述第二样本表征对应的本地正样本表征;Step A221, taking the sample representation corresponding to the same local private training sample as the second sample representation in each of the first sample representations as the local positive sample representation corresponding to the second sample representation;
步骤A222，将各所述第一样本表征中不与所述第二样本表征对应同一所述本地私有训练样本的样本表征作为所述第二样本表征对应的本地负样本表征；Step A222, taking the sample representations among the first sample representations that do not correspond to the same local private training sample as the second sample representation as the local negative sample representations corresponding to the second sample representation;
步骤A223，基于各所述第二样本表征与各所述第二样本表征对应的本地正样本表征之间的相似度，以及各所述第二样本表征与各所述第二样本表征对应的本地负样本表征之间的相似度，计算所述对比学习损失。Step A223, calculating the contrastive learning loss based on the similarity between each second sample representation and its corresponding local positive sample representation, and the similarity between each second sample representation and its corresponding local negative sample representations.
在本实施例中，基于每一所述第二样本表征与对应的本地正样本表征之间的相似度，以及每一所述第二样本表征与对应的本地负样本表征之间的相似度，计算每一所述第二样本表征对应的单个样本对比学习损失，进而将各单个样本对比学习损失进行累加，得到所述对比学习损失，其中，计算所述对比学习损失的具体公式如下：In this embodiment, based on the similarity between each second sample representation and its corresponding local positive sample representation, and the similarity between each second sample representation and its corresponding local negative sample representations, a single-sample contrastive learning loss is calculated for each second sample representation; the single-sample losses are then accumulated to obtain the contrastive learning loss, where the specific formula for calculating the contrastive learning loss is as follows:
L_N = -log( exp(f(x)^T f(x^+)) / ( exp(f(x)^T f(x^+)) + Σ_{j=1}^{N-1} exp(f(x)^T f(x_j^-)) ) )

其中，L_N为所述对比学习损失，N-1为所述本地负样本表征的数量，f(x)^T为所述第二样本表征（的转置），f(x^+)为所述第二样本表征对应的本地正样本表征，f(x_j^-)为所述第二样本表征对应的第j个本地负样本表征。Here, L_N is the contrastive learning loss, N-1 is the number of local negative sample representations, f(x)^T is (the transpose of) the second sample representation, f(x^+) is the local positive sample representation corresponding to the second sample representation, and f(x_j^-) is the j-th local negative sample representation corresponding to the second sample representation.
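As an illustrative aid (not part of the application), the per-sample loss above can be sketched in Python; the function name and the use of NumPy arrays are assumptions, and row i of both representation matrices is assumed to come from the same local private training sample, so the diagonal pairs are the positives and all other first-representation rows are the N-1 negatives:

```python
import numpy as np

def contrastive_loss(first_reps, second_reps):
    """Illustrative InfoNCE-style loss matching the formula above.

    first_reps:  (N, d) outputs of the initial global model (f(x+), f(x_j^-)).
    second_reps: (N, d) outputs of the local model (f(x)).
    """
    logits = second_reps @ first_reps.T            # f(x)^T f(.) for every pair
    exp_logits = np.exp(logits)
    pos = np.diag(exp_logits)                      # exp(f(x)^T f(x+))
    per_sample = -np.log(pos / exp_logits.sum(axis=1))
    return per_sample.sum()                        # accumulate over samples
```

Minimizing this loss pulls each second sample representation toward the first sample representation of the same training sample and pushes it away from the others, which is the stated goal of making the two models' outputs for the same sample as close as possible.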
步骤A23,基于所述对比学习损失,优化所述本地特征提取模型,得到全局优化后的本地特征提取模型。Step A23: Optimizing the local feature extraction model based on the comparative learning loss to obtain a globally optimized local feature extraction model.
在本实施例中，基于所述对比学习损失，优化所述本地特征提取模型，得到全局优化后的本地特征提取模型，具体地，判断所述对比学习损失是否收敛，若收敛，则将所述本地特征提取模型作为全局优化后的本地特征提取模型，若未收敛，则基于所述对比学习损失计算的模型梯度，更新所述本地特征提取模型，并返回执行步骤：提取本地私有训练样本，实现了促使本地特征提取模型学习初始全局特征提取模型的模型知识的目的，使得本地特征提取模型与初始全局特征提取模型对于同一样本的输出尽可能的相近，实现了对应本地特征提取的全局优化。In this embodiment, the local feature extraction model is optimized based on the contrastive learning loss to obtain a globally optimized local feature extraction model. Specifically, it is judged whether the contrastive learning loss converges; if it converges, the local feature extraction model is taken as the globally optimized local feature extraction model; if it does not converge, the local feature extraction model is updated based on the model gradient calculated from the contrastive learning loss, and execution returns to the step of extracting local private training samples. This drives the local feature extraction model to learn the model knowledge of the initial global feature extraction model, so that the outputs of the local feature extraction model and the initial global feature extraction model for the same sample are as close as possible, realizing the global optimization of the local feature extraction.
其中,所述基于所述对比学习损失,优化所述本地特征提取模型,得到全局优化后的本地特征提取模型的步骤包括:Wherein, the step of optimizing the local feature extraction model based on the comparative learning loss to obtain a globally optimized local feature extraction model includes:
步骤A231,通过预设分类模型,将各所述第二样本表征分别转换为各所述本地私有训练样本对应的输出分类标签;Step A231, converting each of the second sample representations into output classification labels corresponding to each of the local private training samples through a preset classification model;
在本实施例中，需要说明的是，所述本地私有训练样本具备对应的预设真实标签，其中，所述预设真实标签为所述本地私有训练样本的标识，可用于表示本地私有训练样本的类别、属性以及身份等信息。In this embodiment, it should be noted that the local private training sample has a corresponding preset real label, where the preset real label is the identifier of the local private training sample and can be used to represent information such as the category, attributes and identity of the local private training sample.
通过预设分类模型，将各所述第二样本表征分别转换为各所述本地私有训练样本对应的输出分类标签，具体地，将各所述第二样本表征输入预设分类模型，分别对各所述第二样本表征进行全连接，获得各所述第二样本表征对应的全连接向量，进而基于预设激活函数，分别将各所述全连接向量分别转换为各所述本地私有训练样本对应的输出分类标签。Each second sample representation is converted, through the preset classification model, into the output classification label corresponding to each local private training sample. Specifically, each second sample representation is input into the preset classification model and passed through a fully-connected layer to obtain the fully-connected vector corresponding to that second sample representation; based on a preset activation function, each fully-connected vector is then converted into the output classification label corresponding to the respective local private training sample.
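The fully-connected layer plus activation of step A231 can be sketched as follows; this is a minimal illustration only, and the softmax activation, the weight shapes and the function name are assumptions rather than details taken from the application:

```python
import numpy as np

def classify(second_reps, weights, bias):
    """Sketch of step A231: a fully-connected layer over each second sample
    representation, followed by an activation (softmax assumed here) that
    yields the output classification label as class probabilities."""
    fc = second_reps @ weights + bias                 # fully-connected vectors
    exp = np.exp(fc - fc.max(axis=1, keepdims=True))  # numerically stable softmax
    return exp / exp.sum(axis=1, keepdims=True)       # one probability row per sample
```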
步骤A232,基于各所述输出分类标签和各所述本地私有训练样本对应的预设真实标签,计算分类损失;Step A232, calculating a classification loss based on each of the output classification labels and the preset real labels corresponding to each of the local private training samples;
在本实施例中，基于各所述输出分类标签和各所述本地私有训练样本对应的预设真实标签，计算分类损失，具体地，计算每一所述输出分类标签与对应的本地私有训练样本对应的预设真实标签之间的交叉熵损失，进而将各所述交叉熵损失进行累加，获得分类损失。In this embodiment, the classification loss is calculated based on each output classification label and the preset real label corresponding to each local private training sample. Specifically, the cross-entropy loss between each output classification label and the preset real label of the corresponding local private training sample is calculated, and the cross-entropy losses are accumulated to obtain the classification loss.
在另一种实施方式中,步骤A232包括:计算每一所述输出分类标签与对应的本地私有训练样本对应的预设真实标签之间的L2损失,进而将各L2损失进行累加,获得分类损失。In another implementation, step A232 includes: calculating the L2 loss between each of the output classification labels and the preset real label corresponding to the corresponding local private training sample, and then accumulating the L2 losses to obtain the classification loss .
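The two variants of step A232 (accumulated cross-entropy, or the alternative accumulated L2 loss) can be sketched together; one-hot true labels, the `kind` switch and the function name are assumptions made for illustration:

```python
import numpy as np

def classification_loss(pred_labels, true_labels, kind="cross_entropy"):
    """Accumulate a per-sample loss between output classification labels and
    preset real labels. pred_labels: (N, C) class probabilities;
    true_labels: (N, C) one-hot. kind selects cross-entropy or the L2 variant."""
    if kind == "cross_entropy":
        per_sample = -(true_labels * np.log(pred_labels + 1e-12)).sum(axis=1)
    else:  # L2 variant of the alternative embodiment
        per_sample = ((pred_labels - true_labels) ** 2).sum(axis=1)
    return per_sample.sum()  # accumulate over all local private training samples
```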
步骤A233,基于所述对比学习损失和所述分类损失,计算模型总损失;Step A233, calculating the total model loss based on the contrastive learning loss and the classification loss;
在本实施例中,基于预设聚合规则,将所述对比学习损失与所述分类损失进行聚合,得到模型总损失,其中,所述预设聚合规则包括求和以及求平均等。In this embodiment, based on a preset aggregation rule, the comparison learning loss and the classification loss are aggregated to obtain a total model loss, wherein the preset aggregation rule includes summing and averaging.
步骤A234,基于所述模型总损失,优化所述本地特征提取模型,得到所述全局优化后的本地特征提取模型。Step A234: Optimizing the local feature extraction model based on the total model loss to obtain the globally optimized local feature extraction model.
在本实施例中，具体地，判断所述模型总损失是否收敛，若收敛，则将所述本地特征提取模型作为全局优化后的本地特征提取模型，若未收敛，则基于所述模型总损失计算的模型梯度，更新所述本地特征提取模型，并返回执行步骤：提取本地私有训练样本，实现了促使本地特征提取模型学习初始全局特征提取模型的模型知识的目的，使得本地特征提取模型与初始全局特征提取模型对于同一样本的输出尽可能的相近，实现了对应本地特征提取的全局优化，同时还使得本地特征提取模型的输出尽可能的与预设真实标签相近，提升了本地特征提取模型的准确度。In this embodiment, specifically, it is judged whether the total model loss converges; if it converges, the local feature extraction model is taken as the globally optimized local feature extraction model; if it does not converge, the local feature extraction model is updated based on the model gradient calculated from the total model loss, and execution returns to the step of extracting local private training samples. This drives the local feature extraction model to learn the model knowledge of the initial global feature extraction model, so that the outputs of the local feature extraction model and the initial global feature extraction model for the same sample are as close as possible, realizing the global optimization of the local feature extraction; it also makes the output of the local feature extraction model as close as possible to the preset real label, improving the accuracy of the local feature extraction model.
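The optimization loop of steps A233–A234 (aggregate the total loss, test convergence, otherwise update on the gradient) can be sketched generically; plain gradient descent, the learning rate and the tolerance-based convergence test are assumptions, and `loss_fn`/`grad_fn` stand in for the total model loss (contrastive plus classification, sum aggregation assumed) and its gradient:

```python
import numpy as np

def optimize_local_model(params, grad_fn, loss_fn, lr=0.1, tol=1e-4, max_iter=200):
    """Sketch of steps A233-A234: compute the total model loss, stop when it
    has converged (change below tol), otherwise update the local feature
    extraction model parameters with a gradient step."""
    prev = np.inf
    for _ in range(max_iter):
        total = loss_fn(params)              # total loss on private samples
        if abs(prev - total) < tol:          # convergence test
            break
        params = params - lr * grad_fn(params)  # update the local model
        prev = total
    return params
```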
步骤A30,在联邦公有数据集中提取属于所述全局优化后的本地特征提取模型对应的数据模态的目标模态公有样本,并基于所述全局优化后的本地特征提取模型,对所述目标模态公有样本进行特征提取,得到目标模态公有样本表征;Step A30, extracting target modal public samples belonging to the data modal corresponding to the globally optimized local feature extraction model from the federal public data set, and based on the globally optimized local feature extraction model, extracting the target modal Feature extraction is performed on the public samples of the target modal to obtain the representation of the public sample of the target modal;
在本实施例中，具体地，在联邦公有数据集中提取属于所述全局优化后的本地特征提取模型对应的数据模态的目标模态公有样本，也即在联邦公有数据集中提取属于所述参与方设备对应的数据模态的目标模态公有样本，进而利用所述全局优化后的本地特征提取模型对所有目标模态公有样本进行特征提取，得到所有目标模态公有样本的目标模态公有样本表征。In this embodiment, specifically, the target modality public samples belonging to the data modality corresponding to the globally optimized local feature extraction model are extracted from the federated public data set, that is, the target modality public samples of the data modality corresponding to the participant device are extracted from the federated public data set; the globally optimized local feature extraction model then performs feature extraction on all target modality public samples to obtain the target modality public sample representations of all target modality public samples.
步骤A40，将所述目标模态公有样本表征发送至联邦服务器，以供所述联邦服务器对各所述目标模态公有样本表征进行基于数据模态的选择性聚合，获得各所述数据模态对应的目标模态聚合样本表征，并在所述联邦公有数据集中获取各所述数据模态对应的公有训练样本，基于各所述公有训练样本，分别对各所述初始全局特征提取模型进行基于各所述目标模态聚合样本表征的知识蒸馏学习训练，以及在各所述初始全局特征提取模型之间进行对比学习训练，获得各所述初始全局特征提取模型对应的目标全局特征提取模型。Step A40, sending the target modality public sample representations to the federated server, so that the federated server performs data-modality-based selective aggregation on the target modality public sample representations to obtain the target modality aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality from the federated public data set, and, based on the public training samples, performs knowledge distillation learning training on each initial global feature extraction model based on the corresponding target modality aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model.
在本实施例中，需要说明的是，所述联邦服务器对各所述目标模态公有样本表征进行基于数据模态的选择性聚合，获得各所述数据模态对应的目标模态聚合样本表征，并在所述联邦公有数据集中获取各所述数据模态对应的公有训练样本，基于各所述公有训练样本，分别对各所述初始全局特征提取模型进行基于各所述目标模态聚合样本表征的知识蒸馏学习训练，以及在各所述初始全局特征提取模型之间进行对比学习训练，获得各所述初始全局特征提取模型对应的目标全局特征提取模型的具体实现过程可参照步骤S10至步骤S30中的具体内容，在此不再赘述。In this embodiment, it should be noted that, for the specific implementation process in which the federated server performs data-modality-based selective aggregation on the target modality public sample representations to obtain the target modality aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality from the federated public data set, and, based on the public training samples, performs knowledge distillation learning training on each initial global feature extraction model based on the corresponding target modality aggregated sample representation as well as contrastive learning training between the initial global feature extraction models to obtain the target global feature extraction model corresponding to each initial global feature extraction model, reference may be made to the specific content of steps S10 to S30, which will not be repeated here.
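The server-side selective aggregation referenced above can be sketched as follows. Grouping representations by the data modality of the client that produced them follows the application; mean aggregation, the dictionary layout and the function name are assumptions made for illustration:

```python
import numpy as np

def selective_aggregate(client_reps, client_modality):
    """Sketch of data-modality-based selective aggregation: public sample
    representations are grouped by each client's data modality and averaged
    within each group, yielding one aggregated representation per modality."""
    by_modality = {}
    for client, reps in client_reps.items():
        by_modality.setdefault(client_modality[client], []).append(reps)
    return {m: np.mean(np.stack(rep_list), axis=0)
            for m, rep_list in by_modality.items()}
```

Each per-modality aggregate then serves as the distillation target for the initial global feature extraction model of that modality.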
进一步地，接收联邦服务器下发的目标全局特征提取模型，并提取本地私有训练样本，进而基于所述本地私有训练样本，通过在所述目标全局特征提取模型和本地特征提取模型之间进行对比学习训练，优化所述本地特征提取模型，以供所述本地特征提取模型学习目标全局特征提取模型的模型知识，进而得到目标本地特征提取模型，其中，得到目标本地特征提取模型的具体实现过程与得到全局优化后的本地特征提取模型的具体实现过程相同，具体可参照步骤A10至步骤A20中的具体内容，在此不再赘述。Further, the target global feature extraction model issued by the federated server is received and local private training samples are extracted; based on the local private training samples, the local feature extraction model is optimized through contrastive learning training between the target global feature extraction model and the local feature extraction model, so that the local feature extraction model learns the model knowledge of the target global feature extraction model, thereby obtaining the target local feature extraction model. The specific implementation process of obtaining the target local feature extraction model is the same as that of obtaining the globally optimized local feature extraction model; for details, reference may be made to the specific content of steps A10 to A20, which will not be repeated here.
进一步地，如图3所示为本申请联邦学习优化方法中进行横向联邦学习建模时的交互流程示意图，其中，server为联邦服务器，Client为参与方设备，N为参与方设备的数量，base 1和head 1为组成Client 1中的本地特征提取模型，base N和head N为组成Client N中的本地特征提取模型，classfier为所述预设分类模型，base g.a和Head g.a为组成Client 1中的初始全局特征提取模型，也即为Model g.a，base g.b和Head g.b为组成Client N中的初始全局特征提取模型，也即为Model g.b，Y 1和Y N为预设真实标签，X 1为Client 1中的本地私有训练样本，X N为Client N中的本地私有训练样本，X pub.a为Client 1中的目标模态公有样本，X pub.b为Client N中的目标模态公有样本，Z agg.a为所述第一公有训练样本对应的预测样本表征，Z agg.b为所述第二公有训练样本对应的预测样本表征，cross-entropy loss为所述交叉熵损失，Contrastive loss为所述对比学习损失，此时联邦服务器直接将各所述参与方设备在联邦公有数据集中选取的目标模态公有样本作为公有训练样本。Further, FIG. 3 is a schematic diagram of the interaction flow when performing horizontal federated learning modeling in the federated learning optimization method of the present application, where server is the federated server, Client is a participant device, N is the number of participant devices, base 1 and head 1 form the local feature extraction model in Client 1, base N and head N form the local feature extraction model in Client N, classfier is the preset classification model, base g.a and Head g.a form the initial global feature extraction model in Client 1, namely Model g.a, base g.b and Head g.b form the initial global feature extraction model in Client N, namely Model g.b, Y 1 and Y N are preset real labels, X 1 is a local private training sample in Client 1, X N is a local private training sample in Client N, X pub.a is a target modality public sample in Client 1, X pub.b is a target modality public sample in Client N, Z agg.a is the predicted sample representation corresponding to the first public training sample, Z agg.b is the predicted sample representation corresponding to the second public training sample, cross-entropy loss is the cross-entropy loss, and Contrastive loss is the contrastive learning loss; here the federated server directly takes the target modality public samples selected by each participant device from the federated public data set as the public training samples.
本申请实施例提供了一种联邦学习建模优化方法，也即，首先接收联邦服务器下发的初始全局特征提取模型，并提取本地私有训练样本，进而基于所述本地私有训练样本，通过在所述初始全局特征提取模型和本地特征提取模型之间进行对比学习训练，优化所述本地特征提取模型，获得全局优化后的本地特征提取模型，实现了促使本地特征提取模型学习联邦服务器下发的全局模型的模型知识的目的，实现了对本地特征提取模型的全局优化，进而在联邦公有数据集中提取属于所述全局优化后的本地特征提取模型对应的数据模态的目标模态公有样本，并基于所述全局优化后的本地特征提取模型，对所述目标模态公有样本进行特征提取，得到目标模态公有样本表征，进而将所述目标模态公有样本表征发送至联邦服务器，以供所述联邦服务器对各所述目标模态公有样本表征进行基于数据模态的选择性聚合，获得各所述数据模态对应的目标模态聚合样本表征，并在所述联邦公有数据集中获取各所述数据模态对应的公有训练样本，基于各所述公有训练样本，分别对各所述初始全局特征提取模型进行基于各所述目标模态聚合样本表征的知识蒸馏学习训练，以及在各所述初始全局特征提取模型之间进行对比学习训练，获得各所述初始全局特征提取模型对应的目标全局特征提取模型，而由于每一目标模态公有样本表征均为基于参与方设备的本地私有训练样本进行优化得到的全局优化后的本地特征提取模型输出的，进而使得每一初始全局特征提取模型均可以通过知识蒸馏间接联合对应数据模态的多个参与方的样本进行横向联邦学习，且同时各数据模态对应的初始全局特征提取模型可利用对比学习将不同数据模态的样本在特征空间上进行对齐，进而实现了间接联合不同数据模态的样本进行横向联邦学习的目的，使得横向联邦学习不再局限于不同参与方同一数据模态的样本之间进行，克服了现有技术中由于现有的横向联邦学习只能联合不同参与方中同一数据模态的样本进行，而导致现有的横向联邦学习的局限性较强的技术缺陷，所以，降低了横向联邦学习的局限性。The embodiment of the present application provides a federated learning modeling optimization method. First, the initial global feature extraction model issued by the federated server is received and local private training samples are extracted; based on the local private training samples, the local feature extraction model is optimized through contrastive learning training between the initial global feature extraction model and the local feature extraction model, obtaining a globally optimized local feature extraction model. This drives the local feature extraction model to learn the model knowledge of the global model issued by the federated server, realizing the global optimization of the local feature extraction model. Then, the target modality public samples belonging to the data modality corresponding to the globally optimized local feature extraction model are extracted from the federated public data set, and feature extraction is performed on the target modality public samples based on the globally optimized local feature extraction model to obtain the target modality public sample representations, which are sent to the federated server, so that the federated server performs data-modality-based selective aggregation on the target modality public sample representations to obtain the target modality aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality from the federated public data set, and, based on the public training samples, performs knowledge distillation learning training on each initial global feature extraction model based on the corresponding target modality aggregated sample representation as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model. Since each target modality public sample representation is output by a globally optimized local feature extraction model obtained by optimization on the local private training samples of a participant device, each initial global feature extraction model can, through knowledge distillation, indirectly perform horizontal federated learning jointly over the samples of the multiple participants of the corresponding data modality; meanwhile, the initial global feature extraction models corresponding to the respective data modalities can use contrastive learning to align samples of different data modalities in the feature space, thereby achieving the purpose of indirectly performing horizontal federated learning jointly over samples of different data modalities. Horizontal federated learning is thus no longer limited to samples of the same data modality across different participants, which overcomes the technical defect in the prior art that existing horizontal federated learning can only be performed jointly over samples of the same data modality in different participants and is therefore strongly limited; accordingly, the limitations of horizontal federated learning are reduced.
参照图4,图4是本申请实施例方案涉及的硬件运行环境的设备结构示意图。Referring to FIG. 4 , FIG. 4 is a schematic diagram of a device structure of a hardware operating environment involved in the solution of the embodiment of the present application.
如图4所示，该联邦学习建模优化设备可以包括：处理器1001，例如CPU，存储器1005，通信总线1002。其中，通信总线1002用于实现处理器1001和存储器1005之间的连接通信。存储器1005可以是高速RAM存储器，也可以是稳定的存储器（non-volatile memory），例如磁盘存储器。存储器1005可选的还可以是独立于前述处理器1001的存储设备。As shown in FIG. 4, the federated learning modeling optimization device may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to realize connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
可选地,该联邦学习建模优化设备还可以包括矩形用户接口、网络接口、摄像头、RF(Radio Frequency,射频)电路,传感器、音频电路、WiFi模块等等。矩形用户接口可以包括显示屏(Display)、输入子模块比如键盘(Keyboard),可选矩形用户接口还可以包括标准的有线接口、无线接口。网络接口可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。Optionally, the federated learning modeling optimization device may also include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency, radio frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may include a display screen (Display), an input sub-module such as a keyboard (Keyboard), and the optional rectangular user interface may also include a standard wired interface and a wireless interface. Optionally, the network interface may include a standard wired interface and a wireless interface (such as a WI-FI interface).
本领域技术人员可以理解,图4中示出的联邦学习建模优化设备结构并不构成对联邦学习建模优化设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。Those skilled in the art can understand that the federated learning modeling and optimization device structure shown in Figure 4 does not constitute a limitation on the federated learning modeling and optimization device, and may include more or less components than those shown in the illustration, or combine some components, or different component arrangements.
如图4所示，作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块以及联邦学习建模优化程序。操作系统是管理和控制联邦学习建模优化设备硬件和软件资源的程序，支持联邦学习建模优化程序以及其它软件和/或程序的运行。网络通信模块用于实现存储器1005内部各组件之间的通信，以及与联邦学习建模优化系统中其它硬件和软件之间通信。As shown in FIG. 4, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, and a federated learning modeling optimization program. The operating system is a program that manages and controls the hardware and software resources of the federated learning modeling optimization device, and supports the operation of the federated learning modeling optimization program and other software and/or programs. The network communication module is used to realize communication between the components inside the memory 1005, and communication with other hardware and software in the federated learning modeling optimization system.
在图4所示的联邦学习建模优化设备中,处理器1001用于执行存储器1005中存储的联邦学习建模优化程序,实现上述任一项所述的联邦学习建模优化方法的步骤。In the federated learning modeling optimization device shown in FIG. 4 , the processor 1001 is configured to execute the federated learning modeling optimization program stored in the memory 1005 to implement the steps of the federated learning modeling optimization method described in any one of the above.
本申请联邦学习建模优化设备具体实施方式与上述联邦学习建模优化方法各实施例基本相同,在此不再赘述。The specific implementation manners of the federated learning modeling optimization device of the present application are basically the same as the embodiments of the above federated learning modeling optimization method, and will not be repeated here.
本申请实施例还提供一种联邦学习建模优化装置,所述联邦学习建模优化装置应用于联邦服务器,所述联邦学习建模优化装置包括:The embodiment of the present application also provides a federated learning modeling optimization device, the federated learning modeling optimization device is applied to a federated server, and the federated learning modeling optimization device includes:
模型分发模块，用于将各数据模态对应的初始全局特征提取模型分发至各所述数据模态对应的参与方设备，以供所述参与方设备基于本地私有训练样本，通过在所述初始全局特征提取模型和本地特征提取模型之间进行对比学习训练，优化所述本地特征提取模型，获得全局优化后的本地特征提取模型，并基于所述全局优化后的本地特征提取模型，对联邦公有数据集中对应的目标模态公有样本进行特征提取，得到目标模态公有样本表征；The model distribution module is configured to distribute the initial global feature extraction model corresponding to each data modality to the participant devices corresponding to that data modality, so that each participant device, based on its local private training samples, optimizes its local feature extraction model through contrastive learning training between the initial global feature extraction model and the local feature extraction model to obtain a globally optimized local feature extraction model, and, based on the globally optimized local feature extraction model, performs feature extraction on the corresponding target modality public samples in the federated public data set to obtain target modality public sample representations;
选择性聚合模块，用于接收各所述参与方设备发送的目标模态公有样本表征，并对各所述目标模态公有样本表征进行基于数据模态的选择性聚合，获得各所述数据模态对应的目标模态聚合样本表征；The selective aggregation module is configured to receive the target modality public sample representations sent by each participant device, and perform data-modality-based selective aggregation on the target modality public sample representations to obtain the target modality aggregated sample representation corresponding to each data modality;
训练模块，用于在所述联邦公有数据集中获取各所述数据模态对应的公有训练样本，并基于各所述公有训练样本，分别对各所述初始全局特征提取模型进行基于各所述目标模态聚合样本表征的知识蒸馏学习训练，以及在各所述初始全局特征提取模型之间进行对比学习训练，获得各所述初始全局特征提取模型对应的目标全局特征提取模型。The training module is configured to obtain the public training samples corresponding to each data modality from the federated public data set, and, based on the public training samples, perform knowledge distillation learning training on each initial global feature extraction model based on the corresponding target modality aggregated sample representation, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model.
可选地,所述训练模块还用于:Optionally, the training module is also used for:
将各所述数据模态对应的公有训练样本通过对应的初始全局特征提取模型分别映射为预测样本表征;Mapping the public training samples corresponding to each of the data modalities into predicted sample representations through corresponding initial global feature extraction models;
计算各所述预测样本表征与对应的目标模态聚合样本表征之间的知识蒸馏损失,以及计算各所述预测样本表征之间的对比学习损失;calculating a knowledge distillation loss between each of the predicted sample representations and the corresponding target modal aggregation sample representation, and calculating a comparative learning loss between each of the predicted sample representations;
基于各所述初始全局特征提取模型对应的知识蒸馏损失以及对应的对比学习损失,优化各所述初始全局特征提取模型,得到各目标全局特征提取模型。Based on the knowledge distillation loss corresponding to each of the initial global feature extraction models and the corresponding comparative learning loss, each of the initial global feature extraction models is optimized to obtain each target global feature extraction model.
可选地,所述训练模块还用于:Optionally, the training module is also used for:
基于各所述公有训练样本对应的样本标签，在各所述预测样本表征中分别选取各所述预测样本表征对应的正样本表征和对应的负样本表征；Based on the sample labels corresponding to the public training samples, a corresponding positive sample representation and corresponding negative sample representations are selected for each predicted sample representation from among the predicted sample representations;
基于各所述预测样本表征与各所述预测样本表征对应的正样本表征以及对应的负样本表征,计算各所述初始全局特征提取模型对应的对比学习损失;Based on each of the predicted sample representations and each of the predicted sample representations corresponding to the positive sample representation and the corresponding negative sample representation, calculate the comparative learning loss corresponding to each of the initial global feature extraction models;
基于各所述预测样本表征与对应的目标模态聚合样本表征之间的相似度,分别计算各所述初始全局特征提取模型对应的知识蒸馏损失。Based on the similarity between each of the prediction sample representations and the corresponding target modal aggregation sample representations, the knowledge distillation losses corresponding to each of the initial global feature extraction models are calculated respectively.
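The similarity-based knowledge distillation loss of the training module can be sketched as follows; the application only requires a loss computed from the similarity between each predicted sample representation and the corresponding target modality aggregated sample representation, so the use of cosine similarity and the function name here are assumptions:

```python
import numpy as np

def distillation_loss(pred_reps, target_agg_reps):
    """Sketch of the knowledge-distillation loss: penalise dissimilarity
    between each predicted sample representation and the corresponding
    target-modality aggregated representation (cosine similarity assumed)."""
    cos = (pred_reps * target_agg_reps).sum(axis=1) / (
        np.linalg.norm(pred_reps, axis=1) * np.linalg.norm(target_agg_reps, axis=1))
    return float((1.0 - cos).sum())  # zero when the representations align
```

Minimizing this loss pulls the initial global feature extraction model's outputs toward the aggregated representations produced by the participants' locally optimized models, which is the distillation step the training module performs.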
可选地,所述选择性聚合模块还用于:Optionally, the selective aggregation module is also used for:
基于各所述参与方设备与各所述数据模态之间的对应关系,在各所述目标模态公有样本表征中确定各所述数据模态分别对应的各待聚合样本表征;Based on the corresponding relationship between each of the participant devices and each of the data modalities, determine the respective sample representations to be aggregated corresponding to each of the data modalities in the public sample representations of each of the target modalities;
将各所述数据模态分别对应的各待聚合样本表征分别进行聚合,获得各所述数据模态对应的目标模态聚合样本表征。Aggregating the respective sample representations to be aggregated corresponding to the respective data modalities to obtain the aggregated sample representations of target modalities corresponding to the respective data modalities.
本申请联邦学习建模优化装置的具体实施方式与上述联邦学习建模优化方法各实施例基本相同,在此不再赘述。The specific implementation of the federated learning modeling optimization device of the present application is basically the same as the above embodiments of the federated learning modeling optimization method, and will not be repeated here.
An embodiment of the present application further provides a federated learning modeling optimization apparatus, applied to a participant device, the apparatus comprising:
a receiving module, configured to receive the initial global feature extraction model delivered by the federated server, and to obtain local private training samples;
a contrastive learning training module, configured to optimize the local feature extraction model by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model based on the local private training samples, to obtain a globally optimized local feature extraction model;
a feature extraction module, configured to extract, from a federated public dataset, target-modality public samples belonging to the data modality corresponding to the globally optimized local feature extraction model, and to perform feature extraction on the target-modality public samples based on the globally optimized local feature extraction model, to obtain target-modality public sample representations;
a sending module, configured to send the target-modality public sample representations to the federated server, so that the federated server performs data-modality-based selective aggregation on the target-modality public sample representations to obtain the target-modality aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality from the federated public dataset, and, based on the public training samples, performs knowledge distillation training of each initial global feature extraction model against the corresponding target-modality aggregated sample representations, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model.
Optionally, the contrastive learning training module is further configured to:
map all the local private training samples to first sample representations through the initial global feature extraction model, and map all the local private training samples to second sample representations through the local feature extraction model;
calculate a contrastive learning loss based on the similarities between the first sample representations and the second sample representations;
optimize the local feature extraction model based on the contrastive learning loss, to obtain the globally optimized local feature extraction model.
Optionally, the contrastive learning training module is further configured to:
take, among the first sample representations, the sample representation corresponding to the same local private training sample as a given second sample representation, as the local positive sample representation of that second sample representation;
take, among the first sample representations, the sample representations not corresponding to the same local private training sample as a given second sample representation, as the local negative sample representations of that second sample representation;
calculate the contrastive learning loss based on the similarity between each second sample representation and its local positive sample representation, and on the similarities between each second sample representation and its local negative sample representations.
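The local contrastive training above pairs the global model's representation of each private sample (first representation, the positive) against the global model's representations of the other samples in the batch (the negatives). A minimal sketch, assuming an NT-Xent/InfoNCE-style loss over cosine similarities; the exact loss form and temperature are illustrative, not fixed by the publication:

```python
import numpy as np

def local_contrastive_loss(first_reps, second_reps, temperature=0.5):
    """Contrastive loss between the initial global model's representations
    (first_reps) and the local model's representations (second_reps) of the
    same batch of private training samples: for each second representation,
    the first representation at the same index is the positive pair, and
    first representations of other samples are negatives."""
    def sim(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    n = len(second_reps)
    loss = 0.0
    for i in range(n):
        logits = np.array([sim(second_reps[i], first_reps[j]) / temperature
                           for j in range(n)])
        # Same-index pair is the positive; all other indices are negatives.
        loss += -np.log(np.exp(logits[i]) / np.exp(logits).sum())
    return loss / n
```

Minimising this loss pulls the local model's representations toward the global model's representations of the same samples while keeping different samples apart, which is how the local model is "globally optimized" without ever sharing private data.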
Optionally, the contrastive learning training module is further configured to:
convert each second sample representation into the output classification label of the corresponding local private training sample through a preset classification model;
calculate a classification loss based on the output classification labels and the preset ground-truth labels of the local private training samples;
calculate a total model loss based on the contrastive learning loss and the classification loss;
optimize the local feature extraction model based on the total model loss, to obtain the globally optimized local feature extraction model.
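The total-loss step above combines the contrastive term with a supervised classification term. A minimal sketch, assuming cross-entropy as the classification loss and a weighted sum as the combination rule; the weights `alpha`/`beta` and the function names are illustrative assumptions, since the publication only states that the total loss is computed from both terms:

```python
import numpy as np

def cross_entropy(pred_probs, true_idx):
    # Classification loss for one sample: negative log-likelihood of the true class.
    return -np.log(pred_probs[true_idx] + 1e-12)

def total_loss(contrastive, output_probs, true_labels, alpha=1.0, beta=1.0):
    """Total model loss = weighted contrastive learning loss plus weighted
    classification loss over the preset classifier's output distributions."""
    cls = float(np.mean([cross_entropy(p, y)
                         for p, y in zip(output_probs, true_labels)]))
    return alpha * contrastive + beta * cls
```

The local feature extraction model (and the preset classifier) would then be updated by gradient descent on this total loss.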
The specific implementation of the federated learning modeling optimization apparatus of the present application is substantially the same as the embodiments of the federated learning modeling optimization method described above, and is not repeated here.
An embodiment of the present application provides a readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the federated learning modeling optimization method described in any of the above.
The specific implementation of the readable storage medium of the present application is substantially the same as the embodiments of the federated learning modeling optimization method described above, and is not repeated here.
An embodiment of the present application provides a computer program product comprising one or more computer programs, the one or more computer programs being executable by one or more processors to implement the steps of the federated learning modeling optimization method described in any of the above.
The specific implementation of the computer program product of the present application is substantially the same as the embodiments of the federated learning modeling optimization method described above, and is not repeated here.
The above are merely preferred embodiments of the present application and are not intended to limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (11)

  1. A federated learning modeling optimization method, applied to a federated server, the method comprising:
    distributing the initial global feature extraction model corresponding to each data modality to the participant devices corresponding to that data modality, so that each participant device, based on its local private training samples, optimizes its local feature extraction model by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model, to obtain a globally optimized local feature extraction model, and performs feature extraction on the corresponding target-modality public samples in a federated public dataset based on the globally optimized local feature extraction model, to obtain target-modality public sample representations;
    receiving the target-modality public sample representations sent by the participant devices, and performing data-modality-based selective aggregation on the target-modality public sample representations, to obtain the target-modality aggregated sample representation corresponding to each data modality;
    obtaining the public training samples corresponding to each data modality from the federated public dataset, and, based on the public training samples, performing knowledge distillation training of each initial global feature extraction model against the corresponding target-modality aggregated sample representations, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model.
  2. The federated learning modeling optimization method of claim 1, wherein the step of performing, based on the public training samples, knowledge distillation training of each initial global feature extraction model against the corresponding target-modality aggregated sample representations and contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model, comprises:
    mapping the public training samples of each data modality to prediction sample representations through the corresponding initial global feature extraction model;
    calculating the knowledge distillation loss between each prediction sample representation and the corresponding target-modality aggregated sample representation, and calculating the contrastive learning loss between the prediction sample representations;
    optimizing each initial global feature extraction model based on its corresponding knowledge distillation loss and contrastive learning loss, to obtain each target global feature extraction model.
  3. The federated learning modeling optimization method of claim 2, wherein the step of calculating the knowledge distillation loss between each prediction sample representation and the corresponding target-modality aggregated sample representation, and calculating the contrastive learning loss between the prediction sample representations, comprises:
    selecting, from the prediction sample representations and based on the sample labels of the public training samples, a positive sample representation and a negative sample representation corresponding to each prediction sample representation;
    calculating the contrastive learning loss corresponding to each initial global feature extraction model based on each prediction sample representation and its corresponding positive and negative sample representations;
    calculating the knowledge distillation loss corresponding to each initial global feature extraction model based on the similarity between each prediction sample representation and the corresponding target-modality aggregated sample representation.
  4. The federated learning modeling optimization method of claim 1, wherein the step of performing data-modality-based selective aggregation on the target-modality public sample representations, to obtain the target-modality aggregated sample representation corresponding to each data modality, comprises:
    determining, among the target-modality public sample representations and based on the correspondence between the participant devices and the data modalities, the sample representations to be aggregated for each data modality;
    aggregating the sample representations to be aggregated for each data modality, to obtain the target-modality aggregated sample representation corresponding to that data modality.
  5. A federated learning modeling optimization method, applied to a participant device, the method comprising:
    receiving the initial global feature extraction model delivered by the federated server, and obtaining local private training samples;
    optimizing the local feature extraction model by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model based on the local private training samples, to obtain a globally optimized local feature extraction model;
    extracting, from a federated public dataset, target-modality public samples belonging to the data modality corresponding to the globally optimized local feature extraction model, and performing feature extraction on the target-modality public samples based on the globally optimized local feature extraction model, to obtain target-modality public sample representations;
    sending the target-modality public sample representations to the federated server, so that the federated server performs data-modality-based selective aggregation on the target-modality public sample representations to obtain the target-modality aggregated sample representation corresponding to each data modality, obtains the public training samples corresponding to each data modality from the federated public dataset, and, based on the public training samples, performs knowledge distillation training of each initial global feature extraction model against the corresponding target-modality aggregated sample representations, as well as contrastive learning training between the initial global feature extraction models, to obtain the target global feature extraction model corresponding to each initial global feature extraction model.
  6. The federated learning modeling optimization method of claim 5, wherein the step of optimizing the local feature extraction model by performing contrastive learning training between the initial global feature extraction model and the local feature extraction model based on the local private training samples, to obtain a globally optimized local feature extraction model, comprises:
    mapping all the local private training samples to first sample representations through the initial global feature extraction model, and mapping all the local private training samples to second sample representations through the local feature extraction model;
    calculating a contrastive learning loss based on the similarities between the first sample representations and the second sample representations;
    optimizing the local feature extraction model based on the contrastive learning loss, to obtain the globally optimized local feature extraction model.
  7. The federated learning modeling optimization method of claim 6, wherein the step of calculating a contrastive learning loss based on the similarities between the first sample representations and the second sample representations comprises:
    taking, among the first sample representations, the sample representation corresponding to the same local private training sample as a given second sample representation, as the local positive sample representation of that second sample representation;
    taking, among the first sample representations, the sample representations not corresponding to the same local private training sample as a given second sample representation, as the local negative sample representations of that second sample representation;
    calculating the contrastive learning loss based on the similarity between each second sample representation and its local positive sample representation, and on the similarities between each second sample representation and its local negative sample representations.
  8. The federated learning modeling optimization method of claim 6, wherein the step of optimizing the local feature extraction model based on the contrastive learning loss, to obtain a globally optimized local feature extraction model, comprises:
    converting each second sample representation into the output classification label of the corresponding local private training sample through a preset classification model;
    calculating a classification loss based on the output classification labels and the preset ground-truth labels of the local private training samples;
    calculating a total model loss based on the contrastive learning loss and the classification loss;
    optimizing the local feature extraction model based on the total model loss, to obtain the globally optimized local feature extraction model.
  9. A federated learning modeling optimization device, comprising a memory, a processor, and a program stored on the memory for implementing the federated learning modeling optimization method,
    wherein the memory is configured to store the program implementing the federated learning modeling optimization method;
    and the processor is configured to execute the program implementing the federated learning modeling optimization method, so as to implement the steps of the federated learning modeling optimization method of any one of claims 1 to 4 or 5 to 8.
  10. A readable storage medium storing a program implementing a federated learning modeling optimization method, the program being executed by a processor to implement the steps of the federated learning modeling optimization method of any one of claims 1 to 4 or 5 to 8.
  11. A program product, being a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the federated learning modeling optimization method of any one of claims 1 to 4 or 5 to 8.
PCT/CN2021/141481 2021-07-28 2021-12-27 Federated learning modeling optimization method and device, and readable storage medium and program product WO2023005133A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110860096.9A CN113516255A (en) 2021-07-28 2021-07-28 Federal learning modeling optimization method, apparatus, readable storage medium, and program product
CN202110860096.9 2021-07-28

Publications (1)

Publication Number Publication Date
WO2023005133A1 true WO2023005133A1 (en) 2023-02-02

Family

ID=78068749

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141481 WO2023005133A1 (en) 2021-07-28 2021-12-27 Federated learning modeling optimization method and device, and readable storage medium and program product

Country Status (2)

Country Link
CN (1) CN113516255A (en)
WO (1) WO2023005133A1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516255A (en) * 2021-07-28 2021-10-19 深圳前海微众银行股份有限公司 Federal learning modeling optimization method, apparatus, readable storage medium, and program product
CN113869528B (en) * 2021-12-02 2022-03-18 中国科学院自动化研究所 De-entanglement individualized federated learning method for consensus characterization extraction and diversity propagation
CN113988225B (en) * 2021-12-24 2022-05-06 支付宝(杭州)信息技术有限公司 Method and device for establishing representation extraction model, representation extraction and type identification
CN114330125A (en) * 2021-12-29 2022-04-12 新智我来网络科技有限公司 Knowledge distillation-based joint learning training method, device, equipment and medium
CN114612408B (en) * 2022-03-04 2023-06-06 拓微摹心数据科技(南京)有限公司 Cardiac image processing method based on federal deep learning
CN114510652B (en) * 2022-04-20 2023-04-07 宁波大学 Social collaborative filtering recommendation method based on federal learning
CN114819196B (en) * 2022-06-24 2022-10-28 杭州金智塔科技有限公司 Noise distillation-based federal learning system and method
CN115829028B (en) * 2023-02-14 2023-04-18 电子科技大学 Multi-mode federal learning task processing method and system
CN116229219B (en) * 2023-05-10 2023-09-26 浙江大学 Image encoder training method and system based on federal and contrast characterization learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180373988A1 (en) * 2017-06-27 2018-12-27 Hcl Technologies Limited System and method for tuning and deploying an analytical model over a target eco-system
CN111144579A (en) * 2019-12-30 2020-05-12 大连理工大学 Multi-mode Lu nation feature learning model based on non-negative matrix decomposition
CN112101578A (en) * 2020-11-17 2020-12-18 中国科学院自动化研究所 Distributed language relationship recognition method, system and device based on federal learning
CN112651511A (en) * 2020-12-04 2021-04-13 华为技术有限公司 Model training method, data processing method and device
CN113128701A (en) * 2021-04-07 2021-07-16 中国科学院计算技术研究所 Sample sparsity-oriented federal learning method and system
CN113516255A (en) * 2021-07-28 2021-10-19 深圳前海微众银行股份有限公司 Federal learning modeling optimization method, apparatus, readable storage medium, and program product


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116168258A (en) * 2023-04-25 2023-05-26 之江实验室 Object classification method, device, equipment and readable storage medium
CN116522228A (en) * 2023-04-28 2023-08-01 哈尔滨工程大学 Radio frequency fingerprint identification method based on feature imitation federal learning
CN116522228B (en) * 2023-04-28 2024-02-06 哈尔滨工程大学 Radio frequency fingerprint identification method based on feature imitation federal learning
CN116757275A (en) * 2023-06-07 2023-09-15 京信数据科技有限公司 Knowledge graph federal learning device and method
CN116502709A (en) * 2023-06-26 2023-07-28 浙江大学滨江研究院 Heterogeneous federal learning method and device
CN116665319A (en) * 2023-07-31 2023-08-29 华南理工大学 Multi-mode biological feature recognition method based on federal learning
CN116665319B (en) * 2023-07-31 2023-11-24 华南理工大学 Multi-mode biological feature recognition method based on federal learning
CN117196070A (en) * 2023-11-08 2023-12-08 山东省计算中心(国家超级计算济南中心) Heterogeneous data-oriented dual federal distillation learning method and device
CN117196070B (en) * 2023-11-08 2024-01-26 山东省计算中心(国家超级计算济南中心) Heterogeneous data-oriented dual federal distillation learning method and device
CN117436133A (en) * 2023-12-22 2024-01-23 信联科技(南京)有限公司 Federal learning privacy protection method based on data enhancement
CN117436133B (en) * 2023-12-22 2024-03-12 信联科技(南京)有限公司 Federal learning privacy protection method based on data enhancement

Also Published As

Publication number Publication date
CN113516255A (en) 2021-10-19

Similar Documents

Publication Publication Date Title
WO2023005133A1 (en) Federated learning modeling optimization method and device, and readable storage medium and program product
WO2021083276A1 (en) Method, device, and apparatus for combining horizontal federation and vertical federation, and medium
WO2020094060A1 (en) Recommendation method and apparatus
CN102223453B (en) High performance queueless contact center
WO2019228494A1 (en) Method and device for determining type of wireless access point
WO2019214344A1 (en) System reinforcement learning method and apparatus, electronic device, and computer storage medium
WO2022028045A1 (en) Data processing method, apparatus, and device, and medium
WO2022022024A1 (en) Training sample construction method, apparatus, and device, and computer-readable storage medium
WO2022236824A1 (en) Target detection network construction optimization method, apparatus and device, and medium and product
CN110020022B (en) Data processing method, device, equipment and readable storage medium
WO2021258882A1 (en) Recurrent neural network-based data processing method, apparatus, and device, and medium
CN115147265B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN110069715A (en) A kind of method of information recommendation model training, the method and device of information recommendation
WO2023024349A1 (en) Method for optimizing vertical federated prediction, device, medium, and computer program product
CN113656698B (en) Training method and device for interest feature extraction model and electronic equipment
CN112785002A (en) Model construction optimization method, device, medium, and computer program product
CN110633717A (en) Training method and device for target detection model
WO2021185427A1 (en) Generation of personalized recommendations
CN109086976B (en) Task allocation method for crowd sensing
KR101700030B1 (en) Method for visual object localization using privileged information and apparatus for performing the same
WO2021139483A1 (en) Forward model selection method and device, and readable storage medium
CN113361384A (en) Face recognition model compression method, device, medium, and computer program product
CN112966054A (en) Enterprise graph node relation-based ethnic group division method and computer equipment
CN112381236A (en) Data processing method, device, equipment and storage medium for federal transfer learning
CN115273148B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21951705

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE