CN116319714B - Federal learning method based on modal conversion and related equipment - Google Patents

Federal learning method based on modal conversion and related equipment

Info

Publication number
CN116319714B
Authority
CN
China
Prior art keywords
client
data
local
server
task model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310589332.7A
Other languages
Chinese (zh)
Other versions
CN116319714A (en)
Inventor
王光宇
石剑宇
刘晓鸿
张平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202310589332.7A
Publication of CN116319714A
Application granted
Publication of CN116319714B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/04 Protocols for data compression, e.g. ROHC
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a federal learning method based on modal conversion and related equipment. The method comprises the following steps: each client sends the feature data of its local modality to the server side; the server side decodes all the feature data to obtain decoded data of a target modality, aggregates the coding parameters of each client to obtain a global encoder, reconstructs the decoded data into first reconstruction data of the local modality using the global encoder and sends them to the clients, trains the global encoder through interaction of the first reconstruction data and the feature data, and sends the trained global encoder to the clients; each client takes the output of the trained global encoder as the input of a downstream task model for supervised training and sends the trained first model parameters to the other clients; the other clients perform semi-supervised secondary training on the downstream task model and send the trained second model parameters to the server side for aggregation, obtaining the downstream task model with the secondary training completed.

Description

Federal learning method based on modal conversion and related equipment
Technical Field
The embodiment of the application relates to the technical field of federal learning, in particular to a federal learning method based on modal conversion and related equipment.
Background
The main problem with related federal learning methods is that the federal learning process requires data in a plurality of different data modalities held by different clients, yet during the actual training of the model it is difficult for data of different modalities to share data content with one another; that is, the data are heterogeneous.
Disclosure of Invention
In view of the foregoing, an object of the present application is to provide a federal learning method and related apparatus based on modality conversion.
Based on the above object, the present application provides a federal learning method based on modality conversion, which is applied to a federal network, wherein the federal network includes a server side and at least two clients;
the method comprises the following steps:
each client transmits the local characteristic data in the local mode to the server;
the server decodes the received characteristic data of each client to obtain the decoded data of the target mode;
the server side is enabled to aggregate coding parameters of each client side to obtain a global encoder, the decoding data are rebuilt into first rebuilding data of a local mode by utilizing the global encoder, the first rebuilding data are sent to each client side, the server side is enabled to train the global encoder by interacting the first rebuilding data and corresponding characteristic data with each client side, and the trained global encoder is sent to each client side;
Each client takes the output of the trained global encoder as the input of a preset downstream task model in the client, carries out first training on the downstream task model of the client, and sends first model parameters trained for the first time to other clients;
and enabling each other client to perform secondary training on the local downstream task model, sending secondary trained second model parameters to the server, and enabling the server to aggregate the second model parameters of each client to obtain the downstream task model for completing secondary training.
Further, before each client sends the local feature data in the local mode to the server, the method further includes:
enabling each client to desensitize local original data to obtain virtual data;
and carrying out characteristic representation on the virtual data by utilizing the respective encoder of each client to obtain characteristic data corresponding to the local original data of the client.
Further, reconstructing the decoded data into first reconstructed data of a local modality using the global encoder, comprising:
enabling the server side to determine respective decoders corresponding to each target mode;
Combining the global encoder with each decoder separately;
and reconstructing the corresponding decoded data in a data mode by using the corresponding decoder and the coding parameters in the global encoder to obtain first reconstructed data corresponding to the local mode.
Further, the server side trains the global encoder by interacting the first reconstruction data and the corresponding feature data with each client side, comprising:
each client calculates the distance loss between the respective first reconstruction data and the local characteristic data, and sends the distance loss to the server;
the server side aggregates the distance loss, and corrects the global encoder by utilizing the aggregated distance loss;
the server updates the first reconstruction data by using the corrected global encoder, and calculates the distance loss between the corrected first reconstruction parameter and the local characteristic data again by each client;
when the distance loss is corrected to a preset loss threshold value, the client determines that the global encoder training is completed, and issues the global encoder to each client.
Further, making each client take the output of the trained global encoder as the input of a preset downstream task model in the client, including:
The server side transmits decoding parameters of a corresponding decoder to each client side according to the local mode of the client side;
enabling each client to decode the characteristic data of the local mode of the client into decoded data of a target mode again by utilizing the received decoding parameters;
the client uses the trained global encoder to reconstruct the decoding data of the target mode to obtain second reconstruction data of the local mode;
the client is caused to take the second reconstructed data and the decoded data as inputs to the downstream task model.
Further, the first training of the downstream task model of the client comprises:
enabling a client to utilize the second reconstruction data and decode again to obtain decoding data of a target mode, and performing supervised training on a task model of the client;
and after the supervised training is completed, acquiring the current first model parameters of the downstream task model.
Further, each other client is enabled to perform secondary training on the local downstream task model, including:
enabling each other client to set a downstream task model in the client by using the received first model parameters;
The second reconstruction data in the client and the decoded data of the target mode obtained by re-decoding the client are input into a downstream task model of the client;
enabling the client to perform semi-supervised training on the downstream task model;
and after the semi-supervised training is finished, acquiring the current second model parameters of the downstream task model of the client.
Based on the same inventive concept, the application also provides a federal learning device based on modal conversion, comprising: the system comprises a first mode conversion module, a second mode conversion module, a third mode conversion module, a first task model training module and a second task model training module;
the first mode conversion module is configured to enable each client to send the local characteristic data in the local mode to the server;
the second mode conversion module is configured to enable the server side to decode the received characteristic data of each client side to obtain decoded data of a target mode;
the third mode conversion module is configured to enable the server side to aggregate coding parameters of each client side to obtain a global encoder, reconstruct the decoding data into first reconstruction data of a local mode by utilizing the global encoder, send the first reconstruction data to each client side, enable the server side to train the global encoder by interacting the first reconstruction data and corresponding characteristic data with each client side, and send the trained global encoder to each client side;
The first task model training module is configured to enable each client to take the output of the trained global encoder as the input of a preset downstream task model in the client, train the downstream task model of the client for the first time, and send first model parameters trained for the first time to other clients;
the second task model training module is configured to enable each other client to perform secondary training on the local downstream task model, send second model parameters obtained through secondary training to the server, and enable the server to aggregate the second model parameters of each client to obtain the downstream task model with the secondary training completed.
Based on the same inventive concept, the application also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the federal learning method based on the modal transformation when executing the program.
Based on the same inventive concept, the present application also provides a non-transitory computer readable storage medium, wherein the non-transitory computer readable storage medium stores computer instructions for causing the computer to perform the federal learning method based on modality conversion as described above.
As can be seen from the foregoing, in the federal learning method and related device based on modal conversion provided by the present application, the server constructs a global encoder from the local feature data of each client and the modality conversion of that feature data, and the global encoder is further trained using the first reconstruction data and the distance losses exchanged between each client and the server. On this basis, the output of the global encoder serves as the input of the downstream task model, the downstream task model is trained with supervision on each client and with semi-supervision on the other clients, and the global encoder guides the downstream task model, so that the trained downstream task model has better generalization capability when handling data of different modalities.
Drawings
In order to more clearly illustrate the technical solutions of the present application or related art, the drawings that are required to be used in the description of the embodiments or related art will be briefly described below, and it is apparent that the drawings in the following description are only embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a flow chart of a federal learning method based on modality conversion according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a federal learning device based on modal transformation according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present application should be given a general meaning as understood by one of ordinary skill in the art to which the present application belongs. The terms "first," "second," and the like, as used in the embodiments of the present application, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
As described in the background section, it is also difficult for the related federal learning method based on modality conversion to meet the requirement of data interaction in actual production.
The applicant has found in the course of implementing the present application that the main problem with related federal learning methods is that the federal learning process requires data in a plurality of different data modalities held by different clients, yet during the actual training of the model it is difficult for data of different modalities to share data content with one another; that is, the data are heterogeneous.
Based on this, one or more embodiments of the present application provide a federal learning method based on modality conversion, in which a task model for a downstream task is trained through modality conversion of the different-modality data held by different clients.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a federal learning method based on modality conversion according to an embodiment of the present application is specifically applied to a federal network, where the federal network includes a server side and at least two clients.
In this embodiment, each client communicates with the server and transmits data, that is, the server serves as a center of the federal network, and forms a centralized federal network with a plurality of clients.
Wherein each client holds its own original data locally, and the data modalities of the original data of each client may also be different.
In this embodiment, the data modality of the original data local to each client is referred to as a local modality, that is, the local modalities may be different from one client to another.
Further, each client is provided with an encoder and a discriminator.
The encoder is used to extract a feature representation of the original data, and the discriminator may be used to discriminate the source of the data fed into it, e.g., whether the data is local to the client or was issued by the server.
Further, the server side is provided with a plurality of decoders, and the decoders can be used for carrying out modal conversion on the characteristic data of different local modes.
Each decoder corresponds to one type of data modality, and a plurality of decoders can be set according to specific requirements; that is, the number of decoders equals the number of data modality types. For example, when m different data modalities are involved in actual production, m different decoders can be set, one decoder for each data modality.
The federal learning method based on modal conversion specifically comprises the following steps:
step S101, each client transmits the local characteristic data in the local mode to the server.
In an embodiment of the present application, each client, after preprocessing its local raw data, performs a feature representation on it and sends the feature representation to the center of the federal network, i.e., the server side.
Specifically, since the original data currently held locally by each client contains sensitive information such as personal information, the original data needs to be desensitized when federal learning training is performed, and data that are similar to the original data but contain no sensitive information are used instead.
Further, after desensitizing the original data, the obtained data is taken as virtual data.
Based on this, the virtual data has the same data modality as the original data, that is, a local modality.
Further, the obtained virtual data may be input to an encoder of the client, and the virtual data may be characterized by the encoder, that is, the virtual data may be encoded by the encoder.
Further, after characterizing the virtual data, the encoder may obtain the characterizing data corresponding to the virtual data local to the client.
Based on this, the feature data has the same data modality as the original data, i.e., the local modality of the client.
Further, each client may send its characteristic data to the server after obtaining the respective characteristic data.
In a specific example, consider a federal network having two clients and one server.
The two clients are client A and client B; client A contains an encoder A and its local data are in the a-modality, while client B contains an encoder B and its local data are in the b-modality.
Further, the client A may be caused to desensitize its original data X_A in the a-modality to obtain virtual data x_A in the same a-modality, and to input the virtual data x_A into the encoder A for feature representation.
Further, after the feature representation, the encoder A outputs the feature data x_a.
Further, the client A may be caused to send the feature data x_a to the server side.
Similarly, the client B may be caused to desensitize its original data X_B in the b-modality to obtain virtual data x_B in the b-modality, and to input the virtual data x_B into the encoder B for feature representation.
Further, after the feature representation, the encoder B outputs the feature data x_b.
Further, the client B may be caused to send the feature data x_b to the server side.
Further, the server may receive the feature data sent by each client.
In this specific example, the server side now holds the feature data x_a in the a-modality and the feature data x_b in the b-modality.
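As an illustration of the client-side flow in step S101, the following sketch (PyTorch is used for all examples in this description; the encoder architecture, the desensitization routine, and every tensor shape are assumptions, since the disclosure does not fix them) desensitizes raw data into virtual data, encodes it into feature data, and leaves that feature data ready to be sent to the server side.

```python
import torch
import torch.nn as nn

class ClientEncoder(nn.Module):
    """Hypothetical per-client encoder mapping local-modality data to feature representations."""
    def __init__(self, in_dim: int, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, x):
        return self.net(x)

def desensitize(raw: torch.Tensor) -> torch.Tensor:
    """Stand-in desensitization: perturbs the raw data so sensitive values are not sent,
    while keeping the same shape/modality (the actual scheme is not specified here)."""
    return raw + 0.01 * torch.randn_like(raw)

# Client A: raw data X_A in the a-modality -> virtual data x_A -> feature data x_a.
encoder_A = ClientEncoder(in_dim=32)
X_A = torch.randn(16, 32)      # stand-in for client A's local raw data (a-modality)
x_A = desensitize(X_A)         # virtual data, same modality as the raw data
x_a = encoder_A(x_A)           # feature data, sent to the server side
```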
Step S102, the server decodes the received characteristic data of each client to obtain the decoded data of the target mode.
In the embodiment of the application, the server terminal can obtain the decoded data after decoding based on the received characteristic data of each client terminal.
Specifically, based on the feature data of the local mode of each client received by the server in the foregoing step, each feature data may be decoded into a preset target mode by using a decoder corresponding to the target mode, that is, the feature data is subjected to mode conversion.
Further, the data subjected to the modality conversion may be regarded as decoded data.
In a specific example, continuing with the federal network from the previous step, the server side holds the feature data x_a in the a-modality and the feature data x_b in the b-modality.
Further, the server side contains a decoder A corresponding to the a-modality and a decoder B corresponding to the b-modality.
Further, the server side may be caused to decode the feature data x_a with the decoder B; after decoding, the b-modality corresponding to the decoder B is taken as the target modality, and the a-modality of the feature data x_a is converted into the target b-modality, thereby obtaining the decoded data x_{a→b}.
Similarly, the server side may be caused to decode the feature data x_b with the decoder A; after decoding, the a-modality corresponding to the decoder A is taken as the target modality, and the b-modality of the feature data x_b is converted into the target a-modality, thereby obtaining the decoded data x_{b→a}.
Based on the above, after decoding the server side holds the feature data x_a in the a-modality, the feature data x_b in the b-modality, the decoded data x_{a→b} in the b-modality, and the decoded data x_{b→a} in the a-modality.
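The server-side modality conversion of step S102 can be pictured with the sketch below; the decoder architecture and the stand-in feature tensors are assumptions, and only the routing of each client's feature data through the decoder of the other modality follows the description above.

```python
import torch
import torch.nn as nn

class ModalityDecoder(nn.Module):
    """Hypothetical server-side decoder mapping feature data into one target modality."""
    def __init__(self, feat_dim: int = 64, out_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

    def forward(self, feats):
        return self.net(feats)

# One decoder per data modality involved in training; here, modalities a and b.
decoder_A = ModalityDecoder(out_dim=32)   # decodes feature data into the a-modality
decoder_B = ModalityDecoder(out_dim=32)   # decodes feature data into the b-modality

# Feature data received from the clients (stand-in tensors; produced by the client encoders).
x_a = torch.randn(16, 64)                 # from client A, a-modality
x_b = torch.randn(16, 64)                 # from client B, b-modality

# Step S102: cross-modality conversion of each client's feature data.
x_a_to_b = decoder_B(x_a)                 # decoded data x_{a->b}, target b-modality
x_b_to_a = decoder_A(x_b)                 # decoded data x_{b->a}, target a-modality
```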
Step S103, the server side is enabled to aggregate coding parameters of each client side to obtain a global encoder, the decoding data are rebuilt into first rebuilding data of a local mode by utilizing the global encoder, the first rebuilding data are sent to each client side, the server side is enabled to train the global encoder by interacting the first rebuilding data and corresponding characteristic data with each client side, and the trained global encoder is sent to each client side.
In the embodiment of the present application, the server may aggregate the encoding parameters used by each client during encoding to obtain a global encoder covering all client encoders, and perform data reconstruction on each piece of decoded data using the global encoder to obtain first reconstruction data that convert the target modality back into the local modality.
Further, the server side is enabled to send the global encoder and the first reconstruction data to each corresponding client side so as to train the global encoder.
Specifically, first, each client may be caused to transmit the encoding parameters of its encoder at the time of encoding to the server side.
Further, after the server receives the encoding parameters of each encoder, the encoding parameters of each encoder may be aggregated by an aggregation operation, for example, weighting, etc., and the global encoder may be set using the global encoding parameters obtained after the aggregation.
Based on the above, the server side can reconstruct each decoded data by the obtained global encoder respectively, so as to reconstruct each decoded data in the target mode into the original local mode, and obtain the first reconstructed data corresponding to the local mode of each client.
Further, the server may be enabled to send the obtained first reconstruction data to the client corresponding to the data mode of the first reconstruction data, and send the global encoder to each client, so that each client trains the obtained global encoder respectively.
In a specific example, the client a may be caused to send the encoding parameter a of the encoder a to the server side, and the client B may be caused to send the encoding parameter B of the encoder B to the server side.
Further, the server side performs aggregation after receiving the coding parameters A and B, and sets the global encoder S by using the coding parameters AB obtained after aggregation.
Further, the server side may be caused to use the global encoder S to convert the decoded data x_{a→b} from the b-modality back into the corresponding original local a-modality, obtaining the first reconstruction data x'_a, and to convert the decoded data x_{b→a} back into the corresponding original local b-modality, obtaining the first reconstruction data x'_b.
Further, the server side sends the first reconstruction data x'_a in the a-modality to the client A, and sends the first reconstruction data x'_b in the b-modality to the client B.
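A minimal sketch of this aggregation and reconstruction follows; the weighted averaging, the identical encoder architectures, and the layer widths are all assumptions, chosen so that the coding parameters of the two clients can be averaged directly and combined with each per-modality decoder.

```python
import torch
import torch.nn as nn

def aggregate_params(state_dicts, weights=None):
    """Weighted averaging of encoder parameters (one plausible reading of the
    'aggregation operation, for example, weighting' mentioned above)."""
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts))
            for k in state_dicts[0]}

# Stand-in client encoders and server decoders; identical architectures are assumed so
# that the coding parameters can be averaged directly (widths are illustrative only).
encoder_A, encoder_B = nn.Linear(32, 64), nn.Linear(32, 64)
decoder_A, decoder_B = nn.Linear(64, 32), nn.Linear(64, 32)

# Global encoder S built from the aggregated coding parameters A and B.
global_encoder = nn.Linear(32, 64)
global_encoder.load_state_dict(aggregate_params([encoder_A.state_dict(), encoder_B.state_dict()]))

# Decoded data produced in step S102 (stand-in tensors).
x_a_to_b = torch.randn(16, 32)            # b-modality, originated from client A
x_b_to_a = torch.randn(16, 32)            # a-modality, originated from client B

# Step S103: combine the global encoder with the decoder of each original local modality
# to rebuild the decoded data into first reconstruction data.
x_rec_a = decoder_A(global_encoder(x_a_to_b))   # first reconstruction data x'_a, sent to client A
x_rec_b = decoder_B(global_encoder(x_b_to_a))   # first reconstruction data x'_b, sent to client B
```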
In this embodiment, each client may calculate, using the received first reconstruction data, a distance loss between the first reconstruction data and the characteristic data local to its client.
Further, each client may be caused to report the respective calculated distance loss to the server.
Further, after the server receives the distance loss of each client, all the distance losses can be aggregated, and the coding parameters of the global encoder can be corrected by utilizing the aggregated distance loss.
Further, the server side is caused to reconstruct its multiple pieces of decoded data again using the corrected global encoder, obtaining corrected first reconstruction data.
Further, the corrected first reconstruction data are issued to the corresponding clients, each client is caused to calculate its distance loss again, and the recalculated distance losses are reported to the server.
Further, after the server side aggregates the recalculated distance losses, when the aggregated distance loss reaches the preset loss threshold value, the global encoder training can be considered complete.
Further, the server side may issue the trained global encoder to each client side.
It can be seen that each client can receive the same global encoder, that is, the global encoder has generalization performance for each client and can be suitable for each client because the global encoder uses the respective first reconstruction data and distance loss corresponding to each client during training.
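The client–server interaction described above can be collapsed into a single process for illustration; in the sketch below the distance loss is taken to be the mean squared error, the aggregation to be a simple mean, and the correction of the global encoder to be one gradient step, all of which are assumptions rather than choices fixed by the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
feat_dim, mod_dim, loss_threshold = 64, 32, 1e-2

global_encoder = nn.Linear(mod_dim, feat_dim)             # global encoder S being trained
decoders = {"a": nn.Linear(feat_dim, mod_dim),            # per-modality decoders (held fixed;
            "b": nn.Linear(feat_dim, mod_dim)}            # only the encoder is stepped)
decoded = {"a": torch.randn(8, mod_dim),                  # decoded data rebuilt for modality a
           "b": torch.randn(8, mod_dim)}                  # decoded data rebuilt for modality b
local_features = {"a": torch.randn(8, mod_dim),           # each client's local feature data
                  "b": torch.randn(8, mod_dim)}

optimizer = torch.optim.SGD(global_encoder.parameters(), lr=0.1)
for _ in range(100):
    # Each "client" computes the distance loss between its first reconstruction data and its
    # local feature data; the "server" aggregates the losses and corrects the global encoder.
    losses = []
    for m in ("a", "b"):
        recon = decoders[m](global_encoder(decoded[m]))      # updated first reconstruction data
        losses.append(F.mse_loss(recon, local_features[m]))  # assumed distance loss: MSE
    aggregated = torch.stack(losses).mean()                  # assumed aggregation: simple mean
    optimizer.zero_grad()
    aggregated.backward()
    optimizer.step()
    if aggregated.item() < loss_threshold:                   # preset loss threshold reached
        break
# The trained global encoder is then issued to every client.
```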
Step S104, enabling each client to take the output of the trained global encoder as the input of a preset downstream task model in the client, training the downstream task model of the client for the first time, and sending the first model parameters trained for the first time to other clients.
In the embodiment of the application, each client side can train the downstream task model for the first time by using the global encoder based on the received trained global encoder; wherein the downstream task may be a task such as classification, and the first training may be a supervised training.
Specifically, the server may send the decoder corresponding to the local mode of each client to each corresponding client, and may specifically issue decoding parameters of the corresponding decoder to each client.
Further, each client is caused to construct a respective local decoder using the obtained decoding parameters.
Further, each client is caused to decode its local feature data, i.e. modality conversion, with a respective local decoder.
The local characteristic data of each client can be determined to be a local mode, and after the mode conversion is performed, the decoded data in the target mode can be obtained locally at the client as described in the foregoing steps.
Further, each client uses its local global encoder to reconstruct the decoded data of the target modality locally, that is, to convert the decoded data of the target modality into second reconstruction data of the local modality.
Further, for each client, the second reconstruction data and the decoded data obtained locally by the re-decoding are used as the input of the downstream task model of the client.
It should be noted that the original data and the virtual data local to the client carry the same labels; that is, the second reconstruction data and the decoded data obtained by the conversion carry the same labels.
Based on this, the client can be caused to perform supervised training, i.e., the first training, on its downstream task model using the second reconstruction data and the decoded data.
Further, after the supervision training of the local downstream task model by each client is completed, each client can determine the current first model parameters of the downstream task model, and send the local first model parameters of the client to the server.
Further, the server side may send the first model parameters of the client side to other client sides after receiving the first model parameters of each client side.
In some other embodiments, the server may also aggregate all the first model parameters after receiving the first model parameters of each client, and send the aggregated first model parameters to other clients.
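A compact sketch of the first (supervised) training in step S104 follows; the classification head, the single shared width for all modalities, and the number of epochs are assumptions, and the essential point is only that the second reconstruction data and the re-decoded data are concatenated as the input of the downstream task model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim, n_classes = 32, 4     # one width for every modality, purely to keep the sketch compact

# Stand-ins for what a client holds at this point (architectures and shapes are assumptions):
global_encoder = nn.Linear(dim, dim)                  # trained global encoder from the server
local_decoder  = nn.Linear(dim, dim)                  # decoder parameters issued by the server
features       = torch.randn(64, dim)                 # local feature data (local modality)
labels         = torch.randint(0, n_classes, (64,))   # labels shared by the raw/virtual data

with torch.no_grad():
    decoded    = local_decoder(features)              # re-decoded data of the target modality
    second_rec = global_encoder(decoded)              # second reconstruction data (local modality)

# Step S104: the downstream task model takes the second reconstruction data and the decoded
# data as input and is trained with supervision (a classification task is assumed).
task_model = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, n_classes))
optimizer = torch.optim.Adam(task_model.parameters(), lr=1e-3)
for _ in range(5):
    logits = task_model(torch.cat([second_rec, decoded], dim=1))
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

first_model_params = task_model.state_dict()          # first model parameters, sent onwards
```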
Step S105, enabling each other client to perform secondary training on the local downstream task model, sending second model parameters obtained through secondary training to the server, and enabling the server to aggregate the second model parameters of each client to obtain the downstream task model with the secondary training completed.
In an embodiment of the present application, for each client in the federal network, the client may be enabled to set a local downstream task model of the client by using the first model parameters of the other clients received from the server, so as to perform semi-supervised training, that is, secondary training, on the downstream task model, and after the semi-supervised training is completed, set the second model parameters of the local downstream task model of the client.
In some other embodiments, the aggregated first model parameters in the foregoing steps may also be issued to each client, and each client may be enabled to train its local downstream task model by using the aggregated first model parameters.
Specifically, since the first model parameters obtained in the foregoing steps are obtained by training with the global encoder, which has generalization performance, the local downstream task model that each client constructs using the first model parameters of other clients also has generalization capability for that client.
Based on this, each client can set its local downstream task model with the first model parameters received from the server side and semi-supervise train its local downstream task model with the virtual data or the feature data local to the client.
In some embodiments, when the downstream task is a classification task, the client's local downstream task model may be used to generate noisy pseudo labels for its local virtual data or feature data, and the local downstream task model is then trained in a semi-supervised manner using these noisy pseudo labels.
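For a classification task, the pseudo-labelling described above might look as follows; the model architecture, the stand-in inputs, and the training schedule are assumptions, and the commented load_state_dict call marks where the first model parameters relayed by the server would be loaded.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim, n_classes = 32, 4

# Receiving client: build the local downstream task model from another client's first
# model parameters (a freshly initialised model stands in here; the architecture must match).
task_model = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, n_classes))
# task_model.load_state_dict(first_model_params)      # parameters relayed by the server

# Local inputs: concatenated second reconstruction data and re-decoded data (stand-in tensor).
inputs = torch.randn(64, 2 * dim)

# Generate noisy pseudo labels with the current task model, then train on them.
with torch.no_grad():
    pseudo_labels = task_model(inputs).argmax(dim=1)

optimizer = torch.optim.Adam(task_model.parameters(), lr=1e-4)
for _ in range(3):                                     # secondary (semi-supervised) training
    loss = F.cross_entropy(task_model(inputs), pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

second_model_params = task_model.state_dict()          # reported to the server for aggregation
```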
Further, after the semi-supervised training is completed, the client can determine the current second model parameters of the local downstream task model, and report the second model parameters to the server.
Based on this, the server side may receive the second model parameters of each client side and aggregate all the second model parameters, where the aggregation operation may be, for example, a weighting operation.
Further, a domain divider is further arranged in the server, and the domain divider can identify output data, intermediate characteristic data and the like of the downstream task model so as to determine the mode of the data.
Specifically, when the domain divider cannot successfully identify the data modalities of the output data and the intermediate feature data, the downstream task model is considered to have learned features that are invariant across modalities during training, i.e., to have discarded the inherent characteristics of each modality, and the downstream task model is therefore considered to have better generalization performance.
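A toy version of this check is sketched below; the linear domain divider and the stand-in feature tensors are assumptions, and the point is only that near-chance accuracy of the divider indicates modality-invariant features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
feat_dim = 64

# Intermediate features produced by the downstream task model for inputs of each modality
# (stand-in tensors; in the actual system these would come from the trained task model).
feats_a = torch.randn(128, feat_dim)
feats_b = torch.randn(128, feat_dim)
feats   = torch.cat([feats_a, feats_b])
domains = torch.cat([torch.zeros(128, dtype=torch.long), torch.ones(128, dtype=torch.long)])

domain_divider = nn.Linear(feat_dim, 2)                # assumed: a simple linear domain classifier
optimizer = torch.optim.Adam(domain_divider.parameters(), lr=1e-3)
for _ in range(50):
    loss = F.cross_entropy(domain_divider(feats), domains)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

accuracy = (domain_divider(feats).argmax(dim=1) == domains).float().mean().item()
print(f"domain-divider accuracy: {accuracy:.2f}")
# Accuracy close to 0.5 means the divider cannot tell the modalities apart, i.e. the task
# model's features are modality-invariant and it is judged to generalize well.
```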
Therefore, in the federal learning method based on modal conversion provided by the present application, the server constructs a global encoder from the local feature data of each client and the modality conversion of that feature data, and the global encoder is further trained using the first reconstruction data and the distance losses exchanged between each client and the server. On this basis, the output of the global encoder serves as the input of the downstream task model, the downstream task model is trained with supervision on each client and with semi-supervision on the other clients, and the global encoder guides the downstream task model, so that the trained downstream task model has better generalization capability when handling data of different modalities.
It should be noted that, the method of the embodiments of the present application may be performed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of embodiments of the present application, which interact with each other to complete the methods.
It should be noted that some embodiments of the present application are described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the same inventive concept, the embodiment of the application also provides a federal learning device based on modal conversion, which corresponds to the method of any embodiment.
Referring to fig. 2, the federal learning device based on modality conversion includes: a first modality conversion module 201, a second modality conversion module 202, a third modality conversion module 203, a first task model training module 204, and a second task model training module 205;
the first modality conversion module 201 is configured to enable each client to send the local feature data in the local modality to the server;
the second mode conversion module 202 is configured to enable the server to decode the received characteristic data of each client to obtain decoded data of the target mode;
the third modality conversion module 203 is configured to enable the server to aggregate the encoding parameters of each client to obtain a global encoder, reconstruct the decoding data into first reconstruction data of a local modality by using the global encoder, send the first reconstruction data to each client, enable the server to train the global encoder by interacting the first reconstruction data and the corresponding characteristic data with each client, and send the trained global encoder to each client;
the first task model training module 204 is configured to enable each client to take the output of the trained global encoder as the input of a preset downstream task model in the client, train the downstream task model of the client for the first time, and send first model parameters trained for the first time to other clients;
The second task model training module 205 is configured to enable each other client to perform secondary training on the local downstream task model, send the second model parameters obtained by secondary training to the server, and enable the server to aggregate the second model parameters of each client to obtain the downstream task model with the secondary training completed.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in the same piece or pieces of software and/or hardware when implementing the embodiments of the present application.
The device of the foregoing embodiment is configured to implement the federal learning method based on modality conversion corresponding to any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same inventive concept, corresponding to the method of any embodiment, the embodiment of the application further provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the federal learning method based on the modal transformation according to any embodiment.
Fig. 3 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit ), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present application.
The Memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage device, dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the solutions provided by the embodiments of the present application are implemented in software or firmware, the relevant program code is stored in memory 1020 and invoked for execution by processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown in the figure) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present application, and not all the components shown in the drawings.
The device of the foregoing embodiment is configured to implement the federal learning method based on modality conversion corresponding to any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same inventive concept, corresponding to any of the above embodiments of the method, the present application further provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the federal learning method based on modality conversion as described in any of the above embodiments.
The computer readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to perform the federal learning method based on modality conversion according to any one of the foregoing embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the application (including the claims) is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined under the idea of the present application, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in details for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present application. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present application are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The embodiments of the present application are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the embodiments of the present application, are therefore intended to be included within the scope of the present application.

Claims (10)

1. The federal learning method based on modal conversion is characterized by being applied to a federal network, wherein the federal network comprises a server side and at least two clients;
the method comprises the following steps:
each client transmits the local characteristic data in the local mode to the server;
the server decodes the received characteristic data of each client to obtain the decoded data of the target mode;
the server side is enabled to aggregate coding parameters of each client side to obtain a global encoder, the decoding data are rebuilt into first rebuilding data of a local mode by utilizing the global encoder, the first rebuilding data are sent to each client side, the server side is enabled to train the global encoder by interacting the first rebuilding data and corresponding characteristic data with each client side, and the trained global encoder is sent to each client side;
Enabling each client to take the output of the trained global encoder as the input of a preset downstream task model in the client, training the downstream task model of the client for the first time, and sending first model parameters trained for the first time to other clients;
and enabling each other client to perform secondary training on the local downstream task model, sending secondary trained second model parameters to the server, and enabling the server to aggregate the second model parameters of each client to obtain the downstream task model for completing secondary training.
2. The method of claim 1, wherein before the causing each client to send the local feature data in the local modality to the server, further comprises:
enabling each client to desensitize local original data to obtain virtual data;
and carrying out characteristic representation on the virtual data by utilizing the respective encoder of each client to obtain characteristic data of local original data of the corresponding client.
3. The method of claim 1, wherein reconstructing the decoded data with the global encoder into first reconstructed data of a local modality comprises:
Enabling the server side to determine respective decoders corresponding to each target mode;
combining the global encoder with each decoder separately;
and reconstructing the corresponding decoded data in a data mode by using the corresponding decoder and the coding parameters in the global encoder to obtain first reconstructed data corresponding to the local mode.
4. The method of claim 1, wherein the causing the server to train the global encoder by interacting the first reconstruction data and the corresponding feature data with each client comprises:
each client calculates the distance loss between the respective first reconstruction data and the local characteristic data, and sends the distance loss to the server;
the server side aggregates the distance loss, and corrects the global encoder by utilizing the aggregated distance loss;
the server updates the first reconstruction data by using the corrected global encoder, and calculates the distance loss between the corrected first reconstruction parameter and the local characteristic data again by each client;
when the distance loss is corrected to a preset loss threshold value, the client determines that the global encoder training is completed, and issues the global encoder to each client.
5. A method according to claim 3, wherein the causing each client to take the output of the trained global encoder as input to a pre-set downstream task model within the client comprises:
the server side transmits decoding parameters of a corresponding decoder to the client side according to the local mode of each client side;
enabling each client to decode the characteristic data of the local mode of the client into decoded data of the target mode again by utilizing the received decoding parameters;
the client side rebuilds the decoded data of the target mode by using the trained global encoder to obtain second rebuilding data of the local mode;
and enabling the client to take the second reconstruction data and the decoding data as input of the downstream task model.
6. The method of claim 5, wherein the first training of the downstream task model of the client comprises:
enabling the client to utilize the second reconstruction data and decode again to obtain decoding data of the target mode, and performing supervised training on a task model of the client;
and after the supervised training is completed, acquiring the current first model parameters of the downstream task model.
7. The method of claim 1, wherein the causing each other client to secondarily train the local downstream task model comprises:
enabling each other client to set a downstream task model in the client by using the received first model parameters;
the second reconstruction data in the client and the decoded data of the target mode obtained by the client are decoded again and input into a downstream task model of the client;
enabling the client to perform semi-supervised training on the downstream task model;
and after the semi-supervised training is finished, acquiring the current second model parameters of the downstream task model of the client.
8. A federal learning device based on modality conversion, comprising: the system comprises a first mode conversion module, a second mode conversion module, a third mode conversion module, a first task model training module and a second task model training module;
the first mode conversion module is configured to enable each client to send the local characteristic data in the local mode to the server;
the second mode conversion module is configured to enable the server side to decode the received characteristic data of each client side to obtain decoded data of a target mode;
The third mode conversion module is configured to enable the server side to aggregate coding parameters of each client side to obtain a global encoder, reconstruct the decoding data into first reconstruction data of a local mode by utilizing the global encoder, send the first reconstruction data to each client side, enable the server side to train the global encoder by interacting the first reconstruction data and corresponding characteristic data with each client side, and send the trained global encoder to each client side;
the first task model training module is configured to enable each client to take the output of the trained global encoder as the input of a preset downstream task model in the client, train the downstream task model of the client for the first time, and send first model parameters trained for the first time to other clients;
the second task model training module is configured to enable each other client to perform secondary training on the local downstream task model, send second model parameters obtained through secondary training to the server, and enable the server to aggregate the second model parameters of each client to obtain the downstream task model with the secondary training completed.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202310589332.7A 2023-05-24 2023-05-24 Federal learning method based on modal conversion and related equipment Active CN116319714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310589332.7A CN116319714B (en) 2023-05-24 2023-05-24 Federal learning method based on modal conversion and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310589332.7A CN116319714B (en) 2023-05-24 2023-05-24 Federal learning method based on modal conversion and related equipment

Publications (2)

Publication Number Publication Date
CN116319714A CN116319714A (en) 2023-06-23
CN116319714B true CN116319714B (en) 2023-07-21

Family

ID=86834486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310589332.7A Active CN116319714B (en) 2023-05-24 2023-05-24 Federal learning method based on modal conversion and related equipment

Country Status (1)

Country Link
CN (1) CN116319714B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021004551A1 (en) * 2019-09-26 2021-01-14 深圳前海微众银行股份有限公司 Method, apparatus, and device for optimization of vertically federated learning system, and a readable storage medium
CN115408377A (en) * 2022-08-29 2022-11-29 北京智源人工智能研究院 Method and device for constructing medical image large model based on federal learning
CN115631121A (en) * 2022-10-31 2023-01-20 电子科技大学 Panoramic image saliency prediction method based on self-supervision learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210073639A1 (en) * 2018-12-04 2021-03-11 Google Llc Federated Learning with Adaptive Optimization
US11443240B2 (en) * 2019-09-06 2022-09-13 Oracle International Corporation Privacy preserving collaborative learning with domain adaptation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021004551A1 (en) * 2019-09-26 2021-01-14 深圳前海微众银行股份有限公司 Method, apparatus, and device for optimization of vertically federated learning system, and a readable storage medium
CN115408377A (en) * 2022-08-29 2022-11-29 北京智源人工智能研究院 Method and device for constructing medical image large model based on federal learning
CN115631121A (en) * 2022-10-31 2023-01-20 电子科技大学 Panoramic image saliency prediction method based on self-supervision learning

Also Published As

Publication number Publication date
CN116319714A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
TWI610295B (en) Computer-implemented method of decompressing and compressing transducer data for speech recognition and computer-implemented system of speech recognition
RU2016105691A (en) DEVICE AND METHOD FOR EFFECTIVE CODING OF METADATA OBJECTS
CN110019865B (en) Mass image processing method and device, electronic equipment and storage medium
CN114448563B (en) Semantic code transmission method and electronic equipment
KR20140130196A (en) System, method, and computer program product for decompression of block compressed images
US20200410285A1 (en) Anomaly Augmented Generative Adversarial Network
CN113327599B (en) Voice recognition method, device, medium and electronic equipment
US10608664B2 (en) Electronic apparatus for compression and decompression of data and compression method thereof
CN111641826A (en) Method, device and system for encoding and decoding data
US10997966B2 (en) Voice recognition method, device and computer storage medium
CN115426075A (en) Encoding transmission method of semantic communication and related equipment
CN115631275A (en) Multi-mode driven human body action sequence generation method and device
US20240169988A1 (en) Method and device of generating acoustic features, speech model training, and speech recognition
US20230360272A1 (en) Multi-plane image compression
CN116319714B (en) Federal learning method based on modal conversion and related equipment
CN111079854B (en) Information identification method, equipment and storage medium
WO2023134433A1 (en) Font generation method and apparatus, and device
CN115496285B (en) Power load prediction method and device and electronic equipment
CN109474826B (en) Picture compression method and device, electronic equipment and storage medium
CN111797665A (en) Method and apparatus for converting video
CN114564606A (en) Data processing method and device, electronic equipment and storage medium
CN114359490A (en) Electromagnetic map construction method based on multi-mode fusion and related device
CN113990347A (en) Signal processing method, computer equipment and storage medium
CN113850716A (en) Model training method, image processing method, device, electronic device and medium
CN114445510A (en) Image optimization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant