CN116306905A - Semi-supervised non-independent co-distributed federal learning distillation method and device - Google Patents

Semi-supervised non-independent co-distributed federal learning distillation method and device

Info

Publication number
CN116306905A
CN116306905A (Application CN202310142023.5A)
Authority
CN
China
Prior art keywords
label
model
data
target
teacher model
Prior art date
Legal status
Pending
Application number
CN202310142023.5A
Other languages
Chinese (zh)
Inventor
丁阳光
沈超锋
吴贻军
梁前能
熊永星
解光林
Current Assignee
Anhui Kexun Jinfu Technology Co ltd
Original Assignee
Anhui Kexun Jinfu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Anhui Kexun Jinfu Technology Co ltd filed Critical Anhui Kexun Jinfu Technology Co ltd
Priority to CN202310142023.5A
Publication of CN116306905A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides a semi-supervised non-independent co-distributed federal learning distillation method and device. By using the biased teacher models of other terminals to predict the non-label data of the target terminal, the soft labels and hard labels of the non-label data become more accurate and reliable, the training efficiency of the basic model is greatly improved, the generalization capability of the obtained student model is stronger, and the accuracy of the aggregation model obtained through federal learning is further improved. In addition, the method combines knowledge distillation with federal learning, so that the student model can learn knowledge that does not exist locally at all: even when the target terminal's own data contains no relevant label, the relevant knowledge can still be learned through federal learning, which addresses an extreme non-independent co-distribution scenario for data labels. Meanwhile, federal learning gives the student model better fitting capability.

Description

Semi-supervised non-independent co-distributed federal learning distillation method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a semi-supervised non-independent co-distributed federal learning distillation method and device.
Background
Federal learning (Federated Learning, FL) is a novel model training approach: each scattered terminal device first trains the global model issued by the server on its local data, then uploads the initially trained local model to the server; the server aggregates the uploaded local models and issues the aggregated model back to each terminal device. Federal learning thus avoids leaking local data, effectively protecting its privacy, while still making full use of massive, scattered local data for model training and obtaining local models with better fitting performance. Because federal learning allows participants to cooperatively train a model without sharing data, protecting the privacy of local data and breaking down data islands, it has attracted wide attention and is widely applied in distributed training scenarios.
In distributed training scenarios, many conventional distributed machine learning algorithms assume that data are uniformly distributed, i.e. the data distribution across terminal devices follows the independent and identical distribution (IID). In real life, however, the generation of local data cannot be controlled: local data are produced independently on different terminal devices, so when many scattered terminal devices participate in federal learning, the local data on each terminal device may be non-independent and identically distributed (Non-IID), and even the labels of the local data may be Non-IID. This significantly reduces model training efficiency in federal learning and weakens model generalization capability. Moreover, after federal learning, the accuracy of the resulting aggregation model improves little or even decreases.
Therefore, it is important to improve the model training efficiency of federal learning in Non-IID scenarios, to strengthen model generalization capability, and to improve the accuracy of the aggregation model.
Disclosure of Invention
The invention provides a semi-supervised non-independent co-distributed federal learning distillation method and device, which are used for solving the defects in the prior art.
The invention provides a semi-supervised non-independent co-distributed federal learning distillation method which is applied to a target terminal, wherein the data and/or labels of the terminals under the target server to which the target terminal belongs satisfy the non-independent co-distribution; the method comprises the following steps:
determining label data and non-label data of the target terminal, performing label alignment between the target terminal and the other terminals under the target server, and training an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model;
performing label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data;
receiving a second biased teacher model of the other terminals, performing label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generating a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result;
and performing local distillation on a basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performing federal learning based on the student model.
According to the semi-supervised non-independent co-distributed federal learning distillation method provided by the invention, generating the soft label and the hard label of the non-label data based on the first prediction result and the second prediction result comprises:
calculating the variances of the first prediction result and the second prediction result;
generating the hard label of the non-label data based on the difference between the prediction result with the larger variance and the prediction result with the smaller variance;
and calculating the average value of the first prediction result and the second prediction result, and taking the average value as the soft label of the non-label data.
The invention provides a semi-supervised non-independent co-distributed federal learning distillation method, which further comprises the following steps:
extracting a partial structure of the first biased teacher model;
performing differential privacy protection on the partial structure to obtain a target structure, and sending the target structure to the other terminals; or alternatively,
performing differential privacy protection on a partial structure of the first biased teacher model to obtain a target biased teacher model, and sending the target biased teacher model to the other terminals.
According to the semi-supervised non-independent co-distributed federal learning distillation method provided by the invention, the second biased teacher model is a structure obtained by the other terminals performing differential privacy protection on part of the structure of their initial biased teacher model;
correspondingly, performing label prediction on the non-label data based on the second biased teacher model to obtain the second prediction result of the non-label data comprises:
splicing the second biased teacher model with the difference structure of the first biased teacher model to obtain a spliced model;
and performing label prediction on the non-label data based on the spliced model to obtain the second prediction result.
According to the semi-supervised non-independent co-distributed federal learning distillation method provided by the invention, federal learning is performed based on the student model, and the method comprises the following steps:
uploading the student model to the target server;
and receiving an aggregation model obtained by the target server based on federal average aggregation of the student models uploaded by the terminals, and circularly performing local distillation by taking the aggregation model as the basic model until federal learning is finished.
The invention also provides a semi-supervised non-independent co-distributed federal learning distillation device which is applied to a target terminal, wherein the data and/or labels of the terminals under the target server to which the target terminal belongs satisfy the non-independent co-distribution; the device comprises:
a determining module, used for determining label data and non-label data of the target terminal, performing label alignment between the target terminal and the other terminals under the target server, and training an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model;
a first prediction module, used for performing label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data;
a second prediction module, used for receiving a second biased teacher model of the other terminals, performing label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generating a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result;
and a federal distillation module, used for performing local distillation on the basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performing federal learning based on the student model.
The invention provides a semi-supervised non-independent co-distributed federal learning distillation device, which further comprises a sending module, a first processing module and a second processing module, wherein the sending module is used for:
extracting a partial structure of the first biased teacher model;
performing differential privacy protection on the partial structure to obtain a target structure, and sending the target structure to the other terminals; or alternatively,
performing differential privacy protection on a partial structure of the first biased teacher model to obtain a target biased teacher model, and sending the target biased teacher model to the other terminals.
The invention also provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the semi-supervised non-independent co-distributed federal learning distillation method described above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a semi-supervised non-independent co-distributed federal learning distillation method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a semi-supervised non-independent co-distributed federal learning distillation method as defined in any one of the above.
According to the semi-supervised non-independent co-distributed federal learning distillation method and device, using the biased teacher models of the other terminals to predict the non-label data of the target terminal makes the soft label and the hard label of the non-label data more accurate and reliable, greatly improves the training efficiency of the basic model, strengthens the generalization capability of the obtained student model, and further improves the accuracy of the aggregation model obtained through federal learning. In addition, the method combines knowledge distillation with federal learning, so that the student model can learn knowledge that does not exist locally at all: even when the target terminal's own data contains no relevant label, the relevant knowledge can still be learned through federal learning, which addresses an extreme non-independent co-distribution scenario for data labels. Meanwhile, federal learning gives the student model better fitting capability.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a schematic flow diagram of a semi-supervised non-independent co-distributed federal learning distillation process provided by the present invention;
FIG. 2 is a schematic diagram of a semi-supervised non-independent co-distributed federal learning distillation apparatus according to the present invention;
FIG. 3 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the prior art, when a plurality of scattered terminal devices act as participants of federal learning, the local data on each terminal device may be non-independent and identically distributed, and even the labels carried by the local data may be non-independent and identically distributed, which greatly reduces model training efficiency in federal learning and weakens model generalization capability. Moreover, after federal learning, the accuracy of the resulting aggregation model improves little or even decreases. Therefore, the embodiment of the invention provides a semi-supervised non-independent co-distributed federal learning distillation method, which is used for improving the model training efficiency of federal learning in non-independent co-distribution scenarios, strengthening model generalization capability, and improving the accuracy of the aggregation model.
FIG. 1 is a schematic flow chart of a semi-supervised non-independent co-distributed federal learning distillation method provided in an embodiment of the present invention. The method is applied to a target terminal, and the data and/or labels of the terminals under the target server to which the target terminal belongs satisfy the non-independent co-distribution. As shown in FIG. 1, the method includes:
S1, determining label data and non-label data of the target terminal, performing label alignment between the target terminal and the other terminals under the target server, and training an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model;
S2, performing label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data;
S3, receiving a second biased teacher model of the other terminals, performing label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generating a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result;
S4, performing local distillation on a basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performing federal learning based on the student model.
Specifically, the execution subject of the semi-supervised non-independent co-distributed federal learning distillation method provided by the embodiment of the invention is a semi-supervised non-independent co-distributed federal learning distillation device. The device can be configured in a target terminal, and the target terminal can be any terminal under the target server, that is, each terminal under the target server is used to execute the method. The target terminal may be a computing device, which may be a local device or a cloud server; the local device may be a personal computer, a tablet, etc., which is not particularly limited herein.
The scenario to which the method is applied is a non-independent identical distribution scenario, i.e. the data and/or labels of all terminals, including the target terminal, under the target server to which the target terminal belongs satisfy the non-independent identical distribution. Data satisfying the non-independent identical distribution means that the label categories of the terminals are the same but their data distributions are inconsistent. Labels satisfying the non-independent identical distribution means that the label categories of the terminals are not identical and every terminal holds non-label data; the label categories of the terminals may not overlap at all or may only partially overlap. Label data refers to data with labels, and non-label data refers to data without labels. Semi-supervision means that non-label data exists in the data of each terminal.
First, step S1 is performed to determine the label data and non-label data of the target terminal. Here, the target terminal may cluster its local data, i.e. group the data with labels and the data without labels separately, to obtain the label data and the non-label data. The local data may be pictures, private data, etc., and the label carried by the local data may be the object category in a picture, the private data category, etc., which is not limited herein.
For example, suppose the target server covers a terminal A and a terminal B. The local data of terminal A includes label data with the labels cat and dog, and non-label data that may contain cats, dogs and fish; all three kinds of data (cat, dog, fish) may exist in the non-label data, it is only that the specific samples carry no label. The local data of terminal B includes label data with the label fish, and non-label data that may contain cats, dogs and fish; again, all three kinds of data may exist in the non-label data, it is only that the specific samples carry no label.
The target terminal can perform label alignment with the other terminals under the target server. The label alignment result is the union of all labels involved in the local data of all terminals under the target server, so that the label categories handled by every terminal are consistent; this ensures that the terminals can use a unified loss function during federal learning and that the predicted label categories remain uniform. For example, after label alignment, the labels of terminal A include cat, dog and fish, although no data corresponds to the label fish; the labels of terminal B likewise include cat, dog and fish, although no data corresponds to the labels cat and dog.
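As a purely illustrative, non-limiting sketch of the label alignment described above (written in Python; the label names and the union step are assumptions for illustration, not prescribed by the invention):

```python
# Hypothetical sketch of label alignment: each terminal reports the label
# classes present in its local data, and the union becomes the shared label
# space used by every terminal (names below are illustrative only).
local_label_sets = {
    "terminal_A": {"cat", "dog"},   # terminal A only holds cat/dog labels
    "terminal_B": {"fish"},         # terminal B only holds fish labels
}

# Union of all label classes across the terminals under the target server
aligned_labels = sorted(set().union(*local_label_sets.values()))
# -> ['cat', 'dog', 'fish']

# A fixed index per class, so every terminal's model outputs the same
# prediction items and can share one loss function during federal learning
label_to_index = {name: i for i, name in enumerate(aligned_labels)}
```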
Furthermore, the target terminal can train the initial teacher model using the label data and the label alignment result to obtain the first biased teacher model. It can be understood that, because the label alignment result is introduced, the output of the first biased teacher model contains prediction items for all labels, including labels that do not appear in the target terminal's own label data. For example, for terminal A, the obtained first biased teacher model Ta can predict cat and dog data well but has no predictive capability for fish data, although the corresponding prediction item still exists. For terminal B, the obtained first biased teacher model Tb can predict fish data well but has no predictive capability for cat and dog data, although the corresponding prediction items still exist.
The initial teacher model can be built on a neural network. During training, the label data are input into the initial teacher model to obtain its output, a loss function is computed from the output and the labels carried by the label data, and the structural parameters of the initial teacher model are iteratively updated based on the loss function until the loss function converges, yielding the first biased teacher model.
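A minimal training sketch consistent with the above paragraph is given below, assuming a PyTorch classifier; the network architecture, optimizer, input size and convergence test are illustrative assumptions only:

```python
import torch
from torch import nn

def train_biased_teacher(label_loader, num_aligned_labels, epochs=50, tol=1e-4):
    """Train an initial teacher model on local label data over the aligned label space.

    The output layer covers all aligned labels, even classes with no local data,
    so the resulting model is 'biased': accurate only on the locally present classes.
    """
    teacher = nn.Sequential(                    # illustrative architecture
        nn.Flatten(),
        nn.Linear(28 * 28, 256), nn.ReLU(),
        nn.Linear(256, num_aligned_labels),     # one prediction item per aligned label
    )
    optimizer = torch.optim.SGD(teacher.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()
    prev_loss = float("inf")
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y in label_loader:               # y is indexed in the aligned label space
            optimizer.zero_grad()
            loss = criterion(teacher(x), y)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if abs(prev_loss - epoch_loss) < tol:   # crude convergence check on the loss
            break
        prev_loss = epoch_loss
    return teacher
```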
Step S2 is then performed: the target terminal performs label prediction on the label data and the non-label data with the first biased teacher model, i.e. the label data and the non-label data are respectively input into the first biased teacher model, to obtain the prediction result of the label data and the first prediction result of the non-label data output by the first biased teacher model. Here, the prediction result of the label data serves as its soft label (soft-label), while the original label of the label data can be used as its hard label (hard-label).
For example, for terminal A, label prediction is performed on the label data with the first biased teacher model Ta to obtain the soft labels of the label data, and label prediction is performed on the non-label data with the first biased teacher model Ta to obtain the first prediction result A-Ta-soft of the non-label data. For terminal B, label prediction is performed on the label data with the first biased teacher model Tb to obtain the soft labels of the label data, and label prediction is performed on the non-label data with the first biased teacher model Tb to obtain the first prediction result B-Tb-soft of the non-label data.
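Purely as an illustration of step S2, the following sketch shows how a biased teacher model could produce soft labels and prediction results; the softmax temperature parameter is an assumption, since the description only specifies that the teacher's predictions serve as soft labels:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_soft_labels(teacher, x, temperature=1.0):
    """Return the teacher's class distribution, used as soft label / prediction result."""
    logits = teacher(x)                          # one logit per aligned label
    return F.softmax(logits / temperature, dim=-1)

# Soft labels of the label data and first prediction result of the non-label data,
# both produced by the local first biased teacher model (names are illustrative):
# soft_labels_A = predict_soft_labels(teacher_Ta, labeled_x_A)
# A_Ta_soft     = predict_soft_labels(teacher_Ta, unlabeled_x_A)
```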
Step S3 is then performed: the target terminal receives a second biased teacher model from the other terminals and performs label prediction on its own non-label data with the second biased teacher model, to obtain a second prediction result of the non-label data.
For example, terminal A may receive the second biased teacher model Tb' of terminal B; Tb' may be identical to terminal B's first biased teacher model Tb, may be only a part of Tb, or may be obtained by encrypting Tb or a part of it, which is not particularly limited herein. Label prediction is performed on terminal A's non-label data with the second biased teacher model Tb', yielding the second prediction result A-Tb'-soft of terminal A's non-label data.
For terminal B, the second biased teacher model Ta' of terminal A may be received; Ta' may be identical to terminal A's first biased teacher model Ta, may be only a part of Ta, or may be obtained by encrypting Ta or a part of it, which is not particularly limited herein. Label prediction is performed on terminal B's non-label data with the second biased teacher model Ta', yielding the second prediction result B-Ta'-soft of terminal B's non-label data.
Thereafter, the target terminal can generate the soft label and the hard label of its non-label data from the first prediction result and the second prediction result; at this point the target terminal has prepared the data and labels required for the subsequent local distillation step. The hard label of the non-label data may be determined from the difference between the first prediction result and the second prediction result; for example, a difference threshold may be introduced, and the hard label of the corresponding non-label data is set to 0 if the difference is smaller than the threshold and to 1 if the difference is greater than or equal to the threshold. The soft label of the non-label data may be determined from the average of the first prediction result and the second prediction result.
Here, for terminal A, the soft label and the hard label of terminal A's non-label data are generated from the first prediction result A-Ta-soft and the second prediction result A-Tb'-soft, which prepares the data and labels required for terminal A's subsequent local distillation step. For terminal B, the soft label and the hard label of terminal B's non-label data are generated from the first prediction result B-Tb-soft and the second prediction result B-Ta'-soft, which prepares the data and labels required for terminal B's subsequent local distillation step.
Finally, step S4 is performed: the basic model is locally distilled using the label data and its soft labels together with the soft labels and hard labels of the non-label data, to obtain a student model. Here, multiple rounds of local distillation may be performed, and the end flag may be convergence of the distillation loss.
For example, for terminal A, the student model S-a can be obtained by locally distilling the basic model with terminal A's label data and its soft labels together with the soft labels and hard labels of terminal A's non-label data; for terminal B, the student model S-b can be obtained in the same way with terminal B's label data and its soft labels together with the soft labels and hard labels of terminal B's non-label data, as illustrated by the sketch below.
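One possible local distillation objective consistent with the above description is sketched here; the specific loss form (KL divergence plus cross-entropy, temperature, weighting) is an assumption, shown for the case where the hard label is a class index such as the original label of the label data. How the 0/1 hard label of the non-label data enters the loss is not fixed by the description:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, soft_label, hard_label, temperature=2.0, alpha=0.5):
    """Combine a soft-label term (teacher knowledge) with a hard-label term."""
    soft_term = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        soft_label,                               # teacher-derived probability distribution
        reduction="batchmean",
    ) * (temperature ** 2)
    hard_term = F.cross_entropy(student_logits, hard_label)   # hard label as class index
    return alpha * soft_term + (1.0 - alpha) * hard_term
```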
Then, federal learning is performed with the obtained student model, i.e. the student model interacts with the target server to realize federal learning.
The method first determines the label data and non-label data of the target terminal, performs label alignment between the target terminal and the other terminals under the target server, and trains an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model; it then performs label prediction on the label data and the non-label data with the first biased teacher model to obtain the soft label of the label data and the first prediction result of the non-label data; it then receives a second biased teacher model from the other terminals, performs label prediction on the non-label data with the second biased teacher model to obtain the second prediction result of the non-label data, and generates the soft label and the hard label of the non-label data from the first and second prediction results; finally, it locally distills the basic model with the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performs federal learning based on the student model. By relying on the prediction results of the other terminals' biased teacher models on the target terminal's non-label data, the soft label and hard label of the non-label data become more accurate and reliable, the training efficiency of the basic model is greatly improved, the generalization capability of the obtained student model is stronger, and the accuracy of the aggregation model obtained through federal learning is further improved. In addition, the method combines knowledge distillation with federal learning, so that the student model can learn knowledge that does not exist locally at all: even when the target terminal's own data contains no relevant label, the relevant knowledge can still be learned through federal learning, which addresses an extreme non-independent co-distribution scenario for data labels. Meanwhile, federal learning gives the student model better fitting capability.
Based on the foregoing embodiment, in the semi-supervised non-independent co-distributed federal learning distillation method provided in the embodiment of the present invention, generating the soft label and the hard label of the non-label data based on the first prediction result and the second prediction result includes:
calculating the variances of the first prediction result and the second prediction result;
generating the hard label of the non-label data based on the difference between the prediction result with the larger variance and the prediction result with the smaller variance;
and calculating the average value of the first prediction result and the second prediction result, and taking the average value as the soft label of the non-label data.
Specifically, when generating the soft label and the hard label of the target terminal's non-label data, the variances of the first prediction result and the second prediction result may be calculated first, then the difference between the prediction result with the larger variance and the prediction result with the smaller variance is calculated, and the hard label of the target terminal's non-label data is generated from this difference. Introducing the variance factor makes the determined hard label better match the actual situation.
The average value of the first prediction result and the second prediction result is calculated and taken as the soft label of the target terminal's non-label data.
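A literal sketch of this variance-and-mean procedure follows; how the per-label difference is reduced to a single value, and the difference threshold itself, are assumptions for illustration:

```python
import torch

def make_pseudo_labels(pred_local, pred_remote, diff_threshold=0.1):
    """pred_local / pred_remote: [num_samples, num_labels] prediction results of the
    first and second biased teacher models on the same non-label data."""
    # Per-sample variance of each prediction result over the label dimension
    var_local = pred_local.var(dim=-1, keepdim=True)
    var_remote = pred_remote.var(dim=-1, keepdim=True)

    # Difference between the larger-variance and the smaller-variance prediction
    local_is_larger = var_local >= var_remote
    high = torch.where(local_is_larger, pred_local, pred_remote)
    low = torch.where(local_is_larger, pred_remote, pred_local)
    diff = (high - low).abs().max(dim=-1).values      # one difference value per sample

    # Hard label: 1 where the difference reaches the threshold, otherwise 0
    hard_label = (diff >= diff_threshold).long()

    # Soft label: mean of the two prediction results
    soft_label = (pred_local + pred_remote) / 2.0
    return soft_label, hard_label
```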
Based on the above embodiment, the semi-supervised non-independent co-distributed federal learning distillation method provided in the embodiment of the present invention further includes:
extracting a partial structure of the first biased teacher model;
performing differential privacy protection on the partial structure to obtain a target structure, and sending the target structure to the other terminals; or alternatively,
performing differential privacy protection on a partial structure of the first biased teacher model to obtain a target biased teacher model, and sending the target biased teacher model to the other terminals.
Specifically, the target terminal may extract a partial structure of the first biased teacher model; the partial structure may be, for example, half of the first biased teacher model.
The partial structure is encrypted by differential privacy protection to obtain the target structure, which is sent to the other terminals. Because differential privacy protection is introduced, the other terminals cannot recover the data of the target terminal; moreover, since only part of the first biased teacher model is encrypted and sent, the difficulty for the other terminals to recover the target terminal's data is further increased.
In the embodiment of the invention, differential privacy protection may also be applied directly to a partial structure of the first biased teacher model to obtain a target biased teacher model, which is sent to the other terminals; the other terminals can then use the target biased teacher model directly, reducing their workload when applying it.
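A minimal sketch of one way to realize the differential privacy protection of a partial structure is given below, extracting roughly half of the teacher's parameters and adding Gaussian noise; the noise mechanism, noise scale and layer selection are assumptions, as the invention does not fix a specific mechanism:

```python
import torch

def protect_partial_structure(teacher, noise_std=0.01, keep_ratio=0.5):
    """Extract part of the first biased teacher model and perturb it before sending.

    Returns a state dict covering roughly keep_ratio of the parameter tensors,
    each with Gaussian noise added (an illustrative differential-privacy-style step).
    """
    state = teacher.state_dict()
    keys = list(state.keys())
    kept = keys[: max(1, int(len(keys) * keep_ratio))]     # e.g. the first half of the layers

    protected = {}
    for k in kept:
        tensor = state[k].detach().clone().float()
        tensor += torch.randn_like(tensor) * noise_std     # add calibrated noise
        protected[k] = tensor
    return protected     # the target structure sent to the other terminals
```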
Based on the above embodiment, in the semi-supervised non-independent co-distributed federal learning distillation method provided by the embodiment of the invention, the second biased teacher model is a structure obtained by the other terminals performing differential privacy protection on part of the structure of their initial biased teacher model;
correspondingly, performing label prediction on the non-label data based on the second biased teacher model to obtain the second prediction result of the non-label data includes:
splicing the second biased teacher model with the difference structure of the first biased teacher model to obtain a spliced model;
and performing label prediction on the non-label data based on the spliced model to obtain the second prediction result.
Specifically, the second biased teacher model may be obtained by the other terminals applying differential privacy protection to part of the structure of the initial biased teacher model they trained with their own label data and the label alignment result; in this case the second biased teacher model is incomplete and cannot make predictions on its own. Therefore, when the target terminal performs label prediction on its non-label data with the second biased teacher model, it can splice the second biased teacher model with the difference structure of the first biased teacher model to obtain a spliced model. It can be understood that the difference structure of the first biased teacher model is the part of its structure not covered by the second biased teacher model, and the spliced model obtained by combining this difference structure with the second biased teacher model is a model that can be used for prediction.
The spliced model is then used directly to perform label prediction on the non-label data, yielding the second prediction result.
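An illustrative sketch of the splicing step follows, under the assumption that the terminals' teacher models share the same architecture so the received protected structure and the locally held difference structure can be combined into one usable model:

```python
import copy

def splice_models(local_teacher, received_partial_state):
    """Complete the received second biased teacher structure with the difference
    structure (the layers it does not cover) taken from the local first teacher."""
    spliced = copy.deepcopy(local_teacher)         # same architecture assumed on both sides
    full_state = spliced.state_dict()
    full_state.update(received_partial_state)      # received layers overwrite local ones
    spliced.load_state_dict(full_state)            # remaining layers stay from the local model
    spliced.eval()
    return spliced    # usable for label prediction on the local non-label data
```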
Based on the above embodiment, the semi-supervised non-independent co-distributed federal learning distillation method provided in the embodiment of the present invention, wherein the federal learning based on the student model includes:
uploading the student model to the target server;
and receiving an aggregation model obtained by the target server based on federal average aggregation of the student models uploaded by the terminals, and circularly performing local distillation by taking the aggregation model as the basic model until federal learning is finished.
Specifically, when the student model is utilized to perform federal learning, the student model may be uploaded to a target server, and the target server receives the student models uploaded by all terminals belonging to the target server to perform federal average aggregation, so as to obtain an aggregation model S. The federal average aggregation may be performed by weighted averaging of the received structural parameters of all student models.
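A server-side sketch of the federal average aggregation is given below; weighting the student models by each terminal's sample count is an assumption, and plain averaging is equally consistent with the description:

```python
def federated_average(student_states, sample_counts):
    """Weighted average of the structural parameters of the uploaded student models."""
    total = float(sum(sample_counts))
    weights = [n / total for n in sample_counts]
    aggregated = {}
    for key in student_states[0]:
        aggregated[key] = sum(w * state[key].float()
                              for w, state in zip(weights, student_states))
    return aggregated    # the aggregation model S issued back to each terminal
```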
Thereafter, the target terminal may receive the aggregation model, for example by directly downloading it from the target server. The aggregation model is then used again as the basic model, and local distillation is performed cyclically: local distillation is repeated to obtain a new student model, which is uploaded to the target server, the target server's new aggregation model is received, and this process is repeated until federal learning ends. At that point, a student model usable for label category prediction at the target terminal and the aggregation model of the target server, i.e. the federation model, are obtained.
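Pulling the steps together, the following hypothetical terminal-side loop illustrates the cyclic local distillation and aggregation; the callables passed in are placeholders for the steps described above, not interfaces defined by the invention:

```python
def run_federated_distillation(rounds, student, distill_fn, upload_fn, download_fn):
    """Sketch of the cyclic local-distillation / aggregation loop of step S4.

    distill_fn, upload_fn and download_fn are placeholders for the local distillation
    step, the upload to the target server and the download of the aggregation model.
    """
    for _ in range(rounds):
        student = distill_fn(student)              # multiple rounds of local distillation
        upload_fn(student.state_dict())            # upload the student model
        aggregated_state = download_fn()           # receive the aggregation model
        student.load_state_dict(aggregated_state)  # it becomes the basic model next cycle
    return student   # local predictor; the server keeps the federation (aggregation) model
```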
As shown in FIG. 2, on the basis of the above embodiments, an embodiment of the present invention provides a semi-supervised non-independent co-distributed federal learning distillation apparatus, including:
a determining module 21, configured to determine label data and non-label data of the target terminal, perform label alignment between the target terminal and the other terminals under the target server, and train an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model;
a first prediction module 22, configured to perform label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data;
a second prediction module 23, configured to receive a second biased teacher model of the other terminals, perform label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generate a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result;
a federal distillation module 24, configured to locally distill the basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and perform federal learning based on the student model.
Based on the above embodiment, the semi-supervised non-independent co-distributed federal learning distillation apparatus provided in the embodiment of the present invention further includes a sending module, configured to:
extract a partial structure of the first biased teacher model;
perform differential privacy protection on the partial structure to obtain a target structure, and send the target structure to the other terminals; or alternatively,
perform differential privacy protection on a partial structure of the first biased teacher model to obtain a target biased teacher model, and send the target biased teacher model to the other terminals.
Based on the above embodiment, in the semi-supervised non-independent co-distributed federal learning distillation apparatus provided in the embodiment of the present invention, the second prediction module is specifically configured to:
calculate the variances of the first prediction result and the second prediction result;
generate the hard label of the non-label data based on the difference between the prediction result with the larger variance and the prediction result with the smaller variance;
and calculate the average value of the first prediction result and the second prediction result, and take the average value as the soft label of the non-label data.
Based on the above embodiment, in the semi-supervised non-independent co-distributed federal learning distillation apparatus provided in the embodiment of the present invention, the second biased teacher model is a structure obtained by the other terminals performing differential privacy protection on part of the structure of their initial biased teacher model;
correspondingly, the second prediction module is specifically configured to:
splice the second biased teacher model with the difference structure of the first biased teacher model to obtain a spliced model;
and perform label prediction on the non-label data based on the spliced model to obtain the second prediction result.
Based on the above embodiments, in the semi-supervised non-independent co-distributed federal learning distillation apparatus provided in the embodiments of the present invention, the federal distillation module is specifically configured to:
upload the student model to the target server;
and receive the aggregation model obtained by the target server through federal average aggregation of the student models uploaded by the terminals, and cyclically perform local distillation with the aggregation model as the basic model until federal learning ends.
Specifically, the functions of each module in the semi-supervised non-independent co-distributed federal learning distillation device provided in the embodiment of the present invention are in one-to-one correspondence with the operation flows of each step in the above method embodiment, and the achieved effects are consistent.
FIG. 3 illustrates a schematic physical diagram of an electronic device. As shown in FIG. 3, the electronic device may include: a processor (Processor) 310, a communication interface (Communications Interface) 320, a memory (Memory) 330 and a communication bus 340, where the processor 310, the communication interface 320 and the memory 330 communicate with each other through the communication bus 340. The processor 310 may invoke the logic instructions in the memory 330 to execute the semi-supervised non-independent co-distributed federal learning distillation method provided in the above embodiments, which is applied to a target terminal, where the data and/or labels of the terminals under the target server to which the target terminal belongs satisfy the non-independent co-distribution; the method includes: determining label data and non-label data of the target terminal, performing label alignment between the target terminal and the other terminals under the target server, and training an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model; performing label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data; receiving a second biased teacher model of the other terminals, performing label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generating a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result; and performing local distillation on the basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performing federal learning based on the student model.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention further provides a computer program product, the computer program product including a computer program, where the computer program may be stored on a non-transitory computer readable storage medium and, when executed by a processor, is capable of executing the semi-supervised non-independent co-distributed federal learning distillation method provided in the foregoing embodiments, which is applied to a target terminal, where the data and/or labels of the terminals under the target server to which the target terminal belongs satisfy the non-independent co-distribution; the method includes: determining label data and non-label data of the target terminal, performing label alignment between the target terminal and the other terminals under the target server, and training an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model; performing label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data; receiving a second biased teacher model of the other terminals, performing label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generating a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result; and performing local distillation on the basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performing federal learning based on the student model.
In still another aspect, the present invention further provides a non-transitory computer readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the semi-supervised non-independent co-distributed federal learning distillation method provided in the foregoing embodiments, which is applied to a target terminal, where the data and/or labels of the terminals under the target server to which the target terminal belongs satisfy the non-independent co-distribution; the method includes: determining label data and non-label data of the target terminal, performing label alignment between the target terminal and the other terminals under the target server, and training an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model; performing label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data; receiving a second biased teacher model of the other terminals, performing label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generating a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result; and performing local distillation on the basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performing federal learning based on the student model.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A semi-supervised non-independent co-distributed federal learning distillation method, characterized in that the method is applied to a target terminal, wherein the data and/or labels of the terminals under the target server to which the target terminal belongs satisfy the non-independent co-distribution; the method comprises the following steps:
determining label data and non-label data of the target terminal, performing label alignment between the target terminal and the other terminals under the target server, and training an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model;
performing label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data;
receiving a second biased teacher model of the other terminals, performing label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generating a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result;
and performing local distillation on a basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performing federal learning based on the student model.
2. The semi-supervised non-independent co-distributed federal learning distillation method according to claim 1, wherein generating the soft label and the hard label of the non-label data based on the first prediction result and the second prediction result comprises:
calculating the variances of the first prediction result and the second prediction result;
generating the hard label of the non-label data based on the difference between the prediction result with the larger variance and the prediction result with the smaller variance;
and calculating the average value of the first prediction result and the second prediction result, and taking the average value as the soft label of the non-label data.
3. The semi-supervised non-independent co-distributed federal learning distillation method according to claim 1, further comprising:
extracting a partial structure of the first biased teacher model;
performing differential privacy protection on the partial structure to obtain a target structure, and sending the target structure to the other terminals; or alternatively,
performing differential privacy protection on a partial structure of the first biased teacher model to obtain a target biased teacher model, and sending the target biased teacher model to the other terminals.
4. The semi-supervised non-independent co-distributed federal learning distillation method according to claim 1, wherein the second biased teacher model is a structure obtained by the other terminals performing differential privacy protection on part of the structure of their initial biased teacher model;
correspondingly, performing label prediction on the non-label data based on the second biased teacher model to obtain the second prediction result of the non-label data comprises:
splicing the second biased teacher model with the difference structure of the first biased teacher model to obtain a spliced model;
and performing label prediction on the non-label data based on the spliced model to obtain the second prediction result.
5. The semi-supervised non-independent co-distributed federal learning distillation method according to any of claims 1-4, wherein said federal learning based on said student model comprises:
uploading the student model to the target server;
and receiving an aggregation model obtained by the target server based on federal average aggregation of the student models uploaded by the terminals, and circularly performing local distillation by taking the aggregation model as the basic model until federal learning is finished.
6. A semi-supervised non-independent co-distributed federal learning distillation device, characterized in that the device is applied to a target terminal, wherein the data and/or labels of the terminals under the target server to which the target terminal belongs satisfy the non-independent co-distribution; the device comprises:
a determining module, used for determining label data and non-label data of the target terminal, performing label alignment between the target terminal and the other terminals under the target server, and training an initial teacher model based on the label data and the label alignment result to obtain a first biased teacher model;
a first prediction module, used for performing label prediction on the label data and the non-label data respectively based on the first biased teacher model, to obtain a soft label of the label data and a first prediction result of the non-label data;
a second prediction module, used for receiving a second biased teacher model of the other terminals, performing label prediction on the non-label data based on the second biased teacher model to obtain a second prediction result of the non-label data, and generating a soft label and a hard label of the non-label data based on the first prediction result and the second prediction result;
and a federal distillation module, used for performing local distillation on the basic model based on the label data and its soft label together with the soft label and the hard label of the non-label data to obtain a student model, and performing federal learning based on the student model.
7. The semi-supervised non-independent co-distributed federal learning distillation device according to claim 6, further comprising a sending module, used for:
extracting a partial structure of the first biased teacher model;
performing differential privacy protection on the partial structure to obtain a target structure, and sending the target structure to the other terminals; or alternatively,
performing differential privacy protection on a partial structure of the first biased teacher model to obtain a target biased teacher model, and sending the target biased teacher model to the other terminals.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the semi-supervised non-independent co-distributed federal learning distillation method of any of claims 1-5 when executing the program.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the semi-supervised non-independent co-distributed federal learning distillation method of any of claims 1-5.
10. A computer program product comprising a computer program which, when executed by a processor, implements the semi-supervised non-independent co-distributed federal learning distillation method of any of claims 1-5.
CN202310142023.5A 2023-02-13 2023-02-13 Semi-supervised non-independent co-distributed federal learning distillation method and device Pending CN116306905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310142023.5A CN116306905A (en) 2023-02-13 2023-02-13 Semi-supervised non-independent co-distributed federal learning distillation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310142023.5A CN116306905A (en) 2023-02-13 2023-02-13 Semi-supervised non-independent co-distributed federal learning distillation method and device

Publications (1)

Publication Number Publication Date
CN116306905A 2023-06-23

Family

ID=86812381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310142023.5A Pending CN116306905A (en) 2023-02-13 2023-02-13 Semi-supervised non-independent co-distributed federal learning distillation method and device

Country Status (1)

Country Link
CN (1) CN116306905A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011563A (en) * 2023-08-04 2023-11-07 山东建筑大学 Road damage inspection cross-domain detection method and system based on semi-supervised federal learning
CN117011563B (en) * 2023-08-04 2024-03-01 山东建筑大学 Road damage inspection cross-domain detection method and system based on semi-supervised federal learning

Similar Documents

Publication Publication Date Title
CN110929870B (en) Method, device and system for training neural network model
CN109002861B (en) Federal modeling method, device and storage medium
CN110929886A (en) Model training and predicting method and system
WO2021047535A1 (en) Method, apparatus and system for secure vertical federated learning
CN110942154B (en) Data processing method, device, equipment and storage medium based on federal learning
CN111357019B (en) Compressing fully connected/recursive layers of depth network(s) by implementing spatial locality on weight matrices and implementing frequency compression
US20210374617A1 (en) Methods and systems for horizontal federated learning using non-iid data
CN109684797B (en) Virtual IP protection method and system for confrontation network generated picture based on block chain
CN110569359B (en) Training and application method and device of recognition model, computing equipment and storage medium
WO2020156004A1 (en) Model training method, apparatus and system
CN113159329A (en) Model training method, device, equipment and storage medium
WO2020256732A1 (en) Domain adaptation and fusion using task-irrelevant paired data in sequential form
CN113051239A (en) Data sharing method, use method of model applying data sharing method and related equipment
CN116306905A (en) Semi-supervised non-independent co-distributed federal learning distillation method and device
EP4320556A1 (en) Privacy-aware pruning in machine learning
CN114925786A (en) Longitudinal federal linear support vector classification method based on secret sharing
CN113361618A (en) Industrial data joint modeling method and system based on federal learning
CN116187431A (en) Federal learning distillation method and device for non-independent co-distribution scene
CN111079153A (en) Security modeling method and device, electronic equipment and storage medium
CN114329127B (en) Feature binning method, device and storage medium
CN115730346A (en) Privacy data processing method based on longitudinal federal learning, electronic device and medium
US20220101160A1 (en) Model reuse-based model prediction
CN115019218A (en) Image processing method and processor
CN110414845B (en) Risk assessment method and device for target transaction
CN112734050A (en) Text model training method, text model recognition device, text model equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination