CN117292221A - Image recognition method and system based on federated meta-learning - Google Patents

Image recognition method and system based on federated meta-learning

Info

Publication number
CN117292221A
CN117292221A CN202311252854.4A
Authority
CN
China
Prior art keywords
federal
model
learning
client
model parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311252854.4A
Other languages
Chinese (zh)
Inventor
史慧玲
张先恒
张玮
丁伟
谭立状
郝昊
王小龙
刘国正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Qilu University of Technology
Priority to CN202311252854.4A
Publication of CN117292221A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition method and system based on federated meta-learning, belonging to the technical field of machine learning. The method comprises: obtaining an image to be identified; and inputting the image to be identified into a trained federated meta-learning model for processing to obtain an image identification result. The process of training the federated meta-learning model includes: the server initializes global model parameters and sends them to a plurality of clients respectively; each client receives the global model parameters, trains its local model on a training set according to the global model parameters, updates the local model parameters, and uploads them to the server; the server aggregates all local model parameters and updates the global model parameters until the global model converges; and cyclic knowledge distillation is performed among the clients to form personalized models. The method improves the performance of the federated learning model and the accuracy of image recognition while ensuring privacy and data security, addressing the problem that poor client-model performance degrades image recognition accuracy.

Description

Image recognition method and system based on federated meta-learning
Technical Field
The invention relates to the technical field of machine learning, and in particular to an image recognition method and system based on federated meta-learning.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Machine learning (ML) is increasingly common in daily life, is applied to many aspects of it, and brings great convenience. However, to protect user privacy, ensure data security, and comply with government regulations, data cannot be shared directly between institutions. The federated learning framework was developed to enable machine learning modeling without violating privacy or compromising data security.
Federated learning is a communication-efficient, privacy-preserving alternative that allows a group of organizations, or groups within the same organization, to collaboratively and iteratively train and refine a shared global machine learning model without exchanging raw data between the participants. Data privacy can thus be preserved while the machine learning modeling task is completed. Its design goal is to carry out efficient machine learning among multiple participants or computing nodes while guaranteeing information security during large-scale data exchange, protecting terminal data and personal privacy, and ensuring legal compliance.
However, one of the challenges faced by federated learning is data heterogeneity: the data on federated clients is typically non-independent and identically distributed (non-IID). A latent statistical structure may exist across devices, reflecting the relationships and distributions of different devices, which can severely impact the performance of the federated global model. Moreover, because data distributions differ, the global model may perform well on some federated clients and poorly on others.
Some research methods aim to mitigate the poor performance of the global model under non-IID data, but they lack personalized consideration of the local models, so existing schemes often perform unevenly across local clients.
When federated learning is applied to image recognition, on the one hand, the data distributions of different devices may differ, so that some devices hold more representative data while the data on other devices may be insufficient to support effective training for some categories, resulting in low image recognition accuracy. On the other hand, although keeping data on the local device improves privacy protection, privacy and security risks such as model inversion attacks and parameter leakage still exist and can pose a threat to user data.
Disclosure of Invention
To address the above deficiencies of the prior art, the invention provides an image recognition method, an image recognition system, an electronic device, and a computer-readable storage medium based on federated meta-learning.
In a first aspect, the present invention provides an image recognition method based on federated meta-learning;
an image recognition method based on federated meta-learning, comprising:
acquiring an image to be identified;
inputting the image to be identified into a trained federated meta-learning model for processing, and obtaining an image identification result; the federated meta-learning model comprises a global model and a plurality of personalized models;
wherein the process of training the federated meta-learning model comprises:
the server initializes global model parameters and sends the parameters to a plurality of clients respectively;
the method comprises the steps that a client receives global model parameters, a training set is divided by an image data set to be identified, the local model of the client is trained through the training set according to the global model parameters, and the local model parameters are updated and uploaded to a server;
the server aggregates all local model parameters to update the global model parameters until the global model converges;
and training the local model by cyclic knowledge distillation among the clients to form personalized models.
Further, when the client's local model is trained on the training set, the second-order gradient in meta-learning is calculated and gradient descent is performed on the local model.
Further, a CNN network architecture is used to participate in the training of the federated meta-learning model, wherein the CNN network architecture comprises a feature extractor and a classifier composed of fully connected layers.
Further, the server aggregating all local model parameters to update the global model parameters specifically comprises: performing a weighted average of the local model parameters of each client to obtain the global model parameters.
Further, multiple rounds of cyclic knowledge distillation are performed among the plurality of clients, expressed as:
ℓ_KD = (1/|D_k|) Σ_{x ∈ D_k} ||g_tea(x) − g_stu(x)||^2
wherein g_tea is the feature extractor of the previous client, g_stu is the feature extractor of the current client, and D_k is the set of local data samples of the current client.
Further, the loss function of the cyclic knowledge distillation stage is expressed as:
ℓ_total = ℓ_cls + λ · ℓ_KD
where λ is the weight with which the previous client delivers knowledge to the current client and ℓ_cls is a cross-entropy loss function.
Further, partitioning the image data set to be identified into a training set specifically comprises: partitioning the image data set to be identified using a Dirichlet distribution, so as to realize non-independent and identically distributed (non-IID) data partitions among different clients.
In a second aspect, the present invention provides an image recognition system based on federated meta-learning;
an image recognition system based on federated meta-learning, comprising:
an acquisition module configured to: acquiring an image to be identified;
an image recognition module configured to: inputting the image to be identified into a trained federated meta-learning model for processing, and obtaining an image identification result; the federated meta-learning model comprises a global model and a plurality of personalized models;
wherein the process of training the federated meta-learning model comprises:
the server initializes global model parameters and sends the parameters to a plurality of clients respectively;
the client receives the global model parameters, partitions the image data set to be identified into a training set, trains the client's local model on the training set according to the global model parameters, updates the local model parameters, and uploads them to the server;
the server aggregates all local model parameters to update the global model parameters until the global model converges;
and training the local model by cyclic knowledge distillation among the clients, and updating the personalized models.
In a third aspect, the present invention provides an electronic device;
an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the steps of the federal element learning based image recognition method described above.
In a fourth aspect, the present invention provides a computer-readable storage medium;
a computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the federal element learning based image recognition method described above.
Compared with the prior art, the invention has the beneficial effects that:
1. According to the technical scheme provided by the invention, on the premise of ensuring federated privacy and security, the local data of each federated client is used for training, and a personalized model for each federated client is formed through cyclic knowledge distillation; through global model fusion, not only can a high-performance global model be obtained, but a personalized model conforming to each client's data distribution can also be generated for each client, thereby alleviating the impact of data heterogeneity on model performance in federated learning and improving the accuracy of image recognition.
2. The technical scheme provided by the invention improves the performance of the local clients: global model fusion improves the performance of the global model, while the performance of the personalized model can be further improved on each individual client, outperforming existing federated meta-learning methods for data heterogeneity.
3. According to the technical scheme provided by the invention, the local model is trained only with the data of the local client, which effectively protects privacy-sensitive image data; the method offers privacy protection, distributed learning, model personalization, transfer learning, and adaptation to continuously changing data distributions, making federated meta-learning an effective tool for processing distributed privacy-sensitive image data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a schematic flow chart provided in an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
In the prior art, methods for coping with data heterogeneity mitigate its influence on federated learning to some extent, but cannot balance the performance of the global model and the client models; therefore, the invention provides an image recognition method based on federated meta-learning, which uses the features learned in meta-learning together with cyclic knowledge distillation to obtain personalized models adapted to the data heterogeneity of different clients, thereby obtaining better models both globally and locally.
Next, the image recognition method based on federated meta-learning disclosed in this embodiment is described in detail with reference to fig. 1. The method comprises the following steps:
s1, acquiring an image to be identified.
S2, inputting the image to be identified into a trained federated meta-learning model for processing to obtain an image identification result, wherein the federated meta-learning model comprises a global model and a plurality of personalized models.
Specifically, the image to be identified is input into a global model or a personalized model on a specific client for processing, and the model generates an image identification result, wherein the image identification result is usually a classification label or probability distribution of the image.
Further, the process of training the federated meta-learning model comprises the following steps:
step 1, initializing global model parameters by a server and respectively sending the global model parameters to a plurality of clients. The specific flow is as follows:
step 101, adapting the reference data sets CIFAR-10 and CIFAR-100 to federal element learning setting under the independent same distribution data.
The CIFAR-10 data set comprises 60000 32x32 color images covering 10 categories, with 6000 images per category; 50000 are training images and 10000 are test images. The dataset is divided into five training batches and one test batch, each containing 10000 images. The test batch contains exactly 1000 randomly selected images from each category. The images in the training batches are arranged in random order, and some batches may contain more images from one category than another, but together the training batches contain exactly 5000 images from each category.
The CIFAR-100 dataset covers 100 categories, each containing 600 images: 500 training images and 100 test images per category. The 100 categories in CIFAR-100 are grouped into 20 supercategories. Each image carries a "fine" label indicating the specific category to which it belongs and a "coarse" label indicating its supercategory.
In this embodiment, a Dirichlet distribution is used to produce non-IID data partitions among different clients; the heterogeneity parameter α on CIFAR-10 and CIFAR-100 is set to 0.5, and the resulting partitions serve as the local data set of each federated client.
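For illustration, the Dirichlet-based non-IID partition of step 101 can be sketched as follows (a minimal Python/NumPy sketch; the function name `dirichlet_partition` and the toy label array are illustrative assumptions, not part of the invention):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients so that, for each class, the
    class's samples are divided according to a Dirichlet(alpha) draw.
    Smaller alpha yields more heterogeneous (more non-IID) clients."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Fraction of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Toy example: 1000 samples over 10 classes, 20 clients, alpha = 0.5.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, n_clients=20, alpha=0.5)
```

With α = 0.5 each client's class histogram is noticeably skewed; a large α would approach a uniform (IID) split.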
And 102, building a federal learning environment and initializing a model.
To ensure faster convergence in the next step without sacrificing accuracy, this embodiment adopts a CNN network architecture comprising a feature extractor and a classifier composed of fully connected layers to participate in training. The network is a neural network φ_w with parameters w = {u, v}, consisting of two parts: a feature extractor f_u and a classifier h_v. The feature extractor extracts high-level features of the input image, and the classifier maps the high-level features obtained from the feature extractor onto a class probability distribution for classification. The local model parameters of client k are denoted w_k. By default, 100 rounds of global communication are run with a total of 20 clients, followed by 100 rounds of cyclic knowledge distillation among the clients; the local training batch size is set to 64, the inner learning rate is 0.001, the outer learning rate is 0.1, and the optimizer is SGD.
Step 103, the server randomly initializes global model parameters and sends the parameters to the selected client.
Illustratively, in the t-th round of communication, the server selects N clients from the K clients to form a set S_t; in the first round of communication, the server randomly initializes the global model w_t and sends w_t to the selected clients.
Step 2, the client receives the global model parameters, partitions the image data set to be identified into a training set, trains the client's local model on the training set according to the global model parameters, updates the local model parameters, and uploads them to the server. The specific flow is as follows:
step 201, the client divides the local data set into a training set, a verification set and a test set.
The data sets of the clients are partitioned in a special way to ensure independence between training, validation, and testing. The data set D_i = {(x, y)} on each client is therefore divided into three parts: a training set D_i^train, a validation set D_i^val, and a test set D_i^test.
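A minimal sketch of the per-client split described above (the split fractions and the function name are illustrative assumptions):

```python
import numpy as np

def split_local_dataset(indices, val_frac=0.1, test_frac=0.1, seed=0):
    """Split one client's local sample indices into disjoint
    training / validation / test subsets."""
    rng = np.random.default_rng(seed)
    idx = np.array(indices)
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    n_val = int(len(idx) * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train.tolist(), val.tolist(), test.tolist()

# Toy example: 100 local samples, 80/10/10 split.
train, val, test = split_local_dataset(range(100))
```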
Step 202, after each selected client receives the global model sent by the server, its local data D_k, k = 1, …, N, is used for training and the model is updated, where the meta-learning strategy adopts an outer learning rate β, an inner learning rate α, and stochastic gradient descent (SGD) as the optimizer.
further, at the t-th communication, a small number of (e.g. one) gradient drops are performed in each client using the local data, so as to obtain a model suitable for the local data set, and the optimization objective is defined as:
wherein F is k (w) is a loss function on client k representing the loss or error calculated on client k's local data set; the loss function is typically used to measure the performance of the model on the training data, where w is a parameter of the model.A small number of gradient drops per client are defined, which are made locally, in particular,representing the loss function F on client k k (w) a gradient with respect to the parameter w, which tells us the rate and direction of change of the loss function at the current parameter w. />Representing the current position at the parameter w, updating the parameter w by multiplying the step size alpha towards the direction of the gradient; this is similar to the standard gradient descent procedure, which moves the parameters in the direction of decreasing loss function. />Representing the calculation of the loss function at the updated parameter locations. This value represents the performance of the model on client k's local data updated by gradient descent.
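The inner-step objective can be illustrated on a toy quadratic local loss f_k(w) = 0.5·||w − c_k||², whose gradient is (w − c_k) (a minimal sketch; the loss, the value of α, and all names are illustrative assumptions, not the patent's actual model):

```python
import numpy as np

alpha = 0.1  # inner learning rate (toy value for illustration)

def f_k(w, c_k):
    # Toy local loss on client k, minimized at w = c_k.
    return 0.5 * np.sum((w - c_k) ** 2)

def grad_f_k(w, c_k):
    # Gradient of the toy loss.
    return w - c_k

def F_k(w, c_k):
    # One inner gradient-descent step, then evaluate the loss at the
    # adapted parameters: F_k(w) = f_k(w - alpha * grad f_k(w)).
    w_adapted = w - alpha * grad_f_k(w, c_k)
    return f_k(w_adapted, c_k)

w = np.array([1.0, 2.0])
c = np.array([0.0, 0.0])
```

Minimizing F_k rather than f_k optimizes the parameters' performance after one local adaptation step, which is what lets each client quickly specialize the shared initialization.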
The meta-learning model can not only inherit the advantages of federated learning (summarizing all client data) but also capture the differences between users: each client can adjust the initial model according to its own data and thereby obtain a unique model. During local training, F_k(w) is updated by gradient descent, which requires its gradient:
∇F_k(w) = (I − α∇²f_k(w)) · ∇f_k(w − α∇f_k(w))
due to F k The expression of (w) relates to f k (w) gradientTherefore, calculate->Hessian matrix of parameters to be calculated in time>Consider the calculation +.>In the present embodiment, a calculation method is adopted: selecting a batch of data D at a client k And using this data to obtain +.>An unbiased estimate of (1), namely:
namely D k All gradients were averaged. Similarly, for the case ofAlso with this calculation, an unbiased estimate of a batch of data is obtained.
Step 3, the server aggregates all local model parameters to update the global model parameters until the global model converges. The specific flow is as follows:
step 301, aggregating the models uploaded by the clients at the server so as to perform the next round of communication until the global model converges.
After the local update, each client uploads its updated local model w_k^{t+1} to the server. The server performs model aggregation by taking a weighted average of the local model parameters of each client, so as to obtain the global model for the (t+1)-th round of communication:
w^{t+1} = Σ_{k ∈ S_t} (|D_k| / Σ_{j ∈ S_t} |D_j|) · w_k^{t+1}
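The aggregation step can be sketched as follows (weighting each client by its local sample count is one common choice and an assumption here; the text states only that a weighted average is taken):

```python
import numpy as np

def aggregate(client_params, client_sizes):
    """Weighted average of the clients' parameter vectors."""
    sizes = np.array(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return sum(wt * p for wt, p in zip(weights, client_params))

# Two toy clients holding 100 and 300 samples respectively.
params = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
global_w = aggregate(params, client_sizes=[100, 300])  # -> [2.5, 2.5]
```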
and 4, training the local model by circulating knowledge distillation among the clients to form a personalized model.
Specifically, after a good model is obtained globally, cyclic knowledge distillation is performed among the clients: the previously trained client serves as the teacher model and fine-tunes the model of the next client, thereby forming personalized client models. The server does not participate in this stage, and each client keeps the parameters best suited to its local data.
Illustratively, when the clients complete the first stage of model training, the server fuses all models, after which cyclic knowledge distillation is performed only among the clients. In ordinary model training, the rounds in which model convergence approaches its limit contribute little to the overall model; therefore, this embodiment stops communication with the server when convergence approaches that limit, thereby reducing communication cost. The stage in which the clients train locally is called public knowledge accumulation.
The previously trained client serves as the teacher of the next client, and the cyclic knowledge distillation stage continues for several rounds to ensure that knowledge is transferred and retained among the federated ends. In the cyclic knowledge distillation stage, knowledge is passed only among the clients, without the involvement of a central server, expressed as:
ℓ_KD = (1/|D_k|) Σ_{x ∈ D_k} ||g_tea(x) − g_stu(x)||^2
wherein g_tea is the feature extractor of the previous client, g_stu is the feature extractor of the current client, and D_k is the set of local data samples of the current client.
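The distillation term can be sketched as a feature-matching loss between the teacher's and student's feature extractors on the current client's samples (the mean-squared form and the toy linear extractors are illustrative assumptions):

```python
import numpy as np

def kd_loss(g_tea, g_stu, samples):
    """Mean squared distance between teacher and student features."""
    sq_dists = [np.sum((g_tea(x) - g_stu(x)) ** 2) for x in samples]
    return float(np.mean(sq_dists))

# Toy linear feature extractors for the previous and current client.
g_teacher = lambda x: 2.0 * x
g_student = lambda x: 1.5 * x
samples = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
```

Minimizing this term pulls the current client's features toward the previous client's, which is how knowledge circulates around the ring without a central server.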
Through knowledge distillation, the knowledge of all federated clients can be fully utilized, and the performance of the local client can be improved by knowledge from the other clients. Thus, the total local training loss is:
ℓ_total = ℓ_cls + λ · ℓ_KD
where λ is the weight with which the previous client delivers knowledge to the current client and ℓ_cls is a cross-entropy loss function.
By accumulating sufficient local knowledge in the previous stage and aggregating at the server, a federated model F_k(w) containing sufficient public knowledge is obtained; after the federated model aggregation stage is completed, it is passed to all federated clients to prevent the loss of public knowledge.
When the teacher model performs poorly on the validation data set of the current federated client, fewer references to it are desired, and λ is set to 0. During the public knowledge accumulation stage, the current F_k(w) already contains the knowledge of the other federated clients, and the value of λ is set to 1 by default. When the performance of the teacher model on the current federated validation set is acceptable, a personalized adjustment is made to λ.
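The λ schedule above can be sketched as a simple rule (an illustrative assumption; the patent's exact personalized adjustment formula is not reproduced here):

```python
def choose_lambda(teacher_val_acc, student_val_acc, default=1.0):
    """Drop the distillation weight when the teacher underperforms the
    current client's own model on the local validation set."""
    if teacher_val_acc < student_val_acc:
        return 0.0  # fewer references to a poorly performing teacher
    return default  # keep the default weight otherwise
```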
example two
This embodiment discloses an image recognition system based on federated meta-learning, comprising:
an acquisition module configured to: acquiring an image to be identified;
an image recognition module configured to: inputting the image to be identified into a trained federated meta-learning model for processing, and obtaining an image identification result; the federated meta-learning model comprises a global model and a plurality of personalized models;
wherein the process of training the federated meta-learning model comprises:
the server initializes global model parameters and sends the parameters to a plurality of clients respectively;
the client receives the global model parameters, partitions the image data set to be identified into a training set, trains the client's local model on the training set according to the global model parameters, updates the local model parameters, and uploads them to the server;
the server aggregates all local model parameters to update the global model parameters until the global model converges;
and training the local model by cyclic knowledge distillation among the clients to form personalized models.
It should be noted that the acquisition module and the image recognition module correspond to the steps in the first embodiment; the examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of the first embodiment. It should also be noted that the above modules may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
Example 3
The third embodiment of the invention provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and run on the processor; when the computer instructions are run by the processor, the steps of the above image recognition method based on federated meta-learning are completed.
Example 4
The fourth embodiment of the present invention provides a computer-readable storage medium storing computer instructions which, when executed by a processor, complete the steps of the above image recognition method based on federated meta-learning.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The embodiments in this specification are described in a complementary manner; for details not elaborated in one embodiment, reference may be made to the related description of another embodiment.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (10)

1. An image recognition method based on federated meta-learning, characterized by comprising the following steps:
acquiring an image to be identified;
inputting the image to be identified into a trained federated meta-learning model for processing to obtain an image recognition result; the federated meta-learning model comprises a global model and a plurality of personalized models;
wherein the process of training the federated meta-learning model comprises:
the server initializes global model parameters and sends the parameters to a plurality of clients respectively;
each client receives the global model parameters, divides a training set from the image data set to be identified, trains its local model on the training set according to the global model parameters, and updates the local model parameters and uploads them to the server;
the server aggregates all local model parameters to update the global model parameters until the global model converges;
and training the local models by cyclic knowledge distillation among the clients to form the personalized models.
2. The image recognition method based on federated meta-learning according to claim 1, wherein, when the local model of a client is trained on the training set, a second-order gradient in meta-learning is calculated and gradient descent is performed on the personalized model.
3. The image recognition method based on federated meta-learning according to claim 1, wherein the training of the federated meta-learning model uses a CNN network architecture comprising a feature extractor and a classifier composed of fully connected layers.
4. The image recognition method based on federated meta-learning according to claim 1, wherein the server aggregating all local model parameters to update the global model parameters specifically comprises: performing a weighted average over the local model parameters of the clients to obtain the global model parameters.
5. The image recognition method based on federated meta-learning according to claim 1, wherein a plurality of rounds of cyclic knowledge distillation are performed among the plurality of clients, expressed as:
L_distill = (1/|D_k|) Σ_{x ∈ D_k} ‖g_tea(x) − g_stu(x)‖²
wherein g_tea is the feature extractor of the previous client, g_stu is the feature extractor of the current client, and D_k is the set of local data samples of the current client.
6. The image recognition method based on federated meta-learning according to claim 1, wherein the loss function of the cyclic knowledge distillation stage is expressed as:
L = ℓ_CE + λ · L_distill
wherein λ is the weight with which the previous client transfers knowledge to the current client, ℓ_CE is the cross entropy loss function, and L_distill is the distillation loss between the feature extractors of the previous and current clients.
7. The image recognition method based on federated meta-learning according to claim 1, wherein dividing the training set from the image data set to be identified specifically comprises: partitioning the image data set to be identified using a Dirichlet distribution, so as to realize non-independent and identically distributed (non-IID) data partitions among different clients.
8. An image recognition system based on federated meta-learning, comprising:
an acquisition module configured to: acquire an image to be identified;
an image recognition module configured to: input the image to be identified into a trained federated meta-learning model for processing to obtain an image recognition result; the federated meta-learning model comprises a global model and a plurality of personalized models;
wherein the process of training the federated meta-learning model comprises:
the server initializes global model parameters and sends the parameters to a plurality of clients respectively;
each client receives the global model parameters, divides a training set from the image data set to be identified, trains its local model on the training set according to the global model parameters, and updates the local model parameters and uploads them to the server;
the server aggregates all local model parameters to update the global model parameters until the global model converges;
and training the local models by cyclic knowledge distillation among the clients to form the personalized models.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the image recognition method based on federated meta-learning of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the image recognition method based on federated meta-learning of any one of claims 1-7.
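Taken together, the training loop of claims 1 and 4 — server-initialized global parameters, client-side meta-learning updates, and weighted-average aggregation — can be sketched as follows. This is an illustrative toy on a linear model with mean-squared-error loss; the first-order meta-update, learning rates, and support/query split are assumptions, not the filed implementation.

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of the mean squared error 0.5 * ||X w - y||^2 / n w.r.t. w.
    return X.T @ (X @ w - y) / len(y)

def client_meta_update(w_global, support, query, inner_lr=0.1, outer_lr=0.1):
    # MAML-style local step: adapt the received global parameters on the
    # support split, then update them with the gradient evaluated at the
    # adapted point (a first-order stand-in for the second-order
    # meta-gradient mentioned in claim 2).
    Xs, ys = support
    Xq, yq = query
    w_adapted = w_global - inner_lr * mse_grad(w_global, Xs, ys)
    return w_global - outer_lr * mse_grad(w_adapted, Xq, yq)

def server_aggregate(client_params, client_sizes):
    # Claim 4: weighted average of the uploaded local parameters,
    # weighted here by each client's sample count.
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(a * p for a, p in zip(weights, client_params))
```

Iterating `client_meta_update` on every client and `server_aggregate` over the results yields a FedAvg-style outer loop; on synthetic linear data the aggregated parameters converge toward the generating weights.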
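Claims 5 and 6 pass knowledge around the ring of clients by distilling the previous client's feature extractor into the current one. Because the exact formulas in the filing are given by figures, the feature-matching term and the λ-weighted combination below are assumptions illustrating one standard formulation:

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Standard cross entropy over integer class labels.
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def distillation_loss(feat_student, feat_teacher):
    # Mean squared distance between the current client's features (g_stu)
    # and the previous client's features (g_tea) on local data D_k.
    return np.mean((feat_student - feat_teacher) ** 2)

def cyclic_kd_loss(logits, labels, feat_student, feat_teacher, lam=0.5):
    # Total loss: cross entropy on local labels plus lambda-weighted
    # knowledge transferred from the previous client in the ring.
    return cross_entropy(logits, labels) + lam * distillation_loss(feat_student, feat_teacher)
```

Setting `lam` to zero recovers purely local supervised training; larger values pull the current client's feature extractor toward the one received from its predecessor.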
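The Dirichlet-based non-IID partition of claim 7 is commonly realized by drawing, for every class, a Dirichlet vector of client proportions and splitting that class's samples accordingly; a sketch (the concentration parameter `alpha` and the per-class splitting scheme are assumptions — smaller `alpha` gives more skewed client shards):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=0.5, seed=0):
    # Split sample indices among clients by drawing one Dirichlet
    # proportion vector per class, yielding non-IID client shards.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_idx[k].extend(part.tolist())
    return client_idx
```

Every sample index lands on exactly one client, while the per-client class mix varies with the Dirichlet draw.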
CN202311252854.4A 2023-09-26 2023-09-26 Image recognition method and system based on federated meta-learning Pending CN117292221A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311252854.4A CN117292221A (en) 2023-09-26 2023-09-26 Image recognition method and system based on federated meta-learning


Publications (1)

Publication Number Publication Date
CN117292221A true CN117292221A (en) 2023-12-26

Family

ID=89256709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311252854.4A Pending CN117292221A (en) 2023-09-26 2023-09-26 Image recognition method and system based on federated meta-learning

Country Status (1)

Country Link
CN (1) CN117292221A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893807A (en) * 2024-01-12 2024-04-16 南京理工大学 Federated self-supervised contrastive learning image classification system and method based on knowledge distillation
CN117893807B (en) * 2024-01-12 2024-06-25 南京理工大学 Federated self-supervised contrastive learning image classification system and method based on knowledge distillation
CN117910600A (en) * 2024-03-15 2024-04-19 山东省计算中心(国家超级计算济南中心) Meta-continual federated learning system and method based on fast learning and knowledge accumulation
CN117910600B (en) * 2024-03-15 2024-05-28 山东省计算中心(国家超级计算济南中心) Meta-continual federated learning system and method based on fast learning and knowledge accumulation
CN117973457A (en) * 2024-04-01 2024-05-03 南京信息工程大学 Federated learning method based on inference similarity in autonomous-driving perception scenarios
CN117994635A (en) * 2024-04-03 2024-05-07 山东省计算中心(国家超级计算济南中心) Noise-robust federated meta-learning image recognition method and system
CN117994635B (en) * 2024-04-03 2024-06-11 山东省计算中心(国家超级计算济南中心) Noise-robust federated meta-learning image recognition method and system
CN118114750A (en) * 2024-04-30 2024-05-31 齐鲁工业大学(山东省科学院) Federated learning method, device, and computer-readable storage medium based on two-component second-order aggregation and classifier re-optimization
CN118154389A (en) * 2024-05-09 2024-06-07 南京物浦大数据有限公司 Method and system for fair competitive examination based on the FedAvg federated learning algorithm and privacy computing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination