CN115618241A - Task adaptation and federated learning method and system for edge-side visual analysis - Google Patents

Task adaptation and federated learning method and system for edge-side visual analysis

Info

Publication number
CN115618241A
CN115618241A (application CN202211218192.4A)
Authority
CN
China
Prior art keywords
task
gradient
knowledge
model
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211218192.4A
Other languages
Chinese (zh)
Inventor
罗潘亚欣
韩锐
刘驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202211218192.4A
Publication of CN115618241A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

For task adaptation in a federated learning scenario, the method ensures high model accuracy in visual analysis by preventing negative transfer: the model is trained on local samples while also interacting with other edge devices to learn their related task information, which improves the accuracy of each edge device's model in visual analysis. At the same time, the method reduces the overhead of federated learning. Communication between edge devices is expensive and edge-device resources are limited; by accumulating task knowledge locally, the communication and computation overhead of task adaptation over the whole training process is reduced while high learning accuracy is preserved, and the size of each edge-side communication does not grow as the number of tasks increases.

Description

Task adaptation and federated learning method and system for edge-side visual analysis
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a task adaptation and federated learning method and system for edge-side visual analysis.
Background
Today, hundreds of millions of mobile and Internet of Things (IoT) devices generate terabytes of data at the network edge, creating opportunities to deploy artificial intelligence on edge devices, where AI applications (e.g., deep neural networks) can avoid transmitting raw data and thereby protect device privacy. At the same time, this presents new challenges. The scenes encountered by edge devices change continuously, so a deployed deep neural network model must capture and adapt to these changes in real time. In real life, devices must process a series of tasks over time, and these tasks involve different classes. The challenge is aggravated when the network model must be trained across large-scale clients on continuously changing tasks, because different clients not only face different tasks but also carry heterogeneous training samples. How to process the task sequence on each edge device with limited resources, and how to transmit task information among devices accurately and efficiently, therefore become the key to solving this difficulty.
As discussed above, the sequential tasks on each client require a model that can learn continuously without forgetting past tasks. During learning, each task must communicate with other clients to obtain task-related information effectively, avoiding situations a single client cannot handle. Such communication must be secure and stable: client data is mostly private, so a strong privacy-protection method is needed to avoid transmitting raw data samples, and network limitations mean the content each client transmits cannot be too large. To process new tasks efficiently, the main existing methods are:
setting regularization parameters: the regularization parameters are a set of vectors that limit model drift. For each task, they are computed from the trained task parameters by judging each parameter's contribution to that task; the result of this judgment is the regularization parameters. Setting them prevents the model from drifting excessively toward a new task while learning it, which would cause forgetting of already-learned tasks. However, this approach steadily erodes the memory of old tasks: under federated learning, the sequence of tasks an edge device carries is endless, and the regularization approach processes this unbounded task sequence with a single model and no extra storage, so as tasks keep arriving, earlier tasks become ever more likely to be forgotten; the model can only handle tasks within a certain range, and once earlier tasks are abandoned the model stops working for them. In addition, this approach interferes with learning new tasks: many model parameters are constrained to memorize old tasks, so when a new task is learned the convergence speed drops and training time increases greatly, while the constrained parameters also reduce the model's training accuracy;
sample storage: the sample-storage method simply stores a portion of the samples of each task encountered, and replays the old tasks' samples while learning a new task. However, its applicability in a federated scenario is limited. In such a scenario, each edge device has limited computation and storage, yet this method must keep storing task samples as tasks arrive, so only a few samples can be stored per task. Once the stored samples are this limited, the model's ability to train on past tasks degrades, overfitting arises, and catastrophic forgetting eventually follows;
dynamic model architecture: the dynamic-architecture method adds new network parameters for each task, so each task has independent parameters and interference from other tasks is reduced. However, the edge side limits model size: this method must keep enlarging the model, and on resource-limited edge devices the model cannot be expanded without bound as tasks keep arriving. The enlarged model also creates communication problems. While the model keeps growing with tasks, each edge device must transmit its model parameters to the server for aggregation with other edge devices, which raises two issues. First, because each edge device experiences different tasks, the expanded models may differ in size, so the numbers of model parameters transmitted to the server differ and aggregation cannot be performed. Second, traffic grows with model size in transit, which can stall the federated framework when the number of model parameters is particularly large.
Based on the above technical problems in the prior art, the invention provides a task adaptation and federated learning method and system for edge-side visual analysis.
Disclosure of Invention
The invention provides a task adaptation and federated learning method and system for edge-side visual analysis.
The invention adopts the following technical scheme:
In one aspect, the invention provides a task adaptation and federated learning method for edge-side visual analysis, comprising the following steps:
step 1, a server transmits global model parameters to a selected client to obtain task knowledge;
step 2, the gradient generation module computes all past task gradients from the client's local data samples and the stored task knowledge, and judges whether the number of task gradients exceeds a set upper limit; if so, it computes the similarity between the current task gradient and each past task gradient in turn, selects the gradients with the largest difference, and passes them to the gradient aggregation module; otherwise, step 2 ends;
step 3, the gradient aggregation module computes the angle between the current task gradient and each past task gradient; if no angle exceeds 90 degrees, the current task gradient meets the requirement of the gradient aggregation module and is used to update the model; otherwise, the current task gradient is rotated by the minimum amount so that its angle with every past task gradient is at most 90 degrees, and the model is updated with the aggregated gradient;
step 4, each client uploads its model parameters and training-sample count to the server, and the server computes a weighted average using the clients' training-sample counts to obtain the aggregated task parameters;
step 5, the knowledge extraction module retains the 5-15% of model parameters with the largest weights from the current model, sets the remaining parameters to 0, performs model adjustment, and stores the adjusted model parameters in the knowledge storage module as the task knowledge of the current task.
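For illustration only, the following Python sketch shows how one training round might wire the above steps 1 to 5 together; every object and method name (server, client, gradient_generator, knowledge_store, and so on) is hypothetical and not specified by the invention:

```python
# Hypothetical sketch of one federated training round (steps 1-5).
import random

def training_round(server, clients, select_ratio=0.4):
    # Step 1: the server sends global parameters to a sampled subset of clients.
    selected = random.sample(clients, int(len(clients) * select_ratio))
    for client in selected:
        client.load_parameters(server.global_parameters)
        # Step 2: generate the current and past task gradients from local
        # samples and the locally stored task knowledge.
        g_cur, g_past = client.gradient_generator.generate()
        # Step 3: aggregate so that no past-task angle exceeds 90 degrees.
        g = client.gradient_aggregator.aggregate(g_cur, g_past)
        client.apply_gradient(g)
    # Step 4: weighted average of client parameters by training-sample count.
    server.global_parameters = server.aggregate(
        [(c.parameters, c.num_samples) for c in selected])
    # Step 5: each client extracts and stores task knowledge locally.
    for client in selected:
        client.knowledge_store.save(client.extract_task_knowledge())
```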
Further, step 1 comprises:
step 1.1, selecting clients to participate in training according to the number of the clients participating in training and a set selection proportion;
step 1.2, the server sends the parameters to each selected client.
Further, in step 2, obtaining the task gradients of all stored task knowledge from the samples includes:
step 2.1, selecting a portion of samples from the current task, and letting the model perform one inference pass on the selected samples;
step 2.2, acquiring all past task knowledge from the knowledge storage module, and performing one inference pass per piece of task knowledge on the samples;
step 2.3, calculating the loss from the sample data labels and the model's inference result from step 2.1 to obtain the current task gradient, and meanwhile calculating each past task's gradient from the inference results of step 2.2.
Further, in step 5, the model adjustment includes:
step 5.1, randomly selecting a portion of data samples from the current task data set, and letting the model perform one inference pass on them;
step 5.2, calculating the loss using the data labels and the inference results;
step 5.3, calculating the gradient from the loss, and using it to update the parameters not set to 0;
step 5.4, judging whether the loss has reached the loss threshold or the number of adjustments has reached the upper limit; if neither condition is met, returning to step 5.2; otherwise the model adjustment ends.
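A minimal PyTorch sketch of this adjustment loop follows; it assumes `masks` holds a 0/1 tensor per parameter marking the retained (non-zeroed) weights, and `samples`/`labels` are the randomly selected batch from step 5.1. The function name, learning rate and optimizer choice are illustrative:

```python
# Sketch of steps 5.1-5.4: fine-tune only the retained (unmasked) parameters.
import torch
import torch.nn.functional as F

def adjust_model(model, masks, samples, labels, loss_threshold,
                 max_steps=10, lr=0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_steps):                          # step 5.4: adjustment cap
        optimizer.zero_grad()
        loss = F.cross_entropy(model(samples), labels)  # step 5.2: loss from labels
        loss.backward()                                 # step 5.3: gradient from loss
        for p, mask in zip(model.parameters(), masks):
            p.grad.mul_(mask)                           # leave zeroed parameters at 0
        optimizer.step()
        if loss.item() <= loss_threshold:               # step 5.4: threshold check
            break
    return model
```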
Further, in step 1.1, after the client training is completed, the accuracy, loss and memory occupation of each client model are tested.
Further, in step 2.2, a cross entropy loss function is used as the loss function, and the loss value is computed from the inference probability and the true label (the higher the probability of correctly predicting the class, the smaller the loss). The past task gradient is computed as:

$$g_i = \frac{\partial\, \mathrm{Loss}\big(f(W, X_{m+1}),\, f(W_i, X_{m+1})\big)}{\partial W} \qquad (1)$$

In equation (1), $g_i$ is the gradient corresponding to the $i$-th task, $\mathrm{Loss}$ is the cross entropy loss function, $f(W, X_{m+1})$ is the inference result on data samples $X_{m+1}$ under the model parameters $W$, and $f(W_i, X_{m+1})$ is the inference result on samples $X_{m+1}$ using the task knowledge of the $i$-th task.
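As a non-authoritative illustration of equation (1), the PyTorch sketch below computes each past task gradient by treating the knowledge model's prediction as a soft target for a cross entropy written out by hand; `knowledge_models` is assumed to hold frozen copies of the architecture loaded with each stored $W_i$:

```python
# Sketch of equation (1): past task gradients from stored task knowledge.
import torch

def past_task_gradients(model, knowledge_models, x):
    grads = []
    for knowledge_model in knowledge_models:
        with torch.no_grad():
            # f(W_i, X_{m+1}): inference with the i-th task's knowledge.
            target = torch.softmax(knowledge_model(x), dim=1)
        model.zero_grad()
        log_probs = torch.log_softmax(model(x), dim=1)   # from f(W, X_{m+1})
        # Soft cross entropy between the two predictions.
        loss = -(target * log_probs).sum(dim=1).mean()
        loss.backward()
        # Flatten dLoss/dW over all parameters into one vector g_i.
        grads.append(torch.cat([p.grad.detach().flatten()
                                for p in model.parameters()]))
    return grads
```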
Further, in step 3, $g_{m+1}$ denotes the current task gradient and $G = \{g_1, \ldots, g_m\}$ denotes the set of past task gradients; if $g_i \cdot g_{m+1} < 0$, the angle between the two gradients is judged to be greater than 90 degrees.
Further, in step 3, the gradient rotation problem is converted into a quadratic programming optimization problem with the objective:

$$\min_{g'} \ \tfrac{1}{2}\, \lVert g_{m+1} - g' \rVert^2 \qquad (2)$$
$$\text{s.t.} \quad G g' \ge 0 \qquad (3)$$

In equations (2)-(3), $g'$ is the final aggregated gradient, whereby:

$$\tfrac{1}{2}\, g'^{\top} g' - g_{m+1}^{\top} g' + \tfrac{1}{2}\, g_{m+1}^{\top} g_{m+1} \qquad (4)$$

The constant part is omitted:

$$\min_{g'} \ \tfrac{1}{2}\, g'^{\top} g' - g_{m+1}^{\top} g' \qquad (5)$$

To reduce the amount of computation, the rotation amount $v$ of the gradient is computed directly, and equation (5) is converted into the dual:

$$\min_{v} \ \tfrac{1}{2}\, v^{\top} G G^{\top} v + g_{m+1}^{\top} G^{\top} v \qquad (6)$$
$$\text{s.t.} \quad v \ge 0 \qquad (7)$$

The final gradient is:

$$g' = G^{\top} v + g_{m+1} \qquad (8)$$

The model parameters are updated using $g'$.
In another aspect, the invention further provides a task adaptation and federated learning system for edge-side visual analysis, which comprises:
a parameter aggregation module, which receives multiple groups of model parameters from the edge side and, through an internal aggregation algorithm, transmits the combined parameters to each edge device participating in training;
a knowledge extraction module, which identifies and extracts the parameter information most relevant to a task from the trained model as that task's knowledge; after each task switch, the knowledge extraction module extracts and stores the task knowledge;
a knowledge storage module, which receives and stores the task knowledge extracted by the knowledge extraction module;
a gradient generation module, which uses all task knowledge from the knowledge storage module, together with the current training samples, to generate the gradients of all past tasks for model training, so as to adapt to the tasks; the gradient generation module identifies the portion of past task gradients that differs most from the current task gradient and passes the selected gradients to the gradient aggregation module; and
a gradient aggregation module, which finds a suitable aggregated gradient that reduces the loss of the new task without increasing the loss of previous tasks.
Further, the gradient aggregation module keeps the angle between the aggregated gradient and each past task gradient acute.
Compared with the prior art, the invention has the following advantages:
1. In the task adaptation and federated learning method for edge-side visual analysis, for task adaptation in a federated learning scenario, negative transfer is prevented to ensure high model accuracy in visual analysis: the model is trained on local samples while also interacting with other edge devices to learn their related task information, improving the accuracy of the edge device's model in visual analysis;
2. The task adaptation and federated learning method for edge-side visual analysis reduces the overhead of federated learning. Communication between edge devices is expensive and edge-device resources are limited; by accumulating task knowledge locally, the method reduces the communication and computation overhead of task adaptation over the whole training process while preserving high learning accuracy, and the size of edge-side communications does not grow as tasks increase.
Drawings
FIG. 1 is an overall flow chart of a method according to an embodiment of the invention;
FIG. 2 is a flow chart of step 1 of a method according to an embodiment of the invention;
FIG. 3 is a flow chart of step 2 of a method according to an embodiment of the invention;
FIG. 4 is a flow chart of step 3 of a method according to an embodiment of the invention;
FIG. 5 is a flow chart of step 4 of a method according to an embodiment of the invention;
FIG. 6 is a flow chart of step 5 of a method according to an embodiment of the invention;
FIG. 7 is a time-versus-accuracy graph on an edge device for the method of an embodiment of the invention and other methods;
FIG. 8 is a time-versus-bandwidth graph for the method of an embodiment of the invention and other methods;
FIG. 9 is an accuracy graph in a large-scale scenario for the method of an embodiment of the invention and other methods;
FIG. 10 is an accuracy graph in a multitask scenario for the method of an embodiment of the invention and other methods;
FIG. 11 is an accuracy graph of the method of an embodiment of the invention and other methods on several typical classification networks;
FIG. 12 is an accuracy graph under different storage scales for the method of an embodiment of the invention and other methods.
Detailed Description
In order that the above objects, features and advantages of the present invention may be more clearly understood, the invention is described in further detail below with reference to the accompanying drawings and the detailed description. It should be noted that the embodiments of the present application and the features in the embodiments can be combined with each other without conflict.
Examples
As shown in FIGS. 1 to 6, the task adaptation and federated learning method for edge-side visual analysis includes:
step 1, the server transmits the parameters to the clients:
step 1.1, selecting a portion of clients from all clients to participate in training, according to the number of edge devices participating in training and the set selection ratio;
step 1.2, the server sends the parameters to each selected client, and step 1 ends;
step 2, the gradient generation module supplies the model with the gradients of all past tasks for training, to handle task adaptation:
step 2.1, obtaining the task gradients of all task knowledge from the samples:
step 2.1.1, selecting a portion of samples from the current task, and letting the model perform one inference pass on the selected samples;
step 2.1.2, acquiring all past task knowledge from the knowledge storage module, and performing one inference pass per piece of task knowledge on the samples;
step 2.1.3, calculating the loss from the sample data labels and the model's inference result from step 2.1.1 to obtain the current task gradient, and meanwhile calculating each past task's gradient from the inference results of step 2.1.2;
step 2.2, judging whether the number of task gradients exceeds the set upper limit; if so, entering step 2.3; otherwise the system can handle all the gradients, step 2 ends and step 3 begins;
step 2.3, calculating the similarity between the current task gradient and each past task gradient in turn, and selecting the portion of gradients with the largest difference to pass to the gradient aggregation module;
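A brief sketch of this selection follows; cosine similarity is assumed as the similarity measure (the patent does not name one), and `k` is the gradient cap:

```python
# Sketch of step 2.3: keep the k past gradients least similar to the current one.
import numpy as np

def most_dissimilar(g_cur, past_grads, k=5):
    sims = [float(g @ g_cur) / (np.linalg.norm(g) * np.linalg.norm(g_cur) + 1e-12)
            for g in past_grads]
    order = np.argsort(sims)[:k]      # smallest similarity = largest difference
    return [past_grads[i] for i in order]
```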
step 3, the gradient aggregation module finds a suitable aggregated gradient:
step 3.1, calculating the angle between the current task gradient and each past task gradient;
step 3.2, judging whether a violation occurs: the angle between each past task gradient and the current task gradient is computed in turn and checked against 90 degrees; if no angle exceeds 90 degrees, the current task gradient meets the requirement of the gradient aggregation module, the model is updated directly with it, step 3 ends and step 4 begins; otherwise, entering step 3.3;
step 3.3, if an angle exceeds 90 degrees, the current task gradient violates the requirement of the gradient aggregation module; the current task gradient is rotated by the minimum amount so that its angle with every past task gradient is at most 90 degrees;
step 3.4, updating the model with the aggregated gradient;
step 4, the server aggregates the model parameters:
step 4.1, the clients participating in training upload their model parameters and training-sample counts to the server;
step 4.2, the server aggregates the parameters: after receiving the parameters from the trained clients, it computes a weighted average by each client's training-sample count to obtain the aggregated task parameters;
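For illustration, this aggregation can be written as a FedAvg-style weighted average; the sketch assumes each client's parameters arrive flattened into a single vector:

```python
# Sketch of step 4.2: weighted average of parameters by training-sample count.
import numpy as np

def aggregate_parameters(client_params, client_num_samples):
    weights = np.asarray(client_num_samples, dtype=np.float64)
    weights /= weights.sum()                   # normalize by total sample count
    return sum(w * p for w, p in zip(weights, client_params))
```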
step 5, the knowledge extraction module extracts and stores the task knowledge:
step 5.1, selecting the portion of the current model parameters with the largest weights, and setting the remaining parameters to 0;
step 5.2, fine-tuning the model:
step 5.2.1, randomly selecting a portion of data samples from the current task data set, and letting the model perform one inference pass on them;
step 5.2.2, calculating the loss using the data labels and the inference results;
step 5.2.3, calculating the gradient from the loss, and using it to update the parameters not set to 0;
step 5.2.4, judging whether the loss has reached the loss threshold or the number of fine-tuning iterations has reached the upper limit; if neither condition is met, returning to step 5.2.2; otherwise, entering step 5.3;
step 5.3, storing the fine-tuned model parameters in the knowledge storage module.
The above embodiments define the concept of task knowledge. In practice, models on edge devices must continuously handle different tasks. To ensure the model on an edge device does not forget learned task information, task knowledge is defined to store the key information of each task; the task knowledge of each task consists of the model parameters after that task is trained, and is stored on each edge device.
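The sketch below illustrates one way this extraction could be realized in PyTorch: keep the largest-magnitude 10% of weights per parameter tensor as task knowledge and zero the rest. The per-tensor (rather than global) thresholding and the function name are assumptions:

```python
# Hypothetical sketch of knowledge extraction (keep top-10% weights by magnitude).
import torch

def extract_task_knowledge(model, keep_ratio=0.10):
    knowledge, masks = {}, []
    for name, p in model.named_parameters():
        flat = p.detach().abs().flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        threshold = flat.topk(k).values.min()           # magnitude cut-off
        mask = (p.detach().abs() >= threshold).float()
        knowledge[name] = p.detach() * mask             # sparse task knowledge
        masks.append(mask)                              # reused for model adjustment
    return knowledge, masks
```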
The task adaptation and federated learning system for edge-side visual analysis comprises:
a parameter aggregation module, which receives multiple groups of model parameters from the edge side and, through an internal aggregation algorithm, transmits the combined parameters to each edge device participating in training;
a knowledge extraction module, which identifies and extracts the parameter information most relevant to a task from the trained model as that task's knowledge; after each task switch, the knowledge extraction module extracts and stores the task knowledge;
a knowledge storage module, which receives and stores the task knowledge extracted by the knowledge extraction module;
a gradient generation module, which uses all task knowledge from the knowledge storage module, together with the current training samples, to generate the gradients of all past tasks for model training, so as to adapt to the tasks; the gradient generation module identifies the portion of past task gradients that differs most from the current task gradient and passes the selected gradients to the gradient aggregation module; and
a gradient aggregation module, which finds a suitable aggregated gradient that reduces the loss of the new task without increasing the loss of previous tasks.
Specifically, in the above embodiment, four platform architectures are selected: 8 Jetson TX2 devices, each with a 256-core NVIDIA Pascal GPU and 8GB of memory; 8 Jetson Nano devices with the NVIDIA Maxwell architecture, each with 128 NVIDIA CUDA cores and 4GB of memory; 4 Jetson Xavier NX devices, each with a 384-core NVIDIA Volta GPU, 48 Tensor cores and 16GB of memory; and 4 Jetson AGX devices, each with a 512-core Volta GPU and 32GB of memory. All Jetson platforms run Ubuntu 18.04.5 LTS and support PyTorch 1.9.0 (Python 3.6.9) as the target edge devices, and several typical models are deployed on each edge device. Several classical image classification data sets are considered: CIFAR-100, FC100, CORe50, MiniImageNet, and TinyImageNet. CIFAR-100, FC100 and MiniImageNet are each split into 200 private task sequences distributed over 20 clients; CORe50 is split into 220 private task sequences over 20 clients; and TinyImageNet is split into 400 task sequences over 20 clients.
In step 1 of the above embodiment, when the edge devices train the network, the server selects 40% of the clients to participate in each round, i.e., 20 × 0.4 = 8 edge devices train; a task runs for 10 rounds, and after the 10 rounds each edge device's model is tested, for example for accuracy, loss, and memory footprint.
In step 2 of the above embodiment, the gradient generation module selects 20% of the existing samples to participate in each gradient computation. In step 2.1.1, the model's inference result is the predicted probability of each class. The loss function used in step 2.1.2 is the cross entropy loss, which computes a loss value from the inference probability and the true label, i.e., the higher the probability of correctly predicting a class, the smaller the loss. The gradient of a past task is computed as:

$$g_i = \frac{\partial\, \mathrm{Loss}\big(f(W, X_{m+1}),\, f(W_i, X_{m+1})\big)}{\partial W}$$

In the above formula, $g_i$ is the gradient corresponding to the $i$-th task, $\mathrm{Loss}$ is the cross entropy loss function described above, $f(W, X_{m+1})$ is the inference result on data samples $X_{m+1}$ under the model parameters $W$, and $f(W_i, X_{m+1})$ is the inference result on samples $X_{m+1}$ using the task knowledge of the $i$-th task. In step 2.1.3, if the number of tasks is particularly large, k = 5 is set, meaning only the 5 most dissimilar past gradients are passed to the gradient aggregation module; this ensures that per-round training time does not grow with the number of tasks and reduces the amount of computation.
In step 3 of the above embodiment, the generated gradients need to be aggregated. For convenience, $g_{m+1}$ denotes the current task gradient and $G = \{g_1, \ldots, g_m\}$ the set of past task gradients. Step 3.1 computes $g_i \cdot g_{m+1}$ to determine whether an obtuse angle occurs, i.e., $g_i \cdot g_{m+1} < 0$ means the angle between the two gradients is greater than 90 degrees. If this does not occur, $g_{m+1}$ is used directly to update the model parameters; otherwise the gradient is rotated according to step 3.2, and per the description of step 3.3 the rotation is converted into a quadratic programming optimization problem with the objective:

$$\min_{g'} \ \tfrac{1}{2}\, \lVert g_{m+1} - g' \rVert^2$$
$$\text{s.t.} \quad G g' \ge 0$$

where $g'$ is the final aggregated gradient, whereby:

$$\tfrac{1}{2}\, g'^{\top} g' - g_{m+1}^{\top} g' + \tfrac{1}{2}\, g_{m+1}^{\top} g_{m+1}$$

The constant part is omitted:

$$\min_{g'} \ \tfrac{1}{2}\, g'^{\top} g' - g_{m+1}^{\top} g'$$

To reduce the amount of computation, the rotation amount $v$ of the gradient is computed directly, so the above formula is converted into the dual:

$$\min_{v} \ \tfrac{1}{2}\, v^{\top} G G^{\top} v + g_{m+1}^{\top} G^{\top} v$$
$$\text{s.t.} \quad v \ge 0$$

The final gradient is then:

$$g' = G^{\top} v + g_{m+1}$$

and the model parameters are updated using $g'$.
In step 4 of the above embodiment, the 8 trained clients transmit their parameters to the server. The number of samples each client trains on is 250 for CIFAR-100, FC100 and MiniImageNet, and 825 for TinyImageNet. The server aggregates the parameters by weighted averaging and then transmits them to the clients in the next round.
In step 5 of the above embodiment, after a task is trained, the client extracts the relevant task parameters as task knowledge: in step 5.1 it keeps the 10% of model parameters with the largest weights and sets the remaining parameters to 0; in step 5.2 it selects 20% of the samples to fine-tune the selected parameters, with the maximum number of fine-tuning iterations set to 10 and the loss threshold set to one percent of the previous loss value; the trained parameters are then stored in the knowledge store.
To verify the technical effect of the method of the above embodiment, tests were performed on multiple data sets, and the method performed well under federated learning and task adaptation. In the scenario of multiple resource-limited edge devices shown in FIG. 7, the method improves accuracy by 33.26% over general federated learning and by 77.35% over general task-adaptation methods. In practice, with the communication network bandwidth limited to between 50KB and 10MB, the method of the invention performs well; at a bandwidth of 1MB, its communication time occupies only 10% of training time, a 34.28% reduction compared with the latest algorithm (FIG. 8).
The method was also tested under extreme conditions, mainly large-scale clients and large numbers of tasks, where it still maintains the highest accuracy and shorter training time; for the large-scale test, 50 and 100 clients were selected. With large-scale clients, as shown in FIG. 9, the method of the invention still maintains the highest accuracy, 43.1% higher than some of the latest methods. In the multitask case, which comprises a total of 1600 heterogeneous task sequences over 20 clients as shown in FIG. 10, accuracy is 39.8% higher than the latest methods; because training time does not grow with the number of tasks, the method keeps the training time of each task low.
The applicability of the method is reflected by tests on different typical image classification networks. In the test, 8 recent neural networks were selected and evaluated on MiniImageNet; as shown in FIG. 11, the method maintains the highest accuracy on every network structure, close to 90%, demonstrating its applicability to various networks.
Finally, different stored-parameter proportions were tested, compared against task-adaptation algorithms that store samples. As shown in FIG. 12, as the storage-proportion parameter increases, the training time of sample-storing methods grows greatly while that of this method grows little; likewise, as the storage proportion decreases, accuracy drops, but this method's drop is smaller and it still maintains the highest accuracy. This indicates the method remains applicable under changes such as model replacement, and its hyper-parameter setting is simple, requiring no careful repeated experimental tuning per data set.
The present invention is not limited to the above-described embodiments, which are described in the specification only to illustrate the principle of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims.

Claims (10)

1. A task adaptation and federated learning method for edge-side visual analysis, characterized by comprising the following steps:
step 1, a server transmits global model parameters to a selected client to obtain task knowledge;
step 2, a gradient generation module computes all past task gradients from the client's local data samples and the stored task knowledge, and judges whether the number of task gradients exceeds a set upper limit; if so, it computes the similarity between the current task gradient and each past task gradient in turn, selects the gradients with the largest difference, and passes them to a gradient aggregation module; otherwise, step 2 ends;
step 3, the gradient aggregation module computes the angle between the current task gradient and each past task gradient; if no angle exceeds 90 degrees, the current task gradient meets the requirement of the gradient aggregation module and is used to update the model; otherwise, the current task gradient is rotated by the minimum amount so that its angle with every past task gradient is at most 90 degrees, and the model is updated with the aggregated gradient;
step 4, each client uploads its model parameters and training-sample count to the server, and the server computes a weighted average using the clients' training-sample counts to obtain the aggregated task parameters;
step 5, a knowledge extraction module retains the 5-15% of model parameters with the largest weights from the current model, sets the remaining parameters to 0, performs model adjustment, and stores the adjusted model parameters in a knowledge storage module as the task knowledge of the current task.
2. The task adaptation and federated learning method for edge-side visual analysis according to claim 1, wherein step 1 comprises:
step 1.1, selecting clients to participate in training according to the number of the clients participating in training and a set selection proportion;
step 1.2, the server sends the parameters to each selected client.
3. The task adaptation and federated learning method for edge-side visual analysis according to claim 1, wherein obtaining the task gradients of all past task knowledge from the samples in step 2 comprises:
step 2.1, selecting a portion of samples from the current task, and letting the model perform one inference pass on the selected samples;
step 2.2, acquiring all past task knowledge from the knowledge storage module, and performing one inference pass per piece of task knowledge on the samples;
step 2.3, calculating the loss from the sample data labels and the model's inference result from step 2.1 to obtain the current task gradient, and meanwhile calculating each past task's gradient from the inference results of step 2.2.
4. The task adaptation and federated learning method for edge-side visual analysis according to claim 1, wherein in step 5 the model adjustment comprises:
step 5.1, randomly selecting a portion of data samples from the current task data set, and letting the model perform one inference pass on them;
step 5.2, calculating the loss using the data labels and the inference results;
step 5.3, calculating the gradient from the loss, and using it to update the parameters not set to 0;
step 5.4, judging whether the loss has reached the loss threshold or the number of adjustments has reached the upper limit; if neither condition is met, returning to step 5.2; otherwise the model adjustment ends.
5. The task adaptation and federated learning method for edge-side visual analysis according to claim 2, wherein in step 1.1, after client training is completed, the accuracy, loss and memory occupation of each client's model are tested.
6. The task adaptation and federated learning method for edge-side visual analysis according to claim 3, wherein in step 2.2 a cross entropy loss function is used as the loss function and the loss value is computed from the inference probability and the true label, and the past task gradient is computed as:

$$g_i = \frac{\partial\, \mathrm{Loss}\big(f(W, X_{m+1}),\, f(W_i, X_{m+1})\big)}{\partial W} \qquad (1)$$

In equation (1), $g_i$ is the gradient corresponding to the $i$-th task, $\mathrm{Loss}$ is the cross entropy loss function, $f(W, X_{m+1})$ is the inference result on data samples $X_{m+1}$ under the model parameters $W$, and $f(W_i, X_{m+1})$ is the inference result on samples $X_{m+1}$ using the task knowledge of the $i$-th task.
7. The task adaptation and federated learning method for edge-side visual analysis according to claim 6, wherein in step 3, $g_{m+1}$ denotes the current task gradient and $G = \{g_1, \ldots, g_m\}$ denotes the set of past task gradients; if $g_i \cdot g_{m+1} < 0$, the angle between the two gradients is greater than 90 degrees.
8. The task adaptation and federated learning method for edge-side visual analysis according to claim 7, wherein in step 3 the gradient rotation problem is converted into a quadratic programming optimization problem with the objective:

$$\min_{g'} \ \tfrac{1}{2}\, \lVert g_{m+1} - g' \rVert^2 \qquad (2)$$
$$\text{s.t.} \quad G g' \ge 0 \qquad (3)$$

In equations (2)-(3), $g'$ is the final aggregated gradient, whereby:

$$\tfrac{1}{2}\, g'^{\top} g' - g_{m+1}^{\top} g' + \tfrac{1}{2}\, g_{m+1}^{\top} g_{m+1} \qquad (4)$$

The constant part is omitted:

$$\min_{g'} \ \tfrac{1}{2}\, g'^{\top} g' - g_{m+1}^{\top} g' \qquad (5)$$

To reduce the amount of computation, the rotation amount $v$ of the gradient is computed directly, and equation (5) is converted into the dual:

$$\min_{v} \ \tfrac{1}{2}\, v^{\top} G G^{\top} v + g_{m+1}^{\top} G^{\top} v \qquad (6)$$
$$\text{s.t.} \quad v \ge 0 \qquad (7)$$

The final gradient is:

$$g' = G^{\top} v + g_{m+1} \qquad (8)$$

The model parameters are updated using $g'$.
9. A task adaptation and federated learning system for edge-side visual analysis, characterized by comprising:
a parameter aggregation module, which receives multiple groups of model parameters from the edge side and, through an internal aggregation algorithm, transmits the combined parameters to each edge device participating in training;
a knowledge extraction module, which identifies and extracts the parameter information most relevant to a task from the trained model as that task's knowledge, and which, after each task switch, extracts and stores the task knowledge;
a knowledge storage module, which receives and stores the task knowledge extracted by the knowledge extraction module;
a gradient generation module, which uses all task knowledge from the knowledge storage module, together with the current training samples, to generate the gradients of all past tasks for model training, so as to adapt to the tasks, and which identifies the portion of past task gradients that differs most from the current task gradient and passes the selected gradients to the gradient aggregation module; and
a gradient aggregation module, which finds a suitable aggregated gradient that reduces the loss of the new task without increasing the loss of previous tasks.
10. The task adaptation and federated learning system for edge-side visual analysis according to claim 9, wherein the gradient aggregation module keeps the angle between the aggregated gradient and each past task gradient acute.
CN202211218192.4A 2022-09-30 2022-09-30 Task adaptation and federated learning method and system for edge-side visual analysis Pending CN115618241A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211218192.4A CN115618241A (en) Task adaptation and federated learning method and system for edge-side visual analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211218192.4A CN115618241A (en) Task adaptation and federated learning method and system for edge-side visual analysis

Publications (1)

Publication Number Publication Date
CN115618241A 2023-01-17

Family

ID=84861571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211218192.4A Pending CN115618241A (en) 2022-09-30 2022-09-30 Task self-adaption and federal learning method and system for edge side vision analysis

Country Status (1)

Country Link
CN (1) CN115618241A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436515A (en) * 2023-12-07 2024-01-23 四川警察学院 Federal learning method, system, device and storage medium
CN117436515B (en) * 2023-12-07 2024-03-12 四川警察学院 Federal learning method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination