CN114584581A - Federated learning system and federated learning training method for cyber-physical fusion in the smart-city Internet of Things - Google Patents

Federated learning system and federated learning training method for cyber-physical fusion in the smart-city Internet of Things

Info

Publication number
CN114584581A
CN114584581A
Authority
CN
China
Prior art keywords
model
equipment
server
training
federated learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210111366.0A
Other languages
Chinese (zh)
Other versions
CN114584581B (en)
Inventor
陈铭松
张帆
李一鸣
裴秋旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN202210111366.0A
Publication of CN114584581A
Application granted
Publication of CN114584581B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y - INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y10/00 - Economic sectors
    • G16Y10/75 - Information technology; Communication
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y - INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 - IoT characterised by the purpose of the information processing
    • G16Y40/10 - Detection; Monitoring
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 - Arrangements for monitoring or testing data switching networks
    • H04L43/08 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 - Monitoring or testing based on specific metrics by checking availability
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 - Arrangements for monitoring or testing data switching networks
    • H04L43/10 - Active monitoring, e.g. heartbeat, ping or trace-route
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/02 - Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 - Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/26 - Special purpose or proprietary protocols or architectures
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/50 - Reducing energy consumption in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Environmental & Geological Engineering (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Cardiology (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a federated learning system for cyber-physical fusion in the smart-city Internet of Things, comprising a device side and a server side connected over a network. The device side comprises an MQTT client, a training thread, and a heartbeat sending thread; the server side comprises an MQTT client, a model aggregation thread, a task scheduling thread, and a heartbeat monitoring thread. The invention further provides a federated learning training method based on this system, comprising the following steps: the server initializes a global model, selects a certain proportion of devices in each communication round, and sends the global model to the selected devices; on receiving the message, each device trains and updates the model; the updated models are uploaded back to the server and aggregated into a new global model, and the next communication round begins; after the preset number of communication rounds is completed, each device receives the latest global model issued by the server and stores it for subsequent use. The method offers device scalability, robustness, and high efficiency.

Description

Federated learning system and federated learning training method for cyber-physical fusion in the smart-city Internet of Things
Technical Field
The invention belongs to the technical fields of Internet of Things communication technology, deep learning algorithms, federated learning algorithms, and cyber-physical fusion in Internet of Things scenarios. It relates to a federated learning system and a federated learning training method that use IoT communication technology to interconnect multiple device clients with a server, build a communication interface for transmitting federated learning models, carry out federated learning with multi-party device participation in smart-city IoT scenarios, and realize cyber-physical fusion.
Background
With the progress of mobile technology, edge devices such as Internet of Things terminals and smartphones have developed rapidly and become an indispensable part of modern life. These devices have some computing and communication capability and produce vast amounts of valuable data. With the advent of deep learning (DL), these data can be used effectively to train high-performance models for specific tasks, such as road congestion prediction and environmental monitoring. In the conventional approach, however, making full use of the data requires collecting it into a data center for centralized training, which is difficult to achieve in practice: data is typically scattered across different parties (e.g., mobile devices and companies), and with growing privacy concerns and data protection regulations, the parties cannot send their private data to a central server to train the model.
To address this data privacy problem, federated learning (FL) was proposed. Federated learning is a distributed deep learning framework that allows devices to cooperatively learn a global model applicable to all participating devices without sharing data, thereby protecting each device's data privacy. FedAvg is currently the most widely adopted federated learning algorithm: in each round it selects a subset of clients to participate in training; the selected clients perform several local update epochs and upload the resulting models to a server; after synchronization the server aggregates the updates into a global model, which it sends back to the clients. In this training mode, different devices train locally on their own data sets under the federated averaging algorithm and upload their model parameters to the cloud server for aggregation, yielding a global model that incorporates more knowledge while protecting personal data privacy, avoiding heavy communication cost, and guaranteeing model convergence.
IoT systems in smart-city scenarios, such as smart traffic and smart driving, are well suited to federated learning applications: their data is collected by massive numbers of distributed devices, such as vehicle-mounted sensors and intersection surveillance cameras, and for data privacy reasons this data is usually difficult to centralize. Federated learning is therefore a very promising way to carry out deep learning in smart-city IoT scenarios: the local data of different devices is used for federated learning, producing a global model applicable to each device's data. Once the trained federated global model is obtained, it is deployed on the device side or in the cloud to perform specific task predictions, and different control instructions are executed or issued according to the prediction results, thereby realizing cyber-physical fusion.
To actually deploy a distributed machine learning framework such as federated learning in a smart-city IoT scenario, network communication technology is an indispensable link: it connects the server and the clients participating in federated learning and enables information exchange between them. This requires building a federated learning platform with communication capabilities, so that IoT devices and servers wishing to participate in federated learning can conveniently join the system; the system therefore needs a certain degree of scalability.
At present, federated learning faces the problems of device heterogeneity, data heterogeneity, and device unreliability. Unreliable devices are devices that may suffer network fluctuations, low-battery shutdowns, and similar problems, temporarily or permanently losing their connection to the server. The federated learning system therefore needs a degree of fault tolerance: it must handle disconnected devices in time without affecting the training of other devices, and restore a device's participation after it reconnects. Device heterogeneity arises mainly from differences in computing and storage capability between devices, which cause large differences in the time devices need to complete one round of local training. Because IoT devices in smart-city scenarios are highly diverse, device heterogeneity is especially pronounced there. With a classical synchronous aggregation algorithm, high-performance devices that have finished training must wait for low-performance devices that have not, greatly reducing the efficiency of federated learning. A federated learning algorithm is therefore needed that can handle device heterogeneity and improve learning efficiency in such scenarios. In addition, in real smart-city IoT scenarios, the local data of the participating devices is often non-independent and identically distributed (non-IID) and unbalanced: devices differ in capability and operating environment, so the data they collect differs greatly in quantity and distribution. This data heterogeneity introduces drift during the clients' local updates, causing local models to deviate from the global model (the client drift problem), which in turn makes convergence slow and unstable and harms model performance.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention aims to provide a federated learning system and a federated learning training method for cyber-physical fusion in the smart-city Internet of Things, offering a feasible solution for developing federated learning applications in IoT scenarios.
The invention provides a federated learning system for cyber-physical fusion in the smart-city Internet of Things, comprising a device side and a server side connected over a network, wherein:
the device side comprises a Message Queuing Telemetry Transport (MQTT) client, a training thread, and a heartbeat sending thread;
the server side comprises an MQTT client, a model aggregation thread, a task scheduling thread, and a heartbeat monitoring thread.
The training thread receives the model and task issued by the server, performs local training using local data, and uploads the trained model to the server.
The heartbeat sending thread periodically sends heartbeat signals to the server.
The model aggregation thread receives the local models sent by the devices and aggregates them according to a model aggregation strategy to obtain the global model.
The task scheduling thread selects, from the connected devices, the devices to train in the current round and issues the global model to the selected devices.
The heartbeat monitoring thread receives the heartbeat signals sent by each device, monitors whether each device is currently online, and handles offline or reconnected devices in time.
MQTT (Message Queuing Telemetry Transport) is a lightweight publish/subscribe communication protocol built on top of TCP/IP. Its advantage is that it can provide real-time, reliable messaging to remote devices with very little code and limited bandwidth, enabling many-to-many communication.
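As an illustration (not part of the original patent text), a minimal MQTT round trip can be written with the paho-mqtt Python client as follows; the broker address and topic names are hypothetical placeholders:

import paho.mqtt.client as mqtt

BROKER_HOST = "broker.example.com"   # hypothetical broker (e.g., OneNet or a self-hosted EMQX)
DEVICE_TOPIC = "fl/device/1"         # hypothetical per-device topic

def on_message(client, userdata, msg):
    # In the system described here, this is where a device would dispatch
    # to its training or heartbeat logic according to the message type.
    print("received", len(msg.payload), "bytes on", msg.topic)

client = mqtt.Client(client_id="fl-device-1")   # paho-mqtt 1.x style constructor
client.on_message = on_message
client.connect(BROKER_HOST, 1883)               # MQTT runs over TCP/IP, default port 1883
client.subscribe(DEVICE_TOPIC)
client.publish("fl/server/0", b"hello")         # topic-based many-to-many messaging
client.loop_forever()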
The invention also provides a federated learning training method implemented with the above federated learning system, comprising the following steps (a condensed sketch of the resulting training loop follows the list):
step one, the server initializes a global model;
step two, in each communication round, the server selects a certain proportion of devices to participate in training according to the number of participating nodes and the server's communication bandwidth limit; the proportion lies in the interval (0, 1.0], i.e., at least one device must be selected and at most all devices may be selected; the global model is then sent to the selected devices;
step three, after receiving the message from step two, each device takes the global model as its initial model and performs training and model updating locally, based on a model-hierarchy-weighted distance constraint method;
step four, each device uploads its updated model from step three back to the server;
step five, the server aggregates the updated models using a device grouping strategy according to a model-hierarchy-based sliding aggregation update strategy, obtains a new global model, and enters the next communication round;
step six, after completing the preset number of communication rounds, the server regards the federated learning training process as finished, sends the latest global model to all participating devices, and issues a stop-training instruction;
and step seven, after receiving the latest global model and the stop-training instruction, each device stops the related threads and stores the global model locally for subsequent model deployment and task prediction, realizing the cyber-physical fusion service.
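For illustration only, the seven steps can be condensed into the following server-side Python sketch; sample_devices, send_model, collect_updates, and aggregate are hypothetical helpers standing in for the MQTT message exchanges described in this document:

def run_federated_training(global_model, devices, rounds, fraction=0.1):
    """Sketch of steps one to seven; all helper functions are assumed."""
    for t in range(rounds):                              # one communication round per iteration
        k = max(1, int(fraction * len(devices)))         # fraction in (0, 1.0]: at least one device
        selected = sample_devices(devices, k)            # step two: server selects devices
        for d in selected:
            send_model(d, global_model)                  # step two: issue the global model
        updates = collect_updates(selected)              # steps three and four: local training, upload
        global_model = aggregate(global_model, updates)  # step five: grouped/sliding aggregation
    for d in devices:                                    # steps six and seven: final model, stop
        send_model(d, global_model, stop=True)
    return global_model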
The device side connects to the server side, and the communication between them is implemented with the MQTT IoT communication protocol and the Flask web service framework. This provides a solution for developing federated learning applications, deploying federated learning models, and realizing cyber-physical fusion in smart-city IoT scenarios.
In step three, devices are grouped according to the time they need to train and update the model locally; devices with similar time consumption are placed in the same group. Within a group, the time difference between the fastest and the slowest device does not exceed a preset threshold. The smaller the threshold, the more groups there are, the higher the degree of asynchrony in model aggregation, and the higher the communication efficiency; conversely, the larger the threshold, the fewer groups there are, the lower the degree of asynchrony, and the better the convergence performance.
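A minimal sketch of such a grouping rule follows (the threshold value and measured times are illustrative; beyond the threshold principle, the patent does not fix a concrete algorithm):

def group_by_training_time(times, threshold=5.0):
    # times: device id -> measured local training time in seconds
    # (from the pre-training performance evaluation). Within a group,
    # max time - min time <= threshold; smaller thresholds give more groups.
    ordered = sorted(times, key=times.get)
    groups, current = [], []
    for dev in ordered:
        if current and times[dev] - times[current[0]] > threshold:
            groups.append(current)
            current = []
        current.append(dev)
    if current:
        groups.append(current)
    return groups

# Example: three similar devices and one slow device yield two groups.
print(group_by_training_time({"a": 3.0, "b": 4.1, "c": 6.5, "d": 30.0}))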
In step three, during a device's local training, a model-hierarchy-weighted L2 distance constraint is applied to the training process, limiting the distance between the locally updated model and the global model; the distance is measured by the L2 norm of the difference between the local and global models. Constraints of different strength are applied to parameter layers of different depth: shallower parameter layers receive smaller constraint weights and deeper layers receive larger ones, and the distance constraint terms of the different parameter layers are weighted and added to the training loss function.
In step five, when the server aggregates the updated models it uses a device grouping strategy: once the models uploaded by the devices of one group are synchronized, they are aggregated by synchronous aggregation update into a secondary (group-level) global model; the secondary global models of the different groups are then combined by asynchronous aggregation update to obtain the global model. Asynchronous aggregation update is implemented as a weighted average, where models with fewer updates receive larger weights in the aggregation.
In step five, the server also applies a hierarchy-based sliding aggregation update strategy during model aggregation: based on the idea of model layering, different momentum values are used for different model layers in the sliding update, improving model aggregation performance.
Specifically, the federated learning system and federated learning training method of the invention include the following aspects:
a) device access and communication realized by using Internet of things communication technology and web service framework
In order to realize the communication function of the equipment side and the server, the invention adopts the commonly used MQTT communication technology in the communication of the Internet of things and a Web service framework flash. The MQTT communication needs a message transfer service (browser), and the message transfer service can depend on an OneNet Internet of things cloud platform, an Aricloud platform or self-opened emqx service and the like. The invention jointly accesses the server and the device participating in the federal learning to the browser, performs message communication through a message subscription and release mechanism, and the server and the client execute specific processing operations such as task scheduling training, model aggregation, local model updating and the like through defined message types. And the flash is mainly used for requesting the client to access the server to acquire the training parameter configuration. Aiming at the problem of unreliable equipment, the federal learning system also designs an equipment on-line real-time detection mechanism. After the equipment side is accessed to the MQTT, a heartbeat signal is periodically sent to the server side, the server side judges whether the equipment is on-line or reconnected according to the received equipment heartbeat signal, and if the equipment is off-line, the server side temporarily removes the equipment from the task scheduling pool; and if the equipment is reconnected, the server rejoins the equipment to the task scheduling pool, so that the equipment participates in the model aggregation again.
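The heartbeat mechanism can be sketched as follows (the timeout value and data structures are illustrative assumptions, not taken from the patent):

import time
import threading

HEARTBEAT_TIMEOUT = 15.0   # assumed seconds without a heartbeat before a device counts as offline
last_seen = {}             # device id -> time of last heartbeat
scheduling_pool = set()    # devices currently eligible for task scheduling

def on_heartbeat(device_id):
    # Called when the server receives a heartbeat message from a device;
    # a reconnected device rejoins the scheduling pool here.
    scheduling_pool.add(device_id)
    last_seen[device_id] = time.monotonic()

def monitor_loop():
    # Background thread: evict devices whose heartbeats have stopped.
    while True:
        now = time.monotonic()
        for dev, t in list(last_seen.items()):
            if now - t > HEARTBEAT_TIMEOUT:
                scheduling_pool.discard(dev)   # offline: remove from scheduling until it reconnects
        time.sleep(1.0)

threading.Thread(target=monitor_loop, daemon=True).start()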
b) Combining synchronous and asynchronous update techniques to handle device heterogeneity
For device heterogeneity, a classical synchronous aggregation scheme lets the slower devices drag down the efficiency of federated learning, increasing time cost and wasting the computing resources of high-performance devices. The alternative, asynchronous aggregation update, relieves this efficiency problem to some extent: the server updates the global model immediately upon receiving a completed update, without synchronizing devices. However, while asynchronous aggregation is more efficient and robust in heterogeneous-device scenarios, it brings larger communication overhead and reduces aggregation quality. The invention therefore deploys in the federated learning system a grouped model update method that combines synchronous and asynchronous update modes: connected devices are grouped by performance, with devices of similar performance in the same group. Within a group, a classical synchronous aggregation method maintains a secondary global model for that group; across groups, asynchronous updating is used, so that whenever one group completes an update, the models maintained by all the groups are aggregated once to obtain the global model. This method combines the advantages of asynchronous and synchronous aggregation and improves the efficiency of federated learning under device heterogeneity.
c) Model hierarchical constraints and hierarchical sliding aggregation update to handle data heterogeneity
In data-heterogeneous scenarios, when devices perform local training, their local models become biased toward their own local data distributions, because those distributions differ greatly between devices; this causes the client drift problem, and the degree of drift differs between model parameter layers. To improve the convergence of the federated model under data heterogeneity, the invention proposes a model-hierarchy-weighted L2 distance constraint method: during local training, distance loss constraints of different strengths are applied to network parameter layers of different depths, controlling the distance between the local and global models at a fine granularity. By accounting for the inconsistency between the devices' model parameter layers, local drift is corrected more effectively and the performance of the global model improves. In addition, to reduce communication overhead in large-scale device scenarios, the federated averaging algorithm selects only some nodes to participate in each communication round; this sampled training reduces the model's convergence speed and performance to some extent. To alleviate this, the invention also applies the layering idea to the aggregation step and, combined with a sliding aggregation update scheme, proposes a hierarchical sliding aggregation update strategy that improves global model aggregation in large-scale scenarios.
The invention implements the communication and interaction between the federated learning devices and the server with the MQTT IoT communication protocol and the Flask web service framework, including device access, model upload and distribution, and an online device detection mechanism.
The method applies a model aggregation strategy combining synchronous and asynchronous modes to improve federated learning efficiency under device heterogeneity, balancing communication efficiency against training convergence.
The invention proposes a model-hierarchy-weighted L2 distance constraint method and a hierarchical sliding aggregation update strategy, which improve the devices' local training and the server's model aggregation, respectively, and thereby improve the convergence of federated learning in data-heterogeneous scenarios.
The invention supports the deployment of various federated learning algorithms: for any federated learning algorithm implemented in Python with a popular deep learning framework, the algorithm body can be extracted from the invention's code and saved as a template; at deployment time the specific algorithm and parameters are filled into the template to generate federated learning code supporting multi-party device participation.
The invention also provides a hardware system for implementing the federated learning training method, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the above method.
The invention also proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method.
The beneficial effects of the invention include: 1. The federated learning system and training process constructed by the invention offer system scalability, robustness, and high efficiency. They can be used to carry out federated learning in many smart-city IoT fields, such as traffic flow prediction, traffic light control, computer vision recognition, and autonomous driving perception and control, making full use of the data of different devices while protecting device data privacy, training global models applicable to those devices, performing specific task predictions, and providing a basis for cyber-physical fusion.
2. The device grouping and aggregation algorithm of the invention improves federated learning communication efficiency and convergence speed in device-heterogeneous scenarios and shortens the time consumed by training. The improvement in communication efficiency is positively correlated with the degree of device heterogeneity: the larger the performance differences between the participating devices, the more pronounced the improvement.
3. The model-hierarchy-weighted distance constraint method and the hierarchy-based sliding aggregation update strategy significantly improve the convergence speed and accuracy of federated learning under data heterogeneity. The proposed method was tested on the common visual classification data sets CIFAR-10 and CIFAR-100 and compared with FedAvg, currently the most popular federated learning algorithm. The results show that, in highly non-IID data scenarios, the proposed method converges on average 2 to 3 times faster than FedAvg and improves convergence accuracy by about 7% on average.
Drawings
Fig. 1 is the architecture diagram of the federated learning system for cyber-physical fusion in the smart-city Internet of Things (IoT).
Fig. 2 is a schematic diagram of the federated learning procedure of the present invention.
Fig. 3 is a plot of the weight distribution associated with the model hierarchy of the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific examples and the accompanying drawings. Except where specifically noted, the procedures, conditions, and experimental methods for carrying out the invention are general knowledge and common practice in the art, and the invention is not particularly limited in this respect.
The OneNet platform is an open IoT platform created by China Mobile IoT Co., Ltd. based on IoT communication technology and industry characteristics. It supports fast access by various devices and smart hardware and can effectively reduce the cost of IoT application development and deployment. The invention therefore uses the OneNet IoT cloud platform as the MQTT brokering service, connects the federated learning devices and the server via the MQTT protocol, and establishes a message transmission channel between the devices and the server.
The federated learning system for cyber-physical fusion in the smart-city Internet of Things and its implementation method comprise the following:
1. Implementation of the overall framework of the federated learning system:
The system program is divided into a server side and a device side. The server-side program consists mainly of three threads: a task scheduling thread, a model aggregation thread, and a heartbeat monitoring thread. The task scheduling thread selects, from the connected devices, the devices to train in the current round and issues the global model to them; the model aggregation thread receives the local models sent by the devices and aggregates them according to the model aggregation strategy to obtain the global model; the heartbeat monitoring thread receives the heartbeat signals sent by each device, monitors whether each device is currently online, and handles offline or reconnected devices in time. The device side consists mainly of a training thread and a heartbeat sending thread: the training thread receives the model and task issued by the server, performs local training on local data, and uploads the trained model to the server; the heartbeat sending thread periodically sends heartbeat signals to the server.
The overall design of the system is shown in Fig. 1, where: (1) the server sends the training task and the global model to the device side; (2) the device side receives the global model, updates its local model, and performs local training; (3) the device sends the locally trained model to the server; (4) the server receives the models sent by the devices and aggregates them according to the chosen strategy; (5) after the current round's aggregation finishes, the global model is handed to the task scheduler for the next round of scheduling; (6) the device side periodically sends heartbeat signals to the server; (7) the server receives a device's heartbeat signal and resets its timer; and (8) when the server detects that a device has disconnected or reconnected, it removes the device from, or adds it back to, the task scheduler.
For the federated learning training process, the invention adopts the training framework shown in Fig. 2: the server initializes a global model, and in each communication round selects a certain proportion of devices and sends them the global model; after receiving the message, each device takes the global model as its initial model, trains and updates the model on its local data using the model-hierarchy-weighted distance constraint method, and then uploads the updated model gradient; the server aggregates the uploaded model gradients according to the model-hierarchy-based sliding aggregation update strategy and the device grouping strategy to obtain a new global model, and enters the next communication round. After the preset number of communication rounds, the server regards the federated learning training process as finished, sends the latest global model to all participating devices, and issues a stop-training instruction; on receiving them, each device stops the related threads and stores the global model locally for subsequent model deployment and task prediction, realizing the cyber-physical fusion service.
After the federated learning process completes, a high-performance global model applicable to the different data sets is obtained. It is deployed on the device side or in the cloud to perform specific task predictions, and the device or cloud service executes or issues specific control instructions according to the model's prediction results, controlling the devices and interacting with the environment, thereby realizing the cyber-physical fusion function.
2. Communication and interaction between the server and the devices via the IoT cloud platform:
When the server program starts, it connects to the OneNet IoT cloud platform via the MQTT protocol, sets its own id to 0, and initializes the global model. It then starts the device registration service through the Flask framework and waits for devices to join. Once the registration service is running, the server maintains a mapping between device names and device numbers: when a new device name requests registration, the server assigns it a new id (incrementing from 1) and subscribes to that device's MQTT message topic according to the id; each id corresponds to one message topic and thus to one transmission channel between the server and the device. Additionally, to group devices of different performance, the server also sends a performance evaluation instruction when responding to a registration, causing the device to run one round of pre-training locally so that its training time can be measured. When a device node starts, it first sends a registration request to the server via Flask using the server's IP address; on success, the client receives the model training parameters returned by the server and the id assigned to it. The device then connects to the IoT cloud platform via MQTT, subscribes to the server's topic messages according to its id, completes initialization, and begins waiting for messages from the server; at the same time, the heartbeat sending thread starts and periodically sends heartbeat signals to the server. When the device receives the performance evaluation message, it runs one round of pre-training locally, records the time to completion, and sends it back to the server, completing the access process.
For MQTT message communication, the invention defines different message types: on receiving a subscribed message, the server and the devices enter different threads according to the message type and payload and execute the corresponding specific operation, such as model aggregation, heartbeat signal processing, or local device training.
3. Implementation of the grouped aggregation update strategy combining synchronous and asynchronous updates:
When enough devices have connected and completed performance evaluation, the server groups the devices algorithmically; the grouping principle is that within a group, the time difference between the fastest and the slowest device must not exceed a threshold. After grouping, task scheduling of the devices begins. On receiving the models sent back by the devices of one group, the server applies the synchronous aggregation of FedAvg to obtain that group's secondary global model. Across groups, asynchronous aggregation is used: whenever one group finishes its intra-group update, the secondary global models of all groups are combined by weighted averaging. To prevent the model from becoming biased toward high-performance devices that update more frequently, the weights are tied to each group's update count: models with fewer updates receive larger weights. This aggregation strategy, combining asynchronous and synchronous modes, achieves high communication efficiency and good model convergence.
4. Implementation of model-hierarchy-weighted distance constraint training:
During a device's local training, a model-hierarchy-weighted L2 distance constraint is added to the training loss function, controlling the distance between the global and local models and alleviating the client drift problem. The process can be described by the following equations:
h_k(w; w_t) = F_k(w) + Σ_i (μ_i / 2) · ||w[i] - w_t[i]||²,
g, G = group_policy(w),
μ_i = μ_base · func(g_i / G; cof, inh),
wherein the variable i is the index of a parameter layer and corresponds to its depth; w_t is the global model parameter issued by the server in the t-th communication round; w is the local parameter updated by device training; h_k(w; w_t) is the optimization objective function of model training; F_k(w) is the loss function; ||·|| denotes the L2 norm; μ_i is the constraint weight of the i-th parameter layer; μ_base is the base constraint weight; w[i] is the i-th layer parameter of the local model; w_t[i] is the i-th layer parameter of the global model; g is the set of group numbers corresponding to the model parameters; g_i is the group number of the i-th parameter layer; G is the total number of groups; group_policy(w) is the grouping policy function that maps the model parameter layers to groups; func(x) is the weight-distribution mapping function, of the form shown in Fig. 3, which outputs a relative weight in [0, 1] according to the depth of the parameter layer; and cof and inh are parameters of the weight-distribution function that adjust the shape of the weight curve.
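For illustration, a PyTorch sketch of this layer-weighted proximal term follows; the linear ramp over layer depth is a simplification standing in for the func(·; cof, inh) curve of Fig. 3, and the base weight is an assumed value:

import torch

def layered_l2_penalty(local_model, global_params, mu_base=0.01):
    # Layer-wise weighted L2 (proximal) term added to the training loss:
    # shallower layers get smaller constraint weights, deeper layers larger.
    params = list(local_model.named_parameters())
    n_layers = len(params)
    penalty = 0.0
    for i, (name, w) in enumerate(params):
        mu_i = mu_base * (i + 1) / n_layers   # deeper layer -> larger weight (simplified func)
        penalty = penalty + (mu_i / 2) * (w - global_params[name]).norm(2) ** 2
    return penalty

# Usage inside a local training step, where criterion is the task loss F_k:
# loss = criterion(model(x), y) + layered_l2_penalty(model, global_params)
# loss.backward(); optimizer.step()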
5. Implementation of the hierarchy-based sliding aggregation update strategy:
When the server synchronously aggregates the local model gradients uploaded by the devices, it adopts the hierarchy-based sliding aggregation update strategy, which can be described by the following equations:
g, G = group_policy(w),
γ_i = γ_base · func(g_i / G; cof, inh),
Δw_t = Σ_k (s_k / s) · Δw_t^k,
w_t = w_{t-1} - Δw_t,
w'_t[i] = γ_i · w_{t-1}[i] + (1 - γ_i) · w_t[i],
wherein t is the current communication round number; w_{t-1} and w_t are the global model parameters after the previous round's update and after the current t-th round's update, respectively; Δw_t is the model parameter gradient obtained by the t-th aggregation; Δw_t^k is the local gradient uploaded by device k in round t; w[i] is the i-th layer parameter of the model; s_k is the number of training samples on device k; s is the total number of samples across the devices participating in training; γ_base is the base momentum parameter; γ_i is the weighted momentum parameter of the i-th network layer, used to adjust the mixing proportion of the new and old models; w_t[i] is the i-th layer parameter of the round-t global model; w_{t-1}[i] is the i-th layer parameter of the round-(t-1) global model; g is the set of group numbers corresponding to the model parameters; g_i is the group number of the i-th parameter layer; G is the total number of groups; and func is the nonlinear weight-distribution function of the form shown in Fig. 3, outputting a relative weight in [0, 1] according to the depth of the parameter layer, so that shallower network parameters use a smaller momentum parameter (corresponding to a larger model update rate) and deeper network parameters use a larger one. Through this hierarchical sliding aggregation update strategy, global model aggregation in data-heterogeneous scenarios is effectively improved, the model drift and knowledge forgetting caused by random sampling are alleviated, and the stability of training convergence is improved.
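For illustration, the layer-wise sliding mix can be sketched as follows; as above, a linear ramp over layer depth stands in for func(·; cof, inh), and gamma_base is an assumed value:

import torch

def sliding_update(prev_global, new_global, gamma_base=0.5):
    # Layer-wise sliding (momentum) mix of the old and new global models:
    # shallow layers use a small gamma_i and follow the new model closely,
    # deep layers use a large gamma_i and change more slowly.
    names = list(prev_global.keys())
    n = len(names)
    mixed = {}
    for i, k in enumerate(names):
        gamma_i = gamma_base * (i + 1) / n    # deeper layer -> larger momentum (simplified func)
        mixed[k] = gamma_i * prev_global[k] + (1 - gamma_i) * new_global[k]
    return mixed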
Examples
The implementation of the federated learning system for cyber-physical fusion in the smart-city Internet of Things includes the following code (key excerpts):
This section includes the code implementing the communication between the devices and the server, shown as Code 1:
[Code 1 is reproduced as images in the original publication and is not included here.]
Code 1 describes the communication procedure between the server and the devices. On startup, the server starts the device registration service with Flask and waits for device requests. On startup, a device sends a request to the server; the server responds by registering the device's information, assigning it a number, subscribing to the corresponding MQTT topic according to the device number, establishing a connection channel with the device, sending the training parameters back to the device, and recording the send time. After completing one round of local training, the device sends a ready signal; on receiving it, the server takes the difference between the receive time and the send time as the basis for the device's performance evaluation, completing the registration process. When the number of connected devices exceeds a certain value, the server groups the devices according to their performance evaluations and then schedules them group by group. The device receives the global model issued by the server, performs local training, and uploads the updated model to the server for aggregation.
Code 2 shows the implementation of a device's local training (key excerpts):
[Code 2 is reproduced as images in the original publication and is not included here.]
the code 2 shows the process of local training of the device, and the model parameter layers of the device are grouped before the training is started to obtain the group number of each parameter layer. In the training process, the weight is determined through the group number of the parameter layer and the distribution curve function on the basis of the original cross entropy loss, and L2 distance loss with different weights is added to different parameter layers, so that the performance of local training is further improved.
As shown in code 3, code 3 shows an implementation method of the server aggregation process:
[Code 3 is reproduced as images in the original publication and is not included here.]
the code 3 mainly gives the process of aggregating the models by the service end. The server side synchronizes the model gradients uploaded by the devices in the same group and then performs aggregation. In the process of synchronous aggregation, a sliding aggregation updating strategy based on layering is realized by grouping model parameter layers and adopting different momentum values for different groups. And when one group completes synchronous aggregation, the server side performs an asynchronous aggregation process, namely immediately performs weighted summation on the models among different groups to obtain a global model. In order to prevent the model performance from being biased towards the group with high update frequency, the invention adopts the mode of weighting update frequency for relieving.
The invention provides a federated learning system for cyber-physical fusion in the smart-city Internet of Things. It uses IoT communication technology and a web service framework to implement the communication and interaction between the devices and the server, including device access, model upload and distribution, training task scheduling, and real-time online monitoring. The system offers device scalability and operational robustness and provides a solution for carrying out federated learning, deploying federated learning models, and realizing cyber-physical fusion in IoT scenarios. In addition, the invention designs improvements for the device heterogeneity and data heterogeneity challenges in federated learning. For device heterogeneity, the server adopts a grouped model update method combining synchronous and asynchronous modes, balancing communication efficiency against model convergence. For data heterogeneity, the invention designs a model-hierarchy-weighted L2 distance constraint training method and a hierarchy-based sliding aggregation update strategy, which improve the devices' local training and the server's aggregation process, respectively, alleviating the client drift problem and the inconsistency between model parameter layers caused by data heterogeneity, and improving the convergence of federated learning in data-heterogeneous scenarios.
TABLE 1. Model accuracy comparison
[Table 1 is reproduced as an image in the original publication and is not included here.]
TABLE 2. Model convergence rate comparison
[Table 2 is reproduced as an image in the original publication and is not included here.]
The overall flow of the federated learning training method provided by the invention is similar to that of FedAvg; the difference is that it optimizes the devices' local training process and the server's model aggregation process. Compared with FedAvg, the proposed training method converges faster and more accurately. The method was tested on the common visual classification data sets CIFAR-10 and CIFAR-100 and compared with FedAvg, currently the most popular federated learning algorithm. The test results are shown in Tables 1 and 2, where β denotes the degree of data heterogeneity; a smaller value indicates a higher degree of data heterogeneity. The results show that, in the highly non-IID data scenario (β = 0.2), the proposed method converges on average 2 to 3 times faster than FedAvg and improves convergence accuracy by about 7% on average.
The protection of the present invention is not limited to the above embodiments. Variations and advantages conceivable to those skilled in the art are included within the invention without departing from the spirit and scope of the inventive concept, which is defined by the appended claims.

Claims (11)

1. A federated learning system for cyber-physical fusion in the smart-city Internet of Things, characterized by comprising a device side and a server side; wherein
the device side comprises a Message Queuing Telemetry Transport (MQTT) client, a training thread, and a heartbeat sending thread;
the server side comprises an MQTT client, a model aggregation thread, a task scheduling thread, and a heartbeat monitoring thread.
2. The federated learning system of claim 1, wherein the training thread is configured to receive the model and task issued by the server, perform local training using local data, and upload the trained model to the server;
the heartbeat sending thread is configured to periodically send heartbeat signals to the server;
the model aggregation thread is configured to receive the local models sent by the device side and aggregate them according to a model aggregation strategy to obtain a global model;
the task scheduling thread is configured to select, from the connected devices, the devices to train in the current round and issue the global model to the selected devices;
the heartbeat monitoring thread is configured to receive the heartbeat signals sent by each device, monitor whether each device is currently online, and handle offline or reconnected devices in time.
3. The federated learning system of claim 1, wherein the device side's access to the server side and the communication and interaction between the device side and the server side are implemented with the MQTT IoT communication technology and the Flask web service framework.
4. The federated learning system of claim 1, wherein the system further applies an online real-time device detection mechanism: after a device connects to MQTT, it periodically sends a heartbeat signal to the server, and the server judges from the received heartbeat signals whether the device is online or has reconnected; if the device goes offline, the server temporarily removes it from the task scheduling pool; if the device reconnects, the server adds it back to the task scheduling pool, so that it participates in model aggregation again.
5. A federated learning training method implemented with the federated learning system of any one of claims 1 to 4, characterized in that the federated learning training method comprises the following steps:
step one, the server initializes a global model;
step two, in each communication round, the server selects at least one device to participate in training according to the number of participating nodes and the server's communication bandwidth limit, and then sends the global model to the selected devices;
step three, after receiving the message from step two, each device takes the global model as its initial model and performs training and model updating locally, based on a model-hierarchy-weighted distance constraint method;
step four, each device uploads its updated model from step three back to the server;
step five, the server aggregates the updated models using a device grouping strategy according to a model-hierarchy-based sliding aggregation update strategy to obtain a new global model, and enters the next communication round;
step six, after completing the preset number of communication rounds, the server regards the federated learning training process as finished, sends the latest global model to all participating devices, and issues a stop-training instruction;
and step seven, after receiving the latest global model and the stop-training instruction, each device stops the related threads and stores the global model locally for subsequent model deployment and task prediction.
6. The federated learning training method of claim 5, wherein in step three the devices are grouped according to the time they need to train and update the model locally, devices with similar time consumption being placed in the same group; within a group, the time difference between the fastest and the slowest device does not exceed a preset threshold.
7. The federated learning training method of claim 5, wherein in step three, during a device's local training, a model-hierarchy-weighted L2 distance constraint is applied to the training process to limit the distance between the locally updated model and the global model, the distance being measured by the L2 norm of the difference between the local and global models; constraints of different strength are applied to model parameter layers of different depth, shallower parameter layers receiving smaller constraint weights and deeper layers larger ones, and the distance constraint terms of the different parameter layers are weighted and added to the training loss function; the process is described by the following equations:
h_k(w; w_t) = F_k(w) + Σ_i (μ_i / 2) · ||w[i] − w_t[i]||²,
g, G = group_policy(w),
μ_i = μ_base · func(g_i, G; cof, inh),
wherein the variable i represents the index of a parameter layer and corresponds to the depth of that layer; w_t represents the global model parameters issued by the server in the t-th communication round; w represents the local parameters updated by device training; h_k(w; w_t) represents the optimization objective function of the model training process; F_k(w) is the loss function; ||·|| represents the L2 norm; μ_i represents the constraint weight of the i-th parameter layer; μ_base represents the base constraint weight; w[i] represents the i-th layer parameters of the local model; w_t[i] represents the i-th layer parameters of the global model; g represents the set of group indices corresponding to the model parameter layers; g_i represents the group index of the i-th parameter layer; G is the total number of groups; group_policy(w) represents the grouping policy function that maps the model parameter layers into groups; func(x) represents the weight distribution mapping function, which outputs a relative weight value in [0, 1] according to the depth of the parameter layer; cof and inh are parameters of the weight distribution function used to adjust the weight distribution curve.
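A hedged PyTorch sketch of this objective: the task loss plus a per-layer weighted L2 (proximal) term pulling each local layer toward the corresponding global layer. The linear depth-to-weight ramp stands in for the patent's unspecified func(·), and mu_base is an assumed value.

```python
import torch

def layerwise_proximal_loss(task_loss, local_model, global_params,
                            mu_base=0.01):
    """global_params: dict mapping parameter names to the global tensors."""
    layers = list(local_model.named_parameters())
    n = len(layers)
    prox = torch.zeros((), device=next(local_model.parameters()).device)
    for i, (name, w_local) in enumerate(layers):
        w_global = global_params[name].detach()
        depth_weight = (i + 1) / n      # deeper layers are constrained more
        mu_i = mu_base * depth_weight
        prox = prox + (mu_i / 2.0) * torch.sum((w_local - w_global) ** 2)
    return task_loss + prox
```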
8. The federal learning training method of claim 5, wherein in step five, when the server aggregates the updated models, a device grouping strategy is adopted: once the models uploaded by the devices in a group are synchronized, the models uploaded within that group are aggregated by synchronous aggregation update to obtain a secondary global model, and the secondary global models obtained by the different groups are then aggregated by asynchronous aggregation update to obtain the global model; the asynchronous aggregation update is implemented as a weighted average, in which models that have been updated fewer times receive larger aggregation weights.
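An illustrative sketch of this two-level aggregation, treating each model as a flattened numpy vector; the exact inter-group weighting rule is an assumption (inverse update count), since the claim only states that less-frequently-updated models weigh more.

```python
import numpy as np

def aggregate_group(models: list) -> np.ndarray:
    """Synchronous intra-group step: plain average of the group's models."""
    return np.mean(np.stack(models), axis=0)

def aggregate_groups(group_models: dict, update_counts: dict) -> np.ndarray:
    """Asynchronous inter-group step: fewer updates -> larger weight."""
    raw = {g: 1.0 / max(update_counts[g], 1) for g in group_models}
    total = sum(raw.values())
    weights = {g: raw[g] / total for g in raw}
    return sum(weights[g] * group_models[g] for g in group_models)
```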
9. The federal learning training method as claimed in claim 5, wherein in step five, a layering-based sliding aggregation update strategy is also applied when the server aggregates the models; based on the idea of model layering, the strategy performs sliding updates with different momentum values for different model layers, which improves model aggregation performance; the layering-based sliding aggregation update strategy is described by the following formulas:
g, G = group_policy(w),
γ_i = γ_base · func(g_i, G),
Δw_t = Σ_k (s_k / S) · Δw_t^k,
w_t = w_{t−1} − Δw_t,
w′_t[i] = γ_i · w_{t−1}[i] + (1 − γ_i) · w_t[i],
wherein t is the current communication round number; w_{t−1} are the global model parameters obtained by the (t−1)-th round of updating; w_t are the global model parameters obtained by the t-th round of aggregation; Δw_t is the model parameter gradient obtained by the t-th aggregation; w[i] is the i-th layer of model parameters; Δw_t^k is the local gradient uploaded by device k in the t-th round; s_k is the number of training samples on device k; S is the total number of samples across all devices participating in training; γ_base is the base momentum parameter; γ_i is the weighted per-layer momentum parameter used to adjust the mixing ratio of the new and old models; w_t[i] represents the i-th layer parameters of the round-t global model; w_{t−1}[i] represents the i-th layer parameters of the round-(t−1) global model; g represents the set of group indices corresponding to the model parameter layers; g_i represents the group index of the i-th parameter layer; G is the total number of groups; func represents a nonlinear weight distribution function that outputs a relative weight value in [0, 1] according to the depth of the parameter layer.
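A hedged sketch of this sliding update: after forming the sample-weighted gradient, each layer i is blended with the previous global model using its own momentum γ_i. The linear depth ramp again stands in for the unspecified func(·), and gamma_base is an assumed value.

```python
import numpy as np

def sliding_aggregate(prev_layers, local_grads, sample_counts,
                      gamma_base=0.5):
    """prev_layers: list of per-layer arrays for w_{t-1};
    local_grads: {device_id: list of per-layer gradient arrays};
    sample_counts: {device_id: s_k}."""
    S = sum(sample_counts.values())
    n = len(prev_layers)
    new_layers = []
    for i in range(n):
        # delta_w_t[i] = sum_k (s_k / S) * delta_w_t^k[i]
        delta = sum((sample_counts[k] / S) * g[i]
                    for k, g in local_grads.items())
        w_t_i = prev_layers[i] - delta           # w_t = w_{t-1} - delta_w_t
        gamma_i = gamma_base * (i + 1) / n       # deeper layers keep more of the old model
        new_layers.append(gamma_i * prev_layers[i] + (1 - gamma_i) * w_t_i)
    return new_layers
```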
10. A hardware system for implementing the method according to any of claims 5-9, wherein the hardware system comprises: a memory and a processor; the memory has stored thereon a computer program which, when executed by the processor, implements the method of any of claims 5-9.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 5-9.
CN202210111366.0A 2022-01-29 2022-01-29 Federal learning system and federal learning training method for intelligent city internet of things (IOT) letter fusion Active CN114584581B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210111366.0A CN114584581B (en) 2022-01-29 2022-01-29 Federal learning system and federal learning training method for intelligent city internet of things (IOT) letter fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210111366.0A CN114584581B (en) 2022-01-29 2022-01-29 Federal learning system and federal learning training method for intelligent city internet of things (IOT) letter fusion

Publications (2)

Publication Number Publication Date
CN114584581A true CN114584581A (en) 2022-06-03
CN114584581B CN114584581B (en) 2024-01-09

Family

ID=81769526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210111366.0A Active CN114584581B (en) 2022-01-29 2022-01-29 Federal learning system and federal learning training method for intelligent city internet of things (IOT) letter fusion

Country Status (1)

Country Link
CN (1) CN114584581B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210073639A1 (en) * 2018-12-04 2021-03-11 Google Llc Federated Learning with Adaptive Optimization
US20200285980A1 (en) * 2019-03-08 2020-09-10 NEC Laboratories Europe GmbH System for secure federated learning
WO2021169577A1 (en) * 2020-02-27 2021-09-02 山东大学 Wireless service traffic prediction method based on weighted federated learning
CN113961318A (en) * 2020-07-20 2022-01-21 百度在线网络技术(北京)有限公司 Distributed scheduling method, device, equipment and storage medium
CN112001502A (en) * 2020-08-24 2020-11-27 平安科技(深圳)有限公司 Federal learning training method and device for high-delay network environment robustness
CN112532451A (en) * 2020-11-30 2021-03-19 安徽工业大学 Layered federal learning method and device based on asynchronous communication, terminal equipment and storage medium
CN112488398A (en) * 2020-12-03 2021-03-12 广东电力通信科技有限公司 Electricity utilization management method and system based on MEC edge intelligent gateway
CN112671613A (en) * 2020-12-28 2021-04-16 深圳市彬讯科技有限公司 Federal learning cluster monitoring method, device, equipment and medium
CN113033712A (en) * 2021-05-21 2021-06-25 华中科技大学 Multi-user cooperative training people flow statistical method and system based on federal learning
CN113469367A (en) * 2021-05-25 2021-10-01 华为技术有限公司 Method, device and system for federated learning
CN113435544A (en) * 2021-07-23 2021-09-24 支付宝(杭州)信息技术有限公司 Federated learning system, method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DONG Ye; HOU Wei; CHEN Xiaojun; ZENG Shuai: "Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection", Journal of Computer Research and Development, no. 10 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023024844A1 (en) * 2021-08-23 2023-03-02 华为技术有限公司 Model training method, apparatus, and system
CN115294784A (en) * 2022-06-21 2022-11-04 中国科学院自动化研究所 Multi-intersection traffic signal lamp control method and device, electronic equipment and storage medium
CN115294784B (en) * 2022-06-21 2024-05-14 中国科学院自动化研究所 Multi-intersection traffic signal lamp control method and device, electronic equipment and storage medium
CN115081002A (en) * 2022-06-28 2022-09-20 西安电子科技大学 Aggregation server selection method for decentralized federal learning
CN115081002B (en) * 2022-06-28 2024-05-14 西安电子科技大学 Aggregation server selection method for decentralised federal learning
CN115329987A (en) * 2022-08-04 2022-11-11 苏州大学 User selection method in federated learning system
CN115328691A (en) * 2022-10-14 2022-11-11 山东大学 Fault diagnosis method, system, storage medium and equipment based on model difference
CN115328691B (en) * 2022-10-14 2023-03-03 山东大学 Fault diagnosis method, system, storage medium and equipment based on model difference
CN116090550A (en) * 2022-12-27 2023-05-09 百度在线网络技术(北京)有限公司 Federal learning method, federal learning device, federal learning server, federal learning electronic device, and federal learning storage medium
CN116090550B (en) * 2022-12-27 2024-03-22 百度在线网络技术(北京)有限公司 Federal learning method, federal learning device, federal learning server, federal learning electronic device, and federal learning storage medium
CN117709486A (en) * 2024-02-05 2024-03-15 清华大学 Dynamic aggregation method and device for collaborative learning
CN117709486B (en) * 2024-02-05 2024-04-19 清华大学 Dynamic aggregation method and device for collaborative learning

Also Published As

Publication number Publication date
CN114584581B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN114584581B (en) Federal learning system and federal learning training method for intelligent city internet of things (IOT) letter fusion
CN110851429B (en) Edge computing credible cooperative service method based on influence self-adaptive aggregation
CN110995488B (en) Multi-mechanism collaborative learning system and method based on hierarchical parameter server
CN111245903B (en) Joint learning method and system based on edge calculation
Zeng et al. Hfedms: Heterogeneous federated learning with memorable data semantics in industrial metaverse
CN113033082B (en) Decentralized computing force perception-based decentralised federal learning framework and modeling method
CN114554459B (en) Internet of vehicles federal learning client selection method assisted by near-end strategy optimization
Zhou et al. Real-time data processing architecture for multi-robots based on differential federated learning
CN114650227B (en) Network topology construction method and system in hierarchical federation learning scene
CN113518007B (en) Multi-internet-of-things equipment heterogeneous model efficient mutual learning method based on federal learning
CN113206887A (en) Method for accelerating federal learning aiming at data and equipment isomerism under edge calculation
CN115249073A (en) Method and device for federated learning
Cao et al. HADFL: Heterogeneity-aware decentralized federated learning framework
Shi et al. HySync: Hybrid federated learning with effective synchronization
CN115587633A (en) Personalized federal learning method based on parameter layering
CN115761378A (en) Power inspection image classification and detection method and system based on federal learning
CN114710330A (en) Anomaly detection method based on heterogeneous hierarchical federated learning
CN113672684B (en) Layered user training management system and method for non-independent co-distributed data
CN116610434A (en) Resource optimization method for hierarchical federal learning system
CN114219094A (en) Communication cost and model robustness optimization method based on multi-task federal learning
Lin et al. Federated learning with dynamic aggregation based on connection density at satellites and ground stations
Zheng et al. A distributed learning architecture for semantic communication in autonomous driving networks for task offloading
CN116362327A (en) Model training method and system and electronic equipment
Ning et al. Following the correct direction: Renovating sparsified SGD towards global optimization in distributed edge learning
CN116193516A (en) Cost optimization method for efficient federation learning in Internet of things scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant