CN117130762A - Data processing method, apparatus, device, storage medium and computer program product

Info

Publication number
CN117130762A
Authority
CN
China
Prior art keywords: model, global, models, compression, fusion
Prior art date
Legal status: Withdrawn (the status listed is an assumption and is not a legal conclusion)
Application number
CN202210552726.0A
Other languages
Chinese (zh)
Inventor
刘吉
田浩
贾俊铖
周吉文
周瑞璞
窦德景
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210552726.0A
Publication of CN117130762A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a data processing method, a data processing device, electronic equipment, a storage medium and a computer program product, relates to the technical field of artificial intelligence such as model updating, model fusion and federal learning, and can be applied to image recognition scenes. The method comprises the following steps: sequentially carrying out sparse compression and quantization compression on the initial model of the federal learning of the round to obtain a first model; issuing the first model to each target edge computing node, and receiving at least two second models returned by the target edge computing nodes; sequentially decompressing the at least two received second models according to the inverse operations of quantization compression and sparse compression to obtain a corresponding number of third models; and fusing the third models to obtain a global model subjected to the federal learning of the round. By the method, the data volume transmitted between the server side and the edge computing nodes can be reduced, bandwidth occupation is reduced, the time consumed by each round of federal learning is shortened, and federal learning efficiency is improved.

Description

Data processing method, apparatus, device, storage medium and computer program product
Technical Field
The disclosure relates to the technical field of data processing, in particular to the technical field of artificial intelligence such as model updating, model fusion, federal learning and the like, which can be applied to image recognition scenes, and particularly relates to a data processing method, a data processing device, electronic equipment, a computer readable storage medium and a computer program product.
Background
With the increase of various edge devices, such as smart phones, internet of things devices, mobile sensor devices, etc., more and more data is available for deep learning model training in different artificial intelligence applications. The traditional model training method transmits all data to a server for centralized training, so that the problems of huge communication overhead, limited computing resources, privacy security risk and the like are brought. In contrast, federal learning (Federated Learning, FL) can effectively solve these problems. In FL, model training is performed at edge devices or edge nodes, while global models are aggregated at the server side.
The FL can well solve the problems of security and privacy in edge computing, limited computing resources of edge nodes, communication overhead and the like. Researchers in the field of edge computing have adopted different federal optimization schemes, namely synchronous and asynchronous communication schemes. In an asynchronous scheme, the server does not need to wait for all devices to complete local training, it can update the global model immediately after receiving updates from any selected device.
Taking asynchronous federal learning for image recognition as an example, frequent transmission of large volumes of data is required between the server side and the edge computing nodes, so bandwidth occupation is high. How to overcome this technical defect is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the disclosure provides a data processing method, a data processing device, electronic equipment, a computer readable storage medium and a computer program product.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including: sequentially carrying out sparse compression and quantization compression on the initial model of the federal learning of the round to obtain a first model, wherein the sparse compression controls the compression ratio through a first parameter, the quantization compression controls the compression ratio through a second parameter, and the values of the first parameter and the second parameter are optimal solutions for minimizing the model scale under the condition of setting the model accuracy loss threshold; the first model is issued to each target edge computing node, at least two second models returned by each target edge computing node are received, and the second models are obtained by updating and recompressing initial models obtained after decompression by the target edge computing nodes by using node local data; sequentially decompressing at least two received second models according to inverse operations of quantization compression and sparse compression to obtain a corresponding number of third models; and fusing all the third models to obtain a global model subjected to federal learning of the round.
In a second aspect, an embodiment of the present disclosure proposes a data processing apparatus, including: the model compression unit is configured to sequentially perform sparse compression and quantization compression on the initial model of the federal learning of the round to obtain a first model, wherein the sparse compression controls the compression ratio through a first parameter, the quantization compression controls the compression ratio through a second parameter, and the values of the first parameter and the second parameter are optimal solutions for minimizing the model scale under the condition of setting a model accuracy loss threshold; the model issuing and receiving unit is configured to issue a first model to each target edge computing node and receive at least two second models returned by each target edge computing node, wherein the second models are obtained by updating and recompressing initial models obtained by decompressing the target edge computing nodes by using node local data; the model decompression unit is configured to sequentially decompress the received at least two second models according to inverse operations of quantization compression and sparse compression to obtain a corresponding number of third models; and the model fusion unit is configured to fuse the third models to obtain a global model subjected to the federal learning of the round.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to implement a data processing method as described in any one of the implementations of the first aspect when executed.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for enabling a computer to implement a data processing method as described in any one of the implementations of the first aspect when executed.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, is capable of implementing the steps of a data processing method as described in any of the implementations of the first aspect.
According to the data processing method provided by the disclosure, in the process of federal learning in an asynchronous mode, the initial model to be issued by the server side is compressed by sequentially using the sparse compression algorithm and the quantization compression algorithm, combined with the first parameter and the second parameter determined in advance as the optimal solution that minimizes the model scale under the set model accuracy loss threshold. As a result, when the two compression algorithms are used in sequence, the model scale can be minimized while the model accuracy is ensured, rather than the two algorithms conflicting with each other or even producing a negative effect. The compression reduces the data volume transmitted between the server side and the edge computing nodes, reduces bandwidth occupation, shortens the time consumed by each round of federal learning, and improves federal learning efficiency.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture in which the present disclosure may be applied;
FIG. 2 is a flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for compressing an initial model to obtain a first model in a data processing method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a model fusion method in a data processing method according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of another model fusion method in the data processing method according to the embodiment of the present disclosure;
FIG. 6 is a flowchart of a further model fusion method in the data processing method according to the embodiment of the present disclosure;
fig. 7 is a flowchart of an image recognition method according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device adapted to perform a data processing method according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the user's personal information all comply with the relevant laws and regulations and do not violate public order and good morals.
FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of the data processing methods, apparatus, electronic devices, and computer-readable storage media of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103 acting as heterogeneous edge computing nodes, a network 104, and a server 105 serving as the federal learning control center. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The terminal devices 101, 102, 103 may record or collect various data authorized by the user (such as user image data, webpage text data accessed by the user in history, and voice data queried or listened by the user), train the initial model issued by the server 105 received through the network 104 as node local data, and finally return the local model updated by the local data to the server 105, so that the server 105 fuses according to the received multiple local models to obtain a global model. To achieve the above object, various applications for enabling information communication between the terminal devices 101, 102, 103 and the server 105, such as a data collection type application, a model update type application, a compression/decompression type application, a model fusion type application, a model data transmission type application, and the like, may be installed on the terminal devices.
The terminal devices 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablet computers, laptop and desktop computers, and the like; when the terminal devices 101, 102, 103 are software, they may be installed in the above-listed electronic devices, which may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not particularly limited herein. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server; when the server is software, the server may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not particularly limited herein.
The server 105 may provide various services through various built-in applications, for example, a federal learning class application that may provide federal learning services, and the server 105 may achieve the following effects when running the federal learning class application: firstly, sequentially carrying out sparse compression and quantitative compression on an initial model of the federal learning of the round to obtain a first model, wherein the sparse compression controls the compression ratio through a first parameter, the quantitative compression controls the compression ratio through a second parameter, and the values of the first parameter and the second parameter are optimal solutions for minimizing the model scale under the condition of setting a model accuracy loss threshold; then, the first model is issued to the terminal devices 101, 102 and 103 serving as target edge computing nodes through the network 104, and at least two second models returned by the terminal devices 101, 102 and 103 are received successively, wherein the second models are obtained by updating and recompressing initial models obtained by decompression by using node local data by the terminal devices; then, sequentially performing inverse operation of quantization compression and sparse compression, and decompressing the received at least two second models to obtain a corresponding number of third models; and finally, fusing all third models to obtain a global model subjected to federal learning of the round.
The data processing method provided by the subsequent embodiments of the present disclosure needs to be completed jointly by the server side and the edge computing nodes. However, to make the point of improvement of the solution provided by the present disclosure clear, the subsequent embodiments mainly describe the operations or steps performed by the server side (e.g., the server 105 shown in fig. 1) as the execution body.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 is a flowchart of a data processing method according to an embodiment of the disclosure, wherein a flowchart 200 includes the following steps:
step 201: sequentially carrying out sparse compression and quantitative compression on the initial model of the federal learning of the round to obtain a first model;
this step aims at sequentially performing sparsification compression and quantization compression on the initial model of the federal learning of this round by an execution body (for example, the server 105 shown in fig. 1) of the data processing method, to obtain a first model. In the federal learning mode, one round consists of each edge computing node locally updating the initial model with its own local data, and then fusing the locally updated models to obtain a global model.
The actual compression by sparsification refers to compression processing by a sparsification compression algorithm, and specifically includes: random-k, top-k, etc.; the quantization compression actually means compression processing by a quantization compression algorithm, such as 8-bit quantization (8-bit quantization), 12-bit quantization, 16-bit quantization, and the like.
In this step, the initial model is first sparsified and compressed, and then the sparsified model is compressed again by quantization compression; the final compression result is the first model. In the present disclosure, the sparsification compression controls its compression ratio through a first parameter, and the quantization compression controls its compression ratio through a second parameter, so that the two compression algorithms cooperate well with each other. The values of the first parameter and the second parameter are specifically the optimal solution that minimizes the model scale under the set model accuracy loss threshold, so that the model scale is minimized while the model accuracy is ensured.
Step 202: issuing the first model to each target edge computing node, and receiving at least two second models returned by each target edge computing node;
On the basis of step 201, this step aims at issuing, by the above-mentioned execution body, the first model to each target edge computing node, and receiving at least two second models returned by each target edge computing node. The second model is obtained by updating and recompressing the initial model obtained after decompression by using node local data by the target edge computing node.
The target edge computing node may be a predetermined edge computing node participating in federal learning of the present round, for example, the target edge computing node may be already in an idle state before the start of the present round or may be expected to be in an idle state when the present round is actually started, or may be some edge computing nodes selected in advance.
To further clarify the process of generating the second model, the process of generating the second model is specifically developed while standing on the perspective of the target edge computing node:
the target edge computing node receives a first model in a compressed state, and then decompresses the first model according to inverse operations of quantization compression and sparse compression in sequence to obtain the initial model;
the target edge computing node uses self-stored node local data as a training sample, and uses the training sample to update parameters of the initial model to obtain a third model after parameter updating;
And the target edge computing node compresses the third model according to the same compression operation to obtain the second model.
Step 203: sequentially decompressing at least two received second models according to inverse operations of quantization compression and sparse compression to obtain a corresponding number of third models;
based on step 202, this step aims at decompressing the received at least two second models by the execution body according to the inverse operation of quantization compression and sparsification compression in turn, so as to obtain a corresponding number of third models.
Step 204: and fusing all the third models to obtain a global model subjected to federal learning of the round.
Based on step 203, this step aims at fusing the at least two third models by the execution body to obtain a global model after the federal learning of this round. The limitation that there are "at least two" third models is made because model fusion requires at least two models: the essence of model fusion is to fuse the model parameters of at least two different third models, so that the fused model takes into account what each of the original third models has learned.
Furthermore, before fusing the third models, the execution body may determine the actual number of the third models stored in the preset buffer space, and then only when the actual number exceeds the preset number threshold, the step of fusing the third models may be executed, so that the invalid model fusing operation is avoided while the computing resources are fully utilized.
According to the data processing method provided by the embodiment of the disclosure, in the process of federal learning in an asynchronous mode, the initial model to be issued by the server side is compressed by sequentially using a sparsification compression algorithm and a quantization compression algorithm, combined with the first parameter and the second parameter determined in advance as the optimal solution that minimizes the model scale under the set model accuracy loss threshold. As a result, when the two compression algorithms are used in sequence, the model scale can be minimized while the model accuracy is ensured, rather than the two algorithms conflicting with each other or even producing a negative effect. The compression reduces the data volume transmitted between the server side and the edge computing nodes, reduces bandwidth occupation, shortens the time consumed by each round of federal learning, and improves federal learning efficiency.
Referring to fig. 3, fig. 3 is a flowchart of a method for compressing an initial model to obtain a first model in the data processing method according to the embodiment of the present disclosure, that is, a specific implementation is provided for step 201 in the flowchart 200 shown in fig. 2, and other steps in the flowchart 200 are not adjusted, so that a new complete embodiment can be obtained by replacing step 201 with the specific implementation provided in the embodiment. Wherein the process 300 comprises the steps of:
Step 301: determining a model weight matrix of an initial model of the federal learning of the round;
the model weight matrix is a matrix form in which model weights corresponding to respective model structures constituting an initial model are expressed, and each model weight in the model weight matrix is generally referred to as an element of the model weight matrix.
In particular, each model structure constituting a model is actually the core part of the corresponding model, so although the compressed object is said to be a "model", the compressed object can be actually understood as: model weights or gradients corresponding to different model structures. This is because the model weights or gradients will change as the sample is trained or data updated, i.e., the model weights or gradients will exist as a result of the training or updating, i.e., the model described in this disclosure does not contain file data in a conventional sense. From this point of view, compression of the model can therefore be equivalent to compression of model weights or gradients.
Step 302: using a sparsification compression algorithm controlled by a parameter K, retain the model weights whose absolute values rank in the top K% of the model weight matrix, and reset the remaining model weights to 0, to obtain a first weight matrix;
Based on step 301, this step aims at retaining, by the execution body through the sparsification compression algorithm controlled by the parameter K, the model weights whose absolute values rank in the top K% of the model weight matrix, and resetting the remaining model weights to 0, so as to obtain a first weight matrix. Wherein K is a positive integer greater than 0 and less than 100, such as 5, 25, 35, etc.
Namely, in the embodiment, a Top-K sparse compression algorithm is specifically selected, and K is used as a first parameter affecting the compression ratio of the sparse compression algorithm, namely, the sparse compression operation performed in the step essentially reserves a part of model structures with larger model weights and removes a part of model structures with smaller model weights, so that model scale is reduced while model accuracy is ensured.
Step 303: reducing the original bit number of each matrix element in the first weight matrix to a new bit number corresponding to the parameter Q by utilizing a quantization compression algorithm controlled by the parameter Q to obtain a second weight matrix;
based on step 302, this step aims at reducing, by the execution body using the quantization compression algorithm controlled by the parameter Q, the original number of bits of each matrix element in the first weight matrix to a new number of bits corresponding to the parameter Q, so as to obtain a second weight matrix.
That is, in the present embodiment, the parameter Q is a second parameter for controlling the compression ratio of the quantization compression algorithm, and Q is a positive integer greater than 100 and less than 200. Specifically, when the quantization compression algorithm is 8-bit quantization, the new bit number corresponding to the parameter Q is 8 bits, i.e., the matrix element of 32 bits in the conventional case can be reduced to 8 bits. That is, the original 32-bit floating point number is compressed into an 8-bit floating point number, which is equivalent to mapping a large set into a small set, and the degree of compression is determined by the degree of sparseness of the mapping.
Step 304: a model corresponding to the second weight matrix is determined as the first model.
In this embodiment, only an exemplary specific implementation scheme for compressing the initial model is provided, and a person skilled in the art may refer to the compression guiding thought provided in this embodiment, and combine various situations and requirements in a specific application scenario, replace other sparse compression algorithms or quantization compression algorithms, and then obtain various different technical scheme combinations, which are not listed here one by one.
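The following is a minimal numpy sketch of the two-stage compression described above, assuming the model weights are handled as a flat float32 array and using a simple min-max affine 8-bit scheme; the helper names and the exact quantization formula are illustrative assumptions rather than the algorithm fixed by this disclosure.
```python
import numpy as np

def topk_sparsify(w: np.ndarray, k_percent: float):
    """Keep the k_percent% of entries with the largest absolute value, zero the rest."""
    flat = w.ravel()
    keep = max(1, int(len(flat) * k_percent / 100.0))
    idx = np.argpartition(np.abs(flat), -keep)[-keep:]   # indices of retained weights
    return idx, flat[idx].astype(np.float32), w.shape    # sparse representation

def quantize_8bit(values: np.ndarray):
    """Map float32 values onto 8-bit integers with a min-max affine scheme."""
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((values - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_8bit(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

def desparsify(idx: np.ndarray, values: np.ndarray, shape) -> np.ndarray:
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = values
    return flat.reshape(shape)

# compression: sparsify first, then quantize; decompression applies the inverses in reverse order
w = np.random.randn(4, 256).astype(np.float32)           # stand-in for a weight matrix
idx, vals, shape = topk_sparsify(w, k_percent=25)
q, lo, scale = quantize_8bit(vals)                        # this is what gets transmitted
w_restored = desparsify(idx, dequantize_8bit(q, lo, scale), shape)
```
Decompression applies the two inverse operations in reverse order, which matches the order used in step 203 and on the edge computing node side.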
Devices typically generate and collect data on an edge computing network in a non-IID manner (not independent and identically distributed), so the data collected by each edge computing node is statistically heterogeneous. Therefore, if global fusion is performed directly on the models updated by the different edge computing nodes based on their local data (i.e., the third models), some local models may reduce the convergence speed of the global model or even cause it to diverge. This embodiment therefore also adds a penalty term to the local optimization objective, which can effectively limit the negative effects of data heterogeneity. One implementation, without limitation, is:
Guiding the fusion of the third models by using a preset model fusion loss function to obtain a global model subjected to the federal learning of the round. The model fusion loss function comprises a basic part and a fusion difference control part, wherein the fusion difference control part controls, through a preset regularization term, the model difference between a third model and the global model of the same round, the preset regularization term serving as a control coefficient of that model difference. By adding the fusion difference control part, the guidance of the model fusion loss function ensures that the final global model is neither excessively biased toward any single local model nor allowed to deviate too far from it.
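As a sketch, one concrete form of such a fusion difference control is the proximal regularization used later in the detailed description (local update and upload), where F_i is the local loss of node i, w_k is the global model of the same round, and μ is the preset regularization coefficient:
```latex
\min_{w}\; F_i(w) \;+\; \frac{\mu}{2}\,\lVert w - w_k \rVert^{2}
```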
The above embodiment gives a solution to the model fusion process from the perspective of non-IID data. The following three embodiments give three different specific solutions for how to scientifically and reasonably set the contribution degree of the multiple third models to the finally generated global model during model fusion; see the three embodiments shown in fig. 4-6, respectively:
fig. 4 is a flowchart of a model fusion method in a data processing method according to an embodiment of the disclosure, where a flowchart 400 includes the following steps:
Step 401: determining the receiving time of each second model;
step 402: adding model staleness parameters with corresponding sizes to the corresponding third models according to the sequence of the receiving time;
step 403: and carrying out model fusion based on each third model added with model staleness parameters to obtain a global model subjected to federal learning of the round.
The earlier the order of the receiving time is, the smaller the model staleness parameter is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
That is, the model staleness described in this embodiment refers to the timeliness of the third model generated by each edge computing node. It should be understood that if a round of federal learning involves 5 edge computing nodes, 4 of which generate their third models relatively quickly while the last one takes much longer (mainly because that node's computing capability is insufficient, not because its local data volume is large), then the contribution of the third model generated by the 5th node to the finally generated global model needs to be weakened as much as possible in the model fusion process, because of its poor timeliness.
In other words, by introducing the model staleness control parameter, this embodiment ensures the accuracy of the finally generated global model by ensuring the timeliness of the fused third models.
Unlike the model fusion scheme presented in the embodiment of fig. 4, fig. 5 shows a flowchart of another model fusion method, whose flowchart 500 includes the following steps:
step 501: determining the data quantity used in the generation process of each corresponding second model according to each third model;
step 502: according to the size of the data volume, adding a data volume weight with a corresponding size to a corresponding third model;
step 503: and carrying out model fusion based on each third model added with the data quantity weight to obtain a global model subjected to the federal learning of the round.
The larger the data volume is, the larger the data volume weight is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
That is, the data volume described in this embodiment refers to the amount of node-local data each edge computing node uses when updating the initial model. It should be understood that, when the computing capabilities of the edge computing nodes are essentially the same, the larger the volume of node-local data, the more knowledge the third model has learned and the more likely that knowledge is useful. The contribution of these third models to the finally generated global model in the model fusion process should therefore be determined according to the knowledge they have learned.
In other words, the embodiment combines the control parameter of the data volume weight, so that the contribution degree of the third model with more knowledge to the finally generated global model is ensured to be larger, and the accuracy of the global model is further improved.
The embodiments shown in fig. 4 and fig. 5 consider the model fusion process from a single perspective of model staleness and data volume weight, respectively, and the model fusion method shown in fig. 6 considers both angles in an effort to further improve the accuracy of the finally generated global model by more comprehensive consideration. The process 600 includes the steps of:
step 601: determining the receiving time of each second model;
step 602: adding model staleness parameters with corresponding sizes to the corresponding third models according to the sequence of the receiving time;
step 603: determining the data quantity used in the generation process of each corresponding second model according to each third model;
step 604: according to the size of the data volume, adding a data volume weight with a corresponding size to a corresponding third model;
step 605: and carrying out model fusion based on each third model added with model staleness parameters and data volume weights to obtain a global model subjected to federal learning of the round.
The earlier the sequence of the receiving time is, the smaller the model staleness parameter is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is; the larger the data volume is, the larger the data volume weight is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
The technical effect of this embodiment, which is essentially the superposition of the embodiments shown in fig. 4 and 5, is also the accumulation of both, and the description will not be repeated here.
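A minimal sketch of the combined weighting is given below; the exact functional forms are not fixed by these embodiments, so the power-law freshness term (mirroring the staleness function S(k, t_i) = (k - t_i + 1)^(-a) used in the protocol later) and the multiplicative combination with the sample count are illustrative assumptions.
```python
def fusion_weight(receive_order: int, n_samples: int, a: float = 0.5) -> float:
    """Contribution weight of one third model: earlier-received models and
    models trained on more node-local data contribute more to the fusion."""
    freshness = (receive_order + 1) ** (-a)   # receive_order = 0 for the earliest model
    return freshness * n_samples

# weights are normalized to sum to 1 before averaging the third models' parameters
raw = [fusion_weight(order, n) for order, n in [(0, 500), (1, 800), (2, 300)]]
weights = [r / sum(raw) for r in raw]
```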
To deepen understanding, the present disclosure further provides a new asynchronous federal learning protocol in combination with a specific application scenario to reduce bandwidth occupation of FL, and simultaneously reduce storage space occupied by the model. The key idea of the proposed protocol is, among other things, to compress the data that needs to be transmitted at the time of communication.
A) Problem definition
Consider an edge computing system comprising a Base Station (BS) and M edge devices that jointly perform federal learning tasks in a wireless network environment. We assume that the geographic locations of the edge devices and the base station are unchanged throughout the federal learning process. We define the idle time of a device as a period during which the device has sufficient power for computing tasks and no other local tasks. Each device uses its local data to train the global model. The set of M edge devices is denoted as $\{1, 2, \ldots, M\}$, and the local data set owned by each edge device $i$ is denoted as:
$$D_i = \{(x_{i,d},\, y_{i,d})\}_{d=1}^{n_i}$$
where $n_i$ is the number of samples of the local data set, $x_{i,d}$ is the $d$-th sample on device $i$ (a vector of dimension $s$), and $y_{i,d}$ is the label of $x_{i,d}$. The total data is recorded as $D = \bigcup_{i=1}^{M} D_i$, containing $n = \sum_{i=1}^{M} n_i$ samples.
The overall optimization goal is to use the local data of all edge devices to train the weights $w$ of the federally learned global model so as to minimize their value for a given loss function; the optimization objective is defined as:
$$\min_{w} F(w) = \sum_{i=1}^{M} \frac{n_i}{n}\, F_i(w)$$
where $F_i(w)$ is the local loss function of the $i$-th device, satisfying:
$$F_i(w) = \frac{1}{n_i} \sum_{d=1}^{n_i} F(w, x_{i,d}, y_{i,d})$$
where $F(w, x_{i,d}, y_{i,d})$ is the loss of the $k$-th round on device $i$, defined as:
$$F(w, x_{i,d}, y_{i,d}) = f(w; x_{i,d}, y_{i,d}) + \frac{\mu}{2}\,\lVert w - w_k \rVert^{2}$$
where $f(w; x_{i,d}, y_{i,d})$ measures how well $w$ fits the data sample $(x_{i,d}, y_{i,d})$; common loss functions include the cross-entropy loss function and the 0-1 loss function (zero-one loss). Here $\mu$ is a regularization parameter, and the regularization term is used to keep the difference between the local model $w_{i,k}$ and the global model $w_k$ from becoming too large.
B) Federal learning framework
Federal learning solves the above problem using an iterative approach, the kth round comprising the following steps:
1) The Base Station (BS) randomly selects part of the devices and issues the current global model $w_{k-1}$.
2) Each device $i$ receives $w_{k-1}$, sets $w_{i,k}(0) = w_{k-1}$, and then computes the local model $w_{i,k}$ by applying stochastic gradient descent (SGD) on its local data set $D_i$ (a minimal sketch of this local update is given after this list):
$$w_{i,k}(t+1) = w_{i,k}(t) - \eta\, \nabla F_i\big(w_{i,k}(t); b\big), \quad t = 0, 1, \ldots, \tau - 1$$
where $\eta$ is the learning rate, $\nabla F_i(\cdot\,; b)$ is the gradient computed on a mini-batch $b$ randomly selected from $D_i$, and $\tau$ is the number of local iterations; $\tau$ and $b$ are both fixed system parameters. After $\tau$ iterations, $w_{i,k}(\tau)$ is uploaded to the Base Station (BS).
3) The base station receives the local models $w_{i,k}$ uploaded by all devices of the round and aggregates them, typically by a weighted average over the device data set sizes:
$$w_k = \sum_{i} \frac{n_i}{\sum_{j} n_j}\, w_{i,k}$$
yielding the new global model $w_k$ at the BS.
4) After the BS obtains the new global model, the process is repeated and the training of the $(k+1)$-th round is started, until a satisfactory global model $w_K$ is obtained.
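A minimal sketch of one device's local update in step 2), assuming a linear model with squared loss purely for illustration; eta, tau and the mini-batch size b play the role of the fixed system parameters named above.
```python
import numpy as np

def local_update(w_global, X, y, eta=0.05, tau=10, b=32, rng=np.random.default_rng(0)):
    """Run tau mini-batch SGD steps starting from the received global model."""
    w = w_global.copy()
    for _ in range(tau):
        batch = rng.choice(len(X), size=min(b, len(X)), replace=False)
        grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)   # squared-loss gradient (illustrative)
        w -= eta * grad
    return w                     # w_{i,k}(tau), uploaded to the BS afterwards

# toy local data set of device i
X_i = np.random.randn(200, 8).astype(np.float32)
y_i = np.random.randn(200).astype(np.float32)
w_prev = np.zeros(8, dtype=np.float32)
w_ik = local_update(w_prev, X_i, y_i)
```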
C) Asynchronous federal learning system framework for communication compression
There are M devices in the federal learning environment of the protocol, numbered $i \in \{1, 2, \ldots, M\}$; $k \in [1, K]$ denotes the current round. $C \in (0, 1)$ denotes the fraction of all devices that may participate in training the latest model at the same time; $\gamma \in (0, 1)$ is a parameter that limits the number of cached local updates the server requires before performing a global update in each round. Q is a shared variable between processes at the BS, used to cache the local models sent by the edge devices.
1. Initializing the base station: the BS initializes a random global model, or pre-trains a model on a public data set and uses it as the global model. The initial global model is denoted as $w_0$.
2. Initializing equipment: each edge device reports their information to the BS including location information, channel conditions, device computing power, whether it is idle, etc.
3. The base station operates:
3.1) Distribution process: $P_k$ denotes the number of edge devices currently training, and $P_k$ is initialized to 0;
(1) if $P_k < [M \cdot C]$, compute the compressed global model $\hat{w}_k = C_2(C_1(w_k))$ and issue the latest compressed global model, together with the current round number $k$, to the requesting device;
(2) let $P_k = P_k + 1$;
(3) repeat (1) and (2).
3.2) Receiving process:
(1) receive a compressed local model and its timestamp from a device, and compute the decompressed local model $w_{i,k} = C_1^{-1}(C_2^{-1}(\hat{w}_{i,k}))$; then put $(w_{i,k}, t_i)$ into the buffer queue Q;
(2) let $P_k = P_k - 1$;
(3) repeat (1) and (2).
3.3) Update process:
Define $S(k, t_i) = (k - t_i + 1)^{-a}$, $a > 0$, as the local model staleness function, where $a$ is a non-zero constant.
(1) If $\mathrm{Q.length} \ge [M \cdot \gamma]$, compute $u$ as the average of all local model weights in Q, weighted by their staleness and local sample numbers; then compute $\alpha_k$ from the staleness of the local model weights in Q; next, update the global model: $w_{k+1} = \alpha_k u + (1 - \alpha_k) w_k$, $k = k + 1$ (see the sketch following this process).
(2) Repeat (1).
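A sketch of the update process follows; since the exact formulas for u and alpha_k are not reproduced in this text, the staleness-and-sample-count weighted average and the mean-staleness choice of alpha_k are illustrative assumptions, kept consistent with S(k, t_i) = (k - t_i + 1)^(-a) and w_{k+1} = alpha_k u + (1 - alpha_k) w_k.
```python
import numpy as np

def staleness(k: int, t_i: int, a: float = 0.5) -> float:
    return (k - t_i + 1) ** (-a)

def update_global(w_k: np.ndarray, queue: list, k: int, a: float = 0.5) -> np.ndarray:
    """queue holds (w_local, n_samples, t_i) triples cached by the receiving process."""
    s = [staleness(k, t, a) for _, _, t in queue]
    raw = [si * ni for si, (_, ni, _) in zip(s, queue)]
    u = sum((r / sum(raw)) * w for r, (w, _, _) in zip(raw, queue))  # weighted average of cached local weights
    alpha_k = float(np.mean(s))                                      # aggregate staleness of the cached models
    return alpha_k * u + (1.0 - alpha_k) * w_k                       # new global model w_{k+1}
```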
4. Edge device operation:
4.1 State checking process): the edge device continually checks its available computing resources and once in an idle state, immediately sends a request to the server to acquire a training task.
4.2) Local training process: after its request is answered, the device receives the compressed global model $\hat{w}_k$ and the round number $k$ from the BS, and decompresses it to recover the global model:
$$w_k = C_1^{-1}\big(C_2^{-1}(\hat{w}_k)\big)$$
It then computes the local model $w_{i,k}$ by applying the stochastic gradient descent algorithm (SGD) on its local data set $D_i$, as in the basic framework above. Then it sets the timestamp $t_i = k$ and $w_{i,k} = w_{i,k}(\tau)$, compresses the local model:
$$\hat{w}_{i,k} = C_2\big(C_1(w_{i,k})\big)$$
and transmits it, together with $t_i$, to the BS through the wireless network.
5. And (3) terminating: the BS and edge device cycle through the respective processes until the global model reaches the desired performance, and each process terminates.
Here, $C_1$ is a sparsification compression algorithm; common sparsification compression mechanisms include Random-k, Top-k, etc., and $C_1^{-1}$ is the inverse of $C_1$. $C_2$ is a quantization compression algorithm; two existing quantization compression mechanisms are limited-bit and codebook-based, with specific algorithms such as 8-bit quantization, and $C_2^{-1}$ is the inverse of $C_2$.
D) System model
1. Communication delay
1.1 Model weight issuing time delay
The BS and the edge devices communicate with each other over the wireless network. When the base station issues a model, the maximum achievable downlink rate of device $i$ depends on $P_0$, the maximum transmission power of the BS (assumed to remain unchanged), $h_{0,i,k}$, the channel condition between the BS and device $i$ at the time the model is issued, and $N_0$, the power density of the noise.
Denote the size of the global model $w_k$ as $S_{w_k}$ and the size of the compressed global model $\hat{w}_k$ as $S_{\hat{w}_k}$. The time it takes the BS to issue the uncompressed model weights to device $i$, and the time it takes to issue the compressed global model weights, are then determined by the corresponding model size and the downlink rate.
1.2 Model weight upload delay
The maximum achievable uplink rate of device $i$ when it uploads its model weights depends on $P_i$, the maximum transmission power of device $i$ (assumed to remain unchanged), $h_{i,0,k}$, the channel condition between device $i$ and the BS at the time the model weights are uploaded, and $N_0$, the power density of the noise.
Denote the size of the local model $w_{i,k}$ as $S_{w_{i,k}}$ and the size of the compressed local model $\hat{w}_{i,k}$ as $S_{\hat{w}_{i,k}}$. The time it takes device $i$ to upload the uncompressed model weights to the BS, and the time it takes to upload the compressed local model weights, are then determined by the corresponding model size and the uplink rate.
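The delay expressions themselves are not reproduced here; a plausible reading consistent with the quantities just defined (writing the maximum achievable downlink and uplink rates as r_{0,i,k} and r_{i,0,k}, symbols introduced here for illustration rather than fixed by this text) is simply model size divided by rate:
```latex
t^{\mathrm{down}}_{i,k}=\frac{S_{w_k}}{r_{0,i,k}},\qquad
\hat{t}^{\mathrm{down}}_{i,k}=\frac{S_{\hat{w}_k}}{r_{0,i,k}},\qquad
t^{\mathrm{up}}_{i,k}=\frac{S_{w_{i,k}}}{r_{i,0,k}},\qquad
\hat{t}^{\mathrm{up}}_{i,k}=\frac{S_{\hat{w}_{i,k}}}{r_{i,0,k}}
```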
2. Calculating time delay
2.1 Model local training time delay
To capture randomness and performance fluctuations in local model training, a shifted exponential distribution is employed to model the computation time of a device, where the local training time of device $i$ in the $k$-th round is a random variable parameterized by $a_i$, the maximum computing capability of device $i$, and $\mu_i$, the fluctuation of the device's computing capability; $a_i$ and $\mu_i$ are assumed to remain unchanged throughout the training process.
3. Other neglected delays
3.1) The time for the BS and the devices to compress and decompress model weights;
3.2) the time spent by the BS on other operations such as decision and aggregation;
3.3) the time for an edge device to send a task request.
Based on the foregoing, the following will be summarized and discussed in several sections, respectively:
1. task allocation
Considering that there may be a large number of edge devices in an edge computing environment, too many devices participating in training the same model results in slow model convergence. Thus, a hyper-parameter C is introduced to control the number of devices simultaneously involved in training the same model. The introduced C parameter can effectively ensure the convergence rate of the model and avoids the potential risk of server overload caused by too many participants. Once the server receives a task request from an idle device, it determines from the fraction C whether to immediately send the latest model to that client. If the number of devices participating in the training of the latest model is less than $[M \cdot C]$, the latest model is sent to the client; otherwise it is not sent. Then, if the idle device receives a model from the server, it updates the model with its local data. There is a trade-off in the value of C: if C is set too large, the potential risk of uplink and downlink congestion increases.
2. Local update and upload
Because devices typically generate and collect data on edge computing networks in a non-IID manner, the underlying data is statistically heterogeneous. Thus, local updates of some clients may reduce the convergence speed of the global model or even cause it to diverge. Therefore, a penalty term is added to the local optimization objective, which can effectively limit the negative impact of data heterogeneity. After receiving the global model sent by the server, an idle device solves the following regularized problem:
$$\min_{w} \; F_i(w) + \frac{\mu}{2}\,\lVert w - w_k \rVert^{2}$$
where $\mu$ is a regularization weight parameter. The penalty term keeps the local update close to the initial model received by the device, thereby reducing the influence of data heterogeneity and making model convergence more stable.
3. Communication compression
In asynchronous federal learning, the communication pressure lies mainly in the stage where the locally updated models on the devices are exchanged with the server; at this stage, what is transmitted between the server and a device is mainly the weight parameters of the model. The more devices participate in training asynchronously, the greater the communication pressure. Therefore, in order to reduce the communication pressure and further improve communication efficiency, a communication compression mechanism is used to compress the transmitted data with as little loss as possible. The compression mechanism comprises two aspects: quantization and sparsification of the model parameters.
The model parameters are first sparsified, for example by selecting only a subset of the original parameters of the model, which yields a sparse vector; sparsification compression mechanisms include Random-k, Top-k, and the like. The local model parameters are then quantized, for example by reducing the number of bits of each element in the model parameters, compressing the original 32-bit floating point numbers into 8-bit floating point numbers; two existing quantization compression mechanisms are limited-bit and codebook-based, such as 8-bit quantization. This embodiment specifically combines the existing quantization and sparsification compression mechanisms to perform communication compression on the data transmitted in the wireless network.
a) Static selection of compression parameters
The sparsification algorithm $C_1$ used in this embodiment is the Top-K sparsification algorithm, which finds the K% of elements with the largest absolute values in the weight matrix, keeps them, and sets the rest to 0; the sparsification algorithm $C_1$ therefore requires a parameter K to control the compression ratio. The quantization compression algorithm $C_2$ used in this embodiment is 8-bit quantization, i.e. reducing the number of bits of each element in the model parameters and compressing the original 32-bit floating point numbers into 8-bit floating point numbers, which is equivalent to mapping a large set into a small set, the degree of compression being determined by how sparse the mapping is; the $C_2$ algorithm therefore requires a parameter Q to control the compression ratio.
How to choose the parameters K and Q thus becomes critical for controlling the compression ratio of the compression algorithms. A method for statically searching K and Q is provided, which can effectively achieve a trade-off between a higher compression ratio and a smaller loss of accuracy:
(1) The algorithm finds the optimal K and Q within the accuracy loss threshold.
(2) Input: a model weight w pre-trained on a public data set, the compression algorithm $C_1$, the compression algorithm $C_2$, and the accuracy loss threshold.
(3) Output: the compression parameter K of $C_1$ and the compression parameter Q of $C_2$.
The specific search process is as follows:
(1) Upper limit of test accuracy: upper_acc = test(w);
(2) for each K ∈ (0, 100) and each Q ∈ (100, 200): compress w using $C_1$ with parameter K and $C_2$ with parameter Q, and test the accuracy acc of the decompressed result;
    if upper_acc - acc < threshold:
        w_size = size(w_compressed);
(3) return the K and Q that minimize w_size.
Here the step size of K is 5 and the step size of Q is 1, and the search space amounts to 4000 test operations.
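A sketch of this static search is given below, assuming test returns accuracy on a held-out set and compress/size wrap the $C_1$/$C_2$ pipeline with parameters K and Q; these helper functions are placeholders, not interfaces defined by this disclosure.
```python
def search_compression_params(w, test, compress, size, threshold):
    """Grid-search K and Q for the smallest compressed model within the accuracy-loss threshold."""
    upper_acc = test(w)                      # accuracy upper bound of the uncompressed weights
    best = None                              # (w_size, K, Q)
    for K in range(5, 100, 5):               # step size of K is 5
        for Q in range(101, 200):            # step size of Q is 1
            w_compressed = compress(w, K, Q)
            if upper_acc - test(w_compressed) < threshold:
                w_size = size(w_compressed)
                if best is None or w_size < best[0]:
                    best = (w_size, K, Q)
    return (best[1], best[2]) if best else None
```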
To facilitate a more thorough understanding of the technical solutions provided by the above embodiments of the present disclosure, this embodiment further provides a specific usage manner in combination with an image recognition scene:
in this image recognition scenario, the initial model described in the above embodiments is an initial image recognition model for image recognition, the node local data is a sample image set for training the initial image recognition model, the sample image set includes a sample image and an object recognition label attached to the sample image, the third model generated by each target edge computing node is a locally updated image recognition model obtained by training the initial model based on the sample image set stored locally by the node, and the global model is a global image recognition model.
On the basis of the above, fig. 7 shows an image recognition method, and a flow 700 thereof includes the following steps:
step 701: receiving an incoming image to be identified;
step 702: inputting the image to be identified into a global image identification model to obtain an image identification result output by the global image identification model;
step 703: and returning the image recognition result to the input end of the image to be recognized.
On the basis that the federal learning is specifically used to learn a global image recognition model for image recognition, a subsequent usage scene can call the pre-generated global image recognition model when an incoming image to be recognized is received. That is, the image to be recognized is input into the global image recognition model, the image recognition result output by the global image recognition model is obtained, and the image recognition request initiated by the user is thereby completed.
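A minimal sketch of the serving flow of fig. 7, assuming the fused global image recognition model exposes a predict method; the interface is illustrative.
```python
def handle_image_recognition_request(global_model, image):
    """Receive an incoming image to be recognized, run the pre-generated global
    image recognition model on it, and return the result to the requesting end."""
    return global_model.predict(image)
```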
With further reference to fig. 8, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of a data processing apparatus, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 8, the data processing apparatus 800 of the present embodiment may include: model compression unit 801, model transmitting and receiving unit 802, model decompression unit 803, model fusion unit 804. The model compression unit 801 is configured to sequentially perform sparse compression and quantization compression on an initial model of the federal learning of the present round to obtain a first model, wherein the sparse compression controls a compression ratio through a first parameter, the quantization compression controls a compression ratio through a second parameter, and the values of the first parameter and the second parameter are optimal solutions for minimizing the model scale under a set model accuracy loss threshold; the model issuing and receiving unit 802 is configured to issue a first model to each target edge computing node, and receive at least two second models returned by each target edge computing node, where the second models are obtained by updating and recompressing an initial model obtained by decompression by using node local data by the target edge computing node; the model decompression unit 803 is configured to sequentially decompress the received at least two second models according to inverse operations of quantization compression and sparse compression, so as to obtain a corresponding number of third models; the model fusion unit 804 is configured to fuse the third models to obtain a global model subjected to the federal learning of the present round.
In the present embodiment, in the data processing apparatus 800: the specific processes and technical effects of the model compressing unit 801, the model transmitting and receiving unit 802, the model decompressing unit 803, and the model fusing unit 804 may refer to the relevant descriptions of steps 201-204 in the corresponding embodiment of fig. 2, and are not repeated here.
In some optional implementations of the present embodiment, the model compression unit 801 may be further configured to:
determining a model weight matrix of an initial model of the federal learning of the round;
retaining the model weights whose absolute values rank in the top K% of the model weight matrix by using a sparsification compression algorithm controlled by a parameter K, and resetting the remaining model weights to 0, to obtain a first weight matrix, wherein K is a positive integer greater than 0 and less than 100;
reducing the original bit number of each matrix element in the first weight matrix to a new bit number corresponding to the parameter Q by utilizing a quantization compression algorithm controlled by the parameter Q to obtain a second weight matrix, wherein Q is a positive integer greater than 100 and less than 200;
a model corresponding to the second weight matrix is determined as the first model.
In some optional implementations of the present embodiment, the model fusion unit 804 may be further configured to:
Guiding the fusion of the third models by using a preset model fusion loss function to obtain a global model subjected to the federal learning of the round; the model fusion loss function comprises a basic part and a fusion difference control part, wherein the fusion difference control part controls, through a preset regularization term, the model difference between a third model and the global model of the same round, the preset regularization term serving as a control coefficient of that model difference.
In some optional implementations of the present embodiment, the model fusion unit 804 may be further configured to:
determining the receiving time of each second model;
adding model staleness parameters for the corresponding third model according to the sequence of the receiving time;
based on each third model added with model staleness parameters, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the earlier the order of the receiving time is, the smaller the model staleness parameter is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
In some optional implementations of the present embodiment, the model fusion unit 804 may be further configured to:
Determining the data quantity used in the generation process of each corresponding second model according to each third model;
adding data volume weight for the corresponding third model according to the data volume;
based on each third model added with the data quantity weight, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the larger the data volume is, the larger the data volume weight is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
In some optional implementations of the present embodiment, the model fusion unit 804 may be further configured to:
determining the receiving time of each second model;
adding model staleness parameters for the corresponding third model according to the sequence of the receiving time;
determining the data quantity used in the generation process of each corresponding second model according to each third model;
adding data volume weight for the corresponding third model according to the data volume;
based on each third model added with model staleness parameters and data amount weights, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the earlier the order of the receiving time is, the smaller the model staleness parameter is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is; the larger the data volume is, the larger the data volume weight is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
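The staleness-based weighting, the data-volume-based weighting, and their combination just described can all be illustrated with a single sketch. The exponential staleness decay, the normalization step, and every function and parameter name below are assumptions chosen for demonstration, not the scheme mandated by this embodiment.

```python
import numpy as np

def fuse_models(third_models, arrival_ranks, data_volumes, alpha=0.5):
    """Weighted fusion of decompressed third models into a global model.

    third_models  : list of weight arrays of identical shape, one per third model
    arrival_ranks : 0 for the earliest-received second model, 1 for the next, ...
    data_volumes  : number of local samples used to produce each second model
    alpha         : assumed decay factor in (0, 1); earlier arrival -> larger weight
    """
    staleness_weights = np.array([alpha ** rank for rank in arrival_ranks])
    data_weights = np.array(data_volumes, dtype=np.float64)
    combined = staleness_weights * data_weights
    combined /= combined.sum()                      # normalize so the weights sum to 1
    return np.tensordot(combined, np.stack(third_models), axes=1)  # weighted average

# Example: three third models, received in this order, trained on different data volumes.
models = [np.random.randn(4, 4) for _ in range(3)]
global_model = fuse_models(models, arrival_ranks=[0, 1, 2], data_volumes=[1200, 800, 500])
```

Setting all data_volumes equal reduces the sketch to the staleness-only variant, and setting alpha to 1 reduces it to the data-volume-only variant.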
In some optional implementations of the present embodiment, the data processing apparatus 800 may further include:
an actual number determining unit configured to determine an actual number of third models stored in a preset buffer space before fusing the respective third models;
a trigger-condition-satisfied execution unit configured to execute the step of fusing the third models in response to the actual number exceeding the preset number threshold.
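A minimal sketch of this buffering-and-trigger behaviour is shown below; the class name, the use of a lock, and the exact threshold comparison are illustrative assumptions rather than details fixed by this embodiment.

```python
from threading import Lock

class FusionBuffer:
    """Caches decompressed third models and signals when fusion should run."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # preset number threshold
        self.models = []             # preset cache space for third models
        self.lock = Lock()

    def add(self, third_model):
        """Store one third model; return the batch to fuse once the threshold is exceeded."""
        with self.lock:
            self.models.append(third_model)
            if len(self.models) > self.threshold:      # actual number exceeds the threshold
                batch, self.models = self.models, []
                return batch                           # caller performs the fusion step
            return None                                # keep waiting for more third models
```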
In some optional implementations of this embodiment, the initial model is an initial image recognition model for image recognition, the node local data is a sample image set for training the initial image recognition model, the sample image set includes a sample image and an object recognition label attached to the sample image, the third model generated by each target edge computing node is a locally updated image recognition model obtained by training the initial model based on the sample image set stored locally by the node, and the global model is a global image recognition model.
In some optional implementations of the present embodiment, the data processing apparatus 800 may further include:
an image to be recognized receiving unit configured to receive an incoming image to be recognized;
the image recognition unit is configured to input the image to be recognized into the global image recognition model to obtain an image recognition result output by the global image recognition model;
And the image recognition result returning unit is configured to return the image recognition result to the incoming end of the image to be recognized.
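As a sketch only of how these three units cooperate at inference time (the callable model interface and the preprocessing below are assumptions, not part of this embodiment), the serving flow could look like:

```python
import numpy as np

def recognize(global_image_recognition_model, image: np.ndarray) -> int:
    """Receive an incoming image, run the global image recognition model on it,
    and return the recognition result to the incoming end (here, the caller)."""
    batch = image[np.newaxis, ...].astype(np.float32)   # add a batch dimension
    scores = global_image_recognition_model(batch)      # forward pass; any callable model
    return int(np.argmax(scores, axis=-1)[0])           # predicted object label
```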
The data processing apparatus provided in this embodiment, in the process of performing federal learning in an asynchronous manner, compresses the initial model to be issued by the server side by sequentially using a sparsification compression algorithm and a quantization compression algorithm, in combination with a first parameter and a second parameter obtained in advance as the optimal solution minimizing the model scale under a set model accuracy loss threshold. As a result, when the two compression algorithms are applied in sequence they minimize the model scale while preserving model accuracy, rather than conflicting with or even negating each other. The compression reduces the amount of data transmitted between the server side and the edge computing nodes, lowers bandwidth occupation, shortens the time consumed by each round of federal learning, and improves federal learning efficiency.
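One straightforward way to obtain such a jointly optimal pair offline is a constrained search over candidate values; the grid structure, the evaluation callables, and the size metric in the sketch below are assumptions for illustration only.

```python
def search_k_q(evaluate_accuracy, compressed_size, baseline_accuracy,
               accuracy_loss_threshold, k_grid, q_grid):
    """Pick (K, Q) minimizing model size while the accuracy loss stays within the threshold.

    evaluate_accuracy(k, q) -> accuracy of the model compressed with parameters (k, q)
    compressed_size(k, q)   -> resulting model size, e.g. in bytes
    """
    best = None
    for k in k_grid:
        for q in q_grid:
            loss = baseline_accuracy - evaluate_accuracy(k, q)
            if loss > accuracy_loss_threshold:
                continue                      # violates the set model accuracy loss threshold
            size = compressed_size(k, q)
            if best is None or size < best[0]:
                best = (size, k, q)
    return best  # (minimal size, optimal K, optimal Q), or None if no pair is feasible
```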
According to an embodiment of the present disclosure, the present disclosure further provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to implement the data processing method described in any of the embodiments above when executed.
According to an embodiment of the present disclosure, there is also provided a readable storage medium storing computer instructions for enabling a computer to implement the data processing method described in any of the above embodiments when executed.
According to an embodiment of the present disclosure, the present disclosure further provides a computer program product comprising a computer program which, when executed by a processor, implements the data processing method described in any of the above embodiments.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, such as a data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and addresses the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services.
According to the technical solution of the embodiments of the present disclosure, in the process of performing federal learning in an asynchronous manner, the initial model to be issued by the server side is compressed by sequentially using a sparsification compression algorithm and a quantization compression algorithm, in combination with a first parameter and a second parameter obtained in advance as the optimal solution minimizing the model scale under the set model accuracy loss threshold, so that when the two compression algorithms are applied in sequence they minimize the model scale while ensuring model accuracy, rather than conflicting with or even negating each other. The compression reduces the amount of data transmitted between the server side and the edge computing nodes, lowers bandwidth occupation, shortens the time consumed by each round of federal learning, and improves federal learning efficiency.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. A data processing method, comprising:
sequentially carrying out sparse compression and quantization compression on the initial model of the federal learning of the round to obtain a first model, wherein the sparse compression controls the compression ratio through a first parameter, the quantization compression controls the compression ratio through a second parameter, and the values of the first parameter and the second parameter are optimal solutions for minimizing the model scale under the condition of setting a model accuracy loss threshold;
issuing the first model to each target edge computing node, and receiving at least two second models returned by each target edge computing node, wherein the second models are obtained by updating and recompressing initial models obtained by the target edge computing nodes after decompression by using node local data;
Sequentially decompressing the received at least two second models according to the inverse operation of the quantization compression and the sparse compression to obtain a corresponding number of third models;
and fusing the third models to obtain a global model subjected to the federal learning of the round.
2. The method of claim 1, wherein the sequentially performing sparsification compression and quantization compression on the initial model of the present round of federal learning to obtain a first model includes:
determining a model weight matrix of an initial model of the federal learning of the round;
retaining the model weights whose absolute values rank in the top K% of the model weight matrix, and resetting the remaining model weights to 0 to obtain a first weight matrix, wherein K is a positive integer greater than 0 and less than 100;
reducing the original bit number of each matrix element in the first weight matrix to a new bit number corresponding to the parameter Q by utilizing a quantization compression algorithm controlled by the parameter Q to obtain a second weight matrix, wherein Q is a positive integer greater than 100 and less than 200;
and determining a model corresponding to the second weight matrix as the first model.
3. The method of claim 1, wherein said fusing each of the third models results in a global model that is subject to a present round of federal learning, comprising:
Guiding the fusion of each of the third models by using a preset model fusion loss function to obtain a global model subjected to the present round of federal learning; wherein the model fusion loss function comprises a basic part and a fusion difference control part, the fusion difference control part controlling, through a preset regular term, the model difference between the third model and the global model of the same round, the preset regular term being used as a control coefficient of the model difference between the third model and the global model of the same round.
4. The method of claim 1, wherein said fusing each of the third models results in a global model that is subject to a present round of federal learning, comprising:
determining the receiving time of each second model;
adding model staleness parameters with corresponding sizes to corresponding third models according to the sequence of the receiving time;
based on each third model added with the model staleness parameters, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the earlier the order of the receiving time is, the smaller the model staleness parameter is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
5. The method of claim 1, wherein said fusing each of the third models results in a global model that is subject to a present round of federal learning, comprising:
determining the data amount used in the generation process of each corresponding second model according to each third model;
adding a data quantity weight with a corresponding size to a corresponding third model according to the data quantity;
based on each third model added with the data quantity weight, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the larger the data volume is, the larger the data volume weight is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
6. The method of claim 1, wherein said fusing each of the third models results in a global model that is subject to a present round of federal learning, comprising:
determining the receiving time of each second model;
adding model staleness parameters with corresponding sizes to corresponding third models according to the sequence of the receiving time;
determining the data amount used in the generation process of each corresponding second model according to each third model;
Adding a data quantity weight with a corresponding size to a corresponding third model according to the data quantity;
based on each third model added with the model staleness parameters and the data amount weights, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the earlier the order of the receiving time is, the smaller the model staleness parameter is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is; the larger the data volume is, the larger the data volume weight is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
7. The method of claim 1, further comprising:
determining the actual number of third models stored in a preset cache space before fusing each third model;
and in response to the actual number exceeding a preset number threshold, executing the step of fusing each third model.
8. The method of any of claims 1-7, wherein the initial model is an initial image recognition model for image recognition, the node local data is a sample image set for training the initial image recognition model, the sample image set includes a sample image and an object recognition label attached to the sample image, a third model generated by each of the target edge computing nodes is a locally updated image recognition model obtained by training the initial model based on a sample image set stored locally by the node, and the global model is a global image recognition model.
9. The method of claim 8, further comprising:
receiving an incoming image to be identified;
inputting the image to be identified into the global image identification model to obtain an image identification result output by the global image identification model;
and returning the image recognition result to the input end of the image to be recognized.
10. A data processing apparatus comprising:
the model compression unit is configured to sequentially perform sparse compression and quantization compression on an initial model of the federal learning of the round to obtain a first model, wherein the sparse compression controls the compression ratio through a first parameter, the quantization compression controls the compression ratio through a second parameter, and the values of the first parameter and the second parameter are optimal solutions for minimizing the model scale under the condition of setting a model accuracy loss threshold;
the model issuing and receiving unit is configured to issue the first model to each target edge computing node and receive at least two second models returned by each target edge computing node, wherein the second models are obtained by updating and recompressing initial models obtained by decompression by using node local data by the target edge computing nodes;
The model decompression unit is configured to sequentially decompress the received at least two second models according to the inverse operation of the quantization compression and the sparse compression to obtain a corresponding number of third models;
and the model fusion unit is configured to fuse the third models to obtain global models subjected to the federal learning of the round.
11. The apparatus of claim 10, wherein the model compression unit is further configured to:
determining a model weight matrix of an initial model of the federal learning of the round;
retaining the model weights whose absolute values rank in the top K% of the model weight matrix, and resetting the remaining model weights to 0 to obtain a first weight matrix, wherein K is a positive integer greater than 0 and less than 100;
reducing the original bit number of each matrix element in the first weight matrix to a new bit number corresponding to the parameter Q by utilizing a quantization compression algorithm controlled by the parameter Q to obtain a second weight matrix, wherein Q is a positive integer greater than 100 and less than 200;
and determining a model corresponding to the second weight matrix as the first model.
12. The apparatus of claim 10, wherein the model fusion unit is further configured to:
Guiding the fusion of each of the third models by using a preset model fusion loss function to obtain a global model subjected to the present round of federal learning; wherein the model fusion loss function comprises a basic part and a fusion difference control part, the fusion difference control part controlling, through a preset regular term, the model difference between the third model and the global model of the same round, the preset regular term being used as a control coefficient of the model difference between the third model and the global model of the same round.
13. The apparatus of claim 10, wherein the model fusion unit is further configured to:
determining the receiving time of each second model;
adding model staleness parameters with corresponding sizes to corresponding third models according to the sequence of the receiving time;
based on each third model added with the model staleness parameters, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the earlier the order of the receiving time is, the smaller the model staleness parameter is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
14. The apparatus of claim 10, wherein the model fusion unit is further configured to:
determining the data amount used in the generation process of each corresponding second model according to each third model;
adding a data quantity weight with a corresponding size to a corresponding third model according to the data quantity;
based on each third model added with the data quantity weight, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the larger the data volume is, the larger the data volume weight is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
15. The apparatus of claim 10, wherein the model fusion unit is further configured to:
determining the receiving time of each second model;
adding model staleness parameters with corresponding sizes to corresponding third models according to the sequence of the receiving time;
determining the data amount used in the generation process of each corresponding second model according to each third model;
adding a data quantity weight with a corresponding size to a corresponding third model according to the data quantity;
Based on each third model added with the model staleness parameters and the data amount weights, carrying out model fusion to obtain a global model subjected to the federal learning of the round; the earlier the order of the receiving time is, the smaller the model staleness parameter is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is; the larger the data volume is, the larger the data volume weight is, and the greater the contribution degree of the corresponding third model to the finally generated global model through model fusion is.
16. The apparatus of claim 10, further comprising:
an actual number determining unit configured to determine an actual number of third models stored in a preset buffer space before fusing each of the third models;
a trigger-condition-satisfied execution unit configured to execute the step of fusing each of the third models in response to the actual number exceeding a preset number threshold.
17. The apparatus of any of claims 10-16, wherein the initial model is an initial image recognition model for image recognition, the node local data is a sample image set for training the initial image recognition model, the sample image set includes a sample image and an object recognition label attached to the sample image, the third model generated by each of the target edge computing nodes is a locally updated image recognition model obtained by training the initial model based on a sample image set stored locally by the node, and the global model is a global image recognition model.
18. The apparatus of claim 17, further comprising:
an image to be recognized receiving unit configured to receive an incoming image to be recognized;
the image recognition unit is configured to input the image to be recognized into the global image recognition model to obtain an image recognition result output by the global image recognition model;
and the image recognition result returning unit is configured to return the image recognition result to the incoming end of the image to be recognized.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data processing method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the data processing method according to any of claims 1-9.
CN202210552726.0A 2022-05-19 2022-05-19 Data processing method, apparatus, device, storage medium and computer program product Withdrawn CN117130762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210552726.0A CN117130762A (en) 2022-05-19 2022-05-19 Data processing method, apparatus, device, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN117130762A true CN117130762A (en) 2023-11-28



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20231128