CN113297175A - Data processing method, device, system and readable storage medium - Google Patents

Data processing method, device, system and readable storage medium

Info

Publication number
CN113297175A
CN113297175A (application CN202110578585.5A)
Authority
CN
China
Prior art keywords
client device
client
model
server
data
Prior art date
Legal status
Pending
Application number
CN202110578585.5A
Other languages
Chinese (zh)
Inventor
侯宪龙
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110578585.5A priority Critical patent/CN113297175A/en
Publication of CN113297175A publication Critical patent/CN113297175A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a data processing method, apparatus, system and readable storage medium, belonging to the technical field of artificial intelligence. The method is applied to a server of a data processing system, the server being connected to a plurality of client devices, and comprises the following steps: receiving model training information sent by the plurality of client devices; determining a second client device associated with a first client device; training a server model corresponding to the first client device with the model training information of the second client device to obtain a target network model; and sending the model parameters of the target network model to the first client device and instructing the first client device to update the client model corresponding to the first client device with those model parameters. By updating the server model corresponding to the first client device with the model training information of the second client device, the accuracy of model training can be improved while user privacy is protected.

Description

Data processing method, device, system and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data processing method, apparatus, system, and readable storage medium.
Background
With the continuous development of artificial intelligence and the rising privacy expectations of users, the data-silo problem has come into focus. Data plays a critical role in modeling: richer, higher-dimensional data helps build more accurate and effective models, but such data is usually scattered across different users, and sharing it raises privacy and security concerns. Because of privacy protection requirements, data barriers and similar problems, it is difficult to safely and comprehensively use the private data of multiple parties to jointly train a model while still protecting each user's private data.
Disclosure of Invention
The application provides a data processing method, apparatus, system and readable storage medium to address the above drawbacks.
In a first aspect, an embodiment of the present application provides a data processing method applied to a server of a data processing system, the server being connected to a plurality of client devices, the plurality of client devices including a first client device. The method comprises the following steps: receiving model training information sent by the plurality of client devices, the model training information being the network model parameters obtained by each client device training, with its own data, the server model sent by the server; determining a second client device associated with the first client device; training the server model corresponding to the first client device with the model training information of the second client device to obtain a target network model; and sending the model parameters of the target network model to the first client device and instructing the first client device to update the client model corresponding to the first client device with the model parameters of the target network model.
In a second aspect, an embodiment of the present application provides a data processing method applied to a first client device of a data processing system, the data processing system further comprising a server connected to a plurality of client devices. The method comprises the following steps: receiving model parameters of a target network model sent by the server, the target network model being obtained by the server by determining a second client device associated with the first client device and training the server model corresponding to the first client device with the model training information of the second client device, where the model training information is the network model parameters obtained by each client device training, with its own data, the server model sent by the server; and updating the client model corresponding to the first client device with the model parameters of the target network model.
In a third aspect, an embodiment of the present application further provides a data processing apparatus applied to a server of a data processing system, the server being connected to a plurality of client devices including a first client device. The apparatus comprises a receiving module, a determining module, a training module and an updating module. The receiving module is configured to receive model training information sent by the plurality of client devices, the model training information being the network model parameters obtained by each client device training, with its own data, the server model sent by the server. The determining module is configured to determine a second client device associated with the first client device. The training module is configured to train the server model corresponding to the first client device with the model training information of the second client device to obtain a target network model. The updating module is configured to send the model parameters of the target network model to the first client device and instruct the first client device to update the client model corresponding to the first client device with the model parameters of the target network model.
In a fourth aspect, an embodiment of the present application further provides a data processing system comprising a server, a first client device and a second client device. The server comprises one or more processors, a memory, and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of the first aspect. The first client device comprises one or more processors, a memory, and one or more applications, wherein the one or more applications in the first client device are stored in the memory of the first client device and configured to be executed by the one or more processors in the first client device, the one or more applications in the first client device being configured to perform the method of the second aspect.
In a fifth aspect, an embodiment of the present application further provides a computer-readable storage medium having program code stored therein, the program code being callable by a processor to execute the above method.
According to the data processing method, apparatus, system and readable storage medium, when model training information sent by the client devices is received, a second client device associated with a first client device is determined, the server model corresponding to the first client device is trained with the model training information of the second client device to obtain a target network model, the model parameters of the target network model are sent to the first client device, and the first client device is instructed to update the client model corresponding to the first client device with those model parameters. Because the client model is trained with model training information rather than raw training data, the client devices never upload their training data, which protects user privacy, and the private data of multiple parties can still be used safely and comprehensively to jointly train the client model corresponding to the first client device. Moreover, because the model training information comes from a second client device associated with the first client device, updating the client model corresponding to the first client device with this information speeds up its convergence and addresses the learning problem posed by non-independent and identically distributed (non-IID) data.
Additional features and advantages of embodiments of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of embodiments of the present application. The objectives and other advantages of the embodiments of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 shows a schematic block diagram of a data processing system;
FIG. 2 is a block diagram of a data processing system configured in accordance with a horizontal federated learning framework;
FIG. 3 is a schematic diagram illustrating a distribution of horizontal federated learning data to which embodiments of the present application relate;
FIG. 4 illustrates a method flow diagram of a data processing method provided by one embodiment of the present application;
fig. 5 is a schematic diagram illustrating data stored in a first device in a data processing method according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating data stored in a second device in the data processing method according to an embodiment of the present application;
fig. 7 is a schematic diagram illustrating data stored in a third device in the data processing method according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating data interaction between a server and a plurality of client devices in a data processing method according to an embodiment of the present application;
FIG. 9 is a method flow diagram of a data processing method provided by another embodiment of the present application;
fig. 10 is a flowchart illustrating the sub-steps of step S220 in a data processing method according to another embodiment of the present application;
fig. 11 is a flowchart illustrating the sub-steps of step S240 in the data processing method according to another embodiment of the present application;
FIG. 12 is a method flow diagram of a data processing method provided by yet another embodiment of the present application;
FIG. 13 is a method flow diagram of a data processing method according to yet another embodiment of the present application;
fig. 14 is a block diagram illustrating a structure of a data processing apparatus according to an embodiment of the present application;
FIG. 15 is a block diagram illustrating a data processing system provided by an embodiment of the present application;
fig. 16 illustrates a storage unit provided in an embodiment of the present application and used for storing or carrying program codes for implementing a data processing method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
For convenience of understanding, terms referred to in the embodiments of the present application will be described below.
1) Artificial Intelligence (AI)
Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technology. The basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
2) Machine Learning (Machine Learning, ML)
Machine learning is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from teaching.
3) Attention mechanism (Attention)
The attention mechanism mimics the internal process of biological observation: it aligns internal experience with external sensation to increase the fineness of observation in particular regions, allowing high-value information to be quickly filtered out of a large amount of information with limited attention resources. The main idea of the attention mechanism is to ignore irrelevant information and focus on important information. Attention mechanisms can be divided into hard attention and soft attention: a hard attention mechanism distinguishes attention regions from non-attention regions, representing them with 1 and 0 respectively, and is mainly used in reinforcement learning; a soft attention mechanism represents the degree of attention given to each region with a continuous value between 0 and 1. With a soft attention mechanism, important information can be screened by combining channel attention and spatial attention in a three-dimensional convolutional neural network.
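As an illustration of the soft attention mechanism just described (not part of the patent text), the following minimal numpy sketch turns illustrative relevance scores into continuous attention weights between 0 and 1 and uses them to re-weight region features; the shapes, scores and function name are assumptions made for the example.

```python
import numpy as np

def soft_attention(features, scores):
    # Softmax produces continuous weights in (0, 1) that sum to 1; a hard
    # attention mechanism would instead emit 0/1 masks for each region.
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ features, weights

features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three regions, two channels
scores = np.array([0.1, 2.0, -1.0])                        # relevance of each region
context, weights = soft_attention(features, scores)
print(weights.round(3), context.round(3))
```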
4) Federal Learning (Federated Learning)
Federated learning is also known as federated machine learning, joint learning, or alliance learning. It is a distributed machine learning framework, commonly built on cloud technology, consisting of a server and a plurality of client devices, where each client device keeps its own training data locally and the server and every client device hold models with the same architecture. Training machine learning models under this framework can effectively solve the data-silo problem: participants can model jointly without sharing data, technically breaking down data silos and enabling AI collaboration.
Federated learning can include horizontal federated learning (HFL), vertical federated learning (VFL) and federated transfer learning (FTL). The three elements of federated learning are the data sources, the federated learning system and the users. In horizontal federated learning the participants' data features are the same or similar, so the features overlap heavily while the users overlap little; in vertical federated learning the participants serve the same or similar users, so the users overlap heavily while the features overlap little; federated transfer learning addresses the case where both the features and the users overlap little.
5) General Data Protection Regulation (GDPR)
The General Data Protection Regulation is a European Union privacy and data protection regulation. It requires more thorough privacy protection measures in company systems, more detailed data protection agreements, and more user-friendly and detailed disclosures about a company's privacy and data protection practices.
6) Cloud Technology (Cloud Technology)
Cloud technology is a hosting technology that unifies hardware, software, network and other resources in a wide area network or local area network to realize the computation, storage, processing and sharing of data. It is a general term for the network technology, information technology, integration technology, management platform technology, application technology and so on that are applied on the basis of the cloud computing business model; these resources can form a resource pool and be used on demand, flexibly and conveniently. Cloud computing technology will become an important support for the background services of technical network systems, such as video websites, picture websites and other web portals, which require large amounts of computing and storage resources. With the rapid development of the internet industry, each article may carry its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industry data require strong back-end system support, which can only be provided through cloud computing.
7) Independent and Identically Distributed (IID)
Independent and identically distributed means that each variable in a set of random variables has the same probability distribution and the variables are mutually independent. A set of random variables being independent and identically distributed does not mean that every event in the sample space has the same probability of occurring. For example, the sequence of results from repeatedly rolling a loaded (non-uniform) die is independent and identically distributed, but the probability of each face coming up is not the same.
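The loaded-die example above can be reproduced with a short numpy sketch; the face probabilities, sample size and variable names are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Each roll is drawn independently from the SAME non-uniform distribution,
# so the sequence of rolls is i.i.d. even though the faces are not equally likely.
rng = np.random.default_rng(0)
face_probs = np.array([0.05, 0.05, 0.10, 0.10, 0.20, 0.50])  # a loaded six-sided die

rolls = rng.choice(np.arange(1, 7), size=10_000, p=face_probs)

# Empirical frequencies approach face_probs, not the uniform 1/6 for each face.
for face in range(1, 7):
    print(face, round(float(np.mean(rolls == face)), 3))
```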
With the spread of 5G (fifth-generation mobile communication) technology and the continuous upgrade of the computing capability of mobile internet devices, mobile terminals have begun to carry more and more complex model computation, so that the needs of different users in different scenarios can be met in a customized way. When training a machine learning model, using data generated by only one user does not yield a high-quality model, so the data of different users need to be combined for training. However, with the promulgation of the General Data Protection Regulation (GDPR) and the Data Security Law of the People's Republic of China, data protection has been strengthened and the data-silo problem has become pressing. Horizontal federated learning allows the data of a client device to stay local instead of being sent out while still achieving the effect of model training, but existing horizontal federated learning is only applicable to independent and identically distributed (IID) scenarios. If the data are non-independent and identically distributed (Non-IID), the prior art cannot handle the situation well, and under non-IID conditions the uneven data distribution slows down the convergence of model training.
In view of the above problems, the inventor proposes the data processing method, apparatus, system and storage medium of the embodiments of the present application, in which a client model is trained with the model training information of client devices. Because a client device does not need to upload its training data, user privacy is protected, and because the model training information used is that of a second client device associated with the first client device, the private data of multiple parties can be used safely and comprehensively to jointly train the client model corresponding to the first client device. At the same time, since the model training information comes from a second client device associated with the first client device, updating the client model corresponding to the first client device with this information speeds up its convergence, that is, it solves the learning problem of non-independent and identically distributed data. Specific data processing methods are described in detail in the following embodiments.
FIG. 1 is a block diagram illustrating a data processing system in accordance with an exemplary embodiment. The system comprises a server 101 and a plurality of client devices 102. In some scenarios, the server 101 may be referred to as a central server, and the server may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), and a big data and artificial intelligence platform.
The client device 102 may be an electronic device, which may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The server 101 and the client device 102 may be directly or indirectly connected through a communication manner of a wired network or a wireless network, and the present application is not limited thereto.
Optionally, the wireless network or wired network described above uses standard communication techniques and/or protocols. The Network is typically the Internet, but may be any Network including, but not limited to, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wireline or wireless Network, a private Network, or any combination of virtual private networks. In some embodiments, data exchanged over a network is represented using techniques and/or formats including Hypertext Mark-up Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.
The data processing scheme provided by the application is particularly applicable to horizontal federated learning, which may also be called feature-aligned federated learning, meaning that the data features of the participants are aligned; horizontal federation can increase the total number of training samples. When samples with the same features are distributed across different participants, horizontal federated learning is an algorithm that can guarantee the data privacy of each party while making comprehensive use of all the data.
FIG. 2 is a block diagram of a data processing system configured according to an exemplary embodiment based on a horizontal federated learning framework. Referring to fig. 2, the data processing system consists of a server 101 and a plurality of client devices 102, where the plurality of client devices 102 may be client device 1, client device 2, client device 3, and so on. Each client device 102 comprises at least a data storage 1021 configured to store data generated by that client device 102 and to construct a training data set based on these data, which may be referred to as the client device's own data; at least one client model 1022 may be trained with these data. The at least one client model 1022 may be a preset learning model. The client models 1022 may be trained on the data stored in the data storage 1021, and the model training information corresponding to each trained client model 1022 may then be uploaded to the server 101. The server 101 fuses the model training information uploaded for the client models to obtain the latest network model and re-issues it to the client devices 102; model training stops once the client models in the client devices meet the convergence condition.
As an example, fig. 3 is a schematic diagram of the distribution of horizontal federated learning data related to the present application. The data processing system may include a plurality of client devices, which may be client device 1, client device 2 and client device 3 as shown in FIG. 3. The data stored in client device 1 is a first data set 11 comprising samples u1 to u3; the data stored in client device 2 is a second data set 12 comprising samples u4 to u6; the data stored in client device 3 is a third data set 13 comprising samples u7 to u9; in each data set every sample has characteristic data comprising X1 to X4. Through horizontal federated learning the data of the whole federation can be expanded to include samples u1 to u9, each sample having characteristic data comprising X1 to X4. A sketch of this layout is given below.
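As a reading aid for fig. 3 (not part of the patent text), the following sketch lays out the horizontally partitioned data sets in pandas; only the sample names u1 to u9 and feature names X1 to X4 come from the figure, everything else is an assumption of the example.

```python
import pandas as pd

# All three client devices hold the SAME feature columns X1..X4 but DIFFERENT samples.
features = ["X1", "X2", "X3", "X4"]

first_data_set = pd.DataFrame(index=["u1", "u2", "u3"], columns=features)   # client device 1
second_data_set = pd.DataFrame(index=["u4", "u5", "u6"], columns=features)  # client device 2
third_data_set = pd.DataFrame(index=["u7", "u8", "u9"], columns=features)   # client device 3

# Horizontal federated learning effectively enlarges the training set to u1..u9
# without any device sharing its raw rows; the concatenation below is only a
# conceptual view, never materialized in the federated setting.
virtual_union = pd.concat([first_data_set, second_data_set, third_data_set])
print(virtual_union.index.tolist())   # ['u1', ..., 'u9']
```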
Referring to fig. 4, fig. 4 is a schematic flow chart illustrating a data processing method according to an embodiment of the present application. In a specific embodiment, the data processing method is applied to the data processing apparatus 500 shown in fig. 14 and to the server 610 shown in fig. 15. As will be described in detail with respect to the flow shown in fig. 4, the data processing method may specifically include step S110 to step S140.
Step S110: and receiving model training information sent by the plurality of client devices.
In this embodiment of the application, an application scenario of the data processing method may be horizontal federal learning, and the method may be applied to a server of a data processing system as shown in fig. 1 and fig. 2, where the server may be connected to a plurality of client devices, where the plurality of client devices may include a first client device, that is, the server may be connected to the plurality of client devices including the first client device through a wired or wireless network, and the server may send data to the plurality of client devices or receive data transmitted by the plurality of client devices.
In some embodiments, the server may receive model training information sent by a plurality of client devices, where the model training information may be the network model parameters obtained by each client device training, with its own data, the server model sent by the server; different client devices may store different data, and the features contained in those data may be the same or different. In addition, the client device's own data may be in the format of images, speech, text, or the like, and the data format of each client device may be the same; the specific data format is not limited here and may be chosen according to the actual situation.
In most cases, however, the data contained in different client devices are not the same, and the features contained in the data may differ; this type of data distribution may be referred to as non-independent and identically distributed (non-IID). Data are non-IID when, during joint training, the data owned by each client device exhibit different feature distributions or label distributions, because the users corresponding to each data owner's samples are different, the users are located in different regions, the time windows for data collection differ, and the features are not mutually independent.
To understand more clearly the difference between non-IID and IID data, the diagrams shown in fig. 5 to 7 are given. Fig. 5 shows the data stored in the first device, which may include a first object 111 and a second object 112, where the first object 111 is a dog and the second object 112 is a cat. Fig. 6 shows the data stored in the second device, which include a third object 113, the third object 113 being a dog. Comparing fig. 5 and fig. 6, both the data of the first device and the data of the second device contain a dog, i.e., the data contained in the first client device and in the second client device are independent and identically distributed. Fig. 7 shows the data stored in the third device, which include a fourth object 114, a fifth object 115, a sixth object 116 and so on, which together form a landscape. For example, the fourth object 114 may be a river, the fifth object 115 may be grass, and the sixth object 116 may be the sky; the river, grass and sky together form the landscape shown in fig. 7. Compared with fig. 5 or fig. 6, fig. 7 is a different kind of image: fig. 7 is a landscape image while fig. 5 and fig. 6 are animal images, so fig. 5 and fig. 7, or fig. 6 and fig. 7, are non-independent and identically distributed data.
In one approach, when the server obtains the network model parameters uploaded by different client devices, directly fusing them with a single global model may cause the client models not to converge, or to converge only after a long time, mainly because the network model parameters were trained on non-IID data. To solve this problem, in the embodiments of the present application the second client device associated with the first client device is obtained, the server model corresponding to the first client device is trained with the model training information of the second client device, and through continuous updating and iteration the model training information of each client device can be used quickly and effectively.
As an example, if a first client device is associated with a second client device, a server model corresponding to the first client device may be trained by using model training information of the second client device, and if the first client device is not associated with the second client device, the server model corresponding to the first client device is not trained by using model training information of the second client device. Because the information for training the server model corresponding to the first client device is the model training information of the device associated with the first client device, the non-convergence of the model can be avoided to a certain extent, and the convergence speed of the model is increased.
In some implementations, the model training information sent by a client device may be at least one of model gradient data, model parameters and a complete client model. If the model training information is model parameters, each client device may train the server model sent by the server with its own data to obtain a client model, and the parameters of each client model are the model parameters, which may include the weights, bias values and so on of the client model. In addition, after obtaining the client model, the client device may add random noise to the generated model parameters of each client model through a differential privacy mechanism and send the noise-added model parameters to the server as the model training information, as sketched below.
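A minimal sketch of the noise-adding step described above, assuming Gaussian noise and made-up parameter shapes; a real differential privacy mechanism would calibrate the noise to a clipping bound and privacy budget, which the patent does not specify.

```python
import numpy as np

def add_dp_noise(model_params, noise_scale=0.01, seed=None):
    """Add random noise to each parameter tensor before it leaves the client device.

    The Gaussian distribution and the noise scale are illustrative assumptions;
    a production mechanism would calibrate them to a clipping bound and a
    privacy budget (epsilon, delta).
    """
    rng = np.random.default_rng(seed)
    return {name: value + rng.normal(0.0, noise_scale, size=value.shape)
            for name, value in model_params.items()}

# Example: weights and bias of a locally trained client model (shapes are made up).
client_model_params = {"weight": np.random.randn(4, 2), "bias": np.zeros(2)}
model_training_info = add_dp_noise(client_model_params, noise_scale=0.01, seed=42)
```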
In other embodiments, each client device may train a server model sent by the server through its own data, generate an intermediate model gradient, and then send the model gradient to the server as model training information. In addition, when the client device acquires the intermediate model gradient, random noise can be added to each generated model gradient through a differential privacy mechanism, and the model gradient with the noise added is sent to the server as model training information.
In other embodiments, each client device may train a server model sent by the server through its own data, generate a client model, and then send each client model to the server as model training information. In addition, when the client device acquires the client model, random noise can be added to the generated client model through a differential privacy mechanism, and the client model with the noise added is used as model training information and sent to the server.
In addition, in order to ensure the security of the model training information, when the client device obtains the model gradient data, the model parameters or the complete client model, it may encrypt them with a homomorphic encryption method and send the encrypted data to the server as the model training information. Homomorphic encryption is a cryptographic technique based on the computational complexity theory of mathematical problems, in which the encryption operation commutes with arithmetic operations: for data A, encrypted data B is generated by homomorphic encryption, and performing operations such as addition and subtraction on the encrypted data B yields the same result as performing the same addition and subtraction on data A and then encrypting it. Homomorphic encryption is generally divided into fully homomorphic encryption and semi-homomorphic encryption: fully homomorphic encryption supports both additive and multiplicative homomorphism, while semi-homomorphic encryption only satisfies one of the two.
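As an illustration of the additive homomorphic property described above, the sketch below uses the open-source python-paillier package (`phe`); the choice of package and the exact calls are assumptions of this example rather than anything prescribed by the patent, and any additively homomorphic scheme would demonstrate the same point.

```python
# Requires the python-paillier package (pip install phe); Paillier is an
# additively homomorphic (semi-homomorphic) scheme.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

a, b = 17, 25
enc_a = public_key.encrypt(a)
enc_b = public_key.encrypt(b)

# Adding the two ciphertexts and then decrypting gives the same result as
# adding the plaintexts first: the encryption commutes with addition.
assert private_key.decrypt(enc_a + enc_b) == a + b
```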
In other embodiments, in order to ensure the security of the model training information, when the client device obtains the model gradient data, the model parameters or the complete client model, it may also encrypt them with a secure multi-party computation (MPC) method and send the encrypted data to the server as the model training information. Secure multi-party computation is a distributed computation technique for multi-party cooperation: when multiple data participants perform a common computation, participants that do not trust each other can still obtain the required computation results without leaking the original data. Secure multi-party computation uses different protocols for different applications, including oblivious transfer protocols, secret sharing protocols and so on; the data participants may be the plurality of client devices in this embodiment. A minimal secret-sharing sketch is given below.
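A minimal additive secret-sharing sketch, one of the simplest secret-sharing protocols of the kind mentioned above; the modulus, number of parties and function names are illustrative assumptions.

```python
import random

# A client splits its private value into random shares; no single share reveals
# the value, but the shares sum back to it modulo Q.
Q = 2**61 - 1  # illustrative modulus

def share(secret, n_parties=3):
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

secret = 42
assert reconstruct(share(secret)) == secret

# Because reconstruction is a plain sum, parties can add their shares of two
# different secrets locally and only ever reveal the sum of the secrets.
shares_a, shares_b = share(10), share(32)
summed = [(sa + sb) % Q for sa, sb in zip(shares_a, shares_b)]
assert reconstruct(summed) == 42
```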
In some embodiments, the server may deploy a plurality of server models, and the number of server models may be determined by the number of client devices, that is, one server model may be deployed per client device. For example, if there are three client devices, the server may correspondingly deploy three server models, each corresponding to one client device: client device 1 corresponds to server model 1, client device 2 corresponds to server model 2, and client device 3 corresponds to server model 3.
In other embodiments, the multiple server-side models deployed by the server may be Neural Network models of the same structure or mathematical models, and the Neural Network models may include Deep Neural Network (DNN) models, Recurrent Neural Network (RNN) models, embedding (embedding) models, Gradient Boosting Decision Tree (GBDT) models, and the like, and the mathematical models include linear models, Tree models, and the like, which are not listed herein any more. It should be noted that, the roles of the multiple server-side models deployed by the server are also the same, for example, the server-side models are all used for image recognition, or all used for voice recognition, etc.
In other embodiments, when the server obtains the server models, it may first randomly initialize every model parameter in each server model to obtain the initialized server models, and when issuing model training parameters to each client device currently participating in horizontal federated learning, it issues the initialized server model to the corresponding client device. For example, the initialized server model 1 is issued to client device 1, the initialized server model 2 is issued to client device 2, and the initialized server model 3 is issued to client device 3; a sketch of this per-client initialization and dispatch is given below.
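A sketch of the per-client initialization and dispatch just described, assuming a toy model consisting of one weight matrix and one bias vector; the client identifiers, dimensions and function names are assumptions of the example.

```python
import numpy as np

# The server keeps one server model per client device and randomly initializes
# each one before issuing it; the real model structure is whatever network the
# system actually uses.
rng = np.random.default_rng(0)
client_ids = ["client_1", "client_2", "client_3"]

def init_server_model(input_dim=4, output_dim=2):
    return {"weight": rng.normal(0.0, 0.1, size=(input_dim, output_dim)),
            "bias": np.zeros(output_dim)}

server_models = {cid: init_server_model() for cid in client_ids}

def issue_models(server_models):
    # Stand-in for sending each initialized server model to its client device.
    for cid, model in server_models.items():
        print(f"send to {cid}: weight {model['weight'].shape}, bias {model['bias'].shape}")

issue_models(server_models)
```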
As a mode, after the server issues a plurality of initialized server models to corresponding client devices, the server may instruct the client devices to train the server models by using their own data, so as to obtain model training information corresponding to different client devices. As can be appreciated from the above description, the data contained by different client devices is not the same. For example, data corresponding to the client device 1 is data 1, data corresponding to the client device 2 is data 2, and data corresponding to the client device 3 is data 3.
Specifically, after receiving server model 1 sent by the server, client device 1 may train server model 1 with data 1 to obtain client model 1, and client device 1 may then upload the model training information 1 corresponding to client model 1 to the server; after receiving server model 2, client device 2 may train it with data 2 to obtain client model 2 and upload the corresponding model training information 2 to the server; after receiving server model 3, client device 3 may train it with data 3 to obtain client model 3 and upload the corresponding model training information 3 to the server. The above description is merely an example and does not limit the present application. A sketch of such client-side local training follows.
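The client-side step described above can be sketched as follows, using a logistic-regression-style model trained for a few gradient steps as a stand-in for the real client model; the data, learning rate, epoch count and function names are illustrative assumptions.

```python
import numpy as np

def local_train(server_model, features, labels, lr=0.1, epochs=5):
    """Client-side training of the received server model on the device's own data.

    The returned parameters are the "model training information" that the client
    device uploads, optionally after adding noise or encrypting as discussed above.
    """
    w, b = server_model["weight"].copy(), server_model["bias"].copy()
    for _ in range(epochs):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels                       # gradient of binary cross-entropy
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean(axis=0)
    return {"weight": w, "bias": b}

# Example: client device 1 trains server model 1 with its own data (data 1).
rng = np.random.default_rng(1)
features = rng.normal(size=(32, 4))
labels = (features[:, :1] > 0).astype(float)        # toy labels, shape (32, 1)
server_model_1 = {"weight": rng.normal(0.0, 0.1, size=(4, 1)), "bias": np.zeros(1)}
model_training_info_1 = local_train(server_model_1, features, labels)
```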
Step S120: a second client device associated with the first client device is determined.
In this embodiment of the application, the first client device may be any one of the plurality of client devices, and there may be several first client devices at the same time; for example, in the example shown in fig. 8, client C1 can serve as a first client device and client C2 can also serve as a first client device, and so on. In addition, before determining the second client device associated with the first client device, the embodiment of the present application may first obtain the working state of each client device and then treat a client device whose working state meets a preset working state as the first client device, where the working state may be the current power consumption of each client device. For example, a client device may be considered a first client device upon determining that its power consumption is less than a preset power consumption.
In another implementation, the working state of the client device may also include an application program currently running by the client device, and specifically, in this embodiment of the present application, it may be determined whether the application program currently running by the client device is a specified application program, if the application program currently running by the client device is the specified application program, the client device is not taken as the first client device, and if the application program currently running is not the specified application program, the client device may be taken as the first client device. For example, the currently running program is a game program, and since the power consumption of the game program is generally large, the client device is not regarded as the first client device when it is determined that the client device runs the game program. In addition, the designated application program may be set by the user according to the requirement of the user, or may be preset by the client device before shipment.
In other embodiments, when a plurality of client devices whose operating states conform to the preset operating state are provided, in the embodiments of the present application, all of the plurality of client devices may be regarded as first client devices, and then a second client device associated with each first client device is obtained. It should be noted that, in the embodiment of the present application, the second client device associated with each first client device may be acquired in parallel, or the second client device associated with each first client device may be acquired in series. Specifically, the server may obtain second client devices associated with each first client device at the same time, and then train the server model corresponding to each first client device by using the model training information of each second client device, respectively, to obtain a plurality of target network models. Or, the server may also respectively traverse each first client device, and end the processing of the data until all the first client devices have been traversed.
In some implementations, the second client device may be a client device, other than the first client device, that is associated with the first client device, where the association may be that the type of data stored by the first client device is similar to the type of data stored by the second client device, for example that the subject type of the data stored by the first client device is similar to the subject type of the data stored by the second client device. Specifically, the embodiment of the present application may determine whether the data labels stored by the first client device are similar to the data labels stored by the second client device; if so, the first client device is associated with the second client device. For example, if client C1 is a first client device whose stored data has the subject type animal, client C2 stores data whose subject type is scenery, and client C3 stores data whose subject type is animal, then client C3 may be called a second client device associated with client C1. A sketch of such a label-similarity check is given below.
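One possible way to realize the label-similarity check described above is sketched below using Jaccard similarity over each client's label set; the metric, threshold and toy label sets are assumptions of the example, as the patent does not prescribe a particular similarity measure.

```python
def label_similarity(labels_a, labels_b):
    """Jaccard similarity between two clients' label sets (illustrative metric)."""
    a, b = set(labels_a), set(labels_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Toy label sets matching the C1/C2/C3 example above.
client_labels = {
    "C1": {"dog", "cat"},             # subject type: animal
    "C2": {"river", "grass", "sky"},  # subject type: scenery
    "C3": {"dog", "rabbit"},          # subject type: animal
}

def associated_clients(first_client, all_labels, threshold=0.2):
    # Any other client whose label set is sufficiently similar is treated as a
    # second client device associated with the first client device.
    return [cid for cid, labels in all_labels.items()
            if cid != first_client
            and label_similarity(all_labels[first_client], labels) >= threshold]

print(associated_clients("C1", client_labels))   # ['C3'] with these toy labels
```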
In this embodiment of the application, the number of the second client devices associated with the first client device may be 1, may be multiple, or may be absent, where the absence refers to that the second client device associated with the first client device does not exist in the current time period. If it is determined that the second client device associated with the first client device does not exist in the current time period, the embodiment of the present application may continuously detect, that is, determine whether the second client device associated with the first client device exists at the next time.
Specifically, the embodiment of the application may monitor updates to the data of the other client devices and, when it is determined that the data of the other client devices have been updated, re-determine whether a second client device associated with the first client device exists. The main reason is that the data of different client devices may change continuously with the habits of their users. For example, the user of client C1 may prefer landscape images in a first time period but animal images in a second time period, which also causes the data to change continuously with the user's preference.
In some embodiments, when the server receives model training information sent by multiple client devices, it may determine a second client device associated with a first client device, and in particular, in this embodiment of the present application, the second client device associated with the first client device may be determined according to the model training information, the second client device associated with the first client device may be determined according to different categories of the client devices, or the second client device associated with the first client device may be determined according to a geographic location between the client devices, and the like.
In this embodiment of the application, the server may determine in parallel the second client devices associated with different client devices; specifically, the server treats all the client devices as first client devices at the same time and then determines, for each first client device, its associated second client device. For example, the second client device associated with client C1, the second client device associated with client C2, and the second client device associated with client C3 may all be determined in parallel, and so on.
In addition, the server can also determine the second client devices associated with different client devices serially, for example first determining the second client device associated with client C1, then determining the second client device associated with client C2, then determining the second client device associated with client C3, and so on. The specific manner in which the second client devices associated with different client devices are determined is not restricted here and may be chosen according to the actual situation.
Step S130: and training the server model corresponding to the first client device by using the model training information of the second client device to obtain a target network model.
In some embodiments, after obtaining the second client device associated with the first client device, the server may train the server model corresponding to the first client device with the model training information of the second client device to obtain the target network model. Specifically, the server may determine whether there are multiple second client devices; if so, it determines weight values for the different pieces of model training information, obtains the average of the second client devices' model training information by weighted averaging, and finally uses this average as the network model parameters of the target network model.
As an embodiment, if the model training information of the second client device is gradient information, the server may calculate an average value of gradient values in each gradient information in a weighted average manner, and update a result as a parameter of the server model corresponding to the first client, that is, use the average value of gradient values in each gradient information as a gradient value of the target network model. If the model training information of the second client device is the model parameter information, the server may calculate the average value of the model parameters in each model parameter information in a weighted average manner, and update the result as the parameter of the server model corresponding to the first client, that is, use the average value of the model parameters in each model parameter information as the model parameter of the target network model.
Optionally, the weight of each second client device used in the weighted-average algorithm may be set in advance according to specific needs; for example, the weight may be set according to the proportion of local training data held by each participating device, with a participating device holding more data being given a higher weight. The weight may also be determined by the degree of association between the second client device and the first client device: the greater the degree of association, the larger the corresponding weight value may be. An aggregation sketch along these lines is given below.
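A sketch of the weighted-average aggregation described above; the parameter names, weights and shapes are illustrative assumptions, and the weights could equally be derived from local data volume or from the degree of association with the first client device.

```python
import numpy as np

def aggregate(training_infos, weights):
    """Weighted average of the associated second client devices' parameters.

    `training_infos` maps client id -> {"weight": ..., "bias": ...}; `weights`
    maps client id -> aggregation weight. The result is used as the parameters
    of the target network model for the first client device.
    """
    total = sum(weights.values())
    param_names = next(iter(training_infos.values())).keys()
    return {name: sum(weights[cid] * training_infos[cid][name]
                      for cid in training_infos) / total
            for name in param_names}

# Toy example: two associated second client devices with made-up parameters.
infos = {
    "C3": {"weight": np.ones((4, 1)), "bias": np.zeros(1)},
    "C5": {"weight": 3 * np.ones((4, 1)), "bias": np.ones(1)},
}
target_params = aggregate(infos, weights={"C3": 2.0, "C5": 1.0})
print(target_params["weight"][0], target_params["bias"])   # about [1.667] [0.333]
```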
As can be known from the above description, the first client device may be any one of a plurality of client devices, that is, in this embodiment, each client device may serve as the first client device, and may also serve as a second client device of other client devices. Therefore, the number of the finally obtained target network models may also be multiple, that is, each first client device may correspond to one target network model, and if the server obtains multiple target network models, the server may send the model parameters of the target network models to the corresponding first client devices, and instruct the first client devices to update the client models corresponding to the first client devices by using the model parameters of the target network models, that is, step S140 is performed.
Step S140: and sending the model parameters of the target network model to the first client equipment, and instructing the first client equipment to update the client model corresponding to the first client equipment by using the model parameters of the target network model.
In this embodiment of the application, after obtaining the target network model, the server may send the model parameters of the target network model to the first client device and instruct the first client device to update the client model corresponding to the first client device with those model parameters; the client model corresponding to the first client device may be called the first client model and may be a model stored by the first client device. In other words, the client model corresponding to the first client device is mainly the model obtained by the first client device, after receiving the server model sent by the server, by training that server model with its own data.
Specifically, the server may instruct the first client device to perform iterative training on the client model corresponding to the first client device by using its own data, and the first client device may optimize the client model corresponding to the first client device by using the model parameters of the target network model during the training process. The client model corresponding to the first client device is the model corresponding to the model training information sent by the first client device. The model parameters of the target network model may be a weight, a bias, a loss, and the like of the target network model, or may be a gradient of the target network model. Optionally, in the embodiment of the present application, the target network model itself may also be sent directly to the first client device.
In other embodiments, in order to ensure data security, before sending the model parameters of the target network model to the first client device, the server may encrypt the model parameters of the target network model and then send the encrypted model parameters to the first client device. Specifically, the embodiment of the application may encrypt the model parameters of the target network model by using a homomorphic encryption method and send the encrypted model parameters to the first client device. In addition, in the embodiment of the present application, the model parameters of the target network model may also be encrypted in a differential-privacy manner; which manner is used to encrypt the model parameters of the target network model may be selected according to the actual situation and is not explicitly limited here.
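As a minimal, hedged sketch of sending homomorphically encrypted model parameters, assuming the third-party python-paillier (phe) package is available; Paillier encryption is additively homomorphic, and the simplified key handling shown here (one party generating both keys) is for illustration only, not a statement of the patent's key-management scheme:

    from phe import paillier

    # Generate a Paillier key pair (real deployments would handle key distribution carefully).
    public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

    # Flattened model parameters of the target network model.
    model_params = [0.12, -0.53, 0.07]
    encrypted_params = [public_key.encrypt(p) for p in model_params]
    # ... transmit encrypted_params to the first client device ...

    # The holder of the private key can recover the plaintext parameters.
    decrypted_params = [private_key.decrypt(c) for c in encrypted_params]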
According to the data processing method provided by the embodiment of the application, when model training information sent by the client devices is received, a second client device associated with a first client device is determined, the server model corresponding to the first client device is trained by using the model training information of the second client device to obtain a target network model, the model parameters of the target network model are sent to the first client device, and the first client device is instructed to update the client model corresponding to the first client device by using the model parameters of the target network model. Because the client devices do not need to upload training data, training the client model with the model training information of the client devices protects user privacy, and the client model corresponding to the first client device can be safely and comprehensively trained by jointly using the private data of multiple parties. Meanwhile, because the model training information used is that of the second client device associated with the first client device, updating the client model corresponding to the first client device with this information can accelerate the convergence of the client model corresponding to the first client device and alleviate the learning problem of non-independently and identically distributed (non-IID) data.
Referring to fig. 9, the data processing method may include steps S210 to S260.
Step S210: and receiving model training information sent by the plurality of client devices.
Step S220: a second client device associated with the first client device is determined.
In some embodiments, the server may obtain the second client device associated with the first client device by using an attention mechanism. Referring to fig. 10, step S220 may include steps S221 to S223.
Step S221: determining an information difference between the model training information of the first client device and the model training information of each of the other client devices using an attention mechanism.
By one approach, embodiments of the present application may utilize an attention mechanism to determine an information difference between model training information of a first client device and model training information of each other client device, where the information difference may be a difference value between model information of the other client devices and model information of the first client device. If the model training information is a model parameter, the server may determine an information difference between the model parameter of the first client device and the model parameters of the other client devices. For example, a difference value between the weight value of the client model corresponding to the first client device and the weight values of the other client models is obtained, and the difference value is used as the information difference. Alternatively, if the model training information is a gradient value, the server may determine an information difference between the gradient value of the first client device and the gradient values of the other client devices.
In the embodiment of the present application, the attention function may be written as A(w_i, w_j), where A denotes the attention function, w_i denotes the model training information of the first client device, w_j denotes the model training information of the other client devices, and σ is a constant appearing in the function. Consistent with the description below (the value of A(·) approaches 0 when the two pieces of training information are similar and approaches 1 when they differ greatly), A may, for example, take a negative-exponential form such as A(w_i, w_j) = 1 − e^(−‖w_i − w_j‖² / σ). In addition, σ may be a hyper-parameter, which may be set by the user according to the training condition of the model.
Step S222: and determining the association degree between the first client device and each other client device according to the information difference.
In this embodiment of the application, if the server obtains the information difference between the model training information of the first client device and the model training information of each of the other client devices, it may determine the association degree between the first client device and each of the other client devices according to the information difference, where a larger information difference indicates a larger difference between the first client device and the other client devices, and a smaller information difference indicates a smaller difference between the first client device and the other client devices.
In this embodiment of the application, the larger the information difference between the first client device and the other client devices is, the smaller the association degree between the first client device and the other client devices is, and the smaller the information difference between the first client device and the other client devices is, the larger the association degree between the first client device and the other client devices is.
In some embodiments, after determining the association degree between the first client device and each of the other client devices according to the information difference, the server may determine whether the association degree between the first client device and the other client devices meets a preset condition, and if the association degree meets the preset condition, the other client devices meeting the preset condition are regarded as the second client device, that is, step S223 is performed, where the preset condition may be that the association degree is greater than the preset association degree. In addition, if the association degree is smaller than the preset association degree, the client device corresponding to the association degree may not be regarded as the second client device.
Step S223: and if the association degree between the first client side equipment and other client side equipment accords with a preset condition, taking the other client side equipment which accords with the preset condition as second client side equipment.
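A minimal sketch of steps S221 to S223 (Python). It assumes the negative-exponential attention form mentioned above, treats the model training information as flat numpy vectors, and uses illustrative values for σ and the preset association threshold:

    import numpy as np

    def information_difference(w_i, w_j, sigma=1.0):
        # Approaches 0 when the two pieces of training information are similar,
        # and 1 when they differ greatly (assumed attention form).
        return 1.0 - np.exp(-np.sum((w_i - w_j) ** 2) / sigma)

    def select_second_clients(first_id, infos, preset_association=0.5, sigma=1.0):
        # infos: dict mapping client id -> model training information (numpy vector).
        w_i = infos[first_id]
        second_clients = []
        for cid, w_j in infos.items():
            if cid == first_id:
                continue
            diff = information_difference(w_i, w_j, sigma)
            association = 1.0 - diff          # smaller difference -> larger association
            if association > preset_association:
                second_clients.append(cid)    # meets the preset condition
        return second_clients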
Step S230: and if the number of the second client devices is multiple, acquiring the association degree between each second client device and the first client device.
Step S240: and acquiring target training information according to the association degree and the model training information of the second client device.
In other embodiments, referring to fig. 11, obtaining target training information according to the association degree and the model training information of the second client device may include steps S241 to S242.
Step S241: and distributing corresponding weight to each second client device according to the association degree between each second client device and the first client device, wherein the weight is positively correlated with the association degree.
In some embodiments, after determining the second client devices associated with the first client device, in this embodiment of the present application, a corresponding weight may be allocated to each second client device according to a degree of association between each second client device and the first client device, where the weight and the degree of association are in positive correlation, and the allocated weights of different second client devices may be different or the same, that is, when there are multiple second client devices with the same degree of association with the first client device, the weights of the multiple second client devices may be the same. In some embodiments, the weights may be assigned by the server using an attention mechanism.
Specifically, when model training information sent by a plurality of client devices is acquired, the server may input the model training information to a weight acquisition model, which takes the model training information of each client device as input and then outputs the weight values of the other client devices relative to the first client device. For example, if client C_i is the first client device, the client devices associated with it may include client C_1, client C_2, ..., and client C_m; the weight of client C_1 relative to client C_i may be β_{i,1}, the weight of client C_2 relative to client C_i may be β_{i,2}, the weight of client C_3 relative to client C_i may be β_{i,3}, and the weight of client C_m relative to client C_i may be β_{i,m}. It can be seen that if the first client device is different, the other client devices associated with the first client device are also different, and their association degrees with the first client device differ as well.
In the embodiment of the present application, the weight of the other client devices associated with the first client device may be computed from the attention function; it may be expressed in terms of α_k and A(w_i, w_j), where α_k denotes the step size of the gradient descent, A denotes the attention function, w_i denotes the model training information of the first client device, and w_j denotes the model training information of the other client devices associated with the first client device.
In other embodiments, the greater the association degree between another client device and the first client device, the greater the corresponding weight value β; that is, the more similar the model training information of the other client device is to the model training information of the first client device, the greater the corresponding β value, and the greater the contribution of that client device's model training information to the target network model, so that similar client devices jointly participate in the update of the server model. If another client device is not associated with the first client device or the association degree is very small, the corresponding weight value β may be approximately 0.
In the embodiment of the present application, the main reason the association degrees between the other client devices and the first client device differ is that the data of each client device is different, where the difference may lie in the type or content of the data. For example, the data in client C_1 is mainly animals, the data in client C_2 is mainly characters, and the data in client C_i is mainly animals. If client C_i is the first client device, then because the data of client C_1 is similar to its own data, client C_1 may serve as a second client device associated with the first client device, whereas client C_2 cannot serve as a second client device because the data it stores is not associated with the first client device.
It should be noted that the second client device associated with the first client device may change as the data of each client device changes. For example, the client device associated with the first client device in a first time period is client C_1, while the client device associated with the first client device in a second time period may become client C_2.
Step S242: and obtaining the target training information according to the weight of each second client device and the model training information.
In some embodiments, after the weight corresponding to each second client device is obtained, the target training information may be obtained according to the weight and the model training information of each second client device in the embodiments of the present application. Specifically, the server may obtain target training information in a weighted summation manner, where a model corresponding to the target training information is a target network model, where the target network model may be a network model corresponding to the first client device.
As one example, denote the target training information by w_target, let the model training information of the second client devices associated with the first client device be w_1, w_2, ..., w_m, and let the weights corresponding to these second client devices be β_{i,1}, β_{i,2}, ..., β_{i,m} respectively (excluding β_{i,i}). The target training information may then be equal to

w_target = β_{i,1}·w_1 + β_{i,2}·w_2 + ... + β_{i,m}·w_m,

wherein β_{i,1} + β_{i,2} + ... + β_{i,m} = 1.
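A short sketch of steps S241 and S242 (Python). Normalizing the association degrees so that the weights sum to 1 is an assumption; the patent only requires the weights to be positively correlated with the association degrees:

    import numpy as np

    def target_training_info(second_infos, associations):
        # second_infos:  list of model training information vectors of the second client devices
        # associations:  association degree of each second client device with the first client device
        a = np.asarray(associations, dtype=float)
        betas = a / a.sum()                    # positively correlated with association, sums to 1
        w_target = sum(b * w_j for b, w_j in zip(betas, second_infos))
        return w_target, betas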
Step S250: and training the server model corresponding to the first client device by using the target training information to obtain a target network model.
In the embodiment of the present application, the target network model undertakes the task of interfacing with and iteratively updating the client model, and it is also a current representation of the client models corresponding to all the second client devices associated with the first client device. Thus, the target network model can be viewed as an information intermediary between the server and the different clients.
Step S260: and sending the model parameters of the target network model to the first client equipment, and instructing the first client equipment to update the client model corresponding to the first client equipment by using the model parameters of the target network model.
To sum up, in the embodiment of the application, different server models are deployed in the server, a second client device similar to the first client device is adaptively selected based on an attention mechanism, the server model corresponding to the first client device is updated by using the model training information of the second client device to obtain the target network model, and the client model corresponding to the first client device is then iterated by using the target network model and the data of the first client device, so that the problem of non-independently and identically distributed (non-IID) data in horizontal federated learning can be effectively alleviated. Similar user groups are thus bound together through the attention mechanism to update the server model, and the client model is iteratively updated based on the server model, so that the horizontal federated learning of the client model is trained with similarly distributed data, which accelerates the convergence of the client model.
Referring to fig. 12, the data processing method may include steps S310 to S360.
Step S310: and receiving model training information sent by the plurality of client devices.
Step S320: a second client device associated with the first client device is determined.
Step S330: and training the server model corresponding to the first client device by using the model training information of the second client device to obtain a target network model.
Step S340: and sending the model parameters of the target network model to the first client equipment, and instructing the first client equipment to update the client model corresponding to the first client equipment by using the model parameters of the target network model.
In some embodiments, after obtaining the target network model, the server may send the model parameters of the target network model to the first client device and instruct the first client device to update the client model corresponding to the first client device by using these model parameters. Alternatively, the server may send the model parameters of the target network model to the first client device and to the second client device associated with the first client device at the same time, and instruct the first client device and the second client device to update their respectively stored client models with the model parameters of the target network model; specifically, the first client device may be instructed to update the client model corresponding to the first client device with the model parameters of the target network model, and the second client device may be instructed to update the second client model with the model parameters of the target network model.
In other embodiments, when a client device updates its client model with the model parameters of the target network model, the client model may be trained by combining the model parameters of the target network model with the device's own data. Updating the client model corresponding to the first client device with the model parameters of the target network model can therefore be characterized as follows: let w_i denote the client model corresponding to the first client device, let F_i(w_i) denote the loss of that client model when the first client device considers its own data, and let w_target denote the target network model. The update minimizes F_i(w_i) together with a term measuring how far w_i is from w_target, so that the client model w_i corresponding to the first client device and the target network model w_target become increasingly similar; in other words, the client model corresponding to the first client device is mainly updated on the basis of the target network model w_target obtained from similar users through the attention mechanism.
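A sketch of the client-side update implied by the description above (Python). The squared-distance (proximal) penalty and the coefficient mu are assumptions chosen so that the client model is pulled toward the target network model while still fitting the client's own data:

    import numpy as np

    def local_update_step(w, grad_local_loss, w_target, lr=0.01, mu=0.1):
        # w:               current parameters of the client model on the first client device
        # grad_local_loss: function returning the gradient of the loss on the device's own data
        # w_target:        model parameters of the target network model sent by the server
        grad = grad_local_loss(w) + mu * (w - w_target)
        return w - lr * grad

    # Toy example with a quadratic local loss (illustrative only).
    w_target = np.array([1.0, -2.0])
    grad_fn = lambda w: 2.0 * (w - np.array([0.5, -1.5]))
    w = np.zeros(2)
    for _ in range(200):
        w = local_update_step(w, grad_fn, w_target)
    # w ends up between the local optimum and w_target, as intended.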
In some embodiments, after sending the model parameters of the target network model to the first client device and instructing the first client device to update the client model corresponding to the first client device by using the model parameters of the target network model, the server may receive update information sent by at least one client device, that is, enter step S350.
Step S350: receiving update information sent by at least one client device, wherein the update information is a network parameter corresponding to a client model stored in the at least one client device.
In the embodiment of the application, by sending the model parameters of the target network model to the first client device, the server can instruct the first client device to update the client model corresponding to the first client device by using the model parameters of the target network model, so as to obtain the update information. In other words, the server may receive update information sent by at least one client device, where the update information is a network parameter corresponding to a client model stored in the at least one client device. In addition, the at least one client device may be the first client device, the first client device together with the second client device, or all the client devices connected to the server.
Step S360: and if the client model corresponding to the first client device is determined to meet the convergence condition according to the updating information, stopping the updating operation of the client model corresponding to the first client device.
In some embodiments, the at least one client device may be the first client device, and the convergence condition may be that the loss value of the client model corresponding to the first client device is smaller than a first loss value. Specifically, the loss value of the client model corresponding to the first client device is obtained according to the update information sent by the first client device; if the loss value of the client model corresponding to the first client device is smaller than the first loss value, it is determined that the client model corresponding to the first client device meets the convergence condition, and the update operation on the client model corresponding to the first client device is stopped.
In addition, if the at least one client device is the first client device, in the embodiment of the present application the number of updates of the client model corresponding to the first client device may also be obtained, that is, the number of iterations of the client model corresponding to the first client device is counted, and it is then determined whether the number of iterations has reached a preset number of iterations. If the number of iterations of the client model corresponding to the first client device has reached the preset number of iterations, it is determined that the client model corresponding to the first client device meets the convergence condition, and the update operation on the client model corresponding to the first client device is stopped.
Optionally, in this embodiment of the present application, it may also be determined whether the network parameters of the client model corresponding to the first client device remain unchanged, and if the network parameters remain unchanged, it is determined that the client model corresponding to the first client device meets the convergence condition. It should be noted that, in the embodiment of the present application, whether the client model corresponding to the first client device converges may be determined by using any one of the above convergence conditions, or by combining at least two of the above convergence conditions, so that the finally obtained client model corresponding to the first client device is more accurate. For example, the server may first determine whether the training time of the client model corresponding to the first client device is greater than a preset training time; if so, it may continue to determine whether the loss of the client model corresponding to the first client device is smaller than a preset loss, and if so, the client model corresponding to the first client device has converged. The specific combination of convergence conditions is not explicitly limited here and may be selected according to the actual situation.
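A sketch combining the convergence checks described above (Python); the first loss value, the preset number of iterations, and the tolerance used to decide that parameters are "unchanged" are placeholders, not values fixed by the patent:

    import numpy as np

    def client_model_converged(loss, iteration, prev_params, params,
                               first_loss_value=0.01, preset_iterations=200, tol=1e-6):
        if loss < first_loss_value:                      # loss below the first loss value
            return True
        if iteration >= preset_iterations:               # preset number of iterations reached
            return True
        if np.allclose(prev_params, params, atol=tol):   # network parameters unchanged
            return True
        return False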
In other embodiments, the at least one client device may include all client devices, and each client device may correspond to one client model, and the convergence condition at this time may be that the target loss value is smaller than the second loss value, where the target loss value may be obtained according to the loss values of all client models. Specifically, a loss value of each client model is obtained according to update information sent by all the client devices, a target loss value is obtained according to the loss values of all the client models, and if the target loss value is smaller than a second loss value, it is determined that the client model corresponding to the first client device meets a convergence condition, the update operation on the client model corresponding to the first client device is stopped. The target loss value may be the sum of the loss values of all the client models, or the target loss value may be the average of the loss values of all the client models.
In some embodiments, the at least one client device may include all client devices, wherein all client devices include the first client device and a plurality of other client devices, and each client device corresponds to one client model, and the convergence condition at this time may be that a value of a sum of a target loss value and a target degree of association is smaller than a third loss value, wherein the target loss value may be a sum of all client model loss values, and the target degree of association may be a sum of degrees of association between the first client device and each of the other client devices. Specifically, a loss value of each client model is obtained according to update information sent by all the client devices, a sum of the loss values of all the client models is used as a target loss value, a correlation degree between the first client device and each of the other client devices is obtained according to an attention mechanism, the correlation degrees are summed to obtain a target correlation degree, and if the sum of the target loss value and the target correlation degree is smaller than a third loss value, it is determined that the client model corresponding to the first client device meets a convergence condition, the update operation on the client model corresponding to the first client device is stopped.
Optionally, the formula corresponding to the convergence condition may be:
G(W) = Σ_{i=1}^{m} F(w_i) + λ · Σ_{j≠i} A(w_i, w_j) < L

wherein G(W) is the total loss value of the plurality of client models, Σ_{i=1}^{m} F(w_i) is the target loss value, λ represents the regularization coefficient, Σ_{j≠i} A(w_i, w_j) represents the target degree of association, A represents the attention function, w_i represents the model training information of the first client device, w_j represents the model training information of the other client devices, m represents the number of client devices, and L is the third loss value. In addition, F represents the loss function used in machine learning, which may be cross entropy; cross entropy is mainly used to measure the performance of the network model, whose output is a probability value between 0 and 1.
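A sketch of the total loss G(W) in the reconstructed form above (Python); the negative-exponential attention function and the choices of λ and σ are assumptions for illustration:

    import numpy as np

    def total_loss_G(client_params, local_losses, lam=0.1, sigma=1.0):
        # client_params: list of model training information vectors w_1 ... w_m
        # local_losses:  list of loss values F(w_i) reported by the m client devices
        def A(w_i, w_j):
            return 1.0 - np.exp(-np.sum((w_i - w_j) ** 2) / sigma)
        target_association = sum(A(w_i, w_j)
                                 for i, w_i in enumerate(client_params)
                                 for j, w_j in enumerate(client_params) if j != i)
        return sum(local_losses) + lam * target_association

    # Training may be considered converged once total_loss_G(...) falls below the third loss value L.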
It can be known from the above description that the value of A(·) approaches 0 if the model training information of the other client devices is similar to the model training information of the first client device, and conversely approaches 1 if the model training information of the other client devices differs greatly from that of the first client device. In order to optimize G(W), in the embodiment of the present application, A(·) and F(·) may be optimized alternately until G(W) converges or the maximum number of iterations K set in advance is reached, at which point training is stopped. The formula for optimizing A(·) may be as follows.
wherein α_k denotes the step size of the gradient descent and ∂A(·) denotes the partial derivative of A(·); that is, optimizing A(·) takes a gradient-descent step of size α_k along the partial derivative of A(·). By using this formula, the server model can be continuously optimized, and the client model can be obtained more quickly and effectively.
In the embodiment of the present application, the term λ·Σ_{j≠i} A(w_i, w_j) represents the interaction between all the client models and the server models, and may also be called a regular term. The regular term, also called a penalty term, is used to penalize the situation in which overly large weights make the model too complex; the larger the regularization coefficient, the stronger the penalty effect of the regular term, and the regularization coefficient balances the degree of fitting to the training data against the complexity of the model. If the regularization coefficient is too large, the model may be simpler but runs the risk of under-fitting: it may fail to learn some characteristics in the training data and may not predict accurately. If the regularization coefficient is too small, the model is more complex but runs the risk of over-fitting. Introducing this regular term in the embodiment of the present application can not only avoid the over-fitting problem but also improve the accuracy of model training.
In other embodiments, the at least one client device may be a first client device and a second client device associated with the first client device, the second client device corresponds to a second client model, and the second client device may be multiple client devices, where the convergence condition may be that the target loss value is smaller than a fourth loss value, where the target loss value may be obtained according to the loss values of the client model corresponding to the first client device and the client model corresponding to the second client device. Specifically, loss values of a client model corresponding to a first client device and a client model corresponding to a second client device are obtained according to update information sent by the first client device and the second client device, a target loss value is obtained according to the loss values of the client model corresponding to the first client device and the client model corresponding to the second client device, and if the target loss value is smaller than a fourth loss value, it is determined that the client model corresponding to the first client device meets a convergence condition, the update operation on the client model corresponding to the first client device is stopped. The target loss value may be a sum of loss values of a client model corresponding to the first client device and a client model corresponding to the second client device.
According to the data processing method provided by the embodiment of the application, when model training information sent by the client devices is received, a second client device associated with a first client device is determined, the server model corresponding to the first client device is trained by using the model training information of the second client device to obtain a target network model, the model parameters of the target network model are sent to the first client device, and the first client device is instructed to update the client model corresponding to the first client device by using the model parameters of the target network model. Because the client devices do not need to upload training data, user privacy is protected, and the client model corresponding to the first client device can be safely and comprehensively trained by jointly using the private data of multiple parties; because the model training information used is that of the second client device associated with the first client device, the convergence of the client model corresponding to the first client device is accelerated and the learning problem of non-independently and identically distributed (non-IID) data is alleviated. In addition, in the embodiment of the application, the server model and the client model are optimized alternately: in each iteration the second client device associated with the first client device is selected based on the attention mechanism to cooperatively update the server model corresponding to the first client device, and the first client device can then update the client model corresponding to the first client device based on its own data and the model parameters of the target network model sent by the server, so that the client model can be quickly and adaptively updated in an iterative manner based on the non-IID data distribution characteristics of different periods. Moreover, the convergence condition of the first client device can be determined in different ways, which improves the flexibility of model training.
An embodiment of the present application provides a data processing method, which may be applied to a first client device of a data processing system, where the data processing system further includes a server, and the server is connected to a plurality of client devices, please refer to fig. 13, and the data processing method may include steps S410 to S420.
Step S410: and receiving the model parameters of the target network model sent by the server.
In the embodiment of the application, the target network model is obtained, when the server receives model training information sent by the plurality of client devices, by training the server model corresponding to the first client device with the model training information of a second client device associated with the first client device, where the model training information is a network model parameter obtained by each client device training, with its own data, the server model sent by the server.
Step S420: and updating the client model corresponding to the first client device by using the model parameters of the target network model.
As one way, when the client device receives the model parameters of the target network model sent by the server, it may update the client model corresponding to the first client device by using the model parameters.
According to the data processing method provided by the embodiment of the application, the client device updates the client model corresponding to the first client device by obtaining the model parameters of the target network model sent by the server. Because the target network model sent by the server is obtained by the server by using the model training information of the second client device associated with the first client device, the finally obtained client model corresponding to the first client device is more accurate and performs better.
Referring to fig. 14, an embodiment of the present application provides a data processing apparatus 500, where the data processing apparatus 500 may be applied to a server of a data processing system, and the server may be connected to a plurality of client devices, where the plurality of client devices includes a first client device. In a specific embodiment, the data processing apparatus 500 includes: a receiving module 510, a determining module 520, a training module 530, and an updating module 540.
A receiving module 510, configured to receive model training information sent by the multiple client devices, where the model training information is a network model parameter obtained by each client device respectively training a server model sent by the server by using its own data.
A determining module 520 for determining a second client device associated with the first client device.
Further, the plurality of client devices includes a plurality of other client devices, the other client devices are a plurality of client devices other than the first client device, the determining module 520 is further configured to determine an information difference between the model training information of the first client device and the model training information of each of the other client devices by using an attention mechanism, and the information difference is used for representing a difference between data in the plurality of client devices; determining the association degree between the first client device and each other client device according to the information difference; and if the association degree between the first client side equipment and other client side equipment accords with a preset condition, taking the other client side equipment which accords with the preset condition as second client side equipment.
A training module 530, configured to train a server model corresponding to the first client device by using the model training information of the second client device, to obtain a target network model.
Further, the training module 530 is further configured to, if there are multiple second client devices, obtain a degree of association between each second client device and the first client device; acquiring target training information according to the relevance and the model training information of the second client device; and training the server model corresponding to the first client device by using the target training information to obtain a target network model.
Further, the training module 530 is further configured to assign a corresponding weight to each second client device according to the degree of association between each second client device and the first client device, where the weight is positively correlated to the degree of association; and obtain the target training information according to the weight of each second client device and the model training information.
An updating module 540, configured to send the model parameter of the target network model to the first client device, and instruct the first client device to update the client model corresponding to the first client device by using the model parameter of the target network model.
Further, the updating module 540 is further configured to encrypt the model parameters of the target network model, and send the encrypted model parameters to the first client device. Encrypting the model parameters of the target network model and sending the encrypted model parameters to the first client device, including: and encrypting the model parameters of the target network model by using a homomorphic encryption method, and sending the encrypted model parameters to the first client equipment.
Further, the data processing apparatus 500 is further configured to, before determining a second client device associated with the first client device, obtain an operating state of each of the client devices, and use a client device whose operating state meets a preset operating state as the first client device.
Further, the data processing apparatus 500 is further configured to receive update information sent by at least one client device, where the update information is a network parameter corresponding to a client model stored in at least one client device; and if the client model corresponding to the first client device is determined to meet the convergence condition according to the updating information, stopping the updating operation of the client model corresponding to the first client device.
Further, at least one client device may be a first client device, and the data processing apparatus 500 is further configured to obtain a loss value of a client model corresponding to the first client device according to the update information sent by the first client device; and if the loss value of the client model corresponding to the first client device is smaller than the first loss value, determining that the client model corresponding to the first client device meets the convergence condition, and stopping the updating operation of the client model corresponding to the first client device.
Further, the at least one client device may include all client devices, each client device corresponds to one client model, and the data processing apparatus 500 is further configured to obtain a loss value of each client model according to the update information sent by all the client devices; obtaining a target loss value according to the loss values of all the client models; and if the target loss value is smaller than a second loss value, determining that the client model corresponding to the first client device meets a convergence condition, and stopping the updating operation of the client model corresponding to the first client device.
Further, the at least one client device includes all the client devices, where all the client devices include a first client device and a plurality of other client devices, each of the client devices corresponds to one client model, and the data processing apparatus 500 is further configured to obtain a loss value of each of the client models according to the update information sent by all the client devices, and use a sum of the loss values of all the client models as a target loss value; acquiring the association degree between the first client device and each of the other client devices according to an attention mechanism, and summing the association degrees to obtain a target association degree; and if the sum of the target loss value and the target relevance is smaller than a third loss value, determining that the client model corresponding to the first client device meets a convergence condition, and stopping the updating operation of the client model corresponding to the first client device.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
According to the data processing device provided by the embodiment of the application, when model training information sent by the client devices is received, a second client device associated with a first client device is determined, the server model corresponding to the first client device is trained by using the model training information of the second client device to obtain a target network model, the model parameters of the target network model are sent to the first client device, and the first client device is instructed to update the client model corresponding to the first client device by using the model parameters of the target network model. Because the client devices do not need to upload training data, user privacy is protected, and the client model corresponding to the first client device can be safely and comprehensively trained by jointly using the private data of multiple parties; meanwhile, because the model training information used is that of the second client device associated with the first client device, updating the client model corresponding to the first client device with this information can accelerate its convergence and alleviate the learning problem of non-independently and identically distributed (non-IID) data.
Referring to fig. 15, a block diagram of a data processing system 600 according to an embodiment of the present application is shown. The data processing system 600 in the present application may include a server 610, a first client device 620, and a second client device (no reference numeral given in the figure), and the first client device 620 and the second client device may be connected with the server 610 in a wired or wireless manner.
Wherein the server 610 includes one or more of the following components: a first processor 611, a first memory 612, and one or more applications, wherein the one or more applications may be stored in the first memory 612 and configured to be executed by the one or more first processors 611, the one or more applications configured to perform a method as described in the aforementioned method embodiments.
The first processor 611 may include one or more processing cores. The first processor 611 connects various parts within the entire server 610 using various interfaces and lines, and performs the various functions of the server 610 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the first memory 612 and calling data stored in the first memory 612. Optionally, the first processor 611 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The first processor 611 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the first processor 611 and may instead be implemented by a separate communication chip.
The first Memory 612 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The first memory 612 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The first memory 612 may include a stored-program area and a stored-data area, wherein the stored-program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The stored-data area may store data created by the server 610 during use (e.g., phone books, audio and video data, chat log data), etc.
The first client device 620 may be a smartphone, a tablet, an e-book, or other electronic device capable of running applications, which may include one or more second processors 621, a second memory 622, one or more applications, wherein the one or more applications in the first client device 620 are stored in the memory 622 of the first client device 620 and configured to be executed by the one or more processors 621 in the first client device 620, and the one or more applications in the first client device 620 are configured to perform the method as described in the aforementioned method embodiments.
Referring to fig. 16, a block diagram of a computer-readable storage medium 700 according to an embodiment of the present application is shown. The computer-readable storage medium 700 has stored therein program code that can be called by a processor to execute the methods described in the above-described method embodiments.
The computer-readable storage medium 700 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 700 includes a non-volatile computer-readable storage medium. The computer readable storage medium 700 has storage space for program code 710 for performing any of the method steps in the above-described method embodiments. The program code can be read from or written to one or more computer program products. The program code 710 may be compressed, for example, in a suitable form.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (15)

1. A data processing method applied to a server of a data processing system, the server being connected to a plurality of client devices including a first client device, the method comprising:
receiving model training information sent by the plurality of client devices, wherein the model training information is a network model parameter obtained by each client device respectively training a server model sent by the server by using own data;
determining a second client device associated with the first client device;
training a server model corresponding to the first client device by using the model training information of the second client device to obtain a target network model;
and sending the model parameters of the target network model to the first client equipment, and instructing the first client equipment to update the client model corresponding to the first client equipment by using the model parameters of the target network model.
2. The method of claim 1, wherein the training a server model corresponding to the first client device by using the model training information of the second client device to obtain a target network model comprises:
if the number of the second client devices is multiple, acquiring the association degree between each second client device and the first client device;
acquiring target training information according to the relevance and the model training information of the second client device;
and training the server model corresponding to the first client device by using the target training information to obtain a target network model.
3. The method of claim 2, wherein obtaining target training information according to the relevance and model training information of the second client device comprises:
according to the association degree between each second client device and the first client device, distributing a corresponding weight to each second client device, wherein the weight is positively correlated with the association degree;
and obtaining the target training information according to the weight of each second client device and the model training information.
4. The method of claim 1, wherein the plurality of client devices comprises a plurality of other client devices, the other client devices being a plurality of client devices other than the first client device;
the determining a second client device associated with the first client device comprises:
determining, using an attention mechanism, an information difference between the model training information of the first client device and the model training information of each of the other client devices, the information difference being used to characterize differences between data in the plurality of client devices;
determining the association degree between the first client device and each other client device according to the information difference;
and if the association degree between the first client side equipment and other client side equipment accords with a preset condition, taking the other client side equipment which accords with the preset condition as second client side equipment.
5. The method of claim 1, wherein after sending the model parameters of the target network model to the first client device and instructing the first client device to update the client model corresponding to the first client device with the model parameters of the target network model, the method comprises:
receiving update information sent by at least one client device, wherein the update information is a network parameter corresponding to a client model stored in the at least one client device;
and if the client model corresponding to the first client device is determined to meet the convergence condition according to the updating information, stopping the updating operation of the client model corresponding to the first client device.
6. The method of claim 5, wherein the at least one client device is a first client device, and the stopping the update operation of the client model corresponding to the first client device if it is determined that the client model corresponding to the first client device meets the convergence condition according to the update information comprises:
obtaining a loss value of a client model corresponding to the first client device according to the update information sent by the first client device;
and if the loss value of the client model corresponding to the first client device is smaller than the first loss value, determining that the client model corresponding to the first client device meets the convergence condition, and stopping the updating operation of the client model corresponding to the first client device.
7. The method according to claim 5, wherein the at least one client device includes all the client devices, each of the client devices corresponds to one client model, and the stopping the update operation of the client model corresponding to the first client device if it is determined that the client model corresponding to the first client device meets the convergence condition according to the update information includes:
obtaining the loss value of each client model according to the updating information sent by all the client devices;
obtaining a target loss value according to the loss values of all the client models;
and if the target loss value is smaller than a second loss value, determining that the client model corresponding to the first client device meets a convergence condition, and stopping the updating operation of the client model corresponding to the first client device.
8. The method of claim 5, wherein the at least one client device comprises all of the client devices, including a first client device and a plurality of other client devices, each of the client devices corresponding to a client model;
if it is determined according to the update information that the client model corresponding to the first client device meets the convergence condition, stopping the update operation of the client model corresponding to the first client device, including:
obtaining the loss value of each client model according to the updating information sent by all the client devices, and taking the sum of the loss values of all the client models as a target loss value;
acquiring the association degree between the first client device and each of the other client devices according to an attention mechanism, and summing the association degrees to obtain a target association degree;
and if the sum of the target loss value and the target relevance is smaller than a third loss value, determining that the client model corresponding to the first client device meets a convergence condition, and stopping the updating operation of the client model corresponding to the first client device.
9. The method of claim 1, wherein sending the model parameters of the target network model to the first client device comprises:
and encrypting the model parameters of the target network model and sending the encrypted model parameters to the first client device.
10. The method of claim 9, wherein encrypting the model parameters of the target network model and sending the encrypted model parameters to the first client device comprises:
and encrypting the model parameters of the target network model by using a homomorphic encryption method, and sending the encrypted model parameters to the first client equipment.
11. The method of any of claims 1 to 10, wherein determining the second client device associated with the first client device is preceded by:
and acquiring the working state of each client device, and taking the client device with the working state conforming to the preset working state as a first client device.
12. A data processing method applied to a first client device of a data processing system, the data processing system further comprising a server connected to a plurality of client devices, the method comprising:
receiving model parameters of a target network model sent by the server, wherein the target network model is obtained by the server, upon receiving model training information sent by the plurality of client devices, by determining a second client device associated with the first client device and training a server model corresponding to the first client device by using model training information of the second client device, the model training information being network model parameters obtained by each client device training, with its own data, the server model sent by the server;
and updating the client model corresponding to the first client device by using the model parameters of the target network model.
13. A data processing apparatus, applied to a server of a data processing system, the server being connected to a plurality of client devices, the plurality of client devices including a first client device, the apparatus comprising:
a receiving module, configured to receive model training information sent by the plurality of client devices, wherein the model training information is network model parameters obtained by each client device by training, with its own data, a server model sent by the server;
a determination module, configured to determine a second client device associated with the first client device;
a training module, configured to train the server model corresponding to the first client device by using the model training information of the second client device to obtain a target network model;
and an updating module, configured to send the model parameters of the target network model to the first client device and instruct the first client device to update the client model corresponding to the first client device by using the model parameters of the target network model.
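Read as software, the apparatus of claim 13 is four cooperating modules. The class below is only a structural sketch under assumed callables for the association rule, the per-client training step, and the transport; none of these names come from the patent.

```python
# Structural sketch of the claim-13 apparatus: receiving, determination,
# training and updating modules collected in one server-side object.
class DataProcessingApparatus:
    def __init__(self, association_fn, train_fn, send_fn):
        self._association_fn = association_fn      # determination-module logic
        self._train_fn = train_fn                  # training-module logic
        self._send_fn = send_fn                    # update-module transport
        self._training_info = {}

    def receive(self, client_id, model_training_info):
        # Receiving module: store the parameters each client trained on its own data.
        self._training_info[client_id] = model_training_info

    def update_first_client(self, first_client_id):
        # Determination module: find the second client device(s) associated with
        # the first client device.
        second_ids = self._association_fn(first_client_id, self._training_info)
        # Training module: train the server model for the first client device with
        # the associated clients' model training information.
        target_params = self._train_fn(first_client_id,
                                       [self._training_info[i] for i in second_ids])
        # Updating module: send the target network model's parameters to the client.
        self._send_fn(first_client_id, target_params)
```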
14. A data processing system, comprising a server, a first client device, and a second client device, wherein:
the server comprising one or more processors, memory, one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-11;
the first client device includes one or more processors, memory, one or more applications, wherein the one or more applications in the first client device are stored in the memory of the first client device and configured to be executed by the one or more processors in the first client device, the one or more applications in the first client device configured to perform the method of claim 12.
15. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 11.
CN202110578585.5A 2021-05-26 2021-05-26 Data processing method, device, system and readable storage medium Pending CN113297175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110578585.5A CN113297175A (en) 2021-05-26 2021-05-26 Data processing method, device, system and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110578585.5A CN113297175A (en) 2021-05-26 2021-05-26 Data processing method, device, system and readable storage medium

Publications (1)

Publication Number Publication Date
CN113297175A true CN113297175A (en) 2021-08-24

Family

ID=77325277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110578585.5A Pending CN113297175A (en) 2021-05-26 2021-05-26 Data processing method, device, system and readable storage medium

Country Status (1)

Country Link
CN (1) CN113297175A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434896A (en) * 2021-08-27 2021-09-24 豪符密码检测技术(成都)有限责任公司 Method for encrypting, protecting and using data in mineral resource and geographic space fields
WO2023071626A1 (en) * 2021-10-27 2023-05-04 腾讯科技(深圳)有限公司 Federated learning method and apparatus, and device, storage medium and product
CN114428907A (en) * 2022-01-27 2022-05-03 北京百度网讯科技有限公司 Information searching method and device, electronic equipment and storage medium
CN114428907B (en) * 2022-01-27 2024-05-28 北京百度网讯科技有限公司 Information searching method, device, electronic equipment and storage medium
CN114168991A (en) * 2022-02-10 2022-03-11 北京鹰瞳科技发展股份有限公司 Method, circuit and related product for processing encrypted data
CN114168991B (en) * 2022-02-10 2022-05-20 北京鹰瞳科技发展股份有限公司 Method, circuit and related product for processing encrypted data
WO2024027582A1 (en) * 2022-08-01 2024-02-08 维沃移动通信有限公司 Model updating method and apparatus, communication device, and readable storage medium
CN115775026A (en) * 2022-12-27 2023-03-10 重庆大学 Federated learning method based on organization similarity
CN115775026B (en) * 2022-12-27 2023-05-16 重庆大学 Federated learning method based on organization similarity

Similar Documents

Publication Publication Date Title
CN113297175A (en) Data processing method, device, system and readable storage medium
CN112329073B (en) Distributed data processing method, device, computer equipment and storage medium
US20190221187A1 (en) System, apparatus and methods for adaptive data transport and optimization of application execution
CN113627085B (en) Transverse federal learning modeling optimization method, equipment and medium
CN112085159B (en) User tag data prediction system, method and device and electronic equipment
Wang et al. SSteGAN: self-learning steganography based on generative adversarial networks
Miao et al. Federated deep reinforcement learning based secure data sharing for Internet of Things
CN114331829A (en) Countermeasure sample generation method, device, equipment and readable storage medium
Qi et al. Model aggregation techniques in federated learning: A comprehensive survey
CN112001500A (en) Model training method, device and storage medium based on longitudinal federated learning system
CN111885399A (en) Content distribution method, content distribution device, electronic equipment and storage medium
US11537933B2 (en) Using machine learning to estimate or forecast resource use with time-varying demand in gaming platforms
Wang et al. A computing perspective on smart city [guest editorial]
WO2024114640A1 (en) User portrait-based user service system and method, and electronic device
Zhang et al. The generative adversarial networks and its application in machine vision
CN110874638B (en) Behavior analysis-oriented meta-knowledge federation method, device, electronic equipment and system
Kasnesis et al. An IoE architecture for the preservation of the cultural heritage: the STORM use case
CN113962417A (en) Video processing method and device, electronic equipment and storage medium
Kuru et al. Blockchain-based decentralised privacy-preserving machine learning authentication and verification with immersive devices in the urban metaverse ecosystem
CN114463063A (en) Data processing method and related device
CN116129534A (en) Image living body detection method and device, storage medium and electronic equipment
Khodabandeh et al. Uncertainty evaluation for a Dezert–Smarandache theory-based localization problem
CN115248894A (en) Information recommendation method and device and computer readable storage medium
CN117272370B (en) Method, system, electronic equipment and medium for recommending privacy protection of next interest point
Xing et al. Distributed Model Interpretation for Vertical Federated Learning with Feature Discrepancy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination