WO2024008154A1 - Federated learning method and apparatus, communication device, and readable storage medium - Google Patents

Federated learning method and apparatus, communication device, and readable storage medium

Info

Publication number
WO2024008154A1
WO2024008154A1 (application PCT/CN2023/106114)
Authority
WO
WIPO (PCT)
Prior art keywords
information
communication device
federated learning
model performance
model
Prior art date
Application number
PCT/CN2023/106114
Other languages
English (en)
French (fr)
Inventor
程思涵
Original Assignee
Vivo Mobile Communication Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co., Ltd.
Publication of WO2024008154A1

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W8/00Network data management

Definitions

  • This application belongs to the field of communication technology, and specifically relates to a federated learning method, device, communication equipment and readable storage medium.
  • In order to improve the model effect, the model can be trained based on federated learning.
  • Members participating in federated learning may, for various reasons (for example, other more important tasks have arrived, or there are too many tasks to handle and they need to exit federated learning), no longer be willing to participate in federated learning, or may cease to be suitable federated learning members.
  • how to reasonably select members to participate in federated learning is an urgent problem that needs to be solved.
  • the embodiments of this application provide a federated learning method, device, communication device, and readable storage medium, which can solve the problem of how to reasonably select members to participate in federated learning.
  • the first aspect provides a federated learning method, including:
  • the first communication device receives first information from the second communication device, wherein the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, and model performance information of this round of federated learning;
  • the first communication device determines whether the second communication device participates in the next round of federated learning based on the first information.
  • the second aspect provides a federated learning method, including:
  • the second communication device determines first information, and the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, and model performance information of this round of federated learning;
  • the second communication device sends the first information to the first communication device, and the first information is used by the first communication device to determine whether the second communication device participates in the next round of federated learning.
  • a federated learning device applied to the first communication device, including:
  • a first receiving module configured to receive first information from a second communication device, where the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, and model performance information of this round of federated learning;
  • a first determination module configured to determine whether the second communication device participates in the next round of federated learning based on the first information.
  • a federated learning device applied to the second communication device, including:
  • a second determination module configured to determine first information, where the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, and model performance information of this round of federated learning;
  • the second sending module is configured to send the first information to the first communication device, where the first information is used by the first communication device to determine whether the second communication device participates in the next round of federated learning.
  • a communication device including a processor and a memory, the memory stores a program or instructions that can be run on the processor, and when the program or instructions are executed by the processor, the steps of the method described in the first aspect are implemented, or the steps of the method described in the second aspect are implemented.
  • a communication device including a processor and a communication interface.
  • when the communication device is a first communication device, the communication interface is used to receive first information from a second communication device, and the processor is used to determine, according to the first information, whether the second communication device participates in the next round of federated learning; or, when the communication device is a second communication device, the processor is used to determine the first information, and the communication interface is used to send the first information to a first communication device; wherein the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, and model performance information in this round of federated learning.
  • a seventh aspect provides a communication system, including the above first communication device and a second communication device.
  • the first communication device can be used to perform the steps of the federated learning method as described in the first aspect.
  • the second communication device may be used to perform the steps of the federated learning method as described in the second aspect.
  • a readable storage medium is provided. Programs or instructions are stored on the readable storage medium. When the programs or instructions are executed by a processor, the steps of the method described in the first aspect are implemented, or the steps of the method described in the second aspect are implemented.
  • in a ninth aspect, a chip is provided, including a processor and a communication interface.
  • the communication interface is coupled to the processor.
  • the processor is used to run programs or instructions to implement the steps of the method described in the first aspect, or to implement the steps of the method described in the second aspect.
  • a computer program/program product is provided, the computer program/program product is stored in a storage medium, and the computer program/program product is executed by at least one processor to implement the steps of the method described in the first aspect, or the steps of the method described in the second aspect.
  • the first information can be received from the second communication device, and based on the first information, it is determined whether the second communication device participates in the next round of federated learning; the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, and model performance information in this round of federated learning.
  • Figure 1 is a block diagram of a wireless communication system applicable to the embodiment of the present application.
  • Figure 2 is a schematic diagram of a neural network in an embodiment of the present application.
  • Figure 3 is a schematic diagram of neurons in the embodiment of the present application.
  • Figure 4 is a flow chart of a federated learning method provided by an embodiment of the present application.
  • Figure 5 is a flow chart of another federated learning method provided by the embodiment of the present application.
  • Figure 6 is a schematic diagram of the federated learning process in the embodiment of this application.
  • Figure 7 is a schematic structural diagram of a federated learning device provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of another federated learning device provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a communication device provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a network side device provided by an embodiment of the present application.
  • The terms "first", "second", etc. in the description and claims of this application are used to distinguish similar objects and are not used to describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein; moreover, the objects distinguished by "first" and "second" are usually of one type, and the number of objects is not limited.
  • For example, the first object can be one or more than one.
  • “and/or” in the description and claims indicates at least one of the connected objects, and the character “/" generally indicates that the related objects are in an "or” relationship.
  • LTE: Long Term Evolution
  • LTE-A: LTE-Advanced
  • CDMA: Code Division Multiple Access
  • TDMA: Time Division Multiple Access
  • FDMA: Frequency Division Multiple Access
  • OFDMA: Orthogonal Frequency Division Multiple Access
  • SC-FDMA: Single-carrier Frequency Division Multiple Access
  • NR: New Radio
  • 6G: 6th Generation
  • FIG. 1 shows a block diagram of a wireless communication system to which embodiments of the present application are applicable.
  • the wireless communication system includes a terminal 11 and a network side device 12.
  • the terminal 11 can be a mobile phone, a tablet computer (Tablet Personal Computer), a laptop computer or notebook computer, a personal digital assistant (PDA), a handheld computer, a netbook, an ultra-mobile personal computer (UMPC), a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, vehicle user equipment (VUE), a pedestrian terminal (Pedestrian User Equipment, PUE), a smart home device (a home device with wireless communication functions, such as a refrigerator, TV, washing machine or furniture), a game console, a personal computer (PC), a teller machine or self-service machine, and other terminal-side devices.
  • Wearable devices include: smart watches, smart bracelets, smart headphones, smart glasses, smart jewelry (smart bangles, smart rings, smart necklaces, smart anklets, etc.), smart wristbands, smart clothing, etc.
  • the network side device 12 may include an access network device or a core network device.
  • the access network device may also be called a wireless access network device, a radio access network (Radio Access Network, RAN), a wireless access network function or a wireless access network unit.
  • Access network equipment can include base stations, Wireless Local Area Networks (WLAN) access points or WiFi nodes, etc.
  • the base station can be called a Node B, an Evolved Node B (eNB), an access point, a Base Transceiver Station (BTS), a radio base station, a radio transceiver, a Basic Service Set (BSS), an Extended Service Set (ESS), a home Node B, a home evolved Node B, a Transmitting Receiving Point (TRP), or some other suitable term in the field; as long as the same technical effect is achieved, the base station is not limited to specific technical terms. It should be noted that in the embodiments of this application, only the base station in the NR system is used as an example for introduction, and the specific type of base station is not limited.
  • Core network equipment may include but is not limited to at least one of the following: Network Data Analytics Function (NWDAF), core network node, core network function, Mobility Management Entity (MME), Access and Mobility Management Function (AMF), Session Management Function (SMF), User Plane Function (UPF), Policy Control Function (PCF), Policy and Charging Rules Function (PCRF), Edge Application Server Discovery Function (EASDF), Unified Data Management (UDM), Unified Data Repository (UDR), Home Subscriber Server (HSS), Centralized Network Configuration (CNC), Network Repository Function (NRF), Network Exposure Function (NEF), Local NEF (L-NEF), Binding Support Function (BSF), Application Function (AF), etc.
  • the network data analysis function NWDAF can be split into two network elements, such as Model training logical network element (Model Training Logical Function, MTLF) and analysis logical network element (Analytics Logical Function, AnLF).
  • the model training logical network element MTLF is mainly used to generate models and perform model training. It can be either the central server (server) in federated learning or the members (clients) in federated learning.
  • the analysis logic network element AnLF is mainly used for reasoning to generate prediction information or models, etc. It can request a model from MTLF, and the model can be generated through federated learning.
  • the model in the embodiment of this application may be an artificial intelligence (Artificial Intelligence, AI) model.
  • AI models have a variety of algorithm implementations, such as neural networks, decision trees, support vector machines, Bayesian classifiers, etc. This application takes a neural network as an example for explanation, but does not limit the specific type of AI model.
  • the schematic diagram of a neural network can be shown in Figure 2, in which X1, X2, ..., Xn are the inputs of the neural network; after the neurons in each layer process their inputs, the results continue to be passed to the next layer.
  • the input layer, hidden layer and output layer composed of these many neurons is a neural network.
  • the number of hidden layers and the number of neurons in each layer is the "network structure" of the neural network.
  • a neural network is composed of neurons, and the schematic diagram of a neuron can be shown in Figure 3, where a1, ..., ak, ..., aK (i.e., X1, X2, ... shown in Figure 2) are the inputs, w is the weight (can also be called: multiplicative coefficient), b is the bias (can also be called: additive coefficient), σ(·) is the activation function, and z is the output value. The corresponding operation process can be expressed as: z = σ(Σ_{k=1..K} w_k·a_k + b). Common activation functions include but are not limited to the Sigmoid function, the hyperbolic tangent (tanh) function, the rectified linear unit (Rectified Linear Unit, ReLU), etc.
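  • As a minimal, purely illustrative sketch of the neuron operation above (the input values, weights and the choice of tanh as the activation function are assumptions for the example, not part of the described embodiments):

```python
import numpy as np

def neuron_output(a, w, b, activation=np.tanh):
    """Compute z = activation(sum_k(w_k * a_k) + b) for a single neuron.

    a: input vector (a_1 ... a_K), w: weight vector (multiplicative coefficients),
    b: bias (additive coefficient). tanh is just one of the common activation
    functions mentioned above (Sigmoid, tanh, ReLU, ...).
    """
    return activation(np.dot(w, a) + b)

# Example: a neuron with three inputs
a = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
b = 0.05
z = neuron_output(a, w, b)
```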
  • the parameter information of each neuron and the algorithm used are combined to form the "parameter information" of the entire neural network, which is also an important part of the AI model file.
  • an AI model refers to a file containing elements such as network structure and parameter information.
  • the trained AI model can be directly reused by its framework platform without repeated construction or learning, and can directly perform intelligent functions such as judgment and/or recognition.
  • Federated learning aims to build a federated learning model based on distributed data sets.
  • model-related information can be exchanged between parties (or in encrypted form), but the raw data cannot. This exchange does not expose any protected private parts of the data on each training node.
  • the federated learning involved in the embodiments of this application is horizontal federated learning.
  • the essence of horizontal federated learning is the union of samples, which is suitable for scenarios where the participants have the same business format but reach different customers, that is, when there is a lot of feature overlap and little user overlap.
  • for example, the CN domain and RAN domain in the communication network provide the same service for different users (such as individual UEs, that is, different samples), such as a Mobility Management (MM) service, a Session Management (SM) service, or a certain other service.
  • the server (server, which may also be called a central server or organizer) in federated learning may be a network element device in the network, such as MTLF split by NWDAF.
  • Members (clients, also called participants) participating in federated learning can be network element devices in the network, such as MTLF split by NWDAF, or terminals.
  • the server in federated learning can first select the members participating in federated learning, for example by sending a request to an information storage network element such as the NRF to obtain the capability information of each intelligent network element device such as MTLF, and matching whether they can participate in federated learning based on the capability information; then, it sends information such as the initialization model of federated learning to each selected member.
  • Each member performs local model training and then feeds back intermediate results, such as gradients, to the server. The server then aggregates the received intermediate results and updates the global model. The steps of member selection, model delivery, local model training, intermediate-result feedback, and aggregation and update of the global model are repeated multiple times, and model training can be stopped after the model converges.
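  • A minimal, purely illustrative sketch of this round-based procedure is given below; the one-parameter linear model, the synthetic local data sets and the convergence test are assumptions for the example, not part of the claimed signaling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy local data sets for three members (clients): y = 2*x + noise.
client_data = []
for _ in range(3):
    x = rng.normal(size=50)
    client_data.append((x, 2.0 * x + 0.1 * rng.normal(size=50)))

def local_train(weight, x, y, lr=0.1, epochs=5):
    """One member's local training: gradient descent on a one-parameter linear model."""
    for _ in range(epochs):
        grad = float(np.mean(2.0 * (weight * x - y) * x))  # d/dw of the mean squared error
        weight -= lr * grad
    return weight

global_w, prev_w = 0.0, None
for round_idx in range(20):                                        # repeated rounds of federated learning
    selected = client_data                                         # member selection (all members here)
    local_ws = [local_train(global_w, x, y) for x, y in selected]  # model delivery + local training + feedback
    global_w = float(np.mean(local_ws))                            # aggregate intermediate results, update global model
    if prev_w is not None and abs(global_w - prev_w) < 1e-4:       # stop once the model has converged
        break
    prev_w = global_w
```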
  • Figure 4 is a flow chart of a federated learning method provided by an embodiment of the present application. The method is applied to a first communication device.
  • the first communication device is specifically a server in federated learning, including but not limited to intelligent network element equipment such as MTLF.
  • the method includes the following steps:
  • Step 41 The first communication device receives the first information from the second communication device.
  • Step 42 The first communication device determines whether the second communication device participates in the next round of federated learning based on the first information.
  • the above-mentioned first information may include but is not limited to at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, Model performance information of this round of federated learning, etc.
  • the second information may also be optional information, used to indicate whether the second communication device is willing to participate in federated learning.
  • the above-mentioned first information may also include capability information of the second communication device.
  • this capability information is the capability information after the current round of model training, including but not limited to whether it can still be a participant (member) of federated learning, the accuracy information of the participating training model, etc.
  • for example, a member's capability information may be: it can be a participant in federated learning, it has the ability to conduct local training, the accuracy of the model it participates in training is X, etc.
  • the above-mentioned second communication device is specifically a client device in federated learning, which may include but is not limited to terminals and intelligent network element devices such as MTLF.
  • the above-mentioned first information may be actively reported by the second communication device (i.e., a member in federated learning), for example, fed back to the server in federated learning together with the results of local training, thereby reducing signaling consumption and the number of interactions.
  • when a member in federated learning is no longer willing to participate in federated learning, for example because other more important tasks have arrived, there are too many tasks to handle, and it needs to exit federated learning first, it can feed back information indicating that it does not agree to participate in federated learning, that is, information on its willingness to withdraw from federated learning, to the server in federated learning. This helps member selection during the federated learning process and achieves a reasonable selection of members participating in federated learning. If the member is willing to continue to participate in federated learning, it does not need to feed back information indicating that it agrees to participate; in this case the server assumes by default that the member is willing to continue participating in federated learning. In addition, members in federated learning can also directly indicate to the server that they are willing to participate in federated learning.
  • for example, when a member's load is heavy or its resources are highly occupied, the computing power required for local model training may be insufficient, and the member may no longer be suitable to be selected to participate in the next round of federated learning. Therefore, members in federated learning can send their status information in this round of federated learning to the server in federated learning, and the server determines whether the member will participate in the next round of federated learning, thus helping member selection in the federated learning process.
  • for example, when the model performance of a member in this round of federated learning is much better than that of the other members, the global model may be overfitted to that member's environment, and the member may no longer be suitable to be selected to participate in the next round of federated learning; training of this member can be suspended for several rounds so that the model converges more quickly. Therefore, members in federated learning can send their model performance information in this round of federated learning to the server in federated learning, and the server determines whether the member will participate in the next round of federated learning, thereby helping member selection in the federated learning process.
  • the first communication device can also select a third communication device to participate in the next round of federated learning based on the first information.
  • the third communication device is different from the second communication device, and is specifically a new member (client) device participating in federated learning, which may include but is not limited to terminals and intelligent network element devices such as MTLF.
  • for example, if it is determined based on the received first information that many members are no longer suitable to participate in the next round of federated learning, new members can be selected to participate in the next round of federated learning to ensure the smooth progress of federated learning.
  • the above status information can be used to describe the status information of the second communication device (i.e., the member in federated learning) after the local training of this round of federated learning is completed, and can include but is not limited to at least one of the following:
  • the load information can be understood as load status information, which can represent the load status of network elements such as network functions (Network Function, NF).
  • the load information may include at least one of the following: average load information, peak load information, etc.
  • the average load information can be understood as the average load within the range of this round of federated learning. For example, during a local training, the average load of a certain member is 70% and the peak load is 80%.
  • the resource usage information can be understood as information describing the occupancy of resources.
  • the resource usage information may include at least one of the following: average resource usage information, peak resource usage information.
  • the average resource usage information can be understood as the average resource usage within the scope of this round of federated learning.
  • the resources corresponding to the resource usage information (i.e., the resources whose usage is reported) may include but are not limited to the central processing unit (CPU), memory, disk, graphics processing unit (GPU), etc.
  • the resource usage information may include power information, etc.
  • the average resource usage of a member is: 60% CPU usage, 80% GPU usage, 70% memory usage (for example, 12GB is occupied, which is represented by numerical values), and 40% disk space usage; and
  • the peak resource usage of this member is: 80% CPU usage, 100% GPU usage, 80% memory usage (for example, 14GB is occupied, that is, expressed in numerical values), and 50% disk space usage.
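  • Purely as an illustration of what such a status report could look like as a data structure (the field names are assumptions for the example, not signaling defined in this application; the values follow the numbers above):

```python
status_info = {
    "load": {"average": 0.70, "peak": 0.80},  # average / peak load in this round of federated learning
    "resource_usage": {
        "average": {"cpu": 0.60, "gpu": 0.80, "memory": 0.70, "disk": 0.40},
        "peak":    {"cpu": 0.80, "gpu": 1.00, "memory": 0.80, "disk": 0.50},
    },
    # memory can also be reported as an absolute value, e.g. 12 GB average / 14 GB peak
}
```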
  • the model performance information may be model performance information before the local model training starts and/or after it is completed, and may include at least one of the following:
  • first model performance information after the local model training of this round of federated learning is completed, and/or second model performance information before the local model training of this round of federated learning starts.
  • the above model performance information may include at least one of the following: accuracy, mean absolute error (Mean Absolute Error, MAE). In addition, it may also include but is not limited to at least one of the following: precision, recall, F1 score, area under the curve (Area Under Curve, AUC), sum of squared errors (Sum of Squares due to Error, SSE), mean squared error (MSE), variance, root mean squared error (Root Mean Squared Error, RMSE), standard deviation, coefficient of determination (R-Squared), etc.
  • the above-mentioned first model performance information may include accuracy, mean absolute error MAE, etc.
  • the above-mentioned second model performance information may include accuracy, mean absolute error MAE, etc.
  • the above-mentioned first model performance information is mainly used to explain the performance of the model based on its local data after the local model training of this round of federated learning is completed, and can include certain statistical parameters and the corresponding values of the parameters, such as the accuracy of the model and its specific value (such as 80%), and the mean absolute error MAE and its value (such as 0.1).
  • the above-mentioned second model performance information is mainly used to illustrate the performance of the model based on its local data before the local model training of this round of federated learning begins.
  • in this case, a statistical calculation of the model performance needs to be performed first before training; the information can include certain statistical parameters and the corresponding values of the parameters, such as the accuracy of the model and its specific value (such as 70%), as well as the mean absolute error MAE and its value (such as 0.15).
  • the data set contains input data and labels (label data), which have a corresponding relationship.
  • a set of input data corresponds to one or a group of labels.
  • for example, the accuracy can be calculated as the proportion of samples whose prediction is correct, and the MAE can be calculated as MAE = (1/m) Σ_{i=1..m} |h(x_i) − y_i|, where h(x_i) represents the predicted value of the model, y_i represents the corresponding true value (label), and m represents the number of training samples.
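  • A minimal sketch of these two statistics in Python, assuming the predictions and labels are plain numeric arrays:

```python
import numpy as np

def mean_absolute_error(h_x, y):
    """MAE = (1/m) * sum_i |h(x_i) - y_i| over m samples."""
    h_x, y = np.asarray(h_x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean(np.abs(h_x - y)))

def accuracy(h_x, y):
    """Fraction of samples whose predicted value matches the label."""
    h_x, y = np.asarray(h_x), np.asarray(y)
    return float(np.mean(h_x == y))

# Example
mae = mean_absolute_error([0.9, 2.1, 3.2], [1.0, 2.0, 3.0])
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```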
  • whether the second communication device feeds back the first information may be decided by the first communication device.
  • the first communication device may send third information to the second communication device, where the third information is used to identify that the second communication device needs to feed back the first information. If the first communication device does not send the third information, that is, the second communication device does not receive the third information, the second communication device does not need to feed back the first information.
  • the third information may include but is not limited to at least one of the following:
  • Information used to identify that the second communication device needs to feed back the second information; for example, the information is an identification indicating that the second information needs to be fed back;
  • Information used to identify that the second communication device needs to feed back status information; for example, the information is an identification indicating that status information needs to be fed back;
  • Information used to identify that the second communication device needs to feed back model performance information; for example, the information includes an identification indicating that model performance information after the local model training of this round is completed needs to be fed back, and/or an identification indicating that model performance information before the local model training starts needs to be fed back.
  • the above information used to identify that the second communication device needs to feed back status information is mainly used to illustrate that after the local model training of this round of federated learning is completed, the second communication device needs to feed back its status information.
  • the specific status information can be at least one of the following: the member's load (such as NF load), the member's resource usage (such as resource usage including CPU, memory, disk and/or GPU, etc.), etc.
  • the above-mentioned information used to identify that the second communication device needs to feed back model performance information is mainly used to explain that after the local model training of this round of federated learning is completed, the second communication device needs to feed back the model performance information before the local model training starts and/or after it is completed.
  • the model performance information includes the above-mentioned first model performance information and/or second model performance information.
  • the above-mentioned sending of the third information may include at least one of the following:
  • the first communication device sends the third information to the second communication device according to a preset strategy; wherein the preset strategy may refer to when or under what circumstances the first communication device sends the third information to the second communication device, for example: after every five rounds of training, the third information is sent to the second communication device; or after a certain second communication device has participated in five rounds of training, the third information is sent to that second communication device.
  • the preset policy not only indicates whether to require feedback from the second device, but also indicates when or under what circumstances to seek feedback.
  • if the preset policy indicates that the second communication device needs to feed back the first information, the first communication device can send the third information to the second communication device; and if the preset policy indicates that the second communication device does not need to feed back the first information, the first communication device does not send the third information to the second communication device.
  • the preset strategy can be predefined, agreed upon in an agreement, etc.
  • the first communication device sends the third information to the second communication device according to the needs of the federated-learning-based model training process. For example, if the first communication device expects to decide, according to the wishes, status and/or model performance of the second communication device, whether the second communication device participates in the next round of federated learning, it can send the third information to the second communication device; otherwise, it does not send the third information. That is to say, the first communication device can independently decide whether to send the third information to the second communication device.
  • the above-mentioned sending of the third information may include: sending a first request to the second communication device, where the first request is used to request the second communication device to participate in federated learning, and the first request carries the third information.
  • the third information can be sent with the first request for requesting the second communication device to participate in federated learning, thereby reducing signaling consumption and the number of interactions.
  • when the first communication device receives multiple pieces of model performance information from multiple second communication devices, the multiple pieces of model performance information can first be summarized to obtain third model performance information; then, based on the third model performance information, it is determined whether model training is completed, for example whether the model has converged. For example, if the third model performance information includes accuracy and the accuracy is higher than a preset threshold, it can be determined that model training has ended; otherwise, model training continues. Or, if the third model performance information includes the mean absolute error MAE and the MAE is lower than a preset threshold, it can be determined that model training has ended; otherwise, model training continues.
  • the above-mentioned summary method includes but is not limited to: calculating an average value of multiple model performance information, calculating a weighted average value of multiple model performance information, etc.
  • the corresponding weight may be determined by the first communication device, for example, the weight may be preset or calculated by itself.
  • the first communication device summarizes multiple first model performance information (that is, model performance information after local model training is completed), and determines whether the model training is completed based on the summarized model performance information.
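  • A sketch of how such a summary and convergence test might be realized; the weights and thresholds are illustrative assumptions, not values defined by this application:

```python
import numpy as np

def aggregate_performance(values, weights=None):
    """Summarize multiple reported model-performance values into one third value
    (plain average, or weighted average when weights are supplied)."""
    values = np.asarray(values, dtype=float)
    if weights is None:
        return float(np.mean(values))
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(values * weights) / np.sum(weights))

def training_finished(third_perf, metric="accuracy",
                      accuracy_threshold=0.9, mae_threshold=0.1):
    """Accuracy above its threshold, or MAE below its threshold, ends training."""
    if metric == "accuracy":
        return third_perf >= accuracy_threshold
    if metric == "mae":
        return third_perf <= mae_threshold
    raise ValueError("unsupported metric")

# Example: accuracies reported by three members, weighted by the server
third_accuracy = aggregate_performance([0.82, 0.78, 0.91], weights=[1.0, 1.0, 0.5])
done = training_finished(third_accuracy, metric="accuracy")
```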
  • the first communication device may feed back the third model performance information to the model user to facilitate the model user to understand the model performance.
  • the above embodiment mainly describes the present application from the perspective of the first communication device (ie, the server in federated learning).
  • the following will describe the present application from the perspective of the second communication device (ie, the member in federated learning).
  • Figure 5 is a flow chart of a federated learning method provided by an embodiment of the present application. The method is applied to a second communication device.
  • the second communication device is specifically a member (client) in federated learning, including but not limited to terminals and intelligent network element equipment such as MTLF.
  • the method includes the following steps:
  • Step 51 The second communication device determines the first information.
  • Step 52 The second communication device sends first information to the first communication device, where the first information is used by the first communication device to determine whether the second communication device participates in the next round of federated learning.
  • the above-mentioned first information may include but is not limited to at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, Model performance information of this round of federated learning, etc.
  • the above-mentioned first communication device is specifically a server in federated learning, which may include but is not limited to intelligent network element devices such as MTLF.
  • the above-mentioned first information may be actively reported by the second communication device (i.e., a member in federated learning), for example, fed back to the server in federated learning together with the results of local training, thereby reducing signaling consumption and the number of interactions.
  • in the federated learning method of the embodiment of the present application, the second communication device sends first information to the first communication device, where the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in this round of federated learning, first model performance information after the local model training of this round of federated learning is completed, and second model performance information before the local model training of this round of federated learning starts. This enables the first communication device to determine, based on the second communication device's wishes, status information and/or model performance, whether the second communication device participates in the next round of federated learning, thereby reasonably selecting members to participate in federated learning and improving training efficiency, for example preventing members from falling behind (that is, some members not feeding back results within the specified time, etc.) and selecting members that can bring higher efficiency.
  • the above status information can be used to describe the status information of the second communication device (i.e., the member in the federated learning) after the local training of this round of federated learning is completed, and can include but is not limited to at least one of the following:
  • the load information can be understood as load situation information and can represent the NF load situation.
  • the load information may include at least one of the following: average load information, peak load information, etc.
  • the average load information can be understood as the average load within the range of this round of federated learning. For example, during a local training, the average load of a certain member is 70% and the peak load is 80%.
  • the resource usage information may include at least one of the following: average resource usage information, peak resource usage information.
  • the average resource usage information can be understood as the average resource usage within the scope of this round of federated learning.
  • the resources corresponding to the resource usage information (i.e., the resources whose usage is reported) may include but are not limited to the central processing unit (CPU), memory, disk, graphics processing unit (GPU), etc.
  • the resource usage information may include power information, etc.
  • the average resource usage of a member is: 60% CPU usage, 80% GPU usage, 70% memory usage (for example, 12GB is occupied, which is represented by numerical values), and 40% disk space usage; and
  • the peak resource usage of this member is: 80% CPU usage, 100% GPU usage, 80% memory usage (for example, 14GB is occupied, that is, expressed in numerical values), and 50% disk space usage.
  • the model performance information may be model performance information before the local model training starts and/or after it is completed, and may include at least one of the following:
  • first model performance information after the local model training of this round of federated learning is completed, and/or second model performance information before the local model training of this round of federated learning starts.
  • the above model performance information may include at least one of the following: accuracy, mean absolute error MAE, etc.
  • the above-mentioned first model performance information may include accuracy and mean absolute error MAE.
  • the above-mentioned second model performance information may include accuracy and mean absolute error MAE.
  • whether the second communication device feeds back the first information may be decided by the first communication device.
  • the above-mentioned determining of the first information may include: first receiving third information from the first communication device, where the third information is used to identify that the second communication device needs to feed back the first information; and then determining the first information based on the third information.
  • the third information may include but is not limited to at least one of the following:
  • Information used to identify that the second communication device needs to feed back the second information; for example, the information is an identification indicating that the second information needs to be fed back;
  • Information used to identify that the second communication device needs to feed back status information; for example, the information is an identification indicating that status information needs to be fed back;
  • Information used to identify that the second communication device needs to feed back model performance information; for example, the information includes an identification indicating that model performance information after the local model training is completed needs to be fed back, and/or an identification indicating that model performance information before the local model training starts needs to be fed back.
  • the above-mentioned receiving of the third information from the first communication device may include: receiving a first request from the first communication device, where the first request is used to request the second communication device to participate in federated learning, and the first request carries the third information.
  • the third information can be sent with the help of the first request for requesting the second communication device to participate in federated learning, thereby reducing signaling consumption and the number of interactions.
  • the federated learning server is NWDAF (such as MTLF), and the federated learning members (clients) are NWDAF (such as MTLF).
  • the specific federated learning process includes:
  • Step 61 The federated learning consumer (such as NWDAF (AnLF)) sends a model request (such as Nnwdaf_MLModelProvision_Subscribe) to the federated learning server (such as NWDAF (MTLF)), which is used to request a model to complete its own tasks.
  • the server determines whether to trigger federated learning based on local configuration or requests from federated learning consumers, and initializes federated learning and member selection.
  • Step 62 If federated learning is triggered, the server can initialize federated learning and formulate a strategy for member selection, such as specifying after how many rounds of training to collect status information, and/or after how many rounds of training to collect model performance information, etc.
  • Step 63 The server sends a federated learning task request (such as Nnwdaf_MLModelTraining_Subscribe) to each member (clients) to request to participate in federated learning, and perform local training of federated learning based on the global model and the local data of each member.
  • the task request can include a task identification (such as an analytics ID), model initialization information (such as training parameters), information used to identify the need to feed back status information/model performance information (i.e., a feedback requirement), etc.
  • Model initialization information is used to describe the model and configuration information in this round of federated learning. Describing the model refers to describing the model itself, such as what algorithm, what architecture, what parameters and hyper-parameters the model is composed of, or the model itself, such as the model file, the address information of the model file, etc.
  • the configuration information in this round of federated learning refers to the number of rounds of local training to be performed, the data type that should be used, and other information during the local training process of this round of federated learning.
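  • As a purely illustrative sketch of what the content of such a task request could look like (the field names and values are assumptions for the example; only the service name Nnwdaf_MLModelTraining_Subscribe comes from the text above):

```python
task_request = {
    "service": "Nnwdaf_MLModelTraining_Subscribe",
    "analytics_id": "example-analytics-id",                    # task identification
    "model_init": {                                            # describes the model itself
        "algorithm": "neural_network",
        "model_file_url": "https://example.invalid/model/v0",  # or the model file itself
        "hyper_parameters": {"learning_rate": 0.01, "batch_size": 32},
    },
    "local_training_config": {                                 # configuration for this round
        "local_epochs": 5,
        "data_type": "example-data-type",
    },
    "feedback_requirement": {                                  # what the member must feed back (step 67)
        "willingness": True,
        "status_info": True,
        "model_performance": {"before_training": False, "after_training": True},
    },
}
```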
  • Step 64 Members send a data acquisition request (such as Ndccf_DataManagement_Subscribe/Nnf_EventExposure_Subscribe) to the data source in their domain to collect data for local model training.
  • the network elements that provide data are also different, such as UPF, OAM, UDM, etc.
  • Step 65 The data source returns a response to the corresponding member.
  • the response contains the requested data.
  • the response is, for example: Ndccf_DataManagement_Notify/Nnf_EventExposure_Notify.
  • Step 66 Each member uses the data obtained in steps 64 and 65 for local model training and generates intermediate results, which are fed back to the server in subsequent steps so that the server can aggregate and update the global model; the member also uses local data to evaluate and analyze model performance.
  • analyzing model performance can be done by using a locally trained model and local data to calculate accuracy or MAE. If the task request in step 63 carries identification information that requires feedback of model performance information before local training, members need to perform statistical calculations on model performance before performing local training.
  • member NWDAF can set up a verification data set for evaluating local training accuracy.
  • the verification data set includes data for model input (input data) and real label data (label/ground truth).
  • the member NWDAF inputs the input data into the trained model to obtain the output data.
  • the member NWDAF compares whether the output data is consistent with the real label data, and then uses the above calculation formula to obtain the local training accuracy value.
  • the concept of a correct prediction result does not necessarily mean that the result must be completely consistent with the labeled data. When there is a certain gap between the two, but this gap is within the allowable range, the prediction result can also be considered correct.
  • for the MAE, the member NWDAF calculates the mean of the absolute errors between the predicted data and the label data (label values, i.e., the original data) to obtain the MAE, see the following calculation formula: MAE = (1/m) Σ_{i=1..m} |h(x_i) − y_i|. Specifically, the member NWDAF can set up a verification data set for evaluating the local training; the verification data set includes data used for model input (input data) and real label data (label/ground truth, i.e., y_i in the above formula). The member NWDAF inputs the input data into the trained model to obtain the output data (i.e., the predicted data h(x_i)), and then computes the mean absolute error between the output data and the real label data, that is, uses the above calculation formula to obtain the local training MAE value.
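  • A sketch of such a verification-data-set evaluation, assuming a tolerance-based notion of "correct" as described above (the tolerance value, data and stand-in model are assumptions for the example):

```python
import numpy as np

def evaluate_local_model(model, val_inputs, val_labels, tolerance=0.05):
    """Run the trained model on the verification data set and compute the
    local-training accuracy (a prediction within `tolerance` of its label
    counts as correct) and the MAE."""
    predictions = np.array([model(x) for x in val_inputs], dtype=float)
    labels = np.asarray(val_labels, dtype=float)
    errors = np.abs(predictions - labels)
    acc = float(np.mean(errors <= tolerance))   # fraction of correct predictions
    mae = float(np.mean(errors))                # mean absolute error
    return acc, mae

# Example with a trivial stand-in "model"
model = lambda x: 2.0 * x
acc, mae = evaluate_local_model(model, [0.1, 0.2, 0.3], [0.21, 0.42, 0.58])
```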
  • Step 67 Each member, actively or according to the requirements in step 63, feeds back the intermediate results of local training as well as information such as willingness, status and/or model performance (the above-mentioned first information) to the server; for example, a feedback message corresponding to the request message of the federated learning training process is used to feed back the information such as willingness, status and/or model performance.
  • the feedback message may be a notification message.
  • for example, when a member client finds that its training situation is very good, such as the accuracy reaching a certain threshold (the threshold can be carried in the model initialization information in step 63, carried in the model request, or obtained/configured in advance), it can proactively feed back model performance information. Alternatively, member clients actively feed back their wishes, status and/or model performance information in each round.
  • the server can use the intermediate results to update the global model, and use information such as willingness, status, and/or model performance to assist in decision-making for member selection in the next round of federated learning.
  • Step 68 The server aggregates the intermediate results and updates the global model based on the feedback intermediate results, and determines whether the corresponding member still needs to participate in the next round of federated learning based on feedback information such as willingness, status, and/or model performance.
  • model performance information can be aggregated to obtain the overall/global training situation of the model.
  • the server can aggregate these intermediate results through the server's algorithm, such as average, weighted average, etc., and then use these intermediate results to update the global model.
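  • One common way to realize the average/weighted average mentioned above is to weight each member's contribution, for example by its sample count; the sketch below assumes the intermediate results are locally updated parameter vectors, which is an illustrative assumption:

```python
import numpy as np

def aggregate_intermediate_results(results, sample_counts=None):
    """Average (or weighted-average) the intermediate results fed back by the
    members, e.g. locally updated parameter vectors, to update the global model."""
    results = [np.asarray(r, dtype=float) for r in results]
    if sample_counts is None:
        return sum(results) / len(results)                    # plain average
    weights = np.asarray(sample_counts, dtype=float)
    weights = weights / weights.sum()
    return sum(w * r for w, r in zip(weights, results))       # weighted average

# Example: three members feed back two-dimensional parameter vectors
new_global = aggregate_intermediate_results(
    [[0.9, 1.1], [1.0, 1.0], [1.2, 0.8]], sample_counts=[100, 50, 10])
```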
  • the server can judge whether the client can still participate in the next round of federated learning based on the client's feedback information such as willingness, status, and/or model performance.
  • for example, if a client feeds back information indicating that it does not agree to (is not willing to) continue participating in federated learning, the server will not select that client for the next round of federated learning. For another example, if the status information fed back by a client is 90% CPU usage, 100% GPU usage, 80% memory usage (such as 14GB, expressed as a numerical value) and 50% disk usage, the server considers that the client is no longer a good member to participate in federated learning, because the GPU is already fully occupied during local training and the next round of training may take longer or be interrupted, so the client will not be selected in the next round of member selection. For another example, if the model performance fed back by a client is 98% accuracy while the model performance fed back by the other clients is basically 60%-80%, the server may consider that the model is already over-fitted in that client's environment and that the training of this client needs to be suspended, so the client will not be selected in the next round of member selection.
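  • The member-selection decision illustrated above could be sketched as follows; all thresholds (the GPU limit, the accuracy gap) and the feedback field names are assumptions taken from the numerical examples in this paragraph, not limits defined by this application:

```python
def keep_for_next_round(feedback, peer_accuracies, gpu_limit=1.0, accuracy_gap=0.15):
    """Decide whether a member stays in the next round based on its willingness,
    status information and model performance feedback."""
    if feedback.get("willing") is False:                 # member asked to withdraw
        return False
    if feedback["status"]["peak"]["gpu"] >= gpu_limit:   # resources already saturated
        return False
    acc = feedback["performance"]["accuracy"]
    if peer_accuracies and acc - max(peer_accuracies) >= accuracy_gap:
        return False                                     # likely over-fitting: pause this member
    return True

# Example corresponding to the status/performance figures above
feedback = {"willing": True,
            "status": {"peak": {"cpu": 0.9, "gpu": 1.0, "memory": 0.8, "disk": 0.5}},
            "performance": {"accuracy": 0.98}}
selected = keep_for_next_round(feedback, peer_accuracies=[0.6, 0.7, 0.8])  # -> False
```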
  • steps 63 to 68 can be repeated until the model converges.
  • Step 69 After completing the model training of federated learning, the server feeds back the trained model and overall/global model performance to the consumer (such as AnLF).
  • the execution subject may be a federated learning device.
  • a federated learning device executing a federated learning method is used as an example to illustrate the federated learning device provided by the embodiment of this application.
  • Figure 7 is a schematic structural diagram of a federated learning device provided by an embodiment of the present application.
  • the device is applied to a first communication device.
  • the first communication device is specifically a server in federated learning, including but not limited to intelligent network element equipment such as MTLF.
  • the federated learning device 70 includes:
  • the first receiving module 71 is configured to receive first information from a second communication device, where the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, The status information of the second communication device in this round of federated learning and the model performance information of this round of federated learning;
  • the first determination module 72 is configured to determine whether the second communication device participates in the next round of federated learning based on the first information.
  • the status information includes at least one of the following:
  • the load information includes at least one of the following: average load information, peak load information;
  • the resource usage information includes at least one of the following: average resource usage information and peak resource usage information.
  • model performance information includes at least one of the following:
  • first model performance information after the local model training of this round of federated learning is completed, and/or second model performance information before the local model training of this round of federated learning starts.
  • the model performance information includes at least one of the following: accuracy, mean absolute error, precision, and mean square error.
  • the federated learning device 70 also includes:
  • the first sending module is configured to send third information to the second communication device, where the third information is used to identify that the second communication device needs to feed back the first information.
  • the third information includes at least one of the following:
  • information used to identify that the second communication device needs to feed back the second information (for example, an identifier indicating that the second information needs to be fed back);
  • information used to identify that the second communication device needs to feed back status information;
  • information used to identify that the second communication device needs to feed back model performance information.
  • the first sending module is specifically used for at least one of the following: sending the third information to the second communication device according to a preset policy; sending the third information to the second communication device according to a need arising in the federated-learning-based model training process.
  • the first sending module is specifically configured to send a first request to the second communication device.
  • the first request is used to request the second communication device to participate in federated learning.
  • the first request carries the third information.
  • the federated learning device 70 also includes:
  • a processing module configured to, when the first communication device receives a plurality of pieces of model performance information from a plurality of second communication devices, aggregate the plurality of pieces of model performance information to obtain third model performance information, and determine, according to the third model performance information, whether model training has ended.
  • the federated learning device 70 also includes:
  • a feedback module is used to feed back the third model performance information to the model user.
  • the federated learning device 70 also includes:
  • a selection module configured to select a third communication device to participate in the next round of federated learning based on the first information.
  • the third communication device is different from the second communication device, and is specifically a new member (client) device participating in federated learning, which may include but is not limited to terminals and intelligent network element devices such as MTLF. For example, if it is determined based on the first information received that many members are no longer suitable to participate in the next round of federated learning, new members can be selected to participate in the next round of federated learning to ensure the smooth progress of federated learning.
  • the federated learning device 70 provided by the embodiment of this application can implement each process implemented by the method embodiment shown in Figure 4 and achieve the same technical effect. To avoid duplication, the details will not be described here.
  • Figure 8 is a schematic structural diagram of a federated learning device provided by an embodiment of the present application.
  • the device is applied to a second communication device.
  • the second communication device is specifically a member (client) in federated learning, including but not limited to terminals and intelligent network element devices such as an MTLF.
  • the federated learning device 80 includes:
  • the second determination module 81 is used to determine first information, where the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in the current round of federated learning, and model performance information of the current round of federated learning;
  • the second sending module 82 is configured to send the first information to the first communication device, where the first information is used by the first communication device to determine whether the second communication device participates in the next round of federated learning.
  • the status information includes at least one of the following: load information and resource usage information.
  • the load information includes at least one of the following: average load information, peak load information;
  • the resource usage information includes at least one of the following: average resource usage information and peak resource usage information.
  • model performance information includes at least one of the following:
  • the first model performance information after local model training is completed; the second model performance information before local model training starts.
  • model performance information includes at least one of the following: accuracy, mean absolute error, precision, and mean square error.
  • the federated learning device 80 includes:
  • a second receiving module configured to receive third information from the first communication device, where the third information is used to identify that the second communication device needs to feed back the first information
  • the second determination module 81 is specifically configured to determine the first information according to the third information.
  • the second receiving module is further configured to receive a first request from the first communication device, where the first request is used to request the second communication device to participate in federated learning and carries the third information.
  • the federated learning device 80 provided by the embodiment of the present application can implement each process implemented by the method embodiment shown in Figure 5 and achieve the same technical effect. To avoid duplication, the details will not be described here.
  • this embodiment of the present application also provides a communication device 90, which includes a processor 91 and a memory 92.
  • the memory 92 stores programs or instructions that can be run on the processor 91. For example, when the communication device 90 is the first communication device and the program or instruction is executed by the processor 91, each step of the federated learning method embodiment shown in Figure 4 is implemented, and the same technical effect can be achieved.
  • when the communication device 90 is the second communication device and the program or instruction is executed by the processor 91, each step of the federated learning method embodiment shown in Figure 5 is implemented, and the same technical effect can be achieved; to avoid duplication, details are not described again here.
  • An embodiment of the present application also provides a communication device, including a processor and a communication interface.
  • for example, when the communication device is the first communication device, the communication interface is used to receive the first information from the second communication device, and the processor is used to determine, according to the first information, whether the second communication device participates in the next round of federated learning; or, when the communication device is the second communication device, the processor is used to determine the first information, and the communication interface is used to send the first information to the first communication device.
  • the first information includes at least one of the following: second information indicating whether the second communication device agrees to participate in federated learning, status information of the second communication device in the current round of federated learning, first model performance information after the local model training of the current round of federated learning is completed, and second model performance information before the local model training of the current round of federated learning starts.
  • This embodiment corresponds to the above-mentioned method embodiment.
  • Each implementation process and implementation manner of the above-mentioned method embodiment can be applied to this embodiment and can achieve the same technical effect.
  • FIG. 10 is a schematic diagram of the hardware structure of a terminal that implements an embodiment of the present application.
  • the terminal 1000 includes, but is not limited to, at least some of the following components: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
  • the terminal 1000 may also include a power supply (such as a battery) that supplies power to various components.
  • the power supply may be logically connected to the processor 1010 through a power management system, thereby implementing functions such as charging management, discharging management, and power consumption management through the power management system.
  • the terminal structure shown in FIG. 10 does not constitute a limitation on the terminal.
  • the terminal may include more or fewer components than shown in the figure, or some components may be combined or arranged differently, which will not be described again here.
  • the input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042.
  • the graphics processor 10041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the display unit 1006 may include a display panel 10061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072 .
  • the touch panel 10071 is also known as a touch screen.
  • the touch panel 10071 may include two parts: a touch detection device and a touch controller.
  • Other input devices 10072 may include but are not limited to physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be described again here.
  • after receiving downlink data from the network side device, the radio frequency unit 1001 can transmit it to the processor 1010 for processing; in addition, the radio frequency unit 1001 can send uplink data to the network side device.
  • the radio frequency unit 1001 includes, but is not limited to, an antenna, an amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, etc.
  • Memory 1009 may be used to store software programs or instructions as well as various data.
  • the memory 1009 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system and application programs or instructions required for at least one function (such as a sound playback function or an image playback function).
  • memory 1009 may include volatile memory or nonvolatile memory, or memory 1009 may include both volatile and nonvolatile memory.
  • non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • Volatile memory can be random access memory (Random Access Memory, RAM), static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synch link DRAM) , SLDRAM) and direct memory bus random access memory (Direct Rambus RAM, DRRAM).
  • the processor 1010 may include one or more processing units; optionally, the processor 1010 integrates an application processor and a modem processor, where the application processor mainly handles operations related to the operating system, user interface, application programs, etc., and the modem processor mainly processes wireless communication signals, for example a baseband processor. It can be understood that the modem processor may alternatively not be integrated into the processor 1010.
  • the terminal 1000 can serve as a member of the federated learning, and the processor 1010 is used to determine the first information;
  • the radio frequency unit 1001 is used to send first information to the server in federated learning.
  • the first information is used by the server to determine whether the terminal 1000 participates in the next round of federated learning; the first information includes at least one of the following: second information indicating whether the terminal 1000 agrees to participate in federated learning, status information of the terminal 1000 in the current round of federated learning, and model performance information of the current round of federated learning.
  • the terminal 1000 provided by the embodiment of the present application can implement each process implemented by the method embodiment shown in Figure 5 and achieve the same technical effect. To avoid duplication, the details will not be described here.
  • the embodiment of the present application also provides a network side device.
  • the network side device 110 includes a processor 111, a network interface 112, and a memory 113.
  • the network interface 112 is, for example, a common public radio interface (CPRI).
  • the network side device 110 in the embodiment of the present application also includes: instructions or programs stored in the memory 113 and executable on the processor 111.
  • the processor 111 calls the instructions or programs in the memory 113 to execute the method executed by each module shown in Figure 7 and/or Figure 8 and achieves the same technical effect; to avoid repetition, it will not be described in detail here.
  • Embodiments of the present application also provide a readable storage medium. Programs or instructions are stored on the readable storage medium. When the programs or instructions are executed by a processor, each process of the above federated learning method embodiments is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
  • the processor is the processor in the terminal described in the above embodiment.
  • the readable storage medium includes computer readable storage media, such as computer read-only memory ROM, random access memory RAM, magnetic disk or optical disk, etc.
  • An embodiment of the present application further provides a chip.
  • the chip includes a processor and a communication interface.
  • the communication interface is coupled to the processor.
  • the processor is used to run programs or instructions to implement each process of the above federated learning method embodiments and achieve the same technical effect; to avoid duplication, it will not be described again here.
  • the chips mentioned in the embodiments of this application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
  • Embodiments of the present application further provide a computer program/program product.
  • the computer program/program product is stored in a storage medium.
  • the computer program/program product is executed by at least one processor to implement each process of the above federated learning method embodiments and achieve the same technical effect; to avoid repetition, details are not described here.
  • An embodiment of the present application also provides a communication system, including the above first communication device and a second communication device.
  • the first communication device can be used to perform the steps of the federated learning method as shown in Figure 4.
  • the second communication device may be used to perform the steps of the federated learning method as shown in Figure 5.
  • the methods of the above embodiments can be implemented by means of software plus the necessary general hardware platform; of course, they can also be implemented by hardware, but in many cases the former is the better implementation.
  • the part of the technical solution of the present application that is essential, or that contributes to the related technologies, can be embodied in the form of a computer software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes a number of instructions to cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

本申请公开了一种联邦学习方法、装置、通信设备及可读存储介质,属于通信技术领域,本申请实施例的联邦学习方法包括:第一通信设备从第二通信设备接收第一信息,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息;根据所述第一信息,确定所述第二通信设备是否参加下一轮联邦学习。

Description

联邦学习方法、装置、通信设备及可读存储介质
相关申请的交叉引用
本申请主张在2022年7月8日在中国提交的中国专利申请No.202210815546.7的优先权,其全部内容通过引用包含于此。
技术领域
本申请属于通信技术领域,具体涉及一种联邦学习方法、装置、通信设备及可读存储介质。
背景技术
相关通信网络中,为了提升模型效果,可以基于联邦学习来进行模型的训练。但参与联邦学习的成员在联邦学习的过程中,可能因为种种原因比如因为有其他更重要的任务到来而不再愿意参加联邦学习、有太多任务要处理先退出联邦学习等,不想参与联邦学习或者不再是合适的联邦学习成员。这种情况下,如何合理地选择参与联邦学习的成员是目前急需解决的问题。
发明内容
本申请实施例提供一种联邦学习方法、装置、通信设备及可读存储介质,能够解决如何合理地选择参与联邦学习的成员的问题。
第一方面,提供了一种联邦学习方法,包括:
第一通信设备从第二通信设备接收第一信息,其中,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息;
所述第一通信设备根据所述第一信息,确定所述第二通信设备是否参加下一轮联邦学习。
第二方面,提供了一种联邦学习方法,包括:
第二通信设备确定第一信息,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息;
所述第二通信设备向第一通信设备发送所述第一信息,所述第一信息用于所述第一通信设备确定所述第二通信设备是否参加下一轮联邦学习。
第三方面,提供了一种联邦学习装置,应用于第一通信设备,包括:
第一接收模块,用于从第二通信设备接收第一信息,其中,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设 备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息;
第一确定模块,用于根据所述第一信息,确定所述第二通信设备是否参加下一轮联邦学习。
第四方面,提供了一种联邦学习装置,应用于第二通信设备,包括:
第二确定模块,用于确定第一信息,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息;
第二发送模块,用于向第一通信设备发送所述第一信息,所述第一信息用于所述第一通信设备确定所述第二通信设备是否参加下一轮联邦学习。
第五方面,提供了一种通信设备,包括处理器和存储器,所述存储器存储可在所述处理器上运行的程序或指令,所述程序或指令被所述处理器执行时实现如第一方面所述的方法的步骤,或者实现如第二方面所述的方法的步骤。
第六方面,提供了一种通信设备,包括处理器及通信接口,例如该通信设备为第一通信设备时,所述通信接口用于从第二通信设备接收第一信息,所述处理器用于根据所述第一信息,确定所述第二通信设备是否参加下一轮联邦学习;或者,该通信设备为第二通信设备时,所述处理器用于确定第一信息,所述通信接口用于向第一通信设备发送所述第一信息;其中,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息。
第七方面,提供了一种通信系统,包括如上的第一通信设备和第二通信设备,所述第一通信设备可用于执行如第一方面所述的联邦学习方法的步骤,所述第二通信设备可用于执行如第二方面所述的联邦学习方法的步骤。
第八方面,提供了一种可读存储介质,所述可读存储介质上存储程序或指令,所述程序或指令被处理器执行时实现如第一方面所述的方法的步骤,或者实现如第二方面所述的方法的步骤。
第九方面,提供了一种芯片,所述芯片包括处理器和通信接口,所述通信接口和所述处理器耦合,所述处理器用于运行程序或指令,实现如第一方面所述的方法的步骤,或者实现如第二方面所述的方法的步骤。
第十方面,提供了一种计算机程序/程序产品,所述计算机程序/程序产品被存储在存储介质中,所述计算机程序/程序产品被至少一个处理器执行以实现如第一方面所述的方法的步骤,或者实现如第二方面所述的方法的步骤。
在本申请实施例中,可以从第二通信设备接收第一信息,并根据第一信息,确定第二通信设备是否参加下一轮联邦学习;所述第一信息包括以下至少一项:用于指示第二通信设备是否同意参加联邦学习的第二信息、第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息。由此,可以结合第二通信设备的意愿、状态信息和/或模型性 能等,确定第二通信设备是否参加下一轮联邦学习,从而实现合理地选择参与联邦学习的成员。
附图说明
图1是本申请实施例可应用的一种无线通信系统的框图;
图2是本申请实施例中的神经网络的示意图;
图3是本申请实施例中的神经元的示意图;
图4是本申请实施例提供的一种联邦学习方法的流程图;
图5是本申请实施例提供的另一种联邦学习方法的流程图;
图6是本申请实施例中的联邦学习过程的示意图;
图7是本申请实施例提供的一种联邦学习装置的结构示意图;
图8是本申请实施例提供的另一种联邦学习装置的结构示意图;
图9是本申请实施例提供的一种通信设备的结构示意图;
图10是本申请实施例提供的一种终端的结构示意图;
图11是本申请实施例提供的一种网络侧设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书和权利要求书中的术语“第一”、“第二”等是用于区别类似的对象,而不用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,以便本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施,且“第一”、“第二”所区别的对象通常为一类,并不限定对象的个数,例如第一对象可以是一个,也可以是多个。此外,说明书以及权利要求中“和/或”表示所连接对象的至少其中之一,字符“/”一般表示前后关联对象是一种“或”的关系。
值得指出的是,本申请实施例所描述的技术不限于长期演进型(Long Term Evolution,LTE)/LTE的演进(LTE-Advanced,LTE-A)系统,还可用于其他无线通信系统,诸如码分多址(Code Division Multiple Access,CDMA)、时分多址(Time Division Multiple Access,TDMA)、频分多址(Frequency Division Multiple Access,FDMA)、正交频分多址(Orthogonal Frequency Division Multiple Access,OFDMA)、单载波频分多址(Single-carrier Frequency Division Multiple Access,SC-FDMA)和其他系统。本申请实施例中的术语“系统”和“网络”常被可互换地使用,所描述的技术既可用于以上提及的系统和无线电技术,也可用于其他系统和无线电技术。以下描述出于示例目的描述了新空口(New Radio,NR)系统,并且在以下大部分描述中使用NR术语,但是这些技术也可应用于NR系统应用以外的应 用,如第6代(6th Generation,6G)通信系统。
图1示出本申请实施例可应用的一种无线通信系统的框图。无线通信系统包括终端11和网络侧设备12。其中,终端11可以是手机、平板电脑(Tablet Personal Computer)、膝上型电脑(Laptop Computer)或称为笔记本电脑、个人数字助理(Personal Digital Assistant,PDA)、掌上电脑、上网本、超级移动个人计算机(ultra-mobile personal computer,UMPC)、移动上网装置(Mobile Internet Device,MID)、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、机器人、可穿戴式设备(Wearable Device)、车载设备(Vehicle User Equipment,VUE)、行人终端(Pedestrian User Equipment,PUE)、智能家居(具有无线通信功能的家居设备,如冰箱、电视、洗衣机或者家具等)、游戏机、个人计算机(personal computer,PC)、柜员机或者自助机等终端侧设备,可穿戴式设备包括:智能手表、智能手环、智能耳机、智能眼镜、智能首饰(智能手镯、智能手链、智能戒指、智能项链、智能脚镯、智能脚链等)、智能腕带、智能服装等。需要说明的是,在本申请实施例并不限定终端11的具体类型。网络侧设备12可以包括接入网设备或核心网设备,接入网设备也可以称为无线接入网设备、无线接入网(Radio Access Network,RAN)、无线接入网功能或无线接入网单元。接入网设备可以包括基站、无线局域网(Wireless Local Area Networks,WLAN)接入点或WiFi节点等,基站可被称为节点B、演进节点B(eNB)、接入点、基收发机站(Base Transceiver Station,BTS)、无线电基站、无线电收发机、基本服务集(Basic Service Set,BSS)、扩展服务集(Extended Service Set,ESS)、家用B节点、家用演进型B节点、发送接收点(Transmitting Receiving Point,TRP)或所述领域中其他某个合适的术语,只要达到相同的技术效果,所述基站不限于特定技术词汇,需要说明的是,在本申请实施例中仅以NR系统中的基站为例进行介绍,并不限定基站的具体类型。核心网设备可以包含但不限于如下至少一项:网络数据分析功能(Network Data Analytic Function,NWDAF)、核心网节点、核心网功能、移动管理实体(Mobility Management Entity,MME)、接入移动管理功能(Access and Mobility Management Function,AMF)、会话管理功能(Session Management Function,SMF)、用户平面功能(User Plane Function,UPF)、策略控制功能(Policy Control Function,PCF)、策略与计费规则功能单元(Policy and Charging Rules Function,PCRF)、边缘应用服务发现功能(Edge Application Server Discovery Function,EASDF)、统一数据管理(Unified Data Management,UDM),统一数据仓储(Unified Data Repository,UDR)、归属用户服务器(Home Subscriber Server,HSS)、集中式网络配置(Centralized network configuration,CNC)、网络存储功能(Network Repository Function,NRF),网络开放功能(Network Exposure Function,NEF)、本地NEF(Local NEF,或L-NEF)、绑定支持功能(Binding Support Function,BSF)、应用功能(Application Function,AF)等。需要说明的是,在本申请实施例中仅以NR系统中的核心网设备为例进行介绍,并不限定核心网设备的具体类型。
可选的,在本申请实施例中,网络数据分析功能NWDAF可以拆分成两个网元,比如 模型训练逻辑网元(Model Training Logical Function,MTLF)和分析逻辑网元(Analytics Logical Function,AnLF)。其中,模型训练逻辑网元MTLF主要用于生成模型并进行模型训练,既可以是联邦学习中的中央服务器(server),也可以是联邦学习中的成员(clients)。分析逻辑网元AnLF主要用于进行推理来生成预测信息或者模型等,可以向MTLF请求模型,该模型可以是通过联邦学习生成。
可选的,本申请实施例中的模型可以为人工智能(Artificial Intelligence,AI)模型。AI模型有多种算法实现方式,例如神经网络、决策树、支持向量机、贝叶斯分类器等。本申请以神经网络为例进行说明,但是并不限定AI模块的具体类型。
例如,一个神经网络的示意图可以如图2所示,其中,X1、X2…Xn等为输入值,Y为输出结果,一个个“○”代表一个个神经元即是进行运算的地方,结果会继续传入到下一层。这些众多神经元组成的输入层、隐藏层和输出层就是一个神经网络。隐藏层的数量以及每一层神经元的数量即是神经网络的“网络结构”。
又例如，神经网络由神经元组成，神经元的示意图可以如图3所示，其中，a1、ak…aK（即图2所示的X1、X2…）为输入，w为权值（也可称为：乘性系数），b为偏置（也可称为：加性系数），σ()为激活函数，z为输出值，相应运算过程可表示为：$z=\sigma\left(\sum_{k=1}^{K} a_k w_k + b\right)$。常见的激活函数包括但不限于Sigmoid函数、双曲正切tanh函数、修正线性单元（Rectified Linear Unit，ReLU）等等。每一个神经元的参数信息和所用算法组合在一起就是整个神经网络的“参数信息”，也是AI模型文件中很重要的一部分。
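The following is a minimal illustrative sketch of the single-neuron computation just described (inputs a_k, weights w_k, bias b, activation σ, output z), using a sigmoid activation as one common example; it is a sketch only, not an implementation defined by this application.

```python
import math

def neuron_output(inputs, weights, bias):
    """Single-neuron computation: z = sigma(sum_k a_k * w_k + b), with a sigmoid as sigma."""
    s = sum(a * w for a, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# Example: three inputs, three weights and one bias produce one output value z.
z = neuron_output([0.5, 0.1, -0.3], [0.8, -0.2, 0.4], bias=0.1)
```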
在实际使用过程中,一个AI模型指的是一个包含网络结构和参数信息等元素的文件,经过训练的AI模型可被其框架平台直接再次使用,无需重复构建或者学习,直接进行判断和/或识别等智能化功能。
联邦学习旨在建立一个基于分布数据集的联邦学习模型。在模型训练的过程中,模型相关的信息能够在各方之间交换(或者是以加密形式交换),但原始数据不能。这一交换不会暴露每个训练节点上数据的任何受保护的隐私部分。
可选的,本申请实施例中涉及的联邦学习为横向联邦学习。横向联邦学习的本质是样本的联合,适用于参与者间的业态相同,但触达客户不同,即特征重叠多、用户重叠少时的场景。比如,通信网络内的CN域和RAN域服务不同用户(如每一个UE,即样本不同)的同一服务,比如移动管理(Mobility Management,MM)业务、会话管理(Session Management,SM)业务或某一业务。通过联合参与方的不同样本的相同数据特征,横向联邦使训练样本的数量增多,从而得到一个更好的模型。
本申请实施例中,联邦学习中的服务器(server,也可称为中央服务器或组织者)可以是网络中网元设备,如由NWDAF拆分的MTLF。参与联邦学习的成员(client,也可称为参与方)可以是网络中网元设备,如由NWDAF拆分的MTLF,也可以是终端等。在进行联邦学习时,联邦学习中的服务器可以先进行对参与联邦学习的成员的选择,比如向NRF等储存信息网元发送请求,以请求获取各MTLF等智能化网元设备的能力信息,并 通过能力信息匹配其是否能参与联邦学习;然后,向选择的各成员发送联邦学习的初始化模型等信息。各成员进行本地模型训练后向服务器反馈中间结果,如梯度等。之后,服务器对收到的中间结果进行聚合并更新全局模型。多次重复成员选择-模型下发-本地模型训练-中间结果反馈-聚合并更新全局模型的步骤,待模型收敛等情况后即可停止模型训练。
下面结合附图,通过一些实施例及其应用场景对本申请实施例提供的联邦学习方法、装置、通信设备及可读存储介质进行详细地说明。
请参见图4,图4是本申请实施例提供的一种联邦学习方法的流程图,该方法应用于第一通信设备,该第一通信设备具体为联邦学习中的服务器(server),包括但不限于MTLF等智能化网元设备。如图4所示,该方法包括如下步骤:
步骤41:第一通信设备从第二通信设备接收第一信息。
步骤42:第一通信设备根据第一信息,确定第二通信设备是否参加下一轮联邦学习。
本实施例中,上述的第一信息可以包括但不限于以下至少一项:用于指示第二通信设备是否同意参加联邦学习的第二信息、第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息等。比如,该第二信息也可选为意愿信息,用于指示第二通信设备是否有意愿参加联邦学习。
此外,上述的第一信息还可包括第二通信设备的能力信息。比如,此能力信息为在本轮模型训练结束后的能力信息,包括但不限于是否还能当联邦学习的参与者(成员)、参与训练模型的精度信息等。比如,一次本地训练结束后,某成员的能力信息为:能当联邦学习的参与者,有进行本地训练的能力,参与训练模型的精度信息为X等。
上述的第二通信设备具体为联邦学习中的成员(client)设备,可以包括但不限于终端以及MTLF等智能化网元设备。
一些实施例中,上述的第一信息可以是第二通信设备(即联邦学习中的成员)主动汇报,比如和本地训练的结果一起反馈给联邦学习中的服务器,从而减少信令的消耗,和交互的次数。
一些实施例中,当联邦学习中的成员不再愿意参加联邦学习时,比如因为有其他更重要的任务到来而不再愿意参加联邦学习、有太多任务要处理先退出联邦学习等情况,可以将用于指示其不同意参加联邦学习的信息,即退出联邦学习的意愿信息,反馈给联邦学习中的服务器,以帮助联邦学习过程中成员的选择,实现合理地选择参与联邦学习的成员。而若该成员有意愿继续参加联邦学习,可以不反馈用于指示其同意参加联邦学习的信息,此时服务器默认其有意愿继续参加联邦学习。此外,联邦学习中的成员也可直接向服务器指示其愿意参加联邦学习。
另一些实施例中,由于联邦学习中成员的状态发生变化(比如负载变重等)时,可能会导致本地模型训练所需的算力不够,不再适合被选择为下一轮参加联邦学习的成员,因此,联邦学习中的成员可以将其在本轮联邦学习的状态信息发送给联邦学习中的服务器,由服务器确定该成员是否参加下一轮联邦学习,从而帮助联邦学习过程中成员的选择,实 现合理地选择参与联邦学习的成员,并提高训练效率,如避免状态变差的成员掉队(比如避免状态变差的成员没有在规定时间内反馈结果等情况),以及选择能带来更高效率的成员。
另一些实施例中,由于联邦学习中成员的数据已被多次学习或者已被融入到联邦学习的全局模型时,该全局模型在该成员的环境下会过拟合,该成员可能不再适合被选择为下一轮参加联邦学习的成员,故此时可以暂停几轮该成员的训练,从而更快速的实现模型收敛。因此,联邦学习中的成员可以将其在本轮联邦学习的模型性能信息发送给联邦学习中的服务器,由服务器确定该成员是否参加下一轮联邦学习,从而帮助联邦学习过程中成员的选择,实现合理地选择参与联邦学习的成员,并提高训练效率,如避免状态变差的成员掉队(比如避免状态变差的成员没有在规定时间内反馈结果等情况),以及选择能带来更高效率的成员。
可选的,上述接收第一信息之后,第一通信设备还可以根据该第一信息,选择第三通信设备参加下一轮联邦学习。该第三通信设备不同于第二通信设备,具体为参与到联邦学习新的成员(client)设备,可以包括但不限于终端以及MTLF等智能化网元设备。比如,若根据收到的第一信息确定较多的成员设备不再适合参加下一轮联邦学习,则可以选择新的成员参加下一轮联邦学习,以保证联邦学习的顺利进行。
本申请实施例中,上述状态信息可以用于说明第二通信设备(即联邦学习中成员)在本轮联邦学习的本地训练完成后的状态信息,可以包括但不限于以下至少一项:
1)第二通信设备在本轮联邦学习的负载信息。
本实施例中，该负载信息可理解为负载情况信息，可以表示网元比如网络功能（Network Function，NF）的负载情况。
可选的,该负载信息可以包括以下至少一项:平均负载信息、峰值负载信息等。平均负载信息可理解为,在本轮联邦学习的范围内的负载平均值。比如,一次本地训练中,某成员的平均负载为70%,峰值负载为80%。
2)第二通信设备在本轮联邦学习的资源使用信息。
本实施例中,该资源使用信息可理解为资源使用情况信息。
可选的,该资源使用信息可以包括以下至少一项:平均资源使用信息、峰值资源使用信息。平均资源使用信息可理解为,在本轮联邦学习的范围内的平均资源使用情况。
比如，该资源使用信息对应的资源使用端（如resource usage）可以包括但不限于中央处理器（Central Processing Unit，CPU）、内存（memory）、磁盘（disk）、图形处理器（Graphics Processing Unit，GPU）等。该资源使用信息可以包括电量信息等。
比如,一次本地训练中,某成员的平均资源使用情况为:CPU使用60%,GPU使用80%,内存使用70%(比如占用12GB,即用数值表示的情况),磁盘空间使用40%;且该成员的峰值资源使用情况为:CPU使用80%,GPU使用100%,内存使用80%(比如占用14GB,即用数值表示的情况),磁盘空间使用50%。
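As an illustration of the kind of status report described above, the sketch below shows one hypothetical way a member could package its willingness, load, resource usage and model performance into the fed-back first information; every field name and value here is an assumption made for illustration, not an encoding defined by this application.

```python
# Hypothetical feedback payload assembled by a member after one round of local training.
local_training_feedback = {
    "agrees_to_participate": True,                        # second information (willingness)
    "load": {"average": 0.70, "peak": 0.80},              # load information
    "resource_usage": {                                    # resource usage information
        "average": {"cpu": 0.60, "gpu": 0.80, "memory": 0.70, "disk": 0.40},
        "peak":    {"cpu": 0.80, "gpu": 1.00, "memory": 0.80, "disk": 0.50},
    },
    "model_performance": {"accuracy": 0.80, "mae": 0.10},  # model performance information
}
```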
本申请实施例中,上述模型性能信息可选为本地模型训练开始前和/或完成后的模型性能信息,可以包括至少一项:
本地模型训练完成后的第一模型性能信息;
本地模型训练开始前的第二模型性能信息。
可选的,上述模型性能信息可以包括以下至少一项:准确度、平均绝对值误差(Mean Absolute Error,MAE)。此外,还可包括但不限于以下至少一项:精确度(Precision)、召回率(Recall)、F1分数(F1score)、面积曲线(Area Under Curve,AUC)、误差平方和(Sum of Squares due to Error,SSE)、和方差、均方差(Mean Squared Error,MSE)、方差、均方根(Root Mean Squared Error,RMSE)、标准差、确定系数(R-Squared)等等。
一些实施例中,上述的第一模型性能信息可包括准确度、平均绝对值误差MAE等。上述的第二模型性能信息可包括准确度、平均绝对值误差MAE等。
可理解的,上述的第一模型性能信息主要用于说明在本轮联邦学习的本地模型训练完成后的模型基于其本地数据的表现,可以包括某种统计参数和该参数对应的数值,如模型的准确度和具体值(如80%),以及平均绝对值误差MAE和其值(如0.1)。上述的第二模型性能信息主要用于说明在本轮联邦学习的本地模型训练开始前的模型基于其本地数据的表现,即在接收到模型后需要先进行一次模型性能的统计计算,可以包括某种统计计算参数和该参数对应的数值,如模型的准确度和具体值(如70%),以及平均绝对值误差MAE和其值(如0.15)。
需指出的，准确率是指预测正确的数量与预测总数的百分比。在模型训练阶段，数据集包含输入数据和标签（标签数据），这两者有对应关系，一组输入数据对应一个或一组标签，通过比较模型生成的预测值与该次训练对应的标签，判断此次训练是否正确。平均绝对值误差MAE表示预测值和真实值之间绝对误差的平均值，计算方式为：$\mathrm{MAE}=\frac{1}{m}\sum_{i=1}^{m}\left|h(x_i)-y_i\right|$。
其中,h(xi)表示模型的预测值,yi表示相应的真实值,m表示训练样本的个数。
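A minimal sketch of this MAE computation, with h(x_i) as the model prediction and y_i as the corresponding ground-truth value, is shown below; it is illustrative only.

```python
def mean_absolute_error(predictions, labels):
    """MAE = (1/m) * sum_i |h(x_i) - y_i| over m training (or validation) samples."""
    assert len(predictions) == len(labels) and len(labels) > 0
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)

# Example: MAE of three predictions against their labels.
mae = mean_absolute_error([0.9, 0.4, 0.7], [1.0, 0.5, 0.6])  # -> 0.1
```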
本申请实施例中,对于第二通信设备是否反馈第一信息,可以由第一通信设备决定。可选的,第一通信设备可以向第二通信设备发送第三信息,所述第三信息用于标识第二通信设备需要反馈第一信息。而若第一通信设备没有发送第三信息,即第二通信设备没有接收到第三信息,则第二通信设备无需反馈第一信息。
可选的,所述第三信息可以包括但不限于以下至少一项:
用于标识所述第二通信设备需要反馈所述第二信息的信息;比如,该信息为需要反馈所述第二信息的标识;
用于标识第二通信设备需要反馈状态信息的信息;比如,该信息为需要反馈状态信息的标识;
用于标识第二通信设备需要反馈模型性能信息的信息;比如,该信息包括需要反馈本 地模型训练完成后的模型性能信息的标识,和/或需要反馈本地模型训练开始前的模型性能信息的标识。
需指出的,上述用于标识第二通信设备需要反馈状态信息的信息主要用于说明在本轮联邦学习的本地模型训练完成后,第二通信设备需要反馈其状态信息。此外,还可指定具体的状态信息可以是以下至少一项:成员的负载情况(如NF load)、成员的资源使用情况(如resource usage包括CPU、memory、disk和/或GPU等)等。
上述用于标识第二通信设备需要反馈模型性能信息的信息主要用于说明在本轮联邦学习的本地模型训练完成后,第二通信设备需要反馈本地模型训练开始前和/或完成后的模型性能信息,该模型性能信息包括上述的第一模型性能信息和/或第二模型性能信息。
可选的,上述发送第三信息可以包括以下至少一项:
第一通信设备根据预设策略,向第二通信设备发送第三信息;其中,该预设策略可以是指第一通信设备何时或者何种情况下向第二通信设备发送第三信息,比如每五轮训练后,向第二通信设备发送第三信息;或者某第二通信设备参与了5轮训练后,向该第二通信设备发送第三信息。该预设策略不仅可以指示要不要第二设备反馈,还可以指示何时或者何种情况寻求反馈。比如,若预设策略指示第二通信设备需反馈第一信息,则第一通信设备可以向第二通信设备发送第三信息;而若预设策略指示第二通信设备无需反馈第一信息,则第一通信设备不向第二通信设备发送第三信息。该预设策略可以预定义、协议约定等。
第一通信设备根据在基于联邦学习的模型训练过程中的需求,向第二通信设备发送第三信息;比如,若第一通信设备期望根据第二通信设备的意愿、状态和/或模型性能等,决定第二通信设备是否参加下一轮联邦学习,则可以向第二通信设备向第二通信设备发送第三信息;否则,不向第二通信设备发送第三信息。也就是说,第一通信设备可以自主决定是否向第二通信设备发送第三信息。
可选的,上述发送第三信息可以包括:向第二通信设备发送第一请求,所述第一请求用于请求第二通信设备参加联邦学习,所述第一请求中携带有所述第三信息。这样,可以借助用于请求第二通信设备参加联邦学习的第一请求来发送第三信息,从而减少信令的消耗,和交互的次数。
本申请实施例中,若第一通信设备从多个第二通信设备接收到多个模型性能信息,则可以首先对多个模型性能信息进行汇总,得到第三模型性能信息;然后根据该第三模型性能信息,判定模型训练是否结束,比如判定模型是否收敛。比如,若第三模型性能信息包括准确度,且该准确度高于预设阈值,则可以判定模型训练结束,否则继续进行模型训练;或者,若第三模型性能信息包括平均绝对值误差MAE,且该MAE低于预设阈值,则可以判定模型训练结束,否则继续进行模型训练。
可选的,上述汇总的方式包括但不限于:对多个模型性能信息求取平均值、对多个模型性能信息求取加权平均值等。在求取加权平均值时,相应权重可以是由第一通信设备决定的,比如可采用预设或自身计算权重的方式。
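The sketch below illustrates one way the aggregation and the end-of-training decision described above could look, assuming per-client accuracies (or MAEs) and server-chosen weights; the threshold values are hypothetical examples rather than values specified by this application.

```python
def aggregate_performance(values, weights=None):
    """Plain or weighted average of per-client performance values (e.g. accuracies or MAEs)."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def training_finished(global_accuracy=None, global_mae=None,
                      accuracy_threshold=0.90, mae_threshold=0.10):
    """Stop when the aggregated accuracy is high enough or the aggregated MAE is low enough."""
    if global_accuracy is not None and global_accuracy >= accuracy_threshold:
        return True
    if global_mae is not None and global_mae <= mae_threshold:
        return True
    return False

# Example: five clients report accuracies 70%, 72%, 75%, 68% and 65%; the plain average is 70%.
global_accuracy = aggregate_performance([0.70, 0.72, 0.75, 0.68, 0.65])
done = training_finished(global_accuracy=global_accuracy)  # False with a 90% threshold
```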
一些实施例中,第一通信设备对多个第一模型性能信息(即本地模型训练完成后的模型性能信息)进行汇总,并根据汇总后的模型性能信息判定模型训练是否结束。
进一步的,在得到第三模型性能信息之后,第一通信设备可以将该第三模型性能信息反馈给模型使用者,以方便模型使用者了解模型性能。
上述实施例主要从第一通信设备(即联邦学习中服务器)的角度对本申请进行说明,下面将从第二通信设备(即联邦学习中成员)的角度对本申请进行说明。
请参见图5,图5是本申请实施例提供的一种联邦学习方法的流程图,该方法应用于第二通信设备,该第二通信设备具体为联邦学习中的成员(client),包括但不限于终端以及MTLF等智能化网元设备。如图5所示,该方法包括如下步骤:
步骤51:第二通信设备确定第一信息。
步骤52:第二通信设备向第一通信设备发送第一信息,所述第一信息用于第一通信设备确定第二通信设备是否参加下一轮联邦学习。
本实施例中,上述的第一信息可以包括但不限于以下至少一项:用于指示第二通信设备是否同意参加联邦学习的第二信息、第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息等。
上述的第一通信设备具体为联邦学习中的服务器(server),可以包括但不限于MTLF等智能化网元设备。
一些实施例中,上述的第一信息可以是第二通信设备(即联邦学习中的成员)主动汇报,比如和本地训练的结果一起反馈给联邦学习中的服务器,从而减少信令的消耗,和交互的次数。
本申请实施例中的联邦学习方法,通过向第一通信设备发送第一信息,所述第一信息包括以下至少一项:用于指示第二通信设备是否同意参加联邦学习的第二信息、第二通信设备在本轮联邦学习的状态信息、在本轮联邦学习的本地模型训练完成后的第一模型性能信息、在本轮联邦学习的本地模型训练开始前的第二模型性能信息,可以使得第一通信设备结合第二通信设备的意愿、状态信息和/或模型性能等,确定第二通信设备是否参加下一轮联邦学习,从而实现合理地选择参与联邦学习的成员,并提高训练效率,如避免成员掉队(即有成员没有在规定时间内反馈结果等情况),以及选择能带来更高效率的成员。
本申请实施例中，上述状态信息可以用于说明第二通信设备（即联邦学习中成员）在本轮联邦学习的本地训练完成后的状态信息，可以包括但不限于以下至少一项：
1)第二通信设备在本轮联邦学习的负载信息。
本实施例中,该负载信息可理解为负载情况信息,可以表示NF负载情况。
可选的,该负载信息可以包括以下至少一项:平均负载信息、峰值负载信息等。平均负载信息可理解为,在本轮联邦学习的范围内的负载平均值。比如,一次本地训练中,某成员的平均负载为70%,峰值负载为80%。
2)第二通信设备在本轮联邦学习的资源使用信息。
可选的,该资源使用信息可以包括以下至少一项:平均资源使用信息、峰值资源使用信息。平均资源使用信息可理解为,在本轮联邦学习的范围内的平均资源使用情况。
比如，该资源使用信息对应的资源使用者（如resource usage）可以包括但不限于中央处理器（Central Processing Unit，CPU）、内存（memory）、磁盘（disk）、图形处理器（Graphics Processing Unit，GPU）等。该资源使用信息可以包括电量信息等。
比如,一次本地训练中,某成员的平均资源使用情况为:CPU使用60%,GPU使用80%,内存使用70%(比如占用12GB,即用数值表示的情况),磁盘空间使用40%;且该成员的峰值资源使用情况为:CPU使用80%,GPU使用100%,内存使用80%(比如占用14GB,即用数值表示的情况),磁盘空间使用50%。
本申请实施例中,上述模型性能信息可选为本地模型训练开始前和/或完成后的模型性能信息,可以包括至少一项:
本地模型训练完成后的第一模型性能信息;
本地模型训练开始前的第二模型性能信息。
可选的,上述模型性能信息可以包括以下至少一项:准确度、平均绝对值误差MAE等。比如,上述的第一模型性能信息可以包括准确度、平均绝对值误差MAE。上述的第二模型性能信息可以包括准确度、平均绝对值误差MAE。
本申请实施例中,对于第二通信设备是否反馈第一信息,可以由第一通信设备决定。上述确定第一信息可以包括:首先从第一通信设备接收第三信息,所述第三信息用于标识第二通信设备需要反馈第一信息;然后根据所述第三信息,确定第一信息。
可选的,所述第三信息可以包括但不限于以下至少一项:
用于标识所述第二通信设备需要反馈所述第二信息的信息;比如,该信息为需要反馈所述第二信息的标识;
用于标识第二通信设备需要反馈状态信息的信息;比如,该信息为需要反馈状态信息的标识;
用于标识第二通信设备需要反馈模型性能信息的信息;比如,该信息包括需要反馈本地模型训练完成后的模型性能信息的标识,和/或需要反馈本地模型训练开始前的模型性能信息的标识。
可选的,上述从第一通信设备接收第三信息可以包括:从第一通信设备接收第一请求,所述第一请求用于请求第二通信设备参加联邦学习,所述第一请求中携带有第三信息。这样,可以借助用于请求第二通信设备参加联邦学习的第一请求来发送第三信息,从而减少信令的消耗,和交互的次数。
下面结合图6对本申请实施例中的联邦学习过程进行说明。
本申请实施例中,联邦学习服务器(server)为NWDAF(如MTLF),联邦学习成员(clients)为NWDAF(如MTLF),如图6所示,具体联邦学习过程包括:
步骤61:联邦学习消费者(如:NWDAF(AnLF))向联邦学习服务器(如NWDAF (MTLF))发送模型请求(如Nnwdaf_MLModelProvision_Subscribe),该模型请求用于请求获得一个模型用于完成自己的任务。此时,服务器基于本地配置或者联邦学习消费者的请求等情况判断是否触发联邦学习,并进行初始化联邦学习和成员选择。
步骤62:若触发联邦学习,服务器在进行成员选择时,可以初始化制定对于联邦学习的策略,如:规定进行多少轮训练后收集一次状态信息,和/或进行多少轮训练后收集模型性能信息等。
步骤63:服务器向各个成员(clients)发送联邦学习的任务请求(如Nnwdaf_MLModelTraining_Subscribe),以请求参加联邦学习,并且根据全局模型和各成员的本地数据进行联邦学习的本地训练。该任务请求可以包括任务标识(如analytic ID)、模型初始化信息(比如包含训练参数training parameters)、用于标识需要反馈状态信息/模型性能信息的信息(即反馈需求feedback requirement)等。
其中,analytic ID主要用于指示相应模型是用于进行哪个任务。模型初始化信息用于描述模型和在此轮联邦学习中的配置信息等。描述模型是指描述模型本身,如模型是以何种算法,何种架构,何种参数及超参数等结构组成,又或者模型本身,如模型文件,模型文件的地址信息等。此轮联邦学习中的配置信息是指,在此轮联邦学习的本地训练过程中,要进行本地训练的轮数、应使用的数据类型等信息。对于用于标识需要反馈状态信息/模型性能信息的信息可以参见上述实施例所述,在此不再赘述。
步骤64:成员向自己所在区域或所属的数据源(data source)发送获取数据的请求(如Ndccf_DataManagement_Subscribe/Nnf_EventExposure_Subscribe),以收集数据进行本地的模型训练。根据任务的不同,提供数据的网元也不同,比如为UPF,OAM,UDM等。
步骤65:数据源返回响应给相应成员,该响应中包含请求的数据,该响应比如为:Ndccf_DataManagement_Notify/Nnf_EventExposure_Notify。
步骤66:各成员使用基于步骤64和65获取的数据进行本地模型训练,并生成中间结果,并在后续步骤中反馈给服务器,以供服务器聚合并更新全局模型,并使用本地数据进行模型性能的分析。
比如,分析模型性能可以是通过使用本地训练后的模型和本地的数据进行准确度或者MAE的计算等。如果步骤63中任务请求携带有需要反馈本地训练前的模型性能信息的标识信息,则成员需在进行本地训练前进行模型性能的统计计算。
一种实现中,成员NWDAF将模型预测结果正确的次数除以总预测次数作为模型的本地训练准确度,即计算式为:本地训练准确度=正确结果次数÷总次数。具体地,成员NWDAF可设置一个验证数据集用于评估本地训练准确度,该验证数据集中包括用于模型输入的数据(input data)和真实的标签数据(label/ground truth),成员NWDAF将输入数据输入训练后的模型得到输出数据,成员NWDAF再比较输出数据与真实标签数据是否一致,进而利用上述计算式来获得本地训练准确度的值。说明:预测结果正确这个概念不一定是觉得结果要与标签数据完全一致。当两者之前存在一定的差距,但此差距在允许范围 内时,也可认为预测结果正确。
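A minimal sketch of this local-training accuracy computation over a validation set of (input, label) pairs is given below; the optional tolerance parameter reflects the note above that a prediction within an allowed gap of the label can still count as correct, and its concrete value is an assumption.

```python
def local_training_accuracy(model, validation_set, tolerance=0.0):
    """Accuracy = number of correct predictions / total predictions on the validation set.
    A prediction counts as correct when it is within `tolerance` of the ground-truth label."""
    correct = 0
    for input_data, label in validation_set:
        prediction = model(input_data)
        if abs(prediction - label) <= tolerance:
            correct += 1
    return correct / len(validation_set)

# Example with a trivial "model" and two validation samples:
acc = local_training_accuracy(lambda x: 2 * x, [(1.0, 2.0), (2.0, 4.1)], tolerance=0.2)  # -> 1.0
```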
一种实现中，成员NWDAF通过计算预测数据和标签数据（标签值，原始数据）对应点绝对误差的均值，以得出MAE，见以下计算式：本地训练$\mathrm{MAE}=\frac{1}{m}\sum_{i=1}^{m}\left|y_i-\hat{y}_i\right|$。具体地，成员NWDAF可设置一个验证数据集用于评估本地训练MAE，该验证数据集中包括用于模型输入的数据（input data）和真实的标签数据（label/ground truth，即上述式子中的$\hat{y}_i$），成员NWDAF将输入数据输入训练后的模型得到输出数据（即预测数据$y_i$），成员NWDAF再比较输出数据与真实标签数据之间的绝对误差均值，即利用上述计算式来获得本地训练MAE的值。
步骤67:各成员主动或者根据步骤63中要求,反馈本地训练完的中间结果以及意愿、状态和/或模型性能等信息(上述的第一信息)给服务器,比如可以借由联邦学习训练过程的请求消息对应的反馈消息来反馈意愿、状态和/或模型性能等信息,该反馈消息可选为通知(notify)消息。
一种实现中,成员client发现自己的训练情况很良好,如准确度到达某一阈值(该阈值可以是步骤63中模型初始化信息携带的,也可以是模型请求中携带的,也可以是预先获得/配置的),则可主动反馈模型性能信息。又或者,成员client每轮都主动反馈其意愿、状态和/或模型性能信息等。服务器可以使用中间结果更新全局模型,并使用意愿、状态和/或模型性能等信息辅助决策下一轮联邦学习时对成员的选择。
步骤68:服务器根据反馈的中间结果进行中间结果的聚合和全局模型的更新,并根据反馈的意愿、状态和/或模型性能等信息,判断相应成员是否还需要参与下一轮联邦学习。此外,可以通过聚合模型性能信息以获得模型的整体/全局训练情况。
比如,server获得各个client反馈的中间结果后,可以将这些中间结果通过server的算法进行聚合,如平均,加权平均等,再使用这些中间结果更新全局模型。又比如,server可以根据client反馈的意愿、状态和/或模型性能等信息,进行该client是否还能参加下一轮联邦学习的判断,如该client在其意愿信息中说明,其退出联邦学习,则server不会选择该client进行下一轮联邦学习;又如,若该client反馈的状态信息是CPU使用90%,GPU使用100%,内存使用80%(如14GB,用数值表示的情况),磁盘空间使用50%,则server认为该client不再是好的参与联邦学习的成员,因为在进行本地训练时,GPU已经跑满,下一次训练可能会使用更长时间或者断开连接等,因此也不在下一轮成员选择中选择该client;再如,若该client反馈的模型性能为准确度98%,而同时其他client反馈的模型性能基本在60%-80%,那么server可能会认为模型在该client的环境下已经过拟合,需要该client的训练暂停,则也不在下一轮成员选择中选择该client。
需指出的,聚合模型性能信息以获得模型的整体/全局训练情况,是指server收集各个clients的模型性能信息,通过平均或者加权平均等方法,生成一个全局的训练情况。比如,假设有5个client参加了联邦学习,并且他们反馈了模型性能,如准确度分别为70%、72%、 75%、68%和65%,则server可以通过计算这些准确度的平均值获得全局训练情况,即全局训练情况的准确度为:(70%+72%+75%+68%+65%)/5=70%。
在进行完成员的重新选择之后,可以重复执行步骤63至步骤68,直到模型收敛。
步骤69:在完成联邦学习的模型训练后,服务器向消费者(如AnLF)反馈训练后的模型以及整体/全局模型性能。
本申请实施例提供的联邦学习方法,执行主体可以为联邦学习装置。本申请实施例中以联邦学习装置执行联邦学习方法为例,说明本申请实施例提供的联邦学习装置。
请参见图7,图7是本申请实施例提供的一种联邦学习装置的结构示意图,该装置应用于第一通信设备,该第一通信设备具体为联邦学习中的服务器(server),包括但不限于MTLF等智能化网元设备。如图7所示,联邦学习装置70包括:
第一接收模块71,用于从第二通信设备接收第一信息,其中,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息;
第一确定模块72,用于根据所述第一信息,确定所述第二通信设备是否参加下一轮联邦学习。
可选的,所述状态信息包括以下至少一项:
负载信息;
资源使用信息。
可选的,所述负载信息包括以下至少一项:平均负载信息、峰值负载信息;
所述资源使用信息包括以下至少一项:平均资源使用信息、峰值资源使用信息。
可选的,所述模型性能信息包括以下至少一项:
本地模型训练完成后的第一模型性能信息;
本地模型训练开始前的第二模型性能信息。
可选的,所述模型性能信息包括以下至少一项:准确度、平均绝对值误差、精确度、均方差。
可选的,联邦学习装置70还包括:
第一发送模块,用于向所述第二通信设备发送第三信息,所述第三信息用于标识所述第二通信设备需要反馈所述第一信息。
可选的,所述第三信息包括以下至少一项:
用于标识所述第二通信设备需要反馈所述第二信息的信息;比如,该信息为需要反馈所述第二信息的标识;
用于标识所述第二通信设备需要反馈状态信息的信息;
用于标识所述第二通信设备需要反馈模型性能信息的信息。
可选的,所述第一发送模块具体用于以下至少一项:
根据预设策略,向所述第二通信设备发送第三信息;
根据在基于联邦学习的模型训练过程中的需求,向所述第二通信设备发送第三信息。
可选的,所述第一发送模块具体用于向所述第二通信设备发送第一请求,所述第一请求用于请求所述第二通信设备参加联邦学习,所述第一请求中携带有所述第三信息。
可选的,联邦学习装置70还包括:
处理模块,用于在第一通信设备从多个第二通信设备接收到多个模型性能信息时,对多个所述模型性能信息进行汇总,得到第三模型性能信息,并根据所述第三模型性能信息,判定模型训练是否结束。
可选的,联邦学习装置70还包括:
反馈模块,用于将所述第三模型性能信息反馈给模型使用者。
可选的,联邦学习装置70还包括:
选择模块,用于根据所述第一信息,选择第三通信设备参加下一轮联邦学习。该第三通信设备不同于第二通信设备,具体为参与到联邦学习新的成员(client)设备,可以包括但不限于终端以及MTLF等智能化网元设备。比如,若根据收到的第一信息确定较多的成员不再适合参加下一轮联邦学习,则可以选择新的成员参加下一轮联邦学习,以保证联邦学习顺利进行。
本申请实施例提供的联邦学习装置70能够实现图4所示的方法实施例实现的各个过程,并达到相同的技术效果,为避免重复,这里不再赘述。
请参见图8,图8是本申请实施例提供的一种联邦学习装置的结构示意图,该装置应用于第二通信设备,该第二通信设备具体为联邦学习中的成员(client),包括但不限于终端以及MTLF等智能化网元设备。如图8所示,联邦学习装置80包括:
第二确定模块81,用于确定第一信息,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息;
第二发送模块82,用于向第一通信设备发送所述第一信息,所述第一信息用于所述第一通信设备确定所述第二通信设备是否参加下一轮联邦学习。
可选的,所述状态信息包括以下至少一项:
负载信息;
资源使用信息。
可选的,所述负载信息包括以下至少一项:平均负载信息、峰值负载信息;
所述资源使用信息包括以下至少一项:平均资源使用信息、峰值资源使用信息。
可选的,所述模型性能信息包括以下至少一项:
本地模型训练完成后的第一模型性能信息;
本地模型训练开始前的第二模型性能信息。
可选的,所述模型性能信息包括以下至少一项:准确度、平均绝对值误差、精确度、 均方差。
可选的,联邦学习装置80包括:
第二接收模块,用于从所述第一通信设备接收第三信息,所述第三信息用于标识所述第二通信设备需要反馈所述第一信息;
所述第二确定模块81具体用于:根据所述第三信息,确定所述第一信息。
可选的,所述第二接收模块还用于:从所述第一通信设备接收第一请求,所述第一请求用于请求所述第二通信设备参加联邦学习,所述第一请求中携带有所述第三信息。
本申请实施例提供的联邦学习装置80能够实现图5所示的方法实施例实现的各个过程,并达到相同的技术效果,为避免重复,这里不再赘述。
可选的,如图9所示,本申请实施例还提供一种通信设备90,包括处理器91和存储器92,存储器92上存储有可在所述处理器91上运行的程序或指令,例如,该通信设备90为第一通信设备时,该程序或指令被处理器91执行时实现上述图4所示的联邦学习方法实施例的各个步骤,且能达到相同的技术效果。该通信设备90为第二通信设备时,该程序或指令被处理器91执行时实现上述图5所示的联邦学习方法实施例的各个步骤,且能达到相同的技术效果,为避免重复,这里不再赘述。
本申请实施例还提供一种通信设备,包括处理器和通信接口,例如,该通信设备为第一通信设备时,通信接口用于从第二通信设备接收第一信息,处理器用于根据所述第一信息,确定所述第二通信设备是否参加下一轮联邦学习;或者该通信设备为第二通信设备时,处理器用于确定第一信息,通信接口用于向第一通信设备发送所述第一信息;所述第一信息包括以下至少一项:用于指示第二通信设备是否同意参加联邦学习的第二信息、第二通信设备在本轮联邦学习的状态信息、在本轮联邦学习的本地模型训练完成后的第一模型性能信息、在本轮联邦学习的本地模型训练开始前的第二模型性能信息。该实施例与上述方法实施例对应,上述方法实施例的各个实施过程和实现方式均可适用于该实施例中,且能达到相同的技术效果。
具体地,图10为实现本申请实施例的一种终端的硬件结构示意图。
该终端1000包括但不限于:射频单元1001、网络模块1002、音频输出单元1003、输入单元1004、传感器1005、显示单元1006、用户输入单元1007、接口单元1008、存储器1009以及处理器1010等中的至少部分部件。
本领域技术人员可以理解,终端1000还可以包括给各个部件供电的电源(比如电池),电源可以通过电源管理系统与处理器1010逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。图10中示出的终端结构并不构成对终端的限定,终端可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置,在此不再赘述。
应理解的是,本申请实施例中,输入单元1004可以包括图形处理单元(Graphics Processing Unit,GPU)10041和麦克风10042,图形处理器10041对在视频捕获模式或图 像捕获模式中由图像捕获装置(如摄像头)获得的静态图片或视频的图像数据进行处理。显示单元1006可包括显示面板10061,可以采用液晶显示器、有机发光二极管等形式来配置显示面板10061。用户输入单元1007包括触控面板10071以及其他输入设备10072中的至少一种。触控面板10071,也称为触摸屏。触控面板10071可包括触摸检测装置和触摸控制器两个部分。其他输入设备10072可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆,在此不再赘述。
本申请实施例中,射频单元1001接收来自网络侧设备的下行数据后,可以传输给处理器1010进行处理;另外,射频单元1001可以向网络侧设备发送上行数据。通常,射频单元1001包括但不限于天线、放大器、收发信机、耦合器、低噪声放大器、双工器等。
存储器1009可用于存储软件程序或指令以及各种数据。存储器1009可主要包括存储程序或指令的第一存储区和存储数据的第二存储区,其中,第一存储区可存储操作系统、至少一个功能所需的应用程序或指令(比如声音播放功能、图像播放功能等)等。此外,存储器1009可以包括易失性存储器或非易失性存储器,或者,存储器1009可以包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请实施例中的存储器1009包括但不限于这些和任意其它适合类型的存储器。
处理器1010可包括一个或多个处理单元;可选的,处理器1010集成应用处理器和调制解调处理器,其中,应用处理器主要处理涉及操作系统、用户界面和应用程序等的操作,调制解调处理器主要处理无线通信信号,如基带处理器。可以理解的是,上述调制解调处理器也可以不集成到处理器1010中。
可选的,终端1000可以作为联邦学习中的成员,处理器1010,用于确定第一信息;
射频单元1001,用于向联邦学习中的服务器发送第一信息,所述第一信息用于服务器确定终端1000是否参加下一轮联邦学习;第一信息包括以下至少一项:用于指示终端1000是否同意参加联邦学习的第二信息、终端1000在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息等。
本申请实施例提供的终端1000能够实现图5所示的方法实施例实现的各个过程,并达到相同的技术效果,为避免重复,这里不再赘述。
具体地,本申请实施例还提供了一种网络侧设备。如图11所示,该网络侧设备110 包括:处理器111、网络接口112和存储器113。其中,网络接口112例如为通用公共无线接口(common public radio interface,CPRI)。
具体地,本申请实施例的网络侧设备110还包括:存储在存储器113上并可在处理器111上运行的指令或程序,处理器111调用存储器113中的指令或程序执行图7和/或图8中所示各模块执行的方法,并达到相同的技术效果,为避免重复,故不在此赘述。
本申请实施例还提供一种可读存储介质,所述可读存储介质上存储有程序或指令,该程序或指令被处理器执行时实现上述联邦学习方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
其中,该处理器为上述实施例中所述的终端中的处理器。该可读存储介质,包括计算机可读存储介质,如计算机只读存储器ROM、随机存取存储器RAM、磁碟或者光盘等。
本申请实施例另提供了一种芯片,所述芯片包括处理器和通信接口,所述通信接口和所述处理器耦合,所述处理器用于运行程序或指令,实现上述联邦学习方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
应理解,本申请实施例提到的芯片还可以称为系统级芯片,系统芯片,芯片系统或片上系统芯片等。
本申请实施例另提供了一种计算机程序/程序产品,所述计算机程序/程序产品被存储在存储介质中,所述计算机程序/程序产品被至少一个处理器执行以实现上述联邦学习方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
本申请实施例还提供了一种通信系统,包括如上的第一通信设备和第二通信设备,所述第一通信设备可用于执行如图4所示的联邦学习方法的步骤,所述第二通信设备可用于执行如如图5所示的联邦学习方法的步骤。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。此外,需要指出的是,本申请实施方式中的方法和装置的范围不限按示出或讨论的顺序来执行功能,还可包括根据所涉及的功能按基本同时的方式或按相反的顺序来执行功能,例如,可以按不同于所描述的次序来执行所描述的方法,并且还可以添加、省去、或组合各种步骤。另外,参照某些示例所描述的特征可在其他示例中被组合。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对相关技术做出贡献的部分可以以计算机软件产品的形式体现出来,该计算机软件产品存储在一个存储介质 (如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例所述的方法。
上面结合附图对本申请的实施例进行了描述,但是本申请并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本申请的启示下,在不脱离本申请宗旨和权利要求所保护的范围情况下,还可做出很多形式,均属于本申请的保护之内。

Claims (21)

  1. 一种联邦学习方法,包括:
    第一通信设备从第二通信设备接收第一信息,其中,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息、用于指示第二通信设备退出联邦学习的意愿信息;
    所述第一通信设备根据所述第一信息,确定所述第二通信设备是否参加下一轮联邦学习。
  2. 根据权利要求1所述的方法,其中,所述状态信息包括以下至少一项:
    负载信息;
    资源使用信息。
  3. 根据权利要求2所述的方法,其中,所述负载信息包括以下至少一项:平均负载信息、峰值负载信息;
    所述资源使用信息包括以下至少一项:平均资源使用信息、峰值资源使用信息。
  4. 根据权利要求1所述的方法,其中,所述模型性能信息包括至少一项:
    本地模型训练完成后的第一模型性能信息;
    本地模型训练开始前的第二模型性能信息。
  5. 根据权利要求1所述的方法,其中,所述模型性能信息包括以下至少一项:准确度、平均绝对值误差、精确度、均方差。
  6. 根据权利要求1所述的方法,其中,所述方法还包括:
    所述第一通信设备向所述第二通信设备发送第三信息,所述第三信息用于标识所述第二通信设备需要反馈所述第一信息。
  7. 根据权利要求6所述的方法,其中,所述第三信息包括以下至少一项:
    用于标识所述第二通信设备需要反馈所述第二信息的信息;
    用于标识所述第二通信设备需要反馈状态信息的信息;
    用于标识所述第二通信设备需要反馈模型性能信息的信息。
  8. 根据权利要求6所述的方法,其中,所述向所述第二通信设备发送第三信息,包括以下至少一项:
    所述第一通信设备根据预设策略,向所述第二通信设备发送第三信息;
    所述第一通信设备根据在基于联邦学习的模型训练过程中的需求,向所述第二通信设备发送第三信息。
  9. 根据权利要求6所述的方法,其中,所述向所述第二通信设备发送第三信息,包括:
    所述第一通信设备向所述第二通信设备发送第一请求,所述第一请求用于请求所述第 二通信设备参加联邦学习,所述第一请求中携带有所述第三信息。
  10. 根据权利要求1所述的方法,其中,若所述第一通信设备从多个第二通信设备接收到多个所述模型性能信息,所述方法还包括:
    所述第一通信设备对多个所述模型性能信息进行汇总,得到第三模型性能信息;
    所述第一通信设备根据所述第三模型性能信息,判定模型训练是否结束。
  11. 根据权利要求10所述的方法,其中,所述得到第三模型性能信息之后,所述方法还包括:
    所述第一通信设备将所述第三模型性能信息反馈给模型使用者。
  12. 根据权利要求1所述的方法,其中,所述接收第一信息之后,所述方法还包括:
    所述第一通信设备根据所述第一信息,选择第三通信设备参加下一轮联邦学习,其中,所述第三通信设备不同于所述第二通信设备,为参与到联邦学习的新的成员设备。
  13. 一种联邦学习方法,包括:
    第二通信设备确定第一信息,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息、用于指示第二通信设备退出联邦学习的意愿信息;
    所述第二通信设备向第一通信设备发送所述第一信息,所述第一信息用于所述第一通信设备确定所述第二通信设备是否参加下一轮联邦学习。
  14. 根据权利要求13所述的方法,其中,所述状态信息包括以下至少一项:
    负载信息;
    资源使用信息。
  15. 根据权利要求13所述的方法,其中,所述模型性能信息包括至少一项:
    本地模型训练完成后的第一模型性能信息;
    本地模型训练开始前的第二模型性能信息。
  16. 根据权利要求13所述的方法,其中,所述确定第一信息,包括:
    所述第二通信设备从所述第一通信设备接收第三信息,所述第三信息用于标识所述第二通信设备需要反馈所述第一信息;
    所述第二通信设备根据所述第三信息,确定所述第一信息。
  17. 根据权利要求16所述的方法,其中,所述从所述第一通信设备接收第三信息,包括:
    所述第二通信设备从所述第一通信设备接收第一请求,所述第一请求用于请求所述第二通信设备参加联邦学习,所述第一请求中携带有所述第三信息。
  18. 一种联邦学习装置,包括:
    第一接收模块,用于从第二通信设备接收第一信息,其中,所述第一信息包括以下至少一项:用于指示所述第二通信设备是否同意参加联邦学习的第二信息、所述第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息、用于指示第二通信设备退 出联邦学习的意愿信息;
    第一确定模块,用于根据所述第一信息,确定所述第二通信设备是否参加下一轮联邦学习。
  19. 一种联邦学习装置,包括:
    第二确定模块,用于确定第一信息,所述第一信息包括以下至少一项:用于指示第二通信设备是否同意参加联邦学习的第二信息、第二通信设备在本轮联邦学习的状态信息、本轮联邦学习的模型性能信息、用于指示第二通信设备退出联邦学习的意愿信息;
    第二发送模块,用于向第一通信设备发送所述第一信息,所述第一信息用于所述第一通信设备确定所述第二通信设备是否参加下一轮联邦学习。
  20. 一种通信设备,包括处理器和存储器,所述存储器存储可在所述处理器上运行的程序或指令,所述程序或指令被所述处理器执行时实现如权利要求1至12任一项所述的联邦学习方法的步骤,或者实现如权利要求13至17任一项所述的联邦学习方法的步骤。
  21. 一种可读存储介质,所述可读存储介质上存储程序或指令,所述程序或指令被处理器执行时实现如权利要求1至12任一项所述的联邦学习方法的步骤,或者实现如权利要求13至17任一项所述的联邦学习方法的步骤。
PCT/CN2023/106114 2022-07-08 2023-07-06 联邦学习方法、装置、通信设备及可读存储介质 WO2024008154A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210815546.7 2022-07-08
CN202210815546.7A CN117411793A (zh) 2022-07-08 2022-07-08 联邦学习方法、装置、通信设备及可读存储介质

Publications (1)

Publication Number Publication Date
WO2024008154A1 true WO2024008154A1 (zh) 2024-01-11

Family

ID=89454298

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/106114 WO2024008154A1 (zh) 2022-07-08 2023-07-06 联邦学习方法、装置、通信设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN117411793A (zh)
WO (1) WO2024008154A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723620A (zh) * 2020-05-25 2021-11-30 株式会社日立制作所 无线联邦学习中的终端调度方法和装置
CN113988319A (zh) * 2021-10-27 2022-01-28 深圳前海微众银行股份有限公司 联邦学习模型的训练方法、装置、电子设备、介质及产品
CN114079902A (zh) * 2020-08-13 2022-02-22 Oppo广东移动通信有限公司 联邦学习的方法和装置
CN114444708A (zh) * 2020-10-31 2022-05-06 华为技术有限公司 获取模型的方法、装置、设备、系统及可读存储介质
WO2022111639A1 (zh) * 2020-11-30 2022-06-02 华为技术有限公司 联邦学习方法、装置、设备、系统及计算机可读存储介质

Also Published As

Publication number Publication date
CN117411793A (zh) 2024-01-16

Similar Documents

Publication Publication Date Title
Van Le et al. A deep reinforcement learning based offloading scheme in ad-hoc mobile clouds
CN110598870A (zh) 一种联邦学习方法及装置
WO2022016964A1 (zh) 纵向联邦建模优化方法、设备及可读存储介质
CN116745780A (zh) 用于去中心化联邦学习的方法和系统
Yuan et al. Online dispatching and fair scheduling of edge computing tasks: A learning-based approach
Falowo et al. Dynamic RAT selection for multiple calls in heterogeneous wireless networks using group decision-making technique
CN112948885B (zh) 实现隐私保护的多方协同更新模型的方法、装置及系统
US20220386136A1 (en) Facilitating heterogeneous network analysis and resource planning for advanced networks
Huang et al. Multi-agent reinforcement learning for cost-aware collaborative task execution in energy-harvesting D2D networks
Yu et al. Collaborative computation offloading for multi-access edge computing
WO2023109827A1 (zh) 客户端筛选方法及装置、客户端及中心设备
Han et al. SplitGP: Achieving both generalization and personalization in federated learning
US11381408B2 (en) Method and apparatus for enhanced conferencing
Liu et al. Hastening stream offloading of inference via multi-exit dnns in mobile edge computing
WO2024008154A1 (zh) 联邦学习方法、装置、通信设备及可读存储介质
Zou et al. Resource multi-objective mapping algorithm based on virtualized network functions: RMMA
CN111191143A (zh) 应用推荐方法及装置
WO2023169402A1 (zh) 模型的准确度确定方法、装置及网络侧设备
Du et al. Online two-timescale service placement for time-sensitive applications in MEC-assisted network: A TMAGRL approach
WO2023207980A1 (zh) 模型信息获取方法、发送方法、装置、节点和储存介质
WO2024120470A1 (zh) 模型训练方法、终端及网络侧设备
WO2023186099A1 (zh) 信息反馈方法、装置及设备
WO2023186091A1 (zh) 样本确定方法、装置及设备
WO2024032694A1 (zh) Csi预测处理方法、装置、通信设备及可读存储介质
WO2023169404A1 (zh) 模型的准确度确定方法、装置及网络侧设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23834921

Country of ref document: EP

Kind code of ref document: A1