WO2024087573A1 - Federated learning method and apparatus - Google Patents

Federated learning method and apparatus

Info

Publication number
WO2024087573A1
Authority
WO
WIPO (PCT)
Prior art keywords
edge device
edge
model
devices
central node
Application number
PCT/CN2023/092742
Other languages
English (en)
French (fr)
Inventor
邵云峰
李秉帅
卢嘉勋
郑臻哲
吴帆
胡大海
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024087573A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/06 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08 — Learning methods
    • G06N 3/098 — Distributed learning, e.g. federated learning

Definitions

  • the present application relates to the field of artificial intelligence (AI) technology, and in particular to a federated learning method and device.
  • Federated learning is a type of distributed machine learning (ML) that can jointly train models using business data on various terminal devices without sharing data between different terminal devices.
  • each terminal device learns the model from the central node based on local business data to obtain local learning results.
  • the central node then obtains the local learning results of multiple terminal devices, aggregates the models learned by these multiple terminal devices, and sends the aggregated model to each terminal device. Subsequently, the terminal device learns the aggregated model again based on local business data, and so on, until the learning is completed.
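The local-learning/aggregation loop described above is essentially the federated averaging scheme. The sketch below is only illustrative: the toy local "training" rule, the weighting by local dataset size, and all function names are assumptions, not taken from this application.

```python
# Toy sketch of one federated learning round: terminal devices adapt the
# global model to local data, and the central node averages the results
# (weighted by local dataset size). Models are dicts of named parameters.

def local_learning(global_model, local_data, lr=0.5):
    """Toy local 'training': nudge each parameter toward the local data mean."""
    target = sum(local_data) / len(local_data)
    return {k: v + lr * (target - v) for k, v in global_model.items()}

def aggregate(local_models, weights):
    """Central node: weighted average of locally learned parameters."""
    total = sum(weights)
    return {k: sum(w * m[k] for m, w in zip(local_models, weights)) / total
            for k in local_models[0]}

def federated_round(global_model, device_datasets):
    """One round: every device learns locally, the central node aggregates."""
    local_models = [local_learning(global_model, d) for d in device_datasets]
    return aggregate(local_models, [len(d) for d in device_datasets])
```

Repeating `federated_round` until convergence mirrors the iteration described above; only model parameters are exchanged, never the raw local data.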
  • federated learning relies on the central node. If the computing resources or bandwidth resources of the central node are limited and cannot support the aggregation of more models, federated learning cannot be achieved.
  • the embodiments of the present application provide a federated learning method and apparatus for reducing the burden of a central node in performing federated learning and improving the efficiency of federated learning.
  • an embodiment of the present application provides a federated learning method, which can be performed by a first communication device; the first communication device may be a communication device itself, or an apparatus (such as a chip system) that can support a communication device in implementing the functions required by the method.
  • the first communication device is a central node, or a chip set in the central node, or other components for implementing the functions of the central node.
  • the following describes the federated learning method provided in the first aspect by taking the first communication device as a central node as an example.
  • the federated learning method includes: the central node sends a first model to at least one central edge device respectively, and receives at least one second model.
  • at least one central edge device corresponds one-to-one to at least one edge device group
  • an edge device group includes at least one edge device.
  • a second model is obtained by aggregating the third models obtained by the edge devices in at least one edge device group.
  • a third model is a model obtained by an edge device, in collaboration with at least one terminal device within its coverage, learning the first model based on local data.
  • the central node aggregates the at least one second model to obtain a fourth model.
  • multiple edge devices are divided into multiple edge device groups, and one edge device in each group (i.e., the central edge device in this application) interacts with the central node on behalf of the group.
  • the central node sends the first model to each edge device in an edge device group through the central edge device of that group. That is, the central node does not need to interact with each of the multiple edge devices separately, so the number of edge devices interacting with the central node is reduced, which reduces the burden on the central node. The number of edge devices in each edge device group can therefore be increased so that more edge devices participate in learning, thereby accelerating model learning and improving the efficiency of federated learning.
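The two-tier exchange described above (central node ↔ central edge devices ↔ group members) can be sketched as follows; the class and method names, the plain averaging, and the toy per-device "learning" are illustrative assumptions, not the patent's prescribed implementation.

```python
# Sketch of hierarchical federated learning: the central node talks only to
# each group's central edge device, which relays the first model to the other
# edge devices in its group and returns the group's aggregated second model.

def average(models):
    """Unweighted average of a list of parameter dicts."""
    return {k: sum(m[k] for m in models) / len(models) for k in models[0]}

class EdgeDevice:
    def __init__(self, local_shift):
        self.local_shift = local_shift   # stands in for local business data
    def learn(self, first_model):
        # third model: the first model adjusted by this device's local data
        return {k: v + self.local_shift for k, v in first_model.items()}

class CentralEdgeDevice(EdgeDevice):
    def __init__(self, local_shift, group_members):
        super().__init__(local_shift)
        self.group = group_members       # other edge devices in the group
    def group_round(self, first_model):
        # relay the first model within the group, aggregate the third models
        third_models = [self.learn(first_model)]
        third_models += [dev.learn(first_model) for dev in self.group]
        return average(third_models)     # second model for this group

def central_round(first_model, central_edge_devices):
    # the central node receives one second model per group and aggregates
    second_models = [ce.group_round(first_model) for ce in central_edge_devices]
    return average(second_models)        # fourth model
```

Note that `central_round` touches only the central edge devices, regardless of how many ordinary edge devices each group contains.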
  • the method further includes: the central node divides the multiple edge devices participating in the learning into at least one edge device group according to one or more of the following information, and determines the central edge device of each edge device group: first information, second information or third information.
  • first information is used to indicate the communication relationship between the multiple edge devices
  • the communication relationship is used to indicate, for each of the multiple edge devices, the edge devices with which it can communicate.
  • the second information is used to indicate the communication delay of the communication link of each edge device among the multiple edge devices.
  • the third information is used to indicate the model similarity between the multiple edge devices.
  • the embodiment of the present application provides a variety of division methods for dividing multiple edge devices into at least one edge device group, and the embodiment of the present application does not limit the specific division method.
  • the embodiment of the present application can divide multiple edge devices into at least one edge device group according to the actual needs of model learning. For example, dividing multiple edge devices into at least one edge device group according to the communication relationship between the multiple edge devices is relatively simple.
  • multiple edge devices with short communication delays can be divided into a group, thereby shortening the communication delay of each edge device in the group, which helps to shorten the model learning time.
  • multiple edge devices with relatively similar models can be divided into a group, which helps to speed up the convergence time of the model within the group and improve the efficiency of learning.
  • the central node divides the multiple edge devices participating in the learning into at least one edge device group according to one or more of the following information, including: the central node first groups the multiple edge devices according to the first information or the second information to obtain M edge device groups, where M is an integer greater than or equal to 1; the central node then regroups each of the M edge device groups according to the third information to obtain at least one edge device group.
  • Dividing multiple edge devices into at least one edge device group through a combination of division methods helps to minimize the model learning time.
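The two-stage division above (first by communication relationship, then by model similarity) can be illustrated with the sketch below. The choice of connected components for the first stage, the scalar similarity measure, and the threshold are all assumptions made for this example.

```python
# Stage 1: group edge devices that can reach each other, i.e. connected
# components of the communication graph (the "first information").
# Stage 2: split each group by model similarity (the "third information").

def connected_components(adjacency):
    """adjacency: {device: set of directly reachable devices}."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adjacency[node] - comp)
        seen |= comp
        groups.append(comp)
    return groups

def split_by_similarity(group, model_param, threshold=1.0):
    """Greedy split: devices whose (scalar) model parameter lies within
    `threshold` of a bucket's first member share that bucket."""
    buckets = []
    for dev in sorted(group, key=lambda d: model_param[d]):
        for b in buckets:
            if abs(model_param[dev] - model_param[b[0]]) <= threshold:
                b.append(dev)
                break
        else:
            buckets.append([dev])
    return buckets
```

A practical system might instead cluster on communication delay (the "second information") or on a distance between full parameter vectors; the structure of the two passes stays the same.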
  • the method further includes: for a first edge device among multiple edge devices, the central node sends an acquisition instruction to the first edge device, and receives fourth information from the first edge device.
  • the acquisition instruction is used to instruct the first edge device to report information about edge devices that can communicate with the first edge device.
  • the fourth information includes information about edge devices that can communicate with the first edge device.
  • the central node determines the communication relationship between the multiple edge devices based on the fourth information respectively from the multiple edge devices.
  • the central node can determine the communication relationship between the multiple edge devices by instructing each edge device to report the edge devices with which it can communicate directly; this approach is not affected by the layout of the multiple edge devices and is more flexible.
  • the method further includes: the central node reads configuration information from multiple edge devices respectively, and determines the communication relationship between the multiple edge devices based on the read configuration information.
  • the configuration information of one edge device among the multiple edge devices includes information of other edge devices that can communicate with the one edge device.
  • the information of other edge devices that can directly communicate with the edge device can be (pre) configured in the edge device, so that the central node can determine the communication relationship between multiple edge devices by reading the configuration information of each edge device.
  • an embodiment of the present application provides a federated learning method, which can be performed by a second communication device, which can be a communication device or a communication device capable of supporting the communication device to implement the functions required by the method, such as a chip system.
  • the second communication device is an edge device, such as a base station, or a device disposed at an edge device.
  • the federated learning method provided in the second aspect is described below by taking the second communication device as the first edge device as an example.
  • the federated learning method includes: a first edge device receives a first model from a central node, sends the first model to other edge devices in a first edge device group except the first edge device, and sends a second model to the central node.
  • the first edge device is a central edge device of the first edge device group, and the first edge device group includes at least one edge device.
  • the second model is obtained by aggregating third models obtained by each edge device in at least one edge device group. At least one edge device group includes the first edge device group.
  • a third model is a model obtained by an edge device, in collaboration with at least one terminal device within its coverage, learning the first model based on local data.
  • any edge device in an edge device group can cooperate with at least one terminal device within its coverage area to learn the first model sent by the central node based on local data, so as to obtain a third model.
  • Multiple third models within one or more edge device groups can be aggregated and reported to the central node through a central edge device, so that the number of edge devices interacting with the central node is reduced, reducing the burden on the central node. Therefore, the edge devices within the edge device group can be increased so that more edge devices participate in learning, thereby accelerating the learning of the model and improving the efficiency of federated learning.
  • the method further includes: the first edge device receives at least one third model, and aggregates the at least one third model with the third model obtained by the first edge device, in collaboration with at least one terminal device within its coverage, learning the first model based on local data, to obtain the second model.
  • one of the at least one third model comes from the second edge device in the first edge device group.
  • one of the at least one third model comes from the first terminal device.
  • the first terminal device is a terminal device that moves from the coverage of an edge device in the second edge device group to the coverage of the first edge device.
  • the method further includes: the first edge device receives an acquisition instruction sent by the central node, and sends fourth information to the central node.
  • the acquisition instruction is used to instruct the first edge device to report information of edge devices that can communicate with the first edge device.
  • the fourth information includes information of edge devices that can communicate with the first edge device.
  • an embodiment of the present application provides a federated learning method, which can be performed by a third communication device; the third communication device can be a communication device or a device that can support a communication device in implementing the functions required by the method, such as a chip system.
  • the third communication device is a terminal device, or a chip set in the terminal device, or other components for implementing the functions of the terminal device. The following describes the federated learning method provided in the third aspect by taking the third communication device as a terminal device as an example.
  • the federated learning method includes: a first terminal device receives a third model from a second edge device in a first edge device group while the first terminal device is located within the coverage of the second edge device; the first terminal device then moves from the coverage of the second edge device to the coverage of a third edge device in a second edge device group, and sends the third model to the third edge device.
  • the third model is a model obtained by the second edge device, in collaboration with at least one terminal device within its coverage, learning the first model from the central node based on local data.
  • the terminal device can serve as a transmission medium within two edge device groups. For example, when the terminal device is within the coverage range of the second edge device in the first edge device group, it can obtain the third model obtained by the second edge device; when the terminal device moves to the coverage range of the third edge device in the second edge device group, the obtained third model can be sent to the third edge device, thereby realizing transmission between model groups.
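The carrier behavior described above can be sketched as below: the terminal caches the third model while in the second edge device's coverage and delivers it on entering the third edge device's coverage. All class and method names here are illustrative assumptions.

```python
# Sketch of a terminal device acting as a transmission medium between edge
# device groups: it caches a third model received in one group's coverage
# and hands it over when it moves into another group's coverage.

class EdgeDevice:
    def __init__(self):
        self.migrated_models = []   # third models delivered by moving terminals
    def receive_migrated_model(self, third_model):
        self.migrated_models.append(third_model)

class TerminalDevice:
    def __init__(self):
        self.cached_model = None
    def receive_model(self, third_model):
        # received while within the coverage of the second edge device
        self.cached_model = third_model
    def enter_coverage(self, edge_device):
        # moving into another group's coverage: deliver the cached model
        if self.cached_model is not None:
            edge_device.receive_migrated_model(self.cached_model)
```

The receiving edge device can then include `migrated_models` in its next aggregation, which is how inter-group model transmission is realized without any direct edge-to-edge link.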
  • an embodiment of the present application provides a communication device, which has the function of implementing the behavior in the method embodiment of the first aspect above.
  • the communication device may be the central node in the first aspect, or the communication device may be a device capable of implementing the method provided in the first aspect, such as a chip or a chip system.
  • the communication device has the function of implementing the behavior in the method embodiment of the second aspect mentioned above, and specific reference may be made to the description of the second aspect, which will not be repeated here.
  • the communication device may be the first edge device in the second aspect, or the communication device may be a device capable of implementing the method provided in the second aspect, such as a chip or a chip system.
  • the communication device has the function of implementing the behavior in the method embodiment of the third aspect mentioned above, and specific reference may be made to the description of the third aspect, which will not be repeated here.
  • the communication device may be the terminal device in the third aspect, or the communication device may be a device capable of implementing the method provided in the third aspect, such as a chip or a chip system.
  • the communication device includes corresponding means or modules for executing the method of the first aspect, the second aspect, or the third aspect.
  • the communication device includes a processing unit (sometimes also referred to as a processing module or a processor) and/or a transceiver unit (sometimes also referred to as a transceiver module or a transceiver).
  • an embodiment of the present application provides a communication device, which may be the communication device in the first aspect of the above-mentioned embodiment, or a chip or chip system arranged in the communication device in the first aspect.
  • the communication device may be the communication device in the second aspect of the above-mentioned embodiment, or a chip or chip system arranged in the communication device in the second aspect.
  • the communication device may be the communication device in the third aspect of the above-mentioned embodiment, or a chip or chip system arranged in the communication device in the third aspect.
  • the communication device includes a communication interface and a processor, and optionally, also includes a memory.
  • the memory is used to store a computer program, and the processor is coupled to the memory and the communication interface. When the processor reads the computer program or instruction, the communication device executes the method performed by the central node, the first edge device or the terminal device in the above-mentioned method embodiment.
  • an embodiment of the present application provides a communication device, the communication device comprising an input/output interface and a logic circuit.
  • the input/output interface is used to input and/or output information.
  • the logic circuit is used to execute the method described in the first aspect, the second aspect, or the third aspect.
  • an embodiment of the present application provides a chip system, which includes a processor and may also include a memory and/or a communication interface, for implementing the method described in the first aspect, the second aspect, or the third aspect.
  • the chip system also includes a memory for storing a computer program.
  • the chip system may be composed of a chip, or may include a chip and other discrete devices.
  • an embodiment of the present application provides a communication system, comprising a central node, at least one edge device and at least one terminal device, wherein the central node is used to execute the method performed by the central node in the above-mentioned first aspect, any one edge device is used to execute the method performed by the first edge device in the above-mentioned second aspect, and any one terminal device is used to execute the method performed by the terminal device in the above-mentioned third aspect.
  • the present application provides a computer-readable storage medium storing a computer program.
  • when the computer program is executed, the method in the first aspect, the second aspect or the third aspect is implemented.
  • a computer program product comprising computer program code; when the computer program code is run, the method in the first aspect, the second aspect or the third aspect is executed.
  • The beneficial effects of the fourth to tenth aspects and their implementations can be found in the description of the beneficial effects of the first to third aspects and their implementations.
  • FIG1 is a schematic diagram of an architecture of federated learning provided in an embodiment of the present application.
  • FIG2 is a schematic diagram of the process of federated learning under the architecture shown in FIG1;
  • FIG3 is a schematic diagram of another architecture of federated learning provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of the architecture of hierarchical federated learning provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of the architecture of federated learning applicable to an embodiment of the present application.
  • FIG6 is a schematic diagram of a process of federated learning provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of a natural grouping of multiple edge devices provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of dividing multiple edge devices based on communication delay according to an embodiment of the present application.
  • FIG9 is a schematic diagram of the communication relationship between multiple edge devices provided in an embodiment of the present application.
  • FIG10 is a schematic block diagram of a communication device provided in an embodiment of the present application.
  • FIG11 is another schematic block diagram of a communication device provided in an embodiment of the present application.
  • the federated learning method provided in the embodiment of the present application can be applied to the training of machine learning models in the federated learning scenario.
  • the machine learning model, for example, includes a neural network or other types of machine learning models.
  • Terminal equipment, also known as a terminal or terminal device, is a device with a wireless transceiver function, which can send signals to network equipment or receive signals from network equipment.
  • Terminal equipment may include user equipment (UE), sometimes also called terminal, access station, UE station, remote station, wireless communication equipment, or user device, etc.
  • the terminal device is used to connect people, objects, machines, etc., and can be widely used in various scenarios, such as but not limited to the following scenarios: terminal devices in cellular communications, device to device (D2D), vehicle to everything (V2X), machine-to-machine/machine-type communications (M2M/MTC), Internet of Things (IoT), virtual reality (VR), augmented reality (AR), industrial control, self driving, remote medical, smart grid, smart furniture, smart office, smart wearable, smart transportation, smart city, drones, robots and other scenarios.
  • the terminal device may also be a wearable device.
  • Wearable devices, which may also be referred to as wearable smart devices or smart wearable devices, are a general term for everyday wearables, such as glasses, gloves, watches, clothing and shoes, that are intelligently designed and developed using wearable technology.
  • the terminal device may also include a relay.
  • the terminal device may be a customer premises equipment (CPE), which may receive signals from network devices and forward the signals to other terminal devices. Or it may be understood that anything that can communicate data with a base station can be regarded as a terminal device.
  • the various terminal devices introduced above if located on a vehicle (for example, placed in a vehicle or installed in a vehicle), can be considered as vehicle-mounted terminal devices, which are also called on-board units (OBU).
  • the terminal device may also be a vehicle-mounted module, vehicle-mounted component, vehicle-mounted chip or vehicle-mounted unit that is built into the vehicle as one or more components or units.
  • the vehicle may implement the method of the embodiment of the present application through the built-in vehicle-mounted module, vehicle-mounted component, vehicle-mounted chip or vehicle-mounted unit.
  • the terminal device may refer to a device for realizing the functions of the terminal device, or may be a device that can support the terminal device in implementing those functions, such as a chip system, which can be installed in the terminal device.
  • the terminal device can also be a vehicle detector.
  • the chip system can be composed of a chip, or it can include a chip and other discrete devices.
  • the device for implementing the function of the terminal device is described as an example of a terminal device.
  • Network equipment refers to the access equipment that terminal equipment uses to access the mobile communication system wirelessly, such as access network (AN) equipment, such as base stations.
  • Network equipment may also refer to equipment that communicates with terminal equipment at the air interface.
  • Network equipment may include evolved Node B (also referred to as eNB or e-NodeB) in the long-term evolution (LTE) system or long-term evolution-advanced (LTE-A); network equipment may also include the next generation node B (gNB) in the fifth generation (5G) system; or, network equipment may also include access nodes in wireless-fidelity (Wi-Fi) systems; or network equipment may be relay stations, vehicle-mounted equipment, and future evolved public land mobile network (PLMN) equipment, equipment in D2D networks, equipment in M2M networks, equipment in IoT networks, or network equipment in PLMN networks.
  • the base station in the embodiment of the present application may include a centralized unit (CU) and a distributed unit (DU), and multiple DUs may be centrally controlled by one CU.
  • CU and DU may be divided according to the protocol layer functions of the wireless network they possess, for example, the functions of the packet data convergence protocol (PDCP) layer and the protocol layers above are set in the CU, and the functions of the protocol layers below the PDCP, such as the radio link control (RLC) layer and the medium access control (MAC) layer, are set in the DU.
  • the radio frequency device can be remote and not placed in the DU, or it can be integrated in the DU, or part of it can be remote and part of it can be integrated in the DU, and the embodiment of the present application does not impose any restrictions.
  • the control plane (CP) and the user plane (UP) of the CU can also be separated and divided into different entities for implementation, namely the control plane CU entity (CU-CP entity) and the user plane CU entity (CU-UP entity).
  • the signaling generated by the CU can be sent to the terminal device through the DU, or the signaling generated by the UE can be sent to the CU through the DU.
  • the DU can directly encapsulate the signaling through the protocol layer and transparently transmit it to the UE or CU without parsing it.
  • the device for implementing the function of the network device may be a network device, or may be a device capable of supporting the network device to implement the function, such as a chip system, which may be installed in the network device.
  • the device for implementing the function of the network device is described as an example of a network device.
  • Central node also known as central server, central side server, or central end server, refers to a type of device with computing capabilities, such as a personal computer, cloud server, etc.
  • the central node can be a network management platform.
  • Edge devices, also known as edge-side devices or edge-end devices, refer to devices that provide entry points into the core network of an enterprise or service provider, and can also provide connectivity to carrier and service provider networks.
  • edge devices can be routers, routing switches, integrated access devices (IADs), multiplexers, and various metropolitan area networks (MANs) and wide area networks (WANs) access devices.
  • Federated learning, also known as federated machine learning, joint learning, alliance learning, etc., is a form of distributed machine learning that can jointly train a neural network model using the business data on each terminal device without sharing data between different terminal devices.
  • Figure 1 is a diagram of a federated learning architecture provided in an embodiment of the present application.
  • Figure 1 takes a central node and multiple terminal devices as an example, and does not limit the specific types of these multiple terminal devices.
  • these multiple terminal devices can be one or more of mobile phones, computers, and vehicle-mounted devices.
  • the federated learning process mainly includes a local learning (training) model process and a central node aggregation model process.
  • the local training model process refers to the terminal device learning the global model from the central node based on local business data, and uploading the change in model parameters (also called model gradient) generated during the local learning process to the central node.
  • the central node aggregation model process refers to the central node aggregating (also called converging) the global model based on the model gradients from all or part of the terminal devices. The local learning (training) model process and the central node aggregation model process are iterated repeatedly until the model converges or a preset number of learning rounds is reached, at which point the learning ends.
  • Figure 2 shows a federated learning process based on the architecture shown in Figure 1.
  • Figure 2 takes multiple terminal devices including five terminal devices (i.e., terminal device 1-terminal device 5 in Figure 2) as an example.
  • each terminal device can report its own resource information to the central node.
  • the central node can summarize the resource information of multiple terminal devices to filter the terminal devices participating in the learning.
  • the terminal devices participating in the learning can be one or more terminal devices among the multiple terminal devices. If the resources of the terminal device are updated during the model learning process, the terminal device reports the updated resources to the central node so that the central node can filter more suitable terminal devices to participate in the learning.
  • the central node can form a learning resource list based on the resource information reported by multiple terminal devices, that is, the learning resource list includes the resource information of multiple terminal devices.
  • the central node determines the terminal devices participating in the learning based on the learning resource list. If a terminal device subsequently reports updated resources, the central node updates the learning resource list, and it can also be considered that the central node maintains the learning resource list.
  • the central node determines the terminal devices participating in the learning, and broadcasts the global model and model weight to the determined terminal devices. Any terminal device participating in the learning uses the received model weight as the initial weight of this learning, learns the global model from the central node based on the local business data, and obtains the local learning result, for example, the change in model parameters generated during the learning process (also called model gradient). Each terminal device participating in the learning sends the model gradient obtained in this learning to the central node.
  • the central node aggregates the global model according to the model gradients of multiple terminal devices to complete a round of learning.
  • the central node aggregates the global model according to the model gradients of multiple terminal devices, which can also be understood as the central node integrating (updating) the model weight.
  • Then the next round of learning begins: the central node sends the aggregated global model to each terminal device, each terminal device trains the aggregated global model, and sends the resulting model gradient to the central node, which aggregates the global model again according to the received gradients. This repeats until the model converges or the preset maximum number of learning rounds is reached, at which point learning stops.
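The round structure described above (broadcast weights, local training, gradient reporting, aggregation) can be sketched as follows. This is a minimal sketch: the least-squares local objective, the learning rate and step count, and plain unweighted averaging are illustrative assumptions, not details fixed by this description.

```python
import numpy as np

def local_training(weights, data, lr=0.1, steps=5):
    """Hypothetical local learning on one terminal device's business
    data (X, y); returns the model gradient, i.e. the change in
    parameters produced by this round of local training."""
    X, y = data
    w = weights.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)  # toy least-squares step
    return w - weights  # reported to the central node

def federated_round(global_w, selected_clients):
    """One round: the central node broadcasts global_w, collects the
    model gradients of the participating devices, and aggregates them
    (here by a plain average) into the updated global weights."""
    gradients = [local_training(global_w, d) for d in selected_clients]
    return global_w + np.mean(gradients, axis=0)
```

Repeating `federated_round` until the update norm falls below a tolerance, or a preset round count is reached, mirrors the stopping rule above.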
  • the terminal devices participating in the learning can be different in different rounds of learning.
  • For example, in the Nth round of training, the terminal devices participating in the learning may be terminal device 1, terminal device 3, and terminal device 5; in the N+1th round of training, the terminal devices participating in the learning are terminal device 1, terminal device 2, and terminal device 4.
  • Since the terminal devices train the received models on local business data and the central node then aggregates the models trained by multiple terminal devices, the central node does not need to collect business data from the terminal devices to train the model. Therefore, data is not shared between terminal devices, which improves data security.
  • Edge devices refer to devices that can provide capabilities similar to those of the central node at the edge of the central node. For example, edge devices can aggregate models learned from terminal devices.
  • Edge devices can be communication devices located between the central node and the terminal devices. Since the transmission link between an edge device and a terminal device is shorter than that between the central node and a terminal device, the time consumption of the entire federated learning process can be reduced.
  • For example, an edge device can be a network device such as a base station.
  • the federated learning architecture that uses network devices as edge devices is also called edge-based federated learning (Edge-based FL).
  • The following description takes a network device as an example of the edge device.
  • Figure 3 is a schematic diagram of an edge-based federated learning architecture.
  • Figure 3 takes an edge-based federated learning architecture including four edge devices (i.e., network device 1 to network device 4) and multiple terminal devices as an example.
  • Network device 1 to network device 4 can communicate with each other.
  • Network device 1 to network device 4 each have multiple terminal devices within their coverage range (the dotted lines in Figure 3 indicate the coverage ranges).
  • the number and/or type of terminal devices in different network device coverage ranges may be different.
  • Any network device can communicate with any terminal device within the coverage range.
  • the network device is equivalent to the central node in Figure 1, and can cooperate with the terminal devices within the coverage range of the network device for federated learning.
  • After federated learning, a network device can send the obtained model gradient (or model) to one or some of the adjacent network devices around it.
  • the network device can also receive model gradients sent by one or some network devices.
  • Each network device can aggregate the model obtained by its own learning and the model from other network devices, and use the aggregated model as the latest model. Afterwards, each network device broadcasts the latest model to at least one terminal device within the coverage range, and again cooperates with the at least one terminal device for federated learning. And so on, until the network device decides to stop learning.
  • However, a network device cannot communicate with a network device that is far away, and must forward the model or model gradient through network devices located between the two. That is, during the entire federated learning process there are many interactions between devices and a large amount of communication, which increases the model convergence time and slows convergence.
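The neighbor-to-neighbor exchange in edge-based federated learning can be sketched as a simple gossip-averaging step. Equal-weight averaging over direct neighbors, and the `models`/`neighbors` structures, are illustrative assumptions rather than details fixed by this description.

```python
import numpy as np

def gossip_round(models, neighbors):
    """models: dict edge_id -> weight vector.
    neighbors: dict edge_id -> list of directly reachable edge ids.
    Each network device averages its own model with the models
    received from its direct neighbors."""
    return {e: np.mean([w] + [models[n] for n in neighbors[e]], axis=0)
            for e, w in models.items()}
```

On a connected communication graph repeated rounds drive all devices toward a common model, but the farther apart two devices are, the more rounds of multi-hop forwarding are needed, which illustrates the slow-convergence drawback noted above.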
  • federated learning can be performed in collaboration with different network devices.
  • a central node can be introduced, which can be used to aggregate the models obtained after learning from multiple network devices.
  • the architecture in which a central node collaborates with multiple network devices for federated learning, and each network device collaborates with the terminal devices in its coverage range for federated learning is also called a hierarchical federated learning architecture.
  • Figure 4 is a schematic diagram of a hierarchical federated learning architecture.
  • Figure 4 takes a hierarchical federated learning architecture including a central node, four edge devices (i.e., network device 1-network device 4) and multiple terminal devices as an example.
  • Figure 4 can be considered to add a central node on the basis of Figure 3.
  • the difference from Figure 3 is that after each network device cooperates with the terminal devices within the coverage area to perform federated learning, the model gradient obtained from the learning is sent to the central node.
  • The central node aggregates the model according to the model gradients sent by each network device.
  • the central node broadcasts the aggregated model to each network device, and each network device performs federated learning again, and so on, until the central node determines to stop learning.
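The two-level aggregation of the hierarchical architecture can be sketched as follows. Unweighted means at both levels are an assumption (a practical system might weight each model by its local data size).

```python
import numpy as np

def edge_aggregate(client_gradients):
    """A network device aggregates the gradients reported by the
    terminal devices in its coverage range."""
    return np.mean(client_gradients, axis=0)

def hierarchical_round(global_w, groups):
    """groups: one list of terminal-device gradients per network
    device. Each network device aggregates locally; the central node
    then aggregates the per-device results into the new global model."""
    edge_updates = [edge_aggregate(g) for g in groups]
    return global_w + np.mean(edge_updates, axis=0)
```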
  • The central node is responsible for aggregating the models learned by multiple network devices. With many network devices interacting with the central node, the requirements on the central node's computing and bandwidth resources are high. When the central node's computing or bandwidth resources are limited, it cannot support model aggregation for a large number of network devices, that is, federated learning cannot be achieved.
  • the edge devices participating in the learning can be grouped.
  • The edge devices in any group complete the aggregation of the models learned by the multiple edge devices in the group, and an edge device in the group sends the result to the central node. It can be seen that by grouping the edge devices participating in the learning, the number of edge devices interacting with the central node is reduced, thereby reducing the burden on the central node. Moreover, in each round of learning more edge devices can participate, making the model converge faster, thereby reducing the number of learning rounds and improving learning efficiency.
  • the hierarchical federated learning architecture in the embodiment of the present application may include more edge devices or fewer edge devices, and each edge device may cover more or fewer terminal devices, as shown in Figure 5.
  • Figure 5 takes as an example that each edge device is a network device, and that the architecture includes 1 central node and 5 edge devices (edge device 1 to edge device 5), with each edge device covering at least one terminal device.
  • The coverage range of edge device 1 includes terminal device 1 to terminal device 3, and the coverage range of edge device 2 includes terminal device 4 and terminal device 5.
  • Figure 6 is a schematic diagram of the process of federated learning provided in an embodiment of the present application.
  • the process shown in Figure 6 refers to the process of one round of model learning. It can be understood that any edge device can send its resource information to the central node, and the central node can receive resource information sent by multiple edge devices respectively, and determine the edge devices participating in this round of model learning based on the multiple resource information received.
  • the central node divides multiple edge devices participating in learning into at least one edge device group, and determines a central edge device of each edge device group.
  • An edge device group includes at least one edge device, and any two edge devices in the group can communicate. That is, any edge device in an edge device group can communicate with other edge devices in the group without the participation of the central node.
  • the central node can also select an edge device from at least one edge device in the edge device group as a representative to communicate with the central node.
  • an edge device that communicates with the central node in an edge device group is referred to as a central edge device.
  • the central node determines multiple edge devices participating in the learning, groups the multiple edge devices, and determines the central edge device of each edge device group.
  • the multiple edge devices participating in the learning may be different in different rounds of learning, and accordingly, the number of edge device groups and the edge devices included in each edge device group may be different.
  • the central edge devices in each edge device group may also be different in different rounds of learning. The following will introduce how to determine the central edge device in each edge device group. First, it will introduce how to group multiple edge devices.
  • the embodiment of the present application does not limit the way in which the central node groups multiple edge devices.
  • multiple edge devices can be grouped using any one or more of the following grouping methods.
  • the central node can divide multiple edge devices into at least one edge device group according to the communication relationship between the multiple edge devices (that is, the first information in the embodiment of the present application).
  • the communication relationship may be used to indicate edge devices among multiple edge devices that can communicate with each edge device. That is, for any edge device, the communication relationship may indicate edge devices that can communicate with the edge device. Alternatively, the communication relationship may indicate whether any two edge devices among multiple edge devices can communicate directly (or point-to-point).
  • For example, the edge devices participating in the learning include edge device 1 to edge device 5, where edge device 1 and edge device 2 can communicate with each other, and any two edge devices among edge device 3, edge device 4, and edge device 5 can communicate with each other.
  • Edge device 1 is far away from edge device 3 to edge device 5 and cannot communicate with any of them.
  • Edge device 2 is also unable to communicate with any of edge device 3 to edge device 5.
  • This communication relationship may indicate that edge device 2 can communicate with edge device 1, that edge device 1 can communicate with edge device 2, that edge device 4 and edge device 5 can communicate with edge device 3, and so on.
  • The central node may determine the edge devices participating in the learning; the edge devices participating in different rounds of learning may be different.
  • After determining the edge devices participating in the learning, the central node can determine the first information, that is, the communication relationship between the multiple edge devices participating in the learning, for example, in either of the following two ways.
  • Determination method 1: for any edge device participating in the learning, the central node can instruct the edge device to report information about the edge devices that can communicate with it. In this way, the central node obtains, for each of the multiple edge devices, the edge devices it can directly communicate with, and can further determine the communication relationship between the multiple edge devices participating in the learning.
  • the following takes the first edge device participating in the learning as an example to introduce how the central node determines the communication relationship between the multiple edge devices participating in the learning.
  • the central node may send an acquisition instruction to the first edge device, and the acquisition instruction is used to instruct the first edge device to report the information of the edge device that can communicate with the first edge device.
  • the first edge device receives the acquisition instruction and sends the information of the edge device that can directly communicate with the first edge device to the central node.
  • any edge device receives the acquisition instruction sent by the central node and sends the information of the edge device that can directly communicate with itself to the central node.
  • In the embodiment of the present application, the information about the edge devices that directly communicate with an edge device is called the fourth information. The central node can receive multiple pieces of fourth information, and the communication relationship between the multiple edge devices participating in the learning can be determined based on them.
  • Determination method 2: for any edge device, the information about the edge devices that can directly communicate with it can be pre-configured. That is, each edge device stores configuration information of the edge devices that can directly communicate with it. The central node obtains the configuration information of the multiple edge devices participating in the learning, and the communication relationship between these multiple edge devices can be determined based on the obtained configuration information.
  • According to the communication relationship, edge devices that can communicate with each other are divided into a group. As shown in Figure 5, edge device 1 and edge device 2 are one group, and edge device 3, edge device 4, and edge device 5 are another group.
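Grouping by the communication relationship amounts to finding the connected components of the communication graph. A minimal sketch, assuming the relationship is given as a dict mapping each edge-device id to the set of ids it can directly communicate with:

```python
def group_by_communication(comm):
    """comm: dict edge_id -> set of edge ids it can communicate with
    (the 'communication relationship'). Returns the connected
    components of the communication graph, i.e. the edge device
    groups, via a depth-first search."""
    groups, seen = [], set()
    for start in comm:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            e = stack.pop()
            if e in group:
                continue
            group.add(e)
            stack.extend(comm[e] - group)
        seen |= group
        groups.append(sorted(group))
    return groups
```

For the five-device example above, `group_by_communication({1: {2}, 2: {1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}})` yields the groups `[1, 2]` and `[3, 4, 5]`.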
  • edge devices are naturally grouped.
  • Figure 7 is a schematic diagram of the natural grouping of multiple edge devices.
  • the edge device is a micro base station.
  • a macro base station can be connected to multiple micro base stations, so multiple micro base stations connected to the same macro base station are a group. That is, multiple edge devices are naturally grouped.
  • In other words, the at least one group into which the multiple edge devices are divided can be a natural grouping of the multiple edge devices.
  • the central node can group the multiple edge devices according to the communication relationship between the multiple edge devices.
  • Figure 8 is a schematic diagram of dividing multiple edge devices based on communication delay.
  • the cluster member nodes and cluster head nodes in Figure 8 are both edge devices, where the cluster head node is a central edge device.
  • The multiple edge devices can be abstracted as multiple nodes distributed in a graph (for example, called the first graph), and grouping the multiple edge devices is equivalent to partitioning the first graph into multiple subgraphs.
  • the edge devices corresponding to the nodes in each subgraph belong to a group.
  • The specific graph partitioning method used to group the multiple edge devices is not limited in the embodiments of the present application.
  • For example, the first graph is partitioned by a multilevel graph partitioning algorithm (such as the METIS algorithm).
  • the central node may divide the multiple edge devices into at least one edge device group according to the communication delay of the communication link of each edge device in the multiple edge devices (ie, the second information in this article).
  • The central node can group the multiple edge devices with the goal of lower communication cost, which not only realizes the grouping of the multiple edge devices but also keeps the communication cost as low as possible.
  • The communication delay between each pair of edge devices can be measured in each round: the central node obtains the communication delay between each pair of edge devices among the multiple edge devices participating in the current round of learning, and then groups the multiple edge devices according to the obtained delays.
  • Figure 9 is a schematic diagram of the communication relationship of multiple edge devices.
  • The multiple edge devices can be abstracted as multiple nodes distributed in a graph (for example, called the second graph), and the second graph is partitioned.
  • For example, the second graph is partitioned by a multilevel graph partitioning algorithm (such as the METIS algorithm).
  • The thick lines in Figure 9 indicate the partition boundaries; accordingly, Figure 9 takes division into four groups as an example.
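A full METIS-style partition is beyond a short sketch, but the idea of grouping devices joined by low-delay links can be illustrated with single-linkage clustering over a delay threshold. The pairwise-delay dict and the threshold are illustrative assumptions, not a fixed part of this description.

```python
def group_by_delay(devices, delay, threshold):
    """devices: list of edge-device ids.
    delay: dict (a, b) -> measured communication delay of that link.
    Union-find: merge any pair whose link delay is below `threshold`,
    so a group is a set of devices connected by a chain of
    low-delay links."""
    parent = {d: d for d in devices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(devices):
        for b in devices[i + 1:]:
            if delay[(a, b)] < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for d in devices:
        groups.setdefault(find(d), []).append(d)
    return list(groups.values())
```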
  • the central node may divide the multiple edge devices into at least one edge device group according to the model similarity between the multiple edge devices (ie, the third information in this article).
  • the model similarity between two edge devices refers to the similarity between the models obtained by the two edge devices after learning the same model using sample data.
  • the central node can divide multiple edge devices into at least one edge device group based on the model similarity between the multiple edge devices. Since multiple edge devices with similar models are grouped together, the difference between the models obtained by multiple edge devices in the group after learning is small, which helps to speed up the aggregation of multiple models in the group.
  • the central node can send the same model and the same sample data to the multiple edge devices.
  • Each of the multiple edge devices receives the model and sample data from the central node and learns the model through the sample data.
  • Each edge device sends the learned model to the central node, so that the central node receives the learned models sent by the multiple edge devices. Subsequently, the central node calculates the similarity between the models of the two edge devices and groups the multiple edge devices according to the obtained similarity.
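A minimal sketch of this similarity-based grouping, assuming cosine similarity between the learned weight vectors and a greedy assignment against each group's first member; the metric and the threshold are illustrative choices, not details fixed by this description.

```python
import numpy as np

def model_similarity(w1, w2):
    """Cosine similarity between two learned weight vectors."""
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def group_by_similarity(models, threshold=0.9):
    """models: dict edge_id -> weight vector learned from the common
    sample data. Each device joins the first group whose representative
    (first member) is at least `threshold`-similar; otherwise it
    starts a new group."""
    groups = []
    for e, w in models.items():
        for g in groups:
            if model_similarity(models[g[0]], w) >= threshold:
                g.append(e)
                break
        else:
            groups.append([e])
    return groups
```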
  • the central node may group multiple edge devices using any one of the above-mentioned grouping methods 1 to 3, or may group multiple edge devices using multiple grouping methods from 1 to 3.
  • the central node groups multiple edge devices according to the first information to obtain M edge device groups, where M is an integer greater than or equal to 1; then, the central node groups each edge device group in the M edge device groups again according to the third information to obtain at least one edge device group. That is, the central node uses grouping method one and grouping method three to group multiple edge devices.
  • That is, the central node first divides the multiple edge devices into M edge device groups according to the communication relationship between them, and then further groups the M edge device groups according to the model similarity between edge devices.
  • edge devices 1-edge devices 5 are naturally divided into edge device group 1 and edge device group 2.
  • the central node can further divide edge device 3 and edge device 4 in edge device group 2 into one group, and divide edge device 5 into one group according to the model similarity between the two edge devices.
  • the central node groups multiple edge devices according to the second information to obtain M edge device groups, where M is an integer greater than or equal to 1; then, the central node groups each edge device group in the M edge device groups again according to the third information to obtain at least one edge device group. That is, the central node uses grouping method 2 and grouping method 3 to group multiple edge devices.
  • That is, the central node first divides the multiple edge devices into M edge device groups according to the communication delay of the links between edge devices, and then further groups the M edge device groups again according to the model similarity between edge devices.
  • After the central node divides the multiple edge devices into at least one edge device group, it is also necessary to determine the central edge device in each edge device group.
  • the embodiment of the present application does not limit the specific method of determining the central edge device.
  • the central node can randomly select an edge device from the edge device group as the central edge device.
  • the central node can determine the central edge device according to the service burden of each edge device in the edge device group. For example, the central node selects the edge device with the lightest service burden in the edge device group as the central edge device; for another example, the central node selects any edge device with a service burden lower than a threshold from the edge device group as the central edge device.
  • the central node can determine the central edge device according to the communication delay between each edge device in the edge device group and the central node. For example, the central node selects the edge device with the shortest communication delay with the central node in the edge device group as the central edge device; for another example, the central node selects any edge device with a communication delay with the central node lower than a threshold from the edge device group as the central edge device.
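The three selection rules just described can be sketched in one helper. The `load` and `delay_to_center` inputs are assumed to be collected by the central node beforehand; which rule applies when several inputs are available is an illustrative choice.

```python
import random

def pick_central_edge(group, load=None, delay_to_center=None):
    """Choose the central edge device of one group: by lightest
    service burden if `load` is given, by shortest delay to the
    central node if `delay_to_center` is given, otherwise randomly."""
    if load is not None:
        return min(group, key=lambda e: load[e])
    if delay_to_center is not None:
        return min(group, key=lambda e: delay_to_center[e])
    return random.choice(group)
```

The threshold-based variants in the text (any device with burden or delay below a threshold) could be expressed the same way with a filtered list followed by `random.choice`.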
  • The central node interacts with the central edge device in each edge device group.
  • An edge device in any group completes the aggregation of the models learned by the multiple edge devices in the group, and an edge device in the group sends the result to the central node.
  • This can reduce the number of edge devices interacting with the central node, thereby reducing the burden on the central node.
  • more edge devices can participate in the learning, so that the model converges faster, thereby reducing the number of learning times and improving the efficiency of federated learning.
  • the central node sends a first model to at least one central edge device respectively.
  • the first model can be considered as a model that requires federated learning, also known as a global model.
  • the central node can send the first model to at least one central edge device respectively, and each central edge device broadcasts the first model within the group, so that each edge device in any edge device group can receive the first model. That is, the central node can send the first model to each edge device in at least one edge device group respectively through at least one central edge device.
  • After any edge device in any edge device group receives the first model, it can cooperate with at least one terminal device within its coverage range to perform federated learning on the first model based on local data until the model converges or the number of training rounds is reached, obtaining the learned model (for example, called the third model).
  • the central node can aggregate the third models obtained by each edge device to complete a round of model learning.
  • multiple third models can be aggregated within a group, or multiple third models can be aggregated between groups.
  • aggregating multiple third models within a group means: for any edge device group, an edge device within the edge device group can aggregate the third model within the edge device group.
  • Aggregating multiple third models between groups means: the third model aggregated by an edge device within an edge device group includes the third model within the edge device group and the third model that does not belong to the edge device group.
  • the model obtained by aggregating multiple third models in the embodiment of the present application is referred to as the second model.
  • When aggregating multiple third models within a group, these third models can be aggregated by any edge device within the group.
  • the following takes the first edge device group as an example to introduce how to aggregate multiple third models within the group.
  • the central edge device of the first edge device group is the first edge device.
  • the other edge devices in the first edge device group except the first edge device may send the obtained third model to the first edge device.
  • the first edge device receives multiple third models, aggregates the multiple third models, and obtains the second model.
  • the other edge devices may send the third model directly to the first edge device, or forward the third model to the first edge device through the edge devices in the first edge device group.
  • the second edge device in the first edge device group may send the third model directly to the first edge device, or forward the third model to the first edge device through the third edge device in the first edge device group.
  • The second edge device in the first edge device group may also send the third model to the first edge device through a terminal device.
  • the terminal device can locally store information of multiple edge devices, and when the terminal device moves from the coverage area of the second edge device to the coverage area of the first edge device, the third model from the second edge device is sent to the first edge device.
  • other edge devices in the first edge device group except the first edge device may aggregate multiple third models in the first edge device group.
  • the second edge device in the first edge device group obtains the third model and may send the third model to the third edge device in the first edge device group.
  • other edge devices in the first edge device group except the third edge device also send the obtained third model to the third edge device.
  • the third edge device receives multiple third models, aggregates the multiple third models with the third model obtained by the third edge device, and obtains a second model. In this case, the third edge device sends the second model to the first edge device, so as to send the second model to the central node through the first edge device.
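However the third models reach the aggregating device, the aggregation itself can be sketched as an average. The optional weights (for example, by each device's local data size) are an assumption rather than a detail fixed by this description.

```python
import numpy as np

def aggregate_third_models(own_model, received_models, weights=None):
    """An edge device combines its own third model with the third
    models received from the other edge devices in the group,
    producing the second model by (optionally weighted) averaging."""
    models = [own_model] + list(received_models)
    return np.average(models, axis=0, weights=weights)
```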
  • any edge device in the first edge device group can aggregate the received multiple third models, and the multiple third models can include the third model in the first edge device group and the third models in other edge device groups except the first edge device group.
  • any edge device in the first edge device group can aggregate at least one third model in the first edge device group and at least one third model in the second edge device group.
  • the third edge device in the second edge device group can send the obtained third model to any terminal device covered by the third edge device. Due to the mobility of the terminal device, when the terminal device moves to the coverage range of any edge device (such as the fourth edge device) in the first edge device group, the terminal device can send the third model from the third edge device to the fourth edge device.
  • After receiving the third model from the third edge device, the fourth edge device forwards it to the first edge device, and the first edge device aggregates this third model with the other models received. Alternatively, after receiving the third model from the third edge device, the fourth edge device aggregates it with the third model learned by the fourth edge device itself, and then sends the aggregated result to the first edge device. The first edge device further aggregates the received model with the third model it obtained by learning the first model based on local data in cooperation with at least one terminal device within its coverage range.
  • At least one central edge device sends a second model to the central node respectively.
  • After the central edge device in each edge device group obtains the second model, it sends the second model to the central node; that is, the central node can receive at least one second model through at least one central edge device. Alternatively, the central edge device in each edge device group can send the change in model parameters between the second model and the first model (i.e., the model gradient) to the central node. It should be understood that one second model corresponds to one model gradient, so the central node can receive at least one model gradient from at least one central edge device.
  • the central node aggregates at least one second model.
  • At least one central edge device sends a second model to the central node respectively, and the central node can receive at least one second model.
  • the central node aggregates at least one second model to obtain a fourth model.
  • the fourth model is the model obtained in this round of federated learning.
  • the central node sends the fourth model to the central edge devices of each edge device group.
  • the central edge device of any edge device group broadcasts the fourth model so that each edge device included in any edge device group receives the fourth model. Any edge device cooperates with the terminal devices within the coverage range to perform federated learning on the fourth model.
  • the process of the next round of learning is the same as that of this round of learning until the model converges or the preset number of learning rounds is reached.
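The full outer loop at the central node, with the convergence/round-limit stopping rule, can be sketched as follows. `collect_second_models` is a hypothetical stand-in for one complete round of grouped edge-side learning (steps that in practice run on the edge devices and their terminal devices).

```python
import numpy as np

def central_round(group_second_models):
    """The central node aggregates the second models reported by the
    central edge devices into the fourth model."""
    return np.mean(group_second_models, axis=0)

def run_federated_learning(model, rounds, collect_second_models, tol=1e-6):
    """Repeat rounds until the model stops changing (convergence) or
    the preset number of learning rounds is reached."""
    for _ in range(rounds):
        new_model = central_round(collect_second_models(model))
        converged = np.linalg.norm(new_model - model) < tol
        model = new_model
        if converged:
            break
    return model
```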
  • the central node sends the first model to edge device 1 in edge device group 1, and sends the first model to edge device 3 in edge device group 2.
  • Edge device 1 is the central edge device of edge device group 1, and edge device 3 is the central edge device of edge device group 2.
  • After edge device 1 receives the first model, it broadcasts the first model in edge device group 1, so that edge device 2 and terminal device 1 to terminal device 3 covered by edge device 1 can also obtain the first model.
  • After edge device 3 receives the first model, it broadcasts the first model in edge device group 2, so that edge device 4, edge device 5, and terminal device 6 to terminal device 7 covered by edge device 3 can also obtain the first model.
  • After edge device 2, edge device 4, and edge device 5 obtain the first model, they continue to broadcast it, so that the terminal devices under their respective coverage can obtain the first model.
  • Each edge device cooperates with at least one terminal device covered by it to learn the first model based on local data respectively, and a third model can be obtained.
  • For example, edge device 1 cooperates with terminal device 1 to terminal device 3 to learn the first model based on local data, and a third model can be obtained.
  • After each edge device obtains its third model, the multiple third models in the group can be aggregated to obtain the second model, and then the central edge device sends the second model to the central node.
  • In edge device group 1, edge device 2 can send its third model to edge device 1, and edge device 1 aggregates that third model with the third model obtained by training in cooperation with terminal device 1 to terminal device 3 under its coverage, obtaining the second model.
  • As the central edge device, edge device 1 then sends the second model finally obtained by edge device group 1 to the central node.
  • edge device 3 can send the obtained third model to terminal device 6.
  • when terminal device 6 moves into the coverage of edge device 1, terminal device 6 sends the third model trained in edge device group 2 to edge device 1.
  • edge device 1 can aggregate the third model received from terminal device 6, the third model from edge device 2, and the third model that edge device 1 obtained by training in cooperation with terminal devices 1-3, to obtain the second model, and then send it to the central node.
  • the process of edge device 3, as a central edge device, obtaining the second model and reporting it to the central node is similar to that of edge device 1, and will not be repeated.
  • the central node can receive the second model from edge device 1 and the second model from edge device 3, and aggregate the two second models to obtain the fourth model. If the fourth model has not converged, or the preset number of learning rounds has not yet been reached (i.e., is greater than N), the N+1th round of learning continues, similar to the Nth round.
  • the central node sends the model obtained after the Nth round of learning, that is, the fourth model, to edge device 1 and edge device 3 respectively.
  • Edge device 1 receives the fourth model and broadcasts the fourth model; similarly, edge device 3 receives the fourth model and broadcasts the fourth model.
  • the N+1th round of learning is similar to the process of the Nth round of learning, and will not be repeated here until the final trained model converges or reaches the preset number of learning rounds.
  • the embodiment of the present application groups the edge devices participating in the learning, so that the edge devices interacting with the central node are reduced, thereby reducing the burden on the central node. Moreover, in each round of learning, more edge devices can participate in the learning, so that the model converges faster and improves the efficiency of learning.
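The round walked through above can be sketched as a minimal simulation. The names (`local_train`, `federated_round`) and the use of plain unweighted averaging (FedAvg-style) are illustrative assumptions; the patent does not fix a particular training or aggregation rule.

```python
# Minimal sketch of one round of the grouped (hierarchical) federated
# learning described above. Models are lists of floats; `local_train`
# is a hypothetical stand-in for learning on local data.

def local_train(model, local_data):
    # Third model: the first model shifted by the mean of local data
    # (a placeholder for real gradient-based training).
    shift = sum(local_data) / len(local_data)
    return [w + shift for w in model]

def aggregate(models):
    # Plain (unweighted) element-wise averaging, FedAvg-style.
    return [sum(ws) / len(ws) for ws in zip(*models)]

def federated_round(first_model, groups):
    """groups: {central_edge_id: {edge_id: local_data}}"""
    second_models = []
    for members in groups.values():
        # The central edge device broadcasts the first model in its group;
        # each edge device trains with its covered terminal devices.
        third_models = [local_train(first_model, d) for d in members.values()]
        # Intra-group aggregation yields the group's second model.
        second_models.append(aggregate(third_models))
    # The central node aggregates the second models into the fourth model.
    return aggregate(second_models)

groups = {
    "edge1": {"edge1": [0.1, 0.3], "edge2": [0.2, 0.4]},              # group 1
    "edge3": {"edge3": [0.5], "edge4": [0.3], "edge5": [0.4]},        # group 2
}
fourth_model = federated_round([1.0, 2.0], groups)
```

Only the two central edge devices (`edge1`, `edge3`) exchange models with the central node, regardless of how many edge devices each group contains.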
  • the method provided by the embodiment of the present application has been introduced above through the interaction among the central node, the first edge device and the terminal device.
  • the central node, the first edge device and the terminal device may each include a hardware structure and/or a software module, and the above functions are realized in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is executed in the form of a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and the design constraints of the technical solution.
  • FIG 10 is a schematic block diagram of a communication device 1000 provided in an embodiment of the present application.
  • the communication device 1000 may include a processing module 1010 and a transceiver module 1020.
  • a storage unit may be further included, which may be used to store instructions (codes or programs) and/or data.
  • the processing module 1010 and the transceiver module 1020 may be coupled to the storage unit.
  • the processing module 1010 may read the instructions (codes or programs) and/or data in the storage unit to implement the corresponding method.
  • the above-mentioned modules may be independently arranged or partially or fully integrated.
  • the communication device 1000 can correspond to the behaviors and functions of the central node in the above-mentioned method embodiments.
  • the communication device 1000 can be a central node, or a component (such as a chip or circuit) applied to the central node, or a chip or chipset in the central node or a part of the chip used to execute related method functions.
  • the transceiver module 1020 is used to send a first model to at least one central edge device and receive at least one second model, wherein the at least one central edge device corresponds one-to-one to at least one edge device group, an edge device group includes at least one edge device, a second model is obtained by aggregating third models obtained by each edge device in at least one edge device group, and a third model is a model obtained by learning the first model based on local data by at least one terminal device within the collaborative coverage of an edge device; the processing module 1010 is used to aggregate at least one second model to obtain a fourth model.
  • the processing module 1010 is also used to divide the multiple edge devices participating in the learning into at least one edge device group and determine the central edge device of each edge device group according to one or more of the following information: first information, second information or third information.
  • first information is used to indicate the communication relationship between the multiple edge devices
  • the communication relationship is used to indicate the edge devices among the multiple edge devices that can communicate with each edge device.
  • the second information is used to indicate the communication delay of the communication link of each edge device among the multiple edge devices.
  • the third information is used to indicate the model similarity between the multiple edge devices.
  • when the processing module 1010 divides the multiple edge devices participating in the learning into the at least one edge device group according to the one or more kinds of information, it is specifically used to: group the multiple edge devices according to the first information or the second information to obtain M edge device groups, where M is an integer greater than or equal to 1; and group each edge device group in the M edge device groups again according to the third information to obtain at least one edge device group.
  • the transceiver module 1020 is also used to: send an acquisition instruction to a first edge device among multiple edge devices, and receive fourth information from the first edge device; the acquisition instruction is used to instruct the first edge device to report information of edge devices that can communicate with the first edge device, and the fourth information includes information of edge devices that can communicate with the first edge device; the processing module 1010 is also used to determine the communication relationship between multiple edge devices based on the fourth information respectively from the multiple edge devices.
  • the processing module 1010 is also used to obtain configuration information from multiple edge devices respectively, and the configuration information of one edge device among the multiple edge devices includes information of other edge devices that can communicate with the one edge device; the communication relationship between the multiple edge devices is determined based on the obtained configuration information.
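The two-stage grouping performed by the processing module can be illustrated with a small sketch: edge devices are first grouped into connected components of the communication graph (the first information), and each component is then split again by model similarity (the third information). The cosine-style similarity metric and the 0.9 threshold are illustrative assumptions, not values from the text.

```python
# Sketch: group edge devices by communication relationship, then
# regroup each group by model similarity. Metric/threshold assumed.

def connected_components(comm):
    """comm: {device: set of devices it can communicate with}."""
    seen, groups = set(), []
    for start in comm:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            d = stack.pop()
            if d in comp:
                continue
            comp.add(d)
            stack.extend(comm[d] - comp)
        seen |= comp
        groups.append(comp)
    return groups

def similarity(a, b):
    # Cosine similarity between two model-parameter vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def regroup_by_similarity(group, models, threshold=0.9):
    # Greedy regrouping: join a subgroup whose first member is similar.
    subgroups = []
    for dev in sorted(group):
        for sg in subgroups:
            if similarity(models[dev], models[sg[0]]) >= threshold:
                sg.append(dev)
                break
        else:
            subgroups.append([dev])
    return subgroups

comm = {"e1": {"e2"}, "e2": {"e1"}, "e3": {"e4"}, "e4": {"e3"}}
models = {"e1": [1, 0], "e2": [0.9, 0.1], "e3": [0, 1], "e4": [1, 0]}
coarse = connected_components(comm)                       # M = 2 groups
fine = [regroup_by_similarity(g, models) for g in coarse]
```

Here the first stage yields two groups from the communication graph; the second stage keeps `e1`/`e2` together (similar models) but splits `e3` and `e4` (dissimilar models) into separate groups.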
  • the communication device 1000 can correspond to the behavior and function of the first edge device in the above-mentioned method embodiment.
  • the communication device 1000 can be a base station, or a component (such as a chip or circuit) used in a base station, or a chip or chipset in the base station or a part of the chip used to execute related method functions.
  • the transceiver module 1020 is used to receive the first model from the central node, and send the first model to other edge devices in the first edge device group except the communication device, the communication device 1000 is the central edge device of the first edge device group, and the first edge device group includes at least one edge device.
  • the processing module 1010 is used to determine the second model. The second model is obtained by aggregating the third models obtained by each edge device in at least one edge device group; the at least one edge device group includes the first edge device group, and a third model is a model obtained by at least one terminal device within the coverage range of an edge device, in cooperation with that edge device, learning the first model based on local data.
  • the transceiver module 1020 is also used to send the second model to the central node.
  • the transceiver module 1020 is also used to receive at least one third model, wherein one of the at least one third model comes from a second edge device or a first terminal device in the first edge device group.
  • the first terminal device is a terminal device that moves from the coverage of an edge device in the second edge device group to the coverage of the communication device 1000.
  • the processing module 1010 is also used to aggregate the at least one third model with the third model obtained by the communication device 1000, in cooperation with at least one terminal device covered by the communication device 1000, learning the first model based on local data, to obtain the second model.
  • the transceiver module 1020 is also used to receive an acquisition instruction sent by the central node and send fourth information to the central node.
  • the acquisition instruction is used to instruct the communication device 1000 to report information of an edge device that can communicate with the communication device 1000.
  • the fourth information includes information of an edge device that can communicate with the communication device 1000.
  • the communication device 1000 can correspond to the behavior and function of the first terminal device in the above-mentioned method embodiment.
  • the communication device 1000 can be a terminal device, or a component (such as a chip or circuit) used in a terminal device, or a chip or chipset in the terminal device or a part of a chip used to execute related method functions.
  • the transceiver module 1020 is used to receive a third model from a second edge device in the first edge device group, where the third model is a model obtained by at least one terminal device within the collaborative coverage of the second edge device learning the first model from the central node based on local data, and the communication device 1000 belongs to the at least one terminal device.
  • the processing module 1010 is used to control the transceiver module 1020 to send the third model to the third edge device when determining that the communication device moves from the coverage of the second edge device to the coverage of the third edge device in the second edge device group.
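The terminal device's role as a carrier of a third model between edge device groups can be sketched as follows; the class and method names are hypothetical stand-ins for the transceiver/processing behavior described above.

```python
# Sketch: a terminal device relays a third model between edge device
# groups as it moves. Names are illustrative, not from the patent.

class EdgeDevice:
    def __init__(self):
        self.received_models = []  # third models received from terminals/peers

class Terminal:
    def __init__(self):
        self.carried_model = None

    def receive_model(self, third_model):
        # Store the third model obtained in the source edge device group.
        self.carried_model = list(third_model)

    def on_enter_coverage(self, edge_device):
        # On moving into a new edge device's coverage, hand the model over.
        if self.carried_model is not None:
            edge_device.received_models.append(self.carried_model)
            self.carried_model = None

t6 = Terminal()
t6.receive_model([0.4, 0.6])   # third model from edge device 3 (group 2)
e1 = EdgeDevice()
t6.on_enter_coverage(e1)       # terminal moves into edge device 1's coverage
```

After the handover, edge device 1 can include the carried third model in its intra-group aggregation, so models propagate between groups without the central node mediating.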
  • processing module 1010 in the embodiment of the present application can be implemented by a processor or a processor-related circuit component
  • transceiver module 1020 can be implemented by a transceiver or a transceiver-related circuit component or a communication interface.
  • FIG11 is a schematic block diagram of a communication device 1100 provided in an embodiment of the present application.
  • the communication device 1100 may be a central node, which can implement the function of the central node in the method provided in the embodiment of the present application.
  • the communication device 1100 may also be a device that can support the central node to implement the corresponding function in the method provided in the embodiment of the present application, wherein the communication device 1100 may be a chip system.
  • the chip system may be composed of a chip, or may include a chip and other discrete devices. For specific functions, please refer to the description in the above method embodiment.
  • the communication device 1100 may also be an edge device, such as a base station, which can implement the function of the first edge device in the method provided in the embodiment of the present application.
  • the communication device 1100 may also be a device that can support the first edge device to implement the corresponding function in the method provided in the embodiment of the present application, wherein the communication device 1100 may be a chip system.
  • the chip system may be composed of a chip, or may include a chip and other discrete devices.
  • the communication device 1100 may be a terminal device, which can implement the function of the terminal device in the method provided in the embodiment of the present application.
  • the communication device 1100 may also be a device that can support the terminal device to implement the corresponding functions in the method provided in the embodiment of the present application, wherein the communication device 1100 may be a chip system.
  • the chip system may be composed of a chip, or may include a chip and other discrete devices. For specific functions, please refer to the description in the above method embodiment.
  • the communication device 1100 includes one or more processors 1120, which can be used to implement or support the communication device 1100 to implement the function of the central node in the method provided in the embodiment of the present application.
  • the one or more processors 1120 can also be used to implement or support the communication device 1100 to implement the function of the first edge device in the method provided in the embodiment of the present application.
  • One or more processors 1120 may also be used to implement or support the communication device 1100 to implement the function of the terminal device in the method provided in the embodiment of the present application. Please refer to the detailed description in the method example for details, which will not be repeated here.
  • the processor 1120 may also be referred to as a processing unit or a processing module, which may implement certain control functions.
  • the processor 1120 may be a general-purpose processor or a dedicated processor, etc. For example, it includes: a central processing unit, an application processor, a modem processor, a graphics processor, an image signal processor, a digital signal processor, a video codec processor, a controller, a memory, and/or a neural network processor, etc.
  • the central processing unit may be used to control the communication device 1100, execute software programs and/or process data. Different processors may be independent devices, or they may be integrated in one or more processors, for example, integrated in one or more application-specific integrated circuits.
  • the communication device 1100 includes one or more memories 1130 for storing program instructions and/or data.
  • the memory 1130 is coupled to the processor 1120.
  • the coupling in the embodiment of the present application is an indirect coupling or communication connection between devices, units or modules, which can be electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processor 1120 may operate in conjunction with the memory 1130.
  • the processor 1120 may execute program instructions and/or data stored in the memory 1130 so that the communication device 1100 implements the corresponding method. At least one of the at least one memory may be included in the processor 1120.
  • the communication device 1100 may also include a communication interface 1110, using any transceiver-like device for communicating with other devices or communication networks, such as a radio access network (RAN), a wireless local area network (WLAN), a wired access network, etc.
  • the communication interface 1110 is used to communicate with other devices through a transmission medium, so that the device used in the communication device 1100 can communicate with other devices.
  • when the communication device 1100 is a central node, the other device is an edge device; or, when the communication device is an edge device, the other device is a central node or a terminal device.
  • the processor 1120 can use the communication interface 1110 to send and receive data.
  • the communication interface 1110 can specifically be a transceiver.
  • the connection medium between the communication interface 1110, the processor 1120 and the memory 1130 is not limited in the embodiment of the present application.
  • the memory 1130, the processor 1120 and the communication interface 1110 are connected through a bus 1140.
  • the bus is represented by a bold line in FIG. 11.
  • the connection mode between the components shown is only a schematic illustration and is not limited thereto.
  • the bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one bold line is used in FIG. 11, but it does not mean that there is only one bus or one type of bus.
  • the processor 1120 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiment of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor, etc.
  • the steps of the method disclosed in the embodiment of the present application may be directly embodied as being executed by a hardware processor, or may be executed by a combination of hardware and software modules in the processor.
  • the memory 1130 may be a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory may be independent and connected to the processor via the communication bus 1140.
  • the memory may also be integrated with the processor.
  • the memory 1130 is used to store computer-executable instructions for executing the solution of the present application, and the processor 1120 controls their execution.
  • the processor 1120 is used to execute the computer-executable instructions stored in the memory 1130, thereby implementing the federated learning method provided in the above embodiment of the present application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application code, which is not specifically limited in the embodiments of the present application.
  • the processing module in the above embodiment can be a processor, for example: a central processing unit (CPU).
  • the processing module can be a processor of a chip system.
  • the transceiver module or the communication interface can be an input/output interface or an interface circuit of the chip system.
  • the interface circuit can be a code/data read/write interface circuit.
  • the interface circuit can be used to receive code instructions (code instructions are stored in a memory, can be read directly from the memory, or can be read from the memory through other devices) and transmit them to the processor; the processor can be used to run the code instructions to execute the method in the above method embodiment.
  • the interface circuit can also be a signal transmission interface circuit between a communication processor and a transceiver.
  • the device may include a transceiver unit and a processing unit, wherein the transceiver unit may be an input/output circuit and/or a communication interface; and the processing unit may be an integrated processor or microprocessor or integrated circuit.
  • the embodiment of the present application also provides a communication system, specifically, the communication system includes at least one central node, at least one edge device and at least one terminal device.
  • the communication system includes a central node, a first edge device and at least one terminal device for the relevant functions in the above embodiment. Please refer to the relevant description in the above method embodiment for details, which will not be repeated here.
  • a computer-readable storage medium including instructions, which, when executed on a computer, causes the computer to execute the method executed by the central node in the embodiment of the present application.
  • instructions when executed on a computer, causes the computer to execute the method executed by the first edge device in the embodiment of the present application.
  • the instructions, when executed on a computer, cause the computer to execute the method executed by the terminal device in the embodiment of the present application.
  • a computer program product including instructions, which, when executed on a computer, causes the computer to execute the method executed by the central node in the embodiment of the present application.
  • the instructions, when executed on a computer, cause the computer to execute the method executed by the first edge device in the embodiment of the present application.
  • the instructions, when executed on a computer, cause the computer to execute the method executed by the terminal device in the embodiment of the present application.
  • the embodiment of the present application provides a chip system, which includes a processor and may also include a memory, for implementing the function of the central node in the aforementioned method; or for implementing the function of the first edge device in the aforementioned method; or for implementing the function of the terminal device in the aforementioned method.
  • the chip system may be composed of a chip, or may include a chip and other discrete devices.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the part of the technical solution of the present application that makes an essential contribution, or the technical solution itself, can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which can be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in each embodiment of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a RAM, a magnetic disk, an optical disc, and other media that can store program code.


Abstract

A federated learning method and apparatus. The method includes: a central node sends a first model to each of at least one central edge device, receives at least one second model, and aggregates the at least one second model to obtain a fourth model. The at least one central edge device corresponds one-to-one with at least one edge device group. A second model is obtained by aggregating the third models respectively obtained by the edge devices in at least one edge device group. A third model is a model obtained by an edge device, in cooperation with at least one terminal device within its coverage, learning the first model based on local data. Multiple edge devices are divided into multiple edge device groups, and the central edge device of each group sends the first model to the other edge devices in the group, so that fewer edge devices interact with the central node. More edge devices can therefore participate in learning, speeding up model training and improving the efficiency of federated learning.

Description

A federated learning method and apparatus
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202211340211.0, entitled "A federated learning method and apparatus", filed with the China National Intellectual Property Administration on October 29, 2022, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence (AI) technology, and in particular to a federated learning method and apparatus.
Background
Federated learning (FL) is a form of distributed machine learning (ML) that can jointly train a model using the business data on individual terminal devices without sharing data between the terminal devices.
In a typical round of federated learning, each terminal device learns the model received from a central node based on its local business data to obtain a local learning result. The central node then collects the local learning results of multiple terminal devices, aggregates the models learned by these terminal devices, and sends the aggregated model back to each terminal device. The terminal devices then learn the aggregated model again based on local business data, and so on, until learning ends.
This form of federated learning relies on the central node: if the central node's computing or bandwidth resources are limited and cannot support the aggregation of many models, federated learning cannot be carried out.
Summary
Embodiments of this application provide a federated learning method and apparatus for reducing the burden on the central node when performing federated learning and improving the efficiency of federated learning.
In a first aspect, an embodiment of this application provides a federated learning method. The method may be executed by a first communication apparatus, which may be a communication device or a communication apparatus (such as a chip system) capable of supporting a communication device in implementing the functions required by the method. Illustratively, the first communication apparatus is a central node, a chip disposed in the central node, or another component for implementing the functions of the central node. The federated learning method of the first aspect is described below taking the first communication apparatus being a central node as an example.
The federated learning method includes: the central node sends a first model to each of at least one central edge device and receives at least one second model. The at least one central edge device corresponds one-to-one with at least one edge device group, and an edge device group includes at least one edge device. A second model is obtained by aggregating the third models respectively obtained by the edge devices in at least one edge device group. A third model is a model obtained by an edge device, in cooperation with at least one terminal device within its coverage, learning the first model based on local data. The central node then aggregates the at least one second model to obtain a fourth model.
In the embodiments of this application, multiple edge devices are divided into multiple edge device groups. Any edge device group interacts with the central node through one of its edge devices (the central edge device herein). For example, the central node sends the first model to the edge devices in a group through the group's central edge device. The central node therefore does not need to interact with each edge device separately; fewer edge devices interact with the central node, reducing its burden. More edge devices can thus be added to each group so that more edge devices participate in learning, which speeds up model training and improves the efficiency of federated learning.
In a possible implementation, the method further includes: the central node divides the multiple edge devices participating in learning into at least one edge device group, and determines the central edge device of each group, according to one or more of the following: first information, second information, or third information. The first information indicates the communication relationship among the multiple edge devices, that is, which of the edge devices can communicate with each edge device. The second information indicates the communication delay of the communication link of each edge device. The third information indicates the model similarity among the multiple edge devices.
The embodiments of this application provide multiple ways of dividing the edge devices into at least one edge device group, and which way is used is not limited; the grouping can follow the actual needs of model training. For example, grouping by the communication relationship among the edge devices is relatively simple. As another example, edge devices with short communication delays can be grouped together, shortening the communication delay within a group and helping to shorten training time. As yet another example, edge devices whose models are similar can be grouped together, which helps the models within a group converge faster and improves learning efficiency.
In a possible implementation, the central node divides the multiple edge devices participating in learning into at least one edge device group as follows: the central node first groups the edge devices according to the first information or the second information to obtain M edge device groups, where M is an integer greater than or equal to 1; it then regroups each of the M edge device groups according to the third information to obtain the at least one edge device group. Combining multiple grouping criteria helps minimize the training time.
In a possible implementation, the method further includes: for a first edge device among the multiple edge devices, the central node sends an acquisition instruction to the first edge device and receives fourth information from it. The acquisition instruction instructs the first edge device to report information about the edge devices that can communicate with the first edge device, and the fourth information includes that information. The central node determines the communication relationship among the multiple edge devices based on the fourth information received from each of them. In this scheme, the central node determines the communication relationship by instructing each edge device to report the devices it can communicate with directly; the scheme is not affected by the layout of the edge devices and is therefore more flexible.
In a possible implementation, the method further includes: the central node reads configuration information from each of the multiple edge devices and determines the communication relationship among them from the configuration information read. The configuration information of one edge device includes information about the other edge devices that can communicate with it. In this scheme, the information about the other edge devices that can directly communicate with an edge device can be (pre)configured on that device, so that the central node can determine the communication relationship among the multiple edge devices simply by reading each device's configuration.
In a second aspect, an embodiment of this application provides a federated learning method. The method may be executed by a second communication apparatus, which may be a communication device or a communication apparatus (such as a chip system) capable of supporting a communication device in implementing the functions required by the method. Illustratively, the second communication apparatus is an edge device, such as a base station, a chip disposed in the edge device, or another component for implementing the functions of the edge device. The federated learning method of the second aspect is described below taking the second communication apparatus being a first edge device as an example.
The federated learning method includes: the first edge device receives the first model from the central node, sends the first model to the other edge devices in the first edge device group, and sends a second model to the central node. The first edge device is the central edge device of the first edge device group, which includes at least one edge device. The second model is obtained by aggregating the third models respectively obtained by the edge devices in at least one edge device group, where the at least one edge device group includes the first edge device group. A third model is a model obtained by an edge device, in cooperation with at least one terminal device within its coverage, learning the first model based on local data.
In the embodiments of this application, any edge device in an edge device group can, in cooperation with at least one terminal device within its coverage, learn the first model issued by the central node based on local data to obtain a third model. The third models within one or more edge device groups can be aggregated and reported to the central node through one central edge device, so that fewer edge devices interact with the central node, reducing its burden. More edge devices can therefore be added to each group so that more edge devices participate in learning, which speeds up model training and improves the efficiency of federated learning.
In a possible implementation, the method further includes: the first edge device receives at least one third model, and aggregates it with the third model that the first edge device itself obtained by learning the first model based on local data in cooperation with at least one terminal device under its coverage, obtaining the second model. One of the at least one third model comes from a second edge device in the first edge device group, or from a first terminal device, where the first terminal device is a terminal device that has moved from the coverage of an edge device in a second edge device group into the coverage of the first edge device.
In a possible implementation, the method further includes: the first edge device receives an acquisition instruction sent by the central node and sends fourth information to the central node. The acquisition instruction instructs the first edge device to report information about the edge devices that can communicate with the first edge device, and the fourth information includes that information.
For the technical effects of the second aspect and its possible implementations, refer to the description of the technical effects of the first aspect and its possible implementations.
In a third aspect, an embodiment of this application provides a federated learning method. The method may be executed by a third communication apparatus, which may be a communication device or a communication apparatus (such as a chip system) capable of supporting a communication device in implementing the functions required by the method. Illustratively, the third communication apparatus is a terminal device, a chip disposed in the terminal device, or another component for implementing the functions of the terminal device. The federated learning method of the third aspect is described below taking the third communication apparatus being a terminal device as an example.
The federated learning method includes: a first terminal device receives a third model from a second edge device in a first edge device group, the first terminal device being within the coverage of the second edge device; after moving from the coverage of the second edge device into the coverage of a third edge device in a second edge device group, it sends the third model to the third edge device. The third model is a model obtained by the second edge device, in cooperation with at least one terminal device within its coverage, learning the first model from the central node based on local data.
In the embodiments of this application, a terminal device can act as a transmission medium between two edge device groups. For example, while within the coverage of the second edge device in the first edge device group, the terminal device can obtain the third model obtained by the second edge device; when it moves into the coverage of the third edge device in the second edge device group, it can send the obtained third model to the third edge device, thereby passing models between groups.
In a fourth aspect, an embodiment of this application provides a communication apparatus having the functions of the behaviors in the method embodiment of the first aspect; see the description of the first aspect for details, which are not repeated here. The communication apparatus may be the central node of the first aspect, or an apparatus capable of implementing the method of the first aspect, such as a chip or chip system. Alternatively, the communication apparatus has the functions of the behaviors in the method embodiment of the second aspect; see the description of the second aspect for details. The communication apparatus may be the first edge device of the second aspect, or an apparatus capable of implementing the method of the second aspect, such as a chip or chip system. Alternatively, the communication apparatus has the functions of the behaviors in the method embodiment of the third aspect; see the description of the third aspect for details. The communication apparatus may be the terminal device of the third aspect, or an apparatus capable of implementing the method of the third aspect, such as a chip or chip system.
In a possible design, the communication apparatus includes corresponding means or modules for executing the method of the first, second, or third aspect. For example, the communication apparatus includes a processing unit (sometimes called a processing module or processor) and/or a transceiver unit (sometimes called a transceiver module or transceiver). These units (modules) can perform the corresponding functions in the method examples of the first, second, or third aspect; see the detailed description in the method examples, which is not repeated here.
In a fifth aspect, an embodiment of this application provides a communication apparatus, which may be the communication apparatus of the first aspect, or a chip or chip system disposed in that communication apparatus; or the communication apparatus of the second aspect, or a chip or chip system disposed in that communication apparatus; or the communication apparatus of the third aspect, or a chip or chip system disposed in that communication apparatus. The communication apparatus includes a communication interface and a processor, and optionally a memory. The memory stores a computer program; the processor is coupled to the memory and the communication interface; and when the processor reads the computer program or instructions, the communication apparatus executes the method performed by the central node, the first edge device, or the terminal device in the above method embodiments.
In a sixth aspect, an embodiment of this application provides a communication apparatus including an input/output interface and a logic circuit. The input/output interface is used to input and/or output information; the logic circuit is used to execute the method of the first, second, or third aspect.
In a seventh aspect, an embodiment of this application provides a chip system including a processor, and possibly a memory and/or a communication interface, for implementing the method of the first, second, or third aspect. In a possible implementation, the chip system further includes a memory for storing a computer program. The chip system may consist of a chip, or may include a chip and other discrete devices.
In an eighth aspect, an embodiment of this application provides a communication system including a central node, at least one edge device, and at least one terminal device, where the central node is used to execute the method performed by the central node in the first aspect, any edge device is used to execute the method performed by the first edge device in the second aspect, and any terminal device is used to execute the method performed by the terminal device in the third aspect.
In a ninth aspect, this application provides a computer-readable storage medium storing a computer program that, when run, implements the method of the first, second, or third aspect.
In a tenth aspect, a computer program product is provided, including computer program code that, when run, causes the method of the first, second, or third aspect to be executed.
For the beneficial effects of the fourth to tenth aspects and their implementations, refer to the description of the beneficial effects of the first to third aspects and their implementations.
Brief description of the drawings
FIG. 1 is a schematic diagram of a federated learning architecture provided in an embodiment of this application;
FIG. 2 is a schematic flowchart of federated learning under the architecture shown in FIG. 1;
FIG. 3 is a schematic diagram of another federated learning architecture provided in an embodiment of this application;
FIG. 4 is a schematic diagram of a hierarchical federated learning architecture provided in an embodiment of this application;
FIG. 5 is a schematic diagram of a federated learning architecture to which embodiments of this application are applicable;
FIG. 6 is a schematic flowchart of federated learning provided in an embodiment of this application;
FIG. 7 is a schematic diagram of natural grouping of multiple edge devices provided in an embodiment of this application;
FIG. 8 is a schematic diagram of dividing multiple edge devices based on communication delay provided in an embodiment of this application;
FIG. 9 is a schematic diagram of the communication relationship among multiple edge devices provided in an embodiment of this application;
FIG. 10 is a schematic block diagram of a communication apparatus provided in an embodiment of this application;
FIG. 11 is another schematic block diagram of a communication apparatus provided in an embodiment of this application.
Detailed description
The federated learning method provided in the embodiments of this application can be applied to the training of machine learning models in federated learning scenarios. Machine learning models include, for example, neural networks or other types of machine learning modules. To facilitate understanding of the method, some concepts and terms involved in the embodiments of this application are first introduced.
1) Terminal device, also called a terminal or terminal apparatus, is a device with wireless transceiver capability that can send signals to a network device or receive signals from a network device. Terminal devices may include user equipment (UE), sometimes also called a terminal, access station, UE station, remote station, wireless communication device, or user apparatus. Terminal devices connect people, things, and machines and can be widely used in various scenarios, including but not limited to: cellular communication, device-to-device (D2D), vehicle-to-everything (V2X), machine-to-machine/machine-type communications (M2M/MTC), Internet of things (IoT), virtual reality (VR), augmented reality (AR), industrial control, self driving, remote medical, smart grid, smart furniture, smart office, smart wearables, smart transportation, smart city, drones, robots, and similar scenarios.
By way of example and not limitation, in the embodiments of this application, the terminal device may also be a wearable device. Wearable devices, also called wearable smart devices or smart wearables, are a general term for wearable devices developed by applying wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A terminal device may also include a relay; for example, the terminal device may be customer premise equipment (CPE), which can receive signals from a network device and forward them to other terminal devices. In other words, anything capable of data communication with a base station can be regarded as a terminal device. If any of the above terminal devices is located on a vehicle (for example, placed or installed in the vehicle), it can be regarded as a vehicle-mounted terminal device, also called an on-board unit (OBU).
A terminal device may also be a vehicle-mounted module, unit, component, or chip built into a vehicle as one or more components or units, through which the vehicle can implement the method of the embodiments of this application.
In the embodiments of this application, a terminal device may refer to an apparatus for implementing the functions of a terminal device, or an apparatus capable of supporting a terminal device in implementing those functions, such as a chip system, which can be installed in the terminal device. For example, the terminal device may also be a vehicle detector. In the embodiments of this application, a chip system may consist of a chip, or may include a chip and other discrete devices. In the technical solutions provided in the embodiments of this application, the description takes the apparatus for implementing the functions of a terminal device being a terminal device as an example.
2)网络设备,是终端设备通过无线方式接入到移动通信系统中的接入设备,例如包括接入网(access network,AN)设备,例如基站。网络设备也可以是指在空口与终端设备通信的设备。网络设备可以包括长期演进(long term evolution,LTE)系统或高级长期演进(long term evolution-advanced,LTE-A)中的演进型基站(evolved Node B)(也简称为eNB或e-NodeB);网络设备也可以包括第五代(5th generation,5G)系统中的下一代节点B(next generation node B,gNB);或者,网络设备也可以包括无线保真(wireless-fidelity,Wi-Fi)系统中的接入节点等;或者网络设备可以为中继站、车载设备以及未来演进的公共陆地移动网络(Public Land Mobile Network,PLMN)设备、D2D网络中的设备、M2M网络中的设备、IoT网络中的设备或者PLMN网络中的网络设备等。本申请的实施例对网络设备所采用的具体技术和具体设备形态不做限定。
另外,本申请实施例中的基站可以包括集中式单元(centralized unit,CU)和分布式单元(distributed unit,DU),多个DU可以由一个CU集中控制。CU和DU可以根据其具备的无线网络的协议层功能进行划分,例如分组数据汇聚协议(packet data convergence protocol,PDCP)层及以上协议层的功能设置在CU,PDCP以下的协议层,例如无线链路控制(radio link control,RLC)层和介质访问控制(medium access control,MAC)层等的功能设置在DU。需要说明的是,这种协议层的划分仅仅是一种举例,还可以在其它协议层划分。射频装置可以拉远,不放在DU中,也可以集成在DU中,或者部分拉远部分集成在DU中,本申请实施例不作任何限制。另外,在一些实施例中,还可以将CU的控制面(control plane,CP)和用户面(user plane,UP)分离,分成不同实体来实现,分别为控制面CU实体(CU-CP实体)和用户面CU实体(CU-UP实体)。在该网络架构中,CU产生的信令可以通过DU发送给终端设备,或者UE产生的信令可以通过DU发送给CU。DU可以不对该信令进行解析而直接通过协议层封装而透传给UE或CU。
本申请实施例中,用于实现网络设备的功能的装置可以是网络设备,也可以是能够支持网络设备实现该功能的装置,例如芯片系统,该装置可以被安装在网络设备中。在本申请实施例提供的技术方案中,以用于实现网络设备的功能的装置是网络设备为例进行描述。
3)中心节点,也称为中心服务器、中心侧服务器,或中心端服务器,指具有计算能力的一类型设备,例如可以是个人计算机、云端服务器等。例如,中心节点可以是网络管理平台。
4)边缘设备(edge device),也称为边缘侧设备或边缘端设备,指的是向企业或服务提供商核心网络提供入口点的设备,也可以提供到运营商和服务提供商网络的连通性。例如,边缘设备可为路由器、路由交换机、集成接入设备(integrated access device,IAD)、多路复用器,以及各种城域网(metropolitan area network,MAN)和广域网(wide area network,WAN)接入设备。
5)联邦学习,又称为联邦机器学习、联合学习、联盟学习等,是一种分布式机器学习,能够在不同终端设备之间不共享数据的前提下,利用各个终端设备上的业务数据共同训练神经网络模型。
请参见图1,为本申请实施例提供的一种联邦学习架构图。图1以包括一个中心节点和多个终端设备为例,且对这多个终端设备的具体类型不作限制。例如,这多个终端设备可以是手机、计算机、车载设备中的一种或多种。联邦学习过程主要包括本地学习(训练)模型过程和中心节点聚合模型过程。本地训练模型过程指的是终端设备基于本地的业务数据对来自于中心节点的全局模型进行学习,并将本地学习过程中产生的模型参数的变化量(也称为模型梯度)上传给中心节点。中心节点聚合模型过程指的是中心节点根据来自全部或部分终端设备的模型梯度对全局模型进行聚合(也称为汇聚)。反复迭代本地学习(训练)模型过程和中心节点聚合模型过程,直到模型收敛或者达到预先设定的学习次数,结束学习。
请参见图2,示出了基于图1所示的架构的联邦学习过程。图2以多个终端设备包括五个终端设备(即图2中的终端设备1-终端设备5)为例。在初始阶段,各个终端设备可向中心节点上报自身的资源信息。中心节点可以汇总多个终端设备的资源信息筛选参与学习的终端设备。其中,参与学习的终端设备可以是这多个终端设备中的一个或多个终端设备。如果在模型学习过程中,终端设备的资源有更新,那么该终端设备将更新的资源上报给中心节点,以使得中心节点筛选更为合适的终端设备参与学习。具体的,中心节点可以根据多个终端设备上报的资源信息形成学习资源列表,即该学习资源列表包括多个终端设备的资源信息。需要学习模型时,中心节点根据学习资源列表确定参与学习的终端设备。如果后续有终端设备上报更新后的资源,中心节点更新学习资源列表,也可以认为,中心节点维护学习资源列表。
在任一轮次的学习之前,中心节点确定参与学习的终端设备,并向所确定的终端设备广播全局模型以及模型权重。参与学习的任意一个终端设备将所接收的模型权重作为本次学习的初始权重,基于本地的业务数据对来自于中心节点的全局模型进行学习,获得本地学习结果,例如,学习过程中产生的模型参数的变化量(也称为模型梯度)。参与学习的各个终端设备将本次学习获得的模型梯度发送给中心节点。中心节点根据多个终端设备的模型梯度对全局模型进行聚合,完成一轮的学习。中心节点根据多个终端设备的模型梯度对全局模型进行聚合,也可以理解为,中心节点整合(更新)模型权重。之后,开始下一轮的学习过程,即中心节点将聚合的全局模型发送给各个终端设备,再由各个终端设备分别对聚合后的全局模型进行学习,并将学习获得的模型梯度发送给中心节点,由中心节点根据所接收的模型梯度对聚合后的全局模型再次进行聚合。以此类推,直到模型收敛或者达到预先设定的最大学习次数,停止学习。其中,不同轮次的学习中,参与学习的终端设备可以不同。例如,图2中第N轮训练中,参与学习的终端设备可以是终端设备1、终端设备3和终端设备5;第N+1轮训练中,参与学习的终端设备为终端设备1、终端设备2和终端设备4。由于终端设备可以基于本地的业务数据对所接收的模型进行学习后,再由中心节点聚合多个终端设备学习获得的模型,即无需由中心节点收集多个终端设备的业务数据学习模型,因此,各个终端设备之间的数据不共享,可提高数据的安全性。
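示例性的,上述"终端设备本地学习、中心节点聚合"的一轮流程可用如下Python代码示意。该示例仅为便于理解的假设性示意,并非对本申请实施例的限定实现:其中以参数向量(列表)代表模型,以样本量加权平均作为一种可能的聚合方式,本地学习过程亦被大幅简化。

```python
# 联邦学习一轮流程的极简示意:各终端设备基于本地数据更新模型参数,
# 中心节点按各终端设备的样本量对上传的模型参数进行加权平均(聚合)。
# 模型以参数向量(列表)表示,数据均为假设性示例。

def local_update(global_weights, local_data, lr=0.1):
    """终端设备本地学习:以本地样本的平均梯度做一步梯度下降(高度简化)。"""
    gradients = [sum(d[i] for d in local_data) / len(local_data)
                 for i in range(len(global_weights))]
    return [w - lr * g for w, g in zip(global_weights, gradients)]

def aggregate(client_weights, client_sizes):
    """中心节点聚合:按样本量加权平均各终端设备上传的模型参数。"""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]
```

反复执行"下发全局模型、各端local_update、中心节点aggregate",即对应图2中多轮迭代直至模型收敛或达到预设学习次数的过程。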
前述图1所示的架构中,如果终端设备距离中心节点较远,即终端设备与中心节点之间的通信链路较长,那么联邦学习的整个过程耗时也较长。为此,提出通过边缘设备先对多个终端设备学习的模型进行聚合,再由中心节点聚合至少一个边缘设备所获得的模型。边缘设备指的是能够在中心节点边缘提供近似中心节点的能力的设备,例如,边缘设备能够聚合来自终端设备学习所得的模型。边缘设备可以是位于中心节点和终端设备之间的通信链路上的设备。由于边缘设备与终端设备之间的传输链路相较于中心节点与终端设备之间的传输链路较短,所以可以降低联邦学习的整个过程耗时。
例如,可以将网络设备(例如基站)作为边缘设备,通过网络设备协同其覆盖范围内的终端设备执行联邦学习。利用网络设备作为边缘设备的联邦学习架构也称为基于边缘的联邦学习(Edge-based FL)。在下文的介绍中,以边缘设备是网络设备为例。
请参见图3,为基于边缘的联邦学习架构的一种示意图。图3以基于边缘的联邦学习架构包括四个边缘设备(即网络设备1-网络设备4)以及多个终端设备为例。其中,网络设备1-网络设备4两两之间可以进行通信。网络设备1-网络设备4各自覆盖范围内有多个终端设备(图3中虚线示意覆盖范围)。不同的网络设备覆盖范围的终端设备的数量和/或类型可以不同。任意网络设备可以与覆盖范围内的任意终端设备进行通信。针对任意一个网络设备,该网络设备相当于图1中的中心节点,可以协同该网络设备覆盖范围内的终端设备进行联邦学习。该网络设备完成一轮学习之后,可将所获得的模型梯度(或者模型)发送给周围相邻的某个或某些网络设备。当然,该网络设备也可以接收某个或某些网络设备发送的模型梯度。各个网络设备可聚合自己学习得到的模型和来自其他网络设备的模型,并将聚合后的模型作为最新的模型。之后,各个网络设备将最新的模型广播给所覆盖范围内的至少一个终端设备,再次协同这至少一个终端设备进行联邦学习。以此类推,直到网络设备确定停止学习。
图3所示的架构中,一个网络设备无法与距离较远的网络设备进行通信,需要通过位于二者之间的网络设备转发模型或模型梯度。即在整个联邦学习过程中,设备间交互次数较多,通信量较大,这会导致联邦学习过程中模型收敛时间增长,收敛缓慢。
另外,考虑到一个网络设备的覆盖范围有限,即一个网络设备覆盖的终端设备数量有限,那么通过该网络设备协同其覆盖范围的终端设备进行联邦学习所得到的模型的性能有限。因此,可以协同不同网络设备进行联邦学习。这种场景中,可以引入一个中心节点,该中心节点可以用来聚合多个网络设备学习后所得的模型。通过中心节点协同多个网络设备进行联邦学习,以及每个网络设备协同所覆盖范围的终端设备进行联邦学习的架构也称为层次化联邦学习架构。
请参见图4,为层次化联邦学习架构的一种示意图。图4以层次化联邦学习架构包括一个中心节点、四个边缘设备(即网络设备1-网络设备4)以及多个终端设备为例。图4可以认为在图3的基础上增加了一个中心节点。与图3的不同在于,各个网络设备协同所覆盖范围内的终端设备进行联邦学习后,将学习所得到的模型梯度发送给中心节点。之后,中心节点根据各个网络设备发送的模型梯度对发送给各个网络设备的模型进行聚合。后续,中心节点将聚合后的模型广播给各个网络设备,各个网络设备分别再次进行联邦学习,以此类推,直到中心节点确定停止学习。
图4所示的架构,一旦中心节点发生故障,那么无法实现联邦学习。且中心节点负责聚合多个网络设备学习所得的模型,与中心节点交互的网络设备较多,对中心节点的计算资源和带宽资源的要求也较高。当中心节点的计算资源或者带宽资源有限,无法支持大量网络设备的模型聚合时,联邦学习同样无法实现。
为解决上述问题,在本申请实施例中,可对参与学习的边缘设备进行分组。任意一个组内的边缘设备完成聚合该组内多个边缘设备学习后所得的模型,由该组内的一个边缘设备发送给中心节点。可见,通过对参与学习的边缘设备的分组,使得与中心节点交互的边缘设备减少,从而减轻中心节点的负担。且,在每一轮次学习中,参与学习的边缘设备可以更多,使得模型收敛更快,从而减少学习次数,提高学习的效率。
下面结合附图对本申请实施例提供的方案进行详细介绍。本申请实施例提供的技术方案可以应用于层次化联邦学习架构,如图4所示的架构。需要说明的是,本申请实施例中的层次化联邦学习架构可以包括更多边缘设备或者更少边缘设备,每个边缘设备覆盖的终端设备可以更多或者更少,如图5所示。图5以边缘设备是网络设备为例,且以包括1个中心节点、5个边缘设备(边缘设备1-边缘设备5)以及各个边缘设备覆盖范围内包括至少一个终端设备为例。例如,边缘设备1覆盖范围内包括终端设备1-终端设备3,边缘设备2覆盖范围内包括终端设备4-终端设备5。
请参见图6,为本申请实施例提供的联邦学习的流程示意图。图6所示的流程指的是一轮模型学习的流程。可以理解的是,任意一个边缘设备可将自己的资源信息发送给中心节点,中心节点可接收多个边缘设备分别发送的资源信息,根据所接收的多个资源信息确定参与本轮次模型学习的边缘设备。
S601、中心节点将参与学习的多个边缘设备划分为至少一个边缘设备组,并确定各个边缘设备组的中心边缘设备。
中心节点启动模型学习之前,可将参与学习的多个边缘设备进行分组,获得至少一个边缘设备组。其中,一个边缘设备组包括至少一个边缘设备,该至少一个边缘设备中任意两个边缘设备可以进行通信。也就是,一个边缘设备组内的任意一个边缘设备无需中心节点的参与,即可与组内的其他边缘设备进行通信。通过对多个边缘设备的分组,可降低对中心节点的依赖。另外,中心节点还可以从该边缘设备组中的至少一个边缘设备选择一个边缘设备为代表,与中心节点进行通信。为方便描述,在本申请实施例中,将一个边缘设备组中与中心节点进行通信的边缘设备称为中心边缘设备。在每一轮次的学习之前,中心节点确定参与学习的多个边缘设备,并对这多个边缘设备进行分组,以及确定各个边缘设备组的中心边缘设备。其中,不同轮次的学习,参与学习的多个边缘设备可能不同,相应的,边缘设备组的数量以及各个边缘设备组包括的边缘设备可能不同。不同轮次的学习,各个边缘设备组中的中心边缘设备也可能不同。关于如何确定各个边缘设备组中的中心边缘设备后续介绍,首先介绍如何对多个边缘设备进行分组。
本申请实施例对中心节点对多个边缘设备的分组方式不作限制,例如,可以通过如下的任意一种或多种分组方式对多个边缘设备进行分组。
分组方式一,中心节点可根据多个边缘设备之间的通信关系(也就是本申请实施例中的第一信息)将多个边缘设备划分为至少一个边缘设备组。
该通信关系可用于指示多个边缘设备中能够与各个边缘设备通信的边缘设备。也就是,针对任意一个边缘设备,该通信关系可以指示能够与该边缘设备通信的边缘设备。或者,该通信关系可指示多个边缘设备中两两边缘设备是否能够直接(或者点对点)通信。
沿用图5的例子,参与学习的边缘设备包括边缘设备1-边缘设备4,其中,边缘设备1与边缘设备2可以互相通信,边缘设备3、边缘设备4和边缘设备5中任意两个边缘设备可以互相通信。而边缘设备1与边缘设备3-边缘设备5中的任意边缘设备距离较远,不能通信。边缘设备2与边缘设备3-边缘设备5中的任意边缘设备也不能通信。该通信关系可指示能够与边缘设备1通信的边缘设备2,能够与边缘设备2通信的边缘设备1,能够与边缘设备3通信的边缘设备4和边缘设备5,等等。中心节点在每一轮次学习之前,可 确定参与学习的边缘设备。不同轮次的学习,参与学习的边缘设备可能不同。例如,第一轮次的学习,参与学习的边缘设备有4个;第二轮次的学习,参与学习的边缘设备有5个。中心节点在每一轮次的学习之前,可确定第一信息,即确定参与学习的多个边缘设备间的通信关系。中心节点确定参与学习的多个边缘设备间的通信关系的方式多种,例如包括但不限于如下两种确定方式。
确定方式一,针对参与学习的任意边缘设备,中心节点可指示该边缘设备上报能够与该边缘设备通信的边缘设备的信息,从而中心节点可获取多个边缘设备中能够直接通信的边缘设备的信息,进一步确定参与学习的多个边缘设备间的通信关系。为方便描述,下面以参与学习的第一边缘设备为例,介绍中心节点如何确定参与学习的多个边缘设备间的通信关系。
针对第一边缘设备,中心节点可向第一边缘设备发送获取指令,该获取指令用于指示第一边缘设备上报与第一边缘设备能够通信的边缘设备的信息。第一边缘设备接收该获取指令,将可以与第一边缘设备直接通信的边缘设备的信息发送给中心节点。类似的,任意一个边缘设备接收到中心节点发送的获取指令,向中心节点发送可以与自身直接通信的边缘设备的信息。为方便描述,与一个边缘设备直接通信的边缘设备的信息称为第四信息。那么中心节点可接收多个第四信息,从而根据这多个第四信息可确定参与学习的多个边缘设备之间的通信关系。
确定方式二,针对任意一个边缘设备,可预先配置能够与该边缘设备直接通信的边缘设备的信息。即各个边缘设备存储有能够与自身直接通信的边缘设备的配置信息。中心节点获取参与学习的多个边缘设备的配置信息,根据所获取的多个配置信息可确定这多个边缘设备间的通信关系。
中心节点确定参与学习的多个边缘设备之间的通信关系之后,根据该通信关系将多个边缘设备中的至少一个边缘设备划分为一组。如图5所示,边缘设备1和边缘设备2为一组,边缘设备3、边缘设备4和边缘设备5为一组。
在可能的场景中,多个边缘设备是天然分组的。例如,请参见图7,为多个边缘设备天然分组的一种示意图。其中,边缘设备为微基站。可以理解的是,一个宏基站可连接多个微基站,那么连接在相同宏基站的多个微基站为一组。即多个边缘设备天然分组。这种情况下,多个边缘设备划分的至少一个组可为多个边缘设备的天然分组。
如果多个边缘设备不是天然分组的,那么中心节点可根据多个边缘设备间的通信关系对多个边缘设备进行分组。举例来说,请参见图8,为基于通信时延划分多个边缘设备的示意图。图8中的簇成员节点和簇头节点都为边缘设备,其中,簇头节点为中心边缘设备。换句话说,可将多个边缘设备抽象为多个节点,分布在一个图(例如称为第一图)中,对多个边缘设备进行分组,相当于,将第一图进行图分割,获得多个子图。每个子图内的节点对应的边缘设备属于一个组。具体使用何种图分割方式以对多个边缘设备进行分组,本申请实施例不作限制。例如,通过层次化分割算法(例如metis算法)对第一图进行图分割。
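示例性的,按通信关系(第一信息)分组可用如下Python代码示意:将边缘设备抽象为图中节点,能够直接通信的设备之间连边,取图的连通分量作为边缘设备组。该示例仅为便于理解的假设性示意(设备编号、通信关系均为假设),并非对本申请实施例的限定;实际实现可改用metis等层次化图分割算法。

```python
# 按通信关系分组的示意:能够直接通信的设备之间连边,连通分量即一个边缘设备组。

def group_by_connectivity(devices, links):
    """devices: 设备编号列表; links: 可直接通信的设备对集合。返回设备组列表。"""
    adj = {d: set() for d in devices}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    groups, seen = [], set()
    for d in devices:
        if d in seen:
            continue
        # 从 d 出发遍历图,收集与 d 连通的所有设备,构成一个边缘设备组
        group, stack = [], [d]
        seen.add(d)
        while stack:
            cur = stack.pop()
            group.append(cur)
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups
```

以图5的例子为输入(边缘设备1与2可通信,边缘设备3、4、5两两可通信),即得到两个边缘设备组。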
分组方式二,中心节点可根据多个边缘设备中各个边缘设备的通信链路的通信时延(即本文中的第二信息)将多个边缘设备划分为至少一个边缘设备组。
可以理解的是,两个边缘设备间的通信链路的通信时延较短,可认为这两个边缘设备之间的距离较短,那么这两个边缘设备的通信成本较低。相反,如果两个边缘设备间的通 信链路的通信时延较长,可认为两个边缘设备之间的距离较长,这两个边缘设备的通信成本较高,甚至两个边缘设备不能通信。例如,如果两个边缘设备间的通信链路的通信时延超过预设时长,可认为这两个边缘设备不能通信。在本申请实施例中,中心节点可按照通信成本较低的需求将多个边缘设备划分为一组,既可以实现对多个边缘设备的分组,又可以尽量保证较低的通信成本。例如,可以统计每一轮次中两两边缘设备间的通信时延,中心节点获取参与本轮次学习的多个边缘设备中两两边缘设备间的通信时延,再根据获取的通信时延对这多个边缘设备进行分组。
例如,请参见图9,为多个边缘设备的通信关系的示意图。可将多个边缘设备抽象为多个节点,分布在一个图(例如称为第二图)中,对第二图进行分割。例如,在具体分组时,将不同边缘设备间的通信链路的通信时延作为权重,通过设置优化目标为组内通信成本最低,通过层次化分割算法(例如metis算法)对第二图进行图分割。如图9所示,图9中的粗线示意分割的边界,相应的,图9以分为四组为例。
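示例性的,"组内通信成本最低"这一优化目标可用如下Python代码示意:给定一种分组方案,计算各组内两两设备通信时延之和,成本越低的分组方案越优。该示例仅为便于理解的假设性示意(时延矩阵与分组均为假设数据),实际求解可将时延作为边权交由metis等图分割算法完成。

```python
from itertools import combinations

# 组内通信成本的评估示意:对每个边缘设备组,累加组内两两设备间的通信时延。

def intra_group_cost(groups, latency):
    """groups: 分组方案(设备编号列表的列表); latency: {(a, b): 通信时延}。"""
    cost = 0.0
    for group in groups:
        for a, b in combinations(sorted(group), 2):
            # 时延矩阵按无序设备对存储,两个方向查一次
            cost += latency.get((a, b), latency.get((b, a), 0.0))
    return cost
```

例如,设备1与2之间时延远小于二者与设备3之间的时延时,将{1, 2}分为一组、{3}单独一组的方案成本更低。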
分组方式三,中心节点可根据多个边缘设备之间的模型相似性(即本文中的第三信息)将多个边缘设备划分为至少一个边缘设备组。
两个边缘设备之间的模型相似性指的是这两个边缘设备针对同一模型分别采用样本数据学习后所得的模型之间的相似性。中心节点可根据多个边缘设备之间的模型相似性将多个边缘设备划分为至少一个边缘设备组。由于模型相似的多个边缘设备分为一组,那么该组内的多个边缘设备学习后所得的模型之间的差异较小,有助于加快组内多个模型的聚合。
例如,中心节点可向这多个边缘设备发送相同的模型以及相同的样本数据,这多个边缘设备中的各个边缘设备接收到来自中心节点的模型以及样本数据,通过该样本数据对模型进行学习。各个边缘设备将学习后的模型发送给中心节点,从而中心节点接收多个边缘设备分别发送的学习后的模型。后续,中心节点计算两两边缘设备间模型之间的相似性,根据获得的相似性对这多个边缘设备进行分组。
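示例性的,两两边缘设备间模型相似性(第三信息)的计算可用如下Python代码示意:将各边缘设备学习后的模型参数展平为向量,以余弦相似度作为一种可能的相似性度量。该示例仅为便于理解的假设性示意(参数向量为假设数据,相似性度量的具体选取不受此限)。

```python
import math

# 模型相似性计算的示意:以参数向量间的余弦相似度度量两模型的相似性。

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def pairwise_similarity(models):
    """models: {设备编号: 模型参数向量},返回 {(i, j): 相似性}(i < j)。"""
    ids = sorted(models)
    return {(i, j): cosine_similarity(models[i], models[j])
            for k, i in enumerate(ids) for j in ids[k + 1:]}
```

中心节点得到两两相似性后,可将相似性高的边缘设备聚为一组,以减小组内模型差异、加快组内聚合。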
中心节点可使用上述分组方式一到分组方式三中的任意一种分组方式对多个边缘设备进行分组,也可以使用分组方式一到分组方式三中的多种分组方式对多个边缘设备进行分组。
例如,中心节点根据第一信息对多个边缘设备进行分组,获得M个边缘设备组,M为大于或等于1的整数;之后,中心节点再根据第三信息对M个边缘设备组中的各个边缘设备组进行再次分组,获得至少一个边缘设备组。也就是,中心节点使用分组方式一和分组方式三对多个边缘设备进行分组。中心节点根据多个边缘设备的通信关系将多个边缘设备组划分为M个边缘设备组;再根据两两边缘设备间的模型相似性进一步将M个边缘设备组再分组。沿用图5的例子,边缘设备1-边缘设备5天然分为边缘设备组1和边缘设备组2。中心节点根据两两边缘设备间的模型相似性可进一步将边缘设备组2中的边缘设备3和边缘设备4划分为一组,将边缘设备5划分为一组。
又例如,中心节点根据第二信息对多个边缘设备进行分组,获得M个边缘设备组,M为大于或等于1的整数;之后,中心节点再根据第三信息对M个边缘设备组中的各个边缘设备组进行再次分组,获得至少一个边缘设备组。也就是,中心节点使用分组方式二和分组方式三对多个边缘设备进行分组。中心节点根据两两边缘设备间的通信链路的通信时延将多个边缘设备划分为M个边缘设备组;再根据两两边缘设备间的模型相似性进一步将M个边缘设备组再次进行分组。
中心节点将多个边缘设备划分为至少一个边缘设备组,还需要确定各个边缘设备组中的中心边缘设备。本申请实施例对确定中心边缘设备的具体方式不作限制。示例性的,针对任意一个边缘设备组,中心节点可以从该边缘设备组中随机选择一个边缘设备作为中心边缘设备。示例性的,中心节点可以根据该边缘设备组中各个边缘设备的业务负担确定中心边缘设备。例如,中心节点选择边缘设备组中业务负担最轻的边缘设备作为中心边缘设备;又例如,中心节点从边缘设备组中选择业务负担低于阈值的任意一个边缘设备作为中心边缘设备。示例性的,中心节点可以根据该边缘设备组中各个边缘设备与中心节点的通信时延确定中心边缘设备。例如,中心节点选择边缘设备组中与中心节点通信时延最短的边缘设备作为中心边缘设备;又例如,中心节点从边缘设备组中选择与中心节点通信时延低于阈值的任意一个边缘设备作为中心边缘设备。
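示例性的,上述中心边缘设备的确定可用如下Python代码示意:在组内选择与中心节点通信时延最短、且业务负担不超过阈值的边缘设备。该示例仅为便于理解的假设性示意(选择准则的组合方式与其中数据均为假设),并非对本申请实施例所述各种确定方式的限定。

```python
# 选择中心边缘设备的示意:优先在业务负担不超过阈值的设备中,
# 选择与中心节点通信时延最短的边缘设备。

def pick_center(group, latency_to_center, load, load_threshold=None):
    """group: 组内设备列表; latency_to_center/load: 设备到时延/业务负担的映射。"""
    candidates = group
    if load_threshold is not None:
        below = [d for d in group if load[d] <= load_threshold]
        if below:  # 若组内无设备低于阈值,则退化为在全组中择优
            candidates = below
    return min(candidates, key=lambda d: latency_to_center[d])
```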
中心节点通过与各个边缘设备组中的中心边缘设备进行交互,例如,任意一个组内的边缘设备完成聚合该组内多个边缘设备学习后所得的模型,由该组内的一个边缘设备发送给中心节点。这样可使得与中心节点交互的边缘设备减少,从而减轻中心节点的负担。且,在每一轮次学习中,参与学习的边缘设备可以更多,使得模型收敛更快,从而减少学习次数,提高联邦学习的效率。
S602、中心节点向至少一个中心边缘设备分别发送第一模型。
第一模型可认为是需要联邦学习的模型,也称为全局模型。当需要学习第一模型时,中心节点可向至少一个中心边缘设备分别发送第一模型,各个中心边缘设备在组内广播第一模型,这样任意一个边缘设备组中的各个边缘设备都可以接收到第一模型。即中心节点可以通过至少一个中心边缘设备向至少一个边缘设备组中的各个边缘设备分别发送第一模型。任意一个边缘设备组内的任意一个边缘设备接收第一模型后,可协同覆盖范围内的至少一个终端设备基于本地数据对第一模型进行联邦学习,直到模型收敛或者达到训练次数,获得学习后的模型(例如称为第三模型)。对于边缘设备协同覆盖范围内的至少一个终端设备对第一模型进行联邦学习的过程可参考前述图3所示架构的内容,此处不再赘述。
中心节点可对各个边缘设备获得的第三模型进行聚合,完成一轮次的模型学习。在本申请实施例中,可组内聚合多个第三模型,也可以组间聚合多个第三模型。其中,组内聚合多个第三模型指的是:针对任意一个边缘设备组,可由该边缘设备组内的一个边缘设备聚合该边缘设备组内的第三模型。组间聚合多个第三模型指的是:一个边缘设备组内的一个边缘设备聚合的第三模型包括该边缘设备组内的第三模型和不属于该边缘设备组内的第三模型。为方便描述,本申请实施例将聚合多个第三模型得到的模型称为第二模型。组内聚合多个第三模型,可以通过该组内的任一边缘设备聚合这多个第三模型。下面以第一边缘设备组为例介绍如何组内聚合多个第三模型。其中,第一边缘设备组的中心边缘设备为第一边缘设备。
作为一种示例,第一边缘设备组内除第一边缘设备之外的其他边缘设备获得第三模型之后,可将获得的第三模型发送给第一边缘设备。第一边缘设备接收多个第三模型,对这多个第三模型进行聚合,获得第二模型。
其中,所述其他边缘设备可将第三模型直接发送给第一边缘设备,也可以通过第一边缘设备组内的边缘设备将第三模型转发给第一边缘设备。例如,第一边缘设备组内的第二边缘设备可将第三模型直接发送给第一边缘设备,也可以通过第一边缘设备组内的第三边缘设备将第三模型转发给第一边缘设备。又例如,第一边缘设备组内的第二边缘设备也可 以通过终端设备将第三模型发送给第一边缘设备。这种情况下,终端设备本地可保存多个边缘设备的信息,当终端设备从第二边缘设备的覆盖范围移动到第一边缘设备的覆盖范围,将来自第二边缘设备的第三模型发送给第一边缘设备。
作为另一种示例,第一边缘设备组内除第一边缘设备之外的其他边缘设备可对第一边缘设备组内的多个第三模型进行聚合。例如,第一边缘设备组内的第二边缘设备获得第三模型,可将第三模型发送给第一边缘设备组内的第三边缘设备。类似地,第一边缘设备组内除第三边缘设备之外的其他边缘设备也向第三边缘设备发送所获得的第三模型。第三边缘设备接收多个第三模型,将这多个第三模型与第三边缘设备获得的第三模型进行聚合,获得第二模型。这种情况下,第三边缘设备将第二模型发送给第一边缘设备,以通过第一边缘设备将第二模型发送给中心节点。
对于组间聚合多个第三模型,第一边缘设备组内的任意一个边缘设备可聚合所接收的多个第三模型,这多个第三模型可以包括第一边缘设备组内的第三模型和除第一边缘设备组之外的其他边缘设备组中的第三模型。例如,第一边缘设备组内的任意一个边缘设备可聚合第一边缘设备组内的至少一个第三模型和第二边缘设备组内的至少一个第三模型。举例来说,第二边缘设备组内的第三边缘设备可将获得的第三模型发送给第三边缘设备覆盖的任意一个终端设备。由于终端设备的移动性,当终端设备移动到第一边缘设备组内任意一个边缘设备(例如第四边缘设备)的覆盖范围内,该终端设备可将来自第三边缘设备的第三模型发送给第四边缘设备。第四边缘设备接收来自第三边缘设备的第三模型之后,将该第三模型转发给第一边缘设备,由第一边缘设备对该第三模型以及所接收的其他模型进行聚合。或者,第四边缘设备接收来自第三边缘设备的第三模型之后,将该第三模型和第四边缘设备学习获得的第三模型进行聚合。第四边缘设备聚合第三模型之后,可将聚合后的第三模型发送给第一边缘设备,由第一边缘设备进一步对所接收第三模型以及第一边缘设备协同所覆盖范围内的至少一个终端设备基于本地数据对第一模型进行学习获得的第三模型进行聚合。
S603、至少一个中心边缘设备分别向中心节点发送第二模型。
各个边缘设备组中的中心边缘设备获得第二模型之后,将第二模型发送给中心节点。即中心节点通过至少一个中心边缘设备可接收至少一个第二模型。或者,各个边缘设备组中的中心边缘设备可将第二模型与第一模型之间模型参数的变化量(即模型梯度)发送给中心节点。应理解,一个第二模型对应一个模型梯度。中心节点从至少一个中心边缘设备可接收至少一个模型梯度。
S604、中心节点聚合至少一个第二模型。
至少一个中心边缘设备分别向中心节点发送第二模型,中心节点可接收至少一个第二模型。中心节点聚合至少一个第二模型,获得第四模型。第四模型为本轮次联邦学习获得的模型。在下一轮次的学习中,中心节点将第四模型发送给各个边缘设备组的中心边缘设备。任意一个边缘设备组的中心边缘设备广播第四模型,使得任意一个边缘设备组包括的各个边缘设备均接收到第四模型。任意一个边缘设备协同覆盖范围内的终端设备对第四模型进行联邦学习。下一轮次学习的流程与本轮次学习的流程相同,直到模型收敛或者达到预设的学习轮数。
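示例性的,上述多轮层次化联邦学习的主循环可用如下Python代码示意:每轮将当前全局模型下发给各组,收集各组聚合后的第二模型,再在中心节点聚合得到第四模型并进入下一轮。该示例仅为便于理解的假设性示意,其中group_train_fns为假设性的"组内学习并聚合"函数,中心节点聚合以简单平均示意。

```python
# 多轮层次化联邦学习主循环的示意:下发全局模型 -> 各组学习并组内聚合 ->
# 中心节点聚合得到第四模型 -> 以第四模型开启下一轮。

def average(models):
    """中心节点聚合的简化示意:对各组第二模型逐维取平均。"""
    dim = len(models[0])
    return [sum(m[i] for m in models) / len(models) for i in range(dim)]

def hierarchical_fl(initial_model, group_train_fns, rounds):
    """group_train_fns: 各边缘设备组的学习函数列表,输入全局模型返回第二模型。"""
    model = initial_model
    for _ in range(rounds):
        second_models = [train(model) for train in group_train_fns]  # 各组学习并组内聚合
        model = average(second_models)  # 中心节点聚合,得到本轮的第四模型
    return model
```

循环终止条件(模型收敛或达到预设学习轮数)此处以固定轮数rounds示意。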
以图5为例,第N轮次学习中,中心节点向边缘设备组1中的边缘设备1发送第一模型,并向边缘设备组2中的边缘设备3发送第一模型。假设边缘设备1为边缘设备组1的 中心边缘设备,边缘设备3为边缘设备组2的中心边缘设备。边缘设备1接收第一模型后,在边缘设备组1中广播第一模型,从而边缘设备2以及边缘设备1覆盖的终端设备1-终端设备3也可以获得第一模型。同理,边缘设备3接收第一模型后,在边缘设备组2中广播第一模型,从而边缘设备4和边缘设备5以及边缘设备3覆盖的终端设备6-终端设备7也可以获得第一模型。边缘设备2、边缘设备4以及边缘设备5获得第一模型后,继续广播第一模型,这样边缘设备2、边缘设备4以及边缘设备5各自覆盖范围下的终端设备都可以获得第一模型。各个边缘设备协同所覆盖的至少一个终端设备基于本地数据分别对第一模型进行学习,可获得第三模型。例如,边缘设备1协同终端设备1-终端设备3基于本地数据对第一模型进行学习,可获得第三模型。
各个边缘设备获得第三模型之后,可以聚合组内的多个第三模型,获得第二模型,之后由中心边缘设备将第二模型发送给中心节点。以边缘设备组1为例,边缘设备2可将获得的第三模型发送给边缘设备1,边缘设备1对该第三模型以及边缘设备1协同自己覆盖范围下的终端设备1-终端设备3训练获得的第三模型进行聚合,获得第二模型,边缘设备1作为中心边缘设备再将边缘设备组1聚合得到的第二模型发送给中心节点。或者,也可以对来自多个组的多个第三模型进行聚合,获得第二模型,之后由中心边缘设备将第二模型发送给中心节点。例如,边缘设备3可将获得的第三模型发送给终端设备6,当终端设备6移动到边缘设备1的覆盖范围内时,终端设备6将在边缘设备组2训练得到的第三模型发送给边缘设备1。边缘设备1可将从终端设备6接收的第三模型,以及来自边缘设备2的第三模型和边缘设备1协同终端设备1-终端设备3训练获得的第三模型进行聚合,获得第二模型,再发送给中心节点。
边缘设备3作为中心边缘设备得到第二模型以及向中心节点上报第二模型的过程,与上述边缘设备1得到第二模型以及向中心节点上报第二模型的过程相似,不再重复赘述。由此,中心节点可接收来自边缘设备1的第二模型和来自边缘设备3的第二模型,对来自边缘设备1的第二模型和来自边缘设备3的第二模型进行聚合获得第四模型。如果第四模型没有收敛,或者预定的学习轮次大于N,那么继续与上述第N轮次学习过程类似的第N+1轮次学习。
与第N轮次学习类似,在第N+1轮次学习过程中,中心节点向边缘设备1和边缘设备3分别发送第N轮次学习之后所获得的模型,即第四模型。边缘设备1接收第四模型,广播第四模型;同理,边缘设备3接收第四模型,广播第四模型。后续,第N+1轮次学习过程类似第N轮次学习的流程,此处不再赘述,直至最终训练得到的模型收敛或者达到预设的学习轮数为止。
本申请实施例通过对参与学习的边缘设备进行分组,使得与中心节点交互的边缘设备减少,从而减轻中心节点的负担。且,在每一轮次学习中,参与学习的边缘设备可以更多,使得模型收敛更快,提高学习的效率。
上述本申请提供的实施例中,以中心节点、第一边缘设备和终端设备之间的交互对本申请实施例提供的方法进行了介绍。为了实现上述本申请实施例提供的方法中的各功能,中心节点、第一边缘设备和终端设备分别可以包括硬件结构和/或软件模块,以硬件结构、软件模块、或硬件结构加软件模块的形式来实现上述各功能。上述各功能中的某个功能以硬件结构、软件模块、还是硬件结构加软件模块的方式来执行,取决于技术方案的特定应用和设计约束条件。
下面结合附图介绍本申请实施例中用来实现上述方法的通信装置。
图10为本申请实施例提供的通信装置1000的示意性框图。该通信装置1000可以包括处理模块1010和收发模块1020。可选地,还可以包括存储单元,该存储单元可以用于存储指令(代码或者程序)和/或数据。处理模块1010和收发模块1020可以与该存储单元耦合,例如,处理模块1010可以读取存储单元中的指令(代码或者程序)和/或数据,以实现相应的方法。上述各个模块可以独立设置,也可以部分或者全部集成。
一些可能的实施方式中,通信装置1000能够对应实现上述方法实施例中的中心节点的行为和功能,通信装置1000可以为中心节点,也可以为应用于中心节点中的部件(例如芯片或者电路),也可以是中心节点中的芯片或芯片组或芯片中用于执行相关方法功能的一部分。
例如,收发模块1020用于向至少一个中心边缘设备分别发送第一模型,以及接收至少一个第二模型,所述至少一个中心边缘设备与至少一个边缘设备组一一对应,一个边缘设备组包括至少一个边缘设备,一个第二模型是聚合至少一个边缘设备组中的各个边缘设备分别获得的第三模型得到的,一个第三模型是一个边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对第一模型学习获得的模型;处理模块1010用于对至少一个第二模型进行聚合,获得第四模型。
作为一种可选的实现方式,处理模块1010还用于根据如下的一种或多种信息将参与学习的多个边缘设备划分为至少一个边缘设备组,并确定各个边缘设备组各自的中心边缘设备:第一信息、第二信息或第三信息。其中,第一信息用于指示多个边缘设备间的通信关系,该通信关系用于指示多个边缘设备中能够与各个边缘设备通信的边缘设备。第二信息用于指示多个边缘设备中各个边缘设备的通信链路的通信时延。第三信息用于指示多个边缘设备之间的模型相似性。
作为一种可选的实现方式,处理模块1010根据所述一种或多种信息将参与学习的多个边缘设备划分为所述至少一个边缘设备组时,具体用于:根据第一信息或第二信息对多个边缘设备进行分组,获得M个边缘设备组,M为大于或等于1的整数;根据第三信息对M个边缘设备组中的各个边缘设备组进行再次分组,获得至少一个边缘设备组。
作为一种可选的实现方式,收发模块1020还用于:针对多个边缘设备中的第一边缘设备,向第一边缘设备发送获取指令,接收来自第一边缘设备的第四信息;该获取指令用于指示第一边缘设备上报与第一边缘设备能够通信的边缘设备的信息,第四信息包括与第一边缘设备能够通信的边缘设备的信息;处理模块1010还用于根据分别来自多个边缘设备的第四信息确定多个边缘设备间通信关系。
作为一种可选的实现方式,处理模块1010还用于从多个边缘设备分别获取配置信息,多个边缘设备中的一个边缘设备的配置信息包括与一个边缘设备能够通信的其他边缘设备的信息;根据所获取的配置信息确定多个边缘设备间的通信关系。
一些可能的实施方式中,通信装置1000能够对应实现上述方法实施例中的第一边缘设备的行为和功能,通信装置1000可以为基站,也可以为应用于基站中的部件(例如芯片或者电路),也可以是基站中的芯片或芯片组或芯片中用于执行相关方法功能的一部分。
例如,收发模块1020用于接收来自中心节点的第一模型,将第一模型发送给第一边缘设备组中除通信装置之外的其他边缘设备,通信装置1000为第一边缘设备组的中心边缘设备,第一边缘设备组包括至少一个边缘设备。处理模块1010用于确定第二模型,该 第二模型是聚合至少一个边缘设备组中的各个边缘设备分别获得的第三模型得到的。至少一个边缘设备组包括第一边缘设备组,一个第三模型是一个边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对第一模型学习获得的模型。收发模块1020还用于向中心节点发送第二模型。
作为一种可选的实现方式,收发模块1020还用于接收至少一个第三模型,其中,至少一个第三模型中的一个第三模型来自第一边缘设备组内的第二边缘设备或者第一终端设备。第一终端设备为从第二边缘设备组中的一个边缘设备的覆盖范围移动到通信装置1000覆盖范围内的终端设备。处理模块1010还用于对至少一个第三模型以及通信装置1000协同通信装置1000覆盖的至少一个终端设备基于本地数据对第一模型进行学习获得的第三模型进行聚合,获得第二模型。
作为一种可选的实现方式,收发模块1020还用于接收中心节点发送的获取指令,向中心节点发送第四信息。获取指令用于指示通信装置1000上报与通信装置1000能够通信的边缘设备的信息。第四信息包括与通信装置1000能够通信的边缘设备的信息。
一些可能的实施方式中,通信装置1000能够对应实现上述方法实施例中的第一终端设备的行为和功能,通信装置1000可以为终端设备,也可以为应用于终端设备中的部件(例如芯片或者电路),也可以是终端设备中的芯片或芯片组或芯片中用于执行相关方法功能的一部分。
例如,收发模块1020用于接收来自第一边缘设备组中的第二边缘设备的第三模型,该第三模型是第二边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对来自中心节点的第一模型学习获得的模型,通信装置1000属于所述至少一个终端设备。处理模块1010用于确定所述通信装置从第二边缘设备的覆盖范围移动到第二边缘设备组中的第三边缘设备的覆盖范围时,控制收发模块1020向第三边缘设备发送第三模型。
应理解,本申请实施例中的处理模块1010可以由处理器或处理器相关电路组件实现,收发模块1020可以由收发器或收发器相关电路组件或者通信接口实现。
图11为本申请实施例提供的通信装置1100的示意性框图。其中,通信装置1100可以是中心节点,能够实现本申请实施例提供的方法中的中心节点的功能。通信装置1100也可以是能够支持中心节点实现本申请实施例提供的方法中对应的功能的装置,其中,该通信装置1100可以为芯片系统。本申请实施例中,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。具体的功能可以参见上述方法实施例中的说明。通信装置1100也可以是边缘设备,例如基站,能够实现本申请实施例提供的方法中第一边缘设备的功能。通信装置1100也可以是能够支持第一边缘设备实现本申请实施例提供的方法中对应的功能的装置,其中,该通信装置1100可以为芯片系统。本申请实施例中,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。具体的功能可以参见上述方法实施例中的说明。通信装置1100可以是终端设备,能够实现本申请实施例提供的方法中的终端设备的功能。通信装置1100也可以是能够支持终端设备实现本申请实施例提供的方法中对应的功能的装置,其中,该通信装置1100可以为芯片系统。本申请实施例中,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。具体的功能可以参见上述方法实施例中的说明。
通信装置1100包括一个或多个处理器1120,可以用于实现或用于支持通信装置1100实现本申请实施例提供的方法中的中心节点的功能。具体参见方法示例中的详细描述,此处不做赘述。一个或多个处理器1120也可以用于实现或用于支持通信装置1100实现本申 请实施例提供的方法中第一边缘设备的功能。一个或多个处理器1120也可以用于实现或用于支持通信装置1100实现本申请实施例提供的方法中终端设备的功能。具体参见方法示例中的详细描述,此处不做赘述。处理器1120也可以称为处理单元或处理模块,可以实现一定的控制功能。处理器1120可以是通用处理器或者专用处理器等。例如,包括:中央处理器,应用处理器,调制解调处理器,图形处理器,图像信号处理器,数字信号处理器,视频编解码处理器,控制器,存储器,和/或神经网络处理器等。所述中央处理器可以用于对通信装置1100进行控制,执行软件程序和/或处理数据。不同的处理器可以是独立的器件,也可以是集成在一个或多个处理器中,例如,集成在一个或多个专用集成电路上。
可选地,通信装置1100中包括一个或多个存储器1130,用于存储程序指令和/或数据。存储器1130和处理器1120耦合。本申请实施例中的耦合是装置、单元或模块之间的间接耦合或通信连接,可以是电性,机械或其它的形式,用于装置、单元或模块之间的信息交互。处理器1120可能和存储器1130协同操作。处理器1120可能执行存储器1130中存储的程序指令和/或数据,以使得通信装置1100实现相应的方法。所述至少一个存储器中的至少一个可以包括于处理器1120中。
通信装置1100还可以包括通信接口1110,使用任何收发器一类的装置,用于与其他设备或通信网络,如无线接入网(radio access network,RAN),无线局域网(wireless local area networks,WLAN),有线接入网等通信。该通信接口1110用于通过传输介质和其它设备进行通信,从而用于通信装置1100中的装置可以和其它设备进行通信。示例性地,当该通信装置1100为中心节点时,该其它设备为边缘设备;或者,当该通信装置为边缘设备时,该其它设备为中心节点或终端设备。处理器1120可以利用通信接口1110收发数据。通信接口1110具体可以是收发器。
本申请实施例中不限定上述通信接口1110、处理器1120以及存储器1130之间的具体连接介质。本申请实施例在图11中以存储器1130、处理器1120以及通信接口1110之间通过总线1140连接,总线在图11中以粗线表示,其它部件之间的连接方式,仅是进行示意性说明,并不引以为限。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图11中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在本申请实施例中,处理器1120可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
存储器1130可以是ROM或可存储静态信息和指令的其他类型的静态存储设备,RAM或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过通信总线1140与处理器相连接。存储器也可以和处理器集成在一起。
其中,存储器1130用于存储执行本申请方案的计算机执行指令,并由处理器1120来控制执行。处理器1120用于执行存储器1130中存储的计算机执行指令,从而实现本申请上述实施例提供的联邦学习方法。
可选的,本申请实施例中的计算机执行指令也可以称之为应用程序代码,本申请实施例对此不作具体限定。
可以理解的是,上述实施例中的处理模块可以是处理器,例如:中央处理器(central processing unit,CPU)。处理模块可以是芯片系统的处理器。收发模块或通信接口可以是芯片系统的输入输出接口或接口电路。例如,接口电路可以为代码/数据读写接口电路。所述接口电路,可以用于接收代码指令(代码指令存储在存储器中,可以直接从存储器读取,或也可以经过其他器件从存储器读取)并传输至处理器;处理器可以用于运行所述代码指令以执行上述方法实施例中的方法。又例如,接口电路也可以为通信处理器与收发机之间的信号传输接口电路。
当该通信装置为芯片类的装置或者电路时,该装置可以包括收发单元和处理单元。其中,所述收发单元可以是输入输出电路和/或通信接口;处理单元为集成的处理器或者微处理器或者集成电路。
本申请实施例还提供一种通信系统,具体的,通信系统包括至少一个中心节点、至少一个边缘设备以及至少一个终端设备。示例性的,通信系统包括用于上述实施例中的相关功能的中心节点、第一边缘设备以及至少一个终端设备。具体请参考上述方法实施例中的相关描述,这里不再赘述。
本申请实施例中还提供一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行本申请实施例中的中心节点执行的方法。或者,当其在计算机上运行时,使得计算机执行本申请实施例中的第一边缘设备执行的方法。或者,当其在计算机上运行时,使得计算机执行本申请实施例中的终端设备执行的方法。
本申请实施例中还提供一种计算机程序产品,包括指令,当其在计算机上运行时,使得计算机执行本申请实施例中的中心节点执行的方法。或者,当其在计算机上运行时,使得计算机执行本申请实施例中的第一边缘设备执行的方法。或者,当其在计算机上运行时,使得计算机执行本申请实施例中的终端设备执行的方法。
本申请实施例提供了一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现前述方法中的中心节点的功能;或者用于实现前述方法中第一边缘设备的功能;或者用于实现前述方法中终端设备的功能。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各种说明性逻辑块(illustrative logical block)和步骤(step),能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (24)

  1. 一种联邦学习方法,其特征在于,包括:
    中心节点向至少一个中心边缘设备分别发送第一模型,所述至少一个中心边缘设备与至少一个边缘设备组一一对应,一个边缘设备组包括至少一个边缘设备;
    所述中心节点接收至少一个第二模型,一个所述第二模型是聚合至少一个边缘设备组中的各个边缘设备分别获得的第三模型得到的,一个所述第三模型是一个边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对所述第一模型进行学习获得的模型;
    所述中心节点对所述至少一个第二模型进行聚合,获得第四模型。
  2. 如权利要求1所述的方法,其特征在于,所述方法还包括:
    所述中心节点根据如下的一种或多种信息将参与学习的多个边缘设备划分为所述至少一个边缘设备组,并确定各个边缘设备组各自的中心边缘设备:
    第一信息,用于指示所述多个边缘设备间的通信关系,所述通信关系用于指示所述多个边缘设备中能够与各个边缘设备通信的边缘设备;
    第二信息,用于指示所述多个边缘设备中各个边缘设备的通信链路的通信时延;
    第三信息,用于指示所述多个边缘设备之间的模型相似性。
  3. 如权利要求2所述的方法,其特征在于,所述中心节点根据如下的一种或多种信息将参与学习的多个边缘设备划分为所述至少一个边缘设备组,包括:
    所述中心节点根据所述第一信息或所述第二信息对所述多个边缘设备进行分组,获得M个边缘设备组,所述M为大于或等于1的整数;
    所述中心节点根据所述第三信息对所述M个边缘设备组中的各个边缘设备组进行再次分组,获得所述至少一个边缘设备组。
  4. 如权利要求2或3所述的方法,其特征在于,所述方法还包括:
    针对所述多个边缘设备中的第一边缘设备,所述中心节点向所述第一边缘设备发送获取指令,所述获取指令用于指示所述第一边缘设备上报与所述第一边缘设备能够通信的边缘设备的信息;
    所述中心节点接收来自所述第一边缘设备的第四信息,所述第四信息包括与所述第一边缘设备能够通信的边缘设备的信息,所述第一边缘设备为所述多个边缘设备中的任意边缘设备;
    所述中心节点根据分别来自多个边缘设备的第四信息确定所述多个边缘设备间的通信关系。
  5. 如权利要求2或3所述的方法,其特征在于,所述方法还包括:
    所述中心节点从所述多个边缘设备分别读取配置信息,所述多个边缘设备中的一个边缘设备的配置信息包括与所述一个边缘设备能够通信的其他边缘设备的信息;
    所述中心节点根据所读取的配置信息确定所述多个边缘设备间的通信关系。
  6. 一种联邦学习方法,其特征在于,包括:
    第一边缘设备接收来自中心节点的第一模型,所述第一边缘设备为第一边缘设备组的中心边缘设备,所述第一边缘设备组包括至少一个边缘设备;
    所述第一边缘设备将所述第一模型发送给所述第一边缘设备组中除所述第一边缘设备之外的其他边缘设备;
    所述第一边缘设备向所述中心节点发送第二模型,所述第二模型是聚合至少一个边缘设备组中的各个边缘设备分别获得的第三模型得到的,所述至少一个边缘设备组包括所述第一边缘设备组,一个所述第三模型是一个边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对所述第一模型进行学习获得的模型。
  7. 如权利要求6所述的方法,其特征在于,所述方法还包括:
    所述第一边缘设备接收至少一个第三模型,其中,所述至少一个第三模型中的一个第三模型来自:所述第一边缘设备组内的第二边缘设备,或者,第一终端设备,其中,所述第一终端设备为从第二边缘设备组中的一个边缘设备的覆盖范围移动到所述第一边缘设备覆盖范围内的终端设备;
    所述第一边缘设备对所述至少一个第三模型以及所述第一边缘设备协同所述第一边缘设备覆盖的至少一个终端设备基于本地数据对所述第一模型进行学习获得的第三模型进行聚合,获得所述第二模型。
  8. 如权利要求6所述的方法,其特征在于,所述方法还包括:
    所述第一边缘设备接收所述中心节点发送的获取指令,所述获取指令用于指示所述第一边缘设备上报与所述第一边缘设备能够通信的边缘设备的信息;
    所述第一边缘设备向所述中心节点发送第四信息,所述第四信息包括与所述第一边缘设备能够通信的边缘设备的信息。
  9. 一种联邦学习方法,其特征在于,包括:
    第一终端设备接收来自第一边缘设备组中的第二边缘设备的第三模型,所述第三模型是所述第二边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对来自中心节点的第一模型学习获得的模型,所述第一终端设备位于所述第二边缘设备的覆盖范围内;
    所述第一终端设备从所述第二边缘设备的覆盖范围移动到第二边缘设备组中的第三边缘设备的覆盖范围,向所述第三边缘设备发送所述第三模型。
  10. 一种通信方法,其特征在于,包括:
    中心节点向至少一个中心边缘设备分别发送第一模型,所述至少一个中心边缘设备与至少一个边缘设备组一一对应,一个边缘设备组包括至少一个边缘设备;
    所述至少一个中心边缘设备中的第一边缘设备接收所述第一模型,并向所述中心节点发送第二模型,所述第一边缘设备为第一边缘设备组中的中心边缘设备,所述第一边缘设备组包括至少一个边缘设备,所述第二模型是聚合至少一个边缘设备组中的各个边缘设备分别获得的第三模型得到的,所述至少一个边缘设备组包括所述第一边缘设备组,一个所述第三模型是一个边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对所述第一模型进行学习获得的模型;
    所述中心节点对接收的至少一个所述第二模型进行聚合,获得第四模型。
  11. 一种通信装置,其特征在于,包括处理模块和收发模块;
    所述收发模块,用于向至少一个中心边缘设备分别发送第一模型,以及接收至少一个第二模型,所述至少一个中心边缘设备与至少一个边缘设备组一一对应,一个边缘设备组包括至少一个边缘设备,一个所述第二模型是聚合至少一个边缘设备组中的各个边缘设备分别获得的第三模型得到的,一个所述第三模型是一个边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对所述第一模型学习获得的模型;
    所述处理模块,用于对所述至少一个第二模型进行聚合,获得第四模型。
  12. 如权利要求11所述的通信装置,其特征在于,所述处理模块还用于:
    根据如下的一种或多种信息将参与学习的多个边缘设备划分为所述至少一个边缘设备组,并确定各个边缘设备组各自的中心边缘设备:
    第一信息,用于指示所述多个边缘设备间的通信关系,所述通信关系用于指示所述多个边缘设备中能够与各个边缘设备通信的边缘设备;
    第二信息,用于指示所述多个边缘设备中各个边缘设备的通信链路的通信时延;
    第三信息,用于指示所述多个边缘设备之间的模型相似性。
  13. 如权利要求12所述的通信装置,其特征在于,所述处理模块根据所述一种或多种信息将参与学习的多个边缘设备划分为所述至少一个边缘设备组时,具体用于:
    根据所述第一信息或所述第二信息对所述多个边缘设备进行分组,获得M个边缘设备组,所述M为大于或等于1的整数;
    根据所述第三信息对所述M个边缘设备组中的各个边缘设备组进行再次分组,获得所述至少一个边缘设备组。
  14. 如权利要求12或13所述的通信装置,其特征在于,所述收发模块还用于:
    针对所述多个边缘设备中的第一边缘设备,向所述第一边缘设备发送获取指令,接收来自所述第一边缘设备的第四信息,所述获取指令用于指示所述第一边缘设备上报与所述第一边缘设备能够通信的边缘设备的信息,所述第四信息包括与所述第一边缘设备能够通信的边缘设备的信息;
    所述处理模块还用于根据分别来自多个边缘设备的第四信息确定所述多个边缘设备间通信关系。
  15. 如权利要求12或13所述的通信装置,其特征在于,所述处理模块还用于:
    从所述多个边缘设备分别获取配置信息,所述多个边缘设备中的一个边缘设备的配置信息包括与所述一个边缘设备能够通信的其他边缘设备的信息;
    根据所获取的配置信息确定所述多个边缘设备间的通信关系。
  16. 一种通信装置,其特征在于,包括处理模块和收发模块;
    所述收发模块,用于接收来自中心节点的第一模型,将所述第一模型发送给第一边缘设备组中除所述通信装置之外的其他边缘设备,所述通信装置为第一边缘设备组的中心边缘设备,所述第一边缘设备组包括至少一个边缘设备;
    所述处理模块,用于确定第二模型,所述第二模型是聚合至少一个边缘设备组中的各个边缘设备分别获得的第三模型得到的,所述至少一个边缘设备组包括所述第一边缘设备组,一个所述第三模型是一个边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对所述第一模型学习获得的模型;
    所述收发模块,还用于向所述中心节点发送所述第二模型。
  17. 如权利要求16所述的通信装置,其特征在于,所述收发模块还用于接收至少一个第三模型,其中,所述至少一个第三模型中的一个第三模型来自:所述第一边缘设备组内的第二边缘设备,或者,第一终端设备,其中,所述第一终端设备为从第二边缘设备组中的一个边缘设备的覆盖范围移动到所述通信装置覆盖范围内的终端设备;
    所述处理模块还用于对所述至少一个第三模型以及所述通信装置协同所述通信装置覆盖的至少一个终端设备基于本地数据对所述第一模型进行学习获得的第三模型进行聚合,获得所述第二模型。
  18. 如权利要求16所述的通信装置,其特征在于,所述收发模块还用于:
    接收所述中心节点发送的获取指令,所述获取指令用于指示所述通信装置上报与所述通信装置能够通信的边缘设备的信息;
    向所述中心节点发送第四信息,所述第四信息包括与所述通信装置能够通信的边缘设备的信息。
  19. 一种通信装置,其特征在于,包括处理模块和收发模块;
    所述收发模块,用于接收来自第一边缘设备组中的第二边缘设备的第三模型,所述第三模型是所述第二边缘设备协同覆盖范围内的至少一个终端设备基于本地数据对来自中心节点的第一模型学习获得的模型,所述通信装置属于所述至少一个终端设备;
    所述处理模块,用于确定所述通信装置从所述第二边缘设备的覆盖范围移动到第二边缘设备组中的第三边缘设备的覆盖范围时,控制所述收发模块向所述第三边缘设备发送所述第三模型。
  20. 一种通信装置,其特征在于,所述通信装置包括处理器和存储器,所述存储器用于存储计算机程序,所述处理器用于执行存储在所述存储器上的计算机程序,使得所述通信装置执行如权利要求1~5任一项所述的方法,或者,使得所述通信装置执行如权利要求6~8任一项所述的方法,或者,使得所述通信装置执行如权利要求9所述的方法。
  21. 一种通信系统,其特征在于,包括中心节点和多个边缘设备,其中,所述中心节点用于实现如权利要求1~5中任一项所述的方法,所述多个边缘设备中的任意边缘设备用于实现如权利要求6~8中任一项所述的方法。
  22. 如权利要求21所述的系统,其特征在于,所述系统还包括至少一个终端设备,所述至少一个终端设备中的任意终端设备用于实现如权利要求9所述的方法。
  23. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序当被计算机执行时,使所述计算机执行如权利要求1~5中任一项所述的方法,或者,使所述计算机执行如权利要求6~8中任一项所述的方法,或者,使所述计算机执行如权利要求9所述的方法。
  24. 一种计算机程序产品,其特征在于,所述计算机程序产品存储有计算机程序,所述计算机程序当被计算机执行时,使所述计算机执行如权利要求1~5中任一项所述的方法,或者,使所述计算机执行如权利要求6~8中任一项所述的方法,或者,使所述计算机执行如权利要求9所述的方法。
PCT/CN2023/092742 2022-10-29 2023-05-08 一种联邦学习方法及装置 WO2024087573A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211340211.0A CN118012596A (zh) 2022-10-29 2022-10-29 一种联邦学习方法及装置
CN202211340211.0 2022-10-29

Publications (1)

Publication Number Publication Date
WO2024087573A1 true WO2024087573A1 (zh) 2024-05-02

Family

ID=90829897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092742 WO2024087573A1 (zh) 2022-10-29 2023-05-08 一种联邦学习方法及装置

Country Status (2)

Country Link
CN (1) CN118012596A (zh)
WO (1) WO2024087573A1 (zh)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255934A (zh) * 2021-06-07 2021-08-13 大连理工大学 移动边缘云中网络不确定性感知的联邦学习方法及系统
CN113469340A (zh) * 2021-07-06 2021-10-01 华为技术有限公司 一种模型处理方法、联邦学习方法及相关设备
CN113610303A (zh) * 2021-08-09 2021-11-05 北京邮电大学 一种负荷预测方法及系统
WO2021247448A1 (en) * 2020-06-01 2021-12-09 Intel Corporation Federated learning optimizations
CN113902021A (zh) * 2021-10-13 2022-01-07 北京邮电大学 一种高能效的聚类联邦边缘学习策略生成方法和装置
CN114363911A (zh) * 2021-12-31 2022-04-15 哈尔滨工业大学(深圳) 一种部署分层联邦学习的无线通信系统及资源优化方法
CN114444708A (zh) * 2020-10-31 2022-05-06 华为技术有限公司 获取模型的方法、装置、设备、系统及可读存储介质
CN114828095A (zh) * 2022-03-24 2022-07-29 上海科技大学 一种基于任务卸载的高效数据感知分层联邦学习方法
CN114866545A (zh) * 2022-04-19 2022-08-05 郑州大学 一种基于空中计算的半异步分层联邦学习方法及系统
WO2022175818A1 (en) * 2021-02-19 2022-08-25 Telefonaktiebolaget Lm Ericsson (Publ) Edge device, edge server and synchronization thereof for improving distributed training of an artificial intelligence (ai) model in an ai system
CN115174404A (zh) * 2022-05-17 2022-10-11 南京大学 一种基于sdn组网的多设备联邦学习系统

Also Published As

Publication number Publication date
CN118012596A (zh) 2024-05-10
