WO2024065709A1 - A communication method and related device

A communication method and related device

Info

Publication number
WO2024065709A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2022/123355
Other languages
English (en)
French (fr)
Inventor
张公正
李榕
王坚
童文
马江镭
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to PCT/CN2022/123355
Publication of WO2024065709A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present application relates to the field of communications, and in particular to a communication method and related equipment.
  • Distributed learning, represented by federated learning, is a technique in which each device uses a local data set to train a model and reports the trained model or model update to a parameter server. The server uses a fusion algorithm, represented by federated averaging, to compute a weighted average of the local models of multiple devices, obtain a global model, and send it to each device to update its model.
  • the federated learning framework does not require uploading user data, thereby achieving joint learning of multiple devices while protecting user privacy.
  • the intermediate layer, represented by network devices, only performs forwarding operations, so the computing power of the intermediate layer is not fully utilized; moreover, the computing and storage capabilities of terminals differ, and they may not all be able to train and deploy a unified model.
  • the present application provides a communication method and related devices, in which the first device fully utilizes its computing power to participate in joint training in a multi-level joint learning scenario. Compared with a solution in which the intermediate level only performs forwarding, it can fuse the heterogeneous models of downstream devices and reduce the processing burden of upstream devices, thereby improving the learning efficiency of multi-level joint learning.
  • the first aspect of the embodiment of the present application provides a communication method that can be applied to a multi-level joint learning scenario.
  • the method can be executed by a first device, or by a component of the first device (such as a processor, a chip, or a chip system, etc.).
  • the first device can specifically be a network device such as a base station, a transmitting and receiving point (TRP), or a terminal or a core network device.
  • the method includes: obtaining a first result, the first result is the result obtained by inferring a public data set by a model on the second device side; updating the first model based on the public data set and the first result to obtain a second model, the first model is a local model of the first device; sending the second model to a third device; receiving a third model sent by the third device, the third model is obtained by processing the second model; obtaining a second result based on the third model and the public data set, and the second result is used to update the model on the second device side.
  • the first device updates the local model according to the first result of the model on the downstream second device side, and sends the updated local model to the upstream third device.
  • the third model, which is obtained by the upstream third device based on the updated second model, is then received.
  • a second result for the public data set is then obtained based on the third model, and the second result is used to update the model on the downstream second device side.
  • the above step: obtaining the first result includes: receiving the first result from the second device, the first result being a result obtained by the second device using a second device side model to infer a public data set.
  • the first device can reduce the computing power and storage resources occupied by the first device using the model inference process on the second device side by directly obtaining the first result.
  • the above step: obtaining the first result includes: receiving a second device-side model from a second device; and using the second device-side model to infer the public data set to obtain the first result.
  • the first device can obtain the model on the second device side and perform the reasoning process itself, which can reduce the computing power and storage resources occupied by the second device's reasoning.
  • the above steps also include: sending a second result to the second device, and the second result is used by the second device to update a second device side model.
  • the second device may use the second result to update the second device side model, thereby reducing the computing power resources and storage resources occupied by the first device in updating the second device side model.
  • the above steps also include: updating the second device side model based on the second result; and sending the updated second device side model to the second device.
  • the first device may use the second result to update the second device side model, and send it to the second device, thereby reducing the computing power resources and storage resources occupied by the second device in updating the second device side model.
  • the above steps also include: sending indication information to the second device, the indication information is used to synchronize a common data set between the first device and the second device, and the synchronization corresponding operation includes at least one of the following: adding, deleting, and modifying; receiving confirmation information sent by the second device, and the confirmation information is used to synchronize the common data set.
  • the first device and the second device participating in the joint learning may synchronize the public data set, thereby ensuring that the interactive prediction results (eg, the first result and the second result) correspond to the same public data set.
  • the second aspect of the embodiment of the present application provides a communication method that can be applied to a model training scenario.
  • the method can be performed by a second device, or by a component of the second device (such as a processor, a chip, or a chip system, etc.).
  • the second device can be specifically a terminal device, and the method includes: obtaining a first model, the first model is obtained based on the first information of the second device and the second model, and the first model is part of the second model; the first information includes capability information and/or business requirement information.
  • the second device can determine the first model from the second model based on the capability information and/or business requirement information of the second device, and all data of the second device is inferred using the first model (i.e., a substructure of the second model).
  • the above-mentioned second model includes N layers of first networks, at least one layer of the first network in the N layers of first networks includes more than two parallel sub-networks, the first model includes N layers of second networks, the first number is less than the second number, the first number is the number of sub-networks included in at least one layer of the second network in the N layers of second networks, the second number is the number of sub-networks included in the first network corresponding to at least one layer of the second network in the N layers of first networks, and N is a positive integer.
  • the first model is an example of a part of the second model, the first model and the second model have the same number of network layers, and the number of subnetworks in at least one second network layer of the first model is less than the number of subnetworks in the first network in the second model corresponding to the at least one second network layer.
  • the first model is one or more paths in the second model.
  • the above steps also include: receiving a first parameter from the first device, where the first parameter is used to indicate an adjustment to the subnetwork; and updating the first model based on the first parameter.
  • the first device and the second device can adjust the sub-network in the first model through the first parameter, thereby improving the performance of the first model.
  • the above steps also include: receiving a second model from the first device; obtaining the first model, including: determining the first model from the second model based on the first information.
  • an example of the second device acquiring the first model may be to receive the second model sent by the first device and determine the first model from the second model, so as to reduce the computing power resources and storage resources occupied by the second device in determining the first model.
  • the above steps determine the first model from the second model based on the first information, including: determining the subnetworks of each layer of the first network in the N layers of the first network based on the first information; and constructing the first model based on the subnetworks.
  • the subnetwork in each network layer is determined by the first information, and then the first model is obtained.
  • the capability information is used to determine the number of subnetworks of each second network in the N-layer second networks
  • the service requirement information is used to determine the subnetworks in each second network.
  • the number of sub-networks is determined by capability information, and the sub-networks are determined by service requirement information, so that the sub-networks of the first model can be accurately selected from the second model.
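  • As an illustrative sketch only (the structure and names below, such as capability_budget and required_tasks, are assumptions rather than definitions from this application), the following Python snippet shows how a first model could be selected from a second model whose layers contain parallel sub-networks: the capability information bounds how many sub-networks are kept per layer, and the service requirement information decides which ones are relevant.

```python
# Minimal sketch (not the patent's implementation): selecting a sub-model
# ("first model") from a super-model ("second model") whose layers contain
# parallel sub-networks.

def select_first_model(second_model, capability_budget, required_tasks):
    """second_model: list of layers; each layer maps subnet_id -> set of tasks it serves.
       capability_budget: max number of sub-networks the device can host per layer.
       required_tasks: set of tasks the device needs to support."""
    first_model = []
    for layer in second_model:
        # Service requirement decides WHICH sub-networks are relevant.
        relevant = [sid for sid, tasks in layer.items() if tasks & required_tasks]
        # Capability decides HOW MANY of them the device can actually keep.
        first_model.append(relevant[:capability_budget])
    return first_model

# Example: a 3-layer super-model with 3 parallel sub-networks per layer.
second_model = [
    {"a": {"csi"}, "b": {"beam"}, "c": {"csi", "beam"}},
    {"d": {"csi"}, "e": {"beam"}, "f": {"csi"}},
    {"g": {"csi", "beam"}, "h": {"beam"}, "i": {"csi"}},
]
print(select_first_model(second_model, capability_budget=1, required_tasks={"csi"}))
```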
  • the above step: obtaining the first model includes: sending first information to the first device, the first information being used by the first device to determine the first model from the second model; and receiving the second model sent by the first device.
  • another example of the second device acquiring the first model can be to report the first information to the first device and receive the first model sent by the first device, where the first model is determined by the first device based on the first information.
  • the above steps also include: training the first model based on local data to obtain a third model; sending the third model to the first device, and the third model is used by the first device to update the second model.
  • the first device and the second device may perform joint training using the first model.
  • the above steps further include: acquiring a fourth model, where the fourth model is obtained by updating the second model with the third model; and updating the third model based on the fourth model.
  • the first device and the second device may perform joint training using the first model.
  • the first model includes N layers of first networks
  • the second model includes M layers of first networks
  • N and M are positive integers
  • M is less than or equal to N.
  • the first model is an example of a part of the second model, which can be used in a segmented network scenario.
  • the first model is only an encoder or a decoder in the entire second model.
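  • As a minimal illustrative sketch (not the implementation of this application), the segmented-network case can be pictured as keeping only a contiguous part of the full model, e.g. its encoder layers, on the second device:

```python
# Minimal sketch (illustrative only): in the segmented-network case the
# "first model" kept on the device is just a prefix of the full "second
# model", e.g. the encoder layers of an autoencoder.

def split_model(second_model_layers, num_device_layers):
    """second_model_layers: ordered list of layers of the full model.
       num_device_layers: how many leading layers the device keeps."""
    encoder_part = second_model_layers[:num_device_layers]   # stays on the device
    remainder    = second_model_layers[num_device_layers:]   # stays on the network side
    return encoder_part, remainder

full_model = ["enc1", "enc2", "enc3", "dec1", "dec2"]
device_part, network_part = split_model(full_model, num_device_layers=3)
print(device_part, network_part)
```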
  • the third aspect of the embodiment of the present application provides a communication method that can be applied to a model training scenario.
  • the method can be executed by a first device, or by a component of the first device (such as a processor, a chip, or a chip system, etc.).
  • the first device can specifically be a network device (such as a base station, a TRP, etc.), and the method includes: receiving first information sent by a second device, where the first information includes capability information and/or service requirement information of the second device and is used to determine whether the learning mode of the model is a federated learning mode or a distillation learning mode; determining the learning mode of the first device and the second device, where the learning mode is the federated learning mode or the distillation learning mode; sending indication information to the second device, where the indication information is used to indicate the learning mode; and receiving second information sent by the second device, where the second information is used to update the model on the first device side.
  • the first device determines a learning mode matching the second device based on the capability information and/or business requirement information of the second device, so that the method can be flexibly applied to the model training scenario and the model training efficiency is improved.
  • the fourth aspect of the embodiment of the present application provides a communication method that can be applied to a model training scenario.
  • the method can be executed by a second device, or by a component of the second device (such as a processor, a chip, or a chip system, etc.).
  • the second device can specifically be a terminal device, and the method includes: sending first information to a first device, where the first information includes capability information and/or business requirement information of the second device and is used to determine whether the learning mode of the model is a federated learning mode or a distillation learning mode; receiving indication information sent by the first device, where the indication information is used to indicate the learning mode; and sending second information to the first device based on the learning mode, where the second information is used to update the model on the first device side.
  • the first device determines a learning mode matching the second device based on the capability information and/or business requirement information of the second device, so that the method can be flexibly applied to the model training scenario and the model training efficiency is improved.
  • the second information is a weight or gradient of the model.
  • the method can be applied to a scenario where the capability (e.g., computing power, storage capacity) of the second device is relatively strong.
  • the second information is the result obtained by model reasoning on a public data set.
  • the method can be applied to a scenario where the second device has poor capabilities (such as computing power and storage capacity).
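  • The mode selection can be pictured with the following sketch, in which the threshold values and field names are purely illustrative assumptions and not part of this application: a device with strong capability exchanges weights or gradients (federated learning mode), while a weaker device exchanges inference results on the public data set (distillation learning mode).

```python
# Minimal sketch (not from the patent): the first device picks the learning
# mode from the second device's reported capability. Threshold values and
# field names are illustrative assumptions.

def choose_learning_mode(capability):
    """capability: dict with hypothetical keys 'flops' and 'storage_mb'."""
    strong = capability.get("flops", 0) >= 1e9 and capability.get("storage_mb", 0) >= 512
    if strong:
        # Strong device: exchange model weights/gradients (federated learning mode).
        return "federated"
    # Weak device: exchange inference results on the public data set
    # (distillation learning mode).
    return "distillation"

print(choose_learning_mode({"flops": 5e9, "storage_mb": 1024}))   # federated
print(choose_learning_mode({"flops": 1e8, "storage_mb": 64}))     # distillation
```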
  • the fifth aspect of the embodiment of the present application provides a first device that can be applied to a multi-level joint learning scenario.
  • the first device includes: an acquisition unit, which is used to acquire a first result, and the first result is the result obtained by inferring a public data set by a model on the second device side; an update unit, which is used to update the first model based on the public data set and the first result to obtain a second model, and the first model is a local model of the first device; a sending unit, which is used to send the second model to a third device; a receiving unit, which is used to receive a third model sent by the third device, and the third model is obtained by processing the second model; the acquisition unit is also used to acquire a second result based on the third model and the public data set, and the second result is used to update the model on the second device side.
  • the above-mentioned acquisition unit is specifically used to receive a first result from a second device, where the first result is a result obtained by the second device using a second device side model to infer a public data set.
  • the above-mentioned acquisition unit is specifically used to receive a second device-side model from a second device; the acquisition unit is specifically used to use the second device-side model to infer the public data set to obtain the first result.
  • the above-mentioned sending unit is also used to send a second result to the second device, and the second result is used by the second device to update the second device side model.
  • the above-mentioned updating unit is also used to update the second device side model based on the second result; the sending unit is also used to send the updated second device side model to the second device.
  • the sixth aspect of the embodiment of the present application provides a second device that can be applied to a model training scenario.
  • the second device includes: an acquisition unit, used to acquire a first model, the first model is obtained based on the first information of the second device and the second model, and the first model is a part of the second model; the first information includes capability information and/or business requirement information.
  • the above-mentioned second model includes N layers of first networks, at least one layer of the first network in the N layers of first networks includes more than two parallel sub-networks, the first model includes N layers of second networks, the first number is less than the second number, the first number is the number of sub-networks included in at least one layer of the second network in the N layers of second networks, the second number is the number of sub-networks included in the first network corresponding to at least one layer of the second network in the N layers of first networks, and N is a positive integer.
  • the above-mentioned second device also includes: a receiving unit, used to receive a first parameter from the first device, the first parameter is used to indicate an adjustment to the sub-network; and an updating unit, used to update the first model based on the first parameter.
  • the above-mentioned receiving unit is used to receive the second model from the first device; the acquisition unit is specifically used to determine the first model from the second model based on the first information.
  • the above-mentioned acquisition unit is specifically used to determine the subnetwork of each layer of the first network in the N-layer first network based on the first information; the acquisition unit is specifically used to construct the first model based on the subnetwork.
  • the capability information is used to determine the number of subnetworks of each second network in the N-layer second network
  • the service requirement information is used to determine the subnetworks in each second network.
  • the above-mentioned acquisition unit is specifically used to send first information to the first device, and the first information is used by the first device to determine the first model from the second model; the acquisition unit is specifically used to receive the second model sent by the first device.
  • the above-mentioned updating unit is also used to train the first model based on local data to obtain a third model; the sending unit is also used to send the third model to the first device, and the third model is used for the first device to update the second model.
  • the above-mentioned acquisition unit is also used to acquire a fourth model, and the fourth model is obtained by updating the second model by the third model; the updating unit is used to update the third model based on the fourth model.
  • the first model includes N layers of first networks
  • the second model includes M layers of first networks
  • N and M are positive integers
  • M is less than or equal to N.
  • the seventh aspect of the embodiment of the present application provides a first device that can be applied to a model training scenario.
  • the first device includes: a receiving unit, which is used to receive first information sent by a second device, the first information includes capability information and/or business requirement information of the second device, and the first information is used to determine whether the learning mode of the model is a federated learning mode or a distillation learning mode; a determining unit, which is used to determine the learning mode of the first device and the second device, and the learning mode is a federated learning mode or a distillation learning mode; a sending unit, which is used to send indication information to the second device, and the indication information is used to indicate the learning mode; the receiving unit is also used to receive second information sent by the second device, and the second information is used to update the model on the first device side.
  • the eighth aspect of the embodiment of the present application provides a second device that can be applied to a model training scenario.
  • the second device includes: a sending unit, which is used to send first information to the first device, the first information includes capability information and/or business requirement information of the second device, and the first information is used to determine whether the learning mode of the model is a federated learning mode or a distillation learning mode; a receiving unit, which is used to receive indication information sent by the first device, and the indication information is used to indicate the learning mode; a sending unit, which is used to send second information to the first device based on the learning mode, and the second information is used to update the model on the first device side.
  • the second information is a weight or gradient of the model.
  • the second information is a result obtained by model reasoning on a public data set.
  • a ninth aspect of an embodiment of the present application provides a first device, comprising: a processor, the processor being coupled to a memory, the memory being used to store programs or instructions, and when the programs or instructions are executed by the processor, the first device implements the method in the above-mentioned first aspect or any possible implementation of the first aspect, or implements the method in the above-mentioned third aspect or any possible implementation of the third aspect.
  • the tenth aspect of an embodiment of the present application provides a second device, including: a processor, the processor is coupled to a memory, the memory is used to store programs or instructions, and when the programs or instructions are executed by the processor, the second device implements the method in the above-mentioned second aspect or any possible implementation of the second aspect, or implements the method in the above-mentioned fourth aspect or any possible implementation of the fourth aspect.
  • An eleventh aspect of an embodiment of the present application provides a communication system, comprising: the first device of the ninth aspect above, and/or the second device of the tenth aspect above.
  • another aspect of an embodiment of the present application provides a chip system, which includes at least one processor for supporting a first device in implementing the functions involved in the above-mentioned first aspect or any possible implementation of the first aspect, or for supporting the first device in implementing the functions involved in the above-mentioned third aspect or any possible implementation of the third aspect.
  • the chip system may also include a memory for storing program instructions and data necessary for the communication device.
  • the chip system may be composed of a chip, or may include a chip and other discrete devices.
  • the chip system also includes an interface circuit, which provides program instructions and/or data for the at least one processor.
  • the twelfth aspect of an embodiment of the present application provides a chip system, which includes at least one processor for supporting a second device to implement the functions involved in the above-mentioned second aspect or any possible implementation of the second aspect; or for supporting a second device to implement the functions involved in the above-mentioned fourth aspect or any possible implementation of the fourth aspect.
  • the chip system may also include a memory for storing program instructions and data necessary for the communication device.
  • the chip system may be composed of a chip, or may include a chip and other discrete devices.
  • the chip system also includes an interface circuit, which provides program instructions and/or data for the at least one processor.
  • a thirteenth aspect of an embodiment of the present application provides a computer-readable medium having a computer program or instruction stored thereon.
  • When the computer program or instruction is executed on a computer, the computer is caused to execute the method in the first aspect or any possible implementation of the first aspect, or the method in the second aspect or any possible implementation of the second aspect, or the method in the third aspect or any possible implementation of the third aspect, or the method in the fourth aspect or any possible implementation of the fourth aspect.
  • a fourteenth aspect of an embodiment of the present application provides a computer program product.
  • When the computer program product is executed on a computer, it enables the computer to execute the method in the first aspect or any possible implementation of the first aspect, the method in the second aspect or any possible implementation of the second aspect, the method in the third aspect or any possible implementation of the third aspect, or the method in the fourth aspect or any possible implementation of the fourth aspect.
  • the present application has the following advantages: the first device updates the local model according to the first result of the model on the downstream second device side, and sends the updated local model to the upstream third device.
  • the third model, which is obtained by the upstream third device based on the updated second model, is then received.
  • a second result for the public data set is then obtained based on the third model, and the second result is used to update the model on the downstream second device side.
  • the first device fully utilizes its computing power to participate in joint training in the multi-level joint learning scenario. Compared with a solution in which the intermediate level only performs forwarding, the processing burden of the upstream device can be reduced, thereby improving the learning efficiency of multi-level joint learning.
  • FIG. 1A is a schematic diagram of a communication system provided in an embodiment of the present application.
  • FIG. 1B is a schematic diagram of a multi-level learning architecture provided in an embodiment of the present application.
  • FIG. 2 is a schematic flow chart of a communication method provided in an embodiment of the present application.
  • FIG. 3 is another schematic flow chart of a communication method provided in an embodiment of the present application.
  • FIG. 4 is another schematic flow chart of a communication method provided in an embodiment of the present application.
  • FIG. 5 is another schematic flow chart of a communication method provided in an embodiment of the present application.
  • FIG. 6 is an example diagram of a second model provided in an embodiment of the present application.
  • FIG. 7A is an example diagram of a first model provided in an embodiment of the present application.
  • FIG. 7B is another example diagram of the first model provided in an embodiment of the present application.
  • FIG. 8A is another example diagram of the second model provided in an embodiment of the present application.
  • FIG. 8B is another example diagram of the first model provided in an embodiment of the present application.
  • FIG. 9 is another schematic flow chart of a communication method provided in an embodiment of the present application.
  • FIG. 10 is another schematic flow chart of a communication method provided in an embodiment of the present application.
  • FIG. 11 is another schematic flow chart of a communication method provided in an embodiment of the present application.
  • FIG. 12 is another schematic diagram of a communication system provided in an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the structure of a first device provided in an embodiment of the present application.
  • FIG. 14 is a schematic diagram of the structure of a second device provided in an embodiment of the present application.
  • FIG. 15 is another schematic diagram of the structure of the first device provided in an embodiment of the present application.
  • FIG. 16 is another schematic diagram of the structure of the second device provided in an embodiment of the present application.
  • FIG. 17 is another schematic diagram of the structure of the first device provided in an embodiment of the present application.
  • FIG. 18 is another schematic diagram of the structure of the second device provided in an embodiment of the present application.
  • Federated learning is a distributed learning technology. Each device uses a local data set to train a model and reports the trained model or update to the parameter server.
  • the server uses a fusion algorithm represented by federated averaging to weighted average the local models of multiple devices, obtain a global model, and send it to each device to update the model.
  • the federated learning framework does not need to upload user data, thereby achieving joint learning of multiple devices while protecting user privacy.
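  • As an illustrative sketch of federated averaging (assuming, purely for illustration, that the local models are flat parameter vectors and the averaging weights are the local data-set sizes), the weighted average can be computed as follows:

```python
import numpy as np

# Minimal sketch of federated averaging: the server computes a weighted
# average of the local model parameters, typically weighted by each
# device's number of training samples. Illustrative only.

def federated_average(local_weights, sample_counts):
    """local_weights: list of 1-D parameter vectors (np.ndarray), one per device.
       sample_counts: list of local data-set sizes used as averaging weights."""
    total = float(sum(sample_counts))
    global_weights = np.zeros_like(local_weights[0])
    for w, n in zip(local_weights, sample_counts):
        global_weights += (n / total) * w
    return global_weights

w1 = np.array([0.2, -0.5, 1.0])
w2 = np.array([0.4, -0.1, 0.8])
print(federated_average([w1, w2], sample_counts=[100, 300]))
```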
  • A mixture of experts (hybrid experts) is a type of neural network that trains multiple sub-models, each of which is called an expert. For each sample, a control (gating) module selects which expert to use for inference, thereby increasing model capacity without significantly increasing the computing requirements.
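  • A minimal sketch of such a layer is given below (the toy random weights and top-1 routing are illustrative assumptions, not details of this application): the gating module scores the experts for each input and only the selected expert is evaluated.

```python
import numpy as np

# Minimal mixture-of-experts sketch: a gating module scores the experts for
# each input and only the chosen expert is evaluated, so model capacity grows
# without a matching growth in per-sample compute. Weights are placeholders.

rng = np.random.default_rng(0)
num_experts, dim = 3, 4
experts = [rng.normal(size=(dim, dim)) for _ in range(num_experts)]  # expert weights
gate = rng.normal(size=(dim, num_experts))                           # gating weights

def moe_forward(x):
    scores = x @ gate                # gating scores for each expert
    chosen = int(np.argmax(scores))  # pick one expert per sample (top-1 routing)
    return experts[chosen] @ x, chosen

y, expert_id = moe_forward(rng.normal(size=dim))
print("routed to expert", expert_id)
```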
  • Knowledge distillation is a method for transferring knowledge between models with different structures. Knowledge transfer is achieved by using the output of one neural network as the label for training another neural network.
  • The terms “system” and “network” in the embodiments of the present application can be used interchangeably.
  • “At least one” means one or more, and “plurality” means two or more.
  • “And/or” describes the association relationship of associated objects, indicating that three relationships may exist.
  • A and/or B can represent: the existence of A alone, the existence of both A and B, and the existence of B alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the objects associated with each other are in an “or” relationship.
  • “At least one of the following” or similar expressions refers to any combination of these items, including any combination of single items or plural items.
  • At least one of A, B and C includes A, B, C, AB, AC, BC or ABC.
  • the ordinal numbers such as “first” and “second” mentioned in the embodiments of the present application are used to distinguish multiple objects, and are not used to limit the order, timing, priority or importance of multiple objects.
  • FIG. 1A is a schematic diagram of a communication system provided in an embodiment of the present application.
  • the communication system includes a server/core network device 101 , network devices 1021 and 1022 , and terminal devices 1031 and 1032 .
  • Figure 1A only takes one server/core network device 101, two network devices 1021 and 1022, and two terminal devices 1031 and 1032 as an example for illustration. In actual applications, there may be more server/core network devices, network devices, and terminal devices.
  • The manner in which each terminal device accesses the server/core network device 101 may also be different.
  • Two terminal devices 1031 and 1032 can access the server/core network device 101 through the same network device 1021, or one terminal device can access the server/core network device 101 through one network device (not shown in Figure 1A).
  • the two terminal devices 1031 and 1032 are generally connected to the network device 1021 via a wireless network, or via a wired network. If connected via a wireless network, the specific connection form can be a cellular wireless network, a WiFi network, or other types of wireless networks.
  • the network devices 1021 and 1022 can be connected to the server/core network device 101 via a wireless network or a wired network. If connected via a wired network, the general connection form is a fiber optic network.
  • the terminal device 1031 and the terminal device 1032 may be directly connected via a wireless network or a wired network, or may be indirectly connected via the network device 1021, etc., which is not specifically limited here.
  • network devices 1021 and 1022 can be devices independent of the server/core network device 101, or can be artificial intelligence (AI) nodes of the server/core network device 101, etc.
  • the communication system shown in Figure 1A can be applied to cellular systems related to the third generation partnership project (3GPP), such as long term evolution (LTE) systems, fourth generation (4G) communication systems, new radio (NR) systems and other fifth generation (5G) communication systems, and can also be applied to wireless fidelity (WiFi) systems, communication systems that support the integration of multiple wireless technologies, or sixth generation (6G) communication systems and other communication systems evolved after 5G.
  • the terminal devices, network devices, and server/core network devices involved in the embodiments of the present application have communication functions and may also have AI computing capabilities. These devices can perform machine learning training through local data samples, and can also receive models trained by other devices for fusion and send to other devices, thereby realizing joint learning of multiple devices.
  • the terminal device mentioned in the embodiments of the present application may be a device with wireless transceiver function, specifically user equipment (UE), access terminal, subscriber unit, user station, mobile station, remote station, remote terminal, mobile device, user terminal, wireless communication device, user agent or user device.
  • the terminal device may also be a satellite phone, a cellular phone, a smart phone, a wireless data card, a wireless modem, a machine type communication device, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with wireless communication function, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, a communication device carried on a high-altitude aircraft, a wearable device, a drone, a robot, a terminal in device-to-device (D2D) communication, a terminal in vehicle-to-everything (V2X) communication, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, etc.
  • the network device mentioned in the embodiments of the present application may be a device with wireless transceiver functions for communicating with a terminal device, or may be a device for connecting a terminal device to a wireless network.
  • the network device may be a node in a wireless access network, which may also be referred to as a base station, or may also be referred to as a radio access network (RAN) node (or device).
  • the network device may be an evolved Node B (eNB or eNodeB) in LTE; or a next generation node B (gNB) in a 5G network or a base station in a future evolved public land mobile network (PLMN), a broadband network service gateway (BNG), an aggregation switch or a non-third generation partnership project (3GPP) access device, etc.
  • the network devices in the embodiments of the present application may include various forms of base stations, such as: macro base stations, micro base stations (also called small stations), relay stations, access points, devices that implement base station functions in communication systems that evolve after 5G, access points (AP) in WiFi systems, transmission points (TRP), transmitting points (TP), mobile switching centers, and devices that perform base station functions in device-to-device (D2D), vehicle-to-everything (V2X), and machine-to-machine (M2M) communications, etc., and may also include centralized units (CU) and distributed units (DU) in cloud access network (C-RAN) systems, and network devices in non-terrestrial communication networks (NTN) communication systems, that is, they can be deployed on high-altitude platforms or satellites.
  • the embodiments of the present application do not make specific limitations on this.
  • the server mentioned in the embodiment of the present application can also be understood as an AI server (AI Function, AIF).
  • the core network device can include, for example, an access and mobility management function (access and mobility management function, AMF), a user plane function (user plane function, UPF) or a session management function (session management function, SMF), etc. It can be understood that the core network device can also be called a core network element. Among them, the core network element can be used to complete functions such as registration, connection, and session management.
  • the core network element mainly includes a network exposure function (network exposure function, NEF) network element, a policy control function (policy control function, PCF) network element, an application function (application function, AF) network element, an access and mobility management function (access and mobility management function, AMF) network element, a session management function module (session management function, SMF) network element, and a user plane function (user plane function, UPF) network element, etc.
  • a server is generally not considered as a core network element.
  • the server includes functions for implementing model processing and the like.
  • each device can implement multi-level joint learning.
  • the multi-level joint learning of Fig. 1A is described below with reference to Fig. 1B.
  • Figure 1B is a structural diagram of a multi-level learning architecture, which includes a terminal device, a first-level node, and a second-level node.
  • the first-level node is the network device shown in Figure 1A
  • the second-level node is the server/core network device shown in Figure 1A.
  • Figure 1B can be regarded as a round of joint learning process, and the process of Figure 1B is repeated until the stop condition is met.
  • the stop condition can be model convergence, training time reaches a preset time, or the number of training times reaches a preset number of times.
  • the first-level node and the second-level node can be understood as a federated learning layer.
  • the first-level node and the terminal device can be understood as a knowledge distillation learning layer.
  • the terminal device and the network device pre-store a public data set.
  • the public data set can also be obtained by interacting with the terminal device and the network device, which is not limited here.
  • FIG. 1B is only an exemplary description of the first-level node being a network device and the second-level node being a server/core network device.
  • the first-level node and the second-level node may also be other terminal devices, etc., in which case it can be regarded as multi-level joint learning of terminal devices.
  • the second-level node and the third-level node are not specifically limited here.
  • a round of joint learning shown in FIG1B includes steps 1 to 7, which are described below respectively.
  • Step 1 The server/core network device triggers the joint learning process, selects the network devices participating in the federated learning, and notifies each selected network device to participate in the joint learning process.
  • The network devices participating in the federated learning further select the terminal devices participating in the federated distillation and notify each selected terminal device to participate in the joint learning process.
  • the terminal device can be selected based on the capability information and/or business requirement information of the terminal device.
  • the capability information may include at least one of the following: computing capability, storage capability, etc.
  • the business requirement information includes at least one of the following: data distribution of each terminal device, reasoning tasks of each terminal device, etc.
  • Step 2 The terminal device participating in the joint learning process trains its local model on the local data set, uses the updated local model to infer the public data set to obtain the first result, and reports the first result to the network device.
  • Step 3 Based on the received first result, the network device trains the network device side model by using the public data set as the training set, the label of the training set as the hard label for calculating the loss function, and the first result as the soft label for calculating the loss function.
  • Step 4 The network device reports the trained network-side device model to the server/core network device.
  • Step 5 The server/core network device performs weighted averaging on the received network side device models to obtain the global model obtained in this round of learning, and sends the global model to each network device.
  • Step 6 The network device replaces the trained network-side device model with the global model, and uses the global model to process the public data set to obtain a second result, which is then sent to the terminal device.
  • Step 7 After receiving the second result, the terminal device uses the public data set as the training set, the label of the training set as the label for calculating the loss function, and the second result as the soft label for calculating the loss function to train the terminal side model. At this point, a round of joint learning is completed.
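  • For orientation only, the following toy Python sketch strings steps 2 to 7 together with the stop condition mentioned above; the classes, the vector-valued models, and the simplified distillation update are stand-ins assumed for illustration and are not interfaces defined by this application.

```python
import numpy as np

# Structural sketch of one round of the multi-level joint learning in FIG. 1B
# (steps 2-7) plus the stop condition. Models are plain parameter vectors and
# "distillation" is shown only schematically.

DIM = 4
PUBLIC_SET = np.ones((8, DIM))                      # stand-in public data set

class Node:
    def __init__(self):
        self.model = np.zeros(DIM)
    def infer(self, data):
        return data @ self.model                    # toy "inference result"
    def distill(self, data, soft_labels):
        # Toy update: nudge the model toward explaining the soft labels.
        self.model += 0.1 * (data.T @ soft_labels) / len(data)

def federated_average(models):
    return np.mean(models, axis=0)

terminals = {0: [Node(), Node()], 1: [Node()]}      # terminals per network device
network_devices = [Node(), Node()]

for round_idx in range(3):                          # stop condition: max round count
    for i, nd in enumerate(network_devices):
        # Steps 2-3: terminals infer the public set; the network device distills.
        first_results = np.mean([t.infer(PUBLIC_SET) for t in terminals[i]], axis=0)
        nd.distill(PUBLIC_SET, first_results)
    # Steps 4-5: the server fuses the network-side models into a global model.
    global_model = federated_average([nd.model for nd in network_devices])
    for i, nd in enumerate(network_devices):
        # Steps 6-7: infer the public set with the global model and push the
        # second result down so each terminal updates its own model.
        nd.model = global_model.copy()
        second_result = nd.infer(PUBLIC_SET)
        for t in terminals[i]:
            t.distill(PUBLIC_SET, second_result)
```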
  • knowledge distillation can also be performed through network devices to perform terminal-side model fusion, that is, the terminal device trains a local model based on local private data, and reports the trained local model to the network device; the network device trains the network device side model through knowledge distillation based on the public data set and each terminal side model, and further distills and updates each terminal side model, and sends the updated terminal side model to the terminal device for updating.
  • FIG 2 is a flow chart of a communication method provided in an embodiment of the present application.
  • the method may include steps 201 to 207. Steps 201 to 207 are described in detail below.
  • the first device in this embodiment is the first-level node/network device in Figures 1A and 1B above.
  • the second device is a terminal device, and the third device is the server/core network device in Figures 1A and 1B above.
  • the number of the first device, the second device, and the third device can be one or more, and is not specifically limited here.
  • Step 201 A first device obtains a first result.
  • step 201 includes step 2011 and step 2012 .
  • Step 2011 The second device infers the public data set based on the second device side model to obtain a first result.
  • the second device uses the local second device side model to infer the public data set to obtain the first result. Specifically, the second device inputs the data in the public data set into the second device side model to obtain the inference result (i.e., the first result).
  • the second device side model is trained by the second device using local data.
  • Step 2012 The second device sends the first result to the first device.
  • After acquiring the first result, the second device sends the first result to the first device. Correspondingly, the first device receives the first result sent by the second device.
  • step 201 includes step 2013 and step 2014.
  • Step 2013 The second device sends the second device side model to the first device.
  • the second device uses local data to train a model to obtain the second device side model. After the second device generates the second device side model, it sends the second device side model to the first device. Correspondingly, the first device receives the second device side model sent by the second device.
  • Step 2014 The first device infers the public data set based on the second device side model to obtain a first result.
  • the first device infers the public data set based on the second device-side model to obtain the first result.
  • the first result is obtained by inferring the public data set through the second device-side model.
  • the second device reports the first result to the first device.
  • the second device reports the second device side model to the first device, and the first device then uses the second device side model to infer the public data set to obtain the first result.
  • Step 202 The first device updates the first model based on the first result to obtain a second model.
  • the public data set is used as a training set
  • the label of the training set is used as a hard label of the loss function
  • the first result is used as a soft label of the loss function.
  • the first device uses the training set as input, and trains the first model to obtain the second model with the goal of reducing the value of the loss function.
  • the loss function is used to represent the first difference and the second difference.
  • the first difference is the difference between the output of the first model and the hard label
  • the second difference is the difference between the output of the first model and the soft label.
  • This step can also be understood as the first device performing knowledge distillation based on the public data set and the first result.
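  • A minimal sketch of such a distillation loss is shown below (the weighting factor alpha and the use of cross-entropy for both terms are illustrative assumptions): one term measures the difference between the model output and the hard label, the other the difference between the model output and the soft label.

```python
import numpy as np

# Minimal sketch of the loss used when updating a model on the public data
# set with both a hard label (training-set label) and a soft label (the
# result reported for the same sample). The weighting `alpha` is assumed.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(logits, hard_label, soft_label, alpha=0.5):
    p = softmax(logits)
    hard_term = -np.log(p[hard_label] + 1e-12)            # cross-entropy vs. hard label
    soft_term = -np.sum(soft_label * np.log(p + 1e-12))   # cross-entropy vs. soft label
    return alpha * hard_term + (1.0 - alpha) * soft_term

logits = np.array([2.0, 0.5, -1.0])           # output of the first model for one sample
soft   = softmax(np.array([1.5, 1.0, -0.5]))  # first result for the same sample
print(distill_loss(logits, hard_label=0, soft_label=soft))
```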
  • the model exchanged between the devices in the embodiments of the present application may refer to the entire model, or to the weights of the model (such as all weights, or incrementally updated weights, etc.), which is not specifically limited here.
  • the first model weight here can be all parameter weights of the first model, or can be parameter weights adjusted relative to the last learning, and the specifics are not limited here.
  • Step 203 The first device sends the second model to the third device.
  • the first device trains/updates the first model to obtain the second model
  • the first device sends the second model to the third device.
  • the third device receives the second model sent by the first device.
  • Step 204 The third device updates the third device side model based on the second model to obtain a third model.
  • the third device side model is updated based on the second model to obtain a third model.
  • the number of second models acquired by the third device depends on the number of first devices participating in the joint learning, that is, the third device can acquire the second models reported by multiple first devices.
  • After the third device acquires the second models reported by multiple first devices, it updates the third device side model based on the multiple second models to obtain the third model. For example, the parameters of the multiple second models and the third device side model are weighted averaged to obtain the third model.
  • This step can be understood as the third device collecting the latest models of multiple first devices and using the latest models of multiple first devices to obtain a global model (i.e., the third model).
  • the global model is obtained by weighted averaging the parameters of the latest models of multiple first devices and the model on the third device side.
  • Step 205 The third device sends the third model to the first device.
  • After acquiring the third model, the third device sends the third model to the first device. Correspondingly, the first device receives the third model sent by the third device.
  • This step can be understood as that the third device collects the second models reported by multiple first devices, uses the second model to update the last third device side model to obtain a global model, and sends the global model to each first device.
  • Step 206 The first device obtains a second result based on the public data set and the third model.
  • After the first device receives the third model sent by the third device, the first device obtains a second result based on the public data set and the third model.
  • Specifically, the first device uses the third model to infer the public data set to obtain the second result.
  • the above steps 205 and 206 can be replaced by: after obtaining the third model, the third device obtains the second result based on the public data set and the third model, and sends the second result to the first device.
  • Step 207 Update the second device-side model based on the second result.
  • After the first device obtains the second result, it sends the second result to the second device.
  • The second device updates the second device side model based on the second result and the public data set. For example, similar to the above, the public data set is used as the training set, the label of the training set is used as the hard label of the loss function, and the second result is used as the soft label of the loss function; the second device takes the training set as input and trains and updates the second device side model with the goal of reducing the value of the loss function.
  • Alternatively, if the first device stores the second device-side model (for example, the second device-side model is obtained through the second case of the aforementioned step 201), the first device can update the second device-side model based on the second result and send the updated second device-side model to the second device.
  • For example, similar to the above, the public data set is used as the training set, the label of the training set is used as the hard label of the loss function, and the second result is used as the soft label of the loss function; the first device takes the training set as input and trains and updates the second device-side model with the goal of reducing the value of the loss function.
  • the first device updates the local model according to the first result of the model on the downstream second device side, and sends the updated local model to the upstream third device.
  • the third model, which is obtained by the upstream third device based on the updated second model, is then received.
  • a second result for the public data set is then obtained based on the third model, and the second result is used to update the model on the downstream second device side.
  • FIG3 is another communication method provided by an embodiment of the present application.
  • FIG3 can be understood as describing the first device as a base station, the second device as a terminal device, and the third device as an AI server as an example.
  • the method includes steps 301 to 308. They are described below.
  • Step 301 The AI server sends first trigger information to the base station.
  • the AI server sends first trigger information to the base station, and correspondingly, the base station receives the first trigger information sent by the AI server.
  • the first trigger information is used to notify the base station to perform joint learning.
  • the AI server may broadcast the first trigger information.
  • the first trigger information may include at least one of the following: a joint learning identifier, a learning area (for example, represented by a cell identifier or a dedicated learning area identifier), etc.
  • the joint learning identifier is used to indicate a specific AI task.
  • the learning area is used to indicate an area participating in the joint learning, and base stations within the area can participate in the joint learning.
  • Step 302 The base station sends second trigger information to the terminal device.
  • After receiving the first trigger information, the base station sends the second trigger information to the terminal device.
  • the terminal device receives the second trigger information sent by the base station.
  • the second trigger information is used to instruct the terminal device to perform joint learning.
  • the base station broadcasts the second trigger information, where the second trigger information includes at least one of the following: a joint learning identifier, a terminal ID, a feedback time-frequency resource location, etc.
  • the terminal ID is used to indicate the terminal device participating in the joint learning
  • the feedback time-frequency resource location is used to indicate the physical resources used by the terminal to feedback the training completion information.
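  • Purely as an illustration of the fields listed above (the concrete encoding is an assumption, not part of this application), the two trigger messages of steps 301 and 302 can be modelled as simple data structures:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch of the trigger information as plain data structures. Only the fields
# named in the text are used; the encoding itself is an assumption.

@dataclass
class FirstTriggerInfo:                  # AI server -> base station
    joint_learning_id: int               # identifies the specific AI task
    learning_area: Optional[str] = None  # e.g. a cell ID or dedicated area ID

@dataclass
class SecondTriggerInfo:                 # base station -> terminal devices
    joint_learning_id: int
    terminal_ids: List[int] = field(default_factory=list)  # terminals selected to participate
    feedback_resource: Optional[str] = None  # time-frequency resource for completion feedback

msg = SecondTriggerInfo(joint_learning_id=7, terminal_ids=[12, 35], feedback_resource="slot3/prb10")
print(msg)
```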
  • Step 303 The terminal device sends training completion information to the base station.
  • After receiving the second trigger information, the terminal device uses local data to train the model, and sends training completion information to the base station after the training is completed. Correspondingly, the base station receives the training completion information sent by the terminal device.
  • the training completion information is used to indicate that the terminal device has completed the training of the model.
  • Step 304 The base station sends configuration information to the terminal device.
  • the base station sends configuration information to the terminal device.
  • the terminal device receives the configuration information sent by the base station.
  • the configuration information is used to indicate the physical resources and/or transmission parameters used by the terminal device to report the first result.
  • the physical resources may be the aforementioned feedback time-frequency resource location, etc.
  • the transmission parameters may include a coding rate and/or a modulation order, etc.
  • Step 305 The terminal device sends the first result to the base station.
  • the terminal device uses the locally trained model to infer the public data set to obtain the first result.
  • the first result is sent to the base station.
  • the base station receives the first result sent by the terminal device.
  • Step 306 The base station sends the first model to the AI server.
  • After the base station obtains the first result, it can train the local model based on the first result and the public data set to obtain the first model, and send the first model to the AI server.
  • the AI server receives the first model sent by the base station.
  • Step 307 The AI server sends the second model to the base station.
  • After receiving the first model sent by the base station, the AI server uses the first model to update the local model of the AI server to obtain a second model, and sends the second model to the base station.
  • the base station receives the second model sent by the AI server.
  • the AI server merges and updates the first models reported by multiple base stations to obtain a second model, and broadcasts the second model to each base station.
  • Step 308 The base station sends the second result to the terminal device.
  • After receiving the second model sent by the AI server, the base station uses the second model to infer the public data set to obtain a second result, and sends the second result to the terminal device. Correspondingly, the terminal device receives the second result sent by the base station. The second result is used to update the model on the terminal device side.
  • FIG3 only describes the interaction between the devices by taking the first device as a base station, the second device as a terminal device, and the third device as an AI server as an example.
  • For the specific process, reference can be made to the description of the embodiments shown in FIG. 1B and FIG. 2, which is not repeated here; a minimal code sketch of the base-station side of this exchange follows.
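The following is a minimal, non-normative sketch of the base-station side of the FIG. 3 exchange (steps 305 to 308), written in Python with PyTorch (assumed available). The model architecture, the KL-divergence distillation loss, the simple averaging of the terminals' first results, and all names (TinyNet, distill_step) are assumptions made purely for illustration; the method itself does not prescribe any of them.

```python
# Illustrative only: the concrete model, loss and fusion rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Stand-in for the base-station local model."""
    def __init__(self, dim_in=16, dim_out=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(dim_in, 32), nn.ReLU(), nn.Linear(32, dim_out))
    def forward(self, x):
        return self.fc(x)

def distill_step(model, public_x, teacher_logits, lr=1e-3, temperature=2.0):
    """One update of the local model using the public data set and the
    first results (soft labels) reported by the terminal devices."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    student_log_p = F.log_softmax(model(public_x) / temperature, dim=-1)
    teacher_p = F.softmax(teacher_logits / temperature, dim=-1)
    loss = F.kl_div(student_log_p, teacher_p, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Public data set shared by the participating devices (synchronized as in FIG. 4).
public_x = torch.randn(64, 16)

# Step 305: first results reported by two terminals, fused here by simple averaging.
ue_logits = [torch.randn(64, 4), torch.randn(64, 4)]   # placeholders for UE inferences
first_result = torch.stack(ue_logits).mean(dim=0)

# Step 306: update the base-station local model; the result is reported upward.
bs_model = TinyNet()
distill_step(bs_model, public_x, first_result)

# Steps 307/308: after receiving the fused model from the AI server, infer the
# public data set again to obtain the second result for the terminals
# (assuming the returned model replaces bs_model).
with torch.no_grad():
    second_result = bs_model(public_x)
```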
  • the first device and the second device participating in the joint learning can synchronize the public data set. This ensures that the prediction results of the interaction (e.g., the first result, the second result) correspond to the same public data set.
  • the synchronization process of the above public data set can be shown in Figure 4, and the interaction process includes steps 401 and 402.
  • Step 401 The first device sends indication information to the second device.
  • the first device sends indication information to the second device.
  • the second device receives the indication information sent by the first device.
  • the indication information is used for synchronizing a common data set between the first device and the second device.
  • the operation corresponding to the synchronization includes at least one of the following: adding, deleting, modifying, etc.
  • an index may be configured for each sample in the public data set, and the sample may be indicated by the index.
  • the operations corresponding to the above synchronization include: adding a sample, deleting a sample, modifying a sample, and the like.
  • the indication information may be carried by at least one of the following: radio resource control (RRC) signaling, downlink control information (DCI), medium access control control element (MAC CE) in the media access control information, etc.
  • Step 402 The second device sends confirmation information to the first device.
  • After receiving the indication information, the second device performs operations such as adding, deleting, and modifying on the public data set, and sends confirmation information to the first device, where the confirmation information is used to synchronize the public data set.
  • the confirmation information may include the latest public data set, an incrementally updated public data set, an index of a sample to be modified and the modified content, an index of a sample to be deleted, etc., which are not specifically limited here.
  • In this way, the first device and the second device participating in the joint learning can synchronize the public data set, thereby ensuring that the prediction results of the interaction (e.g., the first result and the second result) correspond to the same public data set; a minimal sketch of this synchronization follows.
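The following is a minimal sketch of the FIG. 4 synchronization, assuming each sample in the public data set is addressed by an integer index; the message field names (op, index, sample) and the Python data structure are illustrative assumptions only.

```python
# Illustrative only: field names and the confirmation format are assumptions.
from dataclasses import dataclass, field

@dataclass
class PublicDataset:
    samples: dict = field(default_factory=dict)   # index -> sample

    def apply(self, indication):
        """Apply one synchronization operation carried by the indication
        information (e.g. in RRC signalling, DCI or a MAC CE)."""
        op = indication["op"]
        if op == "add" or op == "modify":
            self.samples[indication["index"]] = indication["sample"]
        elif op == "delete":
            self.samples.pop(indication["index"], None)
        return {"ack": True, "indices": sorted(self.samples)}  # confirmation info

# Second-device side: apply the first device's indication and return confirmation.
ds = PublicDataset({0: [0.1, 0.2], 1: [0.3, 0.4]})
confirmation = ds.apply({"op": "add", "index": 2, "sample": [0.5, 0.6]})
```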
  • Figure 5 is another flow chart of the communication method provided in an embodiment of the present application.
  • the method may include steps 501 to 508. Steps 501 to 508 are described in detail below.
  • the first device and the second device in this embodiment may be the terminal device and the network device in the embodiments shown in Figures 1A to 4, or may be the network device and the server/core network device, which are not specifically limited here. That is, the embodiment shown in Figure 5 can be combined with the embodiments shown in Figures 1A to 4.
  • Step 501 The second device obtains the first model.
  • the second device obtains a first model, which is obtained based on the first information of the second device and the second model.
  • the first model is a part of the second model.
  • the first information includes capability information and/or business requirement information.
  • the first model and the second model have the same number of network layers, and the subnetworks of each network layer in the first model are part of the subnetworks of each network layer in the second model.
  • the second model includes N layers of first networks, at least one first network in the N layers of first networks includes more than two parallel subnetworks, the first model includes N layers of second networks, the first number is less than the second number, the first number is the number of subnetworks included in at least one second network in the N layers of second networks, the second number is the number of subnetworks included in the first network corresponding to at least one second network in the N layers of first networks, and N is a positive integer.
  • the second model is shown in FIG6 .
  • the second model includes n layers of first networks, where n is an integer greater than 2.
  • the n layers of first networks include: a first layer of first network NN1, a second layer of first network NN2, ..., and an nth layer of first network NNn.
  • the first layer of first network NN1 includes K subnetworks: subnetwork NN1-1, subnetwork NN1-2, ..., and subnetwork NN1-K.
  • the second layer of first network NN2 includes L subnetworks: subnetwork NN2-1, subnetwork NN2-2, ..., and subnetwork NN2-L.
  • the nth layer of first network NNn includes M subnetworks: subnetwork NNn-1, subnetwork NNn-2, ..., and subnetwork NNn-M. Among them, K, L, and M are positive integers.
  • the first model is shown in FIG7A , and the first model includes n layers of second networks.
  • the n layers of second networks include: a first layer of second network NN1, a second layer of second network NN2, ..., and an nth layer of second network NNn.
  • the first layer second network NN1 includes K-P subnetworks: subnetwork NN1-1, ..., subnetwork NN1-(K-P).
  • the second layer second network NN2 includes L subnetworks: subnetwork NN2-1, subnetwork NN2-2, ..., subnetwork NN2-L.
  • the nth layer second network NNn includes M subnetworks: subnetwork NNn-1, subnetwork NNn-2, ..., subnetwork NNn-M.
  • P is a positive integer greater than 0 and less than K. It can be understood that Figures 6 and 7A are just examples of the second model and the first model, used to describe a situation where the first model is part of the second model.
  • the first model and the second model have different numbers of network layers, and some of the network layers in all network layers in the second model are the first model.
  • the first model includes N layers of the first network
  • the second model includes M layers of the first network, where N and M are positive integers, and M is less than or equal to N.
  • the second model is continued as an example in Figure 6.
  • the first model is shown in Figure 7B, and the first model includes n-m layers of second networks.
  • the n-m layers of second networks include: a first layer second network NN1, ..., and an (n-m)th layer second network NN(n-m).
  • the first layer second network NN1 includes K subnetworks: subnetwork NN1-1, subnetwork NN1-2,..., subnetwork NN1-K.
  • the (n-m)th layer second network NN(n-m) includes Q subnetworks: subnetwork NN(n-m)-1, subnetwork NN(n-m)-2, ..., and subnetwork NN(n-m)-Q.
  • Q and m are positive integers. It can be understood that Figures 6 and 7B are just examples of the second model and the first model, used to describe another situation where the first model is part of the second model; an illustrative data-structure sketch of these two cases follows.
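The following snippet shows, as plain Python data structures, how a first model can be a part of the second model in the two cases of FIG. 7A and FIG. 7B. The concrete numbers (n = 3, K = L = M = 3, P = 1, m = 1) are assumptions chosen only for this example.

```python
# Illustrative only: the second model as n layers of parallel subnetworks.
second_model = {
    "NN1": ["NN1-1", "NN1-2", "NN1-3"],   # first-layer first network, K subnetworks
    "NN2": ["NN2-1", "NN2-2", "NN2-3"],   # second-layer first network, L subnetworks
    "NN3": ["NN3-1", "NN3-2", "NN3-3"],   # n-th-layer first network, M subnetworks
}

# Case of FIG. 7A: same number of layers, fewer subnetworks in at least one layer.
P = 1
first_model_a = {layer: subs[: len(subs) - P] if layer == "NN1" else subs[:]
                 for layer, subs in second_model.items()}

# Case of FIG. 7B: fewer layers (the first n-m layers), subnetworks unchanged.
m = 1
first_model_b = {layer: subs[:] for layer, subs in list(second_model.items())[:-m]}
```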
  • the capability information of the second device is used to determine the number of subnetworks in the first model.
  • the business requirement information is used to determine each subnetwork.
  • the capability information may include at least one of the following: computing capability, storage capability, etc.
  • the business requirement information includes at least one of the following: data distribution of each second device, reasoning task of each second device, etc.
  • step 501 includes step 5011 and step 5012 .
  • Step 5011 The first device sends the second model to the second device.
  • the first device sends the second model maintained on the first device side to the second device.
  • the second device receives the second model sent by the first device.
  • the second model can also be understood as a large model with multiple experts (i.e., the first network/the second network).
  • Step 5012 The second device determines the first model from the second model based on the first information.
  • the first model may be determined from the second model based on the first information.
  • the number of sub-networks in the first model can be determined by the capability information, and the service requirement information can be used to determine each sub-network, thereby determining which sub-networks are selected from the second model as the first model.
  • Determining the first model from the second model can be understood as determining one or more paths in the second model as the first model.
  • the path is used to represent the subnetwork selected from the second model.
  • the second model may include a path selection module and an expert selection module matching the first network.
  • the expert selection module is used to determine the subnetworks in the first network.
  • the path selection module is used to determine the number of subnetworks selected in each layer of the first network in the second model.
  • the input information of the path selection module includes at least one of the following: terminal ID, terminal capability information, terminal business demand information, input sample, number of paths, etc.
  • the output of the path selection module is a vector of the same dimension as the input sample or the average value of all samples.
  • the expert selection module receives information sent by the path selection module (for example, input sample, number of paths, etc.) and outputs the weight of each subnetwork.
  • the number of expert selection modules can be one or more.
  • each layer of the first network can correspond to an expert selection module for determining the subnetworks in each first network layer.
  • the number of subnetworks selected by the expert selection module from each layer of the first network can be one or more, which is not specifically limited here. If the expert selection module determines a sub-network from each layer of the first network, the selected sub-networks of each layer of the first network can be connected and regarded as a path.
  • The above-mentioned path selection module may not participate in the reasoning of the second model. That is, the second model is a model in which all paths are deployed. During specific reasoning, the expert selection module determines one of the paths to process the sample based on the input sample, thereby determining the weights of the subnetworks on the path.
  • The following describes the above process of determining the first model from the second model, taking the second model shown in FIG. 8A (i.e., n, K, L, and M in FIG. 6 are all 3) and the first model shown in FIG. 8B as an example.
  • the second model includes 3 layers of first networks. They are: the first layer first network NN1, the second layer first network NN2, and the third layer first network NN3.
  • the first layer first network NN1 includes: subnetwork NN1-1, subnetwork NN1-2, and subnetwork NN1-3.
  • the second layer first network NN2 includes: subnetwork NN2-1, subnetwork NN2-2, and subnetwork NN2-3.
  • the third layer first network NN3 includes: subnetwork NN3-1, subnetwork NN3-2, and subnetwork NN3-3.
  • multiple first networks can be matched with one expert selection module.
  • each layer of the first network can also be matched with one expert selection module.
  • expert selection module 1 is used to determine the subnetwork selected in the first layer first network.
  • Expert selection module 2 is used to determine the subnetwork selected in the second layer first network.
  • the expert selection module 3 is used to determine the subnetwork selected in the third layer of the first network.
  • the expert selection module outputs the weights of each subnetwork
  • the subnetwork with a larger weight can be selected in each layer of the first network as the subnetwork of the first model.
  • the weight of the subnetwork NN1-1 in the first layer of the first network NN1 is 0.1
  • the weight of the subnetwork NN1-2 is 0.7
  • the weight of the subnetwork NN1-3 is 0.1.
  • the weight of the subnetwork NN2-1 in the second layer of the first network NN2 is 0.05, the weight of the subnetwork NN2-2 is 0.6, and the weight of the subnetwork NN2-3 is 0.1.
  • the weight of the subnetwork NN3-1 in the third layer of the first network NN3 is 0.8, the weight of the subnetwork NN3-2 is 0.02, and the weight of the subnetwork NN3-3 is 0.08. It can be seen that the weight of the subnetwork NN1-2 is 0.7, which is the largest weight in the first layer of the first network NN1.
  • the weight of sub-network NN2-2 is 0.6, which is the largest weight in the first network NN2 of the second layer.
  • the weight of sub-network NN3-1 is 0.8, which is the largest weight in the first network NN3 of the third layer.
  • a path is determined: sub-network NN1-2, sub-network NN2-2, sub-network NN3-1.
  • the first model determined based on the above process is shown in FIG8B.
  • FIG. 8A and FIG. 8B are only examples. In practical applications, the second model and the first model may have other forms, and the first model may include multiple paths, which are not specifically limited here; a short sketch of this weight-based path selection follows.
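The following sketch reproduces the weight-based selection described above, using the example weights of FIG. 8A to obtain the path of FIG. 8B. In a real system the weights would be produced by the expert selection modules from the input sample, terminal ID, capability and business-requirement information, not hard-coded as here.

```python
# Illustrative only: the weights are the example values from FIG. 8A.
expert_weights = {
    "NN1": {"NN1-1": 0.1,  "NN1-2": 0.7,  "NN1-3": 0.1},
    "NN2": {"NN2-1": 0.05, "NN2-2": 0.6,  "NN2-3": 0.1},
    "NN3": {"NN3-1": 0.8,  "NN3-2": 0.02, "NN3-3": 0.08},
}

def select_path(weights, subnets_per_layer=1):
    """Keep the highest-weight subnetwork(s) of each layer as the first model."""
    path = {}
    for layer, w in weights.items():
        ranked = sorted(w, key=w.get, reverse=True)
        path[layer] = ranked[:subnets_per_layer]
    return path

first_model_path = select_path(expert_weights)
# {'NN1': ['NN1-2'], 'NN2': ['NN2-2'], 'NN3': ['NN3-1']}, i.e. the path of FIG. 8B
```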
  • FIG8B above is the first case where the aforementioned first model is part of the second model.
  • the network may also divide the model into two parts, and accordingly, the selected path is also divided into two parts.
  • For the segmented model, one part is deployed on the second device (such as a terminal), and the other part is deployed on the first device (such as a base station), for example, an encoder and a decoder for channel information feedback.
  • the first model may be a part of the encoder or decoder in the second model.
  • For each terminal, the base station needs to maintain path information so that, when joint reasoning is performed, the corresponding path, that is, the decoder paired with the encoder, can be selected.
  • For segmented neural networks, split learning can be used, that is, the terminal and the base station implement model training by exchanging intermediate features and gradients; a minimal sketch of this exchange follows.
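The following is a minimal sketch of that split-learning exchange, assuming a linear encoder on the terminal and a linear decoder on the base station trained with a mean-squared-error reconstruction objective; the dimensions and the loss are illustrative assumptions, not part of the method.

```python
# Illustrative only: encoder on the terminal, decoder on the base station,
# training by exchanging intermediate features (uplink) and gradients (downlink).
import torch
import torch.nn as nn

encoder = nn.Linear(32, 8)     # terminal side (e.g. channel-information compression)
decoder = nn.Linear(8, 32)     # base-station side, paired with this encoder's path
opt_enc = torch.optim.SGD(encoder.parameters(), lr=1e-2)
opt_dec = torch.optim.SGD(decoder.parameters(), lr=1e-2)

csi = torch.randn(16, 32)                 # channel information at the terminal

# Terminal: forward pass, send the intermediate feature to the base station.
feature = encoder(csi)
feature_tx = feature.detach().requires_grad_(True)   # stands in for the air interface

# Base station: finish the forward pass, compute the loss, back-propagate, and
# return the gradient of the intermediate feature to the terminal.
recon = decoder(feature_tx)
loss = nn.functional.mse_loss(recon, csi)
opt_dec.zero_grad(); loss.backward(); opt_dec.step()
grad_feature = feature_tx.grad                        # sent back to the terminal

# Terminal: continue back-propagation with the received gradient.
opt_enc.zero_grad(); feature.backward(grad_feature); opt_enc.step()
```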
  • step 501 includes steps 5013 to 5015.
  • Step 5013 The second device sends the first information to the first device.
  • the second device sends the first information to the first device.
  • the first device receives the first information sent by the second device.
  • the description of the first information can refer to the above, and will not be repeated here.
  • Step 5014 The first device determines the first model from the second model based on the first information.
  • This step is similar to the aforementioned step 5012 and will not be repeated here.
  • Step 5015 The first device sends the first model to the second device.
  • After the first device determines the first model from the second model, it sends the first model to the second device.
  • the second device receives the first model sent by the first device.
  • In the first situation, the first device sends the second model to the second device, and the second device then determines the first model from the second model based on the first information. In the second situation, the second device reports the first information to the first device, and the first device then determines the first model from the second model based on the first information and sends the first model to the second device.
  • Step 502 The first device sends a first parameter to the second device. This step is optional.
  • the first device may also send a first parameter to the second device. Accordingly, the second device receives the first parameter sent by the first device.
  • the first parameter is used to indicate the adjustment of the sub-network.
  • the first parameter may include at least one of the following: addition or deletion of a subnetwork of the first model.
  • Step 503 The second device updates the first model based on the first parameter. This step is optional.
  • the second device updates the first model based on the first parameter.
  • the second device adds, deletes, modifies, and other operations on the subnetwork in the first model using the received first parameter.
  • The adjustment process in step 502 and step 503 is triggered by the first device.
  • the adjustment process of the sub-network may also be triggered by the second device, for example, when a new terminal is added to the joint learning, etc., which is not specifically limited here.
  • Step 504 The second device trains the first model based on the local data to obtain a third model. This step is optional.
  • After the second device acquires the first model, it trains the first model based on local data to obtain a third model.
  • This process can also be understood as the second device fine-tuning the first model so that the fine-tuned third model can better meet the reasoning of the data at the second device.
  • Step 505 The second device sends the third model to the first device. This step is optional.
  • After the second device fine-tunes the first model to obtain the third model, the second device sends the third model to the first device.
  • the first device receives the third model sent by the second device.
  • Step 506 The first device updates the second model based on the third model to obtain a fourth model. This step is optional.
  • the first device uses the third model to update the local second model to obtain a fourth model.
  • This step can also be understood as the base station adjusting the second model on the base station side, based on the latest models reported by the downstream terminals, to obtain the fourth model. For example, the same subnetworks of the models at each terminal (i.e., each first model) are merged to obtain the fourth model.
  • Step 507 The first device sends the fourth model to the second device. This step is optional.
  • the first device may send the fourth model to the second device.
  • the second device receives the fourth model sent by the first device.
  • This step can be understood as the base station fusing the same subnetworks of the models at each terminal to obtain a fourth model, and sending the fourth model to the second device so that the second device updates its local third model according to the fourth model.
  • Step 508 The second device updates the third model based on the fourth model. This step is optional.
  • the second device uses the fourth model to update the local third model.
  • step 501, step 504 to step 508 of this embodiment can be understood as joint learning.
  • the network device side trains a multi-expert large model (i.e., the second model) based on existing data.
  • the terminal device or network device determines, based on the capability information/business requirement information, the sub-model (i.e., the first model) defined by one or more expert paths.
  • Each terminal device obtains a trained sub-model (i.e., the third model) based on local data training and reports it.
  • the network device fuses the same parts of the sub-models reported by each terminal device, and sends the fused model (i.e., the fourth model) to the terminal device, so that the terminal device updates the local model.
  • the network device or terminal device needs to maintain the terminal ID and the path information of the sub-model, which are used on the network device side to average the corresponding modules during model fusion, as sketched after this summary.
  • each terminal device infers based on the sub-model, and the network device infers based on the large model.
  • the terminal device can further perform knowledge distillation based on the sub-model to obtain a smaller model or a model more suitable for local hardware.
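The following sketch illustrates the fusion of the reported sub-models (step 506) under stated assumptions: each terminal reports its fine-tuned sub-model as a mapping from subnetwork name to a parameter vector, and the maintained path information tells the network device which subnetworks are shared. Real models would carry full parameter tensors; the simple averaging rule is one possible fusion choice.

```python
# Illustrative only: sub-models as name -> parameter-vector dictionaries.
import numpy as np

reports = {
    "UE1": {"NN1-2": np.array([1.0, 1.0]), "NN2-2": np.array([2.0, 2.0])},
    "UE2": {"NN1-2": np.array([3.0, 3.0]), "NN3-1": np.array([4.0, 4.0])},
}

def fuse(reports):
    """Average each subnetwork over the terminals whose path contains it."""
    fused, counts = {}, {}
    for sub_model in reports.values():
        for name, params in sub_model.items():
            fused[name] = fused.get(name, 0) + params
            counts[name] = counts.get(name, 0) + 1
    return {name: fused[name] / counts[name] for name in fused}

fourth_model_update = fuse(reports)
# NN1-2 is averaged over UE1 and UE2; NN2-2 and NN3-1 are kept as reported.
```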
  • the communication method provided in this embodiment includes step 501.
  • the communication method can determine the first model from the second model according to the capability information and/or service requirement information of the terminal. And the data of the terminal are all inferred using this path (i.e., the subnetwork in the first model).
  • the communication method provided in this embodiment includes steps 501 to 503.
  • the communication method can determine the first model from the second model according to the capability information and/or service requirement information of the terminal. And the data of the terminal are all inferred using this path (i.e., the subnetwork in the first model). In addition, timely adjustment of the subnetwork in the first model can also be achieved.
  • the communication method provided in this embodiment includes step 501, step 504 to step 508.
  • The communication method can determine the first model from the second model based on the capability information and/or business requirement information of the terminal, and the data of the terminal are all inferred using this path (i.e., the subnetworks in the first model). In addition, it can be applied to the joint learning scenario.
  • the communication method provided in this embodiment includes steps 501 to 508.
  • Figure 9 shows another flow chart of the communication method provided in an embodiment of the present application. The first device and the second device in this embodiment may be the terminal device and the network device in the embodiments shown in Figures 1A to 4, or may be the network device and the server/core network device, which are not specifically limited here. That is, the embodiment shown in Figure 9 can be combined with the embodiments shown in Figures 1A to 8B.
  • Step 901 The second device sends first information to the first device.
  • the second device sends the first information to the first device.
  • the first device receives the first information sent by the second device.
  • the first information includes capability information and/or service requirement information of the second device.
  • the capability information may include at least one of the following: computing capability, storage capability, etc.
  • the service requirement information may include at least one of the following: data distribution of the second device, reasoning tasks of the second device, etc.
  • Step 902 The first device sends indication information to the second device.
  • After the first device obtains the first information of the second device, it determines, based on the first information, whether the learning mode of the model is a federated learning mode or a distillation learning mode. After determining the learning mode, it sends indication information to the second device. Correspondingly, the second device receives the indication information sent by the first device. The indication information is used to indicate that the learning mode is a federated learning mode or a distillation learning mode.
  • For example, the learning mode of the second device may be determined to be a distillation learning mode (also called a knowledge distillation mode).
  • If a large model is applied for a second device with strong computing power, the learning mode of the second device may be determined to be a federated learning mode.
  • Step 903 The second device sends second information based on the indication information.
  • the second device receives the indication information, determines the learning mode according to the indication information, and determines the second information to be sent to the first device based on the indication information.
  • the second information is used to update the model on the first device side.
  • If the learning mode is the distillation learning mode, the second information is a first result obtained by inferring a public data set using a model on the second device side.
  • If the learning mode is the federated learning mode, the second information is the weight or gradient of the second device-side model.
  • In this embodiment, the first device determines a learning mode that matches the second device through the capability information/business requirement information of the second device, so that the model training scenario can be flexibly adapted and the model training efficiency can be improved; a hypothetical decision rule is sketched below.
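The following is a hypothetical decision rule for the learning-mode selection in step 902. The thresholds, field names, and the specific criterion (whether the device can train and report the full model) are assumptions used only to make the idea concrete.

```python
# Illustrative only: thresholds and field names are assumptions.
def choose_learning_mode(first_info, model_size_mb=50, margin=2.0):
    """Return 'federated' if the device can plausibly train and report the
    full model, otherwise fall back to 'distillation' (only inference
    results on the public data set are exchanged)."""
    compute_ok = first_info.get("compute_gflops", 0) >= 10
    storage_ok = first_info.get("storage_mb", 0) >= margin * model_size_mb
    return "federated" if (compute_ok and storage_ok) else "distillation"

mode_ue1 = choose_learning_mode({"compute_gflops": 50, "storage_mb": 512})  # 'federated'
mode_ue2 = choose_learning_mode({"compute_gflops": 2,  "storage_mb": 64})   # 'distillation'
```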
  • The first is the distillation learning mode (also called the knowledge distillation mode), illustrated by the example of FIG. 10.
  • the communication process includes steps 1001 to 1006 .
  • Step 1001 UE1 and UE2 send first information to a base station.
  • UE1 and UE2 send first information to the base station, and correspondingly, the base station receives the first information sent by UE1 and UE2.
  • the first information of UE1 includes capability information and/or service requirement information of UE1.
  • the first information of UE2 includes capability information and/or service requirement information of UE2.
  • the capability information may include at least one of the following: computing capability, storage capability, etc.
  • the business requirement information may include at least one of the following: data distribution of the second device, reasoning tasks of the second device, etc.
  • Step 1002 The base station sends indication information (knowledge distillation) to UE1 and UE2.
  • the base station determines that the learning mode is the knowledge distillation mode based on the first information of UE1 and UE2, and sends indication information to UE1 and UE2, where the indication information is used to indicate that the learning mode is the knowledge distillation mode.
  • Step 1003 UE1 and UE2 perform local training.
  • After UE1 and UE2 determine that the learning mode is the knowledge distillation mode, they train the model using their respective local data, and obtain the inference result (i.e., the first result) based on the trained model and the public data set.
  • Step 1004 UE1 and UE2 send the first result to the base station.
  • After obtaining the first result, UE1 and UE2 report the first result to the base station.
  • Step 1005 The base station performs model training.
  • After receiving the first result sent by UE1 and UE2, the base station uses the first result to update the base station side model, and obtains the inference result (i.e., the second result) based on the updated model and the public data set.
  • Step 1006 The base station sends the second result to UE1 and UE2.
  • After obtaining the second result, the base station sends the second result to UE1 and UE2.
  • the second result is used for UE1 and UE2 to update their respective local models.
  • the process of the embodiment shown in FIG. 10 is similar to the process between the first-level node and the terminal device in FIG. 1B , and similar descriptions are not repeated here.
  • The second is the federated learning mode.
  • the communication process includes steps 1101 to 1106 .
  • Step 1101 UE1 and UE2 send first information to a base station.
  • UE1 and UE2 send first information to the base station, and correspondingly, the base station receives the first information sent by UE1 and UE2.
  • the first information of UE1 includes capability information and/or service requirement information of UE1.
  • the first information of UE2 includes capability information and/or service requirement information of UE2.
  • the capability information may include at least one of the following: computing capability, storage capability, etc.
  • the business requirement information may include at least one of the following: data distribution of the second device, reasoning tasks of the second device, etc.
  • Step 1102 The base station sends indication information (federated learning) to UE1 and UE2.
  • the base station determines that the learning mode is the federated learning mode based on the first information of UE1 and UE2, and sends indication information to UE1 and UE2, where the indication information is used to indicate that the learning mode is the federated learning mode.
  • Step 1103 UE1 and UE2 perform local training.
  • After UE1 and UE2 determine that the learning mode is the federated learning mode, they train the model using their respective local data and obtain the weights/gradients of the trained model.
  • Step 1104 UE1 and UE2 send weights/gradients to the base station.
  • After obtaining the weights/gradients, UE1 and UE2 report the weights/gradients to the base station.
  • Step 1105 The base station performs model training.
  • After receiving the weights/gradients sent by UE1 and UE2, the base station uses the weights/gradients to update the base station side model, and obtains the weights/gradients of the updated model (which may be referred to as the updated weights/gradients).
  • Step 1106 The base station sends updated weights/gradients to UE1 and UE2.
  • After obtaining the updated weights/gradients, the base station sends the updated weights/gradients to UE1 and UE2.
  • the updated weights/gradients are used by UE1 and UE2 to update their respective local models.
  • In this embodiment, the base station determines the learning mode that matches the UE through the capability information/service requirement information of the UE, so that the model training scenario can be flexibly adapted and the model training efficiency can be improved; a minimal federated-averaging sketch follows.
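The following is a minimal federated-averaging sketch of steps 1104 to 1106, assuming the UEs report full weight vectors together with their local sample counts. The data-weighted average is one common fusion choice, not a requirement of the method, and all numbers are illustrative.

```python
# Illustrative only: reported weights and sample counts are placeholders.
import numpy as np

ue_updates = {
    "UE1": {"weights": np.array([0.2, 0.4, 0.6]), "num_samples": 300},
    "UE2": {"weights": np.array([0.8, 0.2, 0.0]), "num_samples": 100},
}

def federated_average(updates):
    """Sample-count-weighted average of the reported weights (federated averaging)."""
    total = sum(u["num_samples"] for u in updates.values())
    return sum(u["weights"] * (u["num_samples"] / total) for u in updates.values())

updated_weights = federated_average(ue_updates)   # sent back to UE1 and UE2
# array([0.35, 0.35, 0.45])
```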
  • the embodiment of the present application also provides a communication process between terminals.
  • the scenario is shown in FIG12 , which takes one base station and four UEs as an example.
  • the base station does not store a public data set and only forwards data.
  • UEs with strong capabilities can be responsible for large model fusion and maintenance.
  • Each UE trains a local model based on local data.
  • Directly connected UEs transmit models through D2D, and the fusion node fuses the model based on local data, including distilling it into small models and sending it to other nodes.
  • Non-directly connected UEs forward the model through the base station, and the fusion node fuses the model based on local data and sends it to the base station.
  • the base station can serve as a fusion model storage node and a fusion terminal scheduling node.
  • the ID of the model fusion terminal is used as the destination identifier for model transmission.
  • a UE with stronger capabilities can be responsible for large model integration and maintenance, so as to realize model transmission through D2D or base station transfer and realize the model training process.
  • an embodiment of the first device in the embodiment of the present application includes:
  • An acquiring unit 1301 is used to acquire a first result, where the first result is a result obtained by inferring a public data set using a model on the second device side;
  • An updating unit 1302 configured to update a first model based on the public data set and the first result to obtain a second model, where the first model is a local model of the first device;
  • the sending unit 1303 is configured to send the second model to a third device
  • the receiving unit 1304 is configured to receive a third model sent by a third device, where the third model is obtained by processing the second model;
  • the acquisition unit 1301 is further used to acquire a second result based on the third model and the public data set, and the second result is used to update the second device side model.
  • the acquisition unit 1301 is specifically used to receive a first result from a second device, where the first result is a result obtained by the second device using a second device side model to infer a public data set.
  • the acquisition unit 1301 is specifically used to receive a second device-side model from a second device; the acquisition unit 1301 is specifically used to use the second device-side model to infer the public data set to obtain the first result.
  • the sending unit 1303 is further used to send a second result to the second device, and the second result is used by the second device to update the second device side model.
  • the updating unit 1302 is further used to update the second device side model based on the second result; the sending unit 1303 is further used to send the updated second device side model to the second device.
  • the sending unit 1303 is also used to send an indication message to the second device, and the indication message is used to synchronize a common data set between the first device and the second device, and the corresponding operation of synchronization includes at least one of the following: adding, deleting, and modifying; the receiving unit 1304 is also used to receive a confirmation message sent by the second device, and the confirmation message is used to synchronize the common data set.
  • the updating unit 1302 updates the local model according to the first result of the downstream second device side model, and the sending unit 1303 sends the updated local model to the upstream third device.
  • the receiving unit 1304 receives the third model obtained by the upstream third device by processing the model that was updated according to the second device. Then, a new second result for the public data set is obtained according to the third model, and the second result is used to update the downstream second device side model.
  • the first device fully utilizes the computing power to participate in the joint training in the multi-level joint learning scenario. Compared with the solution of only forwarding at the intermediate level, the processing flow of the upstream device can be reduced, thereby increasing the learning efficiency of multi-level joint learning.
  • an embodiment of the second device in the embodiment of the present application includes:
  • the acquisition unit 1401 is used to acquire a first model, where the first model is obtained based on first information of a second device and a second model, and the first model is a part of the second model; the first information includes capability information and/or business requirement information.
  • the second model includes N layers of first networks, at least one layer of the first network in the N layers of first networks includes more than two parallel sub-networks, the first model includes N layers of second networks, the first number is less than the second number, the first number is the number of sub-networks included in at least one layer of the second network in the N layers of second networks, the second number is the number of sub-networks included in the first network corresponding to at least one layer of the second network in the N layers of first networks, and N is a positive integer.
  • the second device further includes: a receiving unit 1402, configured to receive a first parameter from the first device, where the first parameter is used to indicate adjustment of the sub-network;
  • the second device further includes: an updating unit 1403, configured to update the first model based on the first parameter.
  • the receiving unit 1402 is used to receive the second model from the first device; the acquiring unit 1401 is specifically used to determine the first model from the second model based on the first information.
  • the acquisition unit 1401 is specifically used to determine the subnetwork of each layer of the first network in the N-layer first network based on the first information; the acquisition unit 1401 is specifically used to build the first model based on the subnetwork.
  • the capability information is used to determine the number of sub-networks in each second network in the N-layer second network
  • the service requirement information is used to determine the sub-networks in each second network.
  • the acquisition unit 1401 is specifically used to send first information to the first device, and the first information is used by the first device to determine the first model from the second model; the acquisition unit 1401 is specifically used to receive the first model sent by the first device.
  • the updating unit 1403 is used to train the first model based on local data to obtain a third model; the sending unit 1404 is used to send the third model to the first device, and the third model is used for the first device to update the second model.
  • the acquisition unit 1401 is further used to acquire a fourth model, where the fourth model is obtained by updating the second model with the third model; and the updating unit 1403 is used to update the third model based on the fourth model.
  • the first model includes N layers of first networks
  • the second model includes M layers of first networks, N and M are positive integers, and M is less than or equal to N.
  • the second device can determine the first model from the second model through the capability information and/or business requirement information of the second device, and the data of the second device are all inferred using this path (ie, a substructure of the second model).
  • Referring to FIG. 15, another embodiment of the first device in the embodiment of the present application includes:
  • a receiving unit 1501 is used to receive first information sent by a second device, where the first information includes capability information and/or service requirement information of the second device, and the first information is used to determine whether a learning mode of the model is a federated learning mode or a distillation learning mode;
  • a determining unit 1502 is used to determine a learning mode between the first device and the second device, where the learning mode is a federated learning mode or a distillation learning mode;
  • the sending unit 1503 is used to send indication information to the second device, where the indication information is used to indicate the learning mode;
  • the receiving unit 1501 is further used to receive second information sent by the second device, where the second information is used to update the model on the first device side.
  • the second information is the weight or gradient of the model.
  • the second information is the result obtained by model reasoning on a public dataset.
  • the first device determines a learning mode that matches the second device through the capability information/business requirement information of the second device, so that the first device can flexibly apply the model training scenario and improve the model training efficiency.
  • another embodiment of the second device in the embodiment of the present application includes:
  • a sending unit 1601 is configured to send first information to a first device, where the first information includes capability information and/or service requirement information of a second device, and the first information is used to determine whether a learning mode of a model is a federated learning mode or a distillation learning mode;
  • the receiving unit 1602 is configured to receive indication information sent by the first device, where the indication information is used to indicate a learning mode;
  • the sending unit 1601 is used to send second information to the first device based on the learning mode, and the second information is used to update the model on the first device side.
  • the second information is the weight or gradient of the model.
  • the second information is the result obtained by model reasoning on a public dataset.
  • the first device determines a learning mode that matches the second device through the capability information/business requirement information of the second device, so that the first device can flexibly apply the model training scenario and improve the model training efficiency.
  • Figure 17 is a structural diagram of the second device involved in the above-mentioned embodiments provided in an embodiment of the present application, wherein the second device can specifically be the second device/network device in the above-mentioned embodiments, and the structure of the second device can refer to the structure shown in Figure 17.
  • the second device includes at least one processor 1711, at least one memory 1712, at least one transceiver 1713, at least one network interface 1714 and one or more antennas 1715.
  • the processor 1711, the memory 1712, the transceiver 1713 and the network interface 1714 are connected, for example, through a bus. In the embodiment of the present application, the connection may include various interfaces, transmission lines or buses, etc., which are not limited in this embodiment.
  • the antenna 1715 is connected to the transceiver 1713.
  • the network interface 1714 is used to connect the second device to other communication devices through a communication link.
  • the network interface 1714 may include a network interface between the second device and the core network device, such as an S1 interface.
  • the network interface may include a network interface between the second device and other network devices (such as other access network devices or core network devices), such as an X2 or Xn interface.
  • the processor 1711 is mainly used to process the communication protocol and communication data, and to control the entire second device, execute the software program, and process the data of the software program, for example, to support the second device to perform the actions described in the embodiment.
  • the second device may include a baseband processor and a central processor.
  • the baseband processor is mainly used to process the communication protocol and communication data
  • the central processor is mainly used to control the entire terminal device, execute the software program, and process the data of the software program.
  • the processor 1711 in Figure 17 can integrate the functions of the baseband processor and the central processor. It can be understood by those skilled in the art that the baseband processor and the central processor can also be independent processors, interconnected by technologies such as buses.
  • the terminal device may include multiple baseband processors to adapt to different network formats, and the terminal device may include multiple central processors to enhance its processing capabilities.
  • the various components of the terminal device can be connected through various buses.
  • the baseband processor can also be described as a baseband processing circuit or a baseband processing chip.
  • the central processor can also be described as a central processing circuit or a central processing chip.
  • the function of processing the communication protocol and communication data can be built in the processor, or stored in the memory in the form of a software program, and the processor executes the software program to realize the baseband processing function.
  • the memory is mainly used to store software programs and data.
  • the memory 1712 can exist independently and be connected to the processor 1711.
  • the memory 1712 can be integrated with the processor 1711, for example, integrated into a chip.
  • the memory 1712 can store program codes for executing the technical solutions of the embodiments of the present application, and the execution is controlled by the processor 1711.
  • the various types of computer program codes executed can also be regarded as drivers of the processor 1711.
  • FIG17 shows only one memory and one processor.
  • the memory may also be referred to as a storage medium or a storage device, etc.
  • the memory may be a storage element on the same chip as the processor, i.e., an on-chip storage element, or an independent storage element, which is not limited in the embodiments of the present application.
  • the transceiver 1713 can be used to support the reception or transmission of radio frequency signals between the second device and the terminal, and the transceiver 1713 can be connected to the antenna 1715.
  • the transceiver 1713 includes a transmitter Tx and a receiver Rx.
  • one or more antennas 1715 can receive radio frequency signals
  • the receiver Rx of the transceiver 1713 is used to receive the radio frequency signals from the antennas, convert the radio frequency signals into digital baseband signals or digital intermediate frequency signals, and provide the digital baseband signals or digital intermediate frequency signals to the processor 1711, so that the processor 1711 further processes the digital baseband signals or digital intermediate frequency signals, such as demodulation and decoding.
  • the transmitter Tx in the transceiver 1713 is also used to receive modulated digital baseband signals or digital intermediate frequency signals from the processor 1711, convert the modulated digital baseband signals or digital intermediate frequency signals into radio frequency signals, and send the radio frequency signals through one or more antennas 1715.
  • the receiver Rx can selectively perform one or more stages of down-mixing and analog-to-digital conversion processing on the RF signal to obtain a digital baseband signal or a digital intermediate frequency signal, and the order of the down-mixing and analog-to-digital conversion processing is adjustable.
  • the transmitter Tx can selectively perform one or more stages of up-mixing and digital-to-analog conversion processing on the modulated digital baseband signal or digital intermediate frequency signal to obtain a RF signal, and the order of the up-mixing and digital-to-analog conversion processing is adjustable.
  • the digital baseband signal and the digital intermediate frequency signal can be collectively referred to as a digital signal.
  • the transceiver may also be referred to as a transceiver unit, a transceiver, a transceiver device, etc.
  • a device in a transceiver unit for implementing a receiving function may be regarded as a receiving unit
  • a device in a transceiver unit for implementing a sending function may be regarded as a sending unit, that is, the transceiver unit includes a receiving unit and a sending unit
  • the receiving unit may also be referred to as a receiver, an input port, a receiving circuit, etc.
  • the sending unit may be referred to as a transmitter, a sender, or a transmitting circuit, etc.
  • the second device shown in Figure 17 can be specifically used to implement the steps implemented by the network device in the method embodiments shown in Figures 1A to 12, and to achieve the corresponding technical effects of the network device.
  • the specific implementation method of the second device shown in Figure 17 can refer to the description in the method embodiments shown in Figures 1A to 12, and will not be repeated here.
  • the second device can be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), a car computer, etc., taking the mobile phone as an example:
  • FIG18 is a block diagram showing a partial structure of a mobile phone related to a terminal device provided in an embodiment of the present application.
  • the mobile phone includes: a radio frequency (RF) circuit 1810, a memory 1820, an input unit 1830, a display unit 1840, a sensor 1850, an audio circuit 1860, a wireless fidelity (WiFi) module 1870, a processor 1880, and a power supply 1890.
  • The RF circuit 1810 can be used for receiving and sending signals during information transmission or communication. In particular, downlink information of the base station is received and then sent to the processor 1880 for processing; in addition, uplink data to be sent is transmitted to the base station.
  • the RF circuit 1810 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, etc.
  • the RF circuit 1810 can also communicate with the network and other devices through wireless communication.
  • the above wireless communication can use any communication standard or protocol, including but not limited to global system of mobile communication (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short messaging service (SMS), etc.
  • the memory 1820 can be used to store software programs and modules.
  • the processor 1880 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1820.
  • the memory 1820 can mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc.; the data storage area can store data created according to the use of the mobile phone (such as audio data, a phone book, etc.), etc.
  • the memory 1820 can include a high-speed random access memory, and can also include a non-volatile memory, such as at least one disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the input unit 1830 can be used to receive input digital or character information, and to generate key signal input related to the user settings and function control of the mobile phone.
  • the input unit 1830 may include a touch panel 1831 and other input devices 1832.
  • The touch panel 1831, also known as a touch screen, can collect the user's touch operations on or near it (such as the user's operation on or near the touch panel 1831 using any suitable object or accessory such as a finger or a stylus), and drive the corresponding connection device according to a pre-set program.
  • the touch panel 1831 may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and then sends it to the processor 1880, and can receive and execute commands sent by the processor 1880.
  • the touch panel 1831 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • the input unit 1830 may also include other input devices 1832.
  • other input devices 1832 may include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
  • the display unit 1840 may be used to display information input by the user or information provided to the user and various menus of the mobile phone.
  • the display unit 1840 may include a display panel 1841.
  • the display panel 1841 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.
  • the touch panel 1831 may cover the display panel 1841. When the touch panel 1831 detects a touch operation on or near it, it is transmitted to the processor 1880 to determine the type of the touch event, and then the processor 1880 provides a corresponding visual output on the display panel 1841 according to the type of the touch event.
  • the touch panel 1831 and the display panel 1841 are used as two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 1831 and the display panel 1841 may be integrated to implement the input and output functions of the mobile phone.
  • the mobile phone may also include at least one sensor 1850, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 1841 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1841 and/or the backlight when the mobile phone is moved to the ear.
  • the accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when stationary.
  • the audio circuit 1860, the speaker 1861, and the microphone 1862 can provide an audio interface between the user and the mobile phone.
  • the audio circuit 1860 can transmit the received audio data to the speaker 1861 after converting the received audio data into an electrical signal, which is converted into a sound signal for output; on the other hand, the microphone 1862 converts the collected sound signal into an electrical signal, which is received by the audio circuit 1860 and converted into audio data, and then the audio data is output to the processor 1880 for processing, and then sent to another mobile phone through the RF circuit 1810, or the audio data is output to the memory 1820 for further processing.
  • WiFi is a short-range wireless transmission technology.
  • the mobile phone can help users send and receive emails, browse web pages and access streaming media through the WiFi module 1870, which provides users with wireless broadband Internet access.
  • FIG. 18 shows the WiFi module 1870, it is understandable that it is not a necessary component of the mobile phone.
  • the processor 1880 is the control center of the mobile phone. It uses various interfaces and lines to connect various parts of the entire mobile phone. By running or executing software programs and/or modules stored in the memory 1820 and calling data stored in the memory 1820, it executes various functions of the mobile phone and processes data, thereby monitoring the mobile phone as a whole.
  • the processor 1880 may include one or more processing units; preferably, the processor 1880 may integrate an application processor and a modem processor, wherein the application processor mainly processes the operating system, user interface and application programs, etc., and the modem processor mainly processes wireless communications. It is understandable that the above-mentioned modem processor may not be integrated into the processor 1880.
  • the mobile phone also includes a power supply 1890 (such as a battery) for supplying power to various components.
  • the power supply can be logically connected to the processor 1880 through a power management system, so that the power management system can manage functions such as charging, discharging, and power consumption management.
  • the mobile phone may also include a camera, a Bluetooth module, etc., which will not be described in detail here.
  • the processor 1880 included in the terminal device can perform the functions in the embodiments shown in Figures 1A to 12 above, which will not be repeated here.
  • An embodiment of the present application also provides a computer-readable storage medium storing one or more computer-executable instructions.
  • When the computer-executable instructions are executed by a processor, the processor executes the method described in the possible implementations of the first device/second device/third device in the aforementioned embodiments.
  • An embodiment of the present application also provides a computer program product (or computer program) storing one or more computer-executable instructions.
  • When the computer-executable instructions are executed by a processor, the processor executes the method of the possible implementations of the above-mentioned first device/second device/third device.
  • the embodiment of the present application also provides a chip system, which includes at least one processor for supporting a terminal device to implement the functions involved in the possible implementation of the first device/second device/third device mentioned above.
  • the chip system also includes an interface circuit, which provides program instructions and/or data to the at least one processor.
  • the chip system may also include a memory, which is used to store the necessary program instructions and data for the terminal device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or units, which can be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes a number of instructions to enable a computer device (which can be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

本申请提供了一种通信方法及相关设备，可以应用于多级联合学习场景。第一设备根据下游第二设备侧模型的第一结果更新本地模型，并向上游第三设备发送更新后的本地模型。从而接收上游第三设备根据第二设备更新的模型处理得到的第三模型。进而根据该第三模型更新对公共数据集的第二结果，该第二结果用于更新下游第二设备侧模型。可以看出，第一设备在多级联合学习场景下，充分利用计算能力参与联合训练。相较于中间级只做转发的方案，可以融合下游设备异构的模型，并减少上游设备的处理流程，从而增加多级联合学习的学习效率。

Description

一种通信方法及相关设备 技术领域
本申请涉及通信领域,尤其涉及一种通信方法及相关设备。
背景技术
以联邦学习为代表的联合学习是一种分布式学习技术,各设备利用本地数据集进行模型训练,将训练得到的模型或更新上报给参数服务器;服务器通过以联邦平均为代表的融合算法,加权平均多个设备的本地模型,得到全局模型并下发给各设备,实现模型的更新。联邦学习框架不需要上传用户数据,从而在保障用户隐私的前提下,实现多个设备的联合学习。
目前,在基于联邦学习的三层联合学习场景中,以网络设备为代表的中间层只进行转发操作,没有充分利用中间层的计算能力;而各终端的计算和存储能力不同,不一定具有训练和部署统一模型的能力。
因此,如何在三层联合学习场景中有效利用中间层的计算能力是亟待解决的技术问题。
发明内容
本申请提供了一种通信方法及相关设备,第一设备在多级联合学习场景下,充分利用计算能力参与联合训练。相较于中间级只做转发的方案,可以融合下游设备异构的模型,并减少上游设备的处理流程,从而增加多级联合学习的学习效率。
本申请实施例第一方面提供了一种通信方法,可以应用于多级联合学习场景。该方法可以由第一设备执行,也可以由第一设备的部件(例如处理器、芯片、或芯片系统等)执行。该第一设备具体可以为基站、传输点(transmitting and receiving point,TRP)等网络设备,也可以为终端或核心网设备,该方法包括:获取第一结果,第一结果为第二设备侧模型推理公共数据集得到的结果;基于公共数据集与第一结果更新第一模型以得到第二模型,第一模型为第一设备的本地模型;向第三设备发送第二模型;接收第三设备发送的第三模型,第三模型由第二模型处理得到;基于第三模型与公共数据集获取第二结果,第二结果用于更新第二设备侧模型。
本申请实施例中,第一设备根据下游第二设备侧模型的第一结果更新本地模型,并向上游第三设备发送更新后的本地模型。从而接收上游第三设备根据第二设备更新的模型处理得到的第三模型。进而根据该第三模型更新对公共数据集的第二结果,该第二结果用于更新下游第二设备侧模型。可以看出,第一设备在多级联合学习场景下,充分利用计算能力参与联合训练。相较于中间级只做转发的方案,可以融合下游设备异构的模型,并减少上游设备的处理流程,从而增加多级联合学习的学习效率。
可选地,在第一方面的一种可能的实现方式中,上述步骤:获取第一结果,包括:接收来自第二设备的第一结果,第一结果为第二设备使用第二设备侧模型推理公共数据集得到的结果。
该种可能的实现方式中,第一设备可以通过直接获取第一结果的方式,减少第一设备使用第二设备侧模型推理过程占用的算力资源与存储资源。
可选地,在第一方面的一种可能的实现方式中,上述步骤:获取第一结果,包括:接收来自第二设备的第二设备侧模型;使用第二设备侧模型推理公共数据集得到第一结果。
该种可能的实现方式中,第一设备可以通过获取第二设备侧模型的方式,自身执行推理过程,可以减少第二设备推理所占用的算力资源与存储资源。
可选地,在第一方面的一种可能的实现方式中,上述步骤还包括:向第二设备发送第二结果,第二结果用于第二设备更新第二设备侧模型。
该种可能的实现方式中,第二设备可以使用第二结果进行更新第二设备侧模型。减少第一设备更新第二设备侧模型所占用的算力资源与存储资源。
可选地,在第一方面的一种可能的实现方式中,上述步骤还包括:基于第二结果更新第二设备侧模型;向第二设备发送更新后的第二设备侧模型。
该种可能的实现方式中,第一设备可以使用第二结果进行更新第二设备侧模型。并下发至第二设备。减少第二设备更新第二设备侧模型所占用的算力资源与存储资源。
可选地,在第一方面的一种可能的实现方式中,上述步骤还包括:向第二设备发送指示信息,指示信息用于第一设备与第二设备同步公共数据集,同步对应的操作包括以下至少一项:增加、删除、修改;接收第二设备发送的确认信息,确认信息用于同步公共数据集。
该种可能的实现方式中,参与联合学习的第一设备与第二设备可以对公共数据集进行同步。从而确保交互的预测结果(例如第一结果、第二结果)是与相同公共数据集对应的。
本申请实施例第二方面提供了一种通信方法,可以应用于模型训练场景。该方法可以由第二设备执行,也可以由第二设备的部件(例如处理器、芯片、或芯片系统等)执行。该第二设备具体可以为终端设备,该方法包括:获取第一模型,第一模型基于第二设备的第一信息与第二模型得到,第一模型为第二模型的一部分;第一信息包括能力信息和/或业务需求信息。
本申请实施例中,第二设备可以通过第二设备的能力信息和/或业务需求信息从第二模型中确定第一模型。且第二设备的数据都采用第一模型(即第二模型的一个子结构)进行推理。
可选地,在第二方面的一种可能的实现方式中,上述第二模型包括N层第一网络,N层第一网络中至少一层第一网络包括两个以上并行的子网络,第一模型包括N层第二网络,第一数量小于第二数量,第一数量为N层第二网络中至少一层第二网络包括子网络的数量,第二数量为N层第一网络中对应至少一层第二网络的第一网络所包括子网络的数量,N为正整数。
该种可能的实现方式中,第一模型为第二模型一部分的一种示例,第一模型与第二模型的网络层数相同,第一模型的至少一层第二网络中子网络的数量少于该至少一层第二网络对应第二模型中第一网络的子网络数量。该种情况下,也可以理解为第一模型为第二模型中的一条或多条路径。
可选地,在第二方面的一种可能的实现方式中,上述步骤还包括:接收来自第一设备的第一参数,第一参数用于指示对子网络的调整;基于第一参数更新第一模型。
该种可能的实现方式中,第一设备与第二设备可以通过第一参数实现对第一模型中子网络的调整,进而提升第一模型的性能。
可选地,在第二方面的一种可能的实现方式中,上述步骤还包括:接收来自第一设备的 第二模型;获取第一模型,包括:基于第一信息从第二模型中确定第一模型。
该种可能的实现方式中,第二设备获取第一模型的一种示例,可以通过接收第一设备发送的第二模型,并从第二模型中确定出第一模型。以减少第二设备确定第一模型所占用的算力资源与存储资源。
可选地,在第二方面的一种可能的实现方式中,上述步骤基于第一信息从第二模型中确定第一模型,包括:基于第一信息确定N层第一网络中各层第一网络的子网络;基于子网络构建第一模型。
该种可能的实现方式中,通过第一信息确定各网络层中的子网络,进而得到第一模型。
可选地,在第二方面的一种可能的实现方式中,上述能力信息用于确定N层第二网络中各第二网络的子网络的数量,业务需求信息用于确定各第二网络中子网络。
该种可能的实现方式中,通过能力信息确定子网络的数量,并通过业务需求信息确定子网络。从而可以精准从第二模型中选择作为第一模型的子网络。
可选地，在第二方面的一种可能的实现方式中，上述步骤：获取第一模型，包括：向第一设备发送第一信息，第一信息用于第一设备从第二模型中确定第一模型；接收第一设备发送的第一模型。
该种可能的实现方式中,第二设备获取第一模型的另一种示例,可以通过向第一设备上报第一信息,接收第一设备发送的第一模型,该第一模型由第一设备基于第一信息所确定。通过将从第二模型中确定第一模型的过程放在第一设备侧,以减少第二设备的算力资源与存储资源。
可选地,在第二方面的一种可能的实现方式中,上述步骤还包括:基于本地数据训练第一模型得到第三模型;向第一设备发送第三模型,第三模型用于第一设备更新第二模型。
该种可能的实现方式中,第一设备与第二设备可以使用第一模型进行联合训练。
可选地,在第二方面的一种可能的实现方式中,上述步骤还包括:获取第四模型,第四模型由第三模型更新第二模型得到;基于第四模型更新第三模型。
该种可能的实现方式中,第一设备与第二设备可以使用第一模型进行联合训练。
可选地,在第二方面的一种可能的实现方式中,上述第一模型包括N层第一网络,第二模型包括M层第一网络,N与M为正整数,且M小于或等于N。
该种可能的实现方式中,第一模型为第二模型一部分的一种示例,可以用在分割网络场景。例如,第一模型只是整个第二模型中的编码器或解码器等。
本申请实施例第三方面提供了一种通信方法,可以应用于模型训练场景。该方法可以由第一设备执行,也可以由第一设备的部件(例如处理器、芯片、或芯片系统等)执行。该第一设备具体可以为网络设备(例如基站、TRP等),该方法包括:接收第二设备发送的第一信息,第一信息包括第二设备的能力信息和/或业务需求信息,第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;确定第一设备与第二设备的学习模式,学习模式为联邦学习模式或蒸馏学习模式;向第二设备发送指示信息,指示信息用于指示学习模式;接收第二设备发送的第二信息,第二信息用于更新第一设备侧的模型。
本申请实施例中,第一设备通过第二设备的能力信息/业务需求信息来确定与第二设备匹配的学习模式,从而可以灵活适用模型训练的场景,提升模型训练效率。
本申请实施例第四方面提供了一种通信方法,可以应用于模型训练场景。该方法可以由第二设备执行,也可以由第二设备的部件(例如处理器、芯片、或芯片系统等)执行。该第二设备具体可以为终端设备,该方法包括:向第一设备发送第一信息,第一信息包括第二设备的能力信息和/或业务需求信息,第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;接收第一设备发送的指示信息,指示信息用于指示学习模式;基于学习模式向第一设备发送第二信息,第二信息用于更新第一设备侧的模型。
本申请实施例中,第一设备通过第二设备的能力信息/业务需求信息来确定与第二设备匹配的学习模式,从而可以灵活适用模型训练的场景,提升模型训练效率。
可选地,在第三方面或第四方面的一种可能的实现方式中,在联邦学习模式下,第二信息为模型的权重或梯度。
该种可能的实现方式中,该方法可以应用于第二设备算力(例如算力能力、存储能力)较强的场景。
可选地,在第三方面或第四方面的一种可能的实现方式中,上述在蒸馏学习模式下,第二信息为模型推理公共数据集得到的结果。
该种可能的实现方式中,该方法可以应用于第二设备能力(例如算力能力、存储能力)较差的场景。
本申请实施例第五方面提供了一种第一设备,可以应用于多级联合学习场景。该第一设备包括:获取单元,用于获取第一结果,第一结果为第二设备侧模型推理公共数据集得到的结果;更新单元,用于基于公共数据集与第一结果更新第一模型以得到第二模型,第一模型为第一设备的本地模型;发送单元,用于向第三设备发送第二模型;接收单元,用于接收第三设备发送的第三模型,第三模型由第二模型处理得到;获取单元,还用于基于第三模型与公共数据集获取第二结果,第二结果用于更新第二设备侧模型。
可选地,在第五方面的一种可能的实现方式中,上述的获取单元,具体用于接收来自第二设备的第一结果,第一结果为第二设备使用第二设备侧模型推理公共数据集得到的结果。
可选地,在第五方面的一种可能的实现方式中,上述的获取单元,具体用于接收来自第二设备的第二设备侧模型;获取单元,具体用于使用第二设备侧模型推理公共数据集得到第一结果。
可选地,在第五方面的一种可能的实现方式中,上述的发送单元,还用于向第二设备发送第二结果,第二结果用于第二设备更新第二设备侧模型。
可选地,在第五方面的一种可能的实现方式中,上述的更新单元,还用于基于第二结果更新第二设备侧模型;发送单元,还用于向第二设备发送更新后的第二设备侧模型。
可选地,在第五方面的一种可能的实现方式中,上述的发送单元,还用于向第二设备发送指示信息,指示信息用于第一设备与第二设备同步公共数据集,同步对应的操作包括以下至少一项:增加、删除、修改;接收单元,还用于接收第二设备发送的确认信息,确认信息用于同步公共数据集。
本申请实施例第六方面提供了一种第二设备,可以应用于模型训练场景。该第二设备包括:获取单元,用于获取第一模型,第一模型基于第二设备的第一信息与第二模型得到,第一模型为第二模型的一部分;第一信息包括能力信息和/或业务需求信息。
可选地,在第六方面的一种可能的实现方式中,上述的第二模型包括N层第一网络,N层第一网络中至少一层第一网络包括两个以上并行的子网络,第一模型包括N层第二网络,第一数量小于第二数量,第一数量为N层第二网络中至少一层第二网络包括子网络的数量,第二数量为N层第一网络中对应至少一层第二网络的第一网络所包括子网络的数量,N为正整数。
可选地,在第六方面的一种可能的实现方式中,上述的第二设备还包括:接收单元,用于接收来自第一设备的第一参数,第一参数用于指示对子网络的调整;更新单元,用于基于第一参数更新第一模型。
可选地,在第六方面的一种可能的实现方式中,上述的接收单元,用于接收来自第一设备的第二模型;获取单元,具体用于基于第一信息从第二模型中确定第一模型。
可选地,在第六方面的一种可能的实现方式中,上述的获取单元,具体用于基于第一信息确定N层第一网络中各层第一网络的子网络;获取单元,具体用于基于子网络构建第一模型。
可选地,在第六方面的一种可能的实现方式中,上述的能力信息用于确定N层第二网络中各第二网络的子网络的数量,业务需求信息用于确定各第二网络中子网络。
可选地，在第六方面的一种可能的实现方式中，上述的获取单元，具体用于向第一设备发送第一信息，第一信息用于第一设备从第二模型中确定第一模型；获取单元，具体用于接收第一设备发送的第一模型。
可选地,在第六方面的一种可能的实现方式中,上述的更新单元,还用于基于本地数据训练第一模型得到第三模型;发送单元,还用于向第一设备发送第三模型,第三模型用于第一设备更新第二模型。
可选地,在第六方面的一种可能的实现方式中,上述的获取单元,还用于获取第四模型,第四模型由第三模型更新第二模型得到;更新单元,用于基于第四模型更新第三模型。
可选地,在第六方面的一种可能的实现方式中,上述的第一模型包括N层第一网络,第二模型包括M层第一网络,N与M为正整数,且M小于或等于N。
本申请实施例第七方面提供了一种第一设备,可以应用于模型训练场景。该第一设备包括:接收单元,用于接收第二设备发送的第一信息,第一信息包括第二设备的能力信息和/或业务需求信息,第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;确定单元,用于确定第一设备与第二设备的学习模式,学习模式为联邦学习模式或蒸馏学习模式;发送单元,用于向第二设备发送指示信息,指示信息用于指示学习模式;接收单元,还用于接收第二设备发送的第二信息,第二信息用于更新第一设备侧的模型。
本申请实施例第八方面提供了一种第二设备,可以应用于模型训练场景。该第二设备包括:发送单元,用于向第一设备发送第一信息,第一信息包括第二设备的能力信息和/或业务需求信息,第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;接收单元,用于接收第一设备发送的指示信息,指示信息用于指示学习模式;发送单元,用于基于学习模式向第一设备发送第二信息,第二信息用于更新第一设备侧的模型。
可选地,在第七方面或第八方面的一种可能的实现方式中,在联邦学习模式下,第二信息为模型的权重或梯度。
可选地,在第七方面或第八方面的一种可能的实现方式中,在蒸馏学习模式下,第二信息为模型推理公共数据集得到的结果。
本申请实施例第九方面提供了一种第一设备，包括：处理器，处理器与存储器耦合，存储器用于存储程序或指令，当程序或指令被处理器执行时，使得该第一设备实现上述第一方面或第一方面的任意可能的实现方式中的方法，或者使得该第一设备实现上述第三方面或第三方面的任意可能的实现方式中的方法。
本申请实施例第十方面提供了一种第二设备，包括：处理器，处理器与存储器耦合，存储器用于存储程序或指令，当程序或指令被处理器执行时，使得该第二设备实现上述第二方面或第二方面的任意可能的实现方式中的方法，或者使得该第二设备实现上述第四方面或第四方面的任意可能的实现方式中的方法。
本申请实施例第十一方面提供了一种通信系统,包括:上述第九方面的第一设备,和/或上述第十方面的第二设备。
本申请实施例第十二方面提供了一种芯片系统，该芯片系统包括至少一个处理器，用于支持第一设备实现上述第一方面或第一方面任意一种可能的实现方式中所涉及的功能；或者用于支持第一设备实现上述第三方面或第三方面任意一种可能的实现方式中所涉及的功能。
在一种可能的设计中，该芯片系统还可以包括存储器，所述存储器用于保存该通信设备必要的程序指令和数据。该芯片系统可以由芯片构成，也可以包含芯片和其他分立器件。可选的，所述芯片系统还包括接口电路，所述接口电路为所述至少一个处理器提供程序指令和/或数据。
本申请实施例第十三方面提供了一种芯片系统，该芯片系统包括至少一个处理器，用于支持第二设备实现上述第二方面或第二方面任意一种可能的实现方式中所涉及的功能；或者用于支持第二设备实现上述第四方面或第四方面任意一种可能的实现方式中所涉及的功能。
在一种可能的设计中，该芯片系统还可以包括存储器，所述存储器用于保存该通信设备必要的程序指令和数据。该芯片系统可以由芯片构成，也可以包含芯片和其他分立器件。可选的，所述芯片系统还包括接口电路，所述接口电路为所述至少一个处理器提供程序指令和/或数据。
本申请实施例第十四方面提供了一种计算机可读介质，其上存储有计算机程序或指令，当计算机程序或指令在计算机上运行时，使得计算机执行前述第一方面或第一方面的任意可能的实现方式中的方法，或者使得计算机执行前述第二方面或第二方面的任意可能的实现方式中的方法，或者使得计算机执行前述第三方面或第三方面的任意可能的实现方式中的方法，或者使得计算机执行前述第四方面或第四方面的任意可能的实现方式中的方法。
本申请实施例第十五方面提供了一种计算机程序产品，该计算机程序产品在计算机上执行时，使得计算机执行前述第一方面或第一方面的任意可能的实现方式中的方法，或者使得计算机执行前述第二方面或第二方面的任意可能的实现方式中的方法，或者使得计算机执行前述第三方面或第三方面的任意可能的实现方式中的方法，或者使得计算机执行前述第四方面或第四方面的任意可能的实现方式中的方法。
从以上技术方案可以看出，本申请具有以下优点：第一设备根据下游第二设备侧模型的第一结果更新本地模型，并向上游第三设备发送更新后的本地模型。从而接收上游第三设备根据第二设备更新的模型处理得到的第三模型。进而根据该第三模型更新对公共数据集的第二结果，该第二结果用于更新下游第二设备侧模型。可以看出，第一设备在多级联合学习场景下，充分利用计算能力参与联合训练。相较于中间级只做转发的方案，可以减少上游设备的处理流程，从而增加多级联合学习的学习效率。
附图说明
图1A为本申请实施例提供的通信系统的一个示意图;
图1B为本申请实施例提供的多级学习架构的一个示意图;
图2为本申请实施例提供的通信方法的一个流程示意图;
图3为本申请实施例提供的通信方法的另一个流程示意图;
图4为本申请实施例提供的通信方法的另一个流程示意图;
图5为本申请实施例提供的通信方法的另一个流程示意图;
图6为本申请实施例提供的第二模型的一个示例图;
图7A为本申请实施例提供的第一模型的一个示例图;
图7B为本申请实施例提供的第一模型的另一个示例图;
图8A为本申请实施例提供的第二模型的另一个示例图;
图8B为本申请实施例提供的第一模型的另一个示例图;
图9为本申请实施例提供的通信方法的另一个流程示意图;
图10为本申请实施例提供的通信方法的另一个流程示意图;
图11为本申请实施例提供的通信方法的另一个流程示意图;
图12为本申请实施例提供的通信系统的另一个示意图;
图13为本申请实施例提供的第一设备的一个结构示意图;
图14为本申请实施例提供的第二设备的一个结构示意图;
图15为本申请实施例提供的第一设备的另一个结构示意图;
图16为本申请实施例提供的第二设备的另一个结构示意图;
图17为本申请实施例提供的第一设备的另一个结构示意图;
图18为本申请实施例提供的第二设备的另一个结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
首先,对本申请实施例中的部分用语进行解释说明,以便于本领域技术人员理解。
1、联邦学习
联邦学习是一种分布式学习技术，各设备利用本地数据集进行模型训练，将训练得到的模型或更新上报给参数服务器，服务器通过以联邦平均为代表的融合算法，加权平均多个设备的本地模型，得到全局模型并下发给各设备，实现模型的更新。联邦学习框架不需要上传用户数据，从而在保障用户隐私的前提下，实现多个设备的联合学习。
2、混合专家(Mixed of Expert,MoE)
混合专家是一种神经网络，训练多个子模型，各个子模型称为专家，针对各样本，通过控制模块选择使用哪个专家推理，从而实现提升模型容量而不大幅增加计算需求。知识蒸馏是一种实现不同结构模型间知识迁移的方法，通过将一个神经网络的输出作为另一个神经网络训练的标签，实现知识的转移。
3、本申请实施例中的术语“系统”和“网络”可被互换使用。“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A、同时存在A和B、单独存在B的情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如“A,B和C中的至少一个”包括A,B,C,AB,AC,BC或ABC。以及,除非有特别说明,本申请实施例提及“第一”、“第二”等序数词是用于对多个对象进行区分,不用于限定多个对象的顺序、时序、优先级或者重要程度。
请参阅图1A,为本申请实施例提供的通信系统的一个示意图。
图1A示出了本申请实施例的一种通信系统的示意图。该通信系统中包括服务器/核心网设备101、网络设备1021与1022、终端设备1031与1032。
本申请实施例中,图1A仅以1个服务器/核心网设备101、两个网络设备1021与1022以及两个终端设备1031与1032为例进行说明,在实际应用中,可以有更多的服务器/核心网设备、网络设备以及终端设备。
各终端设备接入服务器/核心网设备101的方式也可以有所不同，可以是两个终端设备1031与1032通过同一个网络设备1021接入服务器/核心网设备101，也可以是一个终端设备通过一个网络设备接入服务器/核心网设备101（图1A中未示出）。
两个终端设备1031以及1032与网络设备1021之间一般通过无线网络连接,也可以通过有线网络连接,如果是通过无线网络连接,具体的连接形式可以为蜂窝状无线网络,或者是WiFi网络,或者是其他类型的无线网络。
网络设备1021以及1022与服务器/核心网设备101之间可以通过无线网络连接,也可以通过有线网络连接,如果是通过有线网络连接,一般的连接形式为光纤网络。
可选地,终端设备1031与终端设备1032可以通过无线网络或有线网络直接连接。也可以通过网络设备1021进行间接连接等,具体此处不做限定。
可以理解的是,网络设备1021与1022可以是独立于服务器/核心网设备101以外的设备,还可以是服务器/核心网设备101的人工智能(artificial intelligence,AI)节点等等。
图1A所示的通信系统可以应用于第三代合作伙伴计划(3rd generation partnership project,3GPP)相关的蜂窝系统，例如，长期演进(long term evolution,LTE)系统，第四代(4th generation,4G)通信系统，新无线(new radio,NR)系统等第五代(5th generation,5G)通信系统，还可以应用于无线保真(wireless fidelity,WiFi)系统，支持多种无线技术融合的通信系统，或者是第六代(6th generation,6G)通信系统等5G之后演进的通信系统。
本申请实施例涉及的终端设备、网络设备、服务器/核心网设备具有通信功能,还可以具有AI计算能力,这些设备可以通过本地的数据样本进行机器学习的训练,也可以接收其他设备训练的模型进行融合,发送给其他设备,从而实现多个设备的联合学习。
本申请实施例中提及的终端设备,可以是一种具有无线收发功能的设备,具体可以指用户设备(user equipment,UE)、接入终端、用户单元(subscriber unit)、用户站、移动台(mobile station)、远方站、远程终端、移动设备、用户终端、无线通信设备、用户代理或用户装置。终端设备还可以是卫星电话、蜂窝电话、智能手机、无线数据卡、无线调制解调器、机器类型通信设备、可以是无绳电话、会话启动协议(session initiation protocol,SIP)电话、无线本地环路(wireless local loop,WLL)站、个人数字处理(personal digital assistant,PDA)、具有无线通信功能的手持设备、计算设备或连接到无线调制解调器的其它处理设备、车载设备、高空飞机上搭载的通信设备、可穿戴设备、无人机、机器人、设备到设备通信(device-to-device,D2D)中的终端、车到一切(vehicle to everything,V2X)中的终端、虚拟现实(virtual reality,VR)终端设备、增强现实(augmented reality,AR)终端设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程医疗(remote medical)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端、智慧家庭(smart home)中的无线终端或者未来通信网络中的终端设备等,本申请不作限制。
本申请实施例中提及的网络设备,可以是具有无线收发功能的设备,用于与终端设备进行通信,也可以是一种将终端设备接入到无线网络的设备。网络设备可以为无线接入网中的节点,又可以称为基站,还可以称为无线接入网(radio access network,RAN)节点(或设备)。网络设备可以是LTE中的演进型基站(evolved Node B,eNB或eNodeB);或者5G网络中的下一代节点B(next generation node B,gNB)或者未来演进的公共陆地移动网络(public land mobile network,PLMN)中的基站,宽带网络业务网关(broadband network gateway,BNG),汇聚交换机或者非第三代合作伙伴项目(3rd generation partnership project,3GPP)接入设备等。可选的,本申请实施例中的网络设备可以包括各种形式的基站,例如:宏基站、微基站(也称为小站)、中继站、接入点、5G之后演进的通信系统中实现基站功能的设备、WiFi系统中的接入点(access point,AP)、传输点(transmitting and receiving point,TRP)、发射点(transmitting point,TP)、移动交换中心以及设备到设备(Device-to-Device,D2D)、车辆外联(vehicle-to-everything,V2X)、机器到机器(machine-to-machine,M2M)通信中承担基站功能的设备等,还可以包括云接入网(cloud radio access network,C-RAN)系统中的集中式单元(centralized unit,CU)和分布式单元(distributed unit,DU、非陆地通信网络(non-terrestrial network,NTN)通信系统中的网络设备,即可以部署于高空平台或者卫星。本申请实施例对此不作具体限定。
本申请实施例中提及的服务器也可以理解为是AI服务器(AI Function,AIF)。核心网设备可以例如包括访问和移动管理功能(access and mobility management function,AMF)、用户面功能(user plane function,UPF)或会话管理功能(session management function, SMF)等。可以理解的是,核心网设备也可以称为核心网网元。其中,核心网网元可以用于完成注册、连接、会话管理等功能。该核心网网元主要包含网络开放功能(network exposure function,NEF)网元、策略控制功能(policy control function,PCF)网元、应用功能(application function,AF)网元、接入与移动性管理功能(access and mobility management function,AMF)网元、会话管理功能模块(session management function,SMF)网元以及用户平面功能(user plane function,UPF)网元等。
可选地,服务器(server)一般不视为核心网网元。该server包括用于实现模型处理等。
图1A所示的通信系统中,各设备可以实现多级联合学习。下面通过图1B对图1A的多级联合学习进行描述。
图1B为多级学习架构的结构图,该多级学习架构包括终端设备、第一级节点、第二级节点。这里以第一级节点为图1A所示的网络设备,第二级节点为图1A所示的服务器/核心网设备为例进行说明。图1B可以视作为一轮联合学习的过程,重复图1B的过程直到满足停止条件。该停止条件可以是模型收敛、训练时长达到预设时长或训练次数达到预设次数等。另外,第一级节点与第二级节点之间可以理解为是联邦学习层。第一级节点与终端设备之间可以理解为知识蒸馏学习层。一般情况下,终端设备与网络设备预先存储有公共数据集。当然,也可以是终端设备与网络设备交互等方式获取公共数据集,具体此处不做限定。
可以理解的是,图1B仅以第一级节点是网络设备,第二级节点是服务器/核心网设备为例进行示例性描述。在实际应用中,第一级节点与第二级节点也可以是其他终端设备等,该种情况下可以看做是终端设备的多级联合学习。对于第二级节点与第三级节点具体此处不做限定。
图1B所示的一轮联合学习包括步骤1至步骤7,下面分别进行描述。
步骤1:服务器/核心网设备触发联合学习过程,选择参与联邦学习的网络设备。并通知各网络设备参与联合学习过程。参与联邦学习的网络设备进一步选择参与联邦蒸馏的终端设备。并通知各终端设备参与联合学习过程。具体的,可以通过终端设备的能力信息和/或业务需求信息选取终端设备。能力信息可以包括以下至少一项:计算能力、存储能力等。业务需求信息包括以下至少一项:各终端设备的数据分布、各终端设备的推理任务等。
步骤2:参与联合学习过程的终端设备根据本地数据集对本地模型进行训练之后,更新后的本地模型推理公共数据集得到第一结果。并将第一结果上报给网络设备。
步骤3:网络设备根据接收到的第一结果,将公共数据集作为训练集,训练集的标签作为计算损失函数的标签,第一结果作为计算损失函数的软标签,训练网络设备侧模型。
步骤4：网络设备将训练好的网络设备侧模型上报给服务器/核心网设备。
步骤5：服务器/核心网设备将收到的网络设备侧模型进行加权平均，以得到本轮学习的全局模型。并将全局模型下发给各网络设备。
步骤6：网络设备用全局模型替换上述训练好的网络设备侧模型。并利用全局模型对公共数据集进行处理以得到第二结果。再将第二结果下发给终端设备。
步骤7:终端设备接收到第二结果之后,将公共数据集作为训练集,训练集的标签作为计算损失函数的标签,第二结果作为计算损失函数的软标签,训练终端侧模型。至此,完成 一轮联合学习。
可以理解的是，上述步骤2、步骤3、步骤7中，也可以由网络设备通过知识蒸馏进行终端侧模型融合，即终端设备基于本地私有数据训练本地模型，将训练得到的本地模型上报给网络设备；网络设备基于公共数据集和各终端侧模型，对网络设备侧模型通过知识蒸馏进行训练，并进一步蒸馏更新各终端侧模型，将更新后的终端侧模型下发给终端设备进行更新。
为了更详细的了解上述联合学习的流程,下面结合图1A与图1B的通信架构,对本申请实施例中的通信方法进行描述:
请参阅图2,本申请实施例提供的通信方法的一个流程示意图,该方法可以包括步骤201至步骤207。下面对步骤201至步骤207进行详细说明。本实施例中的第一设备为前述图1A与图1B中的第一级节点/网络设备。第二设备为终端设备,第三设备为前述图1A与图1B中的服务器/核心网设备。另外,对于第一设备、第二设备以及第三设备的数量可以是一个或多个,具体此处不做限定。
步骤201,第一设备获取第一结果。
本申请实施例中第一设备获取第一结果有多种情况,下面分别进行描述。
第一种情况,该情况下的步骤201包括步骤2011与步骤2012。
步骤2011,第二设备基于第二设备侧模型推理公共数据集得到第一结果。
可选地,第二设备使用本地的第二设备侧模型推理公共数据集得到第一结果。具体的,第二设备将公共数据集中的数据输入第二设备侧模型中得到推理结果(即第一结果)。
可选地,上述的第二设备侧模型是第二设备使用本地数据进行训练得到。
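下面给出第二设备侧模型推理公共数据集、得到第一结果（软标签）的一个极简示意（Python）。其中device_model、softmax等名称以及以分类概率作为第一结果均为示例性假设，并非本申请限定的实现方式。

```python
import numpy as np

def softmax(logits):
    # 数值稳定的softmax，将模型输出转换为各类别的概率
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def infer_public_dataset(device_model, public_x):
    """第二设备侧模型推理公共数据集，返回第一结果（每个样本的类别概率，即软标签）。
    device_model：可调用对象，输入样本批，输出logits（示例性假设）；
    public_x：公共数据集样本，形状为[样本数, 特征维度]。"""
    logits = device_model(public_x)
    return softmax(logits)
```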
步骤2012,第二设备向第一设备发送第一结果。
第二设备获取第一结果之后,向第一设备发送该第一结果。相应的,第一设备接收第二设备发送的第一结果。
第二种情况,该情况下的步骤201包括步骤2013与步骤2014。
步骤2013,第二设备向第一设备发送第二设备侧模型。
可选地,第二设备使用本地数据训练模型得到第二设备侧模型。第二设备生成第二设备侧模型之后,向第一设备发送第二设备侧模型。相应的,第一设备接收第二设备发送的第二设备侧模型。
步骤2014,第一设备基于第二设备侧模型推理公共数据集得到第一结果。
可选地,第一设备获取第二设备侧模型之后,基于第二设备侧模型推理公共数据集得到第一结果。具体的,通过第二设备侧模型推理公共数据集得到第一结果。
可以理解的是,上述两种情况只是举例,在实际应用中,第一设备获取第一结果的方式还有很多,例如从数据库中选取等方式,具体此处不做限定。
可以看出,上述两种情况下,可以根据实际需要选择。第一种情况是第二设备向第一设备上报的是第一结果。第二种情况是第二设备向第一设备上报的是第二设备侧模型,第一设备再使用第二设备侧模型推理公共数据集得到第一结果。
步骤202,第一设备基于第一结果更新第一模型以得到第二模型。
第一设备获取第一结果之后,可以基于该第一结果更新第一模型以得到第二模型。
可选地,将公共数据集作为训练集,训练集的标签作为损失函数的硬标签,第一结果作为损失函数的软标签。第一设备以该训练集作为输入,以降低损失函数的值为目标对第一模型训练以得到第二模型。其中,损失函数用于表示第一差异与第二差异。第一差异为第一模型的输出与硬标签之间的差异,第二差异为第一模型的输出与软标签之间的差异。
可选地,损失函数包括第一差异与第二差异的加权,其中,第一差异与第二差异的权重系数可以根据实际需要设置,例如,损失函数=0.4*第一差异+0.6*第二差异。具体此处不做限定。
本步骤也可以理解为第一设备基于公共数据集与第一结果进行知识蒸馏。
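作为示意，下面用Python给出上述损失函数的一种可能实现：以训练集标签作为硬标签、第一结果作为软标签，按0.4与0.6加权（与上文示例一致）。其中以交叉熵作为差异度量以及具体的函数名均为示例性假设，并非本申请限定的实现方式。

```python
import numpy as np

def cross_entropy(pred_prob, target_prob, eps=1e-12):
    # 逐样本交叉熵后取均值；target_prob既可以是one-hot硬标签，也可以是软标签
    return float(-np.sum(target_prob * np.log(pred_prob + eps), axis=-1).mean())

def distillation_loss(pred_prob, hard_label_onehot, soft_label, w_hard=0.4, w_soft=0.6):
    """第一差异：第一模型输出与硬标签之间的差异；
    第二差异：第一模型输出与软标签（第一结果）之间的差异；
    损失 = 0.4*第一差异 + 0.6*第二差异（权重系数仅为示例，可按需要设置）。"""
    first_diff = cross_entropy(pred_prob, hard_label_onehot)
    second_diff = cross_entropy(pred_prob, soft_label)
    return w_hard * first_diff + w_soft * second_diff
```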
另外，需要说明的是，本申请实施例中各设备交互的模型（例如第一模型、第二模型、第三模型等）可以是指整个模型，也可以是模型的权重（例如全部权重，或有增量更新的权重等），具体此处不做限定。
可以理解的是,这里的第一模型权重可以是第一模型的所有参数权重,也可以是相对于上次学习有调整的参数权重,具体此处不做限定。
步骤203,第一设备向第三设备发送第二模型。
第一设备训练/更新第一模型得到第二模型之后,向第三设备发送第二模型。相应的,第三设备接收第一设备发送的第二模型。
步骤204,第三设备基于第二模型更新第三设备侧模型以得到第三模型。
第三设备接收第一设备发送的第二模型之后，基于第二模型更新第三设备侧模型以得到第三模型。
本步骤中,第三设备获取第二模型的数量取决于参与联合学习的第一设备数量,即该第三设备可以获取多个第一设备上报的第二模型。第三设备获取多个第一设备上报的第二模型之后,基于多个第二模型更新第三设备侧模型以得到第三模型。例如,将多个第二模型与第三设备侧模型的参数进行加权平均以得到第三模型。
本步骤可以理解为,第三设备收集多个第一设备处最新的模型,并使用多个第一设备最新的模型得到全局模型(即第三模型)。例如,将多个第一设备最新的模型与第三设备侧模型的参数进行加权平均得到全局模型。
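下面给出第三设备对多个第二模型做参数加权平均以得到全局模型（第三模型）的一个极简示意（Python）。其中以“参数名→数组”的字典表示模型、按参与设备的数据量设置权重，均为示例性假设。

```python
import numpy as np

def federated_average(models, weights):
    """models：多个第二模型的参数字典列表（参数名→数组）；
    weights：各模型的加权系数（例如按参与设备的样本数设置）。
    返回逐参数加权平均后的全局模型（即第三模型）。"""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # 权重归一化
    return {name: sum(wi * m[name] for wi, m in zip(w, models))
            for name in models[0]}
```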
步骤205,第三设备向第一设备发送第三模型。
第三设备获取第三模型之后,向第一设备发送第三模型。相应的,第一设备接收第三设备发送的第三模型。
本步骤可以理解为,第三设备通过收集多个第一设备上报的第二模型之后,使用第二模型更新上一次的第三设备侧模型得到全局模型,并将该全局模型下发至各第一设备。
步骤206,第一设备基于公共数据集与第三模型获取第二结果。
第一设备接收第三设备发送的第三模型之后,基于公共数据集与第三模型获取第二结果。
可选地,第三模型推理公共数据集得到第二结果。
可以理解的是，在有些场景中，若第三设备存储有公共数据集，上述步骤205与步骤206则可以替换为：第三设备在获取第三模型之后，基于公共数据集与第三模型得到第二结果，并将第二结果发给第一设备。
步骤207,基于第二结果更新第二设备侧模型。
本实施例中，基于第二结果更新第二设备侧模型有多种情况，下面分别进行描述。
在一种可能实现的方式中,第一设备获取第二结果之后,向第二设备发送第二结果。第二设备基于该第二结果与公共数据集更新第二设备侧模型。例如,与前述类似,通过将公共数据集作为训练集,训练集的标签作为损失函数的硬标签,第二结果作为损失函数的软标签。第二设备以该训练集作为输入,以降低损失函数的值为目标对第二设备侧模型进行训练更新。
在另一种可能实现的方式中,第一设备处存储有第二设备侧模型(例如,通过前述步骤201的第二种情况获取第二设备侧模型),第一设备可以基于第二结果更新第二设备侧模型。并将更新后的第二设备侧模型下发至第二设备。例如,与前述类似,通过将公共数据集作为训练集,训练集的标签作为损失函数的硬标签,第二结果作为损失函数的软标签。第一设备以该训练集作为输入,以降低损失函数的值为目标对第二设备侧模型进行训练更新。
本申请实施例中,第一设备根据下游第二设备侧模型的第一结果更新本地模型,并向上游第三设备发送更新后的本地模型。从而接收上游第三设备根据第二设备更新的模型处理得到的第三模型。进而根据该第三模型更新对公共数据集的第二结果,该第二结果用于更新下游第二设备侧模型。可以看出,第一设备在多级联合学习场景下,充分利用计算能力参与联合训练。相较于中间级只做转发的方案,可以融合下游设备异构的模型,并减少上游设备的处理流程,从而增加多级联合学习的学习效率。
图3为本申请实施例提供的另一种通信方法。图3可以理解为是以第一设备为基站,第二设备为终端设备,第三设备为AI服务器为例进行描述。该方法包括步骤301至步骤308。下面分别进行描述。
步骤301,AI服务器向基站发送第一触发信息。
AI服务器向基站发送第一触发信息,相应的,基站接收AI服务器发送的第一触发信息。该第一触发信息用于通知基站进行联合学习。
可选地,AI服务器可以广播该第一触发信息。该第一触发信息中可以包括以下至少一项:联合学习标识、学习区域(例如用小区标识或专用的学习区域标识表示)等。联合学习标识用于指示具体的AI任务。学习区域用于指示参与联合学习的区域,该区域内的基站可以参与联合学习。
步骤302,基站向终端设备发送第二触发信息。
基站收到第一触发信息之后,向终端设备发送第二触发信息。相应的,终端设备接收基站发送的第二触发信息。该第二触发信息用于指示终端设备进行联合学习。
可选地,基站收到第一触发信息之后,广播第二触发信息,该第二触发信息包括以下至少一项:联合学习标识、终端ID、反馈时频资源位置等。终端ID用于指示参与联合学习的终端设备,反馈时频资源位置用于指示终端反馈训练完成信息使用的物理资源。
步骤303,终端设备向基站发送训练完成信息。
终端设备接收第二触发信息之后，分别使用本地数据训练模型，训练完成后向基站发送训练完成信息。相应的，基站接收终端设备发送的训练完成信息。该训练完成信息用于指示终端设备已对模型完成训练。
步骤304,基站向终端设备发送配置信息。
基站向终端设备发送配置信息。相应的,终端设备接收基站发送的配置信息。该配置信息用于指示终端设备上报第一结果使用的物理资源和/或传输参数等。物理资源可以是前述的反馈时频资源位置等。传输参数可以包括编码码率和/或调制阶数等。
可以理解的是,若前述第二触发信息中没有反馈时频资源位置,则配置信息可以包括反馈时频资源位置。当然,若终端设备有预设值的反馈时频资源位置,则前述第二触发信息与配置信息中可以不包括反馈时频资源位置。
步骤305,终端设备向基站发送第一结果。
终端设备使用本地训练好的模型对公共数据集进行推理得到第一结果。具体的,本地训练好的模型推理公共数据集得到第一结果。并向基站发送第一结果。相应的,基站接收终端设备发送的第一结果。
其中,关于第一结果的描述可以参考前述图1B与图2所示实施例中第一结果的描述,此处不再赘述。
步骤306,基站向AI服务器发送第一模型。
基站获取第一结果之后,可以基于第一结果与公共数据集对本地模型进行训练以得到第一模型。并向AI服务器发送第一模型。相应的,AI服务器接收基站发送的第一模型。
步骤307,AI服务器向基站发送第二模型。
AI服务器接收基站发送的第一模型之后,使用第一模型更新AI服务器的本地模型以得到第二模型。并向基站发送第二模型。相应的,基站接收AI服务器发送的第二模型。
可选地,AI服务器将多个基站上报的第一模型进行融合更新后得到第二模型,将第二模型广播给各基站。
步骤308,基站向终端设备发送第二结果。
基站收到AI服务器发送的第二模型之后,使用第二模型推理公共数据集得到第二结果。并向终端设备发送第二结果。相应的,终端设备接收基站发送的第二结果。该第二结果用于更新终端设备侧模型。
可以理解的是,图3只是以第一设备为基站,第二设备为终端设备,第三设备为AI服务器为例对各设备之间的交互进行描述。其中,关于具体过程可以参考前述图1B与图2所示实施例中的描述,此处不再赘述。
此外,上述图1B至图3所示实施例中联合学习的过程中,参与联合学习的第一设备与第二设备可以对公共数据集进行同步。从而确保交互的预测结果(例如第一结果、第二结果)是与相同公共数据集对应的。在一种可能实现的方式中,上述公共数据集的同步过程可以如图4所示,该交互流程包括步骤401与步骤402。
步骤401,第一设备向第二设备发送指示信息。
第一设备向第二设备发送指示信息。相应的,第二设备接收第一设备发送的指示信息。该指示信息用于第一设备与第二设备同步公共数据集。该同步对应的操作包括以下至少一项:增加、删除、修改等。
可选地,可以为公共数据集中的各样本配置一个索引,通过该索引指示样本。上述同步对应的操作包括:样本增加、样本删除、样本修改等。
可选地，指示信息可以承载于以下至少一项：无线资源控制(Radio Resource Control,RRC)信令、下行控制信息(downlink control information,DCI)、媒体接入控制信息中的控制单元(Medium Access Control Control Element,MAC CE)等。
步骤402,第二设备向第一设备发送确认信息。
第二设备接收指示信息之后,进行公共数据集的增加、删除、修改等操作。并向第一设备发送确认信息,该确认信息用于同步公共数据集。
可选地,确认信息可以包括最新的公共数据集,也可以包括增量更新的公共数据集,还可以包括待修改样本的索引以及修改后的内容,还可以包括待删除样本的索引等,具体此处不做限定。
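作为公共数据集同步的一个极简示意（Python），下面按样本索引对本地公共数据集执行增加、删除、修改操作，并构造确认信息。其中的消息字段名（如op、index、sample）仅为示例性假设，并非对指示信息或确认信息格式的限定。

```python
def apply_sync_ops(public_dataset, sync_ops):
    """public_dataset：以样本索引为键的本地公共数据集；
    sync_ops：指示信息携带的同步操作列表，每项形如
              {"op": "add"/"delete"/"modify", "index": 样本索引, "sample": 样本内容}。
    返回确认信息（此处以回传被操作样本的索引为例，仅为一种可能的实现）。"""
    for op in sync_ops:
        idx = op["index"]
        if op["op"] in ("add", "modify"):
            public_dataset[idx] = op["sample"]        # 增加或修改指定索引的样本
        elif op["op"] == "delete":
            public_dataset.pop(idx, None)             # 删除指定索引的样本
    return {"ack_indices": [op["index"] for op in sync_ops]}
```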
本实施例,参与联合学习的第一设备与第二设备可以对公共数据集进行同步。从而确保交互的预测结果(例如第一结果、第二结果)是与相同公共数据集对应的。
请参阅图5,本申请实施例提供的通信方法的另一个流程示意图,该方法可以包括步骤501至步骤508。下面对步骤501至步骤508进行详细说明。本实施例中的第一设备与第二设备可以是前述图1A至图4所示实施例中的终端设备与网络设备,也可以是网络设备与服务器/核心网络设备,具体此处不做限定。即图5所示实施例可以与图1A至图4所示实施例结合。
步骤501,第二设备获取第一模型。
第二设备获取第一模型,该第一模型基于第二设备的第一信息与第二模型得到。第一模型为第二模型的一部分。第一信息包括能力信息和/或业务需求信息。
其中,第一模型为第二模型的一部分有多种情况。
在一种可能实现的方式中,第一模型与第二模型的网络层数相同,第一模型中各网络层数的子网络为第二模型中对应各网络层数的子网络的一部分。该种情况下,第二模型包括N层第一网络,N层第一网络中至少一层第一网络包括两个以上并行的子网络,第一模型包括N层第二网络,第一数量小于第二数量,第一数量为N层第二网络中至少一层第二网络包括子网络的数量,第二数量为N层第一网络中对应至少一层第二网络的第一网络所包括子网络的数量,N为正整数。
示例性的,第二模型如图6所示。该第二模型包括n层第一网络,n为大于2的整数。n层第一网络包括:第一层第一网络NN1,第二层第一网络NN2,...,第n层第一网络NNn。其中,第一层第一网络NN1包括K个子网络:子网络NN1-1,子网络NN1-2,...,子网络NN1-K。第二层第一网络NN2包括L个子网络:子网络NN2-1,子网络NN2-2,...,子网络NN2-L。第n层第一网络NNn包括M个子网络:子网络NNn-1,子网络NNn-2,...,子网络NNn-M。其中,K、L、M为正整数。第一模型如图7A所示,第一模型包括n层第二网络。n层第二网络包括:第一层第二网络NN1,第二层第二网络NN2,...,第n层第二网络NNn。其中,第一层第二网络NN1包括K-P个子网络:子网络NN1-1,...,子网络NN1-(K-P)。第二层第二网络NN2包括L个子网络:子网络NN2-1,子网络NN2-2,...,子网络NN2-L。第n层第二网络NNn包括M个子网络:子网络NNn-1,子网络NNn-2,...,子网络NNn-M。其中, P为大于0小于K的正整数。可以理解的是,图6与图7A只是第二模型与第一模型的举例,用于描述第一模型为第二模型的一部分的一种情况。
在另一种可能实现的方式中,第一模型与第二模型的网络层数不同,第二模型中所有网络层中的部分网络层为第一模型。该种情况下,第一模型包括N层第一网络,第二模型包括M层第一网络,N与M为正整数,且M小于或等于N。
示例性的，延续第二模型如图6的举例。第一模型如图7B所示，第一模型包括n-m层第二网络。n-m层第二网络包括：第一层第二网络NN1，...，第n-m层第二网络NN(n-m)。其中，第一层第二网络NN1包括K个子网络：子网络NN1-1，子网络NN1-2，...，子网络NN1-K。第n-m层第二网络NN(n-m)包括Q个子网络：子网络NN(n-m)-1，子网络NN(n-m)-2，...，子网络NN(n-m)-Q。其中，Q与m为正整数。可以理解的是，图6与图7B只是第二模型与第一模型的举例，用于描述第一模型为第二模型的一部分的另一种情况。
另外,第二设备的能力信息用于确定第一模型中子网络的数量。业务需求信息用于确定各子网络。能力信息可以包括以下至少一项:计算能力、存储能力等。业务需求信息包括以下至少一项:各第二设备的数据分布、各第二设备的推理任务等。
本申请实施例中第二设备获取第一模型有多种情况,下面分别进行描述。
第一种情况,该情况下的步骤501包括步骤5011与步骤5012。
步骤5011,第一设备向第二设备发送第二模型。
第一设备向第二设备发送第一设备处的第二模型。相应的,第二设备接收第一设备发送的第二模型。该第二模型也可以理解为是具有多个专家(即第一网络/第二网络)的大模型。
步骤5012,第二设备基于第一信息从第二模型中确定第一模型。
第二设备获取第二模型之后,可以基于第一信息从第二模型中确定第一模型。
具体的，可以通过能力信息确定第一模型中子网络的数量，通过业务需求信息确定各子网络，从而确定从第二模型中选择哪些子网络作为第一模型。
从第二模型中确定第一模型可以理解为是在第二模型中确定一条或多条路径作为第一模型。该路径用于表示从第二模型中选择的子网络。
可选地,第二模型中可以包括路径选择模块以及与第一网络匹配的专家选择模块。该专家选择模块用于确定第一网络中的子网络。其中,路径选择模块用于确定后续在第二模型中每层第一网络中选择子网络的数量。该路径选择模块的输入信息包括以下至少一项:终端ID、终端的能力信息、终端的业务需求信息、输入样本、路径数量等。路径选择模块的输出为与输入样本同样维度的向量或所有样本的平均值。专家选择模块接收路径选择模块发送的信息(例如,输入样本、路径数量等),输出各子网络的权重。专家选择模块的数量可以是一个或多个。在专家选择模块的数量为1个时,该专家选择模块用于对第二模型中所有层第一网络的子网络进行选择。在专家选择模块的数量与第二模型中第一网络的数量相同时,每层第一网络可以对应一个专家选择模块,用于确定各自第一网络层中的子网络。另外,专家选择模块从每层第一网络中选择子网络的数量可以是一个或多个,具体此处不做限定。若专家选择模块从每层第一网络中确定一个子网络,则各层第一网络被选择的子网络连接起来可以视为一条路径。
另外，上述的路径选择模块可以不参与第二模型的推理。即第二模型是部署包含所有路径的模型，在具体推理的时候，专家选择模块会根据输入样本，确定选择其中的一个路径对样本进行处理，从而确定路径上子网络的权重。
示例性的，以图8A所示的第二模型（即图6中的n、K、L、M均为3）与图8B所示的第一模型为例，描述上述从第二模型中确定第一模型的过程。如图8A所示，第二模型包括3层第一网络。分别为：第一层第一网络NN1，第二层第一网络NN2，第三层第一网络NN3。第一层第一网络NN1包括：子网络NN1-1，子网络NN1-2，子网络NN1-3。第二层第一网络NN2包括：子网络NN2-1，子网络NN2-2，子网络NN2-3。第三层第一网络NN3包括：子网络NN3-1，子网络NN3-2，子网络NN3-3。另外，多个第一网络可以匹配一个专家选择模块，当然，为了提升模型的总体性能，也可以每一层第一网络匹配一个专家选择模块。例如，专家选择模块1用于确定第一层第一网络中选取的子网络。专家选择模块2用于确定第二层第一网络中选取的子网络。专家选择模块3用于确定第三层第一网络中选取的子网络。以从每一层第一网络中选取一个子网络为例（即选择一个路径），专家选择模块输出各子网络的权重之后，可以在每一层第一网络中选择权重较大的子网络作为第一模型的子网络。如图8A所示，第一层第一网络NN1中子网络NN1-1的权重为0.1，子网络NN1-2的权重为0.7，子网络NN1-3的权重为0.1。第二层第一网络NN2中子网络NN2-1的权重为0.05，子网络NN2-2的权重为0.6，子网络NN2-3的权重为0.1。第三层第一网络NN3中子网络NN3-1的权重为0.8，子网络NN3-2的权重为0.02，子网络NN3-3的权重为0.08。可以看出，子网络NN1-2的权重为0.7是第一层第一网络NN1中最大的权重。子网络NN2-2的权重为0.6是第二层第一网络NN2中最大的权重。子网络NN3-1的权重为0.8是第三层第一网络NN3中最大的权重。从而确定出一个路径：子网络NN1-2、子网络NN2-2、子网络NN3-1。进而基于上述过程确定的第一模型如图8B所示。
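结合图8A的权重示例，下面给出按“每层选取权重最大的子网络”确定一条路径（即第一模型）的极简示意（Python）。其中以每层子网络的权重向量表示专家选择模块的输出，仅为示例性假设。

```python
import numpy as np

# 图8A示例中专家选择模块输出的各层子网络权重（仅为示意）
gate_weights = [
    np.array([0.1, 0.7, 0.1]),    # 第一层第一网络NN1中各子网络的权重
    np.array([0.05, 0.6, 0.1]),   # 第二层第一网络NN2中各子网络的权重
    np.array([0.8, 0.02, 0.08]),  # 第三层第一网络NN3中各子网络的权重
]

def select_path(gate_weights, num_per_layer=1):
    """对每层第一网络选取权重最大的num_per_layer个子网络，返回路径（各层子网络索引）。"""
    path = []
    for layer_w in gate_weights:
        top = np.argsort(layer_w)[::-1][:num_per_layer]
        path.append([int(i) for i in sorted(top)])
    return path

# 输出[[1], [1], [0]]，对应子网络NN1-2、NN2-2、NN3-1，即图8B所示的第一模型
print(select_path(gate_weights))
```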
可以理解的是,上述图8A与图8B只是举例,在实际应用中,第二模型与第一模型还可以有其他情况,以及第一模型包括的路径也可以是多个,具体此处不做限定。
可选地，上述图8B是前述第一模型为第二模型的一部分的第一种情况。对于前述第一模型为第二模型的一部分的第二种情况，例如，对于应用于联合推理的自编码器等结构，网络还可能将模型切分成两个部分，相应地，选择的路径也分成两个部分。对于分割的模型，其中一部分部署在第二设备（例如终端），另一部分部署在第一设备（例如基站），如用于信道信息反馈的编码器和解码器。例如，在第二模型包括编码器与解码器的情况下，第一模型可以是第二模型中的编码器或解码器中的一部分。对于每个终端，基站需要维护路径信息，以便在联合推理的时候，选择对应的路径，即与编码器配对的解码器。对于分割的神经网络，可以通过分割学习进行训练，即终端和基站通过交互中间特征和梯度实现模型的训练。
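下面给出分割学习中终端与基站交互中间特征和梯度的一个极简示意（Python）。其中以线性编码器/解码器和重构任务为例，参数形状、学习率等均为示例性假设，并非实际的信道信息反馈模型。

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(8, 4)) * 0.1   # 终端侧编码器参数（示例）
W_dec = rng.normal(size=(4, 8)) * 0.1   # 基站侧解码器参数（示例）

def split_learning_step(x, W_enc, W_dec, lr=0.01):
    """一次分割学习迭代：终端前向得到中间特征h并上行传输；
    基站完成剩余前向与损失反向，把h处的梯度下行回传；双方各自更新本侧参数。
    此处以线性编解码器、重构x的均方误差为任务，仅为示意。"""
    h = x @ W_enc                       # 终端侧前向（上行传输的中间特征）
    y = h @ W_dec                       # 基站侧前向
    grad_y = 2.0 * (y - x) / y.size     # 均方误差损失对y的梯度
    grad_W_dec = h.T @ grad_y           # 基站侧参数梯度
    grad_h = grad_y @ W_dec.T           # 回传给终端的中间特征梯度（下行传输）
    grad_W_enc = x.T @ grad_h           # 终端侧参数梯度
    return W_enc - lr * grad_W_enc, W_dec - lr * grad_W_dec, float(np.mean((y - x) ** 2))

W_enc, W_dec, loss = split_learning_step(rng.normal(size=(32, 8)), W_enc, W_dec)
```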
第二种情况,该情况下的步骤501包括步骤5013至步骤5015。
步骤5013,第二设备向第一设备发送第一信息。
第二设备向第一设备发送第一信息。相应的,第一设备接收第二设备发送的第一信息。该第一信息的描述可以参考前述,此处不再赘述。
步骤5014,第一设备基于第一信息从第二模型中确定第一模型。
本步骤与前述步骤5012类似，此处不再赘述。
步骤5015,第一设备向第二设备发送第一模型。
第一设备从第二模型中确定第一模型之后,向第二设备发送第一模型。相应的,第二设备接收第一设备发送的第一模型。
可以看出,上述两种情况下,可以根据实际需要选择。第一种情况是第一设备向第二设备发送第二模型。进而第二设备根据第一信息从第二模型中确定第一模型。第二种情况是第二设备向第一设备上报的是第一信息。进而第一设备根据第一信息从第二模型中确定第一模型,并将第一模型发给第二设备。
步骤502,第一设备向第二设备发送第一参数。本步骤是可选地。
可选地,第一设备获取第一模型之后,还可以向第二设备发送第一参数。相应的,第二设备接收第一设备发送的第一参数。该第一参数用于指示对子网络的调整。
其中,第一参数可以包括以下至少一项:第一模型子网络的增加、删除等。
步骤503,第二设备基于第一参数更新第一模型。本步骤是可选地。
可选地,第二设备接收第一参数之后,基于第一参数更新第一模型。
具体的,第二设备通过接收到的第一参数对第一模型中的子网络进行增加、删除、修改等操作。
可以理解的是,步骤502与步骤503的过程是第一设备触发的。在实际应用中,子网络的调整过程也可以是第二设备触发的,例如,联合学习中增加有新的终端等,具体此处不做限定。
步骤504,第二设备基于本地数据训练第一模型得到第三模型。本步骤是可选地。
可选地,第二设备获取第一模型之后,基于本地数据训练第一模型以得到第三模型。该过程也可以理解为第二设备对第一模型的微调,使得微调后的第三模型更能满足第二设备处数据的推理。
步骤505,第二设备向第一设备发送第三模型。本步骤是可选地。
可选地,第二设备微调第一模型得到第三模型之后,向第一设备发送第三模型。相应的,第一设备接收第二设备发送的第三模型。
步骤506,第一设备基于第三模型更新第二模型得到第四模型。本步骤是可选地。
可选地,第一设备接收第三模型之后,使用第三模型更新本地的第二模型以得到第四模型。
本步骤也可以理解为,基站通过接收下游终端上报的最新模型,调整基站侧的第二模型以得到第四模型。例如,将各终端处模型(即各第一模型)的相同子网络进行融合处理得到第四模型。
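下面给出第一设备按“相同子网络取平均”融合各终端上报模型、得到第四模型的一个极简示意（Python）。其中以（层索引，子网络索引）作为子网络标识、未被上报的子网络保持不变，均为示例性假设。

```python
import numpy as np

def fuse_reported_models(second_model, reported_models):
    """second_model：第一设备处的第二模型，键为（层索引, 子网络索引），值为参数数组；
    reported_models：各终端上报的模型，只包含其各自路径上的子网络。
    对每个子网络，取所有上报了该子网络的终端参数的平均值；未被上报的子网络保持不变。
    返回融合后的模型，即第四模型。"""
    fused = dict(second_model)
    for key in second_model:
        updates = [m[key] for m in reported_models if key in m]
        if updates:
            fused[key] = np.mean(updates, axis=0)
    return fused
```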
步骤507,第一设备向第二设备发送第四模型。本步骤是可选地。
可选地,第一设备获取第四模型之后,可以向第二设备发送第四模型。相应的,第二设备接收第一设备发送的第四模型。
本步骤可以理解为，基站将各终端处模型的相同子网络进行融合处理后得到第四模型。并将第四模型下发至第二设备，使得第二设备根据该第四模型更新本地模型。
步骤508,第二设备基于第四模型更新第三模型。本步骤是可选地。
可选地,第二设备获取第四模型之后,使用该第四模型更新本地的第三模型。
另外，本实施例的步骤501、步骤504至步骤508可以理解为是联合学习。具体的，网络设备侧基于已有数据训练多专家大模型（即第二模型）。终端设备或网络设备基于能力信息/业务需求信息，确定由一个或多个专家路径确定的子模型（即第一模型）。各终端设备基于本地数据训练得到训练好的子模型（即第三模型）并上报。网络设备根据各终端设备上报的子模型的相同部分进行融合，并将融合后的模型（即第四模型）下发给终端设备，使得终端设备对本地模型进行更新。其中，网络设备或终端设备需要维护终端ID和子模型的路径信息，用于网络设备侧模型融合时参与各模块模型的平均。训练完成后，各终端设备基于子模型推理，网络设备基于大模型推理。终端设备可进一步基于子模型进行知识蒸馏得到更小或更适合本地硬件的模型。
在一种可能实现的方式中,本实施例提供的通信方法包括步骤501。该种情况下,该通信方法可以根据终端的能力信息和/或业务需求信息从第二模型中确定第一模型。且终端的数据都采用这一路径(即第一模型中的子网络)进行推理。
在另一种可能实现的方式中,本实施例提供的通信方法包括步骤501至503。该种情况下,该通信方法可以根据终端的能力信息和/或业务需求信息从第二模型中确定第一模型。且终端的数据都采用这一路径(即第一模型中的子网络)进行推理。此外,还可以实现第一模型中子网络的及时调整。
在另一种可能实现的方式中，本实施例提供的通信方法包括步骤501、步骤504至步骤508。该通信方法可以根据终端的能力信息和/或业务需求信息从第二模型中确定第一模型。且终端的数据都采用这一路径（即第一模型中的子网络）进行推理。此外，可以适用于联合学习场景。
在另一种可能实现的方式中,本实施例提供的通信方法包括步骤501至508。
请参阅图9,本申请实施例提供的通信方法的另一个流程示意图,该方法可以包括步骤901至步骤903。下面对步骤901至步骤903进行详细说明。本实施例中的第一设备与第二设备可以是前述图1A至图4所示实施例中的终端设备与网络设备,也可以是网络设备与服务器/核心网络设备,具体此处不做限定。即图9所示实施例可以与图1A至图8B所示实施例结合。
步骤901,第二设备向第一设备发送第一信息。
第二设备向第一设备发送第一信息。相应的,第一设备接收第二设备发送的第一信息。该第一信息包括第二设备的能力信息和/或业务需求信息。
可选地,能力信息可以包括以下至少一项:计算能力、存储能力等。业务需求信息包括以下至少一项:第二设备的数据分布、第二设备的推理任务等。
步骤902,第一设备向第二设备发送指示信息。
第一设备获取第二设备的第一信息之后,基于该第一信息确定模型的学习模式为联邦学习模式,还是蒸馏学习模式。确定学习模式之后,向第二设备发送指示信息。相应的第二设备接收第一设备发送的指示信息。该指示信息用于指示学习模式为联邦学习模式或蒸馏学习模式。
例如,对于计算能力较差的第二设备,适用小模型。从而确定该第二设备的学习模式为蒸馏学习模式(也可以称为知识蒸馏模式)。又例如,对于计算能力较强的第二设备,适用大模型。从而确定该第二设备的学习模式为联邦学习模式。
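作为示意，下面给出第一设备根据第一信息确定学习模式的一种可能实现（Python）。其中的阈值与字段名仅为示例性假设，具体的判决准则本申请不作限定。

```python
def decide_learning_mode(first_info, compute_threshold=1.0, memory_threshold=1.0):
    """first_info：第二设备上报的第一信息，例如{"compute": 算力, "memory": 存储, "task": 推理任务}。
    算力与存储均不低于阈值时判为联邦学习模式，否则判为蒸馏学习模式（阈值仅为示例）。"""
    capable = (first_info.get("compute", 0) >= compute_threshold and
               first_info.get("memory", 0) >= memory_threshold)
    return "federated_learning" if capable else "knowledge_distillation"
```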
步骤903,第二设备基于指示信息发送第二信息。
第二设备接收到指示信息,根据该指示信息确定出学习模式。并基于该指示信息确定向第一设备发送的第二信息。该第二信息用于更新第一设备侧模型。
在一种可能实现的方式中,在蒸馏学习模式下,第二信息为第二设备侧模型推理公共数据集得到的第一结果。
在另一种可能实现的方式中,在联邦学习模式下,第二信息为第二设备侧模型的权重或梯度。
本实施例中,第一设备通过第二设备的能力信息/业务需求信息来确定与第二设备匹配的学习模式,从而可以灵活适用模型训练的场景,提升模型训练效率。
可以看出,基于学习模式的不同,通信流程有所不同。下面以两个UE、一个基站为例结合图10与图11分别进行描述。
第一种,蒸馏学习模式(或称为知识蒸馏模式)。
如图10所示,该通信流程包括步骤1001至步骤1006。
步骤1001,UE1与UE2向基站发送第一信息。
UE1与UE2向基站发送第一信息。UE1的第一信息包括UE1的能力信息和/或业务需求信息。UE2的第一信息包括UE2的能力信息和/或业务需求信息。
其中,能力信息可以包括以下至少一项:计算能力、存储能力等。业务需求信息包括以下至少一项:第二设备的数据分布、第二设备的推理任务等。
步骤1002,基站向UE1与UE2发送指示信息(知识蒸馏)。
基站基于UE1与UE2的第一信息确定学习模式为知识蒸馏模式。并向UE1与UE2发送指示信息,该指示信息用于指示学习模式为知识蒸馏模式。
步骤1003,UE1与UE2本地训练。
UE1与UE2确定学习模式为知识蒸馏模式之后,通过各自的本地数据训练模型。并基于训练好的模型与公共数据集得到推理结果(即第一结果)。
步骤1004,UE1与UE2向基站发送第一结果。
UE1与UE2获取第一结果之后,向基站上报第一结果。
步骤1005,基站模型训练。
基站接收UE1与UE2发送的第一结果之后,使用第一结果对基站侧模型进行更新。并基于更新后的模型与公共数据集得到推理结果(即第二结果)。
步骤1006,基站向UE1与UE2发送第二结果。
基站获取第二结果之后,向UE1与UE2发送第二结果,该第二结果用于UE1与UE2更新各自的本地模型。
图10所示实施例的流程与前述图1B中第一级节点与终端设备之间的流程类似,此处对于类似的描述不再赘述。
第二种,联邦学习模式。
如图11所示,该通信流程包括步骤1101至步骤1106。
步骤1101,UE1与UE2向基站发送第一信息。
UE1与UE2向基站发送第一信息。UE1的第一信息包括UE1的能力信息和/或业务需求信息。UE2的第一信息包括UE2的能力信息和/或业务需求信息。
其中,能力信息可以包括以下至少一项:计算能力、存储能力等。业务需求信息包括以下至少一项:第二设备的数据分布、第二设备的推理任务等。
步骤1102,基站向UE1与UE2发送指示信息(联邦学习)。
基站基于UE1与UE2的第一信息确定学习模式为联邦学习模式。并向UE1与UE2发送指示信息,该指示信息用于指示学习模式为联邦学习模式。
步骤1103,UE1与UE2本地训练。
UE1与UE2确定学习模式为联邦学习模式之后,通过各自的本地数据训练模型。并得到训练好的模型的权重/梯度。
步骤1104,UE1与UE2向基站发送权重/梯度。
UE1与UE2获取权重/梯度之后,向基站上报权重/梯度。
步骤1105,基站模型训练。
基站接收UE1与UE2发送的权重/梯度之后，使用权重/梯度对基站侧模型进行更新，并得到更新后的模型权重/梯度（可以称为更新后的权重/梯度）。
步骤1106,基站向UE1与UE2发送更新后的权重/梯度。
基站获取更新后的权重/梯度之后，向UE1与UE2发送更新后的权重/梯度，该更新后的权重/梯度用于UE1与UE2更新各自的本地模型。
本实施例中,基站通过UE的能力信息/业务需求信息来确定与UE匹配的学习模式,从而可以灵活适用模型训练的场景,提升模型训练效率。
另外,本申请实施例还提供了一种终端与终端之间的通信流程。该场景如图12所示,该场景以1个基站与4个UE为例。
该场景下，基站未存储有公共数据集，只做数据转发。可以由能力（例如计算能力、存储能力等）较强的UE负责大模型融合和维护。各UE根据本地数据训练本地模型。直连UE通过D2D传输模型，融合节点根据本地数据融合模型，包括蒸馏到小模型后发送给其他节点。非直连UE通过基站转发模型，融合节点根据本地数据融合模型后发送给基站。基站可作为融合模型存储节点和融合终端调度节点。模型融合终端的ID作为模型传输的目的地标识。
本实施例中，可以由能力较强的UE负责大模型融合和维护，从而通过D2D或者基站中转实现模型传输，实现模型训练过程。
上面对本申请实施例中的通信方法进行了描述,下面对本申请实施例中的相关设备进行描述。
请参阅图13,本申请实施例中第一设备的一个实施例包括:
获取单元1301,用于获取第一结果,第一结果为第二设备侧模型推理公共数据集得到的结果;
更新单元1302,用于基于公共数据集与第一结果更新第一模型以得到第二模型,第一模型为第一设备的本地模型;
发送单元1303,用于向第三设备发送第二模型;
接收单元1304,用于接收第三设备发送的第三模型,第三模型由第二模型处理得到;
获取单元1301,还用于基于第三模型与公共数据集获取第二结果,第二结果用于更新第二设备侧模型。
可选地,获取单元1301,具体用于接收来自第二设备的第一结果,第一结果为第二设备使用第二设备侧模型推理公共数据集得到的结果。
可选地,获取单元1301,具体用于接收来自第二设备的第二设备侧模型;获取单元1301,具体用于使用第二设备侧模型推理公共数据集得到第一结果。
可选地,发送单元1303,还用于向第二设备发送第二结果,第二结果用于第二设备更新第二设备侧模型。
可选地,更新单元1302,还用于基于第二结果更新第二设备侧模型;发送单元1303,还用于向第二设备发送更新后的第二设备侧模型。
可选地,发送单元1303,还用于向第二设备发送指示信息,指示信息用于第一设备与第二设备同步公共数据集,同步对应的操作包括以下至少一项:增加、删除、修改;接收单元1304,还用于接收第二设备发送的确认信息,确认信息用于同步公共数据集。
本实施例中,第一设备中各单元所执行的操作与前述图1A至图12所示实施例中描述的类似,此处不再赘述。
本实施例中，更新单元1302根据下游第二设备侧模型的第一结果更新本地模型，发送单元1303向上游第三设备发送更新后的本地模型。从而接收单元1304接收上游第三设备根据第二设备更新的模型处理得到的第三模型。进而根据该第三模型更新对公共数据集的第二结果，该第二结果用于更新下游第二设备侧模型。可以看出，第一设备在多级联合学习场景下，充分利用计算能力参与联合训练。相较于中间级只做转发的方案，可以减少上游设备的处理流程，从而增加多级联合学习的学习效率。
请参阅图14,本申请实施例中第二设备的一个实施例包括:
获取单元1401,用于获取第一模型,第一模型基于第二设备的第一信息与第二模型得到,第一模型为第二模型的一部分;第一信息包括能力信息和/或业务需求信息。
可选地,第二模型包括N层第一网络,N层第一网络中至少一层第一网络包括两个以上并行的子网络,第一模型包括N层第二网络,第一数量小于第二数量,第一数量为N层第二网络中至少一层第二网络包括子网络的数量,第二数量为N层第一网络中对应至少一层第二网络的第一网络所包括子网络的数量,N为正整数。
可选地,第二设备还包括:接收单元1402,用于接收来自第一设备的第一参数,第一参数用于指示对子网络的调整;
可选地,第二设备还包括:更新单元1403,用于基于第一参数更新第一模型。
可选地,接收单元1402,用于接收来自第一设备的第二模型;获取单元1401,具体用于基于第一信息从第二模型中确定第一模型。
可选地,获取单元1401,具体用于基于第一信息确定N层第一网络中各层第一网络的子网络;获取单元1401,具体用于基于子网络构建第一模型。
可选地,能力信息用于确定N层第二网络中各第二网络的子网络的数量,业务需求信息用于确定各第二网络中子网络。
可选地,获取单元1401,具体用于向第一设备发送第一信息,第一信息用于第一设备从第二模型中确定第一模型;获取单元1401,具体用于接收第一设备发送的第一模型。
可选地,更新单元1403,用于基于本地数据训练第一模型得到第三模型;发送单元1404,用于向第一设备发送第三模型,第三模型用于第一设备更新第二模型。
可选地,获取单元1401,还用于获取第四模型,第四模型由第三模型更新第二模型得到;更新单元1403,用于基于第四模型更新第三模型。
可选地,第一模型包括N层第一网络,第二模型包括M层第一网络,N与M为正整数,且M小于或等于N。
本实施例中,第二设备中各单元所执行的操作与前述图1A至图12所示实施例中描述的类似,此处不再赘述。
本实施例中,第二设备可以通过第二设备的能力信息和/或业务需求信息从第二模型中确定第一模型。且第二设备的数据都采用这一路径(即第二模型的一个子结构)进行推理。
请参阅图15,本申请实施例中第一设备的另一个实施例包括:
接收单元1501,用于接收第二设备发送的第一信息,第一信息包括第二设备的能力信息和/或业务需求信息,第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;
确定单元1502,用于确定第一设备与第二设备的学习模式,学习模式为联邦学习模式或蒸馏学习模式;
发送单元1503,用于向第二设备发送指示信息,指示信息用于指示学习模式;
接收单元1501,还用于接收第二设备发送的第二信息,第二信息用于更新第一设备侧的模型。
可选地,在联邦学习模式下,第二信息为模型的权重或梯度。
可选地,在蒸馏学习模式下,第二信息为模型推理公共数据集得到的结果。
本实施例中,第一设备中各单元所执行的操作与前述图1A至图12所示实施例中描述的类似,此处不再赘述。
本实施例中,第一设备通过第二设备的能力信息/业务需求信息来确定与第二设备匹配的学习模式,从而可以灵活适用模型训练的场景,提升模型训练效率。
请参阅图16,本申请实施例中第二设备的另一个实施例包括:
发送单元1601,用于向第一设备发送第一信息,第一信息包括第二设备的能力信息和/或业务需求信息,第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;
接收单元1602,用于接收第一设备发送的指示信息,指示信息用于指示学习模式;
发送单元1601,用于基于学习模式向第一设备发送第二信息,第二信息用于更新第一设备侧的模型。
可选地,在联邦学习模式下,第二信息为模型的权重或梯度。
可选地,在蒸馏学习模式下,第二信息为模型推理公共数据集得到的结果。
本实施例中,第二设备中各单元所执行的操作与前述图1A至图12所示实施例中描述的类似,此处不再赘述。
本实施例中,第一设备通过第二设备的能力信息/业务需求信息来确定与第二设备匹配的学习模式,从而可以灵活适用模型训练的场景,提升模型训练效率。
请参阅图17,为本申请的实施例提供的上述实施例中所涉及的第二设备的结构示意图,其中,该第二设备具体可以为前述实施例中的第二设备/网络设备,该第二设备的结构可以参考图17所示的结构。
第二设备包括至少一个处理器1711、至少一个存储器1712、至少一个收发器1713、至少一个网络接口1714和一个或多个天线1715。处理器1711、存储器1712、收发器1713和网络接口1714相连,例如通过总线相连,在本申请实施例中,所述连接可包括各类接口、传输线或总线等,本实施例对此不做限定。天线1715与收发器1713相连。网络接口1714用于使得第二设备通过通信链路,与其它通信设备相连,例如网络接口1714可以包括第二设备与核心网设备之间的网络接口,例如S1接口,网络接口可以包括第二设备和其他网络设备(例如其他接入网设备或者核心网设备)之间的网络接口,例如X2或者Xn接口。
处理器1711主要用于对通信协议以及通信数据进行处理，以及对整个第二设备进行控制，执行软件程序，处理软件程序的数据，例如用于支持第二设备执行实施例中所描述的动作。第二设备可以包括基带处理器和中央处理器，基带处理器主要用于对通信协议以及通信数据进行处理，中央处理器主要用于对整个终端设备进行控制，执行软件程序，处理软件程序的数据。图17中的处理器1711可以集成基带处理器和中央处理器的功能，本领域技术人员可以理解，基带处理器和中央处理器也可以是各自独立的处理器，通过总线等技术互联。本领域技术人员可以理解，终端设备可以包括多个基带处理器以适应不同的网络制式，终端设备可以包括多个中央处理器以增强其处理能力，终端设备的各个部件可以通过各种总线连接。所述基带处理器也可以表述为基带处理电路或者基带处理芯片。所述中央处理器也可以表述为中央处理电路或者中央处理芯片。对通信协议以及通信数据进行处理的功能可以内置在处理器中，也可以以软件程序的形式存储在存储器中，由处理器执行软件程序以实现基带处理功能。
存储器主要用于存储软件程序和数据。存储器1712可以是独立存在,与处理器1711相连。可选的,存储器1712可以和处理器1711集成在一起,例如集成在一个芯片之内。其中,存储器1712能够存储执行本申请实施例的技术方案的程序代码,并由处理器1711来控制执行,被执行的各类计算机程序代码也可被视为是处理器1711的驱动程序。
图17仅示出了一个存储器和一个处理器。在实际的终端设备中,可以存在多个处理器和多个存储器。存储器也可以称为存储介质或者存储设备等。存储器可以为与处理器处于同一芯片上的存储元件,即片内存储元件,或者为独立的存储元件,本申请实施例对此不做限定。
收发器1713可以用于支持第二设备与终端之间射频信号的接收或者发送,收发器1713可以与天线1715相连。收发器1713包括发射机Tx和接收机Rx。具体地,一个或多个天线1715可以接收射频信号,该收发器1713的接收机Rx用于从天线接收所述射频信号,并将射频信号转换为数字基带信号或数字中频信号,并将该数字基带信号或数字中频信号提供给所 述处理器1711,以便处理器1711对该数字基带信号或数字中频信号做进一步的处理,例如解调处理和译码处理。此外,收发器1713中的发射机Tx还用于从处理器1711接收经过调制的数字基带信号或数字中频信号,并将该经过调制的数字基带信号或数字中频信号转换为射频信号,并通过一个或多个天线1715发送所述射频信号。具体地,接收机Rx可以选择性地对射频信号进行一级或多级下混频处理和模数转换处理以得到数字基带信号或数字中频信号,所述下混频处理和模数转换处理的先后顺序是可调整的。发射机Tx可以选择性地对经过调制的数字基带信号或数字中频信号时进行一级或多级上混频处理和数模转换处理以得到射频信号,所述上混频处理和数模转换处理的先后顺序是可调整的。数字基带信号和数字中频信号可以统称为数字信号。
收发器也可以称为收发单元、收发机、收发装置等。可选的,可以将收发单元中用于实现接收功能的器件视为接收单元,将收发单元中用于实现发送功能的器件视为发送单元,即收发单元包括接收单元和发送单元,接收单元也可以称为接收机、输入口、接收电路等,发送单元可以称为发射机、发射器或者发射电路等。
需要说明的是,图17所示第二设备具体可以用于实现图1A至图12所示方法实施例中网络设备所实现的步骤,并实现网络设备对应的技术效果,图17所示第二设备的具体实现方式,均可以参考图1A至图12所示方法实施例中的叙述,此处不再一一赘述。
请参阅图18，本申请实施例提供了另一种第二设备。为了便于说明，仅示出了与本申请实施例相关的部分，具体技术细节未揭示的，请参照本申请实施例方法部分。该第二设备可以为手机、平板电脑、个人数字助理(personal digital assistant,PDA)、销售终端设备(point of sales,POS)、车载电脑等任意终端设备，以终端设备为手机为例：
图18示出的是与本申请实施例提供的终端设备相关的手机的部分结构的框图。参考图18,手机包括:射频(radio frequency,RF)电路1810、存储器1820、输入单元1830、显示单元1840、传感器1850、音频电路1860、无线保真(wireless fidelity,WiFi)模块1870、处理器1880、以及电源1890等部件。本领域技术人员可以理解,图18中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图18对手机的各个构成部件进行具体的介绍:
RF电路1810可用于收发信息或通话过程中,信号的接收和发送,特别地,将基站的下行信息接收后,给处理器1880处理;另外,将设计上行的数据发送给基站。通常,RF电路1810包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(low noise amplifier,LNA)、双工器等。此外,RF电路1810还可以通过无线通信与网络和其他设备通信。上述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(global system of mobile communication,GSM)、通用分组无线服务(general packet radio service,GPRS)、码分多址(code division multiple access,CDMA)、宽带码分多址(wideband code division multiple access,WCDMA)、长期演进(long term evolution,LTE)、电子邮件、短消息服务(short messaging service,SMS)等。
存储器1820可用于存储软件程序以及模块,处理器1880通过运行存储在存储器1820的 软件程序以及模块,从而执行手机的各种功能应用以及数据处理。存储器1820可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器1820可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
输入单元1830可用于接收输入的数字或字符信息,以及产生与手机的用户设置以及功能控制有关的键信号输入。具体地,输入单元1830可包括触控面板1831以及其他输入设备1832。触控面板1831,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板1831上或在触控面板1831附近的操作),并根据预先设定的程式驱动相应的连接装置。可选的,触控面板1831可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器1880,并能接收处理器1880发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板1831。除了触控面板1831,输入单元1830还可以包括其他输入设备1832。具体地,其他输入设备1832可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。
显示单元1840可用于显示由用户输入的信息或提供给用户的信息以及手机的各种菜单。显示单元1840可包括显示面板1841，可选的，可以采用液晶显示器(liquid crystal display,LCD)、有机发光二极管(organic light-emitting diode,OLED)等形式来配置显示面板1841。进一步的，触控面板1831可覆盖显示面板1841，当触控面板1831检测到在其上或附近的触摸操作后，传送给处理器1880以确定触摸事件的类型，随后处理器1880根据触摸事件的类型在显示面板1841上提供相应的视觉输出。虽然在图18中，触控面板1831与显示面板1841是作为两个独立的部件来实现手机的输入和输出功能，但是在某些实施例中，可以将触控面板1831与显示面板1841集成而实现手机的输入和输出功能。
手机还可包括至少一种传感器1850,比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示面板1841的亮度,接近传感器可在手机移动到耳边时,关闭显示面板1841和/或背光。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手机还可配置的陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。
音频电路1860、扬声器1861、传声器1862可提供用户与手机之间的音频接口。音频电路1860可将接收到的音频数据转换成电信号后，传输到扬声器1861，由扬声器1861转换为声音信号输出；另一方面，传声器1862将收集的声音信号转换为电信号，由音频电路1860接收后转换为音频数据，再将音频数据输出至处理器1880处理后，经RF电路1810发送给比如另一手机，或者将音频数据输出至存储器1820以便进一步处理。
WiFi属于短距离无线传输技术,手机通过WiFi模块1870可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图18示出了WiFi模块1870,但是可以理解的是,其并不属于手机的必须构成。
处理器1880是手机的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器1820内的软件程序和/或模块,以及调用存储在存储器1820内的数据,执行手机的各种功能和处理数据,从而对手机进行整体监控。可选的,处理器1880可包括一个或多个处理单元;优选的,处理器1880可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器1880中。
手机还包括给各个部件供电的电源1890(比如电池),优选的,电源可以通过电源管理系统与处理器1880逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
尽管未示出,手机还可以包括摄像头、蓝牙模块等,在此不再赘述。
在本申请实施例中,该终端设备所包括的处理器1880可以执行前述图1A至图12所示实施例中的功能,此处不再赘述。
本申请实施例还提供一种存储一个或多个计算机执行指令的计算机可读存储介质,当计算机执行指令被处理器执行时,该处理器执行如前述实施例中第一设备/第二设备/第三设备可能的实现方式所述的方法。
本申请实施例还提供一种存储一个或多个计算机执行指令的计算机程序产品（或称计算机程序），当计算机程序产品被处理器执行时，该处理器执行上述第一设备/第二设备/第三设备可能实现方式的方法。
本申请实施例还提供了一种芯片系统，该芯片系统包括至少一个处理器，用于支持终端设备实现上述第一设备/第二设备/第三设备可能的实现方式中所涉及的功能。可选的，所述芯片系统还包括接口电路，所述接口电路为所述至少一个处理器提供程序指令和/或数据。在一种可能的设计中，该芯片系统还可以包括存储器，所述存储器用于保存该终端设备必要的程序指令和数据。该芯片系统可以由芯片构成，也可以包含芯片和其他分立器件。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,read-only memory)、随机存取存储器(RAM,random access memory)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (31)

  1. 一种通信方法,其特征在于,所述方法应用于第一设备,所述方法包括:
    获取第一结果,所述第一结果为第二设备侧模型推理公共数据集得到的结果;
    基于所述公共数据集与所述第一结果更新第一模型以得到第二模型,所述第一模型为所述第一设备的本地模型;
    向第三设备发送第二模型;
    接收所述第三设备发送的第三模型,所述第三模型由所述第二模型处理得到;
    基于所述第三模型与所述公共数据集获取第二结果,所述第二结果用于更新所述第二设备侧模型。
  2. 根据权利要求1所述的方法,其特征在于,所述获取第一结果,包括:
    接收来自第二设备的第一结果,所述第一结果为所述第二设备使用所述第二设备侧模型推理所述公共数据集得到的结果。
  3. 根据权利要求1所述的方法,其特征在于,所述获取第一结果,包括:
    接收来自所述第二设备的所述第二设备侧模型;
    使用所述第二设备侧模型推理所述公共数据集得到所述第一结果。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述方法还包括:
    向第二设备发送所述第二结果,所述第二结果用于所述第二设备更新所述第二设备侧模型。
  5. 根据权利要求1至3中任一项所述的方法,其特征在于,所述方法还包括:
    基于所述第二结果更新所述第二设备侧模型;
    向所述第二设备发送更新后的第二设备侧模型。
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述方法还包括:
    向所述第二设备发送指示信息,所述指示信息用于所述第一设备与所述第二设备同步所述公共数据集,所述同步对应的操作包括以下至少一项:增加、删除、修改;
    接收所述第二设备发送的确认信息,所述确认信息用于同步所述公共数据集。
  7. 一种通信方法,其特征在于,所述方法应用于第二设备,所述方法包括:
    获取第一模型,所述第一模型基于所述第二设备的第一信息与第二模型得到,所述第一模型为所述第二模型的一部分;所述第一信息包括能力信息和/或业务需求信息。
  8. 根据权利要求7所述的方法,其特征在于,所述第二模型包括N层第一网络,所述N层第一网络中至少一层第一网络包括两个以上并行的子网络,所述第一模型包括N层第二网络,第一数量小于第二数量,所述第一数量为所述N层第二网络中至少一层第二网络包括子网络的数量,所述第二数量为所述N层第一网络中对应所述至少一层第二网络的第一网络所包括子网络的数量,N为正整数。
  9. 根据权利要求8所述的方法,其特征在于,所述方法还包括:
    接收来自第一设备的第一参数,所述第一参数用于指示对所述子网络的调整;
    基于所述第一参数更新所述第一模型。
  10. 根据权利要求7至9中任一项所述的方法,其特征在于,所述方法还包括:
    接收来自第一设备的所述第二模型;
    所述获取第一模型,包括:
    基于所述第一信息从所述第二模型中确定所述第一模型。
  11. 根据权利要求10所述的方法,其特征在于,所述基于所述第一信息从所述第二模型中确定所述第一模型,包括:
    基于所述第一信息确定所述N层第一网络中各层第一网络的子网络;
    基于所述子网络构建所述第一模型。
  12. 根据权利要求7至11中任一项所述的方法,其特征在于,所述能力信息用于确定所述N层第二网络中各第二网络的子网络的数量,所述业务需求信息用于确定所述各第二网络中子网络。
  13. 根据权利要求7至9中任一项所述的方法,其特征在于,所述获取第一模型,包括:
    向所述第一设备发送所述第一信息,所述第一信息用于第一设备从所述第二模型中确定所述第一模型;
    接收所述第一设备发送的所述第一模型。
  14. 根据权利要求7至13中任一项所述的方法,其特征在于,所述方法还包括:
    基于本地数据训练所述第一模型得到第三模型;
    向所述第一设备发送所述第三模型,所述第三模型用于所述第一设备更新所述第二模型。
  15. 根据权利要求14所述的方法,其特征在于,所述方法还包括:
    获取第四模型,所述第四模型由所述第三模型更新所述第二模型得到;
    基于所述第四模型更新所述第三模型。
  16. 根据权利要求7所述的方法,其特征在于,所述第一模型包括N层第一网络,所述第二模型包括M层第一网络,N与M为正整数,且M小于或等于N。
  17. 一种通信方法,其特征在于,所述方法应用于第一设备,所述方法包括:
    接收第二设备发送的第一信息,所述第一信息包括所述第二设备的能力信息和/或业务需求信息,所述第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;
    确定所述第一设备与所述第二设备的学习模式,所述学习模式为联邦学习模式或蒸馏学习模式;
    向所述第二设备发送指示信息,所述指示信息用于指示所述学习模式;
    接收所述第二设备发送的第二信息,所述第二信息用于更新第一设备侧的模型。
  18. 根据权利要求17所述的方法,其特征在于,在所述联邦学习模式下,所述第二信息为所述模型的权重或梯度。
  19. 根据权利要求17所述的方法,其特征在于,在所述蒸馏学习模式下,所述第二信息为所述模型推理公共数据集得到的结果。
  20. 一种通信方法,其特征在于,所述方法应用于第二设备,所述方法包括:
    向第一设备发送第一信息,所述第一信息包括所述第二设备的能力信息和/或业务需求信息,所述第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;
    接收所述第一设备发送的指示信息,所述指示信息用于指示所述学习模式;
    基于所述学习模式向所述第一设备发送第二信息,所述第二信息用于更新所述第一设备侧的模型。
  21. 根据权利要求20所述的方法,其特征在于,在所述联邦学习模式下,所述第二信息为所述模型的权重或梯度。
  22. 根据权利要求20所述的方法,其特征在于,在所述蒸馏学习模式下,所述第二信息为所述模型推理公共数据集得到的结果。
  23. 一种第一设备,其特征在于,所述第一设备包括:
    获取第一结果,所述第一结果为第二设备侧模型推理公共数据集得到的结果;
    基于所述公共数据集与所述第一结果更新第一模型以得到第二模型,所述第一模型为所述第一设备的本地模型;
    向第三设备发送第二模型;
    接收所述第三设备发送的第三模型,所述第三模型由所述第二模型处理得到;
    基于所述第三模型与所述公共数据集获取第二结果,所述第二结果用于更新所述第二设备侧模型。
  24. 一种第二设备,其特征在于,所述第二设备包括:
    获取第一模型,所述第一模型基于所述第二设备的第一信息与第二模型得到,所述第一模型为所述第二模型的一部分;所述第一信息包括能力信息和/或业务需求信息。
  25. 一种第一设备,其特征在于,所述第一设备包括:
    接收第二设备发送的第一信息,所述第一信息包括所述第二设备的能力信息和/或业务需求信息,所述第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;
    确定所述第一设备与所述第二设备的学习模式,所述学习模式为联邦学习模式或蒸馏学习模式;
    向所述第二设备发送指示信息,所述指示信息用于指示所述学习模式;
    接收所述第二设备发送的第二信息,所述第二信息用于更新第一设备侧的模型。
  26. 一种第二设备,其特征在于,所述第二设备包括:
    向第一设备发送第一信息,所述第一信息包括所述第二设备的能力信息和/或业务需求信息,所述第一信息用于确定模型的学习模式为联邦学习模式或蒸馏学习模式;
    接收所述第一设备发送的指示信息,所述指示信息用于指示所述学习模式;
    基于所述学习模式向所述第一设备发送第二信息,所述第二信息用于更新所述第一设备侧的模型。
  27. 一种第一设备,其特征在于,包括:处理器,所述处理器与存储器耦合,所述存储器用于存储程序或指令,当所述程序或指令被所述处理器执行时,使得所述第一设备执行如权利要求1-6、17-19中任一项所述的方法。
  28. 一种第二设备,其特征在于,包括:处理器,所述处理器与存储器耦合,所述存储器用于存储程序或指令,当所述程序或指令被所述处理器执行时,使得所述第二设备执行如权利要求7-16、20-22中任一项所述的方法。
  29. 一种通信系统,其特征在于,所述通信系统包括如权利要求27所述的第一设备,和/或如权利要求28所述的第二设备。
  30. 一种计算机可读存储介质,其特征在于,所述介质存储有指令,当所述指令被计算机执行时,实现权利要求1至22中任一项所述的方法。
  31. 一种计算机程序产品,其特征在于,包括指令,当所述指令在计算机上运行时,使得计算机执行如权利要求1至22中任一项所述的方法。
PCT/CN2022/123355 2022-09-30 2022-09-30 一种通信方法及相关设备 WO2024065709A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/123355 WO2024065709A1 (zh) 2022-09-30 2022-09-30 一种通信方法及相关设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/123355 WO2024065709A1 (zh) 2022-09-30 2022-09-30 一种通信方法及相关设备

Publications (1)

Publication Number Publication Date
WO2024065709A1 true WO2024065709A1 (zh) 2024-04-04

Family

ID=90475671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123355 WO2024065709A1 (zh) 2022-09-30 2022-09-30 一种通信方法及相关设备

Country Status (1)

Country Link
WO (1) WO2024065709A1 (zh)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210166083A1 (en) * 2019-11-29 2021-06-03 EMC IP Holding Company LLC Methods, devices, and computer program products for model adaptation
WO2022012257A1 (zh) * 2020-07-13 2022-01-20 华为技术有限公司 通信的方法及通信装置
WO2022022274A1 (zh) * 2020-07-31 2022-02-03 华为技术有限公司 一种模型训练方法及装置
WO2022041947A1 (zh) * 2020-08-24 2022-03-03 华为技术有限公司 一种更新机器学习模型的方法及通信装置
WO2022057510A1 (zh) * 2020-09-21 2022-03-24 华为技术有限公司 协同推理方法及通信装置
WO2022169136A1 (en) * 2021-02-02 2022-08-11 Samsung Electronics Co., Ltd. Method, system and apparatus for federated learning
CN113379042A (zh) * 2021-07-23 2021-09-10 支付宝(杭州)信息技术有限公司 保护数据隐私的业务预测模型训练方法及装置
CN114154643A (zh) * 2021-11-09 2022-03-08 浙江师范大学 基于联邦蒸馏的联邦学习模型的训练方法、系统和介质

Similar Documents

Publication Publication Date Title
Ateya et al. Study of 5G services standardization: Specifications and requirements
CN110958636B (zh) Csi报告的上报方法、终端设备及网络设备
CN110120878A (zh) 获取链路质量的方法和装置
CN110049519A (zh) 会话建立方法、会话转移方法、设备和存储介质
WO2022048505A1 (zh) 多流关联传输的方法、装置及系统
CN111263410B (zh) 一种资源预留方法及设备
US20230180068A1 (en) Electronic apparatus, radio communication method, and computer-readable storage medium
CN111436139B (zh) 一种模式控制方法、终端和网络侧设备
WO2019196826A1 (zh) 旁链路信息的传输方法及设备
CN112788783B (zh) 一种中继连接建立方法及设备
US20230026021A1 (en) Relay ue determining method and device
CN110072279A (zh) 一种网络注册模式切换的方法及终端
WO2024055642A1 (zh) 确定调度信息类型的方法、装置、网络设备及存储介质
CN112637953A (zh) 一种切换bwp的方法及终端设备
WO2024065709A1 (zh) 一种通信方法及相关设备
CN111835547A (zh) 服务质量QoS管理方法及相关设备
WO2022001753A1 (zh) 一种通信方法及装置
CN109587260A (zh) 一种资源获取方法、装置以及系统
CN105430700A (zh) 基于多虚拟接入点关联接入的无线局域网移动性管理方法
EP4185011A1 (en) Relay communication method, and communication apparatus
CN110621022A (zh) 一种资源指示方法、装置及系统
WO2024093717A1 (zh) 电子设备、用于无线通信的方法以及计算机可读存储介质
CN110351889B (zh) 一种连接建立方法和设备
CN113453160B (zh) 一种通信方法及相关设备
WO2023125598A1 (zh) 一种通信方法及通信装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22960328

Country of ref document: EP

Kind code of ref document: A1