WO2023226650A1 - Model training method and apparatus

Model training method and apparatus

Info

Publication number
WO2023226650A1
WO2023226650A1 (PCT/CN2023/089751)
Authority
WO
WIPO (PCT)
Prior art keywords: model, processing, processing node, node, public
Application number
PCT/CN2023/089751
Other languages: English (en), French (fr)
Inventors: 童文, 马江镭, 李榕, 王坚, 张公正
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023226650A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the embodiments of the present application relate to the field of artificial intelligence, and more specifically, to a method and device for model training.
  • With the advent of the big data era, each device generates a large amount of raw data in various forms every day. To make full use of this data for model training, the two most typical training architectures are centralized learning (CL) and federated learning (FL).
  • federated learning is a distributed machine learning method.
  • local data on multiple edge devices can be used to train models on those edge devices, and the trained models are then uploaded to the central server.
  • the central server can serve as a processing node to aggregate models from the multiple edge devices to generate a common model, and deliver the common model to the multiple edge devices, so that the multiple edge devices can update the common model based on local data.
  • the processing nodes used to generate the public model are fixed.
  • the public model can only be generated through the central server.
  • using the central server as a processing node may not be the optimal solution; for example, as the network topology changes and the data generated by edge devices changes, a more suitable processing node may emerge.
  • This application provides a method and device for model training.
  • the method can determine the processing nodes for the next round of model processing according to actual needs before the next round of model processing, so as to adapt to changes in application scenarios.
  • in a first aspect, a method of model processing is provided. The method can be executed by a processing device, or by a chip, chip system, or circuit configured in the processing device, which can be collectively referred to as a processing node.
  • This application does not limit this. For ease of description, the following takes execution by the first processing node as an example.
  • the method may include: a first processing node obtains at least one first model; the first processing node processes the at least one first model to generate a first public model; and the first processing node determines a second processing node, where the second processing node is the processing node for the next round of model processing, and the first public model is obtained by the second processing node before the next round of model processing.
  • the preferred processing node may change. Therefore, by determining an appropriate processing node for the next round of model processing, the first processing node can better adapt to changes in application scenarios, thereby improving the performance of model training.
  • the first processing node and the second processing node are different processing nodes, and the method further includes: the first processing node sending the first public model to the second processing node.
  • when the first processing node and the second processing node are different processing nodes, the first processing node can transmit the generated common model to the second processing node.
  • the second processing node can then update and optimize the public model based on the public model obtained in this round of model processing, improving the efficiency and performance of model training.
  • since the second processing node can be arbitrarily designated by the first processing node according to actual needs, the public model can be continuously transmitted between different nodes in the network over multiple rounds of model processing.
  • since the first processing node can send the public model to the second processing node after generating it, without sending the public model to all participating nodes, communication overhead is reduced.
  • the first processing node and the second processing node are the same processing node.
  • the processing node of this round of model processing (the first processing node) and the processing node of the next round of model processing (the second processing node) can be the same processing node.
  • in this case, the first processing node does not need to send the first public model to the second processing node.
  • the first processing node determines the second processing node, including: the first processing node determines the second processing node based on the indication of the first public model.
  • the first processing node may determine the second processing node based on the indication of the first public model.
  • the first public model may indicate the second processing node to the first processing node according to the characteristics of the first public model.
  • the characteristic of the first public model may be the size of the parameters of the first public model.
  • for example, if the parameters of the first public model are large, the first public model can indicate that the first processing node should determine a node with strong computing capability as the second processing node; for another example, the characteristic of the first public model may be the current functional characteristic of the first public model.
  • for example, the current function of the first public model is a classification function; if a certain node in the network has local data for a classification learning task, the first public model can indicate that the first processing node should determine that node as the second processing node.
  • a first public model may indicate a second processing node to a first processing node based on parameters of the first public model.
  • the parameters of the first public model include corresponding routing information, and the routing information can be used to indicate the processing node of the next round of model processing. Therefore, the first processing node can determine the second processing node based on the routing information in the first public model.
  • the first processing node acquiring at least one first model includes: the first processing node receiving the first model from at least one participating node.
  • the first processing node may obtain the at least one first model by receiving a first model from at least one participating node; in other words, the at least one first model may include the first model from the at least one participating node, so that the first processing node can fully utilize the first models from the participating nodes to perform model processing and generate a public model with better performance.
  • before the first processing node receives the first model from the at least one participating node, the method further includes: the first processing node sending indication information to the at least one participating node, the indication information being used to instruct the at least one participating node to send its first model to the first processing node.
  • the indication information can be used to trigger the at least one participating node to send (upload) its first model to the first processing node.
  • the participating node may have generated the first model before receiving the indication information; in other words, the participating node has the first model saved locally. In this case, if the participating node receives the indication information from the first processing node, it can upload the first model to the first processing node as indicated.
  • alternatively, the participating node may not yet have generated the first model when receiving the indication information; in other words, the participating node does not have the first model saved locally.
  • in this case, the participating node may generate the first model after receiving the indication information from the first processing node, and then upload the generated first model to the first processing node.
  • the first processing node acquires at least one first model, including: the first processing node generates a first model of the first processing node.
  • the first processing node can also obtain the at least one first model by generating a first model itself; in other words, the at least one first model can also include the first model generated by the first processing node. Thus, the first processing node can make full use of the first model of each node in the network to perform model processing and generate a public model with better performance.
  • the first processing node processes the at least one first model and generates a first public model, including: the first processing node performs aggregation processing on the at least one first model to generate the first public model.
  • the first processing node may perform aggregation processing on at least one first model to generate a first common model.
  • the aggregation processing can cause at least one first model to be fused into a public model with better performance, thereby improving the performance of the public model generated by the first processing node.
  • the first processing node performs aggregation processing on the at least one first model to generate the first public model, including: the first processing node processes the parameters of the at least one first model to generate the first public model.
  • the first processing node can process parameters of at least one first model, thereby generating a first public model with better performance.
  • the first processing node may generate the first public model by averaging the parameters of the at least one first model, where the values of the parameters of the first public model are the averages of the parameters of the at least one first model.
  • the first processing node may also generate the first public model by calculating other statistical values of parameters of the at least one first model.
  • the first processing node may generate the first public model by calculating the median of the parameters of the at least one first model.
  • in this case, the values of the parameters of the generated first public model are the medians of the parameters of the at least one first model.
  • the first processing node processes the parameters of the at least one first model to generate the first public model, including: the first processing node averages the parameters of the at least one first model to generate the first public model, where the values of the parameters of the first public model are the averages of the parameters of the at least one first model.
  • the first processing node can generate the first public model by averaging the parameters of the at least one first model, where the values of the parameters of the first public model are the averages of the parameters of the at least one first model.
  • the averaging process may be a weighted averaging process, that is, the first public model is generated by performing weighted averaging on the parameters of the at least one first model. In this case, the values of the parameters of the generated first public model are the weighted averages of the parameters of the at least one first model.
  • At least one first model has the same network structure.
  • the at least one first model has the same network structure, so that the first processing node can process the parameters of the at least one first model more conveniently.
  • the method further includes: the first processing node performs distillation processing on at least one first model, and the distillation processing causes the at least one first model to have the same network structure.
  • in order to make the at least one first model have the same network structure, the first processing node can perform distillation processing on the at least one first model.
  • the distillation processing can make the at least one first model have the same network structure, so that the first processing node can more conveniently process the parameters of the at least one first model.
  • the first processing node performs aggregation processing on the at least one first model to generate the first public model, including: the first processing node splices the at least one first model to generate the first public model.
  • the first processing node can generate a first public model with better performance by splicing at least one first model.
  • the network structure of at least one first model may be the same or different.
  • the first processing node may splice the input end and the output end of the at least one first model respectively, thereby realizing splicing of the at least one first model.
  • the first processing node may connect the input ends of the at least one first model through a single-layer perceptron, and merge the output ends of the at least one first model into a single-layer output, thereby realizing the splicing of the at least one first model.
  • At least one first model includes a second public model
  • the second public model is the public model obtained from the previous round of model processing.
  • the first processing node can generate a public model for this round based on the second public model. That is to say, the first processing node can further optimize the public model based on the public model obtained in the previous round to further improve the performance of the public model.
  • the method further includes: the first processing node receiving the second public model from the third processing node, and the third processing node is the processing node of the previous round of model processing.
  • the first processing node when the first processing node and the third processing node are different processing nodes, the first processing node can receive the second public model from the third processing node, and further, the first processing node can The public model is optimized based on the two public models to further improve the performance of the public model.
  • the first processing node and the third processing node may be the same processing node, that is, the processing node of the previous round of model processing and the processing node of this round of model processing are determined to be the same node.
  • the second processing node is determined based on one or more of the following information: the topology of the network, the data quality of the second processing node, and the computing capability of the second processing node.
  • the second processing node may be determined according to the topological structure of the network. For example, if a certain node is in a more favorable position in the network topology (for example, the node at this position facilitates communication with other nodes in the network), then the node can be determined as the second processing node. This will help improve the transmission efficiency of the model in the network.
  • the second processing node may be determined according to the data quality of the second processing node. For example, if the data quality of a certain node is relatively high, the node can be determined as the second processing node. For another example, if the data quality of nodes in a certain area of the network is relatively high, a node from this area can be determined as the second processing node. In this way, it is beneficial to improve the performance of the public model generated by the second processing node.
  • the second processing node may be determined based on the computing capability of the second processing node. For example, the computing power of each node can be compared, thereby determining the node with stronger computing power as the second processing node. This will help improve the efficiency of model training.
  • one of the above three pieces of information can be considered individually based on the actual task requirements, or any two or more pieces of information can be considered comprehensively. This helps determine a suitable second processing node for the specific application scenario, thereby improving the performance of model training.
  • in a second aspect, a model processing method is provided, which can be executed by a processing node, or by a chip, chip system, or circuit configured in the processing node. This application does not limit this. For convenience of description, the following takes execution by the first processing node as an example.
  • the method may include: the first processing node acquires at least one first model; the first processing node processes the at least one first model to generate a first public model, where the at least one first model includes a second public model, and the second public model is the public model obtained from the previous round of model processing.
  • when the first processing node is the processing node of the last round of model processing, the first model may include the common model (the second common model) obtained by the previous round of model processing, so that the processing node of the last round can perform the final model processing based on the public model obtained from the previous round and obtain a high-performance public model.
  • in a third aspect, a model training device is provided, which includes an acquisition unit and a processing unit.
  • the acquisition unit is used to acquire at least one first model; the processing unit is used to process the at least one first model to generate a first public model; the processing unit is further used to determine a second processing node, where the second processing node is the processing node for the next round of model processing, and the first public model is obtained by the second processing node before the next round of model processing.
  • the device and the second processing node are different processing nodes, and the device further includes a sending unit configured to send the first public model to the second processing node.
  • the obtaining unit and the sending unit are the same unit, or the obtaining unit includes the sending unit.
  • the device and the second processing node are the same processing node.
  • the processing unit is further configured to determine the second processing node based on the indication of the first public model.
  • the acquisition unit is further configured to receive the first model from at least one participating node.
  • the device further includes a sending unit, where the sending unit is used to send indication information to at least one participating node, and the indication information is used to instruct the at least one participating node to send its first model to the device.
  • the obtaining unit and the sending unit are the same unit, or the obtaining unit includes the sending unit.
  • the acquisition unit is also used to generate a first model of the device.
  • the acquisition unit and the processing unit are the same unit, or the acquisition unit includes the processing unit.
  • the processing unit is further configured to perform aggregation processing on at least one first model to generate a first common model.
  • the processing unit is further configured to process parameters of at least one first model to generate a first public model.
  • the processing unit is further configured to average parameters of at least one first model to generate a first public model, wherein the parameters of the first public model are The value is the average of the parameters of at least one first model.
  • At least one first model has the same network structure.
  • the processing unit is further configured to perform distillation processing on at least one first model, and the distillation processing causes the at least one first model to have the same network structure.
  • the processing unit is further configured to splice at least one first model to generate a first common model.
  • At least one first model includes a second public model
  • the second public model is the public model obtained from the previous round of model processing.
  • the acquisition unit is also configured to receive the second public model from the third processing node, and the third processing node is the processing node of the previous round of model processing.
  • the second processing node is determined based on one or more of the following information: the topology of the network, the data quality of the second processing node, and the computing capability of the second processing node.
  • the acquisition unit includes the sending unit and/or the processing unit; or the acquisition unit and the sending unit or the processing unit are the same unit; or the acquisition unit and the sending unit or the processing unit are integrated in the same unit.
  • the processing unit may be a processor, a processing circuit or a logic circuit, etc.
  • the sending unit may be a transmitter, a transmitting circuit, a transceiver, a transceiver circuit, an input/output interface or circuit, etc.
  • a fourth aspect provides a model training device, which is used to execute the method provided in the second aspect.
  • the device may include a module for performing the method provided in the second aspect.
  • in a fifth aspect, a computer-readable storage medium is provided, which stores program code for execution by a device.
  • the program code includes instructions for executing the method in any possible implementation of the first aspect or the second aspect.
  • in a sixth aspect, a computer program product is provided.
  • when the computer program product is run on a computer, it causes the computer to execute the method in any possible implementation of the first aspect or the second aspect.
  • a seventh aspect provides a communication device, which is used to perform the method provided in the first aspect or the second aspect.
  • the device may include units and/or modules for executing the method provided by any implementation of the first aspect or the second aspect, such as a processing unit and/or a communication unit.
  • the device is a processing device.
  • the communication unit may be a transceiver, or an input/output interface; the processing unit may be at least one processor.
  • the transceiver may be a transceiver circuit.
  • the input/output interface may be an input/output circuit.
  • the device is a chip, chip system, or circuit for use in a processing device.
  • the communication unit may be an input/output interface, interface circuit, output circuit, input circuit, pin or related circuit on the chip, chip system or circuit etc.
  • the processing unit may be at least one processor, processing circuit or logic circuit, etc.
  • An eighth aspect provides a communication device, which includes: at least one processor configured to execute computer programs or instructions stored in a memory to execute the method provided by any implementation of the first aspect or the second aspect.
  • the communication device further includes a memory for storing programs.
  • the device is a processing device.
  • the device is a chip, chip system, or circuit for use in a processing device.
  • in a ninth aspect, this application provides a processor for executing the methods provided in the above aspects.
  • operations such as output, reception, and input performed by the processor may be understood as output and input operations of the processor, and may also be understood as transmitting and receiving operations performed by radio frequency circuits and antennas; this application does not limit this.
  • in a tenth aspect, a chip is provided, which includes a processor and a communication interface.
  • the processor reads instructions stored in the memory through the communication interface and executes the method provided by any implementation of the first aspect or the second aspect.
  • the chip also includes a memory, in which computer programs or instructions are stored.
  • the processor is used to execute the computer programs or instructions stored in the memory, so as to perform the method provided by any implementation of the first aspect or the second aspect.
  • in an eleventh aspect, a chip is provided, which includes a logic circuit and a communication interface.
  • the communication interface is used to receive data and/or information to be processed and transmit the data and/or information to be processed to the logic circuit.
  • the logic circuit is used to execute the method provided by any implementation of the first aspect or the second aspect.
  • Figure 1 is a schematic diagram of a communication system provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of a network topology suitable for this application.
  • Figure 3 is a schematic diagram of a network topology suitable for federated learning.
  • Figure 4 is a schematic diagram of an example of a model training method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of an example of splicing at least one first model by the first processing node.
  • Figure 6 is a schematic diagram of a possible implementation process of the model training method provided by the embodiment of the present application.
  • Figure 7 is a schematic block diagram of a model training device provided by an embodiment of the present application.
  • Figure 8 is a schematic block diagram of a communication device provided by an embodiment of the present application.
  • the technical solution provided by this application can be applied to various communication systems, such as a fifth generation (5G) or new radio (NR) system, a sixth generation (6G) system, a long term evolution (LTE) system, an LTE frequency division duplex (FDD) system, or an LTE time division duplex (TDD) system.
  • the technical solution provided by this application can also be applied to device-to-device (D2D) communication, vehicle-to-everything (V2X) communication, machine-to-machine (M2M) communication, machine-type communication (MTC), and Internet of things (IoT) communication systems, or other or future communication systems.
  • Model processing refers to the process of taking one or more models as input and performing corresponding processing operations on the one or more models. In the embodiments of the present application, model processing can be performed for multiple rounds.
  • a processing node represents a node that processes at least one first model to generate a common model.
  • the processing performed on the first model may be, for example, aggregation processing, which can enable at least one first model to be merged into a common model.
  • Participating nodes represent nodes other than processing nodes in this round of model processing.
  • the participating node may be used to provide the first model to the processing node.
  • the first model represents the model that the processing nodes of this round of model processing rely on when generating the public model of this round.
  • the first model can be a model generated by the participating nodes of this round based on local data, or it can also be a model generated by the processing nodes of this round based on local data, or it can also be a public model obtained by the previous round of model processing.
  • the first model can be provided by the participating nodes of this round, or can also be generated by the processing node itself of this round.
  • the public model represents a model generated by processing at least one first model.
  • the public model obtained in the last round of model processing can be used as the final output, and then the final output public model can be used in corresponding actual tasks.
  • the public model can also be called the global model.
  • FIG 1 is a schematic diagram of a communication system 100 provided by an embodiment of the present application.
  • the communication system 100 may include two or more devices (nodes) participating in model training, such as device #1 to device #6 shown in Figure 1 .
  • the devices participating in model training may be terminal devices (for example, device #1 to device #4) or network devices (for example, device #5 and device #6).
  • the terminal equipment in the embodiments of this application may refer to user equipment, an access terminal, a user unit, a user station, a mobile station, a remote station, a remote terminal, a mobile device, a user terminal, a terminal, wireless communication equipment, a user agent, or a user device.
  • the terminal device may also be a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with wireless communication functions, a computing device or other processing device connected to a wireless modem, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in smart grid, a wireless terminal in transportation safety, a wireless terminal in smart city, or a wireless terminal in smart home.
  • the terminal device may also be vehicle-mounted equipment, drone equipment, a wearable device, terminal equipment in a future 6G network, or terminal equipment in a future evolved public land mobile network (PLMN), etc.
  • wearable devices are not just hardware devices, but also achieve powerful functions through software support, data interaction, and cloud interaction.
  • broadly defined wearable smart devices include full-featured, large-sized devices that can achieve complete or partial functions without relying on smartphones, such as smart watches or smart glasses, and devices that focus on only a certain type of application function and need to be used in cooperation with other devices such as smartphones.
  • the network device in the embodiment of this application may be a device used to communicate with a terminal device.
  • the network device may be a macro base station, a micro base station (also called a small station), a satellite, or a radio network controller (RNC).
  • the network device may also be a Node B (NB), a base station controller (BSC), a base transceiver station (BTS), a home base station (for example, a home evolved NodeB or a home Node B, HNB), a baseband unit (BBU), an AP in a WiFi system, a wireless relay node, a wireless backhaul node, a transmission point (TP), or a transmission and reception point (TRP), etc.
  • the network device may be a relay station, an access point, a vehicle-mounted device, a wearable device, a network device in a future evolved PLMN network, etc., which are not limited by the embodiments of this application.
  • the embodiments of this application do not limit the specific technologies and specific equipment forms used by network equipment.
  • network equipment and terminal equipment can be deployed on land, including indoors or outdoors, handheld or vehicle-mounted; they can also be deployed on water; they can also be deployed on aircraft, balloons and satellites in the air.
  • the scenarios in which network devices and terminal devices are located are not limited.
  • Figure 1 is only a simplified schematic diagram for ease of understanding. This application does not limit the number of devices participating in model training.
  • the communication system may also include other network devices and/or terminal devices, which are not drawn in Figure 1.
  • FIG. 2 shows a schematic diagram of a possible network topology of the above communication system 100.
  • the topology of a network can also be understood as the connection method between nodes in the network, or it can also be understood as the connectivity conditions between nodes in the network.
  • the connection method may be a wired connection method or a wireless connection method, which is not limited in this application.
  • the network may include nodes N1 to N6.
  • the nodes N1 to N6 may correspond to the above-described devices #1 to #6.
  • each node is able to communicate with at least one other node.
  • node N1 can communicate with N2, N3, N4, N5, and N6; node N2 can communicate with N1, N3, and N5; node N3 can communicate with N1, N2, N4, N5, and N6; the communication situations of nodes N4 to N6 with other nodes are similar to those of nodes N1 to N3 and will not be described again here.
  • the schematic diagram of the network topology shown in Figure 2 can be a schematic diagram of the network topology of the network at a certain moment. In practical applications, the network topology can also change dynamically.
  • any node can be determined as a processing node, and the processing node can be dynamically changed during the model training process.
  • processing node is used to implement at least one of the following functions:
  • Function 1: Receive a model from at least one node in the network, process the received model, and send it to other nodes in the network.
  • the processing node can aggregate the received model and send it to other nodes in the network.
  • Function 2: Generate a model and receive models from at least one node in the network, then process the generated model and the received models before sending the result to other nodes in the network.
  • for example, the processing node can aggregate the generated model and the received models, and send the result to other nodes in the network.
  • Unipolarity: a node has unipolarity, which means that the node can take at least one model as input and process (for example, aggregate) the at least one model before outputting it, where the number of output models is one.
  • nodes with unipolarity can also be called unipolar nodes.
  • Diversity: a node has diversity, which means that the node can take at least one model as input and perform calculation processing (for example, distillation processing) on each model in the at least one model before outputting it.
  • the number of input models is multiple, and the number of output models can also be multiple.
  • nodes with diversity can also be called diversity nodes (plurality nodes).
  • Edge devices are widely distributed in various regions and corners of the world. These devices will continuously generate and accumulate huge amounts of raw data at a rapid speed. If the central server collects raw data from all edge devices, it will inevitably bring huge communication losses and computing power requirements.
  • federated learning is a distributed machine learning method.
  • the local data of multiple edge devices can be used to train the model on those edge devices, and the trained models are then uploaded to the central server.
  • the central server can serve as a processing node to aggregate the models from the multiple edge devices to generate a public model, and deliver the public model to the multiple edge devices, so that the multiple edge devices can update the public model based on local data.
  • FIG 3 is a schematic diagram of a network topology suitable for federated learning.
  • the network includes a processing node Nm and other nodes N1 to N6.
  • the nodes N1 to N6 can be called participating nodes.
  • the processing node may be, for example, a central server, and the participating nodes may be, for example, edge devices.
  • the network topology envisioned by federated learning is a fixed star structure, and the central processing node is indispensable.
  • the FedAvg algorithm is a basic algorithm in the field of federated learning.
  • the algorithm can include the following steps:
  • Step 1: The processing node initializes the public model and sends the public model to all participating nodes.
  • Step 2: In round t ∈ [1, T], each participating node k ∈ [1, K] trains the received public model on its local data set for E epochs (in other words, performs E iterative updates) to obtain a local training model $w_t^k$, and reports the local training model to the processing node.
  • the initial value of t is 1.
  • Step 3: The processing node aggregates all or part of the received models to obtain a public model $\bar{w}_t$.
  • the processing node can obtain the public model by calculating a weighted average of the parameters of all or part of the received models. Specifically, assuming that the set of participating nodes that upload local training models in the t-th round is $S_t$, the processing node can obtain the public model through the following rule:

        $\bar{w}_t = \sum_{k \in S_t} \frac{D_k}{\sum_{j \in S_t} D_j} \, w_t^k$

  • where $D_k$ represents the number of samples of the participating node with index $k$ in the set $S_t$, and $w_t^k$ is the local training model reported by node $k$. Subsequently, the processing node can send the obtained public model $\bar{w}_t$ to all participating nodes for a new round of training.
  • Step 4: Add 1 to the value of t and go to Step 2. Repeat Steps 2 and 3 until the model converges or the number of training rounds reaches the preset upper limit.
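  • as an illustration of Steps 2 to 4, the following is a minimal Python sketch of the FedAvg weighted aggregation, assuming each local training model is reported as a flat NumPy parameter vector; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def fedavg_aggregate(local_models, num_samples):
    """Weighted average of local models w_t^k with weights D_k / sum_j D_j."""
    total = float(sum(num_samples))
    return sum((d / total) * w for w, d in zip(local_models, num_samples))

# Three participating nodes report local models and their sample counts D_k.
local_models = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([2.0, 4.0])]
num_samples = [100, 300, 100]
public_model = fedavg_aggregate(local_models, num_samples)  # -> [2.4, 1.2]
```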
  • in federated learning, the processing nodes used to generate the public model are fixed. However, in different application scenarios, using fixed nodes as processing nodes may not be the optimal solution. For example, as the network topology changes and the data generated by each node changes, a more preferred processing node may appear.
  • this application provides a method and device for model training.
  • this method can determine the processing node for the next round of model processing according to actual needs before that round begins, so as to adapt to changes in application scenarios. That is to say, in the model training method provided by this application, the processing node can be dynamically changed during the model training process.
  • the processing node of this round of model processing can send the generated public model to the processing node of the next round of model training.
  • this architecture can also be considered a "model-follow-data" architecture.
  • the model training method provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings. The model training method provided by the embodiments of this application can be applied to the communication system shown in Figure 1 and the network topology shown in Figure 2.
  • Figure 4 is a schematic diagram of an example of a model training method provided by an embodiment of the present application.
  • the method 400 may include S410 to S430.
  • S410: The first processing node obtains at least one first model.
  • the first processing node obtains the first model, which can also be understood as the first processing node obtains relevant information of the first model.
  • the first processing node can obtain one or more of the following information about the first model: the parameter set of the first model, the structure of the neural network corresponding to the first model, and the operation rules of the first model parameters.
  • the parameter set of the first model may include training weights of the neural network corresponding to the first model.
  • the structure of the neural network corresponding to the model may be simply referred to as the network structure of the model.
  • the relevant information of the first model may be carried in forms such as model diagrams, model parameters, model tables, model algorithms, or databases; this application is not limited to this.
  • the first processing node may obtain at least one first model through at least one of the following methods.
  • the first processing node may receive the first model from at least one participating node, that is, the at least one first model may include the first model from at least one participating node.
  • the node N1 can receive the first model generated by using a local data set from at least one of the nodes N2 to N6.
  • the method 400 may also include: the first processing node sending indication information to at least one participating node, where the indication information may be used to instruct the at least one participating node to send (upload) its first model to the first processing node.
  • the participating node has generated the first model before receiving the instruction information, or in other words, the participating node has already saved the first model locally.
  • if the participating node receives the indication information from the first processing node, the first model can be uploaded to the first processing node as instructed by the indication information.
  • alternatively, the participating node may generate the first model after receiving the indication information from the first processing node, and then upload the generated first model to the first processing node.
  • the indication information is also used to indicate in what manner the first processing node generates the first public model.
  • the indication information may carry a label corresponding to Method 1 or Method 2.
  • the participating node that receives the indication information can carry the label in the information of the first model, and then send the first model (in other words, the information of the first model) to the first processing node.
  • the first processing node can then determine, according to the label carried in the information of the first model, whether to use Method 1 and/or Method 2 to process the first model to generate the first public model.
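  • purely as an illustration, the following sketch shows one hypothetical way the label could be carried; the message fields (method_label, parameters) are assumptions, not defined by the patent.

```python
from dataclasses import dataclass

@dataclass
class IndicationInfo:
    method_label: str  # e.g. "method_1" (parameter processing) or "method_2" (splicing)

@dataclass
class FirstModelInfo:
    parameters: list
    method_label: str

def upload_first_model(indication: IndicationInfo, parameters: list) -> FirstModelInfo:
    # The participating node carries the received label in the first model's
    # information, so the first processing node knows which method to apply.
    return FirstModelInfo(parameters=parameters, method_label=indication.method_label)
```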
  • the first processing node may generate a first model, that is, the at least one first model may include the first model generated by the first processing node.
  • the node N1 can generate the first model using the local data set.
  • the first processing node can receive the first model from at least one participating node, and can also generate the first model locally. Therefore, the first processing node can make full use of the first model of each node in the network for model training to generate a public model with better performance.
  • At least one first model includes a second public model, where the second public model is a public model obtained from the previous round of model processing.
  • the first model may include the first model from at least one participating node and/or the first model generated by the first processing node.
  • the at least one first model may also include the public model obtained from the previous round (round t-1) of model processing. Therefore, the first processing node can optimize the public model based on the public model obtained in the previous round to further improve the public model's performance.
  • processing node of the previous round of model processing is the third processing node.
  • the third processing node and the first processing node are different processing nodes.
  • the first processing node can receive the second public model from the third processing node; in other words, the second public model may be received from the third processing node.
  • the third processing node and the first processing node are the same processing node, or in other words, the processing node of the previous round of model processing and the processing node of this round of model processing are the same processing node.
  • the second public model may be the public model generated by the first processing node in the previous round of model processing.
  • S420: The first processing node processes the at least one first model to generate a first public model.
  • the first processing node may process the acquired at least one first model, thereby generating a first public model.
  • the first processing node may perform aggregation processing on at least one first model to generate a first common model.
  • the aggregation process can enable at least one first model to be fused into a common model with better performance.
  • the way in which the first processing node processes at least one first model to generate the first public model may include at least one of the following ways:
  • Method 1: The first processing node can process the parameters of the at least one first model to generate the first public model.
  • Method 2: The first processing node can splice the at least one first model to generate the first public model.
  • the first processing node may process parameters of at least one first model, thereby generating a first public model.
  • the first processing node may generate the first public model by averaging parameters of at least one first model, where the values of the parameters of the first public model are the values of the at least one first model. The average of the parameters of a model.
  • for example, the at least one first model includes a first model #1 and a first model #2, where the parameters of first model #1 are, for example, [a1, b1, c1], and the parameters of first model #2 are, for example, [a2, b2, c2]. Then the values of the parameters of the generated public model are [(a1+a2)/2, (b1+b2)/2, (c1+c2)/2].
  • the averaging process may be a weighted averaging process, that is, the first public model is generated by performing weighted averaging on the parameters of the at least one first model. In this case, the values of the parameters of the generated first public model are the weighted averages of the parameters of the at least one first model.
  • the first processing node can also generate the first public model by calculating other statistical values of the parameters of the at least one first model.
  • for example, the first processing node may generate the first public model by calculating the median of the parameters of the at least one first model; in this case, the values of the parameters of the generated first public model are the medians of the parameters of the at least one first model.
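  • a minimal sketch of both statistics, assuming three first models whose parameters are flat NumPy vectors (the values are illustrative only):

```python
import numpy as np

w1 = np.array([1.0, 4.0, 2.0])
w2 = np.array([3.0, 0.0, 2.0])
w3 = np.array([2.0, 8.0, 5.0])
stacked = np.stack([w1, w2, w3])            # shape: (num_models, num_params)

mean_params = stacked.mean(axis=0)          # averaging -> [2., 4., 3.]
median_params = np.median(stacked, axis=0)  # median    -> [2., 4., 2.]
```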
  • the at least one first model has the same network structure.
  • the method 400 also includes: the first processing node performs distillation processing on the at least one first model, and the distillation processing can make the at least one first model have the same network structure. In this way, the first processing node can more conveniently process the parameters of the at least one first model.
  • the distillation process also causes the at least one first model to have the same model parameter amount and/or the same operation rule.
  • the distillation process can reduce the number of parameters of the model or expand the number of parameters of the model.
  • the first processing node can determine the desired parameter amount of the first model based on its own computing capability, and then use distillation processing to make the parameter amount of the first model consistent with the desired amount. For example, if the first processing node has strong computing capability, distillation processing can give the first model a larger number of parameters to improve the performance of the first model and the generated public model; conversely, if the computing capability of the first processing node is weak, distillation processing can give the first model a smaller number of parameters to improve the efficiency of model training. In this way, the parameter amount of the first model can be adapted to the computing capability of the first processing node.
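  • the patent does not fix a distillation algorithm; the following PyTorch sketch uses the standard softened-logits distillation loss as one possible choice for transferring a first model (the teacher) into a student with the network structure and parameter amount the first processing node wants. The names are illustrative.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, x, optimizer, T=2.0):
    """One distillation step: the student mimics the teacher's softened outputs."""
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```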
  • distillation process can be implemented by any model distillation algorithm, and this application is not limited to this.
  • the distillation processing in this embodiment can also be replaced by other algorithms that can make the models have the same network structure, for example, by a model compression algorithm or a model dilation algorithm that can make the models have the same network structure.
  • the first processing node may splice at least one first model to generate a first common model.
  • the network structure of at least one first model may be the same or different.
  • the network structures of the models being different can also be understood as the neural networks corresponding to the models having different numbers of layers, and/or a certain layer including a different number of nodes.
  • the first processing node may splice the input end and the output end of the at least one first model respectively, thereby realizing splicing of the at least one first model.
  • a possible implementation method for the first processing node to splice at least one first model will be introduced below with reference to FIG. 5 .
  • FIG. 5 shows an example of a first processing node splicing at least one first model.
  • the first processing node can connect the input end of the at least one first model through a single-layer perceptron, and merge the output end of the at least one first model into a single-layer output, thereby realizing the Splicing of at least one first model.
  • this application does not place a limit on the number of nodes included in a single-layer perceptron and the number of nodes after the output ends are combined.
  • the single-layer perceptron and the combined output end can each be composed of 3 nodes.
  • connecting the input ends of the at least one first model through a single-layer perceptron may refer to connecting all nodes of the single-layer perceptron to all nodes of the input ends of the at least one first model. As shown in Figure 5, the three nodes of the single-layer perceptron can be connected to all nodes of the input ends of the at least one first model.
  • merging the output ends of the at least one first model into a single-layer output may refer to using a single-layer output to replace the original outputs of the at least one first model, and connecting all nodes in the single-layer output to all nodes in the previous layer. As shown in Figure 5, the three nodes in the single-layer output can be connected to all nodes in the previous layer.
  • alternatively, the input ends of the at least one first model can be merged into a single-layer output while the output ends are spliced through a single-layer perceptron; for another example, the input ends and the output ends of the at least one first model can each be merged into a single-layer output; or the input ends and the output ends of the at least one first model can each be spliced through a single-layer perceptron.
  • the above-mentioned single-layer perceptron can also be replaced by a multi-layer perceptron, and the single-layer output can also be replaced by a multi-layer output, which is not limited in this application.
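  • a hedged PyTorch sketch of the splicing shown in Figure 5, assuming every first model accepts a hidden_dim input and is an nn.Module; the class and argument names are illustrative, not from the patent.

```python
import torch
import torch.nn as nn

class SplicedModel(nn.Module):
    """Connects the inputs of several first models through a single-layer
    perceptron and merges their outputs into a single-layer output."""
    def __init__(self, first_models, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.input_perceptron = nn.Linear(in_dim, hidden_dim)
        self.first_models = nn.ModuleList(first_models)
        with torch.no_grad():  # probe each first model's output width
            merged = sum(m(torch.zeros(1, hidden_dim)).shape[1] for m in first_models)
        self.output_layer = nn.Linear(merged, out_dim)

    def forward(self, x):
        h = torch.relu(self.input_perceptron(x))
        outs = [m(h) for m in self.first_models]          # run every first model
        return self.output_layer(torch.cat(outs, dim=1))  # merge into one output
```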
  • the first processing node may also adjust the structure of the spliced model.
  • for example, the first processing node may add or delete layers in the structure of the spliced model, or add or delete nodes, etc., which is not limited by this application.
  • the spliced model can be trained based on local data using a back-propagation algorithm to generate the first public model.
  • method 400 further includes: performing a model pruning operation on the generated first public model. For example, some redundant layers or nodes in the first public model can be deleted through a model pruning operation, so that the first public model is more suitable for transmission in the communication network, thereby reducing the communication load.
  • model pruning operation can be implemented by any model pruning algorithm, and this application is not limited to this.
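  • for example, one possible pruning pass using PyTorch's built-in pruning utilities; L1-magnitude unstructured pruning is just one choice among the many algorithms the patent allows.

```python
import torch.nn as nn
from torch.nn.utils import prune

def prune_public_model(model: nn.Module, amount: float = 0.3) -> nn.Module:
    """Zero out the `amount` fraction of smallest-magnitude weights in every
    linear layer of the first public model."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # make the pruning permanent
    return model
```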
  • the first processing node may also determine, based on its own computing capability, to process all or part of the acquired at least one first model. For example, when the computing power of the first processing node is insufficient, the first processing node may selectively process part of the acquired at least one first model rather than all of the models, so that the number of models processed matches the computing power of the first processing node.
  • method 400 includes S430.
  • S430: The first processing node determines the second processing node.
  • the second processing node is the processing node for the next round of model processing.
  • the first public model is obtained by the second processing node before the next round of model processing.
  • the first processing node can determine an appropriate processing node for the next round of model processing before the next round of model processing, thereby being able to adapt to different application scenarios and thereby improve the performance of model training.
  • the first processing node and the second processing node may be the same processing node, or they may be different processing nodes.
  • the method 400 may further include: the first processing node sending the first public model to the second processing node before the next round of model processing, so that the second processing node obtains the first public model before the next round of model processing.
  • the second processing node can be arbitrarily designated by the first processing node, the public model can be continuously transmitted between different nodes in the network through multiple rounds of model processing.
  • since the processing node of this round of model processing generates the public model, it can send the public model to the processing node of the next round of model processing without having to send it to all participating nodes, which helps reduce communication overhead.
  • if the first processing node and the second processing node are the same processing node, that is to say, the processing node of this round of model processing and the processing node of the next round are the same, then the first processing node does not need to send the first public model to the second processing node.
  • the second processing node may be determined in at least one of the following ways:
  • Method 1 Determine the second processing node based on the indication of the first public model.
  • Method 2 Determine the second processing node based on one or more of the following information: the topological structure of the network, the data quality of the second processing node, and the computing capability of the second processing node.
  • the first processing node may determine the second processing node based on the indication of the first public model.
  • the first public model may indicate the second processing node to the first processing node according to the characteristics of the first public model.
  • the characteristic of the first public model may be the size of the parameters of the first public model.
  • the first public model can indicate the first processing.
  • the node determines a node with strong computing power as the second processing node; for another example, the characteristics of the first public model may be the current functional characteristics of the first public model.
  • the current function of the first public model is a classification function. Therefore, if there is local data for a classification learning task at a certain node in the network, in this case, the first public model can instruct the first processing
  • the node determines the node as the second processing node.
  • a first public model may indicate a second processing node to a first processing node based on parameters of the first public model.
  • the parameters of the first public model include corresponding routing information, and the routing information can be used to indicate the processing node of the next round of model processing. Therefore, the first processing node can determine the first processing node based on the routing information in the first public model. Two processing nodes.
  • the routing information may be pre-configured information, or may be dynamically configured information during the model training process, which is not limited in this application.
  • Determining the second processing node through the above method 1 is beneficial to determining a second processing node for the first public model that matches the characteristics or requirements of the first public model, which is beneficial to improving the performance of model training.
  • the second processing node may be determined based on one or more of the following information: the topological structure of the network, the data quality of the second processing node, and the computing capability of the second processing node.
  • the second processing node may be determined according to the topological structure of the network. For example, if a certain node is in a more favorable position in the topology of the network, the node can be determined as the second processing node.
  • nodes N1, N3, and N5 can communicate with 5 nodes in the network respectively, and nodes N2, N4, and N6 can communicate with 3 nodes in the network respectively.
  • nodes N1, N3, and N5 are convenient for communicating with more nodes. Therefore, node N1, N3, or N5 can be determined as the second processing node.
  • the second processing node may be determined based on the data quality of the second processing node. For example, if the data quality of a certain node in the network is relatively high, this node can be determined as the second processing node. For another example, if the data quality of nodes in a certain area of the network is relatively high, a node from this area can be determined as the second processing node.
  • a node from the area can be determined as the second processing node, and Other nodes in the area serve as participating nodes, thus completing a round of model processing.
  • the data quality of the second processing node is quantified in any of the following ways:
  • model training can be performed based on the local data of the second processing node, so that the convergence time of the model training and the accuracy of completing the task during model deduction can be quantified. Data quality.
  • the data quality of the second processing node can be quantified by calculating whether the data of the second processing node conforms to a certain agreed data distribution.
  • the second processing node may be determined based on the computing capability of the second processing node. For example, the computing power of each node can be compared, thereby determining the node with stronger computing power as the second processing node.
  • one of the above three pieces of information can be considered individually according to the needs of the actual task, or any two or more pieces of information can be comprehensively considered, thereby facilitating the specific decision-making process.
  • any of the above-mentioned method 1 or method 2 can be used, or the above-mentioned method 1 and method 2 can be combined to determine the second processing node, which is not limited by this application.
  • the first processing node may also determine the second processing node based on preconfigured information.
  • the preconfigured information can be sent to the first processing node through other devices, or the first processing node can also save the preconfigured information in advance, which is not limited by this application.
  • the preferred processing node may change. Therefore, in this embodiment, the first processing node passes as the next round Model processing determines appropriate processing nodes, which can better adapt to changes in application scenarios, thereby improving the performance of model training.
  • the first processing node is the processing node of the last round of model processing, in this case, S430 does not need to be executed.
  • at least one first model obtained by the first processing node includes the public model obtained by the previous round of model processing. Therefore, the first processing node can perform final model processing based on the public model obtained by the previous round of model processing. to obtain a high-performance public model.
  • method 400 also includes: the processing node of the last round of model processing sends the public model obtained by the last round of model processing to other nodes (for example, at least one participating node), thereby, Other nodes can use the public model obtained from the last round of model processing for corresponding actual tasks.
  • the implementation process may include the following steps:
  • S610 Initialize the node set and determine the end conditions of model training.
  • the end conditions of model training may include at least one of the following: the number of rounds of model processing reaches the upper limit of the number of rounds T, and the generated public model meets the model convergence conditions (for example, when the generated public model meets the performance requirements, the model converges).
  • T is an integer greater than or equal to 1.
  • model training ends when either of the above two conditions is met.
  • model training ends when the above two conditions are met at the same time.
  • S610 also includes: determining the upper limit of the number of rounds T and/or the model convergence condition.
  • the nodes participating in the first round of model training can be determined among the nodes with network connectivity conditions, and these nodes form an initial node set.
  • the node set may include a processing node set and a participating node set.
  • the processing node set may include processing nodes for the first round of model processing, and the participating node set may include nodes in the node set other than the processing node.
  • S610 may also include: determining the number of times m of additional optimization processing on the public model after each round of model processing.
  • a certain optimization algorithm can also be used to optimize the public model generated in this round m times.
  • the federated learning method or other methods can also be used to optimize the public model generated in this round m times to further improve the performance of the public model.
  • S620 The processing node of the t-th round of model processing acquires at least one first model.
  • the processing node of the t-th round of model processing may obtain at least one first model by receiving a first model from at least one participating node, that is, the at least one first model may include the t-th model from at least one participating node.
  • the at least one first model may include the t-th model from at least one participating node.
  • the processing node of the t-th round of model processing can send instruction information to at least one participating node, so that after receiving the instruction information, at least one participating node can send instructions to
  • the processing node of the t-th round of model processing uploads the first model of the at least one participating node.
  • the first model may be generated by the participating node before receiving the indication information, or may be generated after receiving the indication information, which is not limited.
  • the processing node of the t-th round of model processing can also obtain at least one first model by generating the first model itself.
  • At least one first model includes the common model obtained from the t-1th round of model processing.
  • S630 The processing node of the t-th round of model processing processes at least one first model to generate a public model of the t-th round.
  • the processing node of the t-th round model processing generates the t-th round public model through at least one of Method 1 or Method 2 in the aforementioned method embodiment S420.
  • Method 1 and Method 2 please refer to S420 in the foregoing method embodiment, and will not be described again here to avoid duplication.
  • the implementation process includes S640.
  • the processing node of the t-th round of model processing determines the processing node of the t+1-th round of model processing, and updates the node set.
  • the updated node set may include nodes participating in model training in the t+1 round. Accordingly, the updated The processing node set may include processing nodes for the t+1 round of model processing, and the updated participating node set may include nodes in the t+1 round node set other than the processing nodes for the t+1 round of model processing.
  • the implementation process also includes S650.
  • the processing node of the t-th round of model processing sends the public model of the t-th round to the processing node of the t+1-th round of model processing.
  • the processing node of the t+1 round of model processing can further optimize the public model based on the public model of the t round to improve the performance of the public model.
  • the t-th round of model processing is the last round of model processing, there is no need to execute S640 and S650.
  • t is greater than 1 (that is, the last round of model processing is not the first round of model processing)
  • the t-th round of model processing is not the first round of model processing.
  • At least one first model obtained by the processing node of the round of model processing includes the public model obtained by the t-1th round of model processing. Therefore, the processing node of the tth round of model processing can be based on the public model obtained by the t-1th round of model processing.
  • the model undergoes final model processing to obtain a high-performance public model.
  • the methods and operations implemented by the processing node can also be implemented by components (such as chips or circuits) of the processing node, without limitation.
  • embodiments of the present application also provide corresponding devices, and the devices include modules for executing corresponding modules in each of the above method embodiments.
  • the module can be software, hardware, or a combination of software and hardware. It can be understood that the technical features described in the above method embodiments are also applicable to the following device embodiments.
  • FIG. 7 is a schematic block diagram of a model processing device 700 provided by an embodiment of the present application.
  • the device 700 includes obtaining unit 710 and processing unit 720.
  • the acquisition unit 710 may be used to implement corresponding acquisition functions, such as acquiring at least one first model.
  • the processing unit 720 may be used to implement corresponding processing functions, such as processing at least one first model to generate a first public model.
  • the device 700 also includes a sending unit 730, which may be used to implement corresponding communication functions.
  • the sending unit 730 may also be called a communication interface or communication unit.
  • the device 700 also includes a storage unit, which can be used to store instructions and/or data, and the processing unit 720 can read the instructions and/or data in the storage unit, so that the device implements each of the foregoing method embodiments. Actions of the processing device (e.g., the first processing node).
  • a storage unit which can be used to store instructions and/or data
  • the processing unit 720 can read the instructions and/or data in the storage unit, so that the device implements each of the foregoing method embodiments. Actions of the processing device (e.g., the first processing node).
  • the device 700 may be the processing device in the aforementioned embodiment, or may be a component of the processing device (such as a chip).
  • the device 700 can implement steps or processes corresponding to the execution of the first processing node in the above method embodiment, wherein the acquisition unit 710 can be used to perform operations related to the acquisition of the first processing node in the above method embodiment.
  • the processing unit 720 may be used to perform operations related to processing of the first processing node in the above method embodiment
  • the sending unit 730 may be used to perform operations related to sending of the first processing node in the above method embodiment.
  • the sending unit 730 may be a transceiver, or an input/output interface.
  • the transmitter may be a transceiver circuit
  • the input/output interface may be an input/output circuit
  • the processing unit 720 may be at least one processor.
  • the sending unit 730 may be an input/output interface, interface circuit, input/output circuit, pin or related information on the chip, chip system or circuit. Circuits, etc.; the processing unit 720 may be at least one processor, processing circuit or logic circuit, etc.;
  • the acquisition unit 710 is used to acquire at least one first model; the processing unit 720 is used to process at least one first model to generate a first public model; the processing unit 720 is also used to determine the first public model.
  • Two processing nodes the second processing node is the processing node for the next round of model processing, and the first public model is obtained by the second processing node before the next round of model processing.
  • the device 700 and the second processing node are different processing nodes.
  • the device 700 further includes a sending unit 730.
  • the sending unit 730 is configured to send the first public model to the second processing node.
  • the obtaining unit 710 and the sending unit 730 are the same unit, or the obtaining unit 710 includes the sending unit 730.
  • the device 700 and the second processing node are the same processing node.
  • the processing unit 720 is also configured to determine the second processing node based on the indication of the first public model.
  • the obtaining unit 710 is also configured to receive the first model from at least one participating node.
  • the device 700 further includes a sending unit 730, which is configured to send indication information to at least one participating node.
  • the indication information is used to instruct at least one participating node to send the first model of at least one participating node to the device 700.
  • the obtaining unit 710 and the sending unit 730 are the same unit, or the obtaining unit 710 includes the sending unit 730.
  • the acquisition unit 710 is also used to generate a first model of the device 700 .
  • the acquisition unit 710 and the processing unit 720 are the same unit, or the acquisition unit 710 includes the processing unit 720.
  • the processing unit 720 is also configured to perform aggregation processing on at least one first model to generate a first common model.
  • the processing unit 720 is also configured to process parameters of at least one first model to generate a first public model.
  • the processing unit 720 is also configured to average the parameters of at least one first model to generate the first A public model, wherein the value of the parameter of the first public model is the average value of the parameters of at least one first model.
  • At least one first model has the same network structure.
  • the processing unit 720 is also used to perform distillation processing on at least one first model, and the distillation processing makes at least one first model have the same network structure.
  • the processing unit 720 is also used to splice at least one first model to generate a first common model.
  • At least one first model includes a second public model
  • the second public model is the public model obtained from the previous round of model processing.
  • the acquisition unit 710 is also configured to receive the second public model from the third processing node, which is the processing node of the previous round of model processing.
  • the second processing node is determined based on one or more of the following information: the topological structure of the network, the data quality of the second processing node, and the computing capability of the second processing node.
  • the acquisition unit 710 includes a sending unit 730 and/or a processing unit 720; or the acquiring unit 710 is the same unit as the sending unit 730 or the processing unit 720; or the acquiring unit 710 is integrated with the sending unit 730 or the processing unit 720 in the same unit. middle.
  • the processing unit 720 may be a processor, a processing circuit or a logic circuit, etc.
  • the sending unit 730 may be a transmitter, a transmitting circuit, a transceiver, a transceiver circuit, an input/output interface or circuit, etc.
  • the device 700 here is embodied in the form of a functional unit.
  • the term "unit” as used herein may refer to an application specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a proprietary processor, or a group of processors) used to execute one or more software or firmware programs. processor, etc.) and memory, merged logic circuitry, and/or other suitable components to support the described functionality.
  • ASIC application specific integrated circuit
  • the device 700 can be specifically the first processing node in the above embodiments, and can be used to execute various processes corresponding to the first processing node in the above method embodiments and/or Or steps, to avoid repetition, will not be repeated here.
  • the device 700 of each of the above solutions has the function of implementing the corresponding steps performed by the processing device (such as the first processing node) in the above method.
  • the functions described can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions; for example, the sending unit can be replaced by a transmitter, and other units, such as a processing unit, can be replaced by a processor to respectively perform the sending operations in each method embodiment. and related processing operations.
  • the above-mentioned sending unit 730 may also be a transceiver circuit, and the processing unit 710 may be a processing circuit.
  • the device in Figure 7 can be the device in the aforementioned embodiment, or it can be a chip or a chip system, such as a system on chip (SoC).
  • the sending unit may be an input/output circuit or a communication interface; the processing unit may be a processor or microprocessor or integrated circuit integrated on the chip. No limitation is made here.
  • this embodiment of the present application provides another communication device 800.
  • the device 800 includes a processor 810, which is used to execute computer programs or instructions stored in the memory 820, or read data/signaling stored in the memory 820, to perform the methods in each of the above method embodiments.
  • processors 810 there are one or more processors 810 .
  • the device 800 further includes a memory 820, which is used to store computer programs or instructions and/or data.
  • the memory 820 may be integrated with the processor 810, or may be provided separately.
  • the device 800 also includes a transceiver 830, which is used for signal reception and /or send.
  • the processor 810 is used to control the transceiver 830 to receive and/or transmit signals.
  • the device 800 is used to implement the operations performed by the processing device (such as the first processing node) in each of the above method embodiments.
  • the processor 810 is used to execute computer programs or instructions stored in the memory 820 to implement related operations of the processing device (such as the first processing node) in each of the above method embodiments.
  • processors mentioned in the embodiments of this application can be a central processing unit (CPU), or other general-purpose processor, digital signal processor (DSP), application specific integrated circuit (Application Specific Integrated Circuit) specific integrated circuit (ASIC), off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • DSP digital signal processor
  • ASIC Application Specific Integrated Circuit specific integrated circuit
  • FPGA off-the-shelf programmable gate array
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically removable memory. Erase electrically programmable read-only memory (EPROM, EEPROM) or flash memory. Volatile memory can be random access memory (RAM). For example, RAM can be used as an external cache.
  • RAM includes the following forms: static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), Double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (synchlink DRAM, SLDRAM) and direct Memory bus random access memory (direct rambus RAM, DR RAM).
  • the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component
  • the memory storage module
  • memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.
  • Embodiments of the present application also provide a computer-readable storage medium on which computer instructions for implementing the methods executed by the device in each of the above method embodiments are stored.
  • the computer when the computer program is executed by a computer, the computer can implement the method executed by the processing device (such as the first processing node) in each embodiment of the above method.
  • the processing device such as the first processing node
  • Embodiments of the present application also provide a computer program product, which includes instructions that, when executed by a computer, implement the methods executed by the processing device (such as the first processing node) in each of the above method embodiments.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units.
  • Information connection can be electrical, mechanical or other forms.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer may be a personal computer, a server, or a network device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another, e.g., the computer instructions may be transferred from a website, computer, server, or data center Transmission to another website, computer, server or data center by wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more available media integrated.
  • the available media may be magnetic media (such as floppy disks, hard disks, magnetic tapes), optical media (such as DVDs), or semiconductor media (such as solid state disks (SSD)), etc.
  • the aforementioned available media include but Not limited to: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

本申请提供了一种模型训练的方法和装置,该方法包括:第一处理节点获取至少一个第一模型;第一处理节点对至少一个第一模型进行处理,生成第一公共模型;第一处理节点确定第二处理节点,第二处理节点为下一轮模型处理的处理节点,第一公共模型在下一轮模型处理之前被第二处理节点获得。本申请提供的技术方案可以在下一轮模型处理之前,根据实际需求确定下一轮模型处理的处理节点,以适应于应用场景的变化。

Description

一种模型训练的方法和装置
本申请要求于2022年05月27日提交中国专利局、申请号为202210586086.5、申请名称为“一种模型训练的方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及涉及人工智能领域,并且更具体地,涉及一种模型训练的方法和装置。
背景技术
随着大数据时代的到来,每台设备每天都会以各种形式产生大量的原始数据,为了充分利用这些数据进行模型训练,目前最典型的两种训练架构分别为集中式学习(centralized learning,CL)和联邦学习(federated learning,FL)。
其中,联邦学习是一种分布式的机器学习方法,在联邦学习过程中,可利用多个边缘设备上的本地数据在该多个边缘设备上进行模型训练,然后将训练好的模型上传至中心服务器,中心服务器可作为处理节点对来自多个边缘设备的模型进行聚合以生成公共模型,并将公共模型下发给该多个边缘设备,从而,该多个边缘设备可基于本地数据对该公共模型进行更新。通过反复执行上述步骤直至模型收敛或训练轮数达到预设的上限,最终可获得高性能的机器学习模型。
在目前的联邦学习架构中,用于生成公共模型的处理节点是固定不变的,例如,公共模型只能通过中心服务器生成。但是,在不同的应用场景中,将中心服务器作为处理节点可能并非最优的方案,例如,随着网络拓扑结构的变化,以及边缘设备产生的数据的变化。
发明内容
本申请提供一种模型训练的方法和装置,该方法可以在下一轮模型处理之前,根据实际需求确定下一轮模型处理的处理节点,以适应于应用场景的变化。
第一方面,提供了一种模型处理的方法,该方法可以由处理设备执行,或者,也可以由配置于处理设备中的芯片或芯片系统或电路执行,可以统称为处理节点,本申请对此不做限制。下面以由第一处理节点执行为例进行说明。
该方法可以包括:第一处理节点获取至少一个第一模型;第一处理节点对至少一个第一模型进行处理,生成第一公共模型;第一处理节点确定第二处理节点,第二处理节点为下一轮模型处理的处理节点,第一公共模型在下一轮模型处理之前被第二处理节点获得。
根据本实施例的方法,由于在模型的训练过程中,随着网络拓扑结构的变化,以及各节点产生的数据的变化,优选的处理节点可能发生变化,因此,第一处理节点通过为下一轮模型处理确定合适的处理节点,能够更好地适应于应用场景的变化,进而提升模型训练 的性能。
结合第一方面,在第一方面的某些实现方式中,第一处理节点和第二处理节点为不同的处理节点,方法还包括:第一处理节点向第二处理节点发送第一公共模型。
根据本实施例的方法,当第一处理节点和第二处理节点为不同的处理节点时,第一处理节点可以将生成的公共模型传输至第二处理节点,进而,第二处理节点可在上一轮模型处理得到的公共模型的基础上对该公共模型进行更新和优化,从而提升模型训练的效率和性能。并且,由于第二处理节点可以由第一处理节点根据实际需求任意指定,因此,通过多轮模型处理,能够使得公共模型在网络中的各个不同节点之间连续传输。
此外,由于第一处理节点在生成公共模型之后,可将公共模型发送至第二处理节点,而无需将公共模型下发至所有参与节点,因此有利于降低通信开销。
结合第一方面,在第一方面的某些实现方式中,第一处理节点和第二处理节点为相同的处理节点。
根据本实施例的方法,本轮模型处理的处理节点(第一处理节点)和下一轮模型处理的处理节点(第二处理节点)可以是相同的处理节点,此时,第一处理节点可以不需要向第二处理节点发送第一公共模型。
结合第一方面,在第一方面的某些实现方式中,第一处理节点确定第二处理节点,包括:第一处理节点基于第一公共模型的指示确定第二处理节点。
根据本实施例的方法,第一处理节点可基于第一公共模型的指示确定第二处理节点。
一示例,第一公共模型可根据该第一公共模型的特征向第一处理节点指示第二处理节点。例如,第一公共模型的特征可以是该第一公共模型的参数量的大小。一种可能的情况,该第一公共模型的参数量较大,因此期望通过计算能力较强的节点对该第一公共模型进行处理,则在该情况下,第一公共模型可指示第一处理节点将某个计算能力较强的节点确定为第二处理节点;又例如,第一公共模型的特征可以是该第一公共模型当前的功能特征。举例来说,该第一公共模型当前的功能为分类功能,因此,若网络中某个节点处具有用于分类学习任务的本地数据,则在该情况下,第一公共模型可指示第一处理节点将该节点确定为第二处理节点。
另一示例,第一公共模型可根据该第一公共模型的参数向第一处理节点指示第二处理节点。例如,该第一公共模型的参数中包括相应的路由信息,该路由信息可用于指示下一轮模型处理的处理节点,从而,第一处理节点可根据该第一公共模型中的路由信息确定第二处理节点。
根据本实施例的方法,有利于为第一公共模型确定一个与该第一公共模型的特征或需求相匹配的第二处理节点,进而有利于提升模型训练的性能。
结合第一方面,在第一方面的某些实现方式中,第一处理节点获取至少一个第一模型,包括:第一处理节点接收来自至少一个参与节点的第一模型。
根据本实施例的方法,第一处理节点可以通过接收来自至少一个参与节点的第一模型方式获取至少一个第一模型,或者说,该至少一个第一模型中可以包括来自至少一个参与节点的第一模型,从而,第一处理节点能够充分利用来自参与节点的第一模型进行模型处理,以生成性能更优的公共模型。
结合第一方面,在第一方面的某些实现方式中,第一处理节点接收来自至少一个参与 节点的第一模型之前,方法还包括:第一处理节点向至少一个参与节点发送指示信息,指示信息用于指示至少一个参与节点向第一处理节点发送至少一个参与节点的第一模型。
根据本实施例的方法,可以通过指示信息触发至少一个参与节点向第一处理节点发送(上传)该至少一个参与节点的第一模型。
在一种可能的实现方式中,参与节点在收到指示信息之前已经生成了第一模型,或者说,参与节点本地已经保存有第一模型,此时,若参与节点接收到来自第一处理节点的指示信息,则可根据该指示信息的指示向第一处理节点上传第一模型。
在另一种可能的实现方式中,参与节点在收到指示信息时尚未生成第一模型,或者说,参与本地未保存有第一模型,此时,若该参与节点需要参与模型训练任务,则该参与节点可以在接收到来自第一处理节点的指示信息之后,生成第一模型,进而,可将生成的第一模型上传至第一处理节点。
结合第一方面,在第一方面的某些实现方式中,第一处理节点获取至少一个第一模型,包括:第一处理节点生成第一处理节点的第一模型。
根据本实施例的方法,第一处理节点还可以通过自身生成第一模型的方式获取至少一个第一模型,或者说,该至少一个第一模型中还可以包括第一处理节点生成的第一模型,从而,第一处理节点能够充分利用网络中各个节点的第一模型进行模型膜处理,以生成性能更优的公共模型。
结合第一方面,在第一方面的某些实现方式中,第一处理节点对至少一个第一模型进行处理,生成第一公共模型,包括:第一处理节点对至少一个第一模型进行聚合处理,生成第一公共模型。
根据本实施例的方法,第一处理节点可以对至少一个第一模型进行聚合处理,以生成第一公共模型。其中,该聚合处理可以使得至少一个第一模型被融合为一个性能更优的公共模型,从而提升第一处理节点生成的公共模型的性能。
结合第一方面,在第一方面的某些实现方式中,第一处理节点对至少一个第一模型进行聚合处理,生成第一公共模型,包括:第一处理节点对至少一个第一模型的参数进行处理,生成第一公共模型。
根据本实施例的方法,第一处理节点可以对至少一个第一模型的参数进行处理,从而生成一个性能更优的第一公共模型。
在一种可能的实现方式中,第一处理节点可以通过对至少一个第一模型的参数进行平均处理的方式生成第一公共模型,其中,第一公共模型的参数的取值为该至少一个第一模型的参数的平均值。
在另一种可能的实现方式中,第一处理节点还可以通过计算该至少一个第一模型的参数的其他统计值的方式生成第一公共模型。例如,第一处理节点可以通过计算该至少一个第一模型参数的中位数的方式生成第一公共模型,此时,生成的第一公共模型的参数的取值为该至少一个第一模型的参数的中位数。
结合第一方面,在第一方面的某些实现方式中,第一处理节点对至少一个第一模型的参数进行处理,生成第一公共模型,包括:第一处理节点对至少一个第一模型的参数进行平均处理,生成第一公共模型,其中,第一公共模型的参数的取值为至少一个第一模型的参数的平均值。
根据本实施例的方法,第一处理节点可以通过对至少一个第一模型的参数进行平均处理的方式生成第一公共模型,其中,第一公共模型的参数的取值为该至少一个第一模型的参数的平均值。
需要说明的是,在某些场景中,该平均处理可以是加权平均处理,即,通过对至少一个第一模型的参数进行加权平均处理,生成第一公共模型,此时,生成的第一公共模型的参数的取值为该至少一个第一模型的参数的加权平均值。
结合第一方面,在第一方面的某些实现方式中,至少一个第一模型具有相同的网络结构。
根据本实施例的方法,该至少一个第一模型具有相同的网络结构,这样,第一处理节点可以更方便地对该至少一个第一模型的参数进行处理。
结合第一方面,在第一方面的某些实现方式中,方法还包括:第一处理节点对至少一个第一模型进行蒸馏处理,蒸馏处理使得至少一个第一模型具有相同的网络结构。
根据本实施例的方法,为了使得该至少一个第一模型具有相同的网络结构,第一处理节点可对至少一个第一模型进行蒸馏处理,该蒸馏处理能够使得该少一个第一模型具有相同的网络结构,这样,第一处理节点可以更方便地对该至少一个第一模型的参数进行处理。
结合第一方面,在第一方面的某些实现方式中,第一处理节点对至少一个第一模型进行聚合处理,生成第一公共模型,包括:第一处理节点对至少一个第一模型进行拼接,生成第一公共模型。
根据本实施例的方法,第一处理节点通过对至少一个第一模型进行拼接,能够生成一个性能更优的第一成公共模型。其中,至少一个第一模型的网络结构可以相同,也可以不同。
示例性地,第一处理节点可通过对至少一个第一模型的输入端和输出端分别进行拼接,从而实现对至少一个第一模型的拼接。例如,第一处理节点可将该至少一个第一模型的输入端通过单层感知机进行连接,并将该至少一个第一模型的输出端合并为单层输出,从而实现对该至少一个第一模型的拼接。
结合第一方面,在第一方面的某些实现方式中,至少一个第一模型包括第二公共模型,第二公共模型为上一轮模型处理得到的公共模型。
根据本实施例的方法,第一处理节点可以基于第二公共模型生成本轮的公共模型。也就是说,第一处理节点可以在上一轮得到的公共模型的基础上进一步对该公共模型进行优化,以进一步提升公共模型的性能。
结合第一方面,在第一方面的某些实现方式中,方法还包括:第一处理节点接收来自第三处理节点的第二公共模型,第三处理节点为上一轮模型处理的处理节点。
根据本实施例的方法,当第一处理节点和第三处理节点为不同的处理节点时,第一处理节点可以接收来自第三处理节点的第二公共模型,进而,第一处理节点可以在第二公共模型的基础上对该公共模型进行优化,以进一步提升公共模型的性能。
根据本实施例的方法,第一处理节点和第三处理节点可以是相同的处理节点,也即上一轮模型处理的处理节点和本轮模型处理的处理节点确定为同一节点。
结合第一方面,在第一方面的某些实现方式中,第二处理节点是根据以下一项或多项信息确定的:网络的拓扑结构、第二处理节点的数据质量、第二处理节点的计算能力。
一示例,可根据网络的拓扑结构确定第二处理节点。例如,某个节点在网络的拓扑结构中处于更有利的位置(例如,该位置的节点便于和网络中其他节点进行通信),则可将该节点确定为第二处理节点。这样,有利于提高模型在网络中的传输效率。
一示例,可根据第二处理节点的数据质量确定第二处理节点。例如,若某个节点的数据质量相对较高,则可将该节点确定为第二处理节点。又例如,若网络中某个区域内的节点的数据质量相对较高,则可从该区域中确定一个节点作为第二处理节点。这样,有利于提高第二处理节点生成的公共模型的性能。
又一示例,可根据第二处理节点的计算能力确定第二处理节点。例如,可以比较各个节点的计算能力,从而将计算力较强的节点确定为第二处理节点。这样,有利于提高模型训练的效率。
根据本实施例的方法,在确定第二处理节点时,可以针对实际任务的需求单独考虑上述三项信息中的某一项,还可以综合考虑其中的任意两项或两项以上信息,从而,有利于针对具体的应用场景确定一个合适的第二处理节点,进而提高模型训练的性能。
第二方面,提供了一种模型处理的方法,该方法可以由处理节点执行,或者,也可以由配置于处理节点中的芯片或芯片系统或电路执行,本申请对此不做限制。为了便于描述,下面以由第一处理节点执行为例进行说明。
该方法可以包括:第一处理节点获取至少一个第一模型;第一处理节点对至少一个第一模型进行处理,生成第一公共模型,其中,至少一个第一模型包括第二公共模型,第二公共模型为上一轮模型处理得到的公共模型。
根据本实施例的方法,当第一处理节点为最后一轮模型处理的处理节点时,第一模型可包括上一轮模型处理得到的公共模型(第二共模型),从而,最后一轮模型处理的处理节点可基于上一轮模型处理得到的公共模型进行最终的模型处理,以获得高性能的公共模型。
第二方面的其他实现方式可以参考前述对第一方面的描述,这里不再赘述。
第三方面,提供了一种模型训练的装置。该装置包括:获取单元和处理单元,获取单元,用于获取至少一个第一模型;处理单元,用于对至少一个第一模型进行处理,生成第一公共模型;处理单元,还用于确定第二处理节点,第二处理节点为下一轮模型处理的处理节点,第一公共模型在下一轮模型处理之前被第二处理节点获得。
结合第三方面,在第三方面的某些实现方式中,该装置和第二处理节点为不同的处理节点,该装置还包括发送单元,发送单元,用于向第二处理节点发送第一公共模型。可选地,所述获取单元和所述发送单元为同一单元,或所述获取单元包括所述发送单元。
结合第三方面,在第三方面的某些实现方式中,该装置和第二处理节点为相同的处理节点。
结合第三方面,在第三方面的某些实现方式中,处理单元,还用于基于第一公共模型的指示确定第二处理节点。
结合第三方面,在第三方面的某些实现方式中,获取单元,还用于接收来自至少一个参与节点的第一模型。
结合第三方面,在第三方面的某些实现方式中,该装置还包括发送单元,发送单元,用于向至少一个参与节点发送指示信息,指示信息用于指示至少一个参与节点向该装置发 送至少一个参与节点的第一模型。可选地,所述获取单元和所述发送单元为同一单元,或所述获取单元包括所述发送单元。
结合第三方面,在第三方面的某些实现方式中,获取单元,还用于生成该装置的第一模型。可选地,所述获取单元和所述处理单元为同一单元,或所述获取单元包括所述处理单元。
结合第三方面,在第三方面的某些实现方式中,处理单元,还用于对至少一个第一模型进行聚合处理,生成第一公共模型。
结合第三方面,在第三方面的某些实现方式中,处理单元,还用于对至少一个第一模型的参数进行处理,生成第一公共模型。
结合第三方面,在第三方面的某些实现方式中,处理单元,还用于对至少一个第一模型的参数进行平均处理,生成第一公共模型,其中,第一公共模型的参数的取值为至少一个第一模型的参数的平均值。
结合第三方面,在第三方面的某些实现方式中,至少一个第一模型具有相同的网络结构。
结合第三方面,在第三方面的某些实现方式中,处理单元,还用于对至少一个第一模型进行蒸馏处理,该蒸馏处理使得至少一个第一模型具有相同的网络结构。
结合第三方面,在第三方面的某些实现方式中,处理单元,还用于对至少一个第一模型进行拼接,生成第一公共模型。
结合第三方面,在第三方面的某些实现方式中,至少一个第一模型包括第二公共模型,第二公共模型为上一轮模型处理得到的公共模型。
结合第三方面,在第三方面的某些实现方式中,获取单元,还用于接收来自第三处理节点的第二公共模型,第三处理节点为上一轮模型处理的处理节点。
结合第三方面,在第三方面的某些实现方式中,第二处理节点是根据以下一项或多项信息确定的:网络的拓扑结构、第二处理节点的数据质量、第二处理节点的计算能力。
结合第三方面,在第三方面的某些实现方式中,所述获取单元包括所述发送单元和/或所述处理单元;或者所述获取单元与所述发送单元或所述处理单元为同一单元;或者所述获取单元与所述发送单元或所述处理单元集成在同一单元中。可选的,所述处理单元可以是处理器、处理电路或逻辑电路等,所述发送单元可以是发射器、发射电路、收发器、收发电路、输入/输出接口或电路等。
第四方面,提供了一种模型训练的装置,该装置用于执行第二方面提供的方法。
可选地,该装置可以包括用于执行第二方面提供的方法的模块。
第五方面,提供了一种计算机可读存储介质,该计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行上述第一方面或第二方面任一种可能实现方式中的方法。
第六方面,提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述第一方面或第二方面任一种可能实现方式中的方法。
第七方面,提供一种通信装置,该装置用于执行上述第一方面或第二方面提供的方法。具体地,该装置可以包括用于执行第一方面或第二方面的任意一种实现方式提供的方法的单元和/或模块,如处理单元和/或通信单元。
在一种实现方式中,该装置为处理设备。当该装置为处理设备时,通信单元可以是收发器,或,输入/输出接口;处理单元可以是至少一个处理器。可选地,收发器可以为收发电路。可选地,输入/输出接口可以为输入/输出电路。
在另一种实现方式中,该装置为用于处理设备中的芯片、芯片系统或电路。当该装置为用于处理设备中的芯片、芯片系统或电路时,通信单元可以是该芯片、芯片系统或电路上的输入/输出接口、接口电路、输出电路、输入电路、管脚或相关电路等;处理单元可以是至少一个处理器、处理电路或逻辑电路等。
第八方面,提供一种通信装置,该装置包括:至少一个处理器,用于执行存储器存储的计算机程序或指令,以执行上述第一方面或第二方面的任意一种实现方式提供的方法。可选地,该通信装置还包括存储器,用于存储程序。
在一种实现方式中,该装置为处理设备。
在另一种实现方式中,该装置为用于处理设备中的芯片、芯片系统或电路。
第九方面,本申请提供一种处理器,用于执行上述各方面提供的方法。
对于处理器所涉及的发送和获取/接收等操作,如果没有特殊说明,或者,如果未与其在相关描述中的实际作用或者内在逻辑相抵触,则可以理解为处理器输出和接收、输入等操作,也可以理解为由射频电路和天线所进行的发送和接收操作,本申请对此不做限制。
第十方面,提供一种芯片,芯片包括处理器与通信接口,处理器通过通信接口读取存储器上存储的指令,执行上述第一方面或第二方面中任一方面的任意一种实现方式提供的方法。
可选地,作为一种实现方式,芯片还包括存储器,存储器中存储有计算机程序或指令,处理器用于执行存储器上存储的计算机程序或指令,当计算机程序或指令被执行时,处理器用于执行上述第一方面或第二方面中任一方面的任意一种实现方式提供的方法。
第十一方面,提供一种芯片,芯片包括逻辑电路和通信接口,通信接口用于接收待处理的数据和/或信息,并将待处理的数据和/或信息传输至逻辑电路,逻辑电路用于执行上述第一方面或第二方面中任意一种实现方式提供的方法。
附图说明
图1是本申请实施例提供的一种通信系统的示意图。
图2是适用于本申请的一种网络拓扑结构的示意图。
图3是适用于联邦学习的一种网络拓扑结构的示意图。
图4是本申请实施例提供的模型训练的方法的一例示意图。
图5是第一处理节点对至少一个第一模型进行拼接的一例示意图。
图6是本申请实施例提供的模型训练方法的一种可能的实现流程的示意图。
图7是本申请实施例提供的一种模型训练的装置的示意性框图。
图8是本申请实施例提供的一种通信装置的示意性框图。
具体实施方式
下面将结合附图,对本申请实施例中的技术方案进行描述。
本申请提供的技术方案可以应用于各种通信系统,例如:第五代(5th generation,5G) 或新无线(new radio,NR)系统、第六代(6th generation,6G)系统、长期演进(long term evolution,LTE)系统、LTE频分双工(frequency division duplex,FDD)系统、LTE时分双工(time division duplex,TDD)系统等。本申请提供的技术方案还可以应用于设备到设备(device to device,D2D)通信,车到万物(vehicle-to-everything,V2X)通信,机器到机器(machine to machine,M2M)通信,机器类型通信(machine type communication,MTC),以及物联网(internet of things,IoT)通信系统或者其他通信系统或未来通信系统。
为了便于理解,下面对本申请涉及的名词或术语进行说明。
1.模型处理
模型处理,指的是将一个或多个模型作为输入,并对该一个或多个模型执行相应的处理操作的过程。在本申请的实施例中,模型处理可以进行多轮。
2.处理节点
在本申请的实施例中,处理节点,表示对至少一个第一模型进行处理以生成公共模型的节点。其中,对该第一模型进行的处理例如可以是聚合处理,该聚合处理能够使得至少一个第一模型被融合为一个公共模型。
3.参与节点
参与节点,表示本轮模型处理中,除处理节点之外的节点。在本申请的实施例中,参与节点可用于向处理节点提供第一模型。
4.第一模型
第一模型,表示本轮模型处理的处理节点在生成本轮的公共模型时所依据的模型。其中,第一模型可以是本轮的参与节点基于本地数据生成的模型,或者,还可以是本轮的处理节点基于本地数据生成的模型,或者,还可以是上一轮模型处理得到的公共模型。其中,第一模型可以由本轮的参与节点提供,还可以由本轮的处理节点自身生成。
5.公共模型
公共模型,表示对至少一个第一模型进行处理,从而生成的一个模型。对于多轮模型处理,最后一轮模型处理得到的公共模型可作为最终的输出,进而,该最终输出的公共模型可用于相应的实际任务中。其中,公共模型还可以称为全局模型。
应理解,本申请中对各类模型和节点的命名仅为便于理解本申请的实施例进行的示例性说明,不对本申请的保护范围构成任何限定。
为便于理解本申请实施例,首先结合图1详细说明本申请实施例提供的一种通信系统。
图1是本申请实施例提供的一种通信系统100的示意图。该通信系统100可以包括两个或两个以上参与模型训练的设备(节点),例如图1所示的设备#1至设备#6。
其中,参与模型训练的设备可以是终端设备(例如,设备#1至设备#4),也可以是网络设备(例如,设备#5和设备#6)。
本申请实施例中的终端设备可以指用户设备、接入终端、用户单元、用户站、移动站、移动台、远方站、远程终端、移动设备、用户终端、终端、无线通信设备、用户代理或用户装置。终端设备还可以是蜂窝电话、无绳电话、会话启动协议(session initiation protocol,SIP)电话、无线本地环路(wireless local loop,WLL)站、个人数字处理(personal digital assistant,PDA)、具有无线通信功能的手持设备、计算设备或连接到无线调制解调器的 其它处理设备、虚拟现实(virtual reality,VR)设备、增强现实(augmented reality,AR)设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程手术(remote medical surgery)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端、智慧家庭(smart home)中的无线终端、车载设备、无人机设备、可穿戴设备,未来6G网络中的终端设备或者未来演进的公用陆地移动通信网络(public land mobile network,PLMN)中的终端设备等,本申请实施例对此不做限制。其中,可穿戴设备不仅仅是一种硬件设备,更是通过软件支持以及数据交互、云端交互来实现强大的功能。广义穿戴式智能设备包括功能全、尺寸大、可不依赖智能手机实现完整或者部分的功能,例如:智能手表或智能眼镜等,以及只专注于某一类应用功能,需要和其它设备如智能手机配合使用,如各类进行体征监测的智能手环、智能首饰等。
本申请实施例中的网络设备可以是用于与终端设备通信的设备,该网络设备可以是宏基站、微基站(也称为小站)、卫星、无线网络控制器(radio network controller,RNC)、节点B(Node B,NB)、基站控制器(base station controller,BSC)、基站收发台(base transceiver station,BTS)、家庭基站(例如,home evolved NodeB,或home Node B,HNB)、基带单元(baseband unit,BBU),WiFi系统中的AP、无线中继节点、无线回传节点、传输点(transmission point,TP)或者发送接收点(transmission and reception point,TRP)等,还可以为5G(如,NR)系统中的gNB或传输点(TRP或TP),5G系统中的基站的一个或一组(包括多个天线面板)天线面板,或者,还可以为构成gNB或传输点的网络节点,如分布式单元(distributed unit,DU)。或者该网络设备可以为中继站、接入点、车载设备、可穿戴设备以及未来演进的PLMN网络中的网络设备等,本申请实施例并不限定。本申请实施例对网络设备所采用的具体技术和具体设备形态不做限制。
在本申请实施例中,网络设备和终端设备可以部署在陆地上,包括室内或室外、手持或车载;也可以部署在水面上;还可以部署在空中的飞机、气球和卫星上。本申请实施例中对网络设备和终端设备所处的场景不做限定。
应理解,图1仅为便于理解而示例的简化示意图,本申请对于参与模型训练的设备的数量不做限定,例如,该通信系统中还可以包括其他网络设备和/或终端设备,图1中未予以画出。
图2示出了上述通信系统100的一种可能的网络拓扑结构的示意图。网络的拓扑结构,还可以理解为,该网络中各节点之间的连接方式,或者还可以理解为,网络中各节点之间的连通条件。该连接方式可以是有线的连接方式,也可以是无线的连接方式,本申请不予限定。
如图2所示,该网络中可以包括节点N1至N6。作为示例,节点N1至N6可对应于上述设备#1至设备#6。在图2所示的网络拓扑结构中,每个节点能够与其他至少一个节点进行通信。作为示例,节点N1能够与N2、N3、N4、N5、N6进行通信,节点N2能够与N1、N3、N5进行通信,节点N3能够与N1、N2、N4、N5、N6进行通信,节点N4至N6的与其他节点进行通信的情况与节点N1至N3类似,在此不再赘述。
应理解,图2所示的网络拓扑结构仅为便于理解本申请的实施例进行的示例性说明,不对本申请的保护范围构成任何限定。例如,在另一种可能的网络拓扑结构中,任意两个 节点之间均能够进行通信。
还应理解,图2所示的网络拓扑结构的示意图,可以是该网络在某一时刻的网络拓扑结构的示意图,在实际应用中,网络拓扑结构还可以动态地发生变化。
在图2所示的网络拓扑结构中,任何一个节点都可以被确定为处理节点,该处理节点可在模型训练的过程中动态变更。
可选地,该处理节点用于实现以下功能中的至少一项:
功能1:从网络中至少一个节点接收模型,并将接收的模型进行处理后发送至网络中的其他节点。例如,处理节点可将接收的模型进行聚合处理后发送至网络中的其他节点。
功能2:生成(产生)模型,并从网络中的至少一个节点接收模型,再将生成的模型和接收的模型进行处理后发送至网络中的其他节点,例如,处理节点可将生成的模型和接收的模型进行聚合处理后发送至网络中的其他节点。
相应地,可以定义处理节点的以下功能特性:
单极性(polarity):节点具备单极性,指的是该节点能够将至少一个模型作为输入,并将该至少一个模型进行处理(例如,聚合处理)后输出。其中,输出的模型个数为一个。
具备单级性的节点还可以称为单级性节点(the plurality node)。
可选地,还可以定义处理节点的另一种功能特性:
多样性(plurality):节点具备多样性,指的是该节点能够将至少一个模型作为输入,并对该至少一个模型中的每一个模型进行计算处理(例如,蒸馏处理)后输出。其中,当输入的模型个数为多个时,输出的模型个数也可以是多个。
具备多样性的节点还可以称为多样性节点(the plurality node)。
随着大数据时代的到来,每台设备每天都会以各种形式产生大量的原始数据,这些数据将以“孤岛”的形式诞生并存在于世界的各个角落。
为了充分利用这些数据进行模型训练,目前最典型的两种训练架构分别为集中式学习(centralized learning,CL)和联邦学习(federated learning,FL)。
其中,集中式学习要求各个边缘设备将本地数据统一传输到中心服务器上,之后,中心服务器再利用收集到的数据进行模型的训练与学习。然而,这一架构随着时代的发展逐渐受到如下因素的限制:
(1)边缘设备广泛地分布于世界上各个地区和角落,这些设备将以飞快的速度源源不断地产生和积累巨大量级的原始数据。若中心服务器收集来自全部边缘设备的原始数据,势必会带来巨大的通信损耗和算力需求。
(2)随着实际场景的复杂化,越来越多的学习任务要求边缘设备能够做出及时而有效的决策与反馈。集中式学习由于涉及到大量数据的上传势必会导致较大程度的时延,致使模型训练过程无法满足实际任务场景下的实时需求。
(3)考虑到行业竞争、用户隐私安全、行政手续复杂等问题,对数据进行集中整合将面临越来越大的制约。因此,系统的部署方式将越来越倾向于在本地存储数据,同时由边缘设备自身完成模型的本地训练。
为了打破上述限制,联邦学习架构被提出。
其中,联邦学习是一种分布式的机器学习方法。在联邦学习过程中,可利用多个边缘设备的本地数据在该多个边缘设备上进行模型训练,然后将训练好的模型上传至中心服务 器,中心服务器可作为处理节点对来自多个边缘设备的模型进行聚合生成公共模型,并将公共模型下发给该多个边缘设备,从而,该多个边缘设备可基于本地数据对该公共模型进行更新。通过反复执行上述步骤直至模型收敛或训练论数达到预设的上限,最终可获得高性能的机器学习模型。
图3是适用于联邦学习的一种网络拓扑结构的示意图。该网络中包括处理节点Nm和其他节点N1至N6,为便于描述,节点N1至N6可以称为参与节点。其中,处理节点例如可以是中心服务器,参与节点例如可以是边缘设备。
如图3所示,联邦学习设想的网络拓扑结构是固定的星型结构,且中央的处理节点是不可或缺的。
作为示例,下面以FedAvg算法为例介绍联邦学习的大致流程。FedAvg算法是联邦学习领域中的一种基础算法,该算法可包括如下步骤:
步骤1:处理节点初始化公共模型并将公共模型发送至所有参与节点。
步骤2:在第t∈[1,T]轮中,参与节点k∈[1,K]基于本地数据集对接收到的公共模型进行E个epoch的训练,或者说,进行E次迭代更新,以得到本地训练模型并将本地训练模型上报给处理节点。t的初始值取1。
步骤3:处理节点对接收的全部或部分模型进行聚合处理,得到公共模型
例如,处理节点可通过计算该全部或部分模型的参数的加权平均值得到公共模型具体地,假设第t轮上传本地训练模型的参与节点的集合为则处理节点可通过如下法则获得公共模型
其中,Dk表示集合中索引号为k的参与节点的样本数。随后,处理节点可将获得的公共模型发送至所有参与节点,以进行新一轮的训练。
步骤4:将t的取值加1,转到步骤2。重复执行上述步骤2和步骤3,直至模型收敛或训练轮数达到预设的上限。
在目前的联邦学习架构中,用于生成公共模型的处理节点是固定不变的。但是,在不同的应用场景中,将固定的节点作为处理节点可能并非最优的方案,例如,随着网络拓扑结构的变化,以及各节点产生的数据的变化,可能会出现更优选的处理节点。
为此,本申请提供一种模型训练的方法和装置,该方法可以在下一轮模型处理之前,根据实际需求确定下一轮模型处理的处理节点,以适应于应用场景的变化。也就是说,在本申请提供的模型训练的方法中,处理节点可以在模型训练过程中动态变更。这样,通过本申请提供的模型训练方法,可以在下一轮模型处理之前,根据实际需求确定下一轮模型处理的处理节点。当下一轮模型处理的处理节点与本轮模型处理的处理节点为不同的处理节点时,本轮模型处理的处理节点可将生成的公共模型发送至下一轮模型训练的处理节点。由于下一轮模型处理的处理节点可以根据实际需求任意指定,因此,通过多轮模型处理,能够使得公共模型在网络中的各个不同节点之间连续传输。作为示例,这一架构也可以认为是一种“模型跟随数据(model-follow-data)”的架构。
下文将结合附图详细说明本申请实施例提供的模型训练的方法。本申请实施例提供的 模型训练的方法可以应用于上述图1所示的通信系统和图2所示的网络拓扑结构中。
图4是本申请实施例提供的模型训练的方法的一例示意图。该方法400可以包括S410至S430。
S410,第一处理节点获取至少一个第一模型。
其中,第一处理节点获取第一模型,还可以理解为,第一处理节点获取第一模型的相关信息。例如,第一处理节点可以获取有关第一模型的以下一项或多项信息:第一模型的参数集、第一模型所对应的神经网络的结构、第一模型参数的运算规则。作为示例,第一模型的参数集中可包括该第一模型所对应的神经网络的训练权重。
为简洁,在本申请的实施例中,模型所对应的神经网络的结构,可以简称为模型的网络结构。
可选地,模型的相关信息通过以下一种或多种形式进行描述:模型图、模型参数、模型表、模型算法、数据库等,本申请对此不予限定。
在S410中,第一处理节点可通过以下至少一种方式获取至少一个第一模型。
第一种可能的方式,第一处理节点可以接收来自至少一个参与节点的第一模型,也即该至少一个第一模型中可以包括来自至少一个参与节点的第一模型。以图2所述的架构为例,例如,当第一处理节点为图2中的节点N1时,该节点N1可以接收来自节点N2至N6中至少一个节点利用本地数据集生成的第一模型。
可选地,基于该第一种可能的方式,第一处理节点在接收来自至少一个参与节点的第一模型之前,方法400还可以包括:第一处理节点向至少一个参与节点发送指示信息,该指示信息可用于指示至少一个参与节点向第一处理节点发送(上传)该至少一个参与节点的第一模型。
一种可能的情况,参与节点在收到指示信息之前已经生成了第一模型,或者说,参与节点本地已经保存有第一模型,此时,若参与节点收到来自第一处理节点的指示信息,则可根据该指示信息的指示向第一处理节点上传第一模型。
另一种可能的情况,参与节点在收到指示信息时尚未生成第一模型,或者说,参与节点本地未保存有第一模型,此时,若该参与节点需要参与模型训练任务,则该参与节点可以在收到来自第一处理节点的指示信息之后,生成第一模型,进而,可将生成的第一模型上传至第一处理节点。
可选地,该指示信息还用于指示第一处理节点通过何种方式生成第一公共模型。
示例性地,第一处理节点通过方式1和/或方式2生成第一公共模型(生成第一公共模型的方式1和方式2下文将描述,这里暂不详述),则该指示信息中可携带方式1或方式2对应的标签,从而,收到该指示信息的参与节点可将该标签携带于第一模型的信息中,再将该第一模型(或者说,第一模型的信息)发送至第一处理节点,如此,第一处理节点可根据该第一模型的信息中携带的标签确定使用方式1和/或方式2对该第一模型进行处理,以生成第一公共模型。
第二种可能的方式,第一处理节点可以生成第一模型,也即该至少一个第一模型中可以包括第一处理节点生成的第一模型。以图2所述的架构为例,例如,当第一处理节点为图2中的节点N1时,该节点N1可以利用本地数据集生成第一模型。
根据上述两种可能的方式,第一处理节点可以接收来自至少一个参与节点的第一模型, 还可以本地生成第一模型,因此,第一处理节点能够充分利用网络中各个节点的第一模型进行模型训练,以生成性能更优的公共模型。
可选地,至少一个第一模型包括第二公共模型,其中,第二公共模型为上一轮模型处理得到的公共模型。例如,第一处理节点为第t(t大于1)轮模型处理的处理节点,则第一模型除了可以包括来自至少一个参与节点的第一模型和/或第一处理节点生成的第一模型之外,还包括上一轮(第t-1轮)模型处理得到的公共模型,从而,第一处理节点可以在上一轮得到的公共模型的基础上对该公共模型进行优化,以进一步提升该公共模型的性能。
假设上一轮模型处理的处理节点为第三处理节点。
一种可能的情况,第三处理节点和第一处理节点为不同的处理节点,则在该情况下,第一处理节点可以接收来自第三处理节点的第二公共模型,或者说,第二公共模型可以是从第三处理节点接收的。
另一种可能的情况,第三处理节点和第一处理节点为相同的处理节点,或者说,上一轮模型处理的处理节点和本轮模型处理的处理节点为同一处理节点,则在该情况下,第二公共模型可以是第一处理节点在上一轮模型处理中生成的公共模型。
S420,第一处理节点对至少一个第一模型进行处理,生成第一公共模型。
在本实施例中,第一处理节点可对获取的至少一个第一模型进行处理,从而生成第一公共模型。
作为示例,第一处理节点可对至少一个第一模型进行聚合处理,从而生成第一公共模型。其中,该聚合处理能够使得至少一个第一模型被融合为一个性能更优的公共模型。
作为示例,第一处理节点对至少一个第一模型进行处理生成第一公共模型的方式可以包括以下方式中的至少一种:
方式1:第一处理节点可以对至少一个第一模型的参数进行处理,从而生成第一公共模型。
方式2:第一处理节点可以对至少一个第一模型进行拼接,从而生成第一成公共模型。
下面分别介绍上述方式1和方式2。
方式1
在方式1中,第一处理节点可以对至少一个第一模型的参数进行处理,从而生成第一公共模型。
在一种可能的实现方式中,第一处理节点可以通过对至少一个第一模型的参数进行平均处理的方式生成第一公共模型,其中,第一公共模型的参数的取值为该至少一个第一模型的参数的平均值。
作为一个示例,至少一个第一模型包括第一模型#1和第一模型#2,其中,第一模型#1的参数例如为[a1b1c1],第一模型#2的参数例如为[a2b2c2],则生成的公共模型的参数的取值为[(a1+a2)/2(b1+b2)/2(c1+c2)/2]。
在某些场景中,该平均处理可以是加权平均处理,即,通过对至少一个第一模型的参数进行加权平均处理的方式生成第一公共模型,此时,生成的第一公共模型的参数的取值为该至少一个第一模型的参数的加权平均值。
在另一种可能的实现方式中,第一处理节点还可以通过计算该至少一个第一模型的参 数的其他统计值的方式生成第一公共模型。例如,第一处理节点可以通过计算该至少一个第一模型参数的中位数的方式生成第一公共模型,此时,生成的第一公共模型的参数的取值为该至少一个第一模型的参数的中位数。
可选地,在方式1中,该至少一个第一模型具有相同的网络结构。
可选地,为了使得该至少一个第一模型具有相同的网络结构,方法400还包括:第一处理节点对至少一个第一模型进行蒸馏处理,该蒸馏处理可以使得该少一个第一模型具有相同的网络结构,这样,第一处理节点可以更方便地对该至少一个第一模型的参数进行处理。
可选地,该蒸馏处理还使得该至少一个第一模型具有相同的模型参数量和/或相同的运算规则。
作为示例,该蒸馏处理可以使得模型的参数量减小,也可以使得模型的参数量扩大。
在一种可能的实现方式中,第一处理节点可以根据自身的计算能力确定期望的第一模型的参数量,进而可通过蒸馏处理使得第一模型的参数量与期望的第一模型的参数量一致。例如,第一处理节点的计算能力较强,则可通过蒸馏处理使得第一模型具有较大的参数量,以提升第一模型和生成的公共模型的性能;又例如,第一处理节点的计算能力较弱,则可通过蒸馏处理使得第一模型具有较小的参数量,以提高模型训练的效率。如此,能够使得第一模型的参数量适配第一处理节点的计算能力。
需要说明的是,上述蒸馏处理可以通过任何一种模型蒸馏(model distilling)算法实现,本申请对此不予限定。
可选地,本实施例中的蒸馏处理,还可以替换为其他能够使得模型具有相同的网络结构的算法,例如,可以替换为其他能够使得模型具有相同的网络结构的模型压缩(model compression)算法或模型扩张(model dilatation)算法。
方式2
在方式2中,第一处理节点可以对至少一个第一模型进行拼接,从而生成第一公共模型。
其中,至少一个第一模型的网络结构可以相同,也可以不同。
其中,模型的网络结构不同,还可以理解为,模型所对应的神经网络中,网络的层数不同,和/或,对于某一层而言,所包括的节点数不同。
示例性地,第一处理节点可通过对至少一个第一模型的输入端和输出端分别进行拼接,从而实现对至少一个第一模型的拼接。
下面结合图5介绍第一处理节点对至少一个第一模型进行拼接的一种可能的实现方式。
图5示出了第一处理节点对至少一个第一模型进行拼接的一例示意图。如图5所示,第一处理节点可将该至少一个第一模型的输入端通过单层感知机进行连接,并将该至少一个第一模型的输出端合并为单层输出,从而实现对该至少一个第一模型的拼接。
需要说明的是,本申请对于单层感知机所包括的节点数以及输出端合并后的节点数不做限制。例如,如图5所示,单层感知机以及合并后的输出端可以分别由3个节点构成。
作为一种实现方式,上述将该至少一个第一模型的输入端通过单层感知机进行连接,可以指的是,将该单层感知机的所有节点分别与该至少一个第一模型的输入端的所有节点 相连。如图5所示,单层感知机的3个节点可以分别与该至少一个第一模型的输入端的所有节点相连。
作为一种实现方式,上述将该至少一个第一模型的输出端合并为单层输出,可以指的是,利用单层输出代替该至少一个第一模型原有的输出,并将该单层输出中的所有节点分别与上一层的所有节点相连。如图5所示,单层输出中的3个节点可以分别与上一层的所有节点相连。
应理解,上述对至少一个第一模型的输入端和输出端分别进行拼接的方式是示例性地,显然,还可以通过其他方式对至少一个第一模型的输入端和输出端分别进行拼接。例如,在另一种可能的实现方式中,可以将该至少一个第一模型的输入端合并为单层输出,并将该至少一个第一模型的输出端通过单层感知机进行拼接;又例如,还可以将该至少一个第一模型的输入端和输出端分别合并为单层输出;又例如,还可以将该至少一个第一模型的输入端和输出端分别通过单层感知机进行拼接;再例如,上述单层感知机还可以替换为多层感知机,单层输出还可以替换为多层输出,本申请对此不予限定。
可选地,第一处理节还对拼接后的模型的结构进行调整。例如,第一处理节可对拼接后的模型的结构增加层或删除层,或者,第一处理节可对拼接后的模型的结构增加节点或删除节点等,本申请不予限定。
应理解,第一处理节点完成对至少一个第一模型的拼接之后,可基于本地数据使用反向传递算法对拼接之后的模型进行训练,从而生成第一公共模型。
可选地,若使用方式2生成第一公共模型,则方法400还包括:对生成的第一公共模型进行模型剪枝操作。例如,可以通过模型剪枝操作删去该第一公共模型中的一些冗余的层或节点,以使得该第一公共模型更适合于在通信网路中进行传输,从而降低通信负载。
需要说明的是,上述模型剪枝操作可以通过任何一种模型剪枝算法实现,本申请对此不予限定。
可选地,在本申请的实施例中,第一处理节点还根据自身的计算能力,确定对获取的至少一个第一模型中的全部模型或部分模型进行处理。例如,当第一处理节点的计算能力不足时,第一处理节点可选择性地对获取的至少一个第一模型中的部分模型进行处理,而不需要对全部模型进行处理,从而使得第一处理节点所处理的模型数量能够与该第一处理节点的计算能力相匹配。
可选地,若第一处理节点为最后一轮模型处理前,任一轮模型处理的处理节点,则方法400包括S430。
S430,第一处理节点确定第二处理节点,第二处理节点为下一轮模型处理的处理节点,第一公共模型在下一轮模型处理之前被第二处理节点获得。
在本实施例中,第一处理节点可在下一轮模型处理之前,为下一轮模型处理确定合适的处理节点,从而,能够适应于不同的应用场景,进而提升模型训练的性能。
其中,第一处理节点和第二处理节点可以是相同的处理节点,也可以是不同的处理节点。
一种可能的情况,第一处理节点和第二处理节点为不同的处理节点。在该情况下,方法400还可以包括:第一处理节点在下一轮模型处理之前向第二处理节点发送第一公共模型,从而,第一公共模型可在下一轮模型处理之前被第二处理节点获得。
根据本实施例的方法,由于第二处理节点可以由第一处理节点任意指定,因此,通过多轮模型处理,能够使得公共模型在网络中的各个不同节点之间连续传输。此外,由于本轮模型处理的处理节点在生成公共模型之后,可将公共模型发送至下一轮模型处理的处理节点,而无需将公共模型下发至所有参与节点,因此有利于降低通信开销。
另一种可能的情况,第一处理节点和第二处理节点为相同的处理节点,也就是说,本轮模型处理的处理节点与下一轮模型处理的处理节点为同一处理节点,则在该情况下,第一处理节点可以不需要向第二处理节点发送第一公共模型。
作为示例,第二处理节点可通过以下方式中的至少一种方式确定:
方式1:基于第一公共模型的指示确定第二处理节点。
方式2:根据以下一项或多项信息确定第二处理节点:网络的拓扑结构、第二处理节点的数据质量、第二处理节点的计算能力。
下面分别介绍上述方式1和方式2。
方式1
在方式1中,第一处理节点可基于第一公共模型的指示确定第二处理节点。
一示例,第一公共模型可根据该第一公共模型的特征向第一处理节点指示第二处理节点。例如,第一公共模型的特征可以是该第一公共模型的参数量的大小。一种可能的情况,该第一公共模型的参数量较大,因此期望通过计算能力较强的节点对该第一公共模型进行处理,则在该情况下,第一公共模型可指示第一处理节点将某个计算能力较强的节点确定为第二处理节点;又例如,第一公共模型的特征可以是该第一公共模型当前的功能特征。举例来说,该第一公共模型当前的功能为分类功能,因此,若网络中某个节点处具有用于分类学习任务的本地数据,则在该情况下,第一公共模型可指示第一处理节点将该节点确定为第二处理节点。
另一示例,第一公共模型可根据该第一公共模型的参数向第一处理节点指示第二处理节点。例如,该第一公共模型的参数中包括相应的路由信息,该路由信息可用于指示下一轮模型处理的处理节点,从而,第一处理节点可根据该第一公共模型中的路由信息确定第二处理节点。
作为示例,该路由信息可以是预先配置的信息,还可以是模型训练过程中动态配置的信息,本申请不予限定。
通过上述方式1确定第二处理节点,有利于为第一公共模型确定一个与该第一公共模型的特征或需求相匹配的第二处理节点,进而有利于提升模型训练的性能。
方式2
在方式2中,可根据以下一项或多项信息确定第二处理节点:网络的拓扑结构、第二处理节点的数据质量、第二处理节点的计算能力。
一示例,可根据网络的拓扑结构确定第二处理节点。例如,某个节点在网络的拓扑结构中处于更有利的位置,则可将该节点确定为第二处理节点。
举例来说,当网络的拓扑结构如图2所示时,节点N1、N3、N5分别可与网络中的5个节点进行通信,节点N2、N4、N6分别可与网络中3个节点进行通信,也就是说,相比于节点N2、N4、N6,节点N1、N3、N5便于和较多节点进行通信,因此,可将节点N1、N3或N5确定为第二处理节点。
通过根据网络的拓扑结构确定第二处理节点,有利于提高模型在网络中的传输效率。
另一示例,可根据第二处理节点的数据质量确定第二处理节点。例如,网络中某个节点的数据质量相对较高,则可将该节点确定为第二处理节点。又例如,网络中某个区域内的节点的数据质量相对较高,则可从该区域中确定一个节点作为第二处理节点。
作为一个可选的实施例,在开始某一轮模型处理之前,若网络中某个区域内的节点的数据质量相对较高,则可从该区域中确定一个节点作为第二处理节点,并将该区域中的其他节点作为参与节点,从而完成一轮模型处理。
可选地,第二处理节点的数据质量通过以下任一种方式进行量化:
在一种可选的方式中,可基于该第二处理节点的本地数据进行模型训练,从而,可通过检测模型训练的收敛时间和模型推演时完成任务的准确度来量化该第二处理节点的数据质量。
在另一种可选的方式中,可通过计算该第二处理节点的数据是否符合某种约定的数据分布来量化该第二处理节点的数据质量。
通过根据第二处理节点的数据质量确定第二处理节点,有利于提高第二处理节点生成的公共模型的性能。
又一示例,可根据第二处理节点的计算能力确定第二处理节点。例如,可以比较各个节点的计算能力,从而将计算力较强的节点确定为第二处理节点。
通过根据第二处理节点的计算能力确定第二处理节点,有利于提高模型训练的效率。
应理解,在确定第二处理节点时,可以针对实际任务的需求单独考虑上述三项信息中的某一项,还可以综合考虑其中的任意两项或两项以上信息,从而,有利于针对具体的应用场景确定一个合适的第二处理节点,进而提高模型训练的性能。
还应理解,在确定第二处理节点时,可采用上述方式1或方式2中的任一种方式,还可以结合上述方式1和方式2共同确定第二处理节点,本申请不予限定。
还应理解,在某些场景中,第一处理节点还可以基于预配置的信息确定第二处理节点。作为示例,该预配置的信息可以通过其他设备发送至第一处理节点,或者,第一处理节点还可以预先保存该预配置的信息,本申请不予限定。
由于在模型的训练过程中,随着网络拓扑结构的变化,以及各节点产生的数据的变化,优选的处理节点可能发生变化,因此,在本实施例中,第一处理节点通过为下一轮模型处理确定合适的处理节点,能够更好地适应于应用场景的变化,进而提升模型训练的性能。
可选地,若第一处理节点为最后一轮模型处理的处理节点,则在该情况下,不需要执行S430。此时,第一处理节点获取的至少一个第一模型中,包括上一轮模型处理得到的公共模型,从而,第一处理节点可基于上一轮模型处理得到的公共模型进行最终的模型处理,以获得高性能的公共模型。
可选地,在最后一轮模型处理完成之后,方法400还包括:最后一轮模型处理的处理节点向其他节点(例如,至少一个参与节点)发送最后一轮模型处理得到的公共模型,从而,其他节点可将该最后一轮模型处理得到的公共模型用于相应的实际任务中。
上文结合图4和图5介绍了本申请实施例提供的一种模型训练的方法。为便于理解本申请的实施例,下面结合图6介绍本申请实施例提供的模型训练方法的一种可能的实现流程。
如图6所示,该实现流程可包括如下步骤:
S610,初始化节点集合,确定模型训练的结束条件。
其中,模型训练的结束条件可包括以下至少一种:模型处理的轮数达到轮数上限T、生成的公共模型满足模型收敛条件(例如,当生成的公共模型满足性能需求时,模型收敛)。其中,T为大于或等于1的整数。
在一种可能的实现方式中,满足上述两种条件中的任何一种时,模型训练结束。
在另一种可能的实现方式中,同时满足上述两种条件时,模型训练结束。
可选地,S610还包括:确定轮数上限T和/或模型收敛条件。
在第一轮模型处理开始之前,可在具备网络连通条件的节点中确定第一轮参与模型训练的节点,并由这些节点组成初始的节点集合。
其中,节点集合可包括处理节点集合和参与节点集合。处理节点集合中可包括第一轮模型处理的处理节点,参与节点集合中可包括该节点集合中除该处理节点之外的节点。
作为一个可选的实施例,S610还可以包括:确定每一轮模型处理之后,对公共模型进行额外优化处理的次数m。
根据本实施例的方法,在每一轮模型处理之后,还可以使用某种优化算法对该轮生成的公共模型进行m次优化处理。例如,在每一轮模型处理之后,还可以采用联邦学习方法或其他方法对该轮生成的公共模型进行m次优化处理,以进一步提升公共模型的性能。
S620,第t轮模型处理的处理节点获取至少一个第一模型。
一示例,第t轮模型处理的处理节点可以通过接收来自至少一个参与节点的第一模型的方式获取至少一个第一模型,也即该至少一个第一模型中可以包括来自至少一个参与节点的第一模型。其中,至少一个参与节点属于参与节点集合。
在一种可能的实现方式中,第t轮模型处理的处理节点可以向至少一个参与节点发送指示信息,从而,至少一个参与节点在接收到该指示信息后,可根据该指示信息的指示,向第t轮模型处理的处理节点上传该至少一个参与节点的第一模型。其中,该第一模型可以是参与节点在收到指示信息之前生成的,也可是在收到指示信息后生成的,不予限定。
有关指示信息的介绍可参考前述方法实施例中的S410,未避免重复,这里不再赘述。
另一示例,第t轮模型处理的处理节点还可以通过自身生成第一模型的方式获取至少一个第一模型。
可选地,当t大于1时,至少一个第一模型包括第t-1轮模型处理得到的公共模型。
S630,第t轮模型处理的处理节点对至少一个第一模型进行处理,生成第t轮的公共模型。
可选地,第t轮模型处理的处理节点通过前述方法实施例S420中的方式1或方式2中的至少一种方式生成第t轮的公共模型。有关方式1和方式2的介绍可参考前述方法实施例中的S420,未避免重复,这里不再赘述。
可选地,若第t轮模型处理的处理节为最后一轮模型处理前,任一轮模型处理的处理节点,则该实现流程包括S640。
S640,第t轮模型处理的处理节点确定第t+1轮模型处理的处理节点,并更新节点集合。
其中,更新后的节点集合中可包括第t+1轮参与模型训练的节点。相应地,更新后的 处理节点集合中可包括第t+1轮模型处理的处理节点,更新后的参与节点集合中可包括第t+1轮节点集合中除第t+1轮模型处理的处理节点之外的节点。
可选地,若第t轮模型处理的处理节点与第t+1轮模型处理的处理节点为不同的处理节点,则该实现流程还包括S650。
S650,第t轮模型处理的处理节点向第t+1轮模型处理的处理节点发送第t轮的公共模型。
从而,第t+1轮模型处理的处理节点可在第t轮的公共模型的基础上进一步对该公共模型进行优化,以提升公共模型的性能。
可选地,若第t轮模型处理为最后一轮模型处理,则不需要执行S640和S650,此时,若t大于1(即最后一轮模型处理并非第一轮模型处理),则第t轮模型处理的处理节点获取的至少一个第一模型中,包括第t-1轮模型处理得到的公共模型,从而,第t轮模型处理的处理节点可基于第t-1轮模型处理得到的公共模型进行最终的模型处理,以获得高性能的公共模型。
通过多轮模型处理(优化学习)直至满足模型训练的结束条件,最终可输出高性能的公共模型。
可以理解,本申请实施例中的图4至图6中的例子仅仅是为了便于本领域技术人员理解本申请实施例,并非要将本申请实施例限于例示的具体场景。本领域技术人员根据图4至图6的例子,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本申请实施例的范围内。
还可以理解,本申请的各实施例中的一些可选的特征,在某些场景下,可以不依赖于其他特征,也可以在某些场景下,与其他特征进行结合,不作限定。
还可以理解,本申请的各实施例中的方案可以进行合理的组合使用,并且实施例中出现的各个术语的解释或说明可以在各个实施例中互相参考或解释,对此不作限定。
还可以理解,在本申请的各实施例中的各种数字序号的大小并不意味着执行顺序的先后,仅为描述方便进行的区分,不应对本申请实施例的实施过程构成任何限定。
还可以理解,在本申请的各实施例中,第一、第二、#1、#2等数字编号仅为描述方便进行的区分,并不用来限制本申请实施例的范围。
还可以理解,在本申请的各实施例中的通信装置之间所传输的信息的名称,其命名不对本申请实施例的保护范围造成限定。
还可以理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
还可以理解,上述各个方法实施例中,由处理节点实现的方法和操作,也可以由可由处理节点的组成部件(例如芯片或者电路)来实现,不作限定。
相应于上述各方法实施例给出的方法,本申请实施例还提供了相应的装置,所述装置包括用于执行上述各个方法实施例相应的模块。该模块可以是软件,也可以是硬件,或者是软件和硬件结合。可以理解的是,上述各方法实施例所描述的技术特征同样适用于以下装置实施例。
图7是本申请实施例提供的模型处理的装置700的示意性框图。该装置700包括获取 单元710和处理单元720。获取单元710可以用于实现相应的获取功能,如获取至少一个第一模型。处理单元720可以用于实现相应的处理功能,如对至少一个第一模型进行处理,生成第一公共模型。
可选地,该装置700还包括发送单元730,发送单元730可以用于实现相应的通信功能。发送单元730还可以称为通信接口或通信单元。
可选地,该装置700还包括存储单元,该存储单元可以用于存储指令和/或数据,处理单元720可以读取存储单元中的指令和/或数据,以使得装置实现前述各个方法实施例中处理设备(例如,第一处理节点)的动作。
在一种设计中,该装置700可以是前述实施例中的处理设备,也可以是处理设备的组成部件(如芯片)。作为示例,该装置700可实现对应于上文方法实施例中的第一处理节点执行的步骤或者流程,其中,获取单元710可用于执行上文方法实施例中第一处理节点的获取相关的操作;处理单元720可用于执行上文方法实施例中第一处理节点的处理相关的操作;发送单元730可用于执行上文方法实施例中第一处理节点的发送相关的操作。当该装置700为第一处理节点时,发送单元730可以是收发器,或,输入/输出接口。可选地,发送器可以为收发电路,可选地,输入/输出接口可以为输入/输出电路;处理单元720可以是至少一个处理器。当该装置700为第一处理节点中的芯片、芯片系统或电路时,发送单元730可以是该芯片、芯片系统或电路上的输入/输出接口、接口电路、输入/输出电路、管脚或相关电路等;处理单元720可以是至少一个处理器、处理电路或逻辑电路等;
一种可能的实现方式,获取单元710,用于获取至少一个第一模型;处理单元720,用于对至少一个第一模型进行处理,生成第一公共模型;处理单元720,还用于确定第二处理节点,第二处理节点为下一轮模型处理的处理节点,第一公共模型在下一轮模型处理之前被第二处理节点获得。
可选地,该装置700和第二处理节点为不同的处理节点,该装置700还包括发送单元730,发送单元730,用于向第二处理节点发送第一公共模型。可选地,获取单元710和发送单元730为同一单元,或获取单元710包括发送单元730。
可选地,该装置700和第二处理节点为相同的处理节点。
可选地,处理单元720,还用于基于第一公共模型的指示确定第二处理节点。
可选地,获取单元710,还用于接收来自至少一个参与节点的第一模型。
可选地,该装置700还包括发送单元730,发送单元730,用于向至少一个参与节点发送指示信息,指示信息用于指示至少一个参与节点向该装置700发送至少一个参与节点的第一模型。可选地,获取单元710和发送单元730为同一单元,或获取单元710包括发送单元730。
可选地,获取单元710,还用于生成该装置700的第一模型。可选地,获取单元710和处理单元720为同一单元,或获取单元710包括处理单元720。
可选地,处理单元720,还用于对至少一个第一模型进行聚合处理,生成第一公共模型。
可选地,处理单元720,还用于对至少一个第一模型的参数进行处理,生成第一公共模型。
可选地,处理单元720,还用于对至少一个第一模型的参数进行平均处理,生成第一 公共模型,其中,第一公共模型的参数的取值为至少一个第一模型的参数的平均值。
可选地,至少一个第一模型具有相同的网络结构。
可选地,处理单元720,还用于对至少一个第一模型进行蒸馏处理,该蒸馏处理使得至少一个第一模型具有相同的网络结构。
可选地,处理单元720,还用于对至少一个第一模型进行拼接,生成第一公共模型。
可选地,至少一个第一模型包括第二公共模型,第二公共模型为上一轮模型处理得到的公共模型。
可选地,获取单元710,还用于接收来自第三处理节点的第二公共模型,第三处理节点为上一轮模型处理的处理节点。
可选地,第二处理节点是根据以下一项或多项信息确定的:网络的拓扑结构、第二处理节点的数据质量、第二处理节点的计算能力。
可选地,获取单元710包括发送单元730和/或处理单元720;或者获取单元710与发送单元730或处理单元720为同一单元;或者获取单元710与发送单元730或处理单元720集成在同一单元中。可选地,处理单元720可以是处理器、处理电路或逻辑电路等,发送单元730可以是发射器、发射电路、收发器、收发电路、输入/输出接口或电路等。
应理解,各单元执行上述相应步骤的具体过程在上述各方法实施例中已经详细说明,为了简洁,在此不再赘述。
还应理解,这里的装置700以功能单元的形式体现。这里的术语“单元”可以指应用特有集成电路(application specific integrated circuit,ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。在一个可选例子中,本领域技术人员可以理解,装置700可以具体为上述实施例中的第一处理节点,可以用于执行上述各方法实施例中与第一处理节点对应的各个流程和/或步骤,为避免重复,在此不再赘述。
上述各个方案的装置700具有实现上述方法中处理设备(如第一处理节点)所执行的相应步骤的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块;例如发送单元可以由发送机替代,其它单元,如处理单元等可以由处理器替代,分别执行各个方法实施例中的发送操作以及相关的处理操作。
此外,上述发送单元730还可以是收发电路,处理单元710可以是处理电路。
需要指出的是,图7中的装置可以是前述实施例中的设备,也可以是芯片或者芯片系统,例如:片上系统(system on chip,SoC)。其中,发送单元可以是输入/输出电路、通信接口;处理单元为该芯片上集成的处理器或者微处理器或者集成电路。在此不作限定。
如图8所示,本申请实施例提供另一种通信的装置800。该装置800包括处理器810,处理器810用于执行存储器820存储的计算机程序或指令,或读取存储器820存储的数据/信令,以执行上文各方法实施例中的方法。可选地,处理器810为一个或多个。
可选地,如图8所示,该装置800还包括存储器820,存储器820用于存储计算机程序或指令和/或数据。该存储器820可以与处理器810集成在一起,或者也可以分离设置。可选地,存储器820为一个或多个。
可选地,如图8所示,该装置800还包括收发器830,收发器830用于信号的接收和 /或发送。例如,处理器810用于控制收发器830进行信号的接收和/或发送。
作为一种方案,该装置800用于实现上文各个方法实施例中由处理设备(如第一处理节点)执行的操作。
例如,处理器810用于执行存储器820存储的计算机程序或指令,以实现上文各个方法实施例中处理设备(如第一处理节点)的相关操作。
应理解,本申请实施例中提及的处理器可以是中央处理单元(central processingunit,CPU),还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请实施例中提及的存储器可以是易失性存储器和/或非易失性存储器。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM)。例如,RAM可以用作外部高速缓存。作为示例而非限定,RAM包括如下多种形式:静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
需要说明的是,当处理器为通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件时,存储器(存储模块)可以集成在处理器中。
还需要说明的是,本文描述的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本申请实施例还提供一种计算机可读存储介质,其上存储有用于实现上述各方法实施例中由设备执行的方法的计算机指令。
例如,该计算机程序被计算机执行时,使得该计算机可以实现上述方法各实施例中由处理设备(如第一处理节点)执行的方法。
本申请实施例还提供一种计算机程序产品,包含指令,该指令被计算机执行时以实现上述各方法实施例中由处理设备(如第一处理节点)执行的方法。
上述提供的任一种装置中相关内容的解释及有益效果均可参考上文提供的对应的方法实施例,此处不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。此外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通 信连接,可以是电性,机械或其它的形式。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。例如,所述计算机可以是个人计算机,服务器,或者网络设备等。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD)等。例如,前述的可用介质包括但不限于:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (36)

  1. A model training method, comprising:
    acquiring, by a first processing node, at least one first model;
    processing, by the first processing node, the at least one first model to generate a first public model; and
    determining, by the first processing node, a second processing node, wherein the second processing node is the processing node for a next round of model processing, and the first public model is obtained by the second processing node before the next round of model processing.
  2. The method according to claim 1, wherein the first processing node and the second processing node are different processing nodes, and the method further comprises:
    sending, by the first processing node, the first public model to the second processing node.
  3. The method according to claim 1, wherein
    the first processing node and the second processing node are the same processing node.
  4. The method according to any one of claims 1 to 3, wherein the determining, by the first processing node, the second processing node comprises:
    determining, by the first processing node, the second processing node based on an indication of the first public model.
  5. The method according to any one of claims 1 to 4, wherein the acquiring, by the first processing node, the at least one first model comprises:
    receiving, by the first processing node, a first model from at least one participating node.
  6. The method according to claim 5, wherein before the first processing node receives the first model from the at least one participating node, the method further comprises:
    sending, by the first processing node, indication information to the at least one participating node, wherein the indication information is used to instruct the at least one participating node to send the first model of the at least one participating node to the first processing node.
  7. The method according to any one of claims 1 to 6, wherein the acquiring, by the first processing node, the at least one first model comprises:
    generating, by the first processing node, a first model of the first processing node.
  8. The method according to any one of claims 1 to 7, wherein the processing, by the first processing node, the at least one first model to generate the first public model comprises:
    performing, by the first processing node, aggregation processing on the at least one first model to generate the first public model.
  9. The method according to claim 8, wherein the performing, by the first processing node, aggregation processing on the at least one first model to generate the first public model comprises:
    processing, by the first processing node, parameters of the at least one first model to generate the first public model.
  10. The method according to claim 9, wherein the processing, by the first processing node, parameters of the at least one first model to generate the first public model comprises:
    averaging, by the first processing node, the parameters of the at least one first model to generate the first public model, wherein values of the parameters of the first public model are averages of the parameters of the at least one first model.
  11. The method according to claim 9 or 10, wherein
    the at least one first model has the same network structure.
  12. The method according to claim 11, further comprising:
    performing, by the first processing node, distillation processing on the at least one first model, wherein the distillation processing causes the at least one first model to have the same network structure.
  13. The method according to any one of claims 8 to 12, wherein the performing, by the first processing node, aggregation processing on the at least one first model to generate the first public model comprises:
    concatenating, by the first processing node, the at least one first model to generate the first public model.
  14. The method according to any one of claims 1 to 13, wherein
    the at least one first model comprises a second public model, and the second public model is a public model obtained in a previous round of model processing.
  15. The method according to claim 14, further comprising:
    receiving, by the first processing node, the second public model from a third processing node, wherein the third processing node is the processing node of the previous round of model processing.
  16. The method according to any one of claims 1 to 15, wherein the second processing node is determined according to one or more of the following:
    a topology of a network, data quality of the second processing node, and computing capability of the second processing node.
  17. A model training apparatus, comprising an acquiring unit and a processing unit, wherein:
    the acquiring unit is configured to acquire at least one first model;
    the processing unit is configured to process the at least one first model to generate a first public model; and
    the processing unit is further configured to determine a second processing node, wherein the second processing node is the processing node for a next round of model processing, and the first public model is obtained by the second processing node before the next round of model processing.
  18. The apparatus according to claim 17, wherein the apparatus and the second processing node are different processing nodes, and the apparatus further comprises a sending unit,
    wherein the sending unit is configured to send the first public model to the second processing node.
  19. The apparatus according to claim 17, wherein
    the apparatus and the second processing node are the same processing node.
  20. The apparatus according to any one of claims 17 to 19, wherein
    the processing unit is further configured to determine the second processing node based on an indication of the first public model.
  21. The apparatus according to any one of claims 17 to 20, wherein
    the acquiring unit is further configured to receive a first model from at least one participating node.
  22. The apparatus according to claim 21, wherein the apparatus further comprises a sending unit,
    wherein the sending unit is configured to send indication information to the at least one participating node, and the indication information is used to instruct the at least one participating node to send the first model of the at least one participating node to the apparatus.
  23. The apparatus according to any one of claims 17 to 22, wherein
    the acquiring unit is further configured to generate a first model of the apparatus.
  24. The apparatus according to any one of claims 17 to 23, wherein
    the processing unit is further configured to perform aggregation processing on the at least one first model to generate the first public model.
  25. The apparatus according to claim 24, wherein
    the processing unit is further configured to process parameters of the at least one first model to generate the first public model.
  26. The apparatus according to claim 25, wherein
    the processing unit is further configured to average the parameters of the at least one first model to generate the first public model, wherein values of the parameters of the first public model are averages of the parameters of the at least one first model.
  27. The apparatus according to claim 25 or 26, wherein
    the at least one first model has the same network structure.
  28. The apparatus according to claim 27, wherein
    the processing unit is further configured to perform distillation processing on the at least one first model, wherein the distillation processing causes the at least one first model to have the same network structure.
  29. The apparatus according to any one of claims 24 to 28, wherein
    the processing unit is further configured to concatenate the at least one first model to generate the first public model.
  30. The apparatus according to any one of claims 17 to 29, wherein
    the at least one first model comprises a second public model, and the second public model is a public model obtained in a previous round of model processing.
  31. The apparatus according to claim 30, wherein
    the acquiring unit is further configured to receive the second public model from a third processing node, wherein the third processing node is the processing node of the previous round of model processing.
  32. The apparatus according to any one of claims 17 to 31, wherein the second processing node is determined according to one or more of the following:
    a topology of a network, data quality of the second processing node, and computing capability of the second processing node.
  33. A communication apparatus, comprising a processor and a memory, wherein the memory is configured to store a computer program or instructions, and the processor is configured to execute the computer program or instructions in the memory, so that the method according to any one of claims 1 to 16 is performed.
  34. A chip, comprising a logic circuit and a communication interface, wherein the communication interface is configured to receive data and/or information to be processed and transmit the data and/or information to the logic circuit, and the logic circuit is configured to perform the method according to any one of claims 1 to 16.
  35. A computer-readable storage medium, storing a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 16.
  36. A computer program product, comprising instructions for performing the method according to any one of claims 1 to 16.
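The operations recited in the claims above lend themselves to short, non-limiting sketches. The aggregation of claims 9 and 10 amounts to element-wise parameter averaging. A minimal sketch in plain Python, assuming each first model is represented as a dict mapping parameter names to lists of floats (a representation chosen here for illustration only, not specified by the claims):

```python
# Minimal sketch of the parameter averaging of claims 9-10. Assumes all
# first models share the same network structure, as in claim 11.
def average_parameters(first_models):
    public = {}
    for name in first_models[0]:
        # Group the values of this parameter across all first models.
        columns = zip(*(m[name] for m in first_models))
        # Each parameter of the public model is the element-wise average.
        public[name] = [sum(col) / len(first_models) for col in columns]
    return public

# Usage: two "first models" with identical structure.
m1 = {"w": [1.0, 2.0], "b": [0.0]}
m2 = {"w": [3.0, 4.0], "b": [2.0]}
print(average_parameters([m1, m2]))  # {'w': [2.0, 3.0], 'b': [1.0]}
```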
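Claim 12 unifies first models of different structures by distillation before they are averaged. The patent does not fix a particular distillation procedure; a sketch of one standard approach (soft-label knowledge distillation of each teacher into a student with the common structure), assuming PyTorch, a data loader yielding batches of inputs, and illustrative temperature and optimizer choices:

```python
import torch
import torch.nn.functional as F

def distill(teacher, student, data_loader, epochs=1, temperature=2.0):
    # Train the student (common structure) to match the teacher's softened
    # output distribution; after distillation, all students can be averaged.
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    teacher.eval()
    for _ in range(epochs):
        for x in data_loader:
            with torch.no_grad():
                soft_targets = F.softmax(teacher(x) / temperature, dim=-1)
            loss = F.kl_div(
                F.log_softmax(student(x) / temperature, dim=-1),
                soft_targets,
                reduction="batchmean",
            ) * temperature ** 2  # standard KD scaling
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```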
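Claim 13 aggregates by concatenation rather than averaging. The claim does not fix what exactly is concatenated; one plausible reading, sketched below under that assumption, is a public model whose output is the concatenation of the first models' outputs, which works even when the first models have different structures:

```python
import torch
import torch.nn as nn

class ConcatenatedPublicModel(nn.Module):
    # One possible reading of claim 13: the public model concatenates the
    # outputs of the first models along the feature dimension.
    def __init__(self, first_models):
        super().__init__()
        self.first_models = nn.ModuleList(first_models)

    def forward(self, x):
        return torch.cat([m(x) for m in self.first_models], dim=-1)

# Usage: models with different structures can still be combined.
public = ConcatenatedPublicModel(
    [nn.Linear(4, 2), nn.Sequential(nn.Linear(4, 3), nn.ReLU())]
)
print(public(torch.randn(1, 4)).shape)  # torch.Size([1, 5])
```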
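Claims 16 and 32 name the factors for choosing the next processing node (network topology, data quality, computing capability) without fixing how they are combined. A sketch of one simple combination, a weighted score; the fields and weights are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    link_quality: float   # stand-in for the network-topology factor
    data_quality: float   # e.g. label accuracy or sample freshness
    compute: float        # e.g. normalized available compute

def select_next_node(candidates, w=(0.3, 0.4, 0.3)):
    # Score each candidate by the claim 16 criteria and pick the best.
    def score(c):
        return w[0] * c.link_quality + w[1] * c.data_quality + w[2] * c.compute
    return max(candidates, key=score)

# Usage:
nodes = [Candidate("A", 0.9, 0.5, 0.7), Candidate("B", 0.6, 0.9, 0.8)]
print(select_next_node(nodes).name)  # "B"
```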
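Finally, claims 1, 2, and 14 to 16 together describe a round structure in which the public model of one round is carried into the next and the processing role can move between nodes. A sketch of that control flow, reusing `average_parameters` from the first sketch; `local_train` and `send` are trivial stand-ins invented here for illustration:

```python
def local_train(node_id, model):
    # Stand-in for local training: offset every parameter by the node id.
    return {k: [v + node_id for v in vals] for k, vals in model.items()}

def send(node_id, model):
    print(f"public model forwarded to processing node {node_id}")

def run_rounds(node_ids, public_model, num_rounds, aggregate, pick_next):
    current = node_ids[0]                      # the first processing node
    for _ in range(num_rounds):
        first_models = [local_train(n, public_model) for n in node_ids]
        first_models.append(public_model)      # previous public model (claim 14)
        public_model = aggregate(first_models)
        nxt = pick_next(node_ids)              # e.g. by the claim 16 criteria
        if nxt != current:                     # claim 2: hand over the model
            send(nxt, public_model)
        current = nxt
    return public_model

# Usage (with average_parameters from the first sketch):
final = run_rounds([0, 1, 2], {"w": [0.0]}, 3,
                   aggregate=average_parameters,
                   pick_next=lambda ids: ids[-1])
print(final)
```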
PCT/CN2023/089751 2022-05-27 2023-04-21 Model training method and apparatus WO2023226650A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210586086.5A 2022-05-27 2022-05-27 Model training method and apparatus
CN202210586086.5 2022-05-27

Publications (1)

Publication Number Publication Date
WO2023226650A1 true WO2023226650A1 (zh) 2023-11-30

Family

ID=88918413

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089751 Model training method and apparatus 2022-05-27 2023-04-21

Country Status (2)

Country Link
CN (1) CN117196071A (zh)
WO (1) WO2023226650A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612153A (zh) * 2019-02-22 2020-09-01 Huawei Technologies Co., Ltd. Method and apparatus for training a model
WO2022013879A1 (en) * 2020-07-17 2022-01-20 Telefonaktiebolaget Lm Ericsson (Publ) Federated learning using heterogeneous labels
CN112329073A (zh) * 2021-01-05 2021-02-05 Tencent Technology (Shenzhen) Co., Ltd. Distributed data processing method and apparatus, computer device, and storage medium
CN114202062A (zh) * 2021-12-13 2022-03-18 Computer Network Information Center, Chinese Academy of Sciences Network model training method, client, and server
CN114118447A (zh) * 2021-12-15 2022-03-01 Hunan Hongpu Innovation Technology Development Co., Ltd. Novel federated learning system, method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN117196071A (zh) 2023-12-08

Similar Documents

Publication Publication Date Title
US20230232213A1 (en) Information transmission methods and apparatuses, and communication devices and storage medium
WO2022121804A1 (zh) Method for semi-asynchronous federated learning and communication apparatus
WO2023020502A1 (zh) Data processing method and apparatus
WO2023036268A1 (zh) Communication method and apparatus
WO2023226650A1 (zh) Model training method and apparatus
WO2023088465A1 (zh) Model training method and related apparatus
EP4075918A1 (en) Communication method and device
WO2023040971A1 (zh) Artificial intelligence model downloading method, apparatus, and system
WO2022206513A1 (zh) Model processing method, communication apparatus, and system
CN104244358A (zh) DCS-based energy-saving routing strategy for wireless sensor networks
WO2023236986A1 (zh) Communication method and communication apparatus
WO2023207860A1 (zh) Communication method and communication apparatus
WO2022237865A1 (zh) Data processing method and apparatus
WO2024026846A1 (zh) Artificial intelligence model processing method and related device
WO2023103959A1 (zh) Wireless communication method and apparatus
WO2024044881A1 (zh) Data processing method, training method, and related apparatus
WO2022262687A1 (zh) Data processing method and apparatus
WO2023185890A1 (zh) Data processing method and related apparatus
WO2023125598A1 (zh) Communication method and communication apparatus
WO2023197950A1 (zh) Communication method and related apparatus
WO2024036453A1 (zh) Federated learning method and related apparatus
US20230259742A1 (en) Communication method, apparatus, and system
WO2023115254A1 (zh) Method and apparatus for processing data
WO2023125599A1 (zh) Communication method and communication apparatus
WO2023179458A1 (zh) Communication method and communication apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23810724

Country of ref document: EP

Kind code of ref document: A1