WO2020168761A1 - Model training method and apparatus


Info

Publication number
WO2020168761A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
node
local
request
shared
Application number
PCT/CN2019/118762
Other languages
French (fr)
Chinese (zh)
Inventor
王园园
池清华
徐以旭
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • This application relates to the field of artificial intelligence (AI), and more specifically, to methods and devices for training models in the AI field.
  • AI can be applied to wireless networks.
  • As wireless networks gain more spectrum, more types of services, and more access terminals, the network system becomes more complex, which also drives the architecture of the wireless network and access network equipment to become more intelligent and automated.
  • a wireless intelligent network architecture has been defined.
  • machine learning can be used to train the model.
  • Machine learning model training mainly takes two forms: centralized training and local training.
  • centralized training requires aggregating training data to a central node, which will cause high communication overhead, and there are also problems such as extended modeling time, data privacy issues in user data upload, and high pressure on central node storage and calculation.
  • Local training does not require model reporting; each local node builds its own model using only local data.
  • However, local training suffers from insufficient data, which makes the resulting model inaccurate and gives it weak generalization ability.
  • the present application provides a method and device for training a model in a wireless network, which can implement joint learning in a wireless network, and helps to obtain a training model with high accuracy and generalization ability.
  • a method for training a model applied to a wireless network is provided, the method is executed by a first node in the wireless network, and the method includes:
  • the first node sends a first request to at least one second node in the wireless network, and the first request is used to request the at least one second node to retrain the first shared model locally based on the local data of the second node.
  • the first node obtains the model report message from the at least one second node.
  • the model report message of each second node includes the parameters of the first local model, or includes the increment between the parameters of the first local model and the parameters of the first shared model.
  • the first local model is obtained after each second node performs local model retraining on the first shared model based on the first request and local data;
  • the first node determines the second sharing model according to the model report message of the at least one second node and the first sharing model.
  • A first node in the wireless network sends a first request to at least one second node in the wireless network, and each second node can, in response to the first request, retrain the shared model locally based on its local data and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • Joint learning means that the first node acts as a centralized node, the second node acts as a local node, and the first node and the second node learn together to train the model.
  • Joint learning can overcome some or all of the shortcomings of the centralized training model, as well as some or all of the shortcomings of the local training model.
  • The embodiment of the present application does not require the local node to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the storage and computation pressure of ultra-large-scale data and model training.
  • local nodes in the embodiment of the present application perform distributed model training, which can shorten the time of model training and protect data privacy.
  • The local node sends the locally retrained model to the centralized node, so that the centralized node can update the shared model based on the local models of at least one local node; this helps overcome the problem of insufficient data in local training and thereby improves the accuracy and generalization ability of the trained model.
  • the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model;
  • the first node determining the second sharing model according to the model report message of the at least one second node and the first sharing model includes:
  • the first node determines a second local model in at least one first local model corresponding to the at least one second node, wherein the size of the training data set corresponding to the second local model is greater than or equal to the first threshold, and/or the prediction error of the second local model is less than or equal to the second threshold;
  • the first node determines the second sharing model according to the second local model and the first sharing model.
  • In other words, a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold is selected from the at least one first local model, and the second local model is then used to determine the second shared model. Since the accuracy of the second local model in the embodiment of this application is higher than that of the first local model, the embodiment of this application can help improve the accuracy of the second shared model.
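  • To illustrate this screening step, the following is a minimal sketch (not part of the application itself; the type `LocalModelReport` and field names such as `train_set_size` and `prediction_error` are assumptions introduced here for illustration):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LocalModelReport:
    """Hypothetical model report message fields (names are illustrative only)."""
    parameters: List[float]                    # parameters of the first local model (or their increments)
    train_set_size: Optional[int] = None       # size of the training data set, if reported
    prediction_error: Optional[float] = None   # e.g. MAE of the first local model, if reported

def screen_local_models(reports, size_threshold=None, error_threshold=None):
    """Keep only local models whose training set is large enough and/or whose error is small enough."""
    selected = []
    for r in reports:
        if size_threshold is not None and (r.train_set_size is None or r.train_set_size < size_threshold):
            continue
        if error_threshold is not None and (r.prediction_error is None or r.prediction_error > error_threshold):
            continue
        selected.append(r)
    return selected
```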
  • the first node determining the second sharing model according to the second local model and the first sharing model includes:
  • the first node performs weighted average aggregation on the parameters of the second local model, or on the increment between the parameters of the second local model and the parameters of the first shared model, where the weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model and/or the prediction error of the second local model;
  • the first node determines the second sharing model based on the weighted average aggregation result and the first sharing model.
  • the weight parameter may be the reciprocal of the total number of second local models, and at this time, the weight parameters of each second local model are the same.
  • the weight parameter of the second local model may be the ratio of the size of the training data set corresponding to the second local model to the size of all training data sets, where the size of all training data sets is corresponding to each second local model The sum of the size of the training data set.
  • the weight parameter of each second local model may be the inverse of the corresponding MAE.
  • the first node may not filter the at least one second local model from the at least one first local model, but directly determine the second shared model based on the at least one first local model.
  • the parameters of the first local model or the increments between the parameters of the first local model and the parameters of the shared model may be aggregated by weighted average.
  • the weight parameter of each first local model may be the reciprocal of the total number of first local models. Then, the first node determines the second sharing model based on the weighted average aggregation result and the first sharing model.
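  • As an illustration of these weighting options, the following sketch (reusing the `LocalModelReport` fields assumed above; nothing here is prescribed by the application) computes weights from the model count, the training data set sizes, or the inverse MAE, and then aggregates the reported parameter vectors:

```python
def aggregation_weights(reports, scheme="count"):
    """Weight parameters for the weighted average aggregation.

    scheme="count": equal weights, the reciprocal of the number of local models;
    scheme="data":  ratio of each model's training set size to the total size;
    scheme="error": proportional to the inverse of each model's MAE (normalized
                    here so the weights sum to 1, which is an added assumption).
    """
    n = len(reports)
    if scheme == "count":
        return [1.0 / n] * n
    if scheme == "data":
        total = sum(r.train_set_size for r in reports)
        return [r.train_set_size / total for r in reports]
    if scheme == "error":
        inverse = [1.0 / r.prediction_error for r in reports]
        total = sum(inverse)
        return [w / total for w in inverse]
    raise ValueError(f"unknown weighting scheme: {scheme}")

def weighted_average(reports, weights):
    """Element-wise weighted average of the reported parameter vectors (or increments)."""
    dim = len(reports[0].parameters)
    return [sum(w * r.parameters[i] for w, r in zip(weights, reports)) for i in range(dim)]
```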
  • the first node includes a centralized adaptive strategy function and a centralized analysis and modeling function, and the method further includes:
  • the centralized adaptive strategy function sends a joint learning strategy to the centralized analysis and modeling function, where the joint learning strategy includes at least one of the following information:
  • joint learning start conditions, information of the first shared model, joint learning group member identifiers, an upload strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model, or a shared model update strategy.
  • The embodiment of the application manages the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, whose orchestration management covers at least one of joint learning start conditions, a model upload strategy, a model screening strategy, a model aggregation strategy, a model processing strategy, and so on. Based on this, the embodiment of the present application enables the first node and the second node to learn together and obtain a training model with high accuracy and generalization ability without the local data of the second node being uploaded to the first node.
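  • For concreteness, a joint learning strategy could be represented as a simple container like the following sketch (the class and field names are assumptions made for illustration, not structures defined by the application):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class JointLearningStrategy:
    """Illustrative container for the joint learning strategy items listed above."""
    start_conditions: Optional[str] = None      # e.g. "no subscription data" or a resource threshold
    shared_model_info: Optional[dict] = None    # information of the first shared model
    group_member_ids: List[str] = field(default_factory=list)  # joint learning group members
    upload_strategy: Optional[dict] = None      # upload strategy of the first local model
    screening_strategy: Optional[str] = None    # e.g. a model screening rule identifier
    aggregation_strategy: Optional[str] = None  # e.g. a model aggregation algorithm identifier
    processing_strategy: Optional[str] = None   # e.g. a model processing algorithm identifier
    update_strategy: Optional[str] = None       # shared model update strategy
```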
  • the first node sends a first request for joint learning training to the second node when the joint learning start condition is satisfied.
  • the foregoing second node includes a local analysis and modeling function
  • sending the first request by the first node to at least one second node in the radio access network includes:
  • the centralized analysis and modeling function in the first node sends a first request to the local analysis and modeling function of each second node in the at least one second node.
  • The first request includes the information of the first shared model.
  • the information of the first shared model includes at least one of the following: a model identifier, a model type, a model structure, input and output, and initial model parameters or a training data collection duration of the first shared model.
  • the first request further includes the upload strategy of the first local model.
  • the first node sends the upload strategy of the first local model to the second node, so that the first node can instruct the second node how to upload the first local model.
  • the second node can retrain locally according to the upload strategy to obtain the local model and perform corresponding processing operations.
  • the upload strategy of the first local model includes at least one of the following: an identifier of the processing algorithm used before the first local model is uploaded, an upload time of the first local model, or information carried with the first local model.
  • some implementations of the first aspect further include:
  • when the first node determines that the prediction error of the second shared model is less than or equal to the third threshold, the first node sends a model update notification message to each of the at least one second node, where the model update notification message is used to request each second node to install the second shared model.
  • The embodiment of the present application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set as the third threshold, which is not limited in the embodiment of the present application.
  • a method for training a model applied to a radio access network is provided, characterized in that the method is executed by a second node in the radio access network, and the method includes:
  • the second node receives the first request from the first node in the radio access network
  • the second node performs local model retraining on the first shared model based on the local data of the second node to obtain the first local model
  • the second node sends a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the increment between the parameters of the first local model and the parameters of the first shared model, and the model report message is used to update the first shared model.
  • A first node in the wireless network sends a first request to at least one second node in the wireless network, and each second node can, in response to the first request, retrain the shared model locally based on its local data and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • Joint learning means that the first node acts as a centralized node, the second node acts as a local node, and the first node and the second node learn together to train the model.
  • Joint learning can overcome some or all of the shortcomings of the centralized training model, as well as some or all of the shortcomings of the local training model.
  • The embodiment of the present application does not require the local node to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the storage and computation pressure of ultra-large-scale data and model training.
  • local nodes in the embodiment of the present application perform distributed model training, which can shorten the time of model training and protect data privacy.
  • The local node sends the locally retrained model to the centralized node, so that the centralized node can update the shared model based on the local models of at least one local node; this helps overcome the problem of insufficient data in local training and thereby improves the accuracy and generalization ability of the trained model.
  • the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
  • The first node can use the size of the training data set corresponding to the first local model and/or the prediction error carried in the model report message to filter out a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold, and then determine the second shared model through the second local model. Since the accuracy of the second local model in the embodiment of the present application is higher than that of the first local model, the embodiment of the present application can help improve the accuracy of the second shared model.
  • the second node includes local analysis and modeling functions, and the first node includes centralized analysis and modeling functions;
  • the second node receiving the first request from the first node in the radio access network includes:
  • the local analysis and modeling function in the second node receives a first request from the centralized analysis and modeling function in the first node, where the first request includes the information of the first shared model.
  • the first request further includes the upload strategy of the first local model.
  • the first node sends the upload strategy of the first local model to the second node, so that the first node can instruct the second node how to upload the first local model.
  • the second node can retrain locally according to the upload strategy to obtain the local model and perform corresponding processing operations.
  • the upload strategy of the first local model includes at least one of the following: an identifier of the processing algorithm used before the first local model is uploaded, an upload time of the first local model, or information carried with the first local model.
  • the second node further includes a local adaptive strategy function
  • the method further includes:
  • the local analysis and modeling function sends a third request to the local adaptive strategy function.
  • the third request is used to request the local joint learning strategy corresponding to the first shared model.
  • the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request includes the information of the first shared model;
  • the local analysis and modeling function receives the local joint learning strategy sent by the local adaptive strategy function
  • the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
  • The embodiment of the present application has the local adaptive strategy function send the local joint learning strategy to the local analysis and modeling function, so that the second node can decide whether to participate in joint learning according to its own computing capability; this prevents insufficient computing capability of the second node from prolonging the iteration time of joint learning and improves the efficiency of joint learning.
  • the local adaptive strategy function may not send the local joint learning strategy to the local analysis and modeling function; instead, whenever the second node receives a request for joint learning training, it always participates in joint learning.
  • the information of the first shared model includes at least one of the following: a model identifier, a model type, a model structure, input and output, and initial model parameters or a training data collection duration of the first shared model.
  • some implementations of the second aspect further include:
  • the second node receives the model update notification message sent by the first node, where the model update notification message is used to notify the second node of the second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model;
  • the second node installs the second sharing model when it determines that the prediction error of the second sharing model is less than the fourth threshold.
  • The embodiment of the present application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set as the fourth threshold, which is not limited in the embodiment of the present application.
  • The information exchanged between the first node and the second node during joint learning may be transmitted directly through the communication interface between the first node and the second node.
  • a device for training a model may be a first node in a wireless network or a chip in the first node.
  • the first node may be a centralized node, or a central node.
  • the device has the function of realizing the above-mentioned first aspect and various possible implementation modes. This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the device includes a transceiver module.
  • the device further includes a processing module.
  • the transceiver module may be, for example, at least one of a transceiver, a receiver, and a transmitter, and may include a radio frequency circuit or an antenna.
  • the processing module may be a processor.
  • the device further includes a storage module, and the storage module may be a memory, for example. When a storage module is included, the storage module is used to store instructions.
  • the processing module is connected to the storage module, and the processing module can execute the instructions stored in the storage module or instructions from other sources, so that the device executes the methods of the above-mentioned first aspect and its various possible implementations.
  • when the device is a chip, the chip includes a transceiver module.
  • the chip also includes a processing module.
  • the transceiver module may be, for example, an input/output interface, a pin, or a circuit on the chip.
  • the processing module may be a processor, for example.
  • the processing module can execute instructions so that the chip in the terminal executes the communication methods of the first aspect and any possible implementation thereof.
  • the processing module may execute instructions in the storage module, and the storage module may be a storage module in the chip, such as a register, a cache, and the like.
  • the storage module may also be located in the communication device but outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling program execution of the methods of the above aspects.
  • a device for training a model may be a second node in a wireless network or a chip in the second node.
  • the second node may be a local node or a distributed edge node.
  • the device has the function of realizing the above-mentioned second aspect and various possible implementation manners. This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the device includes a transceiver module.
  • the device further includes a processing module.
  • the transceiver module may be, for example, at least one of a transceiver, a receiver, and a transmitter, and may include a radio frequency circuit or an antenna.
  • the processing module may be a processor.
  • the device further includes a storage module, and the storage module may be a memory, for example. When a storage module is included, the storage module is used to store instructions.
  • the processing module is connected to the storage module, and the processing module can execute instructions stored in the storage module or from other instructions, so that the device executes the communication methods of the second aspect and various possible implementation manners.
  • when the device is a chip, the chip includes a transceiver module.
  • the device also includes a processing module.
  • the transceiver module may be, for example, an input/output interface, a pin, or a circuit on the chip.
  • the processing module may be a processor, for example.
  • the processing module can execute instructions so that the chip in the terminal executes the communication methods of the second aspect and any possible implementation thereof.
  • the processing module may execute instructions in the storage module, and the storage module may be a storage module in the chip, such as a register, a cache, and the like.
  • the storage module may also be located in the communication device but outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling program execution of the methods of the above second aspect and its various possible implementations.
  • a computer storage medium is provided, which stores program code, and the program code is used to instruct execution of the method in the first aspect or the second aspect or any possible implementation manner thereof.
  • a computer program product containing instructions is provided, which, when run on a computer, causes the computer to execute the method in the first aspect or the second aspect or any possible implementation manner thereof.
  • in a seventh aspect, a communication system is provided, which includes a device capable of implementing the methods and various possible designs of the foregoing first aspect, and a device capable of implementing the methods and various possible designs of the foregoing second aspect.
  • a processor is provided, which is configured to be coupled with a memory and configured to execute the method in the first aspect or the second aspect or any possible implementation manner thereof.
  • in a ninth aspect, a chip is provided, which includes a processor and a communication interface.
  • the communication interface is used to communicate with an external device or an internal device.
  • the processor is configured to implement the method in the foregoing first aspect or second aspect or any possible implementation manner thereof.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute instructions stored in the memory or instructions derived from other sources.
  • the processor is used to implement the method in the foregoing first aspect or second aspect or any possible implementation manner thereof.
  • the chip can be integrated on the first node or the second node.
  • Fig. 1 shows a schematic diagram of a system architecture applying an embodiment of the present application.
  • Fig. 2 shows a schematic diagram of an intelligent network architecture to which an embodiment of the present application is applied.
  • Figure 3 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application
  • Fig. 4 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
  • Fig. 5 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
  • Fig. 6 is a schematic block diagram of an apparatus for training a model provided by an embodiment of the present application.
  • Fig. 7 is a schematic block diagram of another device for training a model provided by an embodiment of the present application.
  • Fig. 8 is a schematic block diagram of another device for training a model provided by an embodiment of the present application.
  • Fig. 9 is a schematic block diagram of another device for training a model provided by an embodiment of the present application.
  • Fig. 1 shows a schematic diagram of a system architecture 100 to which an embodiment of the present application is applied.
  • the system architecture 100 includes a first node 110 and at least one second node 120.
  • the system 100 is, for example, a wireless network
  • the first node 110 may be a centralized node or a central node
  • the second node may be a local node or a distributed edge node, which is not limited in the embodiment of the present application.
  • the first node 110 or the second node 120 may be respectively deployed in a radio access network (RAN), or may be deployed in a core network, or an operation support system (OSS), Alternatively, the second node 120 may be a terminal device in a wireless network, which is not specifically limited in the embodiment of the present application.
  • the first node 110 and the second node 120 may both be deployed in the RAN.
  • the first node 110 may be deployed in the OSS or the core network
  • the second node 120 may be deployed in the RAN.
  • the first node 110 may be deployed in the RAN, and the second node 120 is a terminal device.
  • the first node 110 or the second node 120 in the aforementioned system architecture 100 may be implemented by one device, may be implemented by multiple devices, or may be a functional module in one device, for example, a platform (such as a cloud platform).
  • Fig. 2 shows a schematic diagram of an intelligent network architecture 200 to which an embodiment of the present application is applied.
  • the intelligent network architecture 200 is a layered architecture that can meet the differentiated requirements of different scenarios for computing resources and execution cycles as needed.
  • the intelligent network architecture 200 may specifically be an intelligent wireless network architecture.
  • the intelligent network architecture 200 includes an operation-support system (OSS), at least one cloud radio access network (C-RAN), and an evolved NodeB (eNB) or next generation NodeB (gNB).
  • each C-RAN may include a separate centralized unit (CU) and at least one distributed unit (DU).
  • The OSS is a more centralized node than the C-RAN, eNB, or gNB; the CU in a C-RAN is a more centralized node than the DU; and compared with an eNB or gNB, the C-RAN is a more centralized node.
  • OSS may be referred to as a centralized node
  • C-RAN, eNB, or gNB may be referred to as a local node
  • the CU in C-RAN may be referred to as a centralized node
  • the DU in the C-RAN is called a local node
  • the C-RAN is called a centralized node
  • the eNB or gNB is called a local node.
  • the centralized node may correspond to the first node 110 in FIG. 1, and the local node may correspond to the second node 120 in FIG. 1.
  • At least one of OSS, CU, DU, eNB, and gNB may include a data analysis (DA) function (or network element).
  • the DA function can be deployed at a higher location, such as in the OSS; in this case, it can be referred to as operation-support system data analysis (OSSDA).
  • the DA function (or network element) can also be deployed in 5G CU, 5G DU, 5G all-in-one gNB or eNB. In this case, it can be referred to as radio access network data analysis (RANDA).
  • the DA function can also be deployed independently, which is not limited in the embodiment of the present application.
  • OSSDA or RANDA can provide data integration and programmable feature engineering, algorithm framework integration with rich machine learning algorithm libraries, and a general architecture that supports separation of training and execution.
  • AI-based wireless intelligent services mainly include a closed loop consisting of data collection, feature engineering, algorithm design and training modeling, model evaluation, and prediction execution.
  • The DA function (or network element) may include the following functions: a data service function (DSF), an analysis and modeling function (A&MF), a model execution function (MEF), and an adaptive policy function (APF).
  • DSF mainly completes data collection, data preprocessing, feature engineering and other steps, and provides training data and feature vector subscription services to A&MF and MEF.
  • DSF has the programmability of customized feature engineering of data, and the ability to perform data collection, preprocessing and feature engineering according to the requirements of A&MF training algorithm or MEF prediction model.
  • The role of A&MF is to execute machine learning training algorithms and generate machine learning models.
  • A&MF includes a library of commonly used machine learning algorithms, which sends the machine learning model generated by training to the MEF.
  • MEF receives and installs the model issued by A&MF, subscribes the feature vector to DSF according to A&MF's instructions, completes the prediction, and sends the prediction result and the operation instructions corresponding to the result to APF.
  • APF is the final link in the process flow where execution takes effect.
  • the strategy set is stored in the APF, which completes the conversion from the result of model prediction to the execution strategy.
  • the strategy set includes the prediction result, the operation instruction corresponding to the prediction result, and the corresponding relationship of the execution strategy.
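  • As a small illustration of such a strategy set, the sketch below maps a prediction result to an operation instruction and an execution strategy; the concrete entries are invented here purely for illustration and are not defined by the application:

```python
# Illustrative strategy set stored in the APF: it maps a prediction result to the
# operation instruction and execution strategy (entries are hypothetical examples).
STRATEGY_SET = {
    "congestion_predicted": {"operation": "adjust_handover_threshold",
                             "strategy": "offload_to_neighbor_cell"},
    "normal_load": {"operation": "no_action",
                    "strategy": "keep_current_configuration"},
}

def apply_prediction(prediction_result):
    """APF step: convert a model prediction result into an execution strategy."""
    default = {"operation": "no_action", "strategy": "keep_current_configuration"}
    return STRATEGY_SET.get(prediction_result, default)
```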
  • the logical function in the centralized node can be called a central logical function
  • the logical function in the local node is a local logical function.
  • DSF, A&MF, MEF, and APF deployed at a centralized node can be called centralized DSF (C-DSF), centralized A&MF (C-A&MF), centralized MEF (C-MEF), and centralized APF (C-APF); those deployed at a local node can be called local DSF (L-DSF), local A&MF (L-A&MF), local MEF (L-MEF), and local APF (L-APF).
  • network elements may be deployed according to the characteristics of services and computing resources.
  • the functions deployed by different network elements may be different.
  • the above four functions can be deployed on the local node side: DSF, A&MF, MEF, and APF; on the centralized node side, only DSF, APF, and A&MF can be deployed.
  • Different functions may reside within a single network element or coordinate across network elements.
  • the names of the above-mentioned functions in the embodiments of the present application are only taken as an example.
  • the names of the functions in the wireless network 200 may also be other names, which are not specifically limited in the embodiments of the present application.
  • FIG. 3 shows a schematic flowchart of a method 300 for training a model provided by an embodiment of the present application.
  • the method 300 may be applied to the system architecture 100 shown in FIG. 1 and may also be applied to the intelligent system architecture 200 shown in FIG. 2, but the embodiment of the present application is not limited thereto.
  • this application uses the first node and the second node as examples to describe the method 300 for training a model.
  • For the implementation of the chip in the first node and the chip in the second node, please refer to the specific descriptions of the first node and the second node; the description will not be repeated.
  • In step 310, the first node sends a first request to at least one second node, where the first request is used to request the at least one second node to perform local model retraining on the first shared model based on the local data of the second node.
  • the first shared model may be obtained by the first node performing parameter training on the initial model according to the training data.
  • In step 320, each of the at least one second node performs local model retraining on the first shared model based on the local data of the second node according to the first request, to obtain the first local model.
  • the local model retraining refers to that the second node retrains the parameters of the first shared model based on local data.
  • In step 330, the at least one second node respectively sends a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the increment between the parameters of the first local model and the parameters of the first shared model.
  • the increment between the parameters of the first local model and the parameters of the first shared model is the amount of change of the parameters of the first local model relative to the parameters of the first shared model.
  • In step 340, the first node determines a second shared model according to the model report message of the at least one second node and the first shared model.
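  • The following minimal sketch summarizes one such round from the first node's perspective, reusing the helper functions sketched earlier (`screen_local_models`, `aggregation_weights`, `weighted_average`); the function shape and names are assumptions, not interfaces defined by the application:

```python
def joint_learning_round(shared_params, second_nodes, size_threshold=None, error_threshold=None):
    """One illustrative joint learning round corresponding to steps 310-340.

    Each element of `second_nodes` is assumed to be a callable that, given the
    current shared parameters, retrains locally and returns a LocalModelReport.
    """
    # Step 310: request local retraining; steps 320-330: collect model report messages.
    reports = [retrain_locally(shared_params) for retrain_locally in second_nodes]
    # Step 340: screen the reported local models, aggregate them, and form the
    # second shared model (falling back to all reports if none pass screening).
    selected = screen_local_models(reports, size_threshold, error_threshold) or reports
    weights = aggregation_weights(selected, scheme="data")
    return weighted_average(selected, weights)
```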
  • A first node in the wireless network sends a first request to at least one second node in the wireless network, and each second node can, in response to the first request, retrain the shared model locally based on its local data and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • the second shared model is a new shared model obtained through training in steps 310 to 340
  • the first shared model is an old shared model before the above-mentioned training.
  • Joint learning means that the first node acts as a centralized node, the second node acts as a local node, and the first node and the second node learn together to train the model.
  • Joint learning can overcome some or all of the shortcomings of the centralized training model, as well as some or all of the shortcomings of the local training model.
  • The embodiment of the present application does not require the local node to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the storage and computation pressure of ultra-large-scale data and model training.
  • local nodes in the embodiment of the present application perform distributed model training, which can shorten the time of model training and protect data privacy.
  • The local node sends the locally retrained model to the centralized node, so that the centralized node can update the shared model based on the local models of at least one local node; this helps overcome the problem of insufficient data in local training and thereby improves the accuracy and generalization ability of the trained model.
  • the scheme of joint learning to train a model in a wireless network can help to obtain a training model with high accuracy and generalization ability.
  • the "parameter of the model” included in the model report message is used to indicate the model that the model report message needs to report.
  • “parameters of the model” can be replaced with “models”, and the two have equivalent meanings.
  • the model report message may be described as including the first local model or the increment between the first local model and the first shared model.
  • the first node sends the first request for joint learning training to the second node when the joint learning start condition is satisfied.
  • the joint learning start condition may be, for example, that the first node cannot obtain training data, or that the computing pressure of the first node exceeds a certain indicator.
  • the first request may include the information of the first sharing model, so that the second node determines the first sharing model according to the first request.
  • the information of the first shared model includes at least one of the following: model identification, model type, model structure, input and output, and initial model parameters or training data collection duration of the first shared model.
  • the first request may also include the upload strategy of the first local model.
  • the first node sends the upload strategy of the first local model to the second node, so that the first node can instruct the second node how to upload the first local model.
  • the first local model is the local model uploaded by the second node.
  • the first local model is a model obtained by performing local model retraining on the first shared model according to the local data of the at least one second node.
  • the second node can retrain locally according to the upload strategy to obtain the local model and perform corresponding processing operations.
  • the upload strategy of the first local model includes at least one of the following: an identifier of the processing algorithm used before the first local model is uploaded, an upload time of the first local model, or information carried with the first local model.
  • the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
  • The processing algorithm applied before the first local model is uploaded includes, for example, an incremental operation algorithm that computes the increment between the locally retrained model and the first shared model issued by the first node, a compression algorithm that compresses the model through algorithms such as parameter pruning, quantization, low-rank decomposition, and sparse optimization, or an encryption algorithm that encrypts the model through layer obfuscation or by converting parameters into codes, which is not limited in the embodiments of the present application.
  • the training data set of the first local model is the training data set used when the second node retrains the first shared model locally based on the local data.
  • the size of the training data set is, for example, the amount of training data in the training data set, which is not limited in the embodiment of the present application.
  • a prediction data set can be used to evaluate the local model to obtain the prediction error of the local model.
  • the prediction error of the local model is, for example, the mean absolute error (MAE) of the local model, or the mean squared error (MSE), which is not limited in the embodiment of the present application.
  • the model report message may further include the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
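  • For illustration, the prediction error carried in the report could be computed as a mean absolute error or mean squared error as sketched below (plain-Python versions; the way the fields are filled into `LocalModelReport` is an assumption carried over from the earlier sketch):

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute deviation between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """MSE: average squared deviation between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# A second node could then fill the optional report fields like this (illustrative):
# report = LocalModelReport(parameters=local_params,
#                           train_set_size=len(training_data),
#                           prediction_error=mean_absolute_error(y_true, y_pred))
```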
  • step 340 may specifically be:
  • the first node determines a second local model in at least one first local model corresponding to the at least one second node, wherein the size of the training data set corresponding to the second local model is greater than or equal to the first threshold, and/or the prediction error of the second local model is less than or equal to the second threshold. Then, the first node determines a second shared model according to the second local model and the first shared model.
  • the number of the second local model may be one or more, which is not limited in the embodiment of the present application.
  • When the local training data is sufficient, the accuracy of the local model obtained by training is high and its generalization ability is strong; when the local training data is insufficient, the accuracy of the trained local model is low and its generalization ability is weak.
  • the first node selects, from the at least one first local model, at least one second local model that satisfies the screening conditions, and the accuracy or generalization ability of the at least one second local model is higher than that of the at least one first local model.
  • the first node may delete the local model whose size of the training data set in the first local model is less than the first threshold, or the prediction error is greater than the second threshold.
  • In other words, a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold is selected from the at least one first local model, and the second local model is then used to determine the second shared model. Since the accuracy of the second local model in the embodiment of this application is higher than that of the first local model, the embodiment of this application can help improve the accuracy of the second shared model.
  • the first node may perform weighted average aggregation on the parameters of the second local model, or on the increment between the parameters of the second local model and the parameters of the first shared model, wherein the weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model and/or the prediction error of the second local model. Then, the first node determines the second shared model based on the weighted average aggregation result and the first shared model.
  • when the model report message includes the parameters of the first local model, the first node performs weighted average aggregation on the parameters of the second local model after determining the second local model.
  • the first node may determine the result obtained by the weighted average aggregation of the at least one second local model as the second shared model.
  • when the model report message includes the increment, the first node performs weighted average aggregation on the increments between the parameters of the second local model and the parameters of the first shared model.
  • In this case, the first node determines the result obtained by the weighted average aggregation of the at least one second local model as the increment of the first shared model, and then determines the sum of the first shared model and this increment as the second shared model.
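  • A minimal sketch of this increment-based variant is shown below (again using plain parameter vectors; the function names are illustrative only):

```python
def parameter_increment(local_params, shared_params):
    """Increment of the local model parameters relative to the first shared model parameters."""
    return [l - s for l, s in zip(local_params, shared_params)]

def apply_aggregated_increment(shared_params, increments, weights):
    """Second shared model = first shared model + weighted average of the reported increments."""
    dim = len(shared_params)
    aggregated = [sum(w * inc[i] for w, inc in zip(weights, increments)) for i in range(dim)]
    return [s + d for s, d in zip(shared_params, aggregated)]
```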
  • the weight parameter may be the reciprocal of the total number of second local models, and at this time, the weight parameters of each second local model are the same.
  • the weight parameter of the second local model may be the ratio of the size of the training data set corresponding to the second local model to the size of all training data sets, where the size of all training data sets is corresponding to each second local model The sum of the size of the training data set.
  • the weight parameter of each second local model may be the inverse of the corresponding MAE. It should be understood that the foregoing examples of weight parameters are merely examples, and the embodiments of the present application are not limited thereto.
  • the first node may not filter at least one second local model from the at least one first local model, but directly determine the second shared model based on the at least one first local model.
  • the parameters of the first local model or the increments between the parameters of the first local model and the parameters of the shared model may be aggregated by weighted average.
  • the weight parameter of each first local model may be the reciprocal of the total number of first local models. Then, the first node determines the second sharing model based on the weighted average aggregation result and the first sharing model.
  • the first node or the second node may determine whether the prediction error of the second sharing model is less than a preset threshold.
  • When the prediction error of the second shared model is less than the preset threshold, it means that the accuracy of the second shared model can meet the requirements.
  • the accuracy of the second sharing model can be determined based on the prediction data set.
  • the first node can determine whether the prediction error of the second sharing model is less than the third threshold.
  • when the prediction error of the second shared model is less than or equal to the third threshold, the first node updates the first shared model to the second shared model, and the first node sends a model update notification message to each of the at least one second node
  • the model update notification message is used to request each second node to install a second shared model.
  • when the prediction error of the second shared model is greater than the third threshold, the first node does not send the model update notification message to the second node.
  • In this case, the first node may delete the second shared model without updating the saved first shared model.
  • the second node may determine whether the prediction error of the second sharing model is less than the fourth threshold.
  • the first node may send a model update notification message to at least one second node, where the model update notification message is used to notify the second sharing model.
  • the second node can determine whether the prediction error of the second shared model indicated by the model update notification message is less than the fourth threshold. When the second node determines that the prediction error of the second sharing model is less than or equal to the fourth threshold, the second sharing model is installed. When the second node determines that the prediction error of the second sharing model is greater than the fourth threshold, the second sharing model is not installed.
  • the second node does not need to send its locally stored prediction data set to the first node, which can reduce communication signaling overhead between network elements.
  • the third threshold and the fourth threshold may be the same or different, which is not limited in the embodiment of the present application.
  • The embodiment of the present application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set to the third threshold or the fourth threshold, which is not limited in the embodiment of the present application.
  • the prediction error of the second shared model may also be compared with the prediction error of the first shared model. If the prediction error of the second shared model is smaller than that of the first shared model, the first node saves the second shared model and the second node installs the second shared model. If the prediction error of the second shared model is greater than or equal to that of the first shared model, the first node does not update the first shared model and the second node does not install the second shared model.
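  • A compact sketch of this update decision, covering both the threshold variant and the comparison with the old shared model (threshold and argument names are illustrative), is:

```python
def should_update_shared_model(new_error, old_error=None, threshold=None):
    """Decide whether the first node saves, and the second node installs, the new shared model.

    Either an absolute error threshold (the third/fourth threshold above) or the
    prediction error of the old (first) shared model may serve as the criterion.
    """
    if threshold is not None:
        return new_error <= threshold
    if old_error is not None:
        return new_error < old_error
    return False
```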
  • FIG. 4 shows a schematic flowchart of a method 400 for training a model provided by an embodiment of the present application. It should be understood that FIG. 4 shows the steps or operations of the method for training a model, but these steps or operations are only examples, and the embodiment of the present application may also perform other operations or variations of each operation in FIG. 4. In addition, the various steps in FIG. 4 may be performed in a different order from that presented in FIG. 4, and it is possible that not all operations in FIG. 4 are to be performed.
  • the first node in Figure 4 includes centralized APF (C-APF), centralized DSF (C-DSF), and centralized A&MF (C-A&MF), and the second node includes local APF (L-APF), local MEF (L-MEF), local DSF (L-DSF), and local A&MF (L-A&MF).
  • the C-APF in the first node sends a joint learning strategy to C-A&MF.
  • a joint learning strategy can be stored in the C-APF, and the joint learning strategy is used to indicate how the first node and the second node perform joint learning.
  • the joint learning strategy may include at least one of the following: joint learning start conditions, shared model information, joint learning group member identification, local model upload strategy, local model screening strategy, local model aggregation strategy, Local model processing strategy or shared model update strategy.
  • the starting condition of the joint learning is, for example, that the C-DSF cannot obtain the subscription data, or the computing resource of the C-A&MF exceeds a certain threshold, which is not limited in the embodiment of the present application.
  • Centralized training can be used for model training when the C-DSF can obtain subscription data, when the model training computation is small, when the computing resources of the C-A&MF are sufficient, or when the computing resources of the L-A&MF are insufficient.
  • the identification of the members of the joint learning group may include the identification of at least one second node participating in the joint learning, and for example, may include the identification of L-A&MF in each second node of the at least one second node participating in the joint learning.
  • The embodiment of the present application does not limit this.
  • the local model refers to the local model obtained by the second node after retraining the model based on local data, for example, including the first local model described in FIG. 3.
  • the screening strategy of the local model refers to a strategy for the first node (or C-A&MF in the first node) to screen out the second local model that meets the screening condition from at least one first local model. For example, it may include determining in the first local model that the size of the training data set is greater than or equal to the first threshold, and/or the local model whose prediction error is less than or equal to the second threshold.
  • the local model screening strategy may be a model screening rule identifier, which is not limited in the embodiment of the present application.
  • the aggregation strategy of the local model is used to indicate the aggregation algorithm used by the first node (or the C-A&MF in the first node) to aggregate the local model and the calculation method of the weight parameter.
  • the aggregation strategy of the local model may be a model aggregation algorithm identification.
  • the aggregation of local models can also be replaced with the integration of local models, and the two have the same meaning.
  • the processing strategy of the local model is used to instruct the first node (or C-A&MF in the first node) to process the acquired local model.
  • the processing algorithm includes, for example, an incremental operation algorithm that performs an incremental operation between a model obtained by local retraining and a shared model issued by the first node, or a compression algorithm that performs model compression through algorithms such as parameter pruning, quantization, low-rank decomposition, and sparse optimization. Or encryption algorithms that perform model encryption through layer obfuscation or conversion of parameters into codes, which are not limited in the embodiment of the present application.
  • the processing strategy of the local model may be the model processing algorithm identification.
  • the shared model update strategy is used to instruct the first node (or C-A&MF in the first node) to update the shared model. For example, when the prediction error of the new shared model is less than or equal to a certain threshold, the old shared model is updated to the new shared model. Or, in the case where the prediction error of the new shared model is less than or equal to the prediction error of the old shared model, the old shared model is updated to the new shared model.
  • the C-A&MF in the first node sends a joint learning strategy issuance response to the C-APF, which is used to indicate that the C-A&MF receives the aforementioned joint learning strategy.
  • the first node and the second node perform data collection, model training, and model application.
  • the L-DSF in the second node reports the collected data to the C-A&MF in the first node, and the C-A&MF performs model training to obtain a shared model. Then, C-A&MF delivers the shared model to L-MEF for model application.
  • C-A&MF in the first node can send a data subscription request to L-DSF in the second node.
  • after receiving the data subscription request, the L-DSF sends a data subscription response to the C-A&MF, which carries the local data.
  • the local data may include a training data set or a prediction data set, which is not limited in the embodiment of the present application.
  • step 403 is performed to train and obtain a shared model, and the shared model here may also be referred to as the initial shared model.
  • subsequent steps 404 to 422 update the initial shared model generated in step 403 to compensate for the poor accuracy of the initial shared model caused by insufficient or incomplete data, so that the shared model still has high accuracy and generalization ability when the network status in the wireless network changes.
  • the C-A&MF in the first node sends a joint learning training request to the L-A&MF in the second node.
  • the C-A&MF in the first node determines whether the joint learning start condition is satisfied according to the joint learning start condition indicated in the joint learning strategy. For example, when the C-A&MF determines that the subscription data cannot be obtained from the L-DSF of the second node, or that the computing resource occupation of the C-A&MF exceeds a preset threshold, it determines that the joint learning start condition is met. When the joint learning start condition is met, the C-A&MF sends a joint learning training request to the L-A&MF. Otherwise, if the joint learning start condition is not satisfied, centralized model training as in step 403 is performed.
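  • A minimal sketch of this start-condition check, assuming the two conditions are available as a boolean and a utilization ratio; the function name and threshold value are illustrative assumptions.

```python
# Minimal sketch of how C-A&MF might evaluate the joint learning start condition
# indicated in the joint learning strategy. Condition names and the threshold
# are assumptions for illustration.

def joint_learning_should_start(subscription_data_available, compute_usage, usage_threshold=0.8):
    """Start joint learning if subscription data cannot be obtained from L-DSF,
    or if the computing resource occupation of C-A&MF exceeds the threshold."""
    return (not subscription_data_available) or (compute_usage > usage_threshold)

if joint_learning_should_start(subscription_data_available=False, compute_usage=0.3):
    print("send joint learning training request to L-A&MF")
else:
    print("fall back to centralized training (step 403)")
```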
  • the joint learning training request may correspond to a specific example of the first request in FIG. 3.
  • the joint learning training request may refer to the description in the first request in FIG. 3, and to avoid repetition, the details are not repeated here.
  • the L-A&MF in the second node sends a local joint learning strategy request to the L-APF.
  • the local joint learning strategy is used to indicate whether the second node performs local model retraining on the shared model.
  • the local joint learning strategy request includes the information of the shared model.
  • the information of the shared model can be obtained from the joint learning training request in step 404.
  • the information of the shared model can be referred to the above description, for the sake of brevity, it will not be described here.
  • the L-APF in the second node sends a local joint learning strategy response to the L-A&MF.
  • L-APF determines whether to perform local model retraining on the shared model according to the utilization of local computing resources, that is, whether the second node participates in local joint learning.
  • the local joint learning strategy response may be an indicator of whether to participate in the local joint learning.
  • the local joint learning strategy may also include a model update strategy.
  • the model update strategy may indicate that when the prediction error of the new shared model is less than the prediction error of the old shared model or is less than or equal to a certain preset threshold, the old shared model is updated, otherwise the old shared model is continued to be used.
  • the old shared model is, for example, the initial shared model obtained in step 404, and the new shared model is, for example, the shared model obtained in step 413.
  • the local joint learning strategy is sent to the L-A&MF through the L-APF, so that the second node can determine whether to participate in joint learning according to its own computing capability, thereby avoiding that insufficient computing capability of the second node prolongs the joint learning iteration time, and improving the efficiency of joint learning.
  • steps 405 and 406 may not be performed. Instead, the second node always participates in the joint learning when receiving the joint learning training request, which is not limited in the embodiment of the present application.
  • the L-A&MF in the second node sends a joint learning training request response to the C-A&MF in the first node.
  • the L-A&MF in the second node sends a data subscription request to the L-DSF.
  • L-A&MF can send a data subscription request to L-DSF according to the model input and output in the joint learning training request and the model training data collection time.
  • the data subscription request can carry the data representation and the data collection time.
  • the L-DSF in the second node sends a data subscription response to L-A&MF.
  • L-DSF collects data according to the data subscription request in 408, and sends the collected data to L-A&MF.
  • the L-A&MF in the second node performs model retraining and model processing.
  • L-A&MF retrains the local shared model according to the information of the shared model issued in step 404 and the training data obtained in 409.
  • before uploading the local model, the L-A&MF can also perform local model processing according to the identification of the processing algorithm issued in step 404, for example, performing an incremental operation between the local model obtained by the retraining and the shared model issued in step 404 (as an example, the increment between the local model parameters and the shared model parameters can be calculated).
  • the model can also be compressed by algorithms such as parameter pruning, quantization, low-rank decomposition, and sparse optimization, and model encryption can be performed by methods such as layer obfuscation or conversion of parameters to codes.
  • the L-A&MF in the second node sends a local model upload notification to the C-A&MF in the first node.
  • the local model upload notification includes the model identification of the shared model, the processed local model, the size of the training data set corresponding to the local model, the prediction error of the local model, etc., which are not limited in the embodiment of the present application.
  • the local model upload notification may correspond to an example of the model report message in FIG. 3.
  • the local model upload notification can refer to the description of the model report message in FIG. 3, and to avoid repetition, it will not be repeated here.
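  • For illustration only, the following Python snippet sketches the kind of content a local model upload notification might carry according to the description above; the field names and values are assumptions, not a defined message format.

```python
# Illustrative sketch of the content a local model upload notification might
# carry. The dictionary layout, field names, and values are assumptions.

local_model_upload_notification = {
    "model_id": "shared-model-001",        # model identification of the shared model
    "local_model": {                        # processed local model (e.g. parameter deltas)
        "parameters": [0.02, -0.05, 0.01],
    },
    "dataset_size": 5000,                   # size of the training data set used for retraining
    "prediction_error": 0.04,               # prediction error of the local model
}
print(local_model_upload_notification)
```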
  • the C-A&MF in the first node sends a local model upload notification response to the L-A&MF in the second node.
  • the C-A&MF in the first node performs model screening, aggregation, and processing.
  • the C-A&MF selects, from the received at least one local model (such as the at least one first local model described above), at least one local model that satisfies the condition (such as the at least one second local model described above) according to the local model screening strategy indicated in step 401.
  • the method of screening by C-A&MF can be referred to the description above. For brevity, it will not be described in detail here.
  • C-A&MF can aggregate at least one local model selected according to the local model aggregation strategy, such as weighted average aggregation.
  • the manner in which C-A&MF performs aggregation can be referred to the above description. For brevity, it will not be described in detail here.
  • C-A&MF can process the aggregated model according to the local model processing strategy, such as compression or encryption.
  • the processing method of C-A&MF can be referred to the above description. For brevity, it will not be described in detail here.
  • each step shown in 4A in FIG. 4 is executed.
  • 4A includes 414 to 416.
  • the C-A&MF in the first node sends a model update request #1 to the L-MEF in the second node.
  • C-A&MF may test the shared model obtained in step 413 according to the test data in the test data set, and determine the prediction error of the shared model obtained in step 413.
  • the sharing model obtained in 413 may also be referred to as a new sharing model.
  • the new sharing model may correspond to an example of the second sharing model in FIG. 3.
  • if the C-A&MF determines that the prediction error of the new shared model is less than a certain threshold, it sends the above model update request #1, where the model update request #1 may include the model identification of the new shared model and the parameters of the fused model.
  • the parameters of the model may include at least one of weight parameters, bias parameters, or activation function information of each layer.
  • model update request #1 may be an example of the model update notification message sent when the first node in FIG. 3 determines that the prediction error of the second shared model is less than the third threshold.
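  • As an illustrative sketch only, the following Python snippet shows one way the decision in step 414 could be expressed, assuming the prediction error is estimated as the mean absolute error on a test data set; the function names, threshold value, and example data are assumptions.

```python
# Minimal sketch of step 414's decision, assuming the prediction error of the
# new shared model is estimated as the mean absolute error on a test data set.
# The threshold corresponds to the "certain threshold" (third threshold) above.

def mean_absolute_error(predictions, labels):
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)

def maybe_send_model_update(predictions, labels, threshold=0.05):
    error = mean_absolute_error(predictions, labels)
    if error < threshold:
        return {"request": "model update request #1", "prediction_error": error}
    return None   # keep the old shared model, do not notify L-MEF

print(maybe_send_model_update([0.98, 0.03, 0.49], [1.0, 0.0, 0.5]))
```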
  • steps 414 to 416 may not be performed.
  • L-MEF performs model update installation. Specifically, L-MEF replaces the current parameters of the model with the parameters of the model issued in 414.
  • for example, the L-MEF can replace the weight parameters, bias parameters, or activation functions of each layer of the neural network with the parameters of the model delivered in step 414.
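  • As a minimal illustrative sketch (not a defined procedure of the embodiment), the following Python snippet shows a per-layer parameter replacement of the kind described above; the dictionary layout, layer names, and values are assumptions.

```python
# Minimal sketch of model update installation at L-MEF: the installed model's
# per-layer parameters are replaced by those delivered in the model update
# request. The dictionary layout is an assumption for illustration.

def install_model(current_model, delivered_params):
    """Replace the installed model's per-layer parameters with the delivered ones."""
    for layer_name, layer_params in delivered_params.items():
        current_model[layer_name].update(layer_params)
    return current_model

installed = {"layer1": {"weights": [0.1, 0.2], "bias": 0.0, "activation": "relu"}}
update = {"layer1": {"weights": [0.15, 0.18], "bias": 0.01}}
print(install_model(installed, update))
```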
  • model update response #1 may indicate that the model update has been completed.
  • the steps shown in 4A in FIG. 4 can be replaced with the steps shown in 4B in FIG. 4.
  • 4B includes 417 to 421.
  • the C-A&MF in the first node sends a model update request #2 to the L-A&MF in the second node.
  • after the C-A&MF obtains the new shared model through step 413, it can send a model update request #2 to the L-A&MF, where the model update request #2 can include the model identification of the new shared model and the parameters of the fused model.
  • the parameters of the model may include at least one of weight parameters, bias parameters, or activation function information of each layer.
  • model update request #2 may be an example of the model update notification message sent by the first node to the second node before the second node in FIG. 3 determines whether the prediction error of the updated shared model is less than the fourth threshold.
  • the L-A&MF in the second node sends a model update response #2 to the C-A&MF in the first node.
  • the model update response #2 can be used to notify the first node that the model update request #2 is received.
  • the L-A&MF in the second node sends a model installation request to the L-MEF.
  • the L-A&MF can determine, according to the model update strategy, whether the prediction error of the new shared model is greater than a preset threshold or greater than that of the old shared model. As an example, when the prediction error of the new shared model is greater than or equal to that of the old shared model, the second node does not install the new shared model, and steps 419 to 421 are not executed. When the prediction error of the new shared model is smaller than that of the old shared model, the second node updates and installs the new shared model, and steps 419 to 421 are executed.
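  • A minimal sketch of this update decision, assuming the prediction errors are available as scalar values; the function name and the optional threshold handling are illustrative assumptions.

```python
# Minimal sketch of the model update strategy applied by L-A&MF in step 419:
# the new shared model is installed only if its prediction error is smaller
# than that of the old shared model (an additional threshold check is optional).

def should_install(new_error, old_error, error_threshold=None):
    if error_threshold is not None and new_error > error_threshold:
        return False
    return new_error < old_error

print(should_install(new_error=0.03, old_error=0.05))  # True: send model installation request
print(should_install(new_error=0.06, old_error=0.05))  # False: keep the old shared model
```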
  • the model installation request can carry the model identifier of the new shared model and the parameters of the fused model.
  • the model identification and the parameters of the fused model can be referred to the above description, for the sake of brevity, it will not be described here.
  • the L-MEF in the second node performs model update installation.
  • 420 may refer to the description in 415, and for brevity, it is not described here.
  • the L-MEF in the second node sends a model installation response to L-A&MF, and the model installation response may indicate that the model update has been completed.
  • the second node performs model application.
  • the L-MEF in the second node subscribes to the L-DSF the data required for model prediction, and performs model prediction. Then, the prediction result is sent to the local APF for policy execution.
  • after the first node and the second node start the joint learning, the steps of the joint learning can be executed cyclically.
  • a joint learning stop condition can be set.
  • the first node and the second node can stop the joint learning.
  • the joint learning stop condition may include the execution duration of the joint learning or limited resources of the second node. That is, the embodiment of the present application may stop the joint learning after the execution duration has elapsed since the joint learning was started, or stop the joint learning when some or all of the resources of the second node are limited.
  • the joint learning stop condition may be included in the joint learning strategy, or pre-configured in the first node or the second node, which is not limited in the embodiment of the present application.
  • the embodiment of the present application manages the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, including orchestrating and managing at least one of the joint learning start conditions, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and the like. Based on this, the embodiment of the present application can realize that, without the local data of the second node being uploaded to the first node, the first node and the second node learn together to obtain a training model with high accuracy and generalization ability.
  • FIG. 5 shows a schematic flowchart of a method 500 for training a model provided by an embodiment of the present application. It should be understood that FIG. 5 shows the steps or operations of the method for training a model, but these steps or operations are only examples, and the embodiment of the present application may also perform other operations or variations of each operation in FIG. 5. In addition, the various steps in FIG. 5 may be performed in a different order from that presented in FIG. 5, and it is possible that not all operations in FIG. 5 are to be performed.
  • CUDA and at least one DUDA are taken as examples for description.
  • CUDA may correspond to an example of the first node above
  • DUDA may correspond to an example of the second node above.
  • the interactive information during joint learning between the first node and the second node can be directly transmitted through the interface between the first node and the second node.
  • FIG. 5 takes CUDA and DUDA as examples, but the embodiments of this application are not limited thereto.
  • CUDA can also be replaced with gNB or eNB or cell
  • DUDA can be replaced with terminal equipment served by the gNB, eNB, or cell.
  • CUDA can also be replaced with CU
  • DUDA can be replaced with DU managed by the CU.
  • CUDA can also be replaced with C-RAN
  • DUDA can be replaced with an eNB or gNB managed by the C-RAN.
  • CUDA may also be replaced by an eNB or gNB
  • DUDA may be replaced by a cell managed by the eNB or gNB
  • CUDA can also be replaced with OSS
  • DUDA can be replaced with network elements managed by OSS.
  • alternatively, both CUDA and DUDA can be replaced with a gNB or an eNB. The embodiment of the application does not specifically limit this.
  • CUDA sends a joint learning training request to at least one DUDA.
  • CUDA may send a joint learning training request to each DUDA of at least one DUDA when the joint learning start condition is satisfied.
  • for the joint learning training request, reference may be made to the above description. For brevity, details are not repeated here.
  • Each DUDA of at least one DUDA sends a joint learning training request response to CUDA.
  • Each DUDA performs local model training and processing.
  • DUDA performs data subscription, local model training, and processing according to the instructions in step 501.
  • for details, please refer to the description of 410 in FIG. 4 above. For brevity, details are not described herein again.
  • Each DUDA sends a local model upload notification to CUDA.
  • CUDA sends a local model upload notification response to each DUDA.
  • 504 and 505 can be referred to the descriptions of 411 and 412 in FIG. 4 above. For brevity, details are not repeated here.
  • CUDA performs model screening, fusion, and processing.
  • 506 can refer to the description of 413 in FIG. 4 above, and for the sake of brevity, it will not be repeated here.
  • Each DUDA performs model update installation and model application.
  • in the embodiment of the present application, CUDA sends a joint learning training request to at least one DUDA, and each DUDA can locally retrain the shared model indicated by CUDA according to the joint learning training request; each DUDA then reports the trained local model to CUDA, and CUDA can fuse and process the local models reported by the at least one DUDA to determine a new shared model.
  • the embodiment of the present application can transmit the interactive information during joint learning through the interface between the CU and the DU in the CU-DU split architecture, and based on this, obtain a shared model with high accuracy and generalization ability.
  • FIG. 6 shows a schematic structural diagram of an apparatus 600 for training a model in a wireless network provided by the present application.
  • the apparatus 600 for training a model may be the first node in the wireless network.
  • the device 600 for training a model includes: a sending unit 610, a receiving unit 620, and a determining unit 630.
  • the sending unit 610 is configured to send a first request to at least one second node in the wireless network, where the first request is used to request the at least one second node to perform local model retraining on the first shared model based on the local data of the second node.
  • the receiving unit 620 is configured to obtain a model report message from each of the at least one second node, where the model report message of each second node includes the parameters of the first local model, or includes the increment between the parameters of the first local model and the parameters of the first shared model, and the first local model is obtained after each second node performs local model retraining on the first shared model based on the first request and the local data.
  • the determining unit 630 is configured to determine the second sharing model according to the model report message of the at least one second node and the first sharing model.
  • therefore, the first node in the wireless network sends a first request to at least one second node in the wireless network; each second node can perform local retraining on the shared model based on its local data according to the first request, and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model;
  • the determining unit 630 is specifically configured to:
  • a second local model is determined among at least one first local model corresponding to the at least one second node, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold;
  • in this way, a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold is selected from the at least one first local model, and the second shared model is then determined through the second local model. Since the accuracy or generalization ability of the second local model in the embodiment of this application is higher than that of the first local model, the embodiment of the present application can help improve the accuracy and generalization ability of the second shared model.
  • the first node may not filter the at least one second local model from the at least one first local model, but directly determine the second shared model based on the at least one first local model.
  • the determining unit 630 is specifically configured to:
  • perform weighted average aggregation on the parameters of the second local model, or on the increment between the parameters of the second local model and the parameters of the first shared model, where the weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model, and/or the prediction error of the second local model;
  • the first node determines the second sharing model according to the weighted average aggregation result and the first sharing model.
  • the first node includes a centralized adaptive strategy function and a centralized analysis and modeling function:
  • the centralized adaptive strategy function sends a joint learning strategy to the centralized analysis and modeling function, where the joint learning strategy includes at least one of the following information:
  • joint learning start conditions, information of the first shared model, identification of members of the joint learning group, upload strategy of the first local model, screening strategy of the first local model, aggregation strategy of the first local model, processing strategy of the first local model, or shared model update strategy.
  • the embodiment of the present application manages the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, including orchestrating and managing at least one of the joint learning start conditions, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and the like. Based on this, the embodiment of the present application can realize that, without the local data of the second node being uploaded to the first node, the first node and the second node learn together to obtain a training model with high accuracy and generalization ability.
  • the second node includes local analysis and modeling functions
  • the sending unit 610 is specifically configured to:
  • the centralized analysis and modeling function sends the first request to the local analysis and modeling function of each second node in the at least one second node, where the first request includes the information of the first shared model.
  • the information of the first sharing model includes at least one of the following:
  • the first request further includes the upload strategy of the first local model.
  • the upload strategy of the first local model includes at least one of the following:
  • the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
  • the determining unit 630 is further configured to determine that the prediction error of the second sharing model is less than or equal to a third threshold.
  • the sending unit 610 is further configured to send a model update notification message to the at least one second node respectively, and the model update notification message is used to request The second shared model is installed in each of the second nodes.
  • the embodiment of the present application judges the prediction error of the second sharing model, and when it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second sharing model, and the second node installs the second sharing model. When it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model, and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, and further ensure the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set as the third threshold, which is not limited in the embodiment of the present application.
  • the sending unit 610 and/or the receiving unit 620 may also be collectively referred to as a transceiver unit (module), or a communication unit, which may be respectively used to perform the steps of receiving and sending by the first node in the method embodiment.
  • the processing unit 630 is configured to generate instructions sent by the sending unit 610, or process instructions received by the receiving unit 620.
  • the device 600 for training a model may further include a storage unit for storing instructions executed by the sending unit, the receiving unit, and the processing unit.
  • the device 600 for training the model is the first node in the method embodiment, and may also be a chip in the first node.
  • the processing unit may be a processor, and the sending unit and the receiving unit may be transceivers.
  • the apparatus for training a model may further include a storage unit, and the storage unit may be a memory.
  • the storage unit is used to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the communication device executes the foregoing method.
  • the processing unit may be a processor, and the sending unit and the receiving unit may be input/output interfaces, pins, or circuits, etc.; the processing unit executes the instructions stored in the storage unit, so that the chip performs the foregoing method.
  • the storage unit may be a storage unit in the chip (for example, a register, a cache, etc.), or may be a storage unit (for example, a read-only memory, a random access memory, etc.) located outside the chip in the terminal device.
  • the sending unit 610 and the receiving unit 620 may be implemented by a transceiver.
  • the processing unit can be implemented by a processor.
  • the storage unit can be realized by a memory.
  • the apparatus 700 for training a model may include a processor 710, a memory 720, and a transceiver 730.
  • the device 700 for training the model may be the first node in the wireless network.
  • the device 600 for training a model shown in FIG. 6 or the device 700 for training a model shown in FIG. 7 can implement the steps performed by the first node in the foregoing embodiment.
  • FIG. 8 shows a schematic structural diagram of an apparatus 800 for training a model provided in this application.
  • the device 800 for training the model may be the second node in the wireless network.
  • the device 800 for training a model includes: a receiving unit 810, a processing unit 820, and a sending unit 830.
  • the receiving unit 810 is configured to receive a first request from a first node in the radio access network.
  • the processing unit 820 is configured to perform local model retraining on the first shared model based on the local data of the second node according to the first request to obtain the first local model.
  • the sending unit 830 is configured to send a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the parameters of the first local model and the first shared model. The increment between the parameters, the model report message is used to update the first shared model.
  • therefore, the first node in the wireless network sends a first request to at least one second node in the wireless network; each second node can perform local retraining on the shared model based on its local data according to the first request, and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
  • the first node can, based on the size of the training data set corresponding to the first local model and/or the prediction error carried in the model report message, screen out a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold, and then determine the second shared model through the second local model. Since the accuracy or generalization ability of the second local model in the embodiment of the application is higher than that of the first local model, the embodiment of the application can help improve the accuracy and generalization ability of the second shared model.
  • the second node includes local analysis and modeling functions, and the first node includes centralized analysis and modeling functions;
  • the second node receiving the first request from the first node in the radio access network includes:
  • the local analysis and modeling function in the second node obtains the first request from the centralized analysis and modeling function in the first node, where the first request includes the information of the first shared model.
  • the first request further includes the upload strategy of the first local model.
  • the second node can retrain locally according to the upload strategy to obtain the local model and perform corresponding processing operations.
  • the upload strategy of the first local model includes at least one of the following:
  • the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
  • the second node also includes a local adaptive policy function
  • the local analysis and modeling function sends a third request to the local adaptive policy function, where the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request includes the information of the first shared model;
  • the local analysis and modeling function receives the local joint learning strategy sent by the local adaptive policy function, and when the local joint learning strategy indicates that local model retraining is to be performed, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
  • the embodiment of the present application sends the local joint learning strategy to the local analysis and modeling function through the local adaptive policy function, so that the second node can determine whether to participate in the joint learning according to its own computing capability, thereby avoiding that insufficient computing capability of the second node prolongs the joint learning iteration time, and improving the efficiency of joint learning.
  • optionally, the local adaptive policy function may not send the local joint learning strategy to the local analysis and modeling function; instead, the second node always participates in the joint learning when it receives a joint learning training request.
  • the information of the first sharing model includes at least one of the following:
  • the second node receives a model update notification message sent by the first node, where the model update notification message is used to notify the second node of a second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model;
  • the embodiment of the present application judges the prediction error of the second sharing model, and when it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second sharing model, and the second node installs the second sharing model. When it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model, and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, and further ensure the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set as the fourth threshold, which is not limited in the embodiment of the present application.
  • the receiving unit 810 and/or the sending unit 830 may also be collectively referred to as a transceiver unit (module), or a communication unit, which may be respectively used to perform the steps of receiving and sending by the second node in the method embodiment.
  • the processing unit 820 is also configured to generate instructions sent by the sending unit 830, or process instructions received by the receiving unit 810.
  • the communication device 800 may further include a storage unit for storing instructions executed by the communication unit and the processing unit.
  • the device 800 for training the model is the second node in the method embodiment, and may also be a chip in the second node.
  • the processing unit may be a processor, and the sending unit and the receiving unit may be transceivers.
  • the device may further include a storage unit, and the storage unit may be a memory. The storage unit is used to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the communication device executes the foregoing method.
  • the processing unit may be a processor, and the sending unit and the receiving unit may be input/output interfaces, pins, or circuits, etc.; the processing unit executes the instructions stored in the storage unit, so that the communication device performs the operations performed by the network device in the foregoing method embodiments.
  • the storage unit may be a storage unit in the chip (for example, a register, a cache, etc.), or may be a storage unit (for example, a read-only memory, a random access memory, etc.) located outside the chip in the communication device.
  • the sending unit 830 and the receiving unit 810 may be implemented by a transceiver, and the processing unit 820 may be implemented by a processor.
  • the storage unit can be realized by a memory.
  • the apparatus 900 for training a model may include a processor 910, a memory 920, and a transceiver 930.
  • the device 900 for training the model may be the second node in the wireless network.
  • the device 800 for training a model shown in FIG. 8 or the device 900 for training a model shown in FIG. 9 can implement the steps performed by the second node in the foregoing method embodiment.
  • the device for training a model in each of the above device embodiments corresponds to the first node or the second node in the method embodiments, and the corresponding modules or units execute the corresponding steps.
  • the transceiver unit (or the communication unit, or the transceiver) executes the sending and/or receiving steps in the method embodiments (or these steps are executed by the sending unit and the receiving unit respectively), and the steps other than sending and receiving can be performed by the processing unit (processor).
  • the sending unit and the receiving unit may form a transceiver unit, and the transmitter and receiver may form a transceiver to jointly implement the transceiver function in the method embodiment; the processor may be one or more.
  • the above-mentioned first node or second node may be a chip, and the processing unit may be implemented by hardware or software.
  • when implemented by hardware, the processing unit may be a logic circuit, an integrated circuit, or the like.
  • the processing unit can be a general-purpose processor, which can be implemented by reading the software code stored in the storage unit.
  • the storage unit can be integrated in the processor, or can exist independently of the processor.
  • the processing device may be a chip.
  • the processing device may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
  • each step in the method provided in this embodiment can be completed by an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capability.
  • the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the above-mentioned processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processors in the embodiments of the present application may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory or storage unit in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
  • An embodiment of the present application also provides a wireless network, which includes the first node and the second node described above.
  • the embodiments of the present application also provide a computer-readable medium on which a computer program is stored, and when the computer program is executed by a computer, the method in any of the foregoing embodiments is implemented.
  • the embodiments of the present application also provide a computer program product, which implements the method in any of the foregoing embodiments when the computer program product is executed by a computer.
  • the embodiment of the present application also provides a system chip, which includes a communication unit and a processing unit.
  • the processing unit may be a processor, for example.
  • the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute computer instructions so that the chip in the communication device executes any of the methods provided in the foregoing embodiments of the present application.
  • the computer instructions are stored in a storage unit.
  • the size of the sequence numbers of the above-mentioned processes does not mean the order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • the computer instruction can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another through a cable, for example.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium can be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a high-density digital video disc (digital video disc, DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • At least one refers to one or more, and “multiple” refers to two or more.
  • “And/or” describes the association relationship of the associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the associated objects are in an "or” relationship.
  • "The following at least one item (a)” or similar expressions refers to any combination of these items, including any combination of a single item (a) or plural items (a).
  • at least one item (a) of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can be singular or plural.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A model training method and apparatus in a wireless network, which can realize joint learning in the wireless network and can help obtain a training model with relatively high accuracy and generalization capability. In the method, a first node in a wireless network sends a first request to at least one second node in the wireless network; each second node can locally retrain a first shared model based on its local data according to the first request, and then report a local model to the first node, such that the first node can determine a second shared model according to the local model reported by the second node.

Description

Method and apparatus for training a model
This application claims priority to Chinese patent application No. 2019101354646, filed with the Chinese Patent Office on February 22, 2019 and entitled "Method and Apparatus for Training a Model", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of artificial intelligence (AI), and more specifically, to a method and apparatus for training a model in the AI field.
Background
AI can be applied to wireless networks. As wireless networks carry more spectrum, more types of services, and more access terminals, the network system becomes more complex, which also requires the architecture of the wireless network and the access network equipment to become more intelligent and automated. Combining the service characteristics, network architectures, and device forms in wireless networks, a wireless intelligent network architecture has already been defined.
In the wireless intelligent network architecture, machine learning can be used to train models. Machine learning model training mainly takes two forms: centralized training and local training. Centralized training requires aggregating the training data at a central node, which causes high communication overhead, and also suffers from problems such as long modeling delay, data privacy issues when user data is uploaded, and high storage and computation pressure on the central node. Local training does not need to report the model, and each local node builds a model using only its local data. Local training suffers from insufficient data, which makes the model inaccurate, and the model also has weak generalization ability.
Therefore, a scheme in which a central node and local nodes cooperate to perform model training is urgently needed in the wireless intelligent network architecture.
Summary of the invention
This application provides a method and apparatus for training a model in a wireless network, which can implement joint learning in the wireless network and help obtain a training model with high accuracy and strong generalization ability.
According to a first aspect, a method for training a model in a wireless network is provided. The method is executed by a first node in the wireless network and includes:
the first node sends a first request to at least one second node in the wireless network, where the first request is used to request the at least one second node to perform local model retraining on a first shared model based on local data of the second node;
the first node obtains a model report message from each of the at least one second node, where the model report message of each second node includes parameters of a first local model, or includes an increment between the parameters of the first local model and parameters of the first shared model, and the first local model is obtained after each second node performs local model retraining on the first shared model based on the first request and the local data;
the first node determines a second shared model according to the model report messages of the at least one second node and the first shared model.
Therefore, in the embodiments of this application, a first node in a wireless network sends a first request to at least one second node in the wireless network; each second node can perform local retraining on the first shared model based on its local data according to the first request, and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through a model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model. Based on this, the embodiments of this application can implement joint learning in a wireless network.
In the embodiments of this application, joint learning means that the first node serves as a centralized node and the second node serves as a local node, and the first node and the second node learn together to train a model without the local data of the second node being uploaded to the first node. Joint learning can overcome some or all of the shortcomings of centralized training, and can also overcome some or all of the shortcomings of local training.
Specifically, compared with the centralized training approach in the prior art, the embodiments of this application do not require local nodes to report training data to the centralized node, which can greatly reduce the communication overhead caused by reporting training data, and can reduce the pressure on the centralized node of storing ultra-large-scale data and training models. In addition, the local nodes in the embodiments of this application perform distributed model training, which can shorten the model training time and protect data privacy.
In addition, compared with the local training approach in the prior art, a local node sends the locally retrained local model to the centralized node, so that the centralized node can update the shared model according to the local models of at least one local node, which helps overcome the problem of insufficient data in local training, thereby improving the accuracy and generalization ability of the trained model.
With reference to the first aspect, in some implementations of the first aspect, the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model;
where the first node determining the second shared model according to the model report messages of the at least one second node and the first shared model includes:
the first node determines a second local model among at least one first local model corresponding to the at least one second node, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold;
the first node determines the second shared model according to the second local model and the first shared model.
Therefore, when the first node determines the second shared model according to the first local models, the embodiments of this application select, from the at least one first local model, a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold, and then determine the second shared model through the second local model. Since the accuracy of the second local model in the embodiments of this application is higher than that of the first local model, the embodiments of this application can help improve the accuracy of the second shared model.
With reference to the first aspect, in some implementations of the first aspect, the first node determining the second shared model according to the second local model and the first shared model includes:
the first node performs weighted average aggregation on the parameters of the second local model, or on the increment between the parameters of the second local model and the parameters of the first shared model, where the weight parameters used in the weighted average aggregation are determined according to the size of the training data set corresponding to the second local model, and/or the prediction error of the second local model;
the first node determines the second shared model according to the result of the weighted average aggregation and the first shared model.
As an example, the weight parameter may be the reciprocal of the total number of second local models, in which case the weight parameters of all second local models are the same. Alternatively, the weight parameter of each second local model may be the ratio of the size of the training data set corresponding to that second local model to the size of all training data sets, where the size of all training data sets is the sum of the sizes of the training data sets corresponding to the second local models. Alternatively, the weight parameter of each second local model may be the reciprocal of its corresponding MAE.
Optionally, the first node may not select at least one second local model from the at least one first local model, but may directly determine the second shared model according to the at least one first local model. In this case, weighted average aggregation may be performed on the parameters of the first local models, or on the increments between the parameters of the first local models and the parameters of the shared model. As an example, the weight parameter of each first local model may then be the reciprocal of the total number of first local models. The first node then determines the second shared model according to the result of the weighted average aggregation and the first shared model.
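As an illustrative sketch only (not part of the claimed method), the following Python snippet shows the three weight-parameter choices described above; the function names and example values are assumptions.

```python
# Minimal sketch of the three weight-parameter choices described above:
# (a) uniform weights (reciprocal of the number of local models),
# (b) weights proportional to training data set size,
# (c) weights equal to the reciprocal of the prediction error (e.g. MAE).

def uniform_weights(n):
    return [1.0 / n] * n

def size_proportional_weights(dataset_sizes):
    total = sum(dataset_sizes)
    return [s / total for s in dataset_sizes]

def inverse_error_weights(prediction_errors):
    return [1.0 / e for e in prediction_errors]

print(uniform_weights(3))
print(size_proportional_weights([5000, 8000, 2000]))
print(inverse_error_weights([0.05, 0.02, 0.04]))
```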
With reference to the first aspect, in some implementations of the first aspect, the first node includes a centralized adaptive policy function and a centralized analysis and modeling function, and the method further includes:
The centralized adaptive policy function sends a joint learning strategy to the centralized analysis and modeling function, where the joint learning strategy includes at least one of the following items of information:
a joint learning start condition, information about the first shared model, identifiers of the joint learning group members, an upload strategy for the first local model, a screening strategy for the first local model, an aggregation strategy for the first local model, a processing strategy for the first local model, or a shared model update strategy.
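As an illustration only, such a joint learning strategy could be encoded as a simple configuration object. The field names and values below are hypothetical and are not defined by this application; the sketch merely shows the kind of information items listed above.

    # Hypothetical encoding of a joint learning strategy; all field names and values are illustrative.
    joint_learning_strategy = {
        "start_condition": {"training_data_unavailable": True, "max_central_load": 0.8},
        "shared_model_info": {"model_id": "m-001", "model_type": "regression"},
        "group_members": ["second-node-1", "second-node-2"],
        "upload_strategy": {"processing_algorithm": "increment", "upload_time": "per-round"},
        "screening_strategy": {"min_dataset_size": 1000, "max_prediction_error": 0.1},
        "aggregation_strategy": "weighted_average",
        "processing_strategy": "decompress-then-aggregate",
        "model_update_strategy": {"max_prediction_error": 0.05},
    }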
Therefore, in this embodiment of the present application, the joint learning process of the first node and the second node in the wireless network is managed through the joint learning strategy, including orchestration of at least one of the joint learning start condition, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and so on. Based on this, the embodiment of the present application enables the first node and the second node to learn jointly, without the local data of the second node needing to be uploaded to the first node, and to obtain a training model with high accuracy and generalization ability.
Optionally, in this embodiment of the present application, the first node sends the first request for joint learning training to the second node when the joint learning start condition is satisfied.
With reference to the first aspect, in some implementations of the first aspect, the second node includes a local analysis and modeling function;
the first node sending the first request to the at least one second node in the radio access network includes:
When the joint learning start condition is satisfied, the centralized analysis and modeling function in the first node sends the first request to the local analysis and modeling function of each of the at least one second node, where the first request includes the information about the first shared model.
Therefore, by setting the joint learning start condition, this embodiment of the present application can perform model training in a centralized manner when the joint learning start condition is not satisfied, and perform model training through joint learning when the joint learning start condition is satisfied.
With reference to the first aspect, in some implementations of the first aspect, the information about the first shared model includes at least one of the following:
a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
With reference to the first aspect, in some implementations of the first aspect, the first request further includes an upload strategy for the first local model.
In this embodiment of the present application, the first node sends the upload strategy for the first local model to the second node, so that the first node can instruct the second node on how to upload the first local model. Correspondingly, after obtaining the upload strategy for the first local model, the second node can obtain the local model through local retraining according to the upload strategy and perform the corresponding processing operations.
With reference to the first aspect, in some implementations of the first aspect, the upload strategy for the first local model includes at least one of the following:
an identifier of a processing algorithm to be applied to the first local model before uploading, an upload time of the first local model, or information to be carried when the first local model is uploaded, where the carried information includes the size of the training data set of the first local model and/or its prediction error.
With reference to the first aspect, some implementations of the first aspect further include:
The first node determines that the prediction error of the second shared model is less than or equal to a third threshold;
The first node sends a model update notification message to each of the at least one second node, where the model update notification message is used to request each second node to install the second shared model.
Therefore, in this embodiment of the present application, the prediction error of the second shared model is evaluated. When it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model. When it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiment of the present application avoids installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the third threshold; this is not limited in this embodiment of the present application.
In a second aspect, a method for training a model applied to a radio access network is provided. The method is executed by a second node in the radio access network and includes:
The second node receives a first request from a first node in the radio access network;
The second node performs, according to the first request, local model retraining on a first shared model based on local data of the second node, to obtain a first local model;
The second node sends a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the increments between the parameters of the first local model and the parameters of the first shared model, and the model report message is used to update the first shared model.
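A minimal sketch of building such a model report message on the second node side is given below. It assumes the second node has already retrained the first shared model locally and evaluated it, and that model parameters can be represented as NumPy arrays; the function and field names are hypothetical and are not defined by this application.

    import numpy as np

    def build_model_report(shared_params, local_params, dataset_size, prediction_error,
                           report_increment=True):
        """Second node: build the model report message after local model retraining."""
        report = {
            "dataset_size": dataset_size,          # optional carried information
            "prediction_error": prediction_error,  # optional carried information
        }
        if report_increment:
            # Report only the change of the local parameters relative to the first shared model.
            report["increment"] = np.asarray(local_params) - np.asarray(shared_params)
        else:
            report["parameters"] = np.asarray(local_params)
        return report

Reporting the increment rather than the full parameter set is one way to keep the upload small when the local retraining only changes the shared model slightly.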
Therefore, in this embodiment of the present application, the first node in the wireless network sends the first request to at least one second node in the wireless network. According to the first request, each second node can locally retrain the first shared model based on its local data, and then report the parameters of the trained local model, or the increments between the parameters of the local model and the parameters of the shared model, to the first node through the model report message. The first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model. Based on this, the embodiment of the present application can implement joint learning in a wireless network.
In the embodiments of the present application, joint learning means that the first node acts as a centralized node and the second node acts as a local node, and the first node and the second node learn jointly to train a model without the local data of the second node needing to be uploaded to the first node. Joint learning can overcome some or all of the shortcomings of centralized training, and can also overcome some or all of the shortcomings of local training.
Specifically, compared with the centralized training approach in the prior art, this embodiment of the present application does not require the local nodes to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the pressure of storing extremely large-scale data and training models. In addition, in this embodiment of the present application, the local nodes perform distributed model training, which shortens the model training time and protects data privacy.
In addition, compared with the local training approach in the prior art, in this embodiment of the present application a local node sends the locally retrained local model to the centralized node, so that the centralized node can update the shared model according to the local models of at least one local node. This helps overcome the problem of insufficient data in local training, thereby improving the accuracy of the training model and its generalization ability.
With reference to the second aspect, in some implementations of the second aspect, the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
According to the size of the training data set corresponding to the first local model and/or the prediction error carried in the model report message, the first node can select second local models whose training data set size is greater than or equal to a particular threshold and/or whose prediction error is less than or equal to a particular threshold, and then determine the second shared model using the second local models. Because the accuracy of the second local model in this embodiment of the present application is higher than that of the first local model, this embodiment helps improve the accuracy of the second shared model.
With reference to the second aspect, in some implementations of the second aspect, the second node includes a local analysis and modeling function, and the first node includes a centralized analysis and modeling function;
the second node receiving the first request from the first node in the radio access network includes:
The local analysis and modeling function in the second node receives the first request from the centralized analysis and modeling function in the first node, where the first request includes the information about the first shared model.
With reference to the second aspect, in some implementations of the second aspect, the first request further includes an upload strategy for the first local model.
In this embodiment of the present application, the first node sends the upload strategy for the first local model to the second node, so that the first node can instruct the second node on how to upload the first local model. Correspondingly, after obtaining the upload strategy for the first local model, the second node can obtain the local model through local retraining according to the upload strategy and perform the corresponding processing operations.
With reference to the second aspect, in some implementations of the second aspect, the upload strategy for the first local model includes at least one of the following:
an identifier of a processing algorithm to be applied to the first local model before uploading, an upload time of the first local model, or information to be carried when the first local model is uploaded, where the information carried when the first local model is uploaded includes the size of the training data set of the first local model and/or its prediction error.
With reference to the second aspect, in some implementations of the second aspect, the second node further includes a local adaptive policy function, and the method further includes:
The local analysis and modeling function sends a third request to the local adaptive policy function, where the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request includes the information about the first shared model;
The local analysis and modeling function receives the local joint learning strategy sent by the local adaptive policy function;
When the local joint learning strategy indicates that the second node is to perform local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
Therefore, in this embodiment of the present application, the local adaptive policy function sends the local joint learning strategy to the local analysis and modeling function, so that the second node can determine whether to participate in joint learning according to its own computing capability. This avoids prolonged joint learning iterations caused by insufficient computing capability of the second node and improves the efficiency of joint learning.
Optionally, in this embodiment of the present application, the local adaptive policy function may alternatively not send the local joint learning strategy to the local analysis and modeling function; instead, the second node always participates in joint learning whenever it receives a request for joint learning training.
With reference to the second aspect, in some implementations of the second aspect, the information about the first shared model includes at least one of the following:
a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
With reference to the second aspect, some implementations of the second aspect further include:
The second node receives a model update notification message sent by the first node, where the model update notification message is used to notify the second node of the second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model;
The second node installs the second shared model when it determines that the prediction error of the second shared model is less than a fourth threshold.
Therefore, in this embodiment of the present application, the prediction error of the second shared model is evaluated. When it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model. When it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiment of the present application avoids installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the fourth threshold; this is not limited in this embodiment of the present application.
Optionally, in this embodiment of the present application, when the functions of an intelligent network element are co-deployed, the information exchanged between the first node and the second node during joint learning may be transmitted directly over the interface between the first node and the second node.
In a third aspect, an apparatus for training a model is provided. The apparatus may be a first node in a wireless network, or a chip in the first node. As an example, the first node may be a centralized node or a central node. The apparatus has the functions of implementing the first aspect and its various possible implementations. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions.
In a possible design, the apparatus includes a transceiver module, and optionally further includes a processing module. The transceiver module may be, for example, at least one of a transceiver, a receiver, or a transmitter, and may include a radio frequency circuit or an antenna. The processing module may be a processor. Optionally, the apparatus further includes a storage module, which may be, for example, a memory. When the storage module is included, it is used to store instructions. The processing module is connected to the storage module, and the processing module can execute the instructions stored in the storage module or instructions from other sources, so that the apparatus performs the methods of the first aspect and its various possible implementations.
In another possible design, when the apparatus is a chip, the chip includes a transceiver module, and optionally further includes a processing module. The transceiver module may be, for example, an input/output interface, a pin, or a circuit on the chip. The processing module may be, for example, a processor. The processing module can execute instructions so that the chip in the terminal performs the communication methods of the first aspect and any possible implementation thereof. Optionally, the processing module may execute instructions in a storage module, which may be an on-chip storage module such as a register or a cache. The storage module may also be located in the communication device but outside the chip, for example a read-only memory (ROM), another type of static storage device that can store static information and instructions, or a random access memory (RAM).
The processor mentioned anywhere above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of programs of the methods of the first aspect and its various possible implementations.
In a fourth aspect, an apparatus for training a model is provided. The apparatus may be a second node in a wireless network, or a chip in the second node. As an example, the second node may be a local node or a distributed edge node. The apparatus has the functions of implementing the second aspect and its various possible implementations. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions.
In a possible design, the apparatus includes a transceiver module, and optionally further includes a processing module. The transceiver module may be, for example, at least one of a transceiver, a receiver, or a transmitter, and may include a radio frequency circuit or an antenna. The processing module may be a processor. Optionally, the apparatus further includes a storage module, which may be, for example, a memory. When the storage module is included, it is used to store instructions. The processing module is connected to the storage module, and the processing module can execute the instructions stored in the storage module or instructions from other sources, so that the apparatus performs the communication methods of the second aspect and its various possible implementations.
In another possible design, when the apparatus is a chip, the chip includes a transceiver module, and optionally further includes a processing module. The transceiver module may be, for example, an input/output interface, a pin, or a circuit on the chip. The processing module may be, for example, a processor. The processing module can execute instructions so that the chip in the terminal performs the methods of the second aspect and any possible implementation thereof. Optionally, the processing module may execute instructions in a storage module, which may be an on-chip storage module such as a register or a cache. The storage module may also be located in the communication device but outside the chip, for example a read-only memory (ROM), another type of static storage device that can store static information and instructions, or a random access memory (RAM).
The processor mentioned anywhere above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of programs of the second aspect and its various possible implementations.
In a fifth aspect, a computer storage medium is provided. The computer storage medium stores program code, and the program code includes instructions for performing the method in the first aspect or the second aspect or any possible implementation thereof.
In a sixth aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to perform the method in the first aspect or the second aspect or any possible implementation thereof.
In a seventh aspect, a communication system is provided. The communication system includes an apparatus having the functions of implementing the methods and various possible designs of the first aspect, and an apparatus having the functions of implementing the methods and various possible designs of the second aspect.
In an eighth aspect, a processor is provided, configured to be coupled with a memory and to perform the method in the first aspect or the second aspect or any possible implementation thereof.
In a ninth aspect, a chip is provided. The chip includes a processor and a communication interface. The communication interface is used to communicate with an external device or an internal device, and the processor is used to implement the method in the first aspect or the second aspect or any possible implementation thereof.
Optionally, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory or instructions from other sources. When the instructions are executed, the processor implements the method in the first aspect or the second aspect or any possible implementation thereof.
Optionally, the chip may be integrated in the first node or the second node.
Description of the drawings
Fig. 1 shows a schematic diagram of a system architecture to which an embodiment of the present application is applied.
Fig. 2 shows a schematic diagram of an intelligent network architecture to which an embodiment of the present application is applied.
Fig. 3 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
Fig. 4 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
Fig. 5 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
Fig. 6 is a schematic block diagram of an apparatus for training a model provided by an embodiment of the present application.
Fig. 7 is a schematic block diagram of another apparatus for training a model provided by an embodiment of the present application.
Fig. 8 is a schematic block diagram of yet another apparatus for training a model provided by an embodiment of the present application.
Fig. 9 is a schematic block diagram of yet another apparatus for training a model provided by an embodiment of the present application.
Detailed description
The technical solutions in the present application are described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of a system architecture 100 to which an embodiment of the present application is applied. As shown in Fig. 1, the system architecture 100 includes a first node 110 and at least one second node 120. Specifically, the system 100 is, for example, a wireless network; the first node 110 may be a centralized node or a central node, and the second node may be a local node or a distributed edge node; this is not limited in the embodiments of the present application.
As an example, the first node 110 or the second node 120 may each be deployed in a radio access network (RAN), in a core network, or in an operations support system (OSS), or the second node 120 may be a terminal device in the wireless network; this is not specifically limited in the embodiments of the present application. In one possible case, the first node 110 and the second node 120 may both be deployed in the RAN. In another possible case, the first node 110 may be deployed in the OSS or the core network, and the second node 120 may be deployed in the RAN. In yet another possible case, the first node 110 may be deployed in the RAN, and the second node 120 may be a terminal device.
Optionally, the first node 110 or the second node 120 in the system architecture 100 may be implemented by one device or jointly by multiple devices, or may be a functional module within one device; this is not specifically limited in the embodiments of the present application. It can be understood that the foregoing functions may be network elements in hardware devices, software functions running on dedicated hardware, or virtualized functions instantiated on a platform (for example, a cloud platform); this is not limited in the embodiments of the present application.
Fig. 2 shows a schematic diagram of an intelligent network architecture 200 to which an embodiment of the present application is applied. The intelligent network architecture 200 is a layered architecture that can meet, on demand, the differentiated requirements of different scenario types for computing resources and execution cycles. The intelligent network architecture 200 may specifically be an intelligent wireless network architecture. As shown in Fig. 2, the intelligent network architecture 200 includes an operations support system (OSS), at least one cloud radio access network (C-RAN), and an evolved NodeB (eNB) or next generation NodeB (gNB). Each C-RAN may include a separate centralized unit (CU) and at least one distributed unit (DU).
In the embodiments of the present application, in terms of logical function or deployment level, the OSS is a more centralized node than the C-RAN, eNB, or gNB; the CU in a C-RAN is a more centralized node than the DUs; and the C-RAN is a more centralized node than the eNB or gNB. Correspondingly, in some possible implementations of the embodiments of the present application, the OSS may be referred to as the centralized node and the C-RAN, eNB, or gNB as local nodes; the CU in a C-RAN may be referred to as the centralized node and the DUs in that C-RAN as local nodes; or the C-RAN may be referred to as the centralized node and the eNB or gNB as a local node.
In the intelligent network architecture 200 shown in Fig. 2, the centralized node may correspond to the first node 110 in Fig. 1, and the local node may correspond to the second node 120 in Fig. 1.
Optionally, in this intelligent network architecture, at least one of the OSS, CU, DU, eNB, and gNB may include a data analysis (DA) function (or network element). As an example, the DA function (or network element) may be deployed at a higher level, for example in the OSS, in which case it may be referred to as operations support system data analysis (OSSDA). The DA function (or network element) may also be deployed in a 5G CU, a 5G DU, a combined 5G gNB, or an eNB, in which case it may be referred to as radio access network data analysis (RANDA). Alternatively, the DA function (or network element) may be deployed independently; this is not limited in the embodiments of the present application. OSSDA or RANDA can provide data integration and programmable feature engineering, an algorithm framework integrating a rich machine learning algorithm library, and a general architecture that supports separating training from execution.
Specifically, AI-based wireless intelligent services mainly consist of a closed loop of data collection, feature engineering, algorithm design and training modeling, model evaluation, and prediction execution. As an example, when these functions are mapped onto the network architecture, the DA functions (or network elements) can be abstracted and merged into four functional modules: a data service function (DSF), an analysis and modeling function (A&MF), a model execution function (MEF), and an adaptive policy function (APF).
The DSF mainly completes steps such as data collection, data preprocessing, and feature engineering, and provides training data and feature vector subscription services to the A&MF and the MEF. The DSF has the programmability to perform customized feature engineering on data, and the ability to perform data collection, preprocessing, and feature engineering according to the requirements of A&MF training algorithms or MEF prediction models.
The role of the A&MF is to execute machine learning training algorithms and generate machine learning models. The A&MF includes a library of commonly used machine learning algorithms, and sends the machine learning models generated by training to the MEF.
The MEF receives and installs the models delivered by the A&MF, subscribes to feature vectors from the DSF according to the A&MF's instructions and completes prediction, and sends the prediction results and the operation instructions corresponding to those results to the APF.
The APF is the final execution step of the process. The APF stores a policy set and converts model prediction results into execution policies. As an example, the policy set includes prediction results, the operation instructions corresponding to the prediction results, and their correspondence to execution policies; when the APF obtains a prediction result and the operation instruction corresponding to that result, it determines and executes the corresponding execution policy according to the correspondence.
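For illustration only, the policy set held by the APF can be thought of as a lookup from a (prediction result, operation instruction) pair to an execution policy. The sketch below is a hypothetical encoding; the keys and policy names are invented for the example and are not defined by this application.

    # Hypothetical policy set: maps (prediction result, operation instruction) to an execution policy.
    POLICY_SET = {
        ("high_load_predicted", "adjust"): "add_carrier",
        ("high_load_predicted", "notify"): "raise_alarm",
        ("normal_load_predicted", "adjust"): "keep_configuration",
    }

    def to_execution_policy(prediction_result, operation_instruction):
        """APF: convert a prediction result and its operation instruction into an execution policy."""
        return POLICY_SET.get((prediction_result, operation_instruction))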
Further, when a logical function is deployed at both a centralized node and a local node, the logical function in the centralized node may be referred to as the central logical function, and the logical function in the local node as the local logical function. For example, the DSF, A&MF, MEF, and APF deployed at the centralized node may be referred to as the centralized DSF (central DSF, C-DSF), centralized A&MF (central A&MF, C-A&MF), centralized MEF (central MEF, C-MEF), and centralized APF (central APF, C-APF), respectively, and the DSF, A&MF, MEF, and APF deployed in a local node as the local DSF (local DSF, L-DSF), local A&MF (local A&MF, L-A&MF), local MEF (local MEF, L-MEF), and local APF (local APF, L-APF).
It should be noted that, in the embodiments of the present application, network elements may be deployed according to service characteristics and computing resources. In that case, the functions deployed on different network elements may differ. For example, all four functions (DSF, A&MF, MEF, and APF) may be deployed on the local node side, while only the DSF, APF, and A&MF are deployed on the centralized node side. In addition, to complete a full training and prediction task, the different functions cooperate within a network element or across network elements.
It should be noted that the names of the foregoing functions in the embodiments of the present application are merely examples. In a specific implementation, the functions in the network architecture 200 may have other names; this is not specifically limited in the embodiments of the present application.
Fig. 3 shows a schematic flowchart of a method 300 for training a model provided by an embodiment of the present application. The method 300 may be applied to the system architecture 100 shown in Fig. 1 or to the intelligent network architecture 200 shown in Fig. 2, but the embodiments of the present application are not limited thereto.
For ease of description, this application describes the method 300 for training a model by taking the first node and the second node as examples. For the implementation of the chip in the first node and the chip in the second node, reference may be made to the specific descriptions of the first node and the second node; the description is not repeated here.
310. The first node sends a first request to at least one second node, where the first request is used to request the at least one second node to perform local model retraining on a first shared model based on the local data of each second node. The first shared model may be obtained by the first node by training the parameters of an initial model with training data.
320. Each of the at least one second node performs, according to the first request, local model retraining on the first shared model based on the local data of that second node, to obtain a first local model. Here, local model retraining means that the second node trains the parameters of the first shared model again based on its local data.
330. Each of the at least one second node sends a model report message to the first node, where the model report message includes the parameters of the first local model, or the increments between the parameters of the first local model and the parameters of the first shared model. Here, the increments between the parameters of the first local model and the parameters of the first shared model are the changes of the parameters of the first local model relative to the parameters of the first shared model.
340. The first node determines a second shared model according to the model report messages of the at least one second node and the first shared model.
Therefore, in this embodiment of the present application, the first node in the wireless network sends the first request to at least one second node in the wireless network. According to the first request, each second node can locally retrain the first shared model based on its local data, and then report the parameters of the trained local model, or the increments between the parameters of the local model and the parameters of the shared model, to the first node through the model report message. The first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model. Based on this, the embodiments of the present application can implement joint learning in a wireless network.
Here, the second shared model is the new shared model obtained through the training in steps 310 to 340, and the first shared model is the old shared model before that training.
In the embodiments of the present application, joint learning means that the first node acts as a centralized node and the second node acts as a local node, and the first node and the second node learn jointly to train a model without the local data of the second node needing to be uploaded to the first node. Joint learning can overcome some or all of the shortcomings of centralized training, and can also overcome some or all of the shortcomings of local training.
Specifically, compared with the centralized training approach in the prior art, this embodiment of the present application does not require the local nodes to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the pressure of storing extremely large-scale data and training models. In addition, in this embodiment of the present application, the local nodes perform distributed model training, which shortens the model training time and protects data privacy.
In addition, compared with the local training approach in the prior art, in this embodiment of the present application a local node sends the locally retrained local model to the centralized node, so that the centralized node can update the shared model according to the local models of at least one local node. This helps overcome the problem of insufficient data in local training, thereby improving the accuracy of the training model and its generalization ability.
Therefore, the scheme of joint learning to train a model in a wireless network according to the embodiments of the present application helps obtain a training model with high accuracy and generalization ability.
It should be noted that, in the embodiments of the present application, the "parameters of the model" included in the model report message are used to indicate the model that the model report message needs to report. In some descriptions of the embodiments of the present application, "parameters of the model" may be replaced with "model"; the two have equivalent meanings. As an example, the model report message may be described as including the first local model, or including the increment between the first local model and the first shared model.
In an optional embodiment of the present application, the first node sends the first request for joint learning training to the second node when the joint learning start condition is satisfied. As an example, the joint learning start condition may be that the first node cannot obtain training data, or that the computing pressure on the first node exceeds a certain level.
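As a sketch only, such a start condition could be checked as follows; the condition names and the default threshold are assumptions made for the example, not values defined by this application.

    def joint_learning_should_start(can_collect_training_data, compute_load, load_threshold=0.8):
        """First node: decide whether to trigger joint learning instead of centralized training."""
        # Example conditions: the first node cannot obtain training data,
        # or its computing pressure exceeds a given level.
        return (not can_collect_training_data) or (compute_load > load_threshold)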
By setting the joint learning start condition, the embodiments of the present application can perform model training in a centralized manner when the joint learning start condition is not satisfied, and perform model training through joint learning when the joint learning start condition is satisfied.
In an optional embodiment of the present application, the first request may include the information about the first shared model, so that the second node determines the first shared model according to the first request. As an example, the information about the first shared model includes at least one of the following: a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
Optionally, the first request may further include an upload strategy for the first local model. In this way, the first node sends the upload strategy for the first local model to the second node, so that the first node can instruct the second node on how to upload the first local model. Here, the first local model is the local model uploaded by the second node. As an example, the first local model is the model obtained by one of the at least one second node by performing local model retraining on the first shared model according to its local data.
Correspondingly, after obtaining the upload strategy for the first local model, the second node can obtain the local model through local retraining according to the upload strategy and perform the corresponding processing operations.
As an example, the upload strategy for the first local model includes at least one of the following: an identifier of a processing algorithm to be applied to the first local model before uploading, an upload time of the first local model, or information to be carried when the first local model is uploaded. The carried information includes the size of the training data set of the first local model and/or its prediction error.
The processing algorithm applied to the first local model before uploading includes, for example, an incremental operation algorithm that computes the difference between the locally retrained model and the first shared model delivered by the first node, a compression algorithm that compresses the model through parameter pruning, quantization, low-rank decomposition, sparse optimization, or similar techniques, or an encryption algorithm that encrypts the model, for example by obfuscating layers or converting parameters into code; this is not limited in the embodiments of the present application.
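Purely as an illustration of two of these processing steps, the sketch below computes the increment relative to the delivered shared model and then zeroes out small entries, loosely in the spirit of parameter pruning. The function name and the default threshold are assumptions, and real compression or encryption schemes would be considerably more involved.

    import numpy as np

    def process_before_upload(local_params, shared_params, prune_threshold=1e-3):
        """Example pre-upload processing: increment computation followed by simple pruning."""
        # Incremental operation: change of the local model relative to the first shared model.
        increment = np.asarray(local_params) - np.asarray(shared_params)
        # Simple pruning-style compression: drop entries whose magnitude is below the threshold.
        return np.where(np.abs(increment) >= prune_threshold, increment, 0.0)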
The training data set of the first local model is the set of training data used by the second node when it locally retrains the first shared model based on its local data. The size of the training data set is, for example, the amount of training data in the training data set; this is not limited in the embodiments of the present application.
After training the local model, the second node may run predictions on a prediction data set with the local model to obtain the prediction error of the local model. As an example, the prediction error of the local model is, for example, the mean absolute error (MAE) or the mean squared error (MSE) of the local model; this is not limited in the embodiments of the present application.
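For concreteness, MAE and MSE over a prediction data set can be computed as below. These are the standard definitions rather than anything specific to this application, and the sketch assumes the model outputs and the true labels are available as arrays.

    import numpy as np

    def prediction_errors(y_true, y_pred):
        """Mean absolute error and mean squared error over a prediction data set."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        mae = np.mean(np.abs(y_true - y_pred))
        mse = np.mean((y_true - y_pred) ** 2)
        return mae, mse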
Optionally, in this embodiment of the present application, the model report message may further include the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model. In this case, step 340 may specifically be:
The first node determines a second local model from among the at least one first local model corresponding to the at least one second node, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold. Then, the first node determines the second shared model according to the second local model and the first shared model. Here, the number of second local models may be one or more; this is not limited in the embodiments of the present application.
Specifically, when the training data set used for local model retraining contains a sufficient amount of data, the trained local model has high accuracy and strong generalization ability. When the training data set used for local model retraining contains an insufficient amount of data, the trained local model has lower accuracy and weaker generalization ability.
That is, the first node screens out, from the at least one first local model, at least one second local model that satisfies the screening conditions, and the accuracy or generalization ability of the at least one second local model is higher than that of the at least one first local model. In this case, the first node may delete any first local model whose training data set size is less than the first threshold or whose prediction error is greater than the second threshold.
Therefore, in this embodiment of the present application, when the first node determines the second shared model according to the first local models, it selects from the at least one first local model a second local model whose training data set size is greater than or equal to a particular threshold and/or whose prediction error is less than or equal to a particular threshold, and then determines the second shared model using the second local model. Because the accuracy of the second local model is higher than that of the first local model, this embodiment helps improve the accuracy of the second shared model.
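A minimal sketch of this screening step is shown below. It assumes each reported first local model carries its training data set size and prediction error (for example, in the model report message described above); the function and field names are hypothetical, and the two thresholds mirror the first and second thresholds above.

    def select_second_local_models(reports, first_threshold=None, second_threshold=None):
        """First node: keep reports whose data set size and/or prediction error meet the thresholds."""
        selected = []
        for r in reports:
            if first_threshold is not None and r["dataset_size"] < first_threshold:
                continue  # training data set too small: drop this first local model
            if second_threshold is not None and r["prediction_error"] > second_threshold:
                continue  # prediction error too large: drop this first local model
            selected.append(r)
        return selected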
Optionally, in this embodiment of the application, the first node may perform weighted average aggregation on the parameters of the second local models, or on the increments between the parameters of the second local models and the parameters of the first shared model, where the weight parameters used in the weighted average aggregation are determined according to the size of the training data set corresponding to each second local model and/or the prediction error of each second local model. The first node then determines the second shared model according to the result of the weighted average aggregation and the first shared model.
In one possible case, when the model report message includes the parameters of the first local model, the first node performs weighted average aggregation on the parameters of the second local models after the second local models are determined. Correspondingly, the first node may determine the result obtained by the weighted average aggregation of the at least one second local model as the second shared model.
In another possible case, when the model report message includes the increments between the parameters of the first local model and the parameters of the shared model, the first node performs weighted average aggregation on the increments between the parameters of the second local models and the parameters of the shared model after the second local models are determined. Correspondingly, the first node determines the result obtained by the weighted average aggregation of the at least one second local model as the increment of the first shared model, and then determines the sum of this increment and the first shared model as the second shared model.
As an example, the weight parameter may be the reciprocal of the total number of second local models, in which case all second local models have the same weight. Alternatively, the weight parameter of each second local model may be the ratio of the size of its corresponding training data set to the total size of all training data sets, where the total size is the sum of the training data set sizes corresponding to all second local models. Alternatively, the weight parameter of each second local model may be the reciprocal of its corresponding MAE. It should be understood that these weight parameters are merely examples, and the embodiments of this application are not limited thereto.
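By way of a non-limiting illustration, the weighted average aggregation and the three example weighting rules may be sketched as follows. The function names, the parameter dictionaries backed by NumPy arrays, and the normalization of the inverse-MAE weights so that they sum to one are assumptions introduced only for this sketch:

```python
import numpy as np

def aggregate_weighted_average(reports, weighting="by_train_size"):
    """Weighted average aggregation of the reported parameter dictionaries.
    Each report is assumed to expose .parameters (name -> np.ndarray),
    .train_set_size and .prediction_error, as in the screening sketch above."""
    if weighting == "uniform":                    # reciprocal of the model count
        weights = [1.0 / len(reports)] * len(reports)
    elif weighting == "by_train_size":            # ratio of training set sizes
        total = sum(r.train_set_size for r in reports)
        weights = [r.train_set_size / total for r in reports]
    elif weighting == "by_inverse_error":         # reciprocal of MAE, normalized
        inv = [1.0 / r.prediction_error for r in reports]
        weights = [v / sum(inv) for v in inv]
    else:
        raise ValueError(f"unknown weighting rule: {weighting}")

    aggregated = {}
    for name in reports[0].parameters:
        aggregated[name] = sum(w * r.parameters[name]
                               for w, r in zip(weights, reports))
    return aggregated

def apply_increment(shared_params, aggregated_increment):
    """When the reports carry increments, the aggregated increment is added to
    the first shared model to obtain the second shared model."""
    return {name: shared_params[name] + aggregated_increment[name]
            for name in shared_params}
```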
In some optional embodiments of this application, the first node may not screen the at least one first local model for second local models, but instead determine the second shared model directly from the at least one first local model. In that case, weighted average aggregation may be performed on the parameters of the first local models, or on the increments between the parameters of the first local models and the parameters of the shared model. As an example, the weight parameter of each first local model may be the reciprocal of the total number of first local models. The first node then determines the second shared model according to the result of the weighted average aggregation and the first shared model.
Optionally, in this embodiment of the application, after step 340, the first node or the second node may determine whether the prediction error of the second shared model is less than a preset threshold. When it is determined that the prediction error of the second shared model is less than the preset threshold, the accuracy of the second shared model meets the requirement. Here, the accuracy of the second shared model may be determined based on a prediction data set.
In one possible case, the first node may determine whether the prediction error of the second shared model is less than a third threshold.
When it is determined that the prediction error of the second shared model is less than or equal to the third threshold, the first node updates the first shared model to the second shared model and sends a model update notification message to each of the at least one second node, where the model update notification message is used to request each second node to install the second shared model. When it is determined that the prediction error of the second shared model is greater than the third threshold, the first node does not send the model update notification message to the second nodes. In addition, the first node may delete the second shared model and leave its saved first shared model unchanged.
In another possible case, the second node may determine whether the prediction error of the second shared model is less than a fourth threshold.
Specifically, the first node may send a model update notification message to the at least one second node, where the message is used to notify the second nodes of the second shared model. After receiving the model update notification message, the second node may determine whether the prediction error of the second shared model indicated by the message is less than the fourth threshold. When the second node determines that the prediction error of the second shared model is less than or equal to the fourth threshold, it installs the second shared model. When the second node determines that the prediction error of the second shared model is greater than the fourth threshold, it does not install the second shared model.
In this case, the second node does not need to send its locally stored prediction data set to the first node, which reduces the communication signaling overhead between network elements.
In this embodiment of the application, the third threshold and the fourth threshold may be the same or different; this is not limited in this embodiment of the application.
Therefore, this embodiment of the application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs it; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. On this basis, this embodiment avoids installing a shared model with low accuracy and further guarantees the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the third threshold or the fourth threshold; this is not limited in this embodiment of the application.
That is, in some optional embodiments, the prediction error of the second shared model may be compared with the prediction error of the first shared model. If the prediction error of the second shared model is smaller than that of the first shared model, the first node saves the second shared model and the second node installs it. If the prediction error of the second shared model is greater than or equal to that of the first shared model, the first node does not update the first shared model and the second node does not install the second shared model.
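By way of a non-limiting illustration, the update decision described above may be sketched as follows. The function name and the numeric values in the usage comment are assumptions introduced only for this sketch:

```python
def should_update_shared_model(new_error: float,
                               threshold: float = None,
                               old_error: float = None) -> bool:
    """Decide whether the second (new) shared model replaces the first (old) one.
    With an absolute bar (the third/fourth threshold) the new model is kept when
    its prediction error is less than or equal to the bar; when the old shared
    model's prediction error is used as the bar, the new model must be strictly
    better."""
    if threshold is not None:
        return new_error <= threshold
    return new_error < old_error

# Illustrative use at the first node: keep and distribute the second shared
# model only if its prediction error passes the configured bar.
if should_update_shared_model(new_error=0.07, threshold=0.10):
    pass  # save the second shared model and send model update notifications
```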
FIG. 4 shows a schematic flowchart of a method 400 for training a model provided by an embodiment of this application. It should be understood that FIG. 4 shows the steps or operations of the method for training a model, but these steps or operations are only examples; the embodiments of this application may also perform other operations or variations of the operations in FIG. 4. In addition, the steps in FIG. 4 may be performed in an order different from that presented in FIG. 4, and possibly not all of the operations in FIG. 4 are to be performed.
The first node in FIG. 4 includes a centralized APF (C-APF), a centralized DSF (C-DSF), and a centralized A&MF (C-A&MF); the second node includes a local APF (L-APF), a local MEF (L-MEF), a local DSF (L-DSF), and a local A&MF (L-A&MF). For details of each functional module, refer to the description of FIG. 2; to avoid repetition, details are not repeated here.
401. The C-APF in the first node sends a joint learning strategy to the C-A&MF.
Specifically, the C-APF may store a joint learning strategy, which is used to indicate how the first node and the second node perform joint learning. As an example, the joint learning strategy may include at least one of the following: a joint learning start condition, information about the shared model, identifiers of joint learning group members, an upload strategy for the local models, a screening strategy for the local models, an aggregation strategy for the local models, a processing strategy for the local models, or a shared model update strategy.
The joint learning start condition is, for example, that the C-DSF cannot obtain subscription data, or that the computing resource usage of the C-A&MF exceeds a certain threshold; this is not limited in this embodiment of the application. In this embodiment of the application, by setting the joint learning start condition, model training can be performed by centralized training when the joint learning start condition is not met, and by joint learning when the joint learning start condition is met. For example, the centralized training approach can be used when the C-DSF can obtain subscription data, the model training workload is small, the computing resources of the C-A&MF are sufficient, or the computing resources of the L-A&MF are insufficient.
The identifiers of the joint learning group members may include, for example, the identifier of each of the at least one second node participating in the joint learning, or the identifier of the L-A&MF in each of the at least one second node participating in the joint learning; this is not limited in this embodiment of the application.
In the above joint learning strategy, a local model refers to the local model obtained by a second node after model retraining based on local data, for example the first local model described in FIG. 3.
The screening strategy for the local models refers to the strategy used by the first node (or the C-A&MF in the first node) to select, from the at least one first local model, the second local models that satisfy the screening condition. For example, it may include determining, among the first local models, those whose training data set size is greater than or equal to the first threshold and/or whose prediction error is less than or equal to the second threshold. As an example, the local model screening strategy may be a model screening rule identifier; this is not limited in this embodiment of the application.
The aggregation strategy for the local models is used to indicate the aggregation algorithm used by the first node (or the C-A&MF in the first node) when aggregating the local models, and the method for computing the weight parameters. As an example, the aggregation strategy for the local models may be a model aggregation algorithm identifier. In the embodiments of this application, aggregation of local models may also be referred to as fusion of local models; the two terms have the same meaning.
The processing strategy for the local models is used to instruct the first node (or the C-A&MF in the first node) to process the obtained local models. The processing algorithm includes, for example, an incremental operation between the model obtained by local retraining and the shared model delivered by the first node, a compression algorithm that compresses the model through parameter pruning, quantization, low-rank decomposition, sparse optimization, or similar techniques, or an encryption algorithm that encrypts the model through layer obfuscation or by converting parameters into code; this is not limited in this embodiment of the application. As an example, the processing strategy for the local models may be a model processing algorithm identifier.
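By way of a non-limiting illustration, two of the processing steps named above, the incremental operation and a simple form of parameter pruning, may be sketched as follows. The function names are assumptions, the parameters are assumed to be held as NumPy arrays, and real compression or encryption schemes would be considerably more elaborate:

```python
import numpy as np

def parameter_increment(local_params: dict, shared_params: dict) -> dict:
    """Incremental operation: per-tensor difference between the retrained
    local model's parameters and the shared model's parameters."""
    return {name: local_params[name] - shared_params[name]
            for name in shared_params}

def prune_small_parameters(params: dict, keep_ratio: float = 0.5) -> dict:
    """Very simple magnitude pruning as one possible compression step:
    zero out the smallest-magnitude entries of each parameter tensor."""
    pruned = {}
    for name, tensor in params.items():
        flat = np.abs(tensor).ravel()
        k = max(1, int(keep_ratio * flat.size))
        cutoff = np.partition(flat, flat.size - k)[flat.size - k]
        pruned[name] = np.where(np.abs(tensor) >= cutoff, tensor, 0.0)
    return pruned
```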
The shared model update strategy is used to instruct the first node (or the C-A&MF in the first node) to update the shared model. For example, when the prediction error of the new shared model is less than or equal to a certain threshold, the old shared model is updated to the new shared model; or, when the prediction error of the new shared model is less than or equal to the prediction error of the old shared model, the old shared model is updated to the new shared model.
In addition, for the information about the shared model and the upload strategy for the local models, refer to the descriptions above; to avoid repetition, they are not repeated here.
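By way of a non-limiting illustration, the joint learning strategy delivered in step 401 could be represented by a simple structure such as the one below; the field names and the identifier-based encoding are assumptions introduced only for this sketch:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class JointLearningStrategy:
    """Illustrative encoding of the joint learning strategy sent by the C-APF
    to the C-A&MF."""
    start_condition: Optional[str] = None           # e.g. "C-DSF subscription data unavailable"
    shared_model_info: Optional[dict] = None        # model id, type, structure, I/O, initial parameters
    group_member_ids: List[str] = field(default_factory=list)
    upload_strategy_id: Optional[str] = None        # local model upload strategy
    screening_rule_id: Optional[str] = None         # local model screening strategy
    aggregation_algorithm_id: Optional[str] = None  # local model aggregation strategy
    processing_algorithm_id: Optional[str] = None   # local model processing strategy
    update_strategy_id: Optional[str] = None        # shared model update strategy
```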
402. The C-A&MF in the first node sends a joint learning strategy delivery response to the C-APF, indicating that the C-A&MF has received the joint learning strategy.
403. The first node and the second node perform data collection, model training, and model application.
Specifically, the L-DSF in the second node reports the collected data to the C-A&MF in the first node, the C-A&MF performs model training to obtain a shared model, and the C-A&MF then delivers the shared model to the L-MEF for model application.
As an example, the C-A&MF in the first node may send a data subscription request to the L-DSF in the second node; after receiving the data subscription request, the L-DSF sends a data subscription response to the C-A&MF that carries local data. The local data may include a training data set or a prediction data set; this is not limited in this embodiment of the application.
It should be understood that step 403 is performed to obtain a shared model through training; this shared model may also be referred to as the initial shared model. The subsequent steps 404 to 422 update the initial shared model generated in step 403 to compensate for the poor accuracy of the initial shared model caused by insufficient or incomplete data, so that the shared model still has high accuracy and generalization ability when the network state in the wireless network changes.
404. The C-A&MF in the first node sends a joint learning training request to the L-A&MF in the second node.
Specifically, the C-A&MF in the first node determines, according to the joint learning start condition indicated in the joint learning strategy, whether the joint learning start condition is met. For example, when the C-A&MF determines that subscription data cannot be obtained from the L-DSF of the second node, or that the computing resource usage of the C-A&MF exceeds a preset threshold, it determines that the joint learning start condition is met. When the joint learning start condition is met, the C-A&MF sends the joint learning training request to the L-A&MF. Otherwise, when the joint learning start condition is not met, the model is trained centrally as in step 403.
Here, the joint learning training request may correspond to a specific example of the first request in FIG. 3. For details, refer to the description of the first request in FIG. 3; to avoid repetition, details are not repeated here.
405. The L-A&MF in the second node sends a local joint learning strategy request to the L-APF.
Here, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the shared model. The local joint learning strategy request includes the information about the shared model. As an example, the information about the shared model may be obtained from the joint learning training request in step 404. For details of the information about the shared model, refer to the description above; for brevity, it is not described here again.
406. The L-APF in the second node sends a local joint learning strategy response to the L-A&MF.
Specifically, the L-APF determines, based on the utilization of local computing resources, whether local model retraining is to be performed on the shared model, that is, whether the second node participates in the local joint learning. As an example, the local joint learning strategy response may be an indicator of whether to participate in the local joint learning.
Optionally, the local joint learning strategy may further include a model update strategy. For example, the model update strategy may indicate that the old shared model is updated when the prediction error of the new shared model is less than the prediction error of the old shared model, or less than or equal to a certain preset threshold, and that the old shared model continues to be used otherwise. As an example, the old shared model is, for example, the initial shared model obtained in step 404, and the new shared model is, for example, the shared model obtained in step 413.
In this embodiment of the application, when the local joint learning strategy indicates that the second node performs local model retraining on the shared model, steps 407 to 421 below are performed. Otherwise, steps 407 to 421 below are not performed.
Therefore, in this embodiment of the application, by having the L-APF send the local joint learning strategy to the L-A&MF, the second node can determine, based on its own computing capability, whether to participate in the joint learning, which avoids prolonged joint learning iterations caused by insufficient computing capability of the second node and improves joint learning efficiency.
It should be noted that, in this embodiment of the application, steps 405 and 406 may also be skipped, in which case the second node always participates in the joint learning upon receiving the joint learning training request; this is not limited in this embodiment of the application.
407. The L-A&MF in the second node sends a joint learning training request response to the C-A&MF in the first node.
408. The L-A&MF in the second node sends a data subscription request to the L-DSF.
As an example, the L-A&MF may send the data subscription request to the L-DSF according to the model input and output in the joint learning training request and the training data collection duration; the data subscription request may carry a data identifier and the data collection time.
409. The L-DSF in the second node sends a data subscription response to the L-A&MF.
Specifically, the L-DSF collects data according to the data subscription request in step 408 and sends the collected data to the L-A&MF.
410. The L-A&MF in the second node performs model retraining and model processing.
Specifically, the L-A&MF retrains the shared model locally based on the information about the shared model delivered in step 404 and the training data obtained in step 409.
Optionally, the L-A&MF may further process the local model according to the identifier, delivered in step 404, of the processing algorithm applied before local model upload, for example by performing an incremental operation between the local model obtained by local retraining and the shared model delivered in step 404 (as an example, the incremental operation may be performed on the parameters of the local model and the parameters of the shared model). The model may then be compressed through algorithms such as parameter pruning, quantization, low-rank decomposition, or sparse optimization, and encrypted through layer obfuscation or by converting parameters into code.
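By way of a non-limiting illustration, the local retraining and the incremental operation of step 410 may be sketched for a simple linear model trained by gradient descent. The model form, the function name, and the hyperparameters are assumptions introduced only for this sketch and stand in for whatever model type and training procedure the shared model actually uses:

```python
import numpy as np

def local_retrain(shared_weights: np.ndarray,
                  X_local: np.ndarray, y_local: np.ndarray,
                  lr: float = 0.01, epochs: int = 20):
    """Start from the shared model's weights, retrain on local data only, and
    return both the retrained local model and its increment relative to the
    shared model (the quantity that may be uploaded in step 411)."""
    w = shared_weights.copy()
    for _ in range(epochs):
        pred = X_local @ w
        grad = X_local.T @ (pred - y_local) / len(y_local)
        w -= lr * grad
    return w, w - shared_weights
```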
411. The L-A&MF in the second node sends a local model upload notification to the C-A&MF in the first node. As an example, the local model upload notification includes the model identifier of the shared model, the processed local model, the size of the training data set corresponding to the local model, the prediction error of the local model, and so on; this is not limited in this embodiment of the application.
Specifically, the local model upload notification may correspond to an example of the model report message in FIG. 3. For details, refer to the description of the model report message in FIG. 3; to avoid repetition, details are not repeated here.
412. The C-A&MF in the first node sends a local model upload notification response to the L-A&MF in the second node.
413. The C-A&MF in the first node performs model screening, aggregation, and processing.
Specifically, the C-A&MF selects, according to the local model screening strategy indicated in step 401 and from the at least one received local model (for example, the at least one first local model described above), at least one local model that satisfies the condition (for example, the at least one second local model described above). For details of how the C-A&MF performs the screening, refer to the description above; for brevity, it is not described in detail here.
Then, the C-A&MF may aggregate the selected local models according to the local model aggregation strategy, for example by weighted average aggregation. For details of how the C-A&MF performs the aggregation, refer to the description above; for brevity, it is not described in detail here.
Then, the C-A&MF may process the aggregated model according to the local model processing strategy, for example by compression or encryption. For details of how the C-A&MF performs the processing, refer to the description above; for brevity, it is not described in detail here.
In one possible case of this application, the steps shown in part 4A of FIG. 4 are performed, where 4A includes steps 414 to 416.
414. The C-A&MF in the first node sends a model update request #1 to the L-MEF in the second node.
In this embodiment of the application, the C-A&MF may test the shared model obtained in step 413 using the test data in a test data set to determine the prediction error of the shared model obtained in step 413. The shared model obtained in step 413 may also be referred to as the new shared model, and may correspond to an example of the second shared model in FIG. 3.
When the C-A&MF determines that the prediction error of the new shared model is less than a certain threshold, it sends the model update request #1, which may include the model identifier of the new shared model and the parameters of the fused model. As an example, when the training model is a neural network model, the model parameters may include at least one of the weight parameters, the bias parameters, or the activation function information of each layer.
Here, the model update request #1 may be an example of the model update notification message sent when the first node in FIG. 3 determines that the prediction error of the second shared model is less than the third threshold.
When the C-A&MF determines that the prediction error of the new shared model is greater than or equal to the threshold, steps 414 to 416 may be skipped.
415. The L-MEF performs model update installation. Specifically, the L-MEF replaces the current parameters of the model with the parameters of the model delivered in step 414.
Taking a neural network model as an example, the L-MEF may replace the weight parameters, bias parameters, or activation functions of each layer of the neural network with the parameters of the model delivered in step 414.
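By way of a non-limiting illustration, the installation of the delivered parameters may be sketched as follows; the per-layer dictionary representation and the function name are assumptions introduced only for this sketch:

```python
def install_model_update(current_params: dict, delivered_params: dict) -> dict:
    """Replace the currently installed per-layer parameters (weights, biases,
    activation identifiers) with the parameters carried in the model update
    request, as in steps 415 and 420."""
    installed = dict(current_params)
    for layer_name, layer_params in delivered_params.items():
        installed[layer_name] = layer_params
    return installed
```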
416. The L-MEF in the second node sends a model update response #1 to the C-A&MF in the first node. Here, the model update response #1 may indicate that the model update has been completed.
In another possible case of this application, the steps shown in part 4A of FIG. 4 may be replaced with the steps shown in part 4B of FIG. 4, where 4B includes steps 417 to 421.
417. The C-A&MF in the first node sends a model update request #2 to the L-A&MF in the second node.
In this embodiment of the application, after obtaining the new shared model in step 413, the C-A&MF may send the model update request #2 to the L-A&MF, which may include the model identifier of the new shared model and the parameters of the fused model. As an example, when the training model is a neural network model, the model parameters may include at least one of the weight parameters, the bias parameters, or the activation function information of each layer.
Here, the model update request #2 may be an example of the model update notification message sent by the first node to the second node before the second node in FIG. 3 determines whether the prediction error of the updated shared model is less than the fourth threshold.
418. The L-A&MF in the second node sends a model update response #2 to the C-A&MF in the first node. Here, the model update response #2 may be used to notify the first node that the model update request #2 has been received.
419. The L-A&MF in the second node sends a model installation request to the L-MEF.
For example, the L-A&MF may determine, according to the model update strategy, whether the prediction error of the new shared model is greater than a preset threshold or greater than that of the old shared model. As an example, when the prediction error of the new shared model is greater than or equal to that of the old shared model, the second node does not install the new shared model, and steps 419 to 421 are not performed. When the prediction error of the new shared model is less than that of the old shared model, the second node updates to and installs the new shared model, and steps 419 to 421 are performed.
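By way of a non-limiting illustration, the second node's local evaluation that supports this decision may be sketched as follows. The function name and the use of MAE are assumptions introduced only for this sketch, and the decision itself can reuse the update rule sketched earlier:

```python
import numpy as np

def local_prediction_error(predict_fn, X_pred: np.ndarray, y_pred: np.ndarray) -> float:
    """Prediction error (here MAE) of a shared model measured on the second
    node's locally stored prediction data set, so the data never leaves the node."""
    return float(np.mean(np.abs(predict_fn(X_pred) - y_pred)))

# Illustrative decision at the L-A&MF: install the new shared model only if it
# beats the old shared model on the local prediction data set.
# new_err = local_prediction_error(new_model.predict, X_pred, y_pred)
# old_err = local_prediction_error(old_model.predict, X_pred, y_pred)
# install = new_err < old_err
```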
The model installation request may carry the model identifier of the new shared model and the parameters of the fused model. For details of the model identifier and the parameters of the fused model, refer to the description above; for brevity, they are not described here.
420. The L-MEF in the second node performs model update installation.
Specifically, for step 420, refer to the description of step 415; for brevity, it is not described here again.
421. The L-MEF in the second node sends a model installation response to the L-A&MF; the model installation response may indicate that the model update has been completed.
422. The second node performs model application.
Specifically, the L-MEF in the second node subscribes, from the L-DSF, to the data required for model prediction and performs model prediction. The prediction result is then sent to the local APF for policy execution.
It should be noted that, in an optional embodiment of this application, once the first node and the second node have started the joint learning, the joint learning steps may be executed cyclically.
In another optional embodiment of this application, a joint learning stop condition may be set. When the joint learning stop condition is met, the first node and the second node may stop the joint learning. As an example, the joint learning stop condition may include a joint learning execution duration or resource constraints at the second node. That is, in this embodiment of the application, the joint learning may be stopped once that execution duration has elapsed after the joint learning was started, or when the resources of some or all of the second nodes are constrained.
Optionally, in this embodiment of the application, the joint learning stop condition may be included in the joint learning strategy, or preconfigured in the first node or the second node; this is not limited in this embodiment of the application.
Therefore, the embodiments of this application manage the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, including orchestrating and managing at least one of the joint learning start condition, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and so on. On this basis, the embodiments of this application enable the first node and the second node to learn jointly, without the second node's local data being uploaded to the first node, and to obtain a training model with high accuracy and generalization ability.
FIG. 5 shows a schematic flowchart of a method 500 for training a model provided by an embodiment of this application. It should be understood that FIG. 5 shows the steps or operations of the method for training a model, but these steps or operations are only examples; the embodiments of this application may also perform other operations or variations of the operations in FIG. 5. In addition, the steps in FIG. 5 may be performed in an order different from that presented in FIG. 5, and possibly not all of the operations in FIG. 5 are to be performed.
FIG. 5 is described using a CUDA and at least one DUDA as an example; the CUDA may correspond to an example of the first node above, and a DUDA may correspond to an example of the second node above. In this embodiment of the application, when the functions of the intelligent network elements are deployed together, the interaction information exchanged during joint learning between the first node and the second node can be transmitted directly over the interface between the first node and the second node.
It should be noted that the embodiments of this application take the CUDA and the DUDA as an example, but the embodiments of this application are not limited thereto. For example, the CUDA may also be replaced by a gNB, an eNB, or a cell, with the DUDA replaced by a terminal device served by the gNB, eNB, or cell. For another example, the CUDA may be replaced by a CU, with the DUDA replaced by a DU managed by the CU. For another example, the CUDA may be replaced by a C-RAN, with the DUDA replaced by an eNB or gNB managed by the C-RAN. For another example, the CUDA may be replaced by an eNB or gNB, with the DUDA replaced by a cell managed by the eNB or gNB. For another example, the CUDA may be replaced by an OSS, with the DUDA replaced by a network element managed by the OSS. For another example, both the CUDA and the DUDA may be replaced by gNBs or eNBs. This is not specifically limited in the embodiments of this application.
501. The CUDA sends a joint learning training request to the at least one DUDA.
As an example, the CUDA may send the joint learning training request to each of the at least one DUDA when the joint learning start condition is met. For details of the joint learning training request, refer to the description above; for brevity, it is not repeated here.
502. Each of the at least one DUDA sends a joint learning training request response to the CUDA.
503. Each DUDA performs local model training and processing.
Specifically, each DUDA performs data subscription, local model training, and processing according to the indication in step 501. For details, refer to the description of step 410 in FIG. 4 above; for brevity, it is not repeated here.
504. Each DUDA sends a local model upload notification to the CUDA.
505. The CUDA sends a local model upload notification response to each DUDA.
Specifically, for steps 504 and 505, refer to the descriptions of steps 411 and 412 in FIG. 4 above; for brevity, they are not repeated here.
506. The CUDA performs model screening, fusion, and processing.
Specifically, for step 506, refer to the description of step 413 in FIG. 4 above; for brevity, it is not repeated here.
507. Each DUDA performs model update installation and model application.
Specifically, for step 507, refer to the descriptions of steps 414 to 421 in FIG. 4; for brevity, they are not repeated here.
508. Steps 501 to 507 are repeated.
Therefore, in this embodiment of the application, the CUDA sends a joint learning training request to the at least one DUDA; each DUDA can locally retrain the shared model indicated by the CUDA according to the joint learning training request and then report the locally trained model to the CUDA; and the CUDA can fuse and process the local models reported by the at least one DUDA to determine a new shared model. On this basis, this embodiment of the application can transmit the interaction information of joint learning over the interface between the CU and the DU under an architecture in which the CU and the DU are separated, and thereby obtain a shared model with high accuracy and generalization ability.
Based on the methods of the foregoing embodiments, the communication apparatus provided in this application is described below.
FIG. 6 shows a schematic structural diagram of an apparatus 600 for training a model in a wireless network provided by this application. The apparatus 600 for training a model may be the first node in the wireless network. The apparatus 600 for training a model includes a sending unit 610, a receiving unit 620, and a determining unit 630.
The sending unit 610 is configured to send a first request to at least one second node in the wireless network, where the first request is used to request the at least one second node to perform local model retraining on a first shared model based on the local data of each second node.
The receiving unit 620 is configured to obtain a model report message from each of the at least one second node, where the model report message of each second node includes the parameters of a first local model, or includes the increments between the parameters of the first local model and the parameters of the first shared model, and the first local model is obtained by each second node through local model retraining of the first shared model based on the first request and the local data.
The determining unit 630 is configured to determine a second shared model according to the model report messages of the at least one second node and the first shared model.
Therefore, in this embodiment of the application, the first node in the wireless network sends a first request to at least one second node in the wireless network; each second node can, according to the first request, retrain the first shared model locally based on its local data, and then report the parameters of the trained local model, or the increments between the parameters of the local model and the parameters of the shared model, to the first node through a model report message; the first node can then determine the second shared model based on the content reported by the at least one second node and the first shared model. On this basis, the embodiments of this application can implement joint learning in a wireless network.
Optionally, the model report message further includes the size of the training data set corresponding to the first local model and/or the prediction error of the first local model.
The determining unit 630 is specifically configured to:
determine a second local model among the at least one first local model corresponding to the at least one second node, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold and/or the prediction error of the second local model is less than or equal to a second threshold; and
determine the second shared model according to the second local model and the first shared model.
Therefore, in this embodiment of the application, when the first node determines the second shared model based on the first local models, it selects, from the at least one first local model, the second local models whose training data set size is greater than or equal to a specific threshold and/or whose prediction error is less than or equal to a specific threshold, and then determines the second shared model through the second local models. Because the accuracy or generalization ability of the second local models in this embodiment is higher than that of the first local models, this embodiment helps improve the accuracy and generalization ability of the second shared model.
Optionally, the first node may not screen the at least one first local model for second local models, but instead determine the second shared model directly from the at least one first local model.
Optionally, the determining unit 630 is specifically configured to:
perform weighted average aggregation on the parameters of the second local model, or on the increments between the parameters of the second local model and the parameters of the first shared model, where the weight parameters used in the weighted average aggregation are determined according to the size of the training data set corresponding to the second local model and/or the prediction error of the second local model; and
determine the second shared model according to the result of the weighted average aggregation and the first shared model.
Optionally, the first node includes a centralized adaptive policy function and a centralized analysis and modeling function, where:
the centralized adaptive policy function sends a joint learning strategy to the centralized analysis and modeling function, and the joint learning strategy includes at least one of the following pieces of information:
a joint learning start condition, information about the first shared model, identifiers of joint learning group members, an upload strategy for the first local model, a screening strategy for the first local model, an aggregation strategy for the first local model, a processing strategy for the first local model, or a shared model update strategy.
Therefore, the embodiments of this application manage the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, including orchestrating and managing at least one of the joint learning start condition, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and so on. On this basis, the embodiments of this application enable the first node and the second node to learn jointly, without the second node's local data being uploaded to the first node, and to obtain a training model with high accuracy and generalization ability.
Optionally, the second node includes a local analysis and modeling function;
the sending unit 610 is specifically configured to:
when the joint learning start condition is met, send, through the centralized analysis and modeling function, the first request to the local analysis and modeling function of each of the at least one second node, where the first request includes the information about the first shared model.
Optionally, the information about the first shared model includes at least one of the following:
the model identifier, model type, model structure, input and output, initial model parameters, or training data collection duration of the first shared model.
By setting a joint learning start condition, the embodiments of this application can perform model training by centralized training when the joint learning start condition is not met, and by joint learning when the joint learning start condition is met.
Optionally, the first request further includes the upload strategy for the first local model.
Optionally, the upload strategy for the first local model includes at least one of the following:
the identifier of the processing algorithm applied before the first local model is uploaded, the upload time of the first local model, or the information carried when the first local model is uploaded;
where the carried information includes the size of the training data set of the first local model and/or the prediction error.
Optionally, the determining unit 630 is further configured to determine that the prediction error of the second shared model is less than or equal to a third threshold.
When the prediction error of the second shared model is less than or equal to the third threshold, the sending unit 610 is further configured to send a model update notification message to each of the at least one second node, where the model update notification message is used to request each second node to install the second shared model.
Therefore, this embodiment of the application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs it; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. On this basis, this embodiment avoids installing a shared model with low accuracy and further guarantees the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the third threshold; this is not limited in this embodiment of the application.
Optionally, the sending unit 610 and/or the receiving unit 620 may also be collectively referred to as a transceiver unit (module) or a communication unit, which may be used to perform the receiving and sending steps of the first node in the method embodiments. The processing unit 630 is configured to generate the instructions sent by the sending unit 610 or to process the instructions received by the receiving unit 620. Optionally, the apparatus 600 for training a model may further include a storage unit, which is configured to store the instructions executed by the sending unit, the receiving unit, and the processing unit.
The apparatus 600 for training a model is the first node in the method embodiments, or may be a chip within the first node. When the apparatus 600 for training a model is the first node, the processing unit may be a processor, and the sending unit and the receiving unit may be a transceiver. The apparatus for training a model may further include a storage unit, which may be a memory. The storage unit is configured to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the communication device performs the foregoing methods. When the apparatus 600 for training a model is a chip within the first node, the processing unit may be a processor, and the sending unit and the receiving unit may be an input/output interface, pins, or a circuit; the processing unit executes the instructions stored in the storage unit, so that the communication apparatus performs the operations performed by the first node in the foregoing method embodiments. The storage unit may be a storage unit within the chip (for example, a register or a cache), or a storage unit within the first node located outside the chip (for example, a read-only memory or a random access memory).
Those skilled in the art can clearly understand that, for the steps performed by the apparatus 600 for training a model and the corresponding beneficial effects, reference may be made to the related description of the first node in the foregoing method embodiments; for brevity, details are not repeated here.
It should be understood that the sending unit 610 and the receiving unit 620 may be implemented by a transceiver, the processing unit may be implemented by a processor, and the storage unit may be implemented by a memory. As shown in FIG. 7, an apparatus 700 for training a model may include a processor 710, a memory 720, and a transceiver 730. The apparatus 700 for training a model may be the first node in the wireless network.
The apparatus 600 for training a model shown in FIG. 6 or the apparatus 700 for training a model shown in FIG. 7 can implement the steps performed by the first node in the foregoing embodiments. For similar descriptions, refer to the descriptions in the corresponding foregoing methods; to avoid repetition, details are not repeated here.
图8示出了本申请提供的训练模型的装置800的结构示意图。该训练模型的装置800可以为无线网络中的第二节点。该训练模型的装置800包括:接收单元810、处理单元820和发送单元830。FIG. 8 shows a schematic structural diagram of an apparatus 800 for training a model provided in this application. The device 800 for training the model may be the second node in the wireless network. The device 800 for training a model includes: a receiving unit 810, a processing unit 820, and a sending unit 830.
接收单元810,用于从所述无线接入网中的第一节点接收第一请求。The receiving unit 810 is configured to receive a first request from a first node in the radio access network.
处理单元820,用于根据所述第一请求,基于所述第二节点的本地数据对第一共享模型进行本地模型再训练,以得到第一本地模型。The processing unit 820 is configured to perform local model retraining on the first shared model based on the local data of the second node according to the first request to obtain the first local model.
发送单元830,用于向所述第一节点发送模型上报消息,所述模型上报消息包括所述第一本地模型的参数,或者包括所述第一本地模型的参数与所述第一共享模型的参数之间的增量,所述模型上报消息用于所述第一共享模型的更新。The sending unit 830 is configured to send a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the parameters of the first local model and the first shared model. The increment between the parameters, the model report message is used to update the first shared model.
因此,本申请实施例中,无线网络中的第一节点向该无线网络中的至少一个第二节点发送第一请求,每个第二节点能够根据该第一请求,基于其本地数据对第一共享模型进行本地再训练,然后通过模型上报消息将训练得到的本地模型的参数,或本地模型的参数与共享模型的参数之间的增量上报给第一节点,进而第一节点能够根据该至少一个第二节点上报内容和第一共享模型,确定第二共享模型,基于此本申请实施例能够在无线网络中实现联合学习。Therefore, in this embodiment of the present application, a first node in the wireless network sends a first request to at least one second node in the wireless network, and each second node can respond to the first request based on its local data. The shared model is retrained locally, and then the parameters of the trained local model or the increment between the parameters of the local model and the parameters of the shared model are reported to the first node through the model report message, and the first node can then report the A second node reports content and the first sharing model, and determines the second sharing model. Based on this, the embodiment of the present application can implement joint learning in a wireless network.
Optionally, the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
In this way, according to the size of the training data set corresponding to the first local model and/or the prediction error carried in the model report message, the first node can select second local models whose training data set size is greater than or equal to a specific threshold and/or whose prediction error is less than or equal to a specific threshold, and then determine the second shared model based on the selected second local models. Because the accuracy or generalization ability of a second local model is higher than that of a first local model in the embodiments of this application, the embodiments of this application help improve the accuracy and generalization ability of the second shared model.
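As a sketch only, the screening and weighted aggregation described above could look as follows in Python; the dictionary keys are assumptions, checking both thresholds jointly is a simplification of the "and/or" wording, and weighting by training-set size is just one of the weighting options mentioned.

    import numpy as np

    def select_second_local_models(reports, first_threshold, second_threshold):
        # First node: keep only reports whose training-set size and prediction error meet the thresholds.
        return [r for r in reports
                if r.get("dataset_size", 0) >= first_threshold
                and r.get("prediction_error", float("inf")) <= second_threshold]

    def weighted_aggregate(shared_params, selected_reports):
        # Weight each retained increment by the size of its training data set.
        sizes = np.array([r["dataset_size"] for r in selected_reports], dtype=float)
        weights = sizes / sizes.sum()
        increments = np.stack([r["increment"] for r in selected_reports])
        return np.asarray(shared_params, dtype=float) + weights @ increments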
Optionally, the second node includes a local analysis and modeling function, and the first node includes a centralized analysis and modeling function.
The receiving, by the second node, of the first request from the first node in the radio access network includes:
receiving, by the local analysis and modeling function in the second node, the first request from the centralized analysis and modeling function in the first node, where the first request includes information about the first shared model.
Optionally, the first request further includes an upload strategy for the first local model. Correspondingly, after obtaining the upload strategy for the first local model, the second node can perform, according to the upload strategy, the corresponding processing operations on the local model obtained through local retraining.
Optionally, the upload strategy of the first local model includes at least one of the following:
an identifier of a processing algorithm applied to the first local model before uploading, an upload time of the first local model, or information carried when the first local model is uploaded;
where the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
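For illustration only, the upload strategy could be carried as a simple structure such as the one below; the field names are assumptions made for this sketch and are not defined by this application.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class UploadStrategy:
        # Illustrative container for the optional items listed above.
        preprocessing_algorithm_id: Optional[str] = None   # identifier of the algorithm applied before upload
        upload_time: Optional[float] = None                # when the first local model should be uploaded
        carry_dataset_size: bool = False                   # carry the training data set size in the report
        carry_prediction_error: bool = False               # carry the prediction error in the report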
Optionally, the second node further includes a local adaptive strategy function.
The local analysis and modeling function sends a third request to the local adaptive strategy function, where the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request includes information about the first shared model.
The local analysis and modeling function receives the local joint learning strategy sent by the local adaptive strategy function.
When the local joint learning strategy indicates that the second node performs local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
Therefore, in the embodiments of this application, the local adaptive strategy function sends the local joint learning strategy to the local analysis and modeling function, so that the second node can determine, according to its own computing capability, whether to participate in joint learning. This avoids prolonging the joint learning iteration time because a second node has insufficient computing capability, and improves joint learning efficiency.
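As an illustration of this decision only, assuming a hypothetical helper get_local_joint_learning_strategy on the local adaptive strategy function and reusing local_retrain from the earlier sketch, the local analysis and modeling function might proceed as follows:

    def handle_first_request(first_request, local_adaptive_strategy, local_data):
        # Local analysis and modeling function: ask the local adaptive strategy function whether to retrain.
        third_request = {"model_info": first_request["model_info"]}    # carries the shared model information
        strategy = local_adaptive_strategy.get_local_joint_learning_strategy(third_request)
        if not strategy.get("retrain", False):   # e.g. computing capability is currently insufficient
            return None                          # skip this round of joint learning
        initial_params = first_request["model_info"]["initial_params"]
        return local_retrain(initial_params, local_data)               # from the earlier sketch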
Optionally, in the embodiments of this application, the local adaptive strategy function may alternatively not send the local joint learning strategy to the local analysis and modeling function; instead, the second node always participates in joint learning when it receives a request for joint learning training.
Optionally, the information about the first shared model includes at least one of the following:
a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
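For illustration, the information about the first shared model and the first request carrying it could be represented as below; all field names are assumptions made for this sketch.

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class SharedModelInfo:
        model_id: str
        model_type: Optional[str] = None                    # e.g. "neural_network"
        model_structure: Optional[Any] = None               # e.g. layer sizes
        inputs_outputs: Optional[Any] = None                # names/shapes of the model inputs and outputs
        initial_params: Optional[Any] = None
        data_collection_duration: Optional[float] = None    # how long to collect training data

    @dataclass
    class FirstRequest:
        model_info: SharedModelInfo
        upload_strategy: Optional[Any] = None               # e.g. the UploadStrategy sketched earlier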
Optionally, the second node receives a model update notification message sent by the first node, where the model update notification message is used to notify the second node of a second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model.
When the second node determines that the prediction error of the second shared model is less than a fourth threshold, the second node installs the second shared model.
Therefore, in the embodiments of this application, the prediction error of the second shared model is evaluated. When it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model; when it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. On this basis, the embodiments of this application can avoid installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the fourth threshold; this is not limited in the embodiments of this application.
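A minimal sketch of this installation check, assuming hypothetical evaluate and install helpers on the second node, is given below; when no fourth threshold is configured, it falls back to the prediction error of the current (first) shared model, which is the option mentioned above.

    def maybe_install_second_shared_model(node, second_shared_model, validation_data, fourth_threshold=None):
        # Second node: install the new shared model only if its prediction error is below the threshold.
        if fourth_threshold is None:
            fourth_threshold = node.evaluate(node.current_model, validation_data)   # hypothetical helper
        error = node.evaluate(second_shared_model, validation_data)
        if error < fourth_threshold:
            node.install(second_shared_model)                                       # hypothetical helper
            return True
        return False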
Optionally, the receiving unit 810 and/or the sending unit 830 may also be collectively referred to as a transceiver unit (module) or a communication unit, and may be respectively configured to perform the receiving and sending steps of the second node in the method embodiments. The processing unit 820 is further configured to generate instructions to be sent by the sending unit 830, or to process instructions received by the receiving unit 810. Optionally, the communication apparatus 800 may further include a storage unit, and the storage unit is configured to store instructions executed by the communication unit and the processing unit.
The apparatus 800 for training a model is the second node in the method embodiments, or may be a chip in the second node. When the apparatus 800 for training a model is the second node, the processing unit may be a processor, and the sending unit and the receiving unit may be a transceiver. The apparatus may further include a storage unit, and the storage unit may be a memory. The storage unit is configured to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the communication device performs the foregoing methods. When the apparatus 800 for training a model is a chip in the second node, the processing unit may be a processor, and the sending unit and the receiving unit may be an input/output interface, a pin, a circuit, or the like; the processing unit executes the instructions stored in the storage unit, so that the communication device performs the operations performed by the network device in the foregoing method embodiments. The storage unit may be a storage unit in the chip (for example, a register or a cache), or may be a storage unit in the communication device that is located outside the chip (for example, a read-only memory or a random access memory).
Those skilled in the art can clearly understand that, for the steps performed by the apparatus 800 for training a model and the corresponding beneficial effects, reference may be made to the related description of the second node in the foregoing method embodiments; for brevity, details are not described herein again.
It should be understood that the sending unit 830 and the receiving unit 810 may be implemented by a transceiver, the processing unit 820 may be implemented by a processor, and the storage unit may be implemented by a memory. As shown in FIG. 9, an apparatus 900 for training a model may include a processor 910, a memory 920, and a transceiver 930. The apparatus 900 for training a model may be the second node in the wireless network.
The apparatus 800 for training a model shown in FIG. 8 or the apparatus 900 for training a model shown in FIG. 9 can implement the steps performed by the second node in the foregoing method embodiments. For similar descriptions, reference may be made to the corresponding method embodiments. To avoid repetition, details are not described herein again.
The apparatuses for training a model in the foregoing apparatus embodiments correspond to the first node or the second node in the method embodiments, and the corresponding steps are performed by corresponding modules or units. For example, the transceiver unit (or the communication unit, or the transceiver) performs the sending and/or receiving steps in the method embodiments (or these steps are performed by the sending unit and the receiving unit respectively), and steps other than sending and receiving may be performed by the processing unit (processor). For the functions of specific units, reference may be made to the corresponding method embodiments. The sending unit and the receiving unit may form a transceiver unit, and a transmitter and a receiver may form a transceiver, jointly implementing the transceiver functions in the method embodiments; there may be one or more processors.
It should be understood that the foregoing division into units is merely a functional division, and other division methods may be used in actual implementation.
The foregoing first node or second node may be a chip, and the processing unit may be implemented by hardware or software. When implemented by hardware, the processing unit may be a logic circuit, an integrated circuit, or the like. When implemented by software, the processing unit may be a general-purpose processor, implemented by reading software code stored in a storage unit, and the storage unit may be integrated in the processor or may exist independently outside the processor.
It should be understood that the foregoing processing apparatus may be a chip. For example, the processing apparatus may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
In an implementation process, the steps of the methods provided in the embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The steps of the methods disclosed in the embodiments of this application may be directly performed and completed by a hardware processor, or performed and completed by a combination of hardware and software modules in the processor.
It should be noted that the processor in the embodiments of this application may be an integrated circuit chip with a signal processing capability. In an implementation process, the steps of the foregoing method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The foregoing processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor in the embodiments of this application may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It can be understood that the memory or storage unit in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, for example, a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but are not limited to, these and any other suitable types of memories.
An embodiment of this application further provides a wireless network, including the foregoing first node and second node.
An embodiment of this application further provides a computer-readable medium storing a computer program, and when the computer program is executed by a computer, the method in any one of the foregoing embodiments is implemented.
An embodiment of this application further provides a computer program product, and when the computer program product is executed by a computer, the method in any one of the foregoing embodiments is implemented.
An embodiment of this application further provides a system chip, including a communication unit and a processing unit. The processing unit may be, for example, a processor. The communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer instructions, so that a chip in the communication apparatus performs any one of the methods provided in the foregoing embodiments of this application.
Optionally, the computer instructions are stored in a storage unit.
The embodiments in this application may be used independently or jointly, which is not limited herein.
It should be understood that descriptions such as "first" and "second" in the embodiments of this application are merely used for illustration and for distinguishing between described objects, do not indicate an order, do not represent a special limitation on the quantity of devices in the embodiments of this application, and do not constitute any limitation on the embodiments of this application.
It should also be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, over infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
In this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b, or c may indicate: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may be singular or plural.
It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that specific features, structures, or characteristics related to the embodiment are included in at least one embodiment of this application. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of this application.

Claims (28)

  1. A method for training a model applied to a wireless network, characterized in that the method is performed by a first node in the wireless network and comprises:
    sending, by the first node, a first request to at least one second node in the wireless network, wherein the first request is used to request each of the at least one second node to perform local model retraining on a first shared model based on local data of the second node;
    obtaining, by the first node, a model report message from each of the at least one second node, wherein the model report message of each second node comprises parameters of a first local model, or comprises increments between the parameters of the first local model and parameters of the first shared model, and the first local model is obtained after the second node performs local model retraining on the first shared model based on the first request and the local data; and
    determining, by the first node, a second shared model according to the model report messages of the at least one second node and the first shared model.
  2. The method according to claim 1, characterized in that the model report message further comprises the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model; and
    the determining, by the first node, a second shared model according to the model report messages of the at least one second node and the first shared model comprises:
    determining, by the first node, a second local model from at least one first local model corresponding to the at least one second node, wherein the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold; and
    determining, by the first node, the second shared model according to the second local model and the first shared model.
  3. The method according to claim 2, characterized in that the determining, by the first node, the second shared model according to the second local model and the first shared model comprises:
    performing, by the first node, weighted average aggregation on the parameters of the second local model, or on the increments between the parameters of the second local model and the parameters of the first shared model, wherein a weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model, and/or the prediction error of the second local model; and
    determining, by the first node, the second shared model according to a result of the weighted average aggregation and the first shared model.
  4. The method according to any one of claims 1 to 3, characterized in that the first node comprises a centralized adaptive strategy function and a centralized analysis and modeling function, and the method further comprises:
    sending, by the centralized adaptive strategy function, a joint learning strategy to the centralized analysis and modeling function, wherein the joint learning strategy comprises at least one of the following information:
    a joint learning start condition, information about the first shared model, identifiers of joint learning group members, an upload strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model, or a shared model update strategy.
  5. The method according to claim 4, characterized in that the second node comprises a local analysis and modeling function; and
    the sending, by the first node, a first request to at least one second node in the radio access network comprises:
    when the joint learning start condition is met, sending, by the centralized analysis and modeling function in the first node, the first request to the local analysis and modeling function of each of the at least one second node, wherein the first request comprises the information about the first shared model.
  6. The method according to claim 4 or 5, characterized in that the information about the first shared model comprises at least one of the following:
    a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
  7. The method according to claim 5 or 6, characterized in that the first request further comprises the upload strategy of the first local model.
  8. The method according to claim 7, characterized in that the upload strategy of the first local model comprises at least one of the following:
    an identifier of a processing algorithm applied to the first local model before uploading, an upload time of the first local model, or information carried when the first local model is uploaded;
    wherein the carried information comprises the size of the training data set of the first local model and/or the prediction error of the first local model.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    determining, by the first node, that the prediction error of the second shared model is less than or equal to a third threshold; and
    sending, by the first node, a model update notification message to each of the at least one second node, wherein the model update notification message is used to request each second node to install the second shared model.
  10. A method for training a model applied to a radio access network, characterized in that the method is performed by a second node in the radio access network and comprises:
    receiving, by the second node, a first request from a first node in the radio access network;
    performing, by the second node according to the first request, local model retraining on a first shared model based on local data of the second node, to obtain a first local model; and
    sending, by the second node, a model report message to the first node, wherein the model report message comprises parameters of the first local model, or comprises increments between the parameters of the first local model and parameters of the first shared model, and the model report message is used for updating the first shared model.
  11. The method according to claim 10, characterized in that the model report message further comprises the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
  12. The method according to claim 10 or 11, characterized in that the second node comprises a local analysis and modeling function, and the first node comprises a centralized analysis and modeling function; and
    the receiving, by the second node, a first request from the first node in the radio access network comprises:
    receiving, by the local analysis and modeling function in the second node, the first request from the centralized analysis and modeling function in the first node, wherein the first request comprises information about the first shared model.
  13. The method according to claim 12, characterized in that the first request further comprises an upload strategy of the first local model.
  14. The method according to claim 13, characterized in that the upload strategy of the first local model comprises at least one of the following:
    an identifier of a processing algorithm applied to the first local model before uploading, an upload time of the first local model, or information carried when the first local model is uploaded;
    wherein the carried information comprises the size of the training data set of the first local model and/or the prediction error of the first local model.
  15. The method according to claim 12, characterized in that the second node further comprises a local adaptive strategy function, and the method further comprises:
    sending, by the local analysis and modeling function, a third request to the local adaptive strategy function, wherein the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request comprises the information about the first shared model;
    receiving, by the local analysis and modeling function, the local joint learning strategy sent by the local adaptive strategy function; and
    when the local joint learning strategy indicates that the second node performs local model retraining on the first shared model, performing, by the local analysis and modeling function, local model retraining on the first shared model based on the local data.
  16. The method according to any one of claims 12 to 15, characterized in that the information about the first shared model comprises at least one of the following:
    a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
  17. The method according to any one of claims 12 to 16, characterized in that the method further comprises:
    receiving, by the second node, a model update notification message sent by the first node, wherein the model update notification message is used to notify the second node of a second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model; and
    installing, by the second node, the second shared model when the second node determines that the prediction error of the second shared model is less than a fourth threshold.
  18. An apparatus for training a model applied to a wireless network, characterized in that the apparatus is a first node in the wireless network and comprises:
    a sending unit, configured to send a first request to at least one second node in the wireless network, wherein the first request is used to request each of the at least one second node to perform local model retraining on a first shared model based on local data of the second node;
    a receiving unit, configured to obtain a model report message from each of the at least one second node, wherein the model report message of each second node comprises parameters of a first local model, or comprises increments between the parameters of the first local model and parameters of the first shared model, and the first local model is obtained after the second node performs local model retraining on the first shared model based on the first request and the local data; and
    a determining unit, configured to determine a second shared model according to the model report messages of the at least one second node and the first shared model.
  19. The apparatus according to claim 18, characterized in that the model report message further comprises the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model; and
    the determining unit is specifically configured to:
    determine a second local model from at least one first local model corresponding to the at least one second node, wherein the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold; and
    determine the second shared model according to the second local model and the first shared model.
  20. The apparatus according to claim 19, characterized in that the determining unit is specifically configured to:
    perform weighted average aggregation on the parameters of the second local model, or on the increments between the parameters of the second local model and the parameters of the first shared model, wherein a weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model, and/or the prediction error of the second local model; and
    determine the second shared model according to a result of the weighted average aggregation and the first shared model.
  21. The apparatus according to any one of claims 18 to 20, characterized in that the first node comprises a centralized adaptive strategy function and a centralized analysis and modeling function, wherein:
    the centralized adaptive strategy function sends a joint learning strategy to the centralized analysis and modeling function, and the joint learning strategy comprises at least one of the following information:
    a joint learning start condition, information about the first shared model, identifiers of joint learning group members, an upload strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model, or a shared model update strategy.
  22. The apparatus according to claim 21, characterized in that the second node comprises a local analysis and modeling function; and
    the sending unit is specifically configured to:
    when the joint learning start condition is met, send, through the centralized analysis and modeling function, the first request to the local analysis and modeling function of each of the at least one second node, wherein the first request comprises the information about the first shared model.
  23. The apparatus according to any one of claims 18 to 22, characterized in that:
    the determining unit is further configured to determine that the prediction error of the second shared model is less than or equal to a third threshold; and
    the sending unit is further configured to send a model update notification message to each of the at least one second node, wherein the model update notification message is used to request each second node to install the second shared model.
  24. An apparatus for training a model applied to a radio access network, characterized in that the apparatus is a second node in the radio access network and comprises:
    a receiving unit, configured to receive a first request from a first node in the radio access network;
    a processing unit, configured to perform, according to the first request, local model retraining on a first shared model based on local data of the second node, to obtain a first local model; and
    a sending unit, configured to send a model report message to the first node, wherein the model report message comprises parameters of the first local model, or comprises increments between the parameters of the first local model and parameters of the first shared model, and the model report message is used for updating the first shared model.
  25. The apparatus according to claim 24, characterized in that the model report message further comprises the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
  26. The apparatus according to claim 24 or 25, characterized in that the second node comprises a local analysis and modeling function, and the first node comprises a centralized analysis and modeling function; and
    the receiving, by the second node, of the first request from the first node in the radio access network comprises:
    receiving, by the local analysis and modeling function in the second node, the first request from the centralized analysis and modeling function in the first node, wherein the first request comprises information about the first shared model.
  27. The apparatus according to claim 26, characterized in that the second node further comprises a local adaptive strategy function, wherein:
    the local analysis and modeling function sends a third request to the local adaptive strategy function, the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request comprises the information about the first shared model;
    the local analysis and modeling function receives the local joint learning strategy sent by the local adaptive strategy function; and
    when the local joint learning strategy indicates that the second node performs local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
  28. The apparatus according to claim 26 or 27, characterized in that:
    the second node receives a model update notification message sent by the first node, wherein the model update notification message is used to notify the second node of a second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model; and
    the second node installs the second shared model when determining that the prediction error of the second shared model is less than a fourth threshold.
PCT/CN2019/118762 2019-02-22 2019-11-15 Model training method and apparatus WO2020168761A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910135464.6 2019-02-22
CN201910135464.6A CN111612153B (en) 2019-02-22 2019-02-22 Method and device for training model

Publications (1)

Publication Number Publication Date
WO2020168761A1 true WO2020168761A1 (en) 2020-08-27

Family

ID=72143917

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118762 WO2020168761A1 (en) 2019-02-22 2019-11-15 Model training method and apparatus

Country Status (2)

Country Link
CN (1) CN111612153B (en)
WO (1) WO2020168761A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100145A (en) * 2020-09-02 2020-12-18 南京三眼精灵信息技术有限公司 Digital model sharing learning system and method
CN112232519A (en) * 2020-10-15 2021-01-15 成都数融科技有限公司 Joint modeling method based on federal learning
CN112329557A (en) * 2020-10-21 2021-02-05 杭州趣链科技有限公司 Model application method and device, computer equipment and storage medium
WO2023151454A1 (en) * 2022-02-14 2023-08-17 大唐移动通信设备有限公司 Model monitoring method, monitoring end, device, and storage medium
CN118052302A (en) * 2024-04-11 2024-05-17 北京钢研新材科技有限公司 Federal learning method and device for material data model
WO2024113822A1 (en) * 2022-11-30 2024-06-06 华为技术有限公司 Distributed machine learning method, device, storage medium and program product

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269794A (en) * 2020-09-16 2021-01-26 连尚(新昌)网络科技有限公司 Method and equipment for violation prediction based on block chain
WO2022252162A1 (en) * 2021-06-02 2022-12-08 北京小米移动软件有限公司 Model training method, model training apparatus and storage medium
CN116233857A (en) * 2021-12-02 2023-06-06 华为技术有限公司 Communication method and communication device
CN116887290A (en) * 2022-03-28 2023-10-13 华为技术有限公司 Communication method and device for training machine learning model
CN117196071A (en) * 2022-05-27 2023-12-08 华为技术有限公司 Model training method and device
CN117221944A (en) * 2022-06-02 2023-12-12 华为技术有限公司 Communication method and device
WO2024026846A1 (en) * 2022-08-05 2024-02-08 华为技术有限公司 Artificial intelligence model processing method and related device
WO2024065682A1 (en) * 2022-09-30 2024-04-04 Shenzhen Tcl New Technology Co., Ltd. Communication devices and methods for machine learning model training
CN117993516A (en) * 2022-11-04 2024-05-07 华为技术有限公司 Communication method and device
CN116566846B (en) * 2023-07-05 2023-09-22 中国电信股份有限公司 Model management method and system, shared node and network node

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150242760A1 (en) * 2014-02-21 2015-08-27 Microsoft Corporation Personalized Machine Learning System
CN105575389A (en) * 2015-12-07 2016-05-11 百度在线网络技术(北京)有限公司 Model training method, system and device
CN106062786A (en) * 2014-09-12 2016-10-26 微软技术许可有限责任公司 Computing system for training neural networks
CN107871160A (en) * 2016-09-26 2018-04-03 谷歌公司 Communicate efficient joint study

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9160760B2 (en) * 2014-01-06 2015-10-13 Cisco Technology, Inc. Anomaly detection in a computer network
US20150324690A1 (en) * 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
US10346944B2 (en) * 2017-04-09 2019-07-09 Intel Corporation Machine learning sparse computation mechanism
CN108345661B (en) * 2018-01-31 2020-04-28 华南理工大学 Wi-Fi clustering method and system based on large-scale Embedding technology
CN108596345A (en) * 2018-04-23 2018-09-28 薛泽 Machine learning based on block chain and make a mistake prior-warning device and method
CN109145984B (en) * 2018-08-20 2022-03-25 联想(北京)有限公司 Method and apparatus for machine training
CN109325541A (en) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for training pattern

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150242760A1 (en) * 2014-02-21 2015-08-27 Microsoft Corporation Personalized Machine Learning System
CN106062786A (en) * 2014-09-12 2016-10-26 微软技术许可有限责任公司 Computing system for training neural networks
CN105575389A (en) * 2015-12-07 2016-05-11 百度在线网络技术(北京)有限公司 Model training method, system and device
CN107871160A (en) * 2016-09-26 2018-04-03 谷歌公司 Communicate efficient joint study

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100145A (en) * 2020-09-02 2020-12-18 南京三眼精灵信息技术有限公司 Digital model sharing learning system and method
CN112100145B (en) * 2020-09-02 2023-07-04 南京三眼精灵信息技术有限公司 Digital model sharing learning system and method
CN112232519A (en) * 2020-10-15 2021-01-15 成都数融科技有限公司 Joint modeling method based on federal learning
CN112232519B (en) * 2020-10-15 2024-01-09 成都数融科技有限公司 Joint modeling method based on federal learning
CN112329557A (en) * 2020-10-21 2021-02-05 杭州趣链科技有限公司 Model application method and device, computer equipment and storage medium
WO2023151454A1 (en) * 2022-02-14 2023-08-17 大唐移动通信设备有限公司 Model monitoring method, monitoring end, device, and storage medium
WO2024113822A1 (en) * 2022-11-30 2024-06-06 华为技术有限公司 Distributed machine learning method, device, storage medium and program product
CN118052302A (en) * 2024-04-11 2024-05-17 北京钢研新材科技有限公司 Federal learning method and device for material data model

Also Published As

Publication number Publication date
CN111612153A (en) 2020-09-01
CN111612153B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
WO2020168761A1 (en) Model training method and apparatus
US11824784B2 (en) Automated platform resource management in edge computing environments
WO2016161677A1 (en) Traffic offload method and system
JP7159347B2 (en) MODEL UPDATE METHOD AND APPARATUS, AND SYSTEM
EP4158558A1 (en) Federated learning optimizations
CN113014415A (en) End-to-end quality of service in an edge computing environment
CN108667657B (en) SDN-oriented virtual network mapping method based on local feature information
CN112153700A (en) Network slice resource management method and equipment
CN110365568A (en) A kind of mapping method of virtual network based on deeply study
CN113194489B (en) Minimum-maximum cost optimization method for effective federal learning in wireless edge network
US20190044806A1 (en) Systems and methods for managing a cloud deployed service
CN113095512A (en) Federal learning modeling optimization method, apparatus, medium, and computer program product
WO2019206100A1 (en) Feature engineering programming method and apparatus
EP4002231A1 (en) Federated machine learning as a service
CN103368910B (en) Virtual radio communications network system and method for building up thereof
US11483177B2 (en) Dynamic intelligent analytics VPN instantiation and/or aggregation employing secured access to the cloud network device
CN115882981A (en) Unlicensed spectrum acquisition with cooperative spectrum sensing in next generation networks
WO2022001941A1 (en) Network element management method, network management system, independent computing node, computer device, and storage medium
US20220321408A1 (en) Change deployment system
CN114548416A (en) Data model training method and device
Koudouridis et al. An architecture and performance evaluation framework for artificial intelligence solutions in beyond 5G radio access networks
WO2021238508A1 (en) Data processing method, apparatus, and device
Donatti et al. Survey on Machine Learning-Enabled Network Slicing: Covering the Entire Life Cycle
CN115460617A (en) Network load prediction method and device based on federal learning, electronic equipment and medium
WO2024011908A1 (en) Network prediction system and method, and electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19916101

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19916101

Country of ref document: EP

Kind code of ref document: A1