CN111612153B - Method and device for training model - Google Patents

Method and device for training model

Info

Publication number
CN111612153B
CN111612153B (application CN201910135464.6A)
Authority
CN
China
Prior art keywords
model
local
node
sharing
request
Prior art date
Legal status
Active
Application number
CN201910135464.6A
Other languages
Chinese (zh)
Other versions
CN111612153A (en)
Inventor
王园园
池清华
徐以旭
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910135464.6A
Priority to PCT/CN2019/118762 (WO2020168761A1)
Publication of CN111612153A
Application granted
Publication of CN111612153B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application provides a method and a device for training a model in a wireless network, which can realize joint learning in the wireless network and help to obtain a training model with higher accuracy and generalization capability. In the embodiment of the application, a first node in the wireless network sends a first request to at least one second node in the wireless network; according to the first request, each second node can locally retrain a first shared model based on its own local data and then report the resulting local model to the first node, so that the first node can determine a second shared model according to the local models reported by the at least one second node.

Description

Method and device for training model
Technical Field
The present application relates to the field of artificial intelligence (AI), and more particularly, to a method and apparatus for training a model in the AI field.
Background
AI can be applied in wireless networks. As more spectrum, more traffic classes, and more access terminals are introduced into a wireless network, the network system becomes more complex, which in turn requires the architecture of the wireless network and the access network devices to become more intelligent and automated. Combining the various service features, network architectures, and device forms in wireless networks, a wireless intelligent network architecture has been defined.
In a wireless intelligent network architecture, machine learning may be used to train models. Model training mainly takes two forms: centralized training and local training. Centralized training needs to aggregate training data at a central node, which causes large communication overhead and also brings problems such as prolonged modeling time, data privacy risks when user data is uploaded, and heavy storage and computation pressure on the central node. Local training does not need to report a model; each local node builds a model using its own local data. However, the insufficient data volume in local training can lead to inaccurate models and weak generalization capability of the local models.
Therefore, a solution for model training by cooperation of a central node and a local node is needed in a wireless intelligent network architecture.
Disclosure of Invention
The application provides a method and a device for training a model in a wireless network, which can realize joint learning in the wireless network and are beneficial to obtaining a training model with higher accuracy and generalization capability.
In a first aspect, there is provided a method for training a model in a wireless network, the method being performed by a first node in the wireless network, the method comprising:
The first node sends a first request to at least one second node in the wireless network, wherein the first request is used for requesting the at least one second node to retrain the first sharing model based on local data of the second node respectively;
The first node respectively acquires model report messages from the at least one second node, wherein the model report message of each second node comprises parameters of a first local model or an increment between the parameters of the first local model and the parameters of the first shared model, and the first local model is obtained after each second node retrains the first shared model based on a first request and local data;
and the first node determines a second sharing model according to the model report message of the at least one second node and the first sharing model.
Therefore, in the embodiment of the application, a first node in a wireless network sends a first request to at least one second node in the wireless network, each second node can locally retrain a first sharing model based on local data according to the first request, and then report parameters of the local model obtained through training or increment between the parameters of the local model and the parameters of the sharing model to the first node through a model report message, so that the first node can determine a second sharing model according to report content of the at least one second node and the first sharing model.
In the embodiment of the application, joint learning refers to that a first node is used as a centralized node, a second node is used as a local node, and the first node and the second node are cooperated to jointly learn to train a model under the condition that local data of the second node does not need to be uploaded to the first node. The joint learning can overcome part or all of the disadvantages of the centralized training model and can also overcome part or all of the disadvantages of the local training model.
Specifically, compared with the mode of centralized training of the model in the prior art, the embodiment of the application does not need a local node to report training data to the centralized node, so that communication overhead caused by reporting the training data can be greatly reduced, the storage of ultra-large-scale data of the centralized node and the pressure of model training can be reduced.
In addition, compared with the local training model in the prior art, the local node sends the local model after local retraining to the centralized node, so that the centralized node can update the shared model according to the local model of at least one local node, thereby being capable of helping to overcome the problem of insufficient data quantity in local training and further improving the accuracy of the training model and the model generalization capability.
With reference to the first aspect, in some implementations of the first aspect, the model report message further includes a size of a training data set corresponding to the first local model, and/or a prediction error of the first local model;
The first node determines a second sharing model according to the model report message of the at least one second node and the first sharing model, and includes:
The first node determines a second local model in at least one first local model corresponding to at least one second node, wherein the size of a training data set corresponding to the second local model is greater than or equal to a first threshold value, and/or the prediction error of the second local model is less than or equal to a second threshold value;
the first node determines the second sharing model according to the second local model and the first sharing model.
Therefore, when the first node determines the second sharing model according to the first local model, the embodiment of the application screens out the second local model with the size of the training data set being greater than or equal to a specific threshold value and/or with the prediction error being less than or equal to a specific threshold value from at least one first local model, and then determines the second sharing model according to the second local model.
With reference to the first aspect, in certain implementation manners of the first aspect, the determining, by the first node, the second sharing model according to the second local model and the first sharing model includes:
The first node carries out weighted average aggregation on the parameters of the second local model or the increment between the parameters of the second local model and the parameters of the first shared model, wherein the weight parameters adopted by the weighted average aggregation are determined according to the size of a training data set corresponding to the second local model and/or the prediction error of the second local model;
The first node determines a second sharing model according to the result of the weighted average aggregation and the first sharing model.
As an example, the weight parameter may be the reciprocal of the total number of second local models, in which case the weight parameters of the respective second local models are the same. Alternatively, the weight parameter of a second local model may be the ratio of the size of the training data set corresponding to that second local model to the total size of all training data sets, where the total size is the sum of the sizes of the training data sets corresponding to the second local models. Alternatively, the weight parameter of each second local model may be the reciprocal of its mean absolute error (MAE).
Alternatively, the first node may not screen the at least one second local model among the at least one first local model, but determine the second shared model directly from the at least one first local model. At this time, the weighted average aggregation may be performed on the parameters of the first local model or the increment between the parameters of the first local model and the parameters of the shared model. At this time, as an example, the weight parameter of each first local model may be the inverse of the total number of first local models. The first node then determines a second sharing model based on the result of the weighted average aggregation and the first sharing model.
With reference to the first aspect, in certain implementations of the first aspect, a centralized adaptive policy function and a centralized analysis and modeling function are included in the first node, the method further includes:
The centralized adaptive strategy function sends a joint learning strategy to the centralized analysis and modeling function, the joint learning strategy comprising at least one of the following information:
a joint learning start condition, information of the first shared model, joint learning group member identifiers, an upload policy for the first local model, a screening policy for the first local model, an aggregation policy for the first local model, a processing policy for the first local model, or a shared model update policy.
Therefore, the embodiment of the application manages the joint learning process of the first node and the second node in the wireless network through the joint learning strategy, and comprises arranging and managing at least one of joint learning starting conditions, uploading strategies of models, screening strategies of models, aggregation strategies of models, processing strategies of the models and the like. Based on the above, the embodiment of the application can realize that the training model with high accuracy and generalization capability is obtained by cooperating with the first node and the second node to learn together under the condition that the local data of the second node does not need to be uploaded to the first node.
Optionally, in the embodiment of the present application, when the first node meets the joint learning starting condition, the first node sends a first request for performing joint learning training to the second node.
With reference to the first aspect, in certain implementation manners of the first aspect, the second node includes a local analysis and modeling function therein;
wherein the first node sends a first request to at least one second node in the radio access network, comprising:
Upon satisfaction of the joint learning initiation condition, the centralized analysis and modeling function in the first node sends a first request to the local analysis and modeling function of each of the at least one second node, where the first request includes information of the first shared model.
Therefore, by setting a joint learning start condition, the embodiment of the application can train the model in a centralized manner when the joint learning start condition is not met, and train the model through joint learning when the joint learning start condition is met.
With reference to the first aspect, in certain implementations of the first aspect, the information of the first shared model includes at least one of:
the model identifier, model type, model structure, input and output, initial model parameters, or training data acquisition duration of the first shared model.
With reference to the first aspect, in certain implementation manners of the first aspect, the first request further includes an upload policy of the first local model.
According to the embodiment of the application, the first node sends the uploading strategy of the first local model to the second node, so that the first node can indicate how the second node uploads the first local model. Correspondingly, after the second node obtains the uploading policy of the first local model, the second node can retrain the local model according to the uploading policy to obtain the local model and perform corresponding processing operation.
With reference to the first aspect, in certain implementations of the first aspect, the uploading policy of the first local model includes at least one of:
an identifier of a processing algorithm to be applied before the first local model is uploaded, an upload time of the first local model, or information carried when the first local model is uploaded, where the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
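For illustration only, the following sketch shows one way a first request carrying the shared-model information and the upload policy of the first local model might be laid out; every field name and value here is a hypothetical example and is not defined by the application.

```python
# Illustrative only: hypothetical field names for a first request that carries
# the shared-model information and the upload policy of the first local model.
first_request = {
    "shared_model_info": {
        "model_id": "shared-model-001",        # model identifier
        "model_type": "linear_regression",     # model type
        "model_structure": {"n_features": 8},  # model structure / input and output
        "initial_parameters": [0.0] * 9,       # initial model parameters (weights + bias)
        "data_collection_duration_s": 3600,    # training data acquisition duration
    },
    "upload_policy": {
        "preprocess_algorithm_id": "delta",    # e.g. report increments instead of raw parameters
        "upload_time_s": 600,                  # when to upload the first local model
        "carry_info": ["train_set_size", "prediction_error"],  # information carried on upload
    },
}
```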
With reference to the first aspect, in certain implementation manners of the first aspect, the method further includes:
the first node determines that the prediction error of the second sharing model is less than or equal to a third threshold;
The first node sends a model update notification message to at least one second node, respectively, the model update notification message being used to request each second node to install a second shared model.
Therefore, the embodiment of the application judges the prediction error of the second sharing model, and when the prediction error is smaller than or equal to the preset threshold value, the first node stores the second sharing model, and the second node installs the second sharing model. When the prediction error is determined to be greater than the preset threshold, the first node does not update the first sharing model, and the second node does not install the second sharing model. Based on the method, the embodiment of the application can avoid the installation of the sharing model with low accuracy, and further ensure the accuracy and generalization capability of the updated sharing model.
In a possible implementation manner, the prediction error of the first sharing model may be set to a third threshold, which is not limited by the embodiment of the present application.
In a second aspect, there is provided a method for training a model in a radio access network, the method being performed by a second node in the radio access network, the method comprising:
the second node receives a first request from a first node in the radio access network;
The second node retrains the first shared model based on the local data of the second node according to the first request to obtain a first local model;
The second node transmits a model report message to the first node, wherein the model report message comprises parameters of the first local model or an increment between the parameters of the first local model and parameters of the first shared model, and the model report message is used for updating the first shared model.
Therefore, in the embodiment of the application, a first node in a wireless network sends a first request to at least one second node in the wireless network, each second node can locally retrain a first sharing model based on local data according to the first request, and then report parameters of the local model obtained through training or increment between the parameters of the local model and the parameters of the sharing model to the first node through a model report message, so that the first node can determine a second sharing model according to report content of the at least one second node and the first sharing model.
In the embodiment of the application, joint learning refers to that a first node is used as a centralized node, a second node is used as a local node, and the first node and the second node are cooperated to jointly learn to train a model under the condition that local data of the second node does not need to be uploaded to the first node. The joint learning can overcome part or all of the disadvantages of the centralized training model and can also overcome part or all of the disadvantages of the local training model.
Specifically, compared with the mode of centralized training of the model in the prior art, the embodiment of the application does not need a local node to report training data to the centralized node, so that communication overhead caused by reporting the training data can be greatly reduced, the storage of ultra-large-scale data of the centralized node and the pressure of model training can be reduced.
In addition, compared with the local training model in the prior art, the local node sends the local model after local retraining to the centralized node, so that the centralized node can update the shared model according to the local model of at least one local node, thereby being capable of helping to overcome the problem of insufficient data quantity in local training and further improving the accuracy of the training model and the model generalization capability.
With reference to the second aspect, in some implementations of the second aspect, the model report message further includes a size of a training data set corresponding to the first local model, and/or a prediction error of the first local model.
According to the size of the training data set corresponding to the first local model and/or the prediction error of the first local model carried in the model report message, the first node may screen out a second local model whose training data set size is greater than or equal to a specific threshold and/or whose prediction error is less than or equal to a specific threshold, and then determine the second shared model from the second local model. Because the accuracy of the second local model is higher than that of the first local model, the embodiment of the application helps to improve the accuracy of the second shared model.
With reference to the second aspect, in certain implementations of the second aspect, the second node includes a local analysis and modeling function therein, and the first node includes a centralized analysis and modeling function therein;
wherein the second node receives a first request from a first node in the radio access network, comprising:
the local analysis and modeling function in the second node receives a first request from the centralized analysis and modeling function in the first node, wherein the first request includes information of the first shared model.
With reference to the second aspect, in some implementations of the second aspect, the first request further includes an upload policy of the first local model.
According to the embodiment of the application, the first node sends the uploading strategy of the first local model to the second node, so that the first node can indicate how the second node uploads the first local model. Correspondingly, after the second node obtains the uploading policy of the first local model, the second node can retrain the local model according to the uploading policy to obtain the local model and perform corresponding processing operation.
With reference to the second aspect, in certain implementations of the second aspect, the uploading policy of the first local model includes at least one of:
an identifier of a processing algorithm to be applied before the first local model is uploaded, an upload time of the first local model, or information carried when the first local model is uploaded; the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
With reference to the second aspect, in some implementations of the second aspect, a local adaptive policy function is further included in the second node, and the method further includes:
The local analysis and modeling function sends a third request to the local adaptive policy function, where the third request is used to request a local joint learning policy corresponding to the first shared model, the local joint learning policy is used to indicate whether the second node retrains the first shared model locally, and the third request includes information of the first shared model;
the local analysis and modeling function receives the local joint learning policy sent by the local adaptive policy function;
when the local joint learning policy instructs the second node to perform local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
Therefore, in the embodiment of the application, the local adaptive policy function sends the local joint learning policy to the local analysis and modeling function, so that the second node can decide whether to participate in joint learning according to its own computing capability, thereby avoiding prolonged joint learning iterations caused by insufficient computing capability of the second node and improving joint learning efficiency.
Optionally, in the embodiment of the present application, the local adaptive policy function may not send a local joint learning policy to the local analysis and modeling function; in that case, the second node always participates in joint learning when it receives a request for joint learning training.
With reference to the second aspect, in certain implementations of the second aspect, the information of the first shared model includes at least one of:
the model identifier, model type, model structure, input and output, initial model parameters, or training data acquisition duration of the first shared model.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes:
The second node receives a model update notification message sent by the first node, wherein the model update notification message is used for notifying a second sharing model, and the second sharing model is determined by the first node according to a model report message and the first sharing model;
the second node installs the second sharing model when it is determined that the prediction error of the second sharing model is less than the fourth threshold.
Therefore, the embodiment of the application judges the prediction error of the second sharing model, and when the prediction error is smaller than or equal to the preset threshold value, the first node stores the second sharing model, and the second node installs the second sharing model. When the prediction error is determined to be greater than the preset threshold, the first node does not update the first sharing model, and the second node does not install the second sharing model. Based on the method, the embodiment of the application can avoid the installation of the sharing model with low accuracy, and further ensure the accuracy and generalization capability of the updated sharing model.
In a possible implementation manner, the prediction error of the first sharing model may be set to a fourth threshold, which is not limited by the embodiment of the present application.
Optionally, in the embodiment of the present application, when the functions of the intelligent network element are deployed in a unified manner, the interaction information during the joint learning between the first node and the second node may be directly transmitted through an interface between the first node and the second node.
In a third aspect, an apparatus for training a model is provided, which may be a first node in a wireless network or a chip within the first node. As an example, the first node may be a centralized node, or a central node. The apparatus has the functionality to implement the first aspect described above and various possible implementations. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the apparatus includes: the transceiver module, which may include a radio frequency circuit or an antenna, may be, for example, at least one of a transceiver, a receiver, and a transmitter. The processing module may be a processor. Optionally, the apparatus further comprises a storage module, which may be a memory, for example. When included, the memory module is used to store instructions. The processing module is connected to the storage module, and the processing module may execute the instructions stored in the storage module or the instructions from other sources, so that the apparatus performs the method of the first aspect and the various possible implementation manners.
In another possible design, when the apparatus is a chip, the chip includes a transceiver module, which may be, for example, an input/output interface, pins, or circuitry on the chip, and may optionally further include a processing module. The processing module may be, for example, a processor. The processing module may execute instructions to cause the chip to perform the method of the first aspect and any possible implementation thereof. Alternatively, the processing module may execute instructions in a storage module, which may be a storage module within the chip, such as a register or a cache. The storage module may also be located within the apparatus but outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
The processor referred to in any of the above may be a general purpose Central Processing Unit (CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control program execution of the methods of the first aspect and various possible implementations.
In a fourth aspect, an apparatus for training a model is provided, where the apparatus may be a second node in a wireless network or may be a chip within the second node. As an example, the second node may be a local node, or a distributed edge node. The apparatus has the functionality to implement the second aspect described above and various possible implementations. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the apparatus includes: the transceiver module, which may include a radio frequency circuit or an antenna, may be, for example, at least one of a transceiver, a receiver, and a transmitter. The processing module may be a processor. Optionally, the apparatus further comprises a storage module, which may be a memory, for example. When included, the memory module is used to store instructions. The processing module is connected to the storage module, and the processing module may execute the instructions stored in the storage module or the instructions from other sources, so that the apparatus performs the communication method of the second aspect and various possible implementation manners.
In another possible design, when the apparatus is a chip, the chip includes a transceiver module, which may be, for example, an input/output interface, pins, or circuitry on the chip, and may optionally further include a processing module. The processing module may be, for example, a processor. The processing module may execute instructions to cause the chip to perform the method of the second aspect and any possible implementation thereof. Alternatively, the processing module may execute instructions in a storage module, which may be a storage module within the chip, such as a register or a cache. The storage module may also be located within the apparatus but outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
The processor referred to in any of the above may be a general purpose Central Processing Unit (CPU), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control program execution in accordance with the second aspect and various possible implementations.
In a fifth aspect, a computer storage medium is provided, in which program code is stored for instructing the execution of the method of the first aspect or the second aspect or any possible implementation thereof.
In a sixth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of the first or second aspect or any possible implementation thereof.
In a seventh aspect, a communication system is provided, which comprises means having functions for implementing the methods and various possible designs of the first aspect and means having functions for implementing the methods and various possible designs of the second aspect.
In an eighth aspect, a processor is provided for coupling with a memory for performing the method of the first aspect or the second aspect or any possible implementation thereof.
In a ninth aspect, a chip is provided. The chip includes a processor and a communication interface, where the communication interface is configured to communicate with an external device or an internal device, and the processor is configured to implement the method of the first or second aspect or any possible implementation thereof.
Optionally, the chip may further include a memory having instructions stored therein, and the processor may be configured to execute the instructions stored in the memory or instructions from another source. When the instructions are executed, the processor is configured to implement the method of the first or second aspect described above or any possible implementation thereof.
Alternatively, the chip may be integrated on the first node or the second node.
Drawings
Fig. 1 shows a schematic diagram of a system architecture to which an embodiment of the application is applied.
Fig. 2 shows a schematic diagram of an intelligent network architecture to which embodiments of the present application are applied.
Fig. 3 shows a schematic flow chart of a method for training a model according to an embodiment of the present application.
Fig. 4 shows a schematic flow chart of a method for training a model according to an embodiment of the present application.
Fig. 5 shows a schematic flow chart of a method for training a model according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of an apparatus for training a model provided by an embodiment of the present application.
FIG. 7 is a schematic block diagram of another apparatus for training a model provided by an embodiment of the present application.
Fig. 8 is a schematic block diagram of an apparatus for training a model provided by an embodiment of the present application.
Fig. 9 is a schematic block diagram of an apparatus for training a model provided by an embodiment of the present application.
Detailed Description
The technical scheme of the application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of a system architecture 100 to which embodiments of the application are applied. As shown in fig. 1, the system architecture 100 includes a first node 110 and at least one second node 120. Specifically, the system 100 is, for example, a wireless network, the first node 110 may be a centralized node or a central node, and the second node may be a local node or a distributed edge node, which is not limited in the embodiment of the present application.
As an example, the first node 110 or the second node 120 may be deployed in a radio access network (radio access network, RAN), or may be deployed in a core network, or in an operator support system (operations support system, OSS), or the second node 120 may be a terminal device in a wireless network, which is not specifically limited in this embodiment of the present application. In one possible scenario, the first node 110 and the second node 120 may both be deployed in the RAN. Or in another possible scenario, the first node 110 may be deployed in an OSS or core network and the second node 120 in the RAN. Or in another possible scenario, the first node 110 may be deployed in a RAN and the second node 120 is a terminal device.
Alternatively, the first node 110 or the second node 120 in the system architecture 100 may be implemented by one device, or may be implemented by a plurality of devices together, or may be a functional module in one device, which is not specifically limited in the embodiment of the present application. It will be appreciated that the above-described functions may be either network elements in a hardware device, software functions running on dedicated hardware, or virtualized functions instantiated on a platform (e.g., a cloud platform), to which embodiments of the application are not limited.
Fig. 2 shows a schematic diagram of an intelligent network architecture 200 to which embodiments of the present application are applied. The intelligent network architecture 200 is a hierarchical architecture and can meet, as required, the differentiated computing-resource and execution-cycle requirements of different scenario types. The intelligent network architecture 200 may specifically be an intelligent wireless network architecture. As shown in fig. 2, the intelligent network architecture 200 includes a service support management system (OSS), and at least one cloud radio access network (cloud radio access network, C-RAN), evolved Node B (eNB), or next generation Node B (gNB). Each C-RAN may include one centralized unit (centralized unit, CU) and at least one distributed unit (DU).
In the embodiment of the application, in terms of logic function or deployment level, the OSS is a more centralized node relative to the C-RAN, eNB, or gNB; a CU in the C-RAN is a more centralized node relative to a DU; and the C-RAN is a more centralized node relative to the eNB or gNB. Correspondingly, in some possible implementations of the embodiments of the present application, the OSS may be referred to as a centralized node, and the C-RAN, eNB, or gNB as a local node; a CU in the C-RAN may be referred to as a centralized node, and a DU in the C-RAN as a local node; or the C-RAN may be referred to as a centralized node, and the eNB or gNB as a local node.
In the intelligent network architecture 200 shown in fig. 2, the centralized node may correspond to the first node 110 in fig. 1, and the local node may correspond to the second node 120 in fig. 1.
Optionally, in the intelligent network architecture, at least one of the OSS, CU, DU, eNB, and gNB may include a data analysis (data analysis, DA) function (or network element). As an example, the DA function (or network element) may be deployed at a higher location, such as within the OSS, in which case it may be referred to as operation-support management system data analysis (OSSDA). The DA function (or network element) may also be deployed within a 5G CU, a 5G DU, an integrated 5G gNB, or an eNB, in which case it may be referred to as radio access network data analysis (RANDA). Alternatively, the DA function (or network element) may be deployed independently, which is not limited by the embodiments of the present application. OSSDA or RANDA can provide data integration and programmable feature engineering, an algorithm framework integrating a rich machine learning algorithm library, and a general-purpose architecture supporting the separation of training and execution.
Specifically, an AI-based wireless intelligent service mainly consists of a closed loop formed by steps such as data acquisition, feature engineering, algorithm design, training and modeling, model evaluation, and prediction execution. As an example, when the above functions are mapped into the network architecture, the DA function (or network element) can be abstracted into four functional modules, which are respectively: a data service function (data service function, DSF), an analysis and modeling function (analysis & modeling function, A & MF), a model execution function (model execution function, MEF), and an adaptive policy function (adaptive policy function, APF).
The DSF mainly completes steps such as data collection, data preprocessing, and feature engineering, and provides training data and feature vector subscription services for the A & MF and the MEF. The DSF has the programmable ability to customize feature engineering of data, and can perform data acquisition, preprocessing, and feature engineering as required by the A & MF training algorithm or the MEF prediction model.
The A & MF is responsible for executing machine learning training algorithms to generate machine learning models. The A & MF contains a library of common machine learning algorithms and sends the machine learning model generated by training to the MEF.
The MEF receives and installs the model delivered by the A & MF, subscribes to feature vectors from the DSF according to the indication of the A & MF to complete prediction, and sends the prediction result and the operation indication corresponding to the result to the APF.
The APF is the final execution and validation link of the workflow. The APF stores a policy set and completes the conversion from model prediction results to execution policies. As an example, the policy set includes the correspondence between prediction results, the operation indications corresponding to the prediction results, and execution policies; when the APF obtains a prediction result and the operation indication corresponding to it, the APF determines and executes the corresponding execution policy according to the correspondence.
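As a minimal sketch of the conversion performed by the APF, assuming hypothetical prediction results, operation indications, and execution policies that are not defined by this application, the policy set can be viewed as a lookup table:

```python
# Hypothetical policy set: (prediction result, operation indication) -> execution policy.
policy_set = {
    ("high_load_predicted", "adjust"): "hand over part of the users to a neighbouring cell",
    ("low_load_predicted", "keep"): "keep the current configuration",
}

def apf_execute(prediction_result: str, operation_indication: str) -> str:
    """Determine the execution policy for a prediction result and its operation indication."""
    return policy_set.get((prediction_result, operation_indication),
                          "no matching policy; take no action")
```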
Further, when a logic function is deployed at both a centralized node and a local node, the logic function in the centralized node may be referred to as a centralized (central) logic function, and the logic function in the local node may be referred to as a local (local) logic function. For example, the DSF, A & MF, MEF, and APF deployed in the centralized node may be referred to as the centralized DSF (central DSF, C-DSF), centralized A & MF (central A & MF, C-A&MF), centralized MEF (central MEF, C-MEF), and centralized APF (central APF, C-APF), respectively, and the DSF, A & MF, MEF, and APF deployed in the local node may be referred to as the local DSF (local DSF, L-DSF), local A & MF (local A & MF, L-A&MF), local MEF (local MEF, L-MEF), and local APF (local APF, L-APF), respectively.
It should be noted that, in the embodiment of the present application, network elements may be deployed according to the characteristics of services and computing resources. In this case, the functions deployed by different network elements may not be the same. For example, all four functions described above may be deployed on the local node side: the DSF, A & MF, MEF, and APF; while only the DSF, APF, and A & MF may be deployed on the centralized node side. In addition, in order to complete a full training and prediction task, different functions cooperate with each other within a network element or across network elements.
It should be noted that the names of the above functions are merely examples in the embodiment of the present application, and in a specific implementation, the names of the functions in the wireless network 200 may also be other names, which is not specifically limited in the embodiment of the present application.
FIG. 3 shows a schematic flow chart of a method 300 of training a model provided by an embodiment of the application. The method 300 may be applied to the system architecture 100 shown in fig. 1 and may also be applied to the intelligent system architecture 200 shown in fig. 2, but embodiments of the application are not limited thereto.
For convenience of description, the method 300 for training a model is described using the first node and the second node as examples. For the chip in the first node and the chip in the second node, reference may be made to specific descriptions of the first node and the second node, and description will not be repeated.
310, The first node sends a first request to at least one second node, requesting the at least one second node to retrain the first shared model locally based on its own local data. The first shared model may be obtained by the first node by training the parameters of an initial model according to training data.
320, Each second node of the at least one second node retrains the first shared model based on its local data according to the first request to obtain a first local model. Here, local model retraining means that the second node trains the parameters of the first shared model again based on the local data.
330, The at least one second node sends a model report message to the first node, respectively, the model report message comprising parameters of the first local model or an increment between the parameters of the first local model and the parameters of the first shared model. Here, the increment between the parameters of the first local model and the parameters of the first shared model is the amount of change of the parameters of the first local model relative to the parameters of the first shared model.
340, The first node determines a second shared model according to the model report message of the at least one second node and the first shared model.
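A minimal sketch of one round of steps 310 to 340 is given below, assuming for illustration that a model is represented by the parameter vector of a linear model and that aggregation is a simple (unweighted) average; the screening and weighted aggregation described later are omitted, and all function and field names are assumptions for this sketch.

```python
import numpy as np

def local_retrain(shared_params, X, y, lr=0.01, epochs=5):
    """Step 320: retrain the shared parameters on local data (linear model, gradient descent)."""
    w = shared_params.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def joint_learning_round(shared_params, second_nodes, report_delta=True):
    """Steps 310-340: request local retraining, collect model reports, update the shared model."""
    reports = []
    for node in second_nodes:                                              # 310: first request
        local_params = local_retrain(shared_params, node["X"], node["y"])  # 320: local retraining
        payload = local_params - shared_params if report_delta else local_params
        reports.append(payload)                                            # 330: model report message
    aggregated = np.mean(reports, axis=0)                                  # 340: simple average aggregation
    return shared_params + aggregated if report_delta else aggregated
```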
Therefore, in the embodiment of the application, a first node in a wireless network sends a first request to at least one second node in the wireless network, each second node can locally retrain a first sharing model based on local data according to the first request, and then report parameters of the local model obtained through training or increment between the parameters of the local model and the parameters of the sharing model to the first node through a model report message, so that the first node can determine a second sharing model according to report content of the at least one second node and the first sharing model.
The second shared model is a new shared model obtained through training in steps 310 to 340, and the first shared model is an old shared model before the training.
In the embodiment of the application, joint learning refers to that a first node is used as a centralized node, a second node is used as a local node, and the first node and the second node are cooperated to jointly learn to train a model under the condition that local data of the second node does not need to be uploaded to the first node. The joint learning can overcome part or all of the disadvantages of the centralized training model and can also overcome part or all of the disadvantages of the local training model.
Specifically, compared with the mode of centralized training of the model in the prior art, the embodiment of the application does not need a local node to report training data to the centralized node, so that communication overhead caused by reporting the training data can be greatly reduced, the storage of ultra-large-scale data of the centralized node and the pressure of model training can be reduced.
In addition, compared with the local training model in the prior art, the local node sends the local model after local retraining to the centralized node, so that the centralized node can update the shared model according to the local model of at least one local node, thereby being capable of helping to overcome the problem of insufficient data quantity in local training and further improving the accuracy of the training model and the model generalization capability.
Therefore, according to the scheme for training the model by joint learning in the wireless network, the training model with high accuracy and generalization capability can be obtained.
It should be noted that, in the embodiment of the present application, the "parameters of the model" included in the model report message are used to indicate the model that needs to be reported in the model report message. In some descriptions of embodiments of the present application, "parameters of the model" may be replaced with "model"; the two have equivalent meanings. As an example, the model report message may be described as including the first local model, or including an increment between the first local model and the first shared model.
In an alternative embodiment of the present application, the first node sends a first request for performing joint learning training to the second node when the joint learning starting condition is satisfied. As an example, the joint learning initiation condition may be that the first node cannot acquire training data, or that the calculation pressure of the first node exceeds a certain index.
Therefore, by setting a joint learning start condition, the embodiment of the application can train the model in a centralized manner when the joint learning start condition is not met, and train the model through joint learning when the joint learning start condition is met.
In an alternative embodiment of the present application, the first request may include information of the first shared model, so that the second node determines the first shared model according to the first request. As an example, the information of the first shared model includes at least one of: the model identifier, model type, model structure, input and output, initial model parameters, or training data acquisition duration of the first shared model.
Optionally, the first request may further include an upload policy of the first local model. In this way, by sending the upload policy of the first local model to the second node, the first node can indicate how the second node should upload the first local model. The first local model is the local model uploaded by the second node. As an example, the first local model is a model obtained by each of the at least one second node retraining the first shared model locally according to its own local data.
Correspondingly, after the second node obtains the uploading policy of the first local model, the second node can retrain the local model according to the uploading policy to obtain the local model and perform corresponding processing operation.
As an example, the upload policy of the first local model includes at least one of: an identifier of the processing algorithm to be applied before the first local model is uploaded, the upload time of the first local model, or the information carried when the first local model is uploaded. The carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
The processing algorithm applied before uploading the first local model is, for example, an incremental algorithm that computes the increment between the model obtained by local retraining and the first shared model delivered by the first node; a compression algorithm that compresses the model through algorithms such as parameter pruning, quantization, low-rank decomposition, or sparse optimization; or an encryption algorithm that encrypts the model through layer obfuscation or by converting parameters into codes.
The training data set of the first local model is the set of training data used by the second node when retraining the first shared model locally based on the local data. The size of the training data set is, for example, the amount of training data in the training data set, which is not limited by the embodiment of the present application.
After the second node obtains the local model through training, it may use a prediction data set to evaluate the local model and obtain the prediction error of the local model. As examples, the prediction error of the local model is, for example, the mean absolute error (mean absolute error, MAE) or the mean squared error (mean squared error, MSE) of the local model, which is not limited by the embodiment of the present application.
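As a small illustration, assuming a regression model evaluated on a prediction data set, the two prediction errors mentioned above could be computed as follows:

```python
import numpy as np

def prediction_errors(y_true: np.ndarray, y_pred: np.ndarray):
    """Return (MAE, MSE) of a model's predictions over a prediction data set."""
    mae = np.mean(np.abs(y_true - y_pred))   # mean absolute error
    mse = np.mean((y_true - y_pred) ** 2)    # mean squared error
    return mae, mse
```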
Optionally, in an embodiment of the present application, the model report message may further include a size of a training data set corresponding to the first local model, and/or a prediction error of the first local model. At this time, step 340 may specifically be:
The first node determines a second local model in at least one first local model corresponding to at least one second node, wherein the size of a training data set corresponding to the second local model is greater than or equal to a first threshold value, and/or the prediction error of the second local model is less than or equal to a second threshold value. The first node then determines a second shared model based on the second local model and the first shared model. Here, the number of the second local models may be one or more, which is not limited by the embodiment of the present application.
Specifically, when the data volume of the training data set adopted when the local model is retrained is sufficient, the accuracy of the local model obtained by training is higher, and the generalization capability is stronger. When the data volume of the training data set adopted in the local model retraining is insufficient, the accuracy of the local model obtained by training is low, and the generalization capability is weak.
That is, the first node screens out at least one second local model satisfying the screening condition among the at least one first local model, and the at least one second local model has higher accuracy or generalization capability than the at least one first local model. At this time, the first node may delete the local model in which the size of the training data set in the first local model is smaller than the first threshold or the prediction error is larger than the second threshold.
Therefore, when the first node determines the second sharing model according to the first local model, the embodiment of the application screens out the second local model with the size of the training data set being greater than or equal to a specific threshold value and/or with the prediction error being less than or equal to a specific threshold value from at least one first local model, and then determines the second sharing model according to the second local model.
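A minimal sketch of this screening step, assuming each model report is represented as a dictionary with hypothetical keys for the training data set size and the prediction error:

```python
def screen_local_models(reports, size_threshold, error_threshold):
    """Keep only the reports whose training data set is large enough and whose prediction
    error is small enough (the second local models); the rest are discarded."""
    return [r for r in reports
            if r["train_set_size"] >= size_threshold          # first threshold
            and r["prediction_error"] <= error_threshold]     # second threshold
```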
Optionally, in an embodiment of the present application, the first node may perform weighted average aggregation on a parameter of the second local model or an increment between a parameter of the second local model and a parameter of the first shared model, where a weight parameter used for the weighted average aggregation is determined according to a size of a training data set corresponding to the second local model and/or a prediction error of the second local model. The first node then determines a second sharing model based on the result of the weighted average aggregation and the first sharing model.
In a possible scenario, when the parameters of the first local model are included in the model report message, the first node performs a weighted average aggregation on the parameters of the second local model after determining the second local model. Correspondingly, the first node may determine a result obtained by the weighted average aggregation of the at least one second local model as the second shared model.
In another possible scenario, when the model report message includes an increment between the parameter of the first local model and the parameter of the shared model, the first node performs a weighted average aggregation on the increment between the parameter of the second local model and the parameter of the shared model after determining the second local model. Correspondingly, the first node determines a result obtained by weighted average aggregation of at least one second local model as an increment of the first sharing model, and then determines a sum of the increment of the first sharing model and the first sharing model as the second sharing model.
As an example, the weight parameter may be the inverse of the total number of second local models, where the weight parameters of the respective second local models are the same. Or the weight parameter of the second local model may be a ratio of the size of the training data set corresponding to the second local model to the size of all the training data sets, where the size of all the training data sets is the sum of the sizes of the training data sets corresponding to the second local models. Or the weight parameter of each second local model may be the reciprocal of each corresponding MAE. It should be appreciated that the above examples of weight parameters are merely examples, and embodiments of the present application are not limited thereto.
In some alternative embodiments of the present application, the first node may not screen the at least one second local model among the at least one first local model, but may determine the second shared model directly from the at least one first local model. At this time, the weighted average aggregation may be performed on the parameters of the first local model or the increment between the parameters of the first local model and the parameters of the shared model. At this time, as an example, the weight parameter of each first local model may be the inverse of the total number of first local models. The first node then determines a second sharing model based on the result of the weighted average aggregation and the first sharing model.
Optionally, in the embodiment of the present application, after step 340, the first node or the second node may determine whether the prediction error of the second sharing model is smaller than a preset threshold. And when the prediction error of the second sharing model is smaller than the preset threshold value, the accuracy of the second sharing model can meet the requirement. Here, the accuracy of the second sharing model may be determined from the prediction dataset.
In one possible scenario, the first node may determine whether the prediction error of the second shared model is less than a third threshold.
When the prediction error of the second sharing model is less than or equal to the third threshold value, the first node updates the first sharing model to the second sharing model, and the first node respectively sends model update notification messages to the at least one second node, wherein the model update notification messages are used for requesting each second node to install the second sharing model. When it is determined that the prediction error of the second shared model is greater than the third threshold, the first node does not send the model update notification message to the second node. In addition, the first node may delete the second shared model without updating the first shared model stored therein.
In another possible scenario, the second node may determine whether the prediction error of the second shared model is less than a fourth threshold.
In particular, the first node may send a model update notification message to at least one second node, the model update notification message being used to notify the second sharing model. The second node may determine, after receiving the model update notification message, whether a prediction error of the second shared model indicated by the model update notification message is less than a fourth threshold. And installing the second sharing model when the second node determines that the prediction error of the second sharing model is less than or equal to the fourth threshold. When the second node determines that the prediction error of the second sharing model is greater than the fourth threshold, the second sharing model is not installed.
At this time, the second node does not need to send the prediction data set stored locally to the first node, so that the communication signaling overhead between the network elements can be reduced.
In the embodiment of the present application, the third threshold value and the fourth threshold value may be the same or different, which is not limited in the embodiment of the present application.
Therefore, the embodiment of the application judges the prediction error of the second sharing model, and when the prediction error is smaller than or equal to the preset threshold value, the first node stores the second sharing model, and the second node installs the second sharing model. When the prediction error is determined to be greater than the preset threshold, the first node does not update the first sharing model, and the second node does not install the second sharing model. Based on the method, the embodiment of the application can avoid the installation of the sharing model with low accuracy, and further ensure the accuracy and generalization capability of the updated sharing model.
In a possible implementation manner, the prediction error of the first sharing model may be set to a third threshold or a fourth threshold, which is not limited by the embodiment of the present application.
That is, in some alternative embodiments, the prediction error of the second shared model may be compared to the prediction error of the first shared model. If the prediction error of the second sharing model is smaller than that of the first sharing model, the first node stores the second sharing model, and the second node installs the second sharing model. If the prediction error of the second shared model is greater than or equal to that of the first shared model, the first node does not update the first shared model and the second node does not install the second shared model.
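As a minimal sketch of this update decision at the first node, with hypothetical names; the threshold may be a configured third threshold or, as just described, the prediction error of the first sharing model:

```python
# Sketch of the first node's update decision (hypothetical names).
# Returns True if the first node should store the second shared model and
# notify the second nodes to install it; otherwise the second shared model
# is discarded and the first shared model is kept.

def accept_second_shared_model(second_error, first_error, third_threshold=None):
    if third_threshold is not None:
        return second_error <= third_threshold   # compare against the third threshold
    return second_error < first_error            # or against the first shared model's error
```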
Fig. 4 shows a schematic flow chart of a method 400 of training a model provided by an embodiment of the application. It should be understood that fig. 4 illustrates steps or operations of a method of training a model, but these steps or operations are merely examples, and that embodiments of the present application may perform other operations or variations of the operations in fig. 4. Furthermore, the various steps in fig. 4 may be performed in a different order than presented in fig. 4, and it is possible that not all of the operations in fig. 4 are performed.
The first node in fig. 4 comprises a centralized APF (C-APF), a centralized DSF (C-DSF), and a centralized A & MF (C-A & MF), and the second node comprises a local APF (L-APF), a local MEF (L-MEF), a local DSF (L-DSF), and a local A & MF (L-A & MF). Specifically, for each functional module, reference may be made to the description in fig. 2; to avoid repetition, details are not repeated herein.
401, the C-APF in the first node sends a joint learning policy to the C-A & MF.
Specifically, a joint learning policy may be stored in the C-APF, where the joint learning policy is used to instruct the first node and the second node how to perform joint learning. As an example, the joint learning strategy may include at least one of the following: a joint learning starting condition, information of the shared model, joint learning group member identifications, an uploading strategy for the local model, a screening strategy for the local model, an aggregation strategy for the local model, a processing strategy for the local model, or an updating strategy for the shared model.
The joint learning starting condition is, for example, that the C-DSF cannot acquire subscription data, or that the computing resource usage of the C-A & MF exceeds a certain threshold, which is not limited by the embodiment of the present application. In the embodiment of the application, the model may be trained in a centralized manner when the joint learning starting condition is not satisfied, and trained by joint learning when the joint learning starting condition is satisfied. For example, centralized model training may be used when the C-DSF is able to acquire subscription data, when the model training is computationally intensive, when the computing resources of the C-A & MF are abundant, or when the computing resources of the L-A & MF are insufficient.
The joint learning group member identification may include, for example, an identification of at least one second node participating in the joint learning, and may include, for example, an identification of the L-A & MF in each of the at least one second node participating in the joint learning, which is not limited in the embodiment of the present application.
In the above-described joint learning strategy, the local model refers to a local model obtained by the second node after model retraining based on local data, and includes, for example, the first local model described in fig. 3.
The screening policy of the local model refers to a policy that the first node (or the C-A & MF in the first node) uses to screen the at least one first local model for a second local model that satisfies the screening condition. For example, it may include determining, among the first local models, those whose training data set size is greater than or equal to a first threshold and/or whose prediction error is less than or equal to a second threshold. As an example, the screening policy of the local model may be an identification of a model screening rule, which is not limited by the embodiments of the present application.
The aggregation strategy of the local model is used for indicating the aggregation algorithm adopted when the first node (or the C-A & MF in the first node) performs local model aggregation, and the calculation method of the weight parameters. As an example, the aggregation policy of the local model may be an identification of the model aggregation algorithm. In the embodiment of the application, the aggregation of the local models may also be referred to as the fusion of the local models; aggregation and fusion have the same meaning herein.
The processing strategy of the local model is used for indicating how the first node (or the C-A & MF in the first node) processes the acquired local model. The processing algorithm includes, for example, an incremental algorithm for computing the increment between the locally retrained model and the shared model issued by the first node, a compression algorithm for performing model compression through algorithms such as parameter pruning, quantization, low-rank decomposition, or sparse optimization, or an encryption algorithm for performing model encryption through modes such as layer confusion or conversion of parameters into codes. As an example, the processing policy of the local model may be an identification of the model processing algorithm.
And the sharing model updating strategy is used for indicating the first node (or C-A & MF in the first node) to perform sharing model updating. For example, in the case where the prediction error of the new sharing model is less than or equal to a certain threshold value, the old sharing model is updated to the new sharing model. Or updating the old sharing model to the new sharing model in case that the prediction error of the new sharing model is smaller than or equal to the prediction error of the old sharing model.
In addition, the information of the sharing model and the local model uploading policy may be referred to the description above, and in order to avoid repetition, a description is omitted here.
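Purely as an illustration of how the fields of such a joint learning strategy might be grouped, the following sketch uses hypothetical field names; the embodiment does not prescribe any particular encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch of a joint learning policy as sent from the C-APF to the C-A & MF
# (hypothetical field names).

@dataclass
class JointLearningPolicy:
    start_condition: str                          # e.g. "subscription_data_unavailable"
    shared_model_info: dict                       # model id, type, structure, I/O, initial parameters
    member_ids: List[str] = field(default_factory=list)  # identities of the participating L-A & MFs
    upload_policy: Optional[dict] = None          # pre-upload processing, upload time, carried info
    screening_policy: Optional[str] = None        # identification of a model screening rule
    aggregation_policy: Optional[str] = None      # identification of a model aggregation algorithm
    processing_policy: Optional[str] = None       # identification of a model processing algorithm
    update_policy: Optional[str] = None           # shared model update rule
```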
402, the C-A & MF in the first node sends a joint learning policy issuing response to the C-APF, indicating that the C-A & MF has received the joint learning policy.
403, the first node and the second node perform data collection, model training and model application.
Specifically, the L-DSF in the second node reports the acquired data to the C-A & MF in the first node, and model training is carried out by the C-A & MF to obtain a sharing model. The C-A & MF then issues the shared model to the L-MEF for model application.
As an example, the C-A & MF in the first node may send a data subscription request to the L-DSF in the second node, and the L-DSF, after receiving the data subscription request, sends a data subscription response to the C-A & MF, carrying the local data. The local data may include a training data set or a prediction data set, which is not limited by the embodiment of the present application.
It should be appreciated that step 403 is performed to train a shared model, which may also be referred to herein as an initial shared model. The following steps 404 to 422 update the initial sharing model generated in step 403 to make up for the poor accuracy of the initial sharing model caused by insufficient or incomplete data, so that the sharing model still has high accuracy and generalization capability when the network state changes in the wireless network.
404, the C-A & MF in the first node sends a joint learning training request to the L-A & MF in the second node.
Specifically, the C-A & MF in the first node judges whether the joint learning starting condition is satisfied according to the joint learning starting condition indicated in the joint learning strategy. For example, when the C-A & MF determines that subscription data cannot be acquired from the L-DSF of the second node, or that the computing resource occupation of the C-A & MF exceeds a preset threshold, it determines that the joint learning starting condition is satisfied. When the joint learning starting condition is met, the C-A & MF sends a joint learning training request to the L-A & MF. Otherwise, when the joint learning starting condition is not satisfied, the model continues to be trained in the centralized manner of step 403.
Here, the joint learning training request may correspond to one specific example of the first request in fig. 3. Specifically, for the joint learning training request, reference may be made to the description of the first request in fig. 3; to avoid repetition, details are not repeated here.
405, the L-A & MF in the second node sends a local joint learning policy request to the L-APF.
Here, the local joint learning policy is used to indicate whether the second node is to perform local model retraining on the shared model. The local joint learning strategy request comprises information of the sharing model. As an example, the information of the shared model may be obtained from the joint learning training request in step 404. In particular, for the information of the sharing model, reference may be made to the above description, which is not repeated here for brevity.
406, the L-APF in the second node sends a local joint learning policy response to the L-A & MF.
Specifically, the L-APF determines whether to retrain the sharing model according to the utilization condition of local computing resources, namely whether the second node participates in local joint learning. As an example, the local joint learning policy response may be an identification of whether to participate in local joint learning.
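A minimal sketch of such a decision, assuming the L-APF simply checks local compute utilization against configured limits (the limits and field names are hypothetical):

```python
# Sketch of the L-APF's local joint learning decision (hypothetical thresholds).

def local_joint_learning_response(cpu_utilization, memory_utilization,
                                  cpu_limit=0.8, mem_limit=0.8):
    participate = cpu_utilization < cpu_limit and memory_utilization < mem_limit
    # The response may simply carry an identification of whether to participate.
    return {"participate": participate}
```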
Optionally, a local joint learning strategy may further include a model update strategy. For example, the model update policy may indicate that the old sharing model is updated when the prediction error of the new sharing model is less than the prediction error of the old sharing model or less than or equal to some preset threshold, otherwise the old sharing model is continued to be used. As an example, the old sharing model is, for example, the initial sharing model acquired in step 404, and the new sharing model is, for example, the sharing model acquired in step 413.
In the embodiment of the present application, when the local joint learning strategy instructs the second node to perform local model retraining on the shared model, the following steps 407 to 421 are performed. Otherwise, the following steps 407 to 421 are not performed.
Therefore, the embodiment of the application transmits the local joint learning strategy to the L-A & MF through the L-APF, so that the second node can determine whether to participate in joint learning according to the self computing capability, thereby avoiding the extension of the joint learning iteration time caused by the insufficient self computing capability of the second node and improving the joint learning efficiency.
It should be noted that, in the embodiment of the present application, steps 405 and 406 may not be executed, but the second node may participate in the joint learning all the time when receiving the joint learning training request, which is not limited in the embodiment of the present application.
407, the L-A & MF in the second node sends a joint learning training request response to the C-A & MF in the first node.
408, the L-A & MF in the second node sends a data subscription request to the L-DSF.
As an example, the L-A & MF may send the data subscription request to the L-DSF according to the model input and output and the model training data collection duration in the joint learning training request, where the data subscription request may carry a data representation and a data collection time.
409, the L-DSF in the second node sends a data subscription response to the L-A & MF.
Specifically, the L-DSF collects data according to the data subscription request in 408 and sends the collected data to the L-A & MF.
410, the L-A & MF in the second node performs model retraining and model processing.
Specifically, the L-A & MF performs local retraining of the sharing model according to the information of the sharing model issued in step 404 and the training data acquired in step 409.
Optionally, the L-A & MF may further perform local model processing according to the identification, issued in step 404, of the processing algorithm to be applied before uploading the local model. For example, it may perform an incremental operation on the local model obtained by the retraining and the shared model issued in step 404 (for example, by taking the increment between the parameters of the local model and the parameters of the shared model). It may then perform model compression through algorithms such as parameter pruning, quantization, low-rank decomposition, or sparse optimization, and perform model encryption through modes such as layer confusion or conversion of parameters into codes.
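For illustration, the incremental operation (and a naive magnitude-based pruning standing in for the compression step) could look like the following sketch; the actual compression and encryption algorithms are whichever ones the processing algorithm identification in step 404 designates.

```python
import numpy as np

# Sketch of the pre-upload processing on the L-A & MF side (hypothetical names,
# flat parameter vectors assumed).

def process_local_model(local_params, shared_params, prune_ratio=0.5):
    # Incremental operation: difference between the retrained local model and
    # the shared model issued in step 404.
    delta = local_params - shared_params

    # Naive compression: zero out the smallest-magnitude fraction of the increment
    # (a simple stand-in for parameter pruning / sparse optimization).
    k = int(len(delta) * prune_ratio)
    if k > 0:
        cutoff = np.sort(np.abs(delta))[k - 1]
        delta = np.where(np.abs(delta) <= cutoff, 0.0, delta)
    return delta
```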
411, the L-A & MF in the second node sends a local model upload notification to the C-A & MF in the first node. As an example, the local model upload notification includes a model identifier of the shared model, the processed local model, the size of the training data set corresponding to the local model, the prediction error of the local model, and the like, which is not limited in the embodiment of the present application.
In particular, the local model upload notification may correspond to one example of the model report message in fig. 3. Specifically, for the local model upload notification, reference may be made to the description of the model report message in fig. 3; to avoid repetition, details are not repeated here.
412, the C-A & MF in the first node sends a local model upload notification response to the L-A & MF in the second node.
413, the C-A & MF in the first node performs model screening, aggregation, and processing.
Specifically, the C-A & MF screens, according to the local model screening policy indicated in step 401, the at least one received local model (such as the at least one first local model described above) for at least one local model (such as the at least one second local model described above) that satisfies the screening condition. In particular, for the manner in which the C-A & MF performs the screening, reference may be made to the description above; for brevity, details are not repeated herein.
The C-A & MF may then aggregate the screened at least one local model according to the local model aggregation policy, for example by weighted average aggregation. In particular, for the manner in which the C-A & MF performs the aggregation, reference may be made to the description above; for brevity, details are not repeated herein.
The C-A & MF may then process the aggregated model according to the local model processing policy, for example by compression or encryption. In particular, for the manner in which the C-A & MF performs the processing, reference may be made to the foregoing description; for brevity, details are not repeated herein.
In one possible scenario of the present application, the various steps shown as 4A in fig. 4 are performed. Among them, 414 to 416 are included in 4A.
414, the C-A & MF in the first node sends a model update request #1 to the L-MEF in the second node.
In the embodiment of the present application, the C-A & MF may test the shared model obtained in step 413 according to the test data in the test data set, and determine the prediction error of the shared model obtained in step 413. Here, the sharing model acquired in 413 may also be referred to as a new sharing model, and the new sharing model may correspond to one example of the second sharing model in fig. 3.
When the C-A & MF determines that the prediction error of the new shared model is less than a certain threshold, the model update request #1 is sent, where the model update request #1 may include the model identifier of the new shared model and the parameters of the fused model. As an example, when the training model is a neural network model, the parameters of the model may include at least one of weight parameters, bias parameters, or information of an activation function of each layer.
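As a purely illustrative sketch of what such per-layer content might look like for a small neural network (the structure, names, and values below are hypothetical):

```python
# Hypothetical content of model update request #1 for a two-layer neural network.
model_update_request = {
    "model_id": "shared-model-v2",          # model identifier of the new shared model
    "layers": [
        {"weights": [[0.12, -0.30], [0.05, 0.44]], "bias": [0.01, -0.02], "activation": "relu"},
        {"weights": [[0.70], [-0.15]],             "bias": [0.00],        "activation": "sigmoid"},
    ],
}
```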
Here, the model update request #1 may be an example of the model update notification message sent in fig. 3 in the case where the first node determines that the prediction error of the second shared model is less than the third threshold.
When the C-A & MF determines that the prediction error of the new shared model is greater than or equal to the threshold, steps 414 through 416 may not be performed.
415, the L-MEF performs model update installation. Specifically, the L-MEF replaces the current parameters of the model with the parameters of the model issued in 414.
Taking the model as a neural network model as an example, the L-MEF may replace the weight parameters, the bias parameters, or the activation functions of the layers of the neural network with the parameters of the model issued in step 414.
416, the L-MEF in the second node sends a model update response #1 to the C-A & MF in the first node. Here, the model update response #1 may indicate that the model update has been completed.
In another possible case of the present application, the steps shown as 4A in fig. 4 may be replaced with the steps shown as 4B in fig. 4. Of these, 417 to 421 are included in 4B.
417, the C-A & MF in the first node sends a model update request #2 to the L-A & MF in the second node.
In the embodiment of the present application, after the C-A & MF obtains the new shared model in step 413, it may send a model update request #2 to the L-A & MF, where the model update request #2 may include a model identifier of the new shared model and the parameters of the fused model. As an example, when the training model is a neural network model, the parameters of the model may include at least one of weight parameters, bias parameters, or information of an activation function of each layer.
Here, the model update request #2 may be an example of a model update notification message that the first node sends to the second node before the second node determines whether the prediction error of the updated shared model is less than the fourth threshold in fig. 3.
418, the L-A & MF in the second node sends a model update response #2 to the C-A & MF in the first node. Here, the model update response #2 may be used to inform the first node that the model update request #2 has been received.
419, the L-A & MF in the second node sends a model installation request to the L-MEF.
For example, the L-A & MF may determine, according to the model update strategy, whether the prediction error of the new shared model is greater than a preset threshold or greater than that of the old shared model. As an example, when the prediction error of the new shared model is greater than or equal to that of the old shared model, the second node does not install the new shared model, and steps 419 to 421 are not performed. When the prediction error of the new shared model is smaller than that of the old shared model, the second node performs update installation of the new shared model, and steps 419 to 421 are performed.
The model installation request can carry the model identifier of the new shared model and the parameters of the fused model. In particular, the model identification, and the parameters of the fused model may be referred to in the description above, and will not be described here for brevity.
420, the L-MEF in the second node performs model update installation.
Specifically, for 420, reference may be made to the description of 415, which is not repeated here for brevity.
421, the L-MEF in the second node sends a model installation response to the L-A & MF, which may indicate that the model update has been completed.
422, the second node performs model application.
Specifically, the L-MEF in the second node subscribes to the L-DSF for the data required for model prediction and performs model prediction, and then sends the prediction result to the L-APF for policy execution.
It should be noted that, in an alternative embodiment of the present application, once the first node and the second node initiate the joint learning, the method may be performed in a loop according to the joint learning steps described above.
In another alternative embodiment of the present application, a joint learning stop condition may be set. When the joint learning stop condition is satisfied, the first node and the second node may stop the joint learning. As an example, the joint learning stop condition may include a joint learning execution duration, or a second node resource limitation. That is, the embodiment of the present application may stop the joint learning once the execution duration has elapsed after the joint learning is started, or stop the joint learning when some or all of the second node resources are limited.
Alternatively, in the embodiment of the present application, the joint learning stop condition may be included in the joint learning policy or preconfigured in the first node or the second node, which is not limited in the embodiment of the present application.
Therefore, the embodiment of the application manages the joint learning process of the first node and the second node in the wireless network through the joint learning strategy, and comprises arranging and managing at least one of joint learning starting conditions, uploading strategies of models, screening strategies of models, aggregation strategies of models, processing strategies of the models and the like. Based on the above, the embodiment of the application can realize that the training model with high accuracy and generalization capability is obtained by cooperating with the first node and the second node to learn together under the condition that the local data of the second node does not need to be uploaded to the first node.
Fig. 5 shows a schematic flow chart of a method 500 for training a model according to an embodiment of the present application. It should be understood that fig. 5 illustrates steps or operations of a method of training a model, but these steps or operations are merely examples, and that embodiments of the present application may perform other operations or variations of the operations in fig. 5. Furthermore, the various steps in fig. 5 may be performed in a different order than presented in fig. 5, and it is possible that not all of the operations in fig. 5 are performed.
The CUDA and at least one DUDA are illustrated in fig. 5 as examples, and the CUDA may correspond to one example of a first node above, and DUDA may correspond to one example of a second node above. In the embodiment of the application, when the functions of the intelligent network element are integrated and deployed, the interaction information during the joint learning between the first node and the second node can be directly transmitted through the interface between the first node and the second node.
It should be noted that, in the embodiment of the present application, CUDA and DUDA are taken as examples, but the embodiment of the present application is not limited thereto. For example, CUDA may be replaced by a gNB or an eNB or a cell (cell), and DUDA may be replaced by a terminal device served by the gNB or the eNB or the cell (cell). For another example, CUDA may also be replaced with a CU, and DUDA with a DU managed by the CU. For another example, CUDA may also be replaced with a C-RAN and DUDA replaced with an eNB or gNB managed by the C-RAN. For another example, CUDA may also be replaced with an eNB or a gNB, and DUDA replaced with a cell (cell) managed by the eNB or the gNB. For another example, CUDA may also be replaced with an OSS and DUDA with a network element managed by the OSS. For another example, both CUDA and DUDA may be replaced with a gNB or an eNB. The embodiment of the present application is not particularly limited thereto.
501, the CUDA sends a joint learning training request to at least one DUDA.
As an example, the CUDA may send the joint learning training request to each DUDA of the at least one DUDA when the joint learning starting condition is satisfied. Specifically, for the joint learning training request, reference may be made to the above description, which is not repeated here for brevity.
502, each DUDA of the at least one DUDA sends a joint learning training request response to the CUDA.
503, each DUDA performs local model training and processing.
Specifically, DUDA performs data subscription, local model training, and processing according to the instruction of step 501. Specifically, reference may be made to the description of 410 in fig. 4 above, and for brevity, the description is omitted here.
504, each DUDA sends a local model upload notification to the CUDA.
505, the CUDA sends a local model upload notification response to each DUDA.
Specifically, 504 and 505 may be referred to the descriptions of 411 and 412 in fig. 4 above, and are not repeated here for brevity.
506, the CUDA performs model screening, fusion, and processing.
Specifically, for 506, reference may be made to the description of 413 in fig. 4 above, which is not repeated here for brevity.
507, each DUDA performs model update installation and model application.
Specifically, 507 may be described with reference to 414-421 in fig. 4, and for brevity, will not be described again here.
508, steps 501 to 507 are repeated.
Therefore, in the embodiment of the present application, the CUDA sends a joint learning training request to at least one DUDA, each DUDA can perform local retraining on the shared model indicated by the CUDA according to the joint learning training request, and then each DUDA reports the local model obtained by training to the CUDA, so that the CUDA can perform fusion and processing on the local model reported by the at least one DUDA to determine a new shared model.
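Put together, steps 501 to 508 amount to an iterative loop. The following self-contained sketch (all names and thresholds are hypothetical) shows one way the CUDA side could drive it, with each DUDA represented as a callable that retrains locally and returns its parameters, training-set size, and prediction error.

```python
import numpy as np

# End-to-end sketch of the CUDA/DUDA loop of steps 501-508 (hypothetical API).

def run_joint_learning(shared_params, dudas, rounds=10,
                       min_train_size=100, max_pred_error=0.1):
    for _ in range(rounds):                                # 508: repeat
        reports = [duda(shared_params) for duda in dudas]  # 501-505: local training and upload
        kept = [(p, n, e) for (p, n, e) in reports
                if n >= min_train_size and e <= max_pred_error]  # 506: screening
        if not kept:
            continue                                       # nothing usable in this round
        total = sum(n for _, n, _ in kept)
        weights = [n / total for _, n, _ in kept]          # weights by training-data size
        shared_params = np.average(np.stack([p for p, _, _ in kept]),
                                   axis=0, weights=weights)       # 506: aggregation
        # 507: the new shared model would now be pushed to each DUDA for update
        # installation, subject to the prediction-error check described above.
    return shared_params
```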
Based on the method of the above embodiment, the communication device provided by the present application will be described below.
Fig. 6 shows a schematic structural diagram of a device 600 for training a model in a wireless network, where the device 600 for training a model may be a first node in a wireless network. The apparatus 600 for training a model includes: a transmitting unit 610, a receiving unit 620, and a determining unit 630.
A sending unit 610, configured to send a first request to at least one second node in the wireless network, where the first request is used to request the at least one second node to retrain the first sharing model based on local data of the second node, respectively.
And a receiving unit 620, configured to obtain a model report message from the at least one second node, where the model report message of each second node includes parameters of a first local model, or includes an increment between the parameters of the first local model and the parameters of the first shared model, and the first local model is obtained by each second node after performing local model retraining on the first shared model based on the first request and the local data.
The determining unit 630 is configured to determine a second sharing model according to the model report message of the at least one second node and the first sharing model.
Therefore, in the embodiment of the application, a first node in a wireless network sends a first request to at least one second node in the wireless network, each second node can locally retrain a first sharing model based on local data according to the first request, and then report parameters of the local model obtained through training or increment between the parameters of the local model and the parameters of the sharing model to the first node through a model report message, so that the first node can determine a second sharing model according to report content of the at least one second node and the first sharing model.
Optionally, the model report message further includes a size of a training data set corresponding to the first local model, and/or a prediction error of the first local model;
The determining unit 630 is specifically configured to:
determining a second local model in at least one first local model corresponding to the at least one second node, wherein the size of a training data set corresponding to the second local model is greater than or equal to a first threshold value, and/or the prediction error of the second local model is less than or equal to a second threshold value;
and determining the second sharing model according to the second local model and the first sharing model.
Therefore, when the first node determines the second sharing model according to the first local model, the embodiment of the application screens out the second local model with the size of the training data set being greater than or equal to a specific threshold value and/or with the prediction error being less than or equal to a specific threshold value from at least one first local model, and then determines the second sharing model according to the second local model.
Alternatively, the first node may not screen the at least one second local model among the at least one first local model, but determine the second shared model directly from the at least one first local model.
Optionally, the determining unit 630 is specifically configured to:
carrying out weighted average aggregation on the parameters of the second local model or the increment between the parameters of the second local model and the parameters of the first shared model, wherein the weight parameters adopted by the weighted average aggregation are determined according to the size of a training data set corresponding to the second local model and/or the prediction error of the second local model;
And the first node determines the second sharing model according to the result of the weighted average aggregation and the first sharing model.
Optionally, the first node includes a centralized adaptive policy function and a centralized analysis and modeling function:
The centralized adaptive strategy function sends a joint learning strategy to the centralized analysis and modeling function, the joint learning strategy comprising at least one of the following information:
A joint learning starting condition, information of the first shared model, joint learning group member identification, an uploading strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model or a shared model updating strategy.
Therefore, the embodiment of the application manages the joint learning process of the first node and the second node in the wireless network through the joint learning strategy, and comprises arranging and managing at least one of joint learning starting conditions, uploading strategies of models, screening strategies of models, aggregation strategies of models, processing strategies of the models and the like. Based on the above, the embodiment of the application can realize that the training model with high accuracy and generalization capability is obtained by cooperating with the first node and the second node to learn together under the condition that the local data of the second node does not need to be uploaded to the first node.
Optionally, the second node includes a local analysis and modeling function therein;
the sending unit 610 is specifically configured to:
When the joint learning initiation condition is satisfied, the centralized analysis and modeling function sends the first request to a local analysis and modeling function of each of the at least one second node, wherein the first request includes information of the first shared model.
Optionally, the information of the first sharing model includes at least one of:
And the model identification, model type, model structure, input and output and initial model parameters or training data acquisition duration of the first shared model.
According to the embodiment of the application, by setting the starting conditions of the joint learning, the model training can be realized by adopting a mode of centralized training models under the condition that the starting conditions of the joint learning are not met, and the model training can be realized by adopting the mode of the joint learning under the condition that the starting conditions of the joint training are met.
Optionally, the first request further includes an upload policy of the first local model.
Optionally, the uploading policy of the first local model includes at least one of the following:
The identification of the processing algorithm before the first local model is uploaded, the uploading time of the first local model or the carrying information of the first local model when the first local model is uploaded;
wherein the carried information comprises the size and/or prediction error of the training data set of the first local model.
Optionally, the determining unit 630 is further configured to determine that the prediction error of the second sharing model is less than or equal to a third threshold.
The sending unit 610 is further configured to send a model update notification message to the at least one second node, respectively, when the prediction error of the second sharing model is less than or equal to a third threshold, where the model update notification message is used to request each second node to install the second sharing model.
Therefore, the embodiment of the application judges the prediction error of the second sharing model, and when the prediction error is smaller than or equal to the preset threshold value, the first node stores the second sharing model, and the second node installs the second sharing model. When the prediction error is determined to be greater than the preset threshold, the first node does not update the first sharing model, and the second node does not install the second sharing model. Based on the method, the embodiment of the application can avoid the installation of the sharing model with low accuracy, and further ensure the accuracy and generalization capability of the updated sharing model.
In a possible implementation manner, the prediction error of the first sharing model may be set to a third threshold, which is not limited by the embodiment of the present application.
Alternatively, the sending unit 610 and/or the receiving unit 620 may also be referred to as a transceiver unit (module) or a communication unit, and may be used to perform the receiving and sending steps of the first node in the method embodiments, respectively. The determining unit 630 is used for generating the content sent by the sending unit 610 or processing the content received by the receiving unit 620. Optionally, the apparatus 600 for training a model may further include a storage unit, where the storage unit is configured to store instructions executed by the sending unit, the receiving unit, and the determining unit.
The apparatus 600 for training a model may be the first node in the method embodiments, or may be a chip in the first node. When the apparatus 600 for training a model is the first node, the processing unit may be a processor, and the sending unit and the receiving unit may be a transceiver. The apparatus for training a model may further comprise a storage unit, which may be a memory. The storage unit is used for storing instructions, and the processing unit executes the instructions stored by the storage unit so as to enable the communication device to execute the above method. When the apparatus 600 for training a model is a chip in the first node, the processing unit may be a processor, and the sending unit and the receiving unit may be input/output interfaces, pins, circuits, or the like; the processing unit executes the instructions stored in the storage unit, so that the communication device performs the operations performed by the first node in the above method embodiments, where the storage unit may be a storage unit in the chip (for example, a register or a cache), or may be a storage unit located outside the chip (for example, a read-only memory or a random access memory).
It will be clear to those skilled in the art that, for the steps performed by the apparatus 600 for training a model and the corresponding advantageous effects, reference may be made to the description of the first node in the above method embodiments; for brevity, details are not repeated herein.
It should be appreciated that the transmitting unit 610 and the receiving unit 620 may be implemented by transceivers. The processing unit may be implemented by a processor. The memory unit may be implemented by a memory. As shown in fig. 7, an apparatus 700 for training a model may include a processor 710, a memory 720, and a transceiver 730. The means for training a model 700 may be a first node in a wireless network.
The model training apparatus 600 shown in fig. 6 or the model training apparatus 700 shown in fig. 7 can implement the steps performed by the first node in the foregoing embodiment, and similar descriptions may be referred to the descriptions in the foregoing corresponding methods. In order to avoid repetition, a description thereof is omitted.
Fig. 8 shows a schematic structural diagram of a model training apparatus 800 provided by the present application. The means 800 for training the model may be a second node in the wireless network. The apparatus 800 for training a model includes: a receiving unit 810, a processing unit 820, and a transmitting unit 830.
A receiving unit 810 is configured to receive a first request from a first node in the radio access network.
And the processing unit 820 is configured to perform, according to the first request, local retraining of the first shared model based on the local data of the second node, so as to obtain a first local model.
A sending unit 830, configured to send a model report message to the first node, where the model report message includes parameters of the first local model, or includes an increment between the parameters of the first local model and the parameters of the first shared model, and the model report message is used for updating the first shared model.
Therefore, in the embodiment of the application, a first node in a wireless network sends a first request to at least one second node in the wireless network, each second node can locally retrain a first sharing model based on local data according to the first request, and then report parameters of the local model obtained through training or increment between the parameters of the local model and the parameters of the sharing model to the first node through a model report message, so that the first node can determine a second sharing model according to report content of the at least one second node and the first sharing model.
Optionally, the model report message further includes a size of a training data set corresponding to the first local model, and/or a prediction error of the first local model.
In this way, the first node may screen out a second local model with a size of the training data set greater than or equal to a specific threshold and/or a prediction error less than or equal to a specific threshold according to a size of the training data set corresponding to the first local model in the model report message and/or a prediction error, and then determine the second sharing model through the second local model. Because the accuracy or generalization capability of the second local model in the embodiment of the application is higher than that of the first local model, the embodiment of the application can be beneficial to improving the accuracy and generalization capability of the second shared model.
Optionally, the second node includes a local analysis and modeling function, and the first node includes a centralized analysis and modeling function;
Wherein the second node receives a first request from a first node in the radio access network, comprising:
The local analysis and modeling function in the second node receives the first request from the centralized analysis and modeling function in the first node, wherein the first request includes information of the first shared model.
Optionally, the first request further includes an upload policy of the first local model. Correspondingly, after the second node obtains the uploading policy of the first local model, the second node can retrain the local model according to the uploading policy to obtain the local model and perform corresponding processing operation.
Optionally, the uploading policy of the first local model includes at least one of the following:
The identification of the processing algorithm before the first local model is uploaded, the uploading time of the first local model or the carrying information of the first local model when the first local model is uploaded;
wherein the carried information comprises the size and/or prediction error of the training data set of the first local model.
Optionally, the second node further comprises a local adaptive policy function,
The local analysis and modeling function sends a third request to the local adaptive policy function, where the third request is used for requesting a local joint learning strategy corresponding to the first sharing model, the local joint learning strategy is used for indicating whether the second node retrains the first sharing model locally, and the third request includes information of the first sharing model;
The local analysis and modeling function receives the local joint learning strategy sent by the local adaptive policy function;
When the local joint learning strategy instructs the second node to perform local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
Therefore, the embodiment of the application sends the local joint learning strategy to the local analysis and modeling function through the local self-adaptive strategy function, so that the second node can determine whether to participate in joint learning according to the self computing capability, thereby avoiding the extension of the joint learning iteration time caused by the insufficient self computing capability of the second node and improving the joint learning efficiency.
Optionally, in the embodiment of the present application, the local adaptive policy function may not send a local joint learning policy to the local analysis modeling function, and the second node may participate in joint learning all the time when receiving a request for requesting to perform joint learning training.
Optionally, the information of the first sharing model includes at least one of:
And the model identification, model type, model structure, input and output and initial model parameters or training data acquisition duration of the first shared model.
Optionally, the second node receives a model update notification message sent by the first node, where the model update notification message is used to notify a second sharing model, where the second sharing model is determined by the first node according to the model report message and the first sharing model;
And when the second node determines that the prediction error of the second sharing model is smaller than a fourth threshold value, the second sharing model is installed.
Therefore, the embodiment of the application judges the prediction error of the second sharing model, and when the prediction error is smaller than or equal to the preset threshold value, the first node stores the second sharing model, and the second node installs the second sharing model. When the prediction error is determined to be greater than the preset threshold, the first node does not update the first sharing model, and the second node does not install the second sharing model. Based on the method, the embodiment of the application can avoid the installation of the sharing model with low accuracy, and further ensure the accuracy and generalization capability of the updated sharing model.
In a possible implementation manner, the prediction error of the first sharing model may be set to a fourth threshold, which is not limited by the embodiment of the present application.
Alternatively, the receiving unit 810 and/or the sending unit 830 may also be referred to as a transceiver unit (module) or a communication unit, and may be used to perform the receiving and sending steps of the second node in the method embodiments, respectively. The processing unit 820 is also configured to generate the content sent by the sending unit 830, or to process the content received by the receiving unit 810. Optionally, the communication device 800 may further include a storage unit, where the storage unit is configured to store instructions executed by the communication unit and the processing unit.
The apparatus 800 for training a model may be the second node in the method embodiments, or may be a chip in the second node. When the apparatus 800 for training a model is the second node, the processing unit may be a processor, and the sending unit and the receiving unit may be a transceiver. The apparatus may further comprise a storage unit, which may be a memory. The storage unit is used for storing instructions, and the processing unit executes the instructions stored by the storage unit so as to enable the communication device to execute the above method. When the apparatus 800 for training a model is a chip in the second node, the processing unit may be a processor, and the sending unit and the receiving unit may be input/output interfaces, pins, circuits, or the like; the processing unit executes the instructions stored in the storage unit, so that the communication device performs the operations performed by the second node in the above method embodiments, where the storage unit may be a storage unit in the chip (for example, a register or a cache), or may be a storage unit located outside the chip (for example, a read-only memory or a random access memory).
It will be clear to those skilled in the art that, for the steps performed by the apparatus 800 for training a model and the corresponding advantageous effects, reference may be made to the description of the second node in the above method embodiments; for brevity, details are not repeated herein.
It is to be understood that the transmitting unit 830 and the receiving unit 810 may be implemented by transceivers and the processing unit 820 may be implemented by a processor. The memory unit may be implemented by a memory. As shown in fig. 9, an apparatus 900 for training a model may include a processor 910, a memory 920, and a transceiver 930. The means 900 for training the model may be a second node in the wireless network.
The model training apparatus 800 shown in fig. 8 or the model training apparatus 900 shown in fig. 9 can implement the steps performed by the second node in the foregoing method embodiment, and similar descriptions may refer to descriptions in the foregoing corresponding methods. In order to avoid repetition, a description thereof is omitted.
The apparatus for training a model in the foregoing apparatus embodiments corresponds to the first node or the second node in the method embodiments, and the corresponding steps are performed by the corresponding modules or units. For example, the sending and/or receiving steps in the method embodiments are performed by the transceiver unit (or the communication unit, or the transceiver), and the steps other than sending and receiving may be performed by the processing unit (processor). For the function of a specific unit, reference may be made to the corresponding method embodiments. The sending unit and the receiving unit may form a transceiver unit, the transmitter and the receiver may form a transceiver, and together they implement the sending and receiving functions in the method embodiments; the processor may be one or more.
It should be understood that the above division of the units is only a functional division, and other division methods are possible in practical implementation.
The first node or the second node may be a chip, and the processing unit may be implemented by hardware or software. When implemented in hardware, the processing unit may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processing unit may be a general-purpose processor, implemented by reading software code stored in a memory unit, which may be integrated in the processor or may exist separately outside the processor.
It should be understood that the processing means may be a chip. For example, the processing device may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processing circuit (DSP), a microcontroller unit (MCU), a programmable logic device (PLD), or another integrated chip.
In implementation, each step in the method provided in the present embodiment may be implemented by an integrated logic circuit of hardware in a processor or an instruction in a software form. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
It should be noted that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method embodiments may be implemented by integrated logic circuits of hardware in a processor or by instructions in software form. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor in the embodiments of the present application may implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It will be appreciated that the memory or storage units in embodiments of the application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The embodiment of the application also provides a wireless network, which comprises the first node and the second node.
The present application also provides a computer readable medium having stored thereon a computer program which, when executed by a computer, implements the method of any of the above embodiments.
Embodiments of the present application also provide a computer program product which, when executed by a computer, implements the method of any of the above embodiments.
The embodiment of the application also provides a system chip, which comprises: a communication unit and a processing unit. The processing unit may be, for example, a processor. The communication unit may be, for example, an input/output interface, pins or circuitry, etc. The processing unit may execute computer instructions to cause a chip within the communication device to perform any of the methods provided by the embodiments of the present application described above.
Optionally, the computer instructions are stored in a storage unit.
The embodiments of the present application may be used independently or in combination, and are not limited in this regard.
It should be understood that the terms "first", "second", and the like in the embodiments of the present application are used only to distinguish the objects being described; they imply no order, do not limit the number of devices in the embodiments of the present application, and should not be construed as limiting the embodiments of the present application.
It should also be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid-state drive (SSD)).
In the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of" the following items or similar expressions refers to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may be single or plural.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Claims (24)

1. A method for training a model in a wireless network, the method being performed by a first node in the wireless network, the method comprising:
The first node sends a first request to at least one second node in the wireless network, wherein the first request is used for requesting the at least one second node to retrain a first sharing model based on local data of the second node respectively;
The first node acquires a model report message from each of the at least one second node, wherein the model report message of each second node comprises parameters of a first local model or an increment between the parameters of the first local model and the parameters of the first shared model, the model report message also comprises the size of a training data set corresponding to the first local model and/or a prediction error of the first local model, and the first local model is obtained after each second node retrains the first shared model based on the first request and the local data;
The first node determines a second local model in at least one first local model corresponding to the at least one second node, wherein the size of a training data set corresponding to the second local model is greater than or equal to a first threshold value, and/or the prediction error of the second local model is less than or equal to a second threshold value;
The first node determines a second sharing model according to the second local model and the first sharing model.
2. The method of claim 1, wherein the first node determining the second shared model from the second local model and the first shared model comprises:
The first node performs weighted average aggregation on the parameters of the second local model or the increment between the parameters of the second local model and the parameters of the first shared model, wherein the weight parameters adopted by the weighted average aggregation are determined according to the size of a training data set corresponding to the second local model and/or the prediction error of the second local model;
And the first node determines the second sharing model according to the result of the weighted average aggregation and the first sharing model.
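For illustration only, the screening step of claim 1 and the weighted-average aggregation of claim 2 can be sketched as follows. The sketch is not part of the claims: the names (ModelReport, select_second_local_models, aggregate), the representation of model parameters as flat Python lists, and all numeric thresholds are assumptions introduced here; the weights follow the training-data-set-size option mentioned in claim 2, and full parameter vectors are aggregated, although aggregating increments would work the same way.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ModelReport:
        # Mirrors the model report message of claim 1 (field names are illustrative).
        parameters: List[float]      # parameters of the first local model (or their increments)
        training_set_size: int       # size of the training data set used for retraining
        prediction_error: float      # prediction error of the first local model

    def select_second_local_models(reports, first_threshold, second_threshold):
        # Screening step of claim 1: keep reports whose training data set is large enough
        # and whose prediction error is small enough.
        return [r for r in reports
                if r.training_set_size >= first_threshold
                and r.prediction_error <= second_threshold]

    def aggregate(first_shared, selected):
        # Weighted-average aggregation of claim 2, weighting each second local model by
        # the size of its training data set.
        total = sum(r.training_set_size for r in selected)
        if total == 0:
            return list(first_shared)   # nothing passed the screening; keep the first sharing model
        return [sum(r.parameters[i] * r.training_set_size for r in selected) / total
                for i in range(len(first_shared))]

    # Example with three second nodes; the third report is screened out.
    reports = [
        ModelReport([0.9, 1.1], training_set_size=1000, prediction_error=0.05),
        ModelReport([1.2, 0.8], training_set_size=500, prediction_error=0.04),
        ModelReport([5.0, 5.0], training_set_size=20, prediction_error=0.40),
    ]
    selected = select_second_local_models(reports, first_threshold=100, second_threshold=0.10)
    second_sharing_model = aggregate([1.0, 1.0], selected)   # -> [1.0, 1.0] in this toy case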
3. The method according to claim 1 or 2, wherein the first node comprises a centralized adaptive policy function and a centralized analysis and modeling function, the method further comprising:
The centralized adaptive policy function sends a joint learning strategy to the centralized analysis and modeling function, the joint learning strategy comprising at least one of the following information:
A joint learning starting condition, information of the first shared model, joint learning group member identification, an uploading strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model or a shared model updating strategy.
4. A method according to claim 3, wherein the second node comprises a local analysis and modeling function therein;
Wherein the first node sending a first request to at least one second node in the wireless network, comprising:
The centralized analysis and modeling function in the first node sends the first request to a local analysis and modeling function in each of the at least one second node when the joint learning initiation condition is satisfied, wherein the first request includes information of the first shared model.
5. The method of claim 3, wherein the information of the first shared model comprises at least one of:
a model identification, a model type, a model structure, inputs and outputs, initial model parameters, or a training data acquisition duration of the first shared model.
6. The method of claim 4, wherein the first request further includes an upload policy for the first local model.
7. The method of claim 6, wherein the uploading policy of the first local model comprises at least one of:
an identification of a processing algorithm applied to the first local model before uploading, an uploading time of the first local model, or information carried when the first local model is uploaded;
wherein the carried information comprises the size of the training data set of the first local model and/or the prediction error of the first local model.
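As a non-normative illustration of claims 3 to 7, the joint learning strategy and the first request could be represented as the structures below. Every field name and value is an assumption chosen to mirror the wording of the claims; the application does not define a concrete message format.

    # Possible layout of the joint learning strategy of claim 3 (all names and values illustrative).
    joint_learning_strategy = {
        "start_condition": "training_data_collected >= 10000",   # joint learning starting condition
        "shared_model_info": {                  # information of the first shared model, cf. claim 5
            "model_id": "shared-model-001",
            "model_type": "neural_network",
            "model_structure": "2-layer fully connected",
            "inputs_outputs": {"in": ["cell_load", "throughput"], "out": ["predicted_load"]},
            "initial_parameters": "shared-model-001/v1",
            "training_data_acquisition_duration": "24h",
        },
        "group_member_ids": ["second-node-1", "second-node-2"],  # joint learning group members
        "upload_policy": {                      # uploading policy of the first local model, cf. claims 6-7
            "preprocessing_algorithm_id": "delta-compression-v1",
            "upload_time": "on_training_complete",
            "carried_info": ["training_set_size", "prediction_error"],
        },
        "screening_policy": {"first_threshold": 100, "second_threshold": 0.10},
        "aggregation_policy": "weighted_average_by_training_set_size",
        "shared_model_update_policy": {"third_threshold": 0.08},
    }

    # The first request of claim 4 carries the shared model information and, per claim 6,
    # optionally the uploading policy.
    first_request = {
        "shared_model_info": joint_learning_strategy["shared_model_info"],
        "upload_policy": joint_learning_strategy["upload_policy"],
    }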
8. The method according to claim 1 or 2, further comprising:
the first node determines that the prediction error of the second sharing model is less than or equal to a third threshold;
The first node sends a model update notification message to the at least one second node, respectively, where the model update notification message is used to request each second node to install the second sharing model.
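A minimal sketch of the acceptance test in claim 8 follows; the send callback and the message layout are assumptions introduced here, not defined by the application.

    def maybe_notify_update(second_sharing_error, third_threshold, second_nodes, send):
        # Claim 8: the model update notification is sent only when the prediction error of
        # the second sharing model does not exceed the third threshold.
        if second_sharing_error <= third_threshold:
            for node in second_nodes:
                send(node, {"type": "model_update_notification", "model_id": "shared-model-001"})
            return True
        return False

    # Example usage with a stand-in transport:
    sent = []
    maybe_notify_update(0.05, 0.08, ["second-node-1", "second-node-2"],
                        lambda node, msg: sent.append((node, msg)))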
9. A method for training a model in a radio access network, the method being performed by a second node in the radio access network, the method comprising:
The second node receives a first request from a first node in the radio access network;
The second node retrains the first shared model based on the local data of the second node according to the first request to obtain a first local model;
The second node sends a model report message to the first node, the model report message includes parameters of the first local model, or includes an increment between the parameters of the first local model and the parameters of the first shared model, the model report message is used for updating the first shared model, the model report message further includes a size of a training data set corresponding to the first local model, and/or a prediction error of the first local model, the model report message is further used for determining a second local model, the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or a prediction error of the second local model is less than or equal to a second threshold.
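Claim 9 can likewise be illustrated from the second node's side. In the sketch below the retraining routine is a stand-in (simple least-mean-squares updates on a toy linear model), because the claims do not prescribe any particular training algorithm; the function names and the report layout are assumptions.

    def retrain_locally(shared_params, local_data, lr=0.01, epochs=20):
        # Stand-in for local model retraining of the first shared model.
        a, b = shared_params
        for _ in range(epochs):
            for x, y in local_data:
                err = (a * x + b) - y
                a -= lr * err * x
                b -= lr * err
        return [a, b]

    def mean_squared_error(params, local_data):
        a, b = params
        return sum(((a * x + b) - y) ** 2 for x, y in local_data) / len(local_data)

    def handle_first_request(shared_params, local_data, report_increment=True):
        # Claim 9: retrain the first shared model on local data, then build the model report
        # message (parameters or their increment, training data set size, prediction error).
        local_params = retrain_locally(shared_params, local_data)
        report = {
            "training_set_size": len(local_data),
            "prediction_error": mean_squared_error(local_params, local_data),
        }
        if report_increment:
            report["parameter_increment"] = [l - s for l, s in zip(local_params, shared_params)]
        else:
            report["parameters"] = local_params
        return report

    # Toy local data for this second node: samples of y = 2x + 1.
    local_data = [(x, 2 * x + 1) for x in range(10)]
    print(handle_first_request([0.0, 0.0], local_data))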
10. The method of claim 9, wherein the second node includes a local analysis and modeling function therein and the first node includes a centralized analysis and modeling function therein;
Wherein the second node receives a first request from a first node in the radio access network, comprising:
A local analysis and modeling function in the second node receives the first request from a centralized analysis and modeling function in the first node, wherein the first request includes information of the first shared model.
11. The method of claim 10, wherein the first request further includes an upload policy for the first local model.
12. The method of claim 11, wherein the uploading policy of the first local model comprises at least one of:
an identification of a processing algorithm applied to the first local model before uploading, an uploading time of the first local model, or information carried when the first local model is uploaded;
wherein the carried information comprises the size of the training data set of the first local model and/or the prediction error of the first local model.
13. The method of claim 10, wherein the second node further comprises a local adaptive policy function therein, the method further comprising:
the local analysis and modeling function sends a third request to the local adaptive policy function, the third request being used for requesting a local joint learning strategy corresponding to the first sharing model, the local joint learning strategy being used for indicating whether the second node is to retrain the first sharing model locally, and the third request including information of the first sharing model;
The local analysis and modeling function receives the local joint learning strategy sent by the local adaptive policy function;
When the local joint learning strategy instructs the second node to perform local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
14. The method according to any of claims 10-13, wherein the information of the first shared model comprises at least one of:
a model identification, a model type, a model structure, inputs and outputs, initial model parameters, or a training data acquisition duration of the first shared model.
15. The method according to any one of claims 10-13, further comprising:
The second node receives a model update notification message sent by the first node, wherein the model update notification message is used for notifying a second sharing model, and the second sharing model is determined by the first node according to the model report message and the first sharing model;
And when the second node determines that the prediction error of the second sharing model is smaller than a fourth threshold value, the second sharing model is installed.
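For claims 13 and 15, the second node's policy check and installation decision can be sketched as follows. The dictionary-based policy interface is an assumption; a real local adaptive policy function would be a network function rather than a Python callable.

    def should_retrain(local_adaptive_policy, shared_model_info):
        # Claim 13: the local analysis and modeling function sends a third request carrying
        # the shared model information and follows the returned local joint learning strategy.
        third_request = {"shared_model_info": shared_model_info}
        local_policy = local_adaptive_policy(third_request)
        return bool(local_policy.get("retrain_locally", False))

    def install_second_sharing_model(second_sharing_error, fourth_threshold):
        # Claim 15: install the second sharing model only when its prediction error is
        # smaller than the fourth threshold.
        return second_sharing_error < fourth_threshold

    # Example usage with a stand-in policy function:
    def stand_in_policy(request):
        return {"retrain_locally": True}

    assert should_retrain(stand_in_policy, {"model_id": "shared-model-001"})
    assert install_second_sharing_model(0.03, 0.05)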
16. An apparatus for training a model in a wireless network, the apparatus being a first node in the wireless network, the apparatus comprising:
A sending unit, configured to send a first request to at least one second node in the wireless network, where the first request is used to request the at least one second node to retrain a first sharing model based on local data of the second node respectively;
The receiving unit is configured to obtain a model report message from the at least one second node, where the model report message of each second node includes parameters of a first local model, or includes an increment between the parameters of the first local model and the parameters of the first shared model, and the model report message further includes a size of a training data set corresponding to the first local model, and/or a prediction error of the first local model, where the first local model is obtained after retraining the first shared model by each second node based on the first request and the local data;
A determining unit, configured to determine a second local model from at least one first local model corresponding to the at least one second node, where a size of a training data set corresponding to the second local model is greater than or equal to a first threshold, and/or a prediction error of the second local model is less than or equal to a second threshold;
the determining unit is further configured to determine a second sharing model according to the second local model and the first sharing model.
17. The apparatus according to claim 16, wherein the determining unit is specifically configured to:
carrying out weighted average aggregation on the parameters of the second local model or the increment between the parameters of the second local model and the parameters of the first shared model, wherein the weight parameters adopted by the weighted average aggregation are determined according to the size of a training data set corresponding to the second local model and/or the prediction error of the second local model;
and determining the second sharing model according to the result of the weighted average aggregation and the first sharing model.
18. The apparatus of claim 16 or 17, further comprising a centralized adaptive policy function and a centralized analysis and modeling function:
the centralized adaptive policy function is configured to send a joint learning strategy to the centralized analysis and modeling function, the joint learning strategy comprising at least one of the following information:
A joint learning starting condition, information of the first shared model, joint learning group member identification, an uploading strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model or a shared model updating strategy.
19. The apparatus of claim 18, wherein the second node includes a local analysis and modeling function therein;
the sending unit is specifically configured to:
When the joint learning initiation condition is satisfied, the centralized analysis and modeling function sends the first request to a local analysis and modeling function of each of the at least one second node, wherein the first request includes information of the first shared model.
20. The apparatus according to claim 16 or 17, wherein,
The determining unit is further configured to determine that a prediction error of the second shared model is less than or equal to a third threshold;
The sending unit is further configured to send a model update notification message to the at least one second node, where the model update notification message is used to request each second node to install the second sharing model.
21. An apparatus for training a model in a radio access network, the apparatus being a second node in the radio access network, the apparatus comprising:
A receiving unit configured to receive a first request from a first node in the radio access network;
The processing unit is used for carrying out local model retraining on the first shared model based on the local data of the second node according to the first request so as to obtain a first local model;
A sending unit, configured to send a model report message to the first node, where the model report message includes parameters of the first local model, or includes an increment between the parameters of the first local model and the parameters of the first shared model, the model report message is used for updating the first shared model, the model report message further includes a size of a training data set corresponding to the first local model, and/or a prediction error of the first local model, and the model report message is further used to determine a second local model, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or a prediction error of the second local model is less than or equal to a second threshold.
22. The apparatus of claim 21, further comprising a local analysis and modeling function, the first node comprising a centralized analysis and modeling function therein;
wherein the local analysis and modeling function is configured to receive the first request from a centralized analysis and modeling function in the first node, wherein the first request includes information of the first shared model.
23. The apparatus of claim 22, further comprising a local adaptive policy function,
The local analysis and modeling function is configured to send a third request to the local adaptive policy function, where the third request is used for requesting a local joint learning strategy corresponding to the first sharing model, the local joint learning strategy is used for indicating whether the second node is to perform local model retraining on the first sharing model, and the third request includes information of the first sharing model;
The local analysis and modeling function is configured to receive the local joint learning strategy sent by the local adaptive policy function;
When the local joint learning strategy instructs the second node to perform local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
24. The apparatus according to claim 22 or 23, wherein:
The receiving unit is further configured to receive a model update notification message sent by the first node, where the model update notification message is used to notify a second sharing model, where the second sharing model is determined by the first node according to the model report message and the first sharing model;
The processing unit is further configured to install the second sharing model when it is determined that the prediction error of the second sharing model is less than a fourth threshold.
CN201910135464.6A 2019-02-22 2019-02-22 Method and device for training model Active CN111612153B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910135464.6A CN111612153B (en) 2019-02-22 2019-02-22 Method and device for training model
PCT/CN2019/118762 WO2020168761A1 (en) 2019-02-22 2019-11-15 Model training method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910135464.6A CN111612153B (en) 2019-02-22 2019-02-22 Method and device for training model

Publications (2)

Publication Number Publication Date
CN111612153A CN111612153A (en) 2020-09-01
CN111612153B true CN111612153B (en) 2024-06-14

Family

ID=72143917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910135464.6A Active CN111612153B (en) 2019-02-22 2019-02-22 Method and device for training model

Country Status (2)

Country Link
CN (1) CN111612153B (en)
WO (1) WO2020168761A1 (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100145B (en) * 2020-09-02 2023-07-04 南京三眼精灵信息技术有限公司 Digital model sharing learning system and method
CN112269794A (en) * 2020-09-16 2021-01-26 连尚(新昌)网络科技有限公司 Method and equipment for violation prediction based on block chain
CN112269793B (en) * 2020-09-16 2024-06-25 连尚(新昌)网络科技有限公司 Method and equipment for detecting user type based on blockchain
CN112232519B (en) * 2020-10-15 2024-01-09 成都数融科技有限公司 Joint modeling method based on federal learning
CN112329557A (en) * 2020-10-21 2021-02-05 杭州趣链科技有限公司 Model application method and device, computer equipment and storage medium
CN115735214A (en) * 2021-06-02 2023-03-03 北京小米移动软件有限公司 Model training method, model training device and storage medium
CN116233857A (en) * 2021-12-02 2023-06-06 华为技术有限公司 Communication method and communication device
CN116643954A (en) * 2022-02-14 2023-08-25 大唐移动通信设备有限公司 Model monitoring method, monitoring terminal, device and storage medium
CN116887290A (en) * 2022-03-28 2023-10-13 华为技术有限公司 Communication method and device for training machine learning model
CN117196071A (en) * 2022-05-27 2023-12-08 华为技术有限公司 Model training method and device
CN117221944A (en) * 2022-06-02 2023-12-12 华为技术有限公司 Communication method and device
WO2024026846A1 (en) * 2022-08-05 2024-02-08 华为技术有限公司 Artificial intelligence model processing method and related device
WO2024065682A1 (en) * 2022-09-30 2024-04-04 Shenzhen Tcl New Technology Co., Ltd. Communication devices and methods for machine learning model training
CN117993516A (en) * 2022-11-04 2024-05-07 华为技术有限公司 Communication method and device
CN118114780A (en) * 2022-11-30 2024-05-31 华为技术有限公司 Distributed machine learning method, apparatus, storage medium, and program product
CN116566846B (en) * 2023-07-05 2023-09-22 中国电信股份有限公司 Model management method and system, shared node and network node
CN118052302A (en) * 2024-04-11 2024-05-17 北京钢研新材科技有限公司 Federal learning method and device for material data model

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9450978B2 (en) * 2014-01-06 2016-09-20 Cisco Technology, Inc. Hierarchical event detection in a computer network
US20150242760A1 (en) * 2014-02-21 2015-08-27 Microsoft Corporation Personalized Machine Learning System
US20150324690A1 (en) * 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
EP3192016B1 (en) * 2014-09-12 2019-05-08 Microsoft Technology Licensing, LLC Computing system for training neural networks
CN105575389B (en) * 2015-12-07 2019-07-30 百度在线网络技术(北京)有限公司 Model training method, system and device
US20180089587A1 (en) * 2016-09-26 2018-03-29 Google Inc. Systems and Methods for Communication Efficient Distributed Mean Estimation
US10346944B2 (en) * 2017-04-09 2019-07-09 Intel Corporation Machine learning sparse computation mechanism
CN108345661B (en) * 2018-01-31 2020-04-28 华南理工大学 Wi-Fi clustering method and system based on large-scale Embedding technology
CN108596345A (en) * 2018-04-23 2018-09-28 薛泽 Machine learning and mistake making early warning device and method based on block chain
CN109145984B (en) * 2018-08-20 2022-03-25 联想(北京)有限公司 Method and apparatus for machine training
CN109325541A (en) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for training pattern

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhongyang Zheng et al. "SpeeDO: Parallelizing Stochastic Gradient Descent for Deep Convolutional Neural Network". NIPS Workshop on Machine Learning in Computational Biology, 2016, Sections 2-3, Figure 1. *

Also Published As

Publication number Publication date
WO2020168761A1 (en) 2020-08-27
CN111612153A (en) 2020-09-01

Similar Documents

Publication Publication Date Title
CN111612153B (en) Method and device for training model
US11556856B2 (en) Cloud assisted machine learning
JP7159347B2 (en) MODEL UPDATE METHOD AND APPARATUS, AND SYSTEM
EP3742669B1 (en) Machine learning in radio access networks
WO2020048594A1 (en) Procedure for optimization of self-organizing network
US20230216737A1 (en) Network performance assessment
CN110430068A (en) A kind of Feature Engineering method of combination and device
CN112153658A (en) Delay reduction based on packet error prediction
WO2020152389A1 (en) Machine learning for a communication network
CN116671068A (en) Policy determination method and device
CN112396070A (en) Model training method, device and system, and prediction method and device
US11558262B2 (en) Method and an apparatus for fault prediction in network management
US20160094274A1 (en) Method and apparatus for managing a power line communication network in multi-flow environments
CN113541986B (en) Fault prediction method and device for 5G slice and computing equipment
US20230319707A1 (en) Power saving in radio access network
EP4222934A1 (en) Determining conflicts between kpi targets in a communications network
EP4258730A1 (en) Method and apparatus for programmable and customized intelligence for traffic steering in 5g networks using open ran architectures
US20230319662A1 (en) Method and apparatus for programmable and customized intelligence for traffic steering in 5g networks using open ran architectures
US20230325654A1 (en) Scalable deep learning design for missing input features
WO2023240592A1 (en) Apparatus, methods, and computer programs
US20240121622A1 (en) System and method for aerial-assisted federated learning
US20240232645A9 (en) Zone gradient diffusion (zgd) for zone-based federated learning
US20240135192A1 (en) Zone gradient diffusion (zgd) for zone-based federated learning
Cervelló Pastor et al. D3.2 Final report on systems and methods for AI@EDGE platform automation
CN114666221A (en) Network slice subnet operation and maintenance management method, device, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant