WO2020168761A1 - Model training method and apparatus


Info

Publication number
WO2020168761A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
node
local
request
shared
Application number
PCT/CN2019/118762
Other languages
French (fr)
Chinese (zh)
Inventor
王园园
池清华
徐以旭
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • This application relates to the field of artificial intelligence (AI), and more specifically, to methods and devices for training models in the AI field.
  • AI can be applied to wireless networks.
  • As wireless networks gain more spectrum, more types of services, and more access terminals, the network system becomes more complex, which also drives the architecture of the wireless network and access network equipment to become more intelligent and automated.
  • a wireless intelligent network architecture has been defined.
  • machine learning can be used to train the model.
  • Machine learning model training mainly takes two forms: centralized training and local training.
  • centralized training requires aggregating training data to a central node, which will cause high communication overhead, and there are also problems such as extended modeling time, data privacy issues in user data upload, and high pressure on central node storage and calculation.
  • Local training does not require model reporting; each local node builds its own model using only local data.
  • However, local training suffers from insufficient data, which makes the resulting model inaccurate and gives it weak generalization ability.
  • the present application provides a method and device for training a model in a wireless network, which can implement joint learning in a wireless network, and helps to obtain a training model with high accuracy and generalization ability.
  • a method for training a model applied to a wireless network is provided, the method is executed by a first node in the wireless network, and the method includes:
  • the first node sends a first request to at least one second node in the wireless network, and the first request is used to request the at least one second node to retrain the first shared model locally based on the local data of the second node.
  • the first node obtains the model report message from the at least one second node.
  • the model report message of each second node includes the parameters of the first local model, or includes the increment between the parameters of the first local model and the parameters of the first shared model.
  • the first local model is obtained after each second node performs local model retraining on the first shared model based on the first request and local data;
  • the first node determines the second sharing model according to the model report message of the at least one second node and the first sharing model.
  • A first node in the wireless network sends a first request to at least one second node in the wireless network, and each second node can, in response to the first request, retrain the shared model locally based on its local data and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • Joint learning means that the first node acts as a centralized node, the second node acts as a local node, and the first node and the second node learn together to train the model.
  • Joint learning can overcome some or all of the shortcomings of the centralized training model, as well as some or all of the shortcomings of the local training model.
  • The embodiment of the present application does not require the local node to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the storage and computation pressure of ultra-large-scale data and model training.
  • local nodes in the embodiment of the present application perform distributed model training, which can shorten the time of model training and protect data privacy.
  • The local node sends the locally retrained model to the centralized node, so that the centralized node can update the shared model based on the local models of at least one local node; this helps overcome the problem of insufficient data in local training and thereby improves the accuracy and generalization ability of the trained model.
  • the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model;
  • the first node determining the second sharing model according to the model report message of the at least one second node and the first sharing model includes:
  • the first node determines a second local model in at least one first local model corresponding to the at least one second node, wherein the size of the training data set corresponding to the second local model is greater than or equal to the first threshold, and/or the prediction error of the second local model is less than or equal to the second threshold;
  • the first node determines the second sharing model according to the second local model and the first sharing model.
  • In other words, a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold is selected from the at least one first local model, and the second local model is then used to determine the second shared model. Since the accuracy of the second local model in the embodiment of this application is higher than that of the first local model, the embodiment of this application can help improve the accuracy of the second shared model.
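  • To illustrate this screening step, the following is a minimal sketch (not part of the application itself; the type `LocalModelReport` and field names such as `train_set_size` and `prediction_error` are assumptions introduced here for illustration):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LocalModelReport:
    """Hypothetical model report message fields (names are illustrative only)."""
    parameters: List[float]                    # parameters of the first local model (or their increments)
    train_set_size: Optional[int] = None       # size of the training data set, if reported
    prediction_error: Optional[float] = None   # e.g. MAE of the first local model, if reported

def screen_local_models(reports, size_threshold=None, error_threshold=None):
    """Keep only local models whose training set is large enough and/or whose error is small enough."""
    selected = []
    for r in reports:
        if size_threshold is not None and (r.train_set_size is None or r.train_set_size < size_threshold):
            continue
        if error_threshold is not None and (r.prediction_error is None or r.prediction_error > error_threshold):
            continue
        selected.append(r)
    return selected
```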
  • the first node determining the second sharing model according to the second local model and the first sharing model includes:
  • the first node performs weighted average aggregation on the parameters of the second local model, or on the increment between the parameters of the second local model and the parameters of the first shared model, where the weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model and/or the prediction error of the second local model;
  • the first node determines the second sharing model based on the weighted average aggregation result and the first sharing model.
  • the weight parameter may be the reciprocal of the total number of second local models, and at this time, the weight parameters of each second local model are the same.
  • the weight parameter of the second local model may be the ratio of the size of the training data set corresponding to the second local model to the size of all training data sets, where the size of all training data sets is corresponding to each second local model The sum of the size of the training data set.
  • the weight parameter of each second local model may be the inverse of the corresponding MAE.
  • the first node may not filter the at least one second local model from the at least one first local model, but directly determine the second shared model based on the at least one first local model.
  • the parameters of the first local model or the increments between the parameters of the first local model and the parameters of the shared model may be aggregated by weighted average.
  • the weight parameter of each first local model may be the reciprocal of the total number of first local models. Then, the first node determines the second sharing model based on the weighted average aggregation result and the first sharing model.
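  • As an illustration of these weighting options, the following sketch (reusing the `LocalModelReport` fields assumed above; nothing here is prescribed by the application) computes weights from the model count, the training data set sizes, or the inverse MAE, and then aggregates the reported parameter vectors:

```python
def aggregation_weights(reports, scheme="count"):
    """Weight parameters for the weighted average aggregation.

    scheme="count": equal weights, the reciprocal of the number of local models;
    scheme="data":  ratio of each model's training set size to the total size;
    scheme="error": proportional to the inverse of each model's MAE (normalized
                    here so the weights sum to 1, which is an added assumption).
    """
    n = len(reports)
    if scheme == "count":
        return [1.0 / n] * n
    if scheme == "data":
        total = sum(r.train_set_size for r in reports)
        return [r.train_set_size / total for r in reports]
    if scheme == "error":
        inverse = [1.0 / r.prediction_error for r in reports]
        total = sum(inverse)
        return [w / total for w in inverse]
    raise ValueError(f"unknown weighting scheme: {scheme}")

def weighted_average(reports, weights):
    """Element-wise weighted average of the reported parameter vectors (or increments)."""
    dim = len(reports[0].parameters)
    return [sum(w * r.parameters[i] for w, r in zip(weights, reports)) for i in range(dim)]
```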
  • the first node includes a centralized adaptive strategy function and a centralized analysis and modeling function, and the method further includes:
  • the centralized adaptive strategy function sends a joint learning strategy to the centralized analysis and modeling function, where the joint learning strategy includes at least one of the following information:
  • joint learning start conditions, information of the first shared model, joint learning group member identifiers, an upload strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model, or a shared model update strategy.
  • The embodiment of the application manages the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, whose orchestration management covers at least one of joint learning start conditions, a model upload strategy, a model screening strategy, a model aggregation strategy, a model processing strategy, and so on. Based on this, the embodiment of the present application enables the first node and the second node to learn together and obtain a training model with high accuracy and generalization ability without the local data of the second node being uploaded to the first node.
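  • For concreteness, a joint learning strategy could be represented as a simple container like the following sketch (the class and field names are assumptions made for illustration, not structures defined by the application):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class JointLearningStrategy:
    """Illustrative container for the joint learning strategy items listed above."""
    start_conditions: Optional[str] = None      # e.g. "no subscription data" or a resource threshold
    shared_model_info: Optional[dict] = None    # information of the first shared model
    group_member_ids: List[str] = field(default_factory=list)  # joint learning group members
    upload_strategy: Optional[dict] = None      # upload strategy of the first local model
    screening_strategy: Optional[str] = None    # e.g. a model screening rule identifier
    aggregation_strategy: Optional[str] = None  # e.g. a model aggregation algorithm identifier
    processing_strategy: Optional[str] = None   # e.g. a model processing algorithm identifier
    update_strategy: Optional[str] = None       # shared model update strategy
```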
  • the first node sends a first request for joint learning training to the second node when the joint learning start condition is satisfied.
  • the foregoing second node includes a local analysis and modeling function
  • sending the first request by the first node to at least one second node in the radio access network includes:
  • the centralized analysis and modeling function in the first node sends a first request to the local analysis and modeling function of each second node in the at least one second node.
  • The first request includes the information of the first shared model.
  • the information of the first shared model includes at least one of the following: a model identifier, a model type, a model structure, input and output, and initial model parameters or a training data collection duration of the first shared model.
  • the first request further includes the upload strategy of the first local model.
  • the first node sends the upload strategy of the first local model to the second node, so that the first node can instruct the second node how to upload the first local model.
  • the second node can retrain locally according to the upload strategy to obtain the local model and perform corresponding processing operations.
  • the upload strategy of the first local model includes at least one of the following: an identifier of the processing algorithm used before the first local model is uploaded, an upload time of the first local model, or information carried with the first local model.
  • some implementations of the first aspect further include:
  • when the first node determines that the prediction error of the second shared model is less than or equal to the third threshold, the first node sends a model update notification message to each of the at least one second node, where the model update notification message is used to request each second node to install the second shared model.
  • The embodiment of the present application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set as the third threshold, which is not limited in the embodiment of the present application.
  • a method for training a model applied to a radio access network is provided, characterized in that the method is executed by a second node in the radio access network, and the method includes:
  • the second node receives the first request from the first node in the radio access network
  • the second node performs local model retraining on the first shared model based on the local data of the second node to obtain the first local model
  • the second node sends a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the increment between the parameters of the first local model and the parameters of the first shared model, and the model report message is used to update the first shared model.
  • A first node in the wireless network sends a first request to at least one second node in the wireless network, and each second node can, in response to the first request, retrain the shared model locally based on its local data and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • Joint learning means that the first node acts as a centralized node, the second node acts as a local node, and the first node and the second node learn together to train the model.
  • Joint learning can overcome some or all of the shortcomings of the centralized training model, as well as some or all of the shortcomings of the local training model.
  • The embodiment of the present application does not require the local node to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the storage and computation pressure of ultra-large-scale data and model training.
  • local nodes in the embodiment of the present application perform distributed model training, which can shorten the time of model training and protect data privacy.
  • The local node sends the locally retrained model to the centralized node, so that the centralized node can update the shared model based on the local models of at least one local node; this helps overcome the problem of insufficient data in local training and thereby improves the accuracy and generalization ability of the trained model.
  • the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
  • The first node can use the size of the training data set corresponding to the first local model and/or the prediction error carried in the model report message to filter out a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold, and then determine the second shared model through the second local model. Since the accuracy of the second local model in the embodiment of the present application is higher than that of the first local model, the embodiment of the present application can help improve the accuracy of the second shared model.
  • the second node includes local analysis and modeling functions, and the first node includes centralized analysis and modeling functions;
  • the second node receiving the first request from the first node in the radio access network includes:
  • the local analysis and modeling function in the second node receives a first request from the centralized analysis and modeling function in the first node, where the first request includes the information of the first shared model.
  • the first request further includes the upload strategy of the first local model.
  • the first node sends the upload strategy of the first local model to the second node, so that the first node can instruct the second node how to upload the first local model.
  • the second node can retrain locally according to the upload strategy to obtain the local model and perform corresponding processing operations.
  • the upload strategy of the first local model includes at least one of the following: an identifier of the processing algorithm used before the first local model is uploaded, an upload time of the first local model, or information carried with the first local model.
  • the second node further includes a local adaptive strategy function
  • the method further includes:
  • the local analysis and modeling function sends a third request to the local adaptive strategy function.
  • the third request is used to request the local joint learning strategy corresponding to the first shared model.
  • the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request includes the information of the first shared model;
  • the local analysis and modeling function receives the local joint learning strategy sent by the local adaptive strategy function
  • the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
  • The embodiment of the present application has the local adaptive strategy function send the local joint learning strategy to the local analysis and modeling function, so that the second node can decide whether to participate in joint learning according to its own computing capability; this prevents insufficient computing capability of the second node from prolonging the iteration time of joint learning and improves the efficiency of joint learning.
  • the local adaptive strategy function may not send the local joint learning strategy to the local analysis and modeling function; instead, whenever the second node receives a request for joint learning training, it always participates in joint learning.
  • the information of the first shared model includes at least one of the following: a model identifier, a model type, a model structure, input and output, and initial model parameters or a training data collection duration of the first shared model.
  • some implementations of the second aspect further include:
  • the second node receives the model update notification message sent by the first node, where the model update notification message is used to notify the second node of the second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model;
  • the second node installs the second sharing model when it determines that the prediction error of the second sharing model is less than the fourth threshold.
  • The embodiment of the present application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set as the fourth threshold, which is not limited in the embodiment of the present application.
  • The information exchanged between the first node and the second node during joint learning may be transmitted directly through the communication interface between the first node and the second node.
  • a device for training a model may be a first node in a wireless network or a chip in the first node.
  • the first node may be a centralized node, or a central node.
  • the device has the function of realizing the above-mentioned first aspect and various possible implementation modes. This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the device includes a transceiver module.
  • the device further includes a processing module.
  • the transceiver module may be, for example, at least one of a transceiver, a receiver, and a transmitter, and may include a radio frequency circuit or an antenna.
  • the processing module may be a processor.
  • the device further includes a storage module, and the storage module may be a memory, for example. When a storage module is included, the storage module is used to store instructions.
  • the processing module is connected to the storage module, and the processing module can execute the instructions stored in the storage module or instructions from other sources, so that the device executes the methods of the above-mentioned first aspect and its various possible implementations.
  • when the device is a chip, the chip includes a transceiver module.
  • the chip also includes a processing module.
  • the transceiver module may be, for example, an input/output interface, a pin, or a circuit on the chip.
  • the processing module may be a processor, for example.
  • the processing module can execute instructions so that the chip in the terminal executes the communication methods of the first aspect and any possible implementation thereof.
  • the processing module may execute instructions in the storage module, and the storage module may be a storage module in the chip, such as a register, a cache, and the like.
  • the storage module may also be located in the communication device but outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling program execution of the methods of the above aspects.
  • a device for training a model may be a second node in a wireless network or a chip in the second node.
  • the second node may be a local node or a distributed edge node.
  • the device has the function of realizing the above-mentioned second aspect and various possible implementation manners. This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the device includes a transceiver module.
  • the device further includes a processing module.
  • the transceiver module may be, for example, at least one of a transceiver, a receiver, and a transmitter, and may include a radio frequency circuit or an antenna.
  • the processing module may be a processor.
  • the device further includes a storage module, and the storage module may be a memory, for example. When a storage module is included, the storage module is used to store instructions.
  • the processing module is connected to the storage module, and the processing module can execute instructions stored in the storage module or from other instructions, so that the device executes the communication methods of the second aspect and various possible implementation manners.
  • when the device is a chip, the chip includes a transceiver module.
  • the device also includes a processing module.
  • the transceiver module may be, for example, an input/output interface, a pin, or a circuit on the chip.
  • the processing module may be a processor, for example.
  • the processing module can execute instructions so that the chip in the terminal executes the communication methods of the second aspect and any possible implementation thereof.
  • the processing module may execute instructions in the storage module, and the storage module may be a storage module in the chip, such as a register, a cache, and the like.
  • the storage module may also be located in the communication device but outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling program execution of the methods of the above second aspect and its various possible implementations.
  • a computer storage medium is provided, which stores program code, and the program code is used to instruct execution of the method in the first aspect or the second aspect or any possible implementation manner thereof.
  • a computer program product containing instructions is provided, which, when run on a computer, causes the computer to execute the method in the first aspect or the second aspect or any possible implementation manner thereof.
  • in a seventh aspect, a communication system is provided, which includes a device capable of implementing the methods and various possible designs of the foregoing first aspect, and a device capable of implementing the methods and various possible designs of the foregoing second aspect.
  • a processor is provided, which is configured to be coupled with a memory and configured to execute the method in the first aspect or the second aspect or any possible implementation manner thereof.
  • in a ninth aspect, a chip is provided, which includes a processor and a communication interface.
  • the communication interface is used to communicate with an external device or an internal device.
  • the processor is configured to implement the method in the foregoing first aspect or second aspect or any possible implementation manner thereof.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute instructions stored in the memory or instructions derived from other sources.
  • the processor is used to implement the method in the foregoing first aspect or second aspect or any possible implementation manner thereof.
  • the chip can be integrated on the first node or the second node.
  • Fig. 1 shows a schematic diagram of a system architecture applying an embodiment of the present application.
  • Fig. 2 shows a schematic diagram of an intelligent network architecture to which an embodiment of the present application is applied.
  • Figure 3 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application
  • Fig. 4 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
  • Fig. 5 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
  • Fig. 6 is a schematic block diagram of an apparatus for training a model provided by an embodiment of the present application.
  • Fig. 7 is a schematic block diagram of another device for training a model provided by an embodiment of the present application.
  • Fig. 8 is a schematic block diagram of another device for training a model provided by an embodiment of the present application.
  • Fig. 9 is a schematic block diagram of another device for training a model provided by an embodiment of the present application.
  • Fig. 1 shows a schematic diagram of a system architecture 100 to which an embodiment of the present application is applied.
  • the system architecture 100 includes a first node 110 and at least one second node 120.
  • the system 100 is, for example, a wireless network
  • the first node 110 may be a centralized node or a central node
  • the second node may be a local node or a distributed edge node, which is not limited in the embodiment of the present application.
  • the first node 110 or the second node 120 may be respectively deployed in a radio access network (RAN), or may be deployed in a core network, or an operation support system (OSS), Alternatively, the second node 120 may be a terminal device in a wireless network, which is not specifically limited in the embodiment of the present application.
  • the first node 110 and the second node 120 may both be deployed in the RAN.
  • the first node 110 may be deployed in the OSS or the core network
  • the second node 120 may be deployed in the RAN.
  • the first node 110 may be deployed in the RAN, and the second node 120 is a terminal device.
  • the first node 110 or the second node 120 in the aforementioned system architecture 100 may be implemented by one device, may be implemented by multiple devices, or may be a functional module in one device, for example, a platform (such as a cloud platform).
  • Fig. 2 shows a schematic diagram of an intelligent network architecture 200 to which an embodiment of the present application is applied.
  • the intelligent network architecture 200 is a layered architecture that can meet the differentiated requirements of different scenarios for computing resources and execution cycles as needed.
  • the intelligent network architecture 200 may specifically be an intelligent wireless network architecture.
  • the intelligent network architecture 200 includes an operation-support system (OSS), at least one cloud radio access network (C-RAN), and an evolved NodeB (eNB) or next generation NodeB (gNB).
  • each C-RAN may include a separate centralized unit (CU) and at least one distributed unit (DU).
  • The OSS is a more centralized node than the C-RAN, eNB, or gNB; the CU in a C-RAN is a more centralized node than the DU; and compared with an eNB or gNB, the C-RAN is a more centralized node.
  • OSS may be referred to as a centralized node
  • C-RAN, eNB, or gNB may be referred to as a local node
  • the CU in C-RAN may be referred to as a centralized node
  • the DU in the C-RAN is called a local node
  • the C-RAN is called a centralized node
  • the eNB or gNB is called a local node.
  • the centralized node may correspond to the first node 110 in FIG. 1, and the local node may correspond to the second node 120 in FIG. 1.
  • At least one of OSS, CU, DU, eNB, and gNB may include a data analysis (DA) function (or network element).
  • the DA function can be deployed at a higher location, such as in the OSS; in this case, it can be referred to as operation-support system data analysis (OSSDA).
  • the DA function (or network element) can also be deployed in 5G CU, 5G DU, 5G all-in-one gNB or eNB. In this case, it can be referred to as radio access network data analysis (RANDA).
  • the DA function can also be deployed independently, which is not limited in the embodiment of the present application.
  • OSSDA or RANDA can provide data integration and programmable feature engineering, algorithm framework integration with rich machine learning algorithm libraries, and a general architecture that supports separation of training and execution.
  • AI-based wireless intelligent services mainly include a closed loop consisting of data collection, feature engineering, algorithm design and training modeling, model evaluation, and prediction execution.
  • The DA function (or network element) may include the following functions: a data service function (DSF), an analysis and modeling function (A&MF), a model execution function (MEF), and an adaptive policy function (APF).
  • DSF mainly completes data collection, data preprocessing, feature engineering and other steps, and provides training data and feature vector subscription services to A&MF and MEF.
  • DSF has the programmability of customized feature engineering of data, and the ability to perform data collection, preprocessing and feature engineering according to the requirements of A&MF training algorithm or MEF prediction model.
  • The role of A&MF is to execute machine learning training algorithms and generate machine learning models.
  • A&MF includes a library of commonly used machine learning algorithms, which sends the machine learning model generated by training to the MEF.
  • MEF receives and installs the model issued by A&MF, subscribes the feature vector to DSF according to A&MF's instructions, completes the prediction, and sends the prediction result and the operation instructions corresponding to the result to APF.
  • APF is the final link in the process flow where execution takes effect.
  • the strategy set is stored in the APF, which completes the conversion from the result of model prediction to the execution strategy.
  • the strategy set includes the prediction result, the operation instruction corresponding to the prediction result, and the corresponding relationship of the execution strategy.
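  • As a small illustration of such a strategy set, the sketch below maps a prediction result to an operation instruction and an execution strategy; the concrete entries are invented here purely for illustration and are not defined by the application:

```python
# Illustrative strategy set stored in the APF: it maps a prediction result to the
# operation instruction and execution strategy (entries are hypothetical examples).
STRATEGY_SET = {
    "congestion_predicted": {"operation": "adjust_handover_threshold",
                             "strategy": "offload_to_neighbor_cell"},
    "normal_load": {"operation": "no_action",
                    "strategy": "keep_current_configuration"},
}

def apply_prediction(prediction_result):
    """APF step: convert a model prediction result into an execution strategy."""
    default = {"operation": "no_action", "strategy": "keep_current_configuration"}
    return STRATEGY_SET.get(prediction_result, default)
```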
  • the logical function in the centralized node can be called a central logical function
  • the logical function in the local node is a local logical function.
  • DSF, A&MF, MEF, and APF deployed at a centralized node can be called centralized DSF (C-DSF), centralized A&MF (C-A&MF), centralized MEF (C-MEF), and centralized APF (C-APF); those deployed at a local node can be called local DSF (L-DSF), local A&MF (L-A&MF), local MEF (L-MEF), and local APF (L-APF).
  • network elements may be deployed according to the characteristics of services and computing resources.
  • the functions deployed by different network elements may be different.
  • the above four functions can be deployed on the local node side: DSF, A&MF, MEF, and APF; on the centralized node side, only DSF, APF, and A&MF can be deployed.
  • Different functions may reside within a single network element or coordinate across network elements.
  • the names of the above-mentioned functions in the embodiments of the present application are only taken as an example.
  • the names of the functions in the wireless network 200 may also be other names, which are not specifically limited in the embodiments of the present application.
  • FIG. 3 shows a schematic flowchart of a method 300 for training a model provided by an embodiment of the present application.
  • the method 300 may be applied to the system architecture 100 shown in FIG. 1 and may also be applied to the intelligent system architecture 200 shown in FIG. 2, but the embodiment of the present application is not limited thereto.
  • this application uses the first node and the second node as examples to describe the method 300 for training a model.
  • For the implementation of the chip in the first node and the chip in the second node, please refer to the specific descriptions of the first node and the second node; the description will not be repeated.
  • In step 310, the first node sends a first request to at least one second node, where the first request is used to request the at least one second node to perform local model retraining on the first shared model based on the local data of the second node.
  • the first shared model may be obtained by the first node performing parameter training on the initial model according to the training data.
  • In step 320, each of the at least one second node performs local model retraining on the first shared model based on the local data of the second node according to the first request, to obtain the first local model.
  • the local model retraining refers to that the second node retrains the parameters of the first shared model based on local data.
  • In step 330, the at least one second node respectively sends a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the increment between the parameters of the first local model and the parameters of the first shared model.
  • the increment between the parameters of the first local model and the parameters of the first shared model is the amount of change of the parameters of the first local model relative to the parameters of the first shared model.
  • In step 340, the first node determines a second shared model according to the model report message of the at least one second node and the first shared model.
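  • The following minimal sketch summarizes one such round from the first node's perspective, reusing the helper functions sketched earlier (`screen_local_models`, `aggregation_weights`, `weighted_average`); the function shape and names are assumptions, not interfaces defined by the application:

```python
def joint_learning_round(shared_params, second_nodes, size_threshold=None, error_threshold=None):
    """One illustrative joint learning round corresponding to steps 310-340.

    Each element of `second_nodes` is assumed to be a callable that, given the
    current shared parameters, retrains locally and returns a LocalModelReport.
    """
    # Step 310: request local retraining; steps 320-330: collect model report messages.
    reports = [retrain_locally(shared_params) for retrain_locally in second_nodes]
    # Step 340: screen the reported local models, aggregate them, and form the
    # second shared model (falling back to all reports if none pass screening).
    selected = screen_local_models(reports, size_threshold, error_threshold) or reports
    weights = aggregation_weights(selected, scheme="data")
    return weighted_average(selected, weights)
```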
  • A first node in the wireless network sends a first request to at least one second node in the wireless network, and each second node can, in response to the first request, retrain the shared model locally based on its local data and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • the second shared model is a new shared model obtained through training in steps 310 to 340
  • the first shared model is an old shared model before the above-mentioned training.
  • Joint learning means that the first node acts as a centralized node, the second node acts as a local node, and the first node and the second node learn together to train the model.
  • Joint learning can overcome some or all of the shortcomings of the centralized training model, as well as some or all of the shortcomings of the local training model.
  • The embodiment of the present application does not require the local node to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the storage and computation pressure of ultra-large-scale data and model training.
  • local nodes in the embodiment of the present application perform distributed model training, which can shorten the time of model training and protect data privacy.
  • The local node sends the locally retrained model to the centralized node, so that the centralized node can update the shared model based on the local models of at least one local node; this helps overcome the problem of insufficient data in local training and thereby improves the accuracy and generalization ability of the trained model.
  • the scheme of joint learning to train a model in a wireless network can help to obtain a training model with high accuracy and generalization ability.
  • the "parameter of the model” included in the model report message is used to indicate the model that the model report message needs to report.
  • “parameters of the model” can be replaced with “models”, and the two have equivalent meanings.
  • the model report message may be described as including the first local model or the increment between the first local model and the first shared model.
  • the first node sends the first request for joint learning training to the second node when the joint learning start condition is satisfied.
  • the joint learning start condition may be, for example, that the first node cannot obtain training data, or that the computing pressure of the first node exceeds a certain indicator.
  • the first request may include the information of the first sharing model, so that the second node determines the first sharing model according to the first request.
  • the information of the first shared model includes at least one of the following: model identification, model type, model structure, input and output, and initial model parameters or training data collection duration of the first shared model.
  • the first request may also include the upload strategy of the first local model.
  • the first node sends the upload strategy of the first local model to the second node, so that the first node can instruct the second node how to upload the first local model.
  • the first local model is the local model uploaded by the second node.
  • the first local model is a model obtained by performing local model retraining on the first shared model according to the local data of the at least one second node.
  • the second node can retrain locally according to the upload strategy to obtain the local model and perform corresponding processing operations.
  • the upload strategy of the first local model includes at least one of the following: an identifier of the processing algorithm used before the first local model is uploaded, an upload time of the first local model, or information carried with the first local model.
  • the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
  • The processing algorithm applied before the first local model is uploaded includes, for example, an incremental operation algorithm that computes the increment between the locally retrained model and the first shared model issued by the first node, a compression algorithm that compresses the model through algorithms such as parameter pruning, quantization, low-rank decomposition, and sparse optimization, or an encryption algorithm that encrypts the model through layer obfuscation or by converting parameters into codes, which is not limited in the embodiments of the present application.
  • the training data set of the first local model is the training data set used when the second node retrains the first shared model locally based on the local data.
  • the size of the training data set is, for example, the amount of training data in the training data set, which is not limited in the embodiment of the present application.
  • a prediction data set can be used to evaluate the local model to obtain the prediction error of the local model.
  • the prediction error of the local model is, for example, the mean absolute error (MAE) of the local model, or the mean squared error (MSE), which is not limited in the embodiment of the present application.
  • the model report message may further include the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
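  • For illustration, the prediction error carried in the report could be computed as a mean absolute error or mean squared error as sketched below (plain-Python versions; the way the fields are filled into `LocalModelReport` is an assumption carried over from the earlier sketch):

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute deviation between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """MSE: average squared deviation between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# A second node could then fill the optional report fields like this (illustrative):
# report = LocalModelReport(parameters=local_params,
#                           train_set_size=len(training_data),
#                           prediction_error=mean_absolute_error(y_true, y_pred))
```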
  • step 340 may specifically be:
  • the first node determines a second local model in at least one first local model corresponding to the at least one second node, wherein the size of the training data set corresponding to the second local model is greater than or equal to the first threshold, and/or the prediction error of the second local model is less than or equal to the second threshold. Then, the first node determines a second shared model according to the second local model and the first shared model.
  • the number of the second local model may be one or more, which is not limited in the embodiment of the present application.
  • When the local training data is sufficient, the accuracy of the local model obtained by training is high and its generalization ability is strong; when the local training data is insufficient, the accuracy of the trained local model is low and its generalization ability is weak.
  • the first node selects, from the at least one first local model, at least one second local model that satisfies the screening conditions, and the accuracy or generalization ability of the at least one second local model is higher than that of the at least one first local model.
  • the first node may delete the local model whose size of the training data set in the first local model is less than the first threshold, or the prediction error is greater than the second threshold.
  • In other words, a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold is selected from the at least one first local model, and the second local model is then used to determine the second shared model. Since the accuracy of the second local model in the embodiment of this application is higher than that of the first local model, the embodiment of this application can help improve the accuracy of the second shared model.
  • the first node may perform weighted average aggregation on the parameters of the second local model, or on the increment between the parameters of the second local model and the parameters of the first shared model, wherein the weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model and/or the prediction error of the second local model. Then, the first node determines the second shared model based on the weighted average aggregation result and the first shared model.
  • when the model report message includes the parameters of the first local model, the first node performs weighted average aggregation on the parameters of the second local model after determining the second local model.
  • the first node may determine the result obtained by the weighted average aggregation of the at least one second local model as the second shared model.
  • when the model report message includes the increment, the first node performs weighted average aggregation on the increments between the parameters of the second local model and the parameters of the first shared model.
  • In this case, the first node determines the result obtained by the weighted average aggregation of the at least one second local model as the increment of the first shared model, and then determines the sum of the first shared model and this increment as the second shared model.
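  • A minimal sketch of this increment-based variant is shown below (again using plain parameter vectors; the function names are illustrative only):

```python
def parameter_increment(local_params, shared_params):
    """Increment of the local model parameters relative to the first shared model parameters."""
    return [l - s for l, s in zip(local_params, shared_params)]

def apply_aggregated_increment(shared_params, increments, weights):
    """Second shared model = first shared model + weighted average of the reported increments."""
    dim = len(shared_params)
    aggregated = [sum(w * inc[i] for w, inc in zip(weights, increments)) for i in range(dim)]
    return [s + d for s, d in zip(shared_params, aggregated)]
```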
  • the weight parameter may be the reciprocal of the total number of second local models, and at this time, the weight parameters of each second local model are the same.
  • the weight parameter of the second local model may be the ratio of the size of the training data set corresponding to the second local model to the size of all training data sets, where the size of all training data sets is corresponding to each second local model The sum of the size of the training data set.
  • the weight parameter of each second local model may be the inverse of the corresponding MAE. It should be understood that the foregoing examples of weight parameters are merely examples, and the embodiments of the present application are not limited thereto.
  • the first node may not filter at least one second local model from the at least one first local model, but directly determine the second shared model based on the at least one first local model.
  • the parameters of the first local model or the increments between the parameters of the first local model and the parameters of the shared model may be aggregated by weighted average.
  • the weight parameter of each first local model may be the reciprocal of the total number of first local models. Then, the first node determines the second sharing model based on the weighted average aggregation result and the first sharing model.
  • the first node or the second node may determine whether the prediction error of the second sharing model is less than a preset threshold.
  • When the prediction error of the second shared model is less than the preset threshold, it means that the accuracy of the second shared model can meet the requirements.
  • the accuracy of the second sharing model can be determined based on the prediction data set.
  • the first node can determine whether the prediction error of the second sharing model is less than the third threshold.
  • when the prediction error of the second shared model is less than or equal to the third threshold, the first node updates the first shared model to the second shared model, and the first node sends a model update notification message to each of the at least one second node
  • the model update notification message is used to request each second node to install a second shared model.
  • when the prediction error of the second shared model is greater than the third threshold, the first node does not send the model update notification message to the second node.
  • In this case, the first node may delete the second shared model without updating the saved first shared model.
  • the second node may determine whether the prediction error of the second sharing model is less than the fourth threshold.
  • the first node may send a model update notification message to at least one second node, where the model update notification message is used to notify the second sharing model.
  • the second node can determine whether the prediction error of the second shared model indicated by the model update notification message is less than the fourth threshold. When the second node determines that the prediction error of the second sharing model is less than or equal to the fourth threshold, the second sharing model is installed. When the second node determines that the prediction error of the second sharing model is greater than the fourth threshold, the second sharing model is not installed.
  • the second node does not need to send its locally stored prediction data set to the first node, which can reduce communication signaling overhead between network elements.
  • the third threshold and the fourth threshold may be the same or different, which is not limited in the embodiment of the present application.
  • The embodiment of the present application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set to the third threshold or the fourth threshold, which is not limited in the embodiment of the present application.
  • the prediction error of the second shared model may also be compared with the prediction error of the first shared model. If the prediction error of the second shared model is smaller than that of the first shared model, the first node saves the second shared model and the second node installs the second shared model. If the prediction error of the second shared model is greater than or equal to that of the first shared model, the first node does not update the first shared model and the second node does not install the second shared model.
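  • A compact sketch of this update decision, covering both the threshold variant and the comparison with the old shared model (threshold and argument names are illustrative), is:

```python
def should_update_shared_model(new_error, old_error=None, threshold=None):
    """Decide whether the first node saves, and the second node installs, the new shared model.

    Either an absolute error threshold (the third/fourth threshold above) or the
    prediction error of the old (first) shared model may serve as the criterion.
    """
    if threshold is not None:
        return new_error <= threshold
    if old_error is not None:
        return new_error < old_error
    return False
```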
  • FIG. 4 shows a schematic flowchart of a method 400 for training a model provided by an embodiment of the present application. It should be understood that FIG. 4 shows the steps or operations of the method for training a model, but these steps or operations are only examples, and the embodiment of the present application may also perform other operations or variations of each operation in FIG. 4. In addition, the various steps in FIG. 4 may be performed in a different order from that presented in FIG. 4, and it is possible that not all operations in FIG. 4 are to be performed.
  • the first node in Figure 4 includes centralized APF (C-APF), centralized DSF (C-DSF), and centralized A&MF (C-A&MF), and the second node includes local APF (L-APF), local MEF (L-MEF), local DSF (L-DSF), and local A&MF (L-A&MF).
  • the C-APF in the first node sends a joint learning strategy to C-A&MF.
  • a joint learning strategy can be stored in the C-APF, and the joint learning strategy is used to indicate how the first node and the second node perform joint learning.
  • the joint learning strategy may include at least one of the following: joint learning start conditions, shared model information, joint learning group member identification, local model upload strategy, local model screening strategy, local model aggregation strategy, Local model processing strategy or shared model update strategy.
  • the starting condition of the joint learning is, for example, that the C-DSF cannot obtain the subscription data, or the computing resource of the C-A&MF exceeds a certain threshold, which is not limited in the embodiment of the present application.
  • Centralized training can be used for model training when the C-DSF can obtain subscription data, when the model training computation is small, when the computing resources of the C-A&MF are sufficient, or when the computing resources of the L-A&MF are insufficient.
  • the identification of the members of the joint learning group may include the identification of at least one second node participating in the joint learning, and for example, may include the identification of L-A&MF in each second node of the at least one second node participating in the joint learning.
  • The embodiment of the present application does not limit this.
  • the local model refers to the local model obtained by the second node after retraining the model based on local data, for example, including the first local model described in FIG. 3.
  • the screening strategy of the local model refers to a strategy for the first node (or C-A&MF in the first node) to screen out the second local model that meets the screening condition from at least one first local model. For example, it may include determining in the first local model that the size of the training data set is greater than or equal to the first threshold, and/or the local model whose prediction error is less than or equal to the second threshold.
  • the local model screening strategy may be a model screening rule identifier, which is not limited in the embodiment of the present application.
  • the aggregation strategy of the local model is used to indicate the aggregation algorithm used by the first node (or the C-A&MF in the first node) to aggregate the local model and the calculation method of the weight parameter.
  • the aggregation strategy of the local model may be a model aggregation algorithm identification.
  • the aggregation of local models can also be replaced with the integration of local models, and the two have the same meaning.
  • the processing strategy of the local model is used to instruct the first node (or C-A&MF in the first node) to process the acquired local model.
  • the processing algorithm includes, for example, an incremental operation algorithm that performs an incremental operation between a model obtained by local retraining and a shared model issued by the first node, or a compression algorithm that performs model compression through algorithms such as parameter pruning, quantization, low-rank decomposition, and sparse optimization. Or encryption algorithms that perform model encryption through layer obfuscation or conversion of parameters into codes, which are not limited in the embodiment of the present application.
  • the processing strategy of the local model may be the model processing algorithm identification.
  • the shared model update strategy is used to instruct the first node (or C-A&MF in the first node) to update the shared model. For example, when the prediction error of the new shared model is less than or equal to a certain threshold, the old shared model is updated to the new shared model. Or, in the case where the prediction error of the new shared model is less than or equal to the prediction error of the old shared model, the old shared model is updated to the new shared model.
  • the C-A&MF in the first node sends a joint learning strategy issuance response to the C-APF, which is used to indicate that the C-A&MF receives the aforementioned joint learning strategy.
  • the first node and the second node perform data collection, model training, and model application.
  • the L-DSF in the second node reports the collected data to the C-A&MF in the first node, and the C-A&MF performs model training to obtain a shared model. Then, C-A&MF delivers the shared model to L-MEF for model application.
  • C-A&MF in the first node can send a data subscription request to L-DSF in the second node.
  • after receiving the data subscription request, the L-DSF sends a data subscription response to the C-A&MF, which carries the local data.
  • the local data may include a training data set or a prediction data set, which is not limited in the embodiment of the present application.
  • step 403 is performed to train and obtain a shared model, and the shared model here may also be referred to as the initial shared model.
  • subsequent steps 404 to 422 update the initial shared model generated in step 403 to compensate for the poor accuracy of the initial shared model caused by insufficient or incomplete data, so that the shared model still has high accuracy and generalization ability when the network status in the wireless network changes.
  • the C-A&MF in the first node sends a joint learning training request to the L-A&MF in the second node.
  • the C-A&MF in the first node determines whether the joint learning start condition is satisfied according to the joint learning start condition indicated in the joint learning strategy. For example, when the C-A&MF determines that the subscription data cannot be obtained from the L-DSF of the second node, or that the computing resource occupation of the C-A&MF exceeds a preset threshold, it determines that the joint learning start condition is met. When the joint learning start condition is met, the C-A&MF sends a joint learning training request to the L-A&MF. Otherwise, if the joint learning start condition is not satisfied, centralized model training as in step 403 is performed.
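  • A minimal sketch of this start-condition check, assuming the two conditions are available as a boolean and a utilization ratio; the function name and threshold value are illustrative assumptions.

```python
# Minimal sketch of how C-A&MF might evaluate the joint learning start condition
# indicated in the joint learning strategy. Condition names and the threshold
# are assumptions for illustration.

def joint_learning_should_start(subscription_data_available, compute_usage, usage_threshold=0.8):
    """Start joint learning if subscription data cannot be obtained from L-DSF,
    or if the computing resource occupation of C-A&MF exceeds the threshold."""
    return (not subscription_data_available) or (compute_usage > usage_threshold)

if joint_learning_should_start(subscription_data_available=False, compute_usage=0.3):
    print("send joint learning training request to L-A&MF")
else:
    print("fall back to centralized training (step 403)")
```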
  • the joint learning training request may correspond to a specific example of the first request in FIG. 3.
  • the joint learning training request may refer to the description in the first request in FIG. 3, and to avoid repetition, the details are not repeated here.
  • the L-A&MF in the second node sends a local joint learning strategy request to the L-APF.
  • the local joint learning strategy is used to indicate whether the second node performs local model retraining on the shared model.
  • the local joint learning strategy request includes the information of the shared model.
  • the information of the shared model can be obtained from the joint learning training request in step 404.
  • the information of the shared model can be referred to the above description, for the sake of brevity, it will not be described here.
  • the L-APF in the second node sends a local joint learning strategy response to the L-A&MF.
  • L-APF determines whether to perform local model retraining on the shared model according to the utilization of local computing resources, that is, whether the second node participates in local joint learning.
  • the local joint learning strategy response may be an indicator of whether to participate in the local joint learning.
  • the local joint learning strategy may also include a model update strategy.
  • the model update strategy may indicate that when the prediction error of the new shared model is less than the prediction error of the old shared model or is less than or equal to a certain preset threshold, the old shared model is updated, otherwise the old shared model is continued to be used.
  • the old shared model is, for example, the initial shared model obtained in step 404, and the new shared model is, for example, the shared model obtained in step 413.
  • the local joint learning strategy is sent to the L-A&MF through the L-APF, so that the second node can determine whether to participate in joint learning according to its own computing capability, thereby avoiding that insufficient computing capability of the second node prolongs the joint learning iteration time, and improving the efficiency of joint learning.
  • steps 405 and 406 may not be performed. Instead, the second node always participates in the joint learning when receiving the joint learning training request, which is not limited in the embodiment of the present application.
  • the L-A&MF in the second node sends a joint learning training request response to the C-A&MF in the first node.
  • the L-A&MF in the second node sends a data subscription request to the L-DSF.
  • L-A&MF can send a data subscription request to L-DSF according to the model input and output in the joint learning training request and the model training data collection time.
  • the data subscription request can carry the data representation and the data collection time.
  • the L-DSF in the second node sends a data subscription response to L-A&MF.
  • L-DSF collects data according to the data subscription request in 408, and sends the collected data to L-A&MF.
  • the L-A&MF in the second node performs model retraining and model processing.
  • L-A&MF retrains the local shared model according to the information of the shared model issued in step 404 and the training data obtained in 409.
  • before uploading the local model, the L-A&MF can also perform local model processing according to the identification of the processing algorithm issued in step 404, for example, performing an incremental operation between the local model obtained by the retraining and the shared model issued in step 404 (as an example, the increment between the local model parameters and the shared model parameters can be calculated).
  • the model can also be compressed by algorithms such as parameter pruning, quantization, low-rank decomposition, and sparse optimization, and model encryption can be performed by methods such as layer obfuscation or conversion of parameters to codes.
  • the L-A&MF in the second node sends a local model upload notification to the C-A&MF in the first node.
  • the local model upload notification includes the model identification of the shared model, the processed local model, the size of the training data set corresponding to the local model, the prediction error of the local model, etc., which are not limited in the embodiment of the present application.
  • the local model upload notification may correspond to an example of the model report message in FIG. 3.
  • the local model upload notification can refer to the description of the model report message in FIG. 3, and to avoid repetition, it will not be repeated here.
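  • For illustration only, the following Python snippet sketches the kind of content a local model upload notification might carry according to the description above; the field names and values are assumptions, not a defined message format.

```python
# Illustrative sketch of the content a local model upload notification might
# carry. The dictionary layout, field names, and values are assumptions.

local_model_upload_notification = {
    "model_id": "shared-model-001",        # model identification of the shared model
    "local_model": {                        # processed local model (e.g. parameter deltas)
        "parameters": [0.02, -0.05, 0.01],
    },
    "dataset_size": 5000,                   # size of the training data set used for retraining
    "prediction_error": 0.04,               # prediction error of the local model
}
print(local_model_upload_notification)
```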
  • the C-A&MF in the first node sends a local model upload notification response to the L-A&MF in the second node.
  • the C-A&MF in the first node performs model screening, aggregation, and processing.
  • the C-A&MF selects, from the received at least one local model (such as the at least one first local model described above), at least one local model that satisfies the condition (such as the at least one second local model described above) according to the local model screening strategy indicated in step 401.
  • the method of screening by C-A&MF can be referred to the description above. For brevity, it will not be described in detail here.
  • C-A&MF can aggregate at least one local model selected according to the local model aggregation strategy, such as weighted average aggregation.
  • the manner in which C-A&MF performs aggregation can be referred to the above description. For brevity, it will not be described in detail here.
  • C-A&MF can process the aggregated model according to the local model processing strategy, such as compression or encryption.
  • the processing method of C-A&MF can be referred to the above description. For brevity, it will not be described in detail here.
  • each step shown in 4A in FIG. 4 is executed.
  • 4A includes 414 to 416.
  • the C-A&MF in the first node sends a model update request #1 to the L-MEF in the second node.
  • C-A&MF may test the shared model obtained in step 413 according to the test data in the test data set, and determine the prediction error of the shared model obtained in step 413.
  • the sharing model obtained in 413 may also be referred to as a new sharing model.
  • the new sharing model may correspond to an example of the second sharing model in FIG. 3.
  • if the C-A&MF determines that the prediction error of the new shared model is less than a certain threshold, it sends the above model update request #1, where the model update request #1 may include the model identification of the new shared model and the parameters of the fused model.
  • the parameters of the model may include at least one of weight parameters, bias parameters, or activation function information of each layer.
  • model update request #1 may be an example of the model update notification message sent when the first node in FIG. 3 determines that the prediction error of the second shared model is less than the third threshold.
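  • As an illustrative sketch only, the following Python snippet shows one way the decision in step 414 could be expressed, assuming the prediction error is estimated as the mean absolute error on a test data set; the function names, threshold value, and example data are assumptions.

```python
# Minimal sketch of step 414's decision, assuming the prediction error of the
# new shared model is estimated as the mean absolute error on a test data set.
# The threshold corresponds to the "certain threshold" (third threshold) above.

def mean_absolute_error(predictions, labels):
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)

def maybe_send_model_update(predictions, labels, threshold=0.05):
    error = mean_absolute_error(predictions, labels)
    if error < threshold:
        return {"request": "model update request #1", "prediction_error": error}
    return None   # keep the old shared model, do not notify L-MEF

print(maybe_send_model_update([0.98, 0.03, 0.49], [1.0, 0.0, 0.5]))
```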
  • steps 414 to 416 may not be performed.
  • L-MEF performs model update installation. Specifically, L-MEF replaces the current parameters of the model with the parameters of the model issued in 414.
  • for example, the L-MEF can replace the weight parameters, bias parameters, or activation functions of each layer of the neural network with the parameters of the model delivered in step 414.
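  • As a minimal illustrative sketch (not a defined procedure of the embodiment), the following Python snippet shows a per-layer parameter replacement of the kind described above; the dictionary layout, layer names, and values are assumptions.

```python
# Minimal sketch of model update installation at L-MEF: the installed model's
# per-layer parameters are replaced by those delivered in the model update
# request. The dictionary layout is an assumption for illustration.

def install_model(current_model, delivered_params):
    """Replace the installed model's per-layer parameters with the delivered ones."""
    for layer_name, layer_params in delivered_params.items():
        current_model[layer_name].update(layer_params)
    return current_model

installed = {"layer1": {"weights": [0.1, 0.2], "bias": 0.0, "activation": "relu"}}
update = {"layer1": {"weights": [0.15, 0.18], "bias": 0.01}}
print(install_model(installed, update))
```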
  • model update response #1 may indicate that the model update has been completed.
  • the steps shown in 4A in FIG. 4 can be replaced with the steps shown in 4B in FIG. 4.
  • 4B includes 417 to 421.
  • the C-A&MF in the first node sends a model update request #2 to the L-A&MF in the second node.
  • after the C-A&MF obtains the new shared model through step 413, it can send a model update request #2 to the L-A&MF, where the model update request #2 can include the model identification of the new shared model and the parameters of the fused model.
  • the parameters of the model may include at least one of weight parameters, bias parameters, or activation function information of each layer.
  • model update request #2 may be an example of the model update notification message sent by the first node to the second node before the second node in FIG. 3 determines whether the prediction error of the updated shared model is less than the fourth threshold.
  • the L-A&MF in the second node sends a model update response #2 to the C-A&MF in the first node.
  • the model update response #2 can be used to notify the first node that the model update request #2 is received.
  • the L-A&MF in the second node sends a model installation request to the L-MEF.
  • the L-A&MF can determine, according to the model update strategy, whether the prediction error of the new shared model is greater than a preset threshold or greater than that of the old shared model. As an example, when the prediction error of the new shared model is greater than or equal to that of the old shared model, the second node does not install the new shared model, and steps 419 to 421 are not executed. When the prediction error of the new shared model is smaller than that of the old shared model, the second node updates and installs the new shared model, and steps 419 to 421 are executed.
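  • A minimal sketch of this update decision, assuming the prediction errors are available as scalar values; the function name and the optional threshold handling are illustrative assumptions.

```python
# Minimal sketch of the model update strategy applied by L-A&MF in step 419:
# the new shared model is installed only if its prediction error is smaller
# than that of the old shared model (an additional threshold check is optional).

def should_install(new_error, old_error, error_threshold=None):
    if error_threshold is not None and new_error > error_threshold:
        return False
    return new_error < old_error

print(should_install(new_error=0.03, old_error=0.05))  # True: send model installation request
print(should_install(new_error=0.06, old_error=0.05))  # False: keep the old shared model
```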
  • the model installation request can carry the model identifier of the new shared model and the parameters of the fused model.
  • the model identification and the parameters of the fused model can be referred to the above description, for the sake of brevity, it will not be described here.
  • the L-MEF in the second node performs model update installation.
  • 420 may refer to the description in 415, and for brevity, it is not described here.
  • the L-MEF in the second node sends a model installation response to L-A&MF, and the model installation response may indicate that the model update has been completed.
  • the second node performs model application.
  • the L-MEF in the second node subscribes to the L-DSF the data required for model prediction, and performs model prediction. Then, the prediction result is sent to the local APF for policy execution.
  • after the first node and the second node start the joint learning, the steps of the joint learning can be executed cyclically.
  • a joint learning stop condition can be set.
  • the first node and the second node can stop the joint learning.
  • the joint learning stop condition may include the execution duration of the joint learning or limited resources of the second node. That is, the embodiment of the present application may stop the joint learning after the execution duration has elapsed since the joint learning was started, or stop the joint learning when some or all of the resources of the second node are limited.
  • the joint learning stop condition may be included in the joint learning strategy, or pre-configured in the first node or the second node, which is not limited in the embodiment of the present application.
  • the embodiment of the present application manages the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, including orchestrating and managing at least one of the joint learning start conditions, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and the like. Based on this, the embodiment of the present application can realize that, without the local data of the second node being uploaded to the first node, the first node and the second node learn together to obtain a training model with high accuracy and generalization ability.
  • FIG. 5 shows a schematic flowchart of a method 500 for training a model provided by an embodiment of the present application. It should be understood that FIG. 5 shows the steps or operations of the method for training a model, but these steps or operations are only examples, and the embodiment of the present application may also perform other operations or variations of each operation in FIG. 5. In addition, the various steps in FIG. 5 may be performed in a different order from that presented in FIG. 5, and it is possible that not all operations in FIG. 5 are to be performed.
  • CUDA and at least one DUDA are taken as examples for description.
  • CUDA may correspond to an example of the first node above
  • DUDA may correspond to an example of the second node above.
  • the interactive information during joint learning between the first node and the second node can be directly transmitted through the interface between the first node and the second node.
  • FIG. 5 takes CUDA and DUDA as examples, but the embodiments of this application are not limited thereto.
  • CUDA can also be replaced with gNB or eNB or cell
  • DUDA can be replaced with terminal equipment served by the gNB, eNB, or cell.
  • CUDA can also be replaced with CU
  • DUDA can be replaced with DU managed by the CU.
  • CUDA can also be replaced with C-RAN
  • DUDA can be replaced with an eNB or gNB managed by the C-RAN.
  • CUDA may also be replaced by an eNB or gNB
  • DUDA may be replaced by a cell managed by the eNB or gNB
  • CUDA can also be replaced with OSS
  • DUDA can be replaced with network elements managed by OSS.
  • alternatively, both CUDA and DUDA can be replaced with a gNB or an eNB. The embodiment of the application does not specifically limit this.
  • CUDA sends a joint learning training request to at least one DUDA.
  • CUDA may send a joint learning training request to each DUDA of at least one DUDA when the joint learning start condition is satisfied.
  • for the joint learning training request, reference may be made to the above description. For brevity, details are not repeated here.
  • Each DUDA of at least one DUDA sends a joint learning training request response to CUDA.
  • Each DUDA performs local model training and processing.
  • DUDA performs data subscription, local model training, and processing according to the instructions in step 501.
  • for details, please refer to the description of 410 in FIG. 4 above. For brevity, details are not described herein again.
  • Each DUDA sends a local model upload notification to CUDA.
  • CUDA sends a local model upload notification response to each DUDA.
  • 504 and 505 can be referred to the descriptions of 411 and 412 in FIG. 4 above. For brevity, details are not repeated here.
  • CUDA performs model screening, fusion, and processing.
  • 506 can refer to the description of 413 in FIG. 4 above, and for the sake of brevity, it will not be repeated here.
  • Each DUDA performs model update installation and model application.
  • in the embodiment of the present application, CUDA sends a joint learning training request to at least one DUDA, and each DUDA can locally retrain the shared model indicated by CUDA according to the joint learning training request; each DUDA then reports the trained local model to CUDA, and CUDA can fuse and process the local models reported by the at least one DUDA to determine a new shared model.
  • the embodiment of the present application can transmit the interactive information during joint learning through the interface between the CU and the DU in the CU-DU split architecture, and based on this, obtain a shared model with high accuracy and generalization ability.
  • FIG. 6 shows a schematic structural diagram of an apparatus 600 for training a model in a wireless network provided by the present application.
  • the apparatus 600 for training a model may be the first node in the wireless network.
  • the device 600 for training a model includes: a sending unit 610, a receiving unit 620, and a determining unit 630.
  • the sending unit 610 is configured to send a first request to at least one second node in the wireless network, where the first request is used to request the at least one second node to perform local model retraining on the first shared model based on the local data of the second node.
  • the receiving unit 620 is configured to obtain a model report message from each of the at least one second node, where the model report message of each second node includes the parameters of the first local model, or includes the increment between the parameters of the first local model and the parameters of the first shared model, and the first local model is obtained after each second node performs local model retraining on the first shared model based on the first request and the local data.
  • the determining unit 630 is configured to determine the second sharing model according to the model report message of the at least one second node and the first sharing model.
  • therefore, the first node in the wireless network sends a first request to at least one second node in the wireless network; each second node can perform local retraining on the shared model based on its local data according to the first request, and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model;
  • the determining unit 630 is specifically configured to:
  • a second local model is determined among at least one first local model corresponding to the at least one second node, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold;
  • in this way, a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold is selected from the at least one first local model, and the second shared model is then determined through the second local model. Since the accuracy or generalization ability of the second local model in the embodiment of this application is higher than that of the first local model, the embodiment of the present application can help improve the accuracy and generalization ability of the second shared model.
  • the first node may not filter the at least one second local model from the at least one first local model, but directly determine the second shared model based on the at least one first local model.
  • the determining unit 630 is specifically configured to:
  • perform weighted average aggregation on the parameters of the second local model, or on the increment between the parameters of the second local model and the parameters of the first shared model, where the weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model, and/or the prediction error of the second local model;
  • the first node determines the second sharing model according to the weighted average aggregation result and the first sharing model.
  • the first node includes a centralized adaptive strategy function and a centralized analysis and modeling function:
  • the centralized adaptive strategy function sends a joint learning strategy to the centralized analysis and modeling function, where the joint learning strategy includes at least one of the following information:
  • joint learning start conditions, information of the first shared model, identification of members of the joint learning group, upload strategy of the first local model, screening strategy of the first local model, aggregation strategy of the first local model, processing strategy of the first local model, or shared model update strategy.
  • the embodiment of the present application manages the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, including orchestrating and managing at least one of the joint learning start conditions, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and the like. Based on this, the embodiment of the present application can realize that, without the local data of the second node being uploaded to the first node, the first node and the second node learn together to obtain a training model with high accuracy and generalization ability.
  • the second node includes local analysis and modeling functions
  • the sending unit 610 is specifically configured to:
  • the centralized analysis and modeling function sends the first request to the local analysis and modeling function of each second node in the at least one second node, where the first request includes the information of the first shared model.
  • the information of the first sharing model includes at least one of the following:
  • the first request further includes the upload strategy of the first local model.
  • the upload strategy of the first local model includes at least one of the following:
  • the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
  • the determining unit 630 is further configured to determine that the prediction error of the second sharing model is less than or equal to a third threshold.
  • the sending unit 610 is further configured to send a model update notification message to the at least one second node respectively, and the model update notification message is used to request The second shared model is installed in each of the second nodes.
  • the embodiment of the present application judges the prediction error of the second sharing model, and when it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second sharing model, and the second node installs the second sharing model. When it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model, and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, and further ensure the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set as the third threshold, which is not limited in the embodiment of the present application.
  • the sending unit 610 and/or the receiving unit 620 may also be collectively referred to as a transceiver unit (module), or a communication unit, which may be respectively used to perform the steps of receiving and sending by the first node in the method embodiment.
  • the processing unit 630 is configured to generate instructions sent by the sending unit 610, or process instructions received by the receiving unit 620.
  • the device 600 for training a model may further include a storage unit for storing instructions executed by the sending unit, the receiving unit, and the processing unit.
  • the device 600 for training the model is the first node in the method embodiment, and may also be a chip in the first node.
  • the processing unit may be a processor, and the sending unit and the receiving unit may be transceivers.
  • the apparatus for training a model may further include a storage unit, and the storage unit may be a memory.
  • the storage unit is used to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the communication device executes the foregoing method.
  • the processing unit may be a processor, and the sending unit and the receiving unit may be input/output interfaces, pins, or circuits, etc.; the processing unit executes the instructions stored in the storage unit, so that the chip performs the foregoing method.
  • the storage unit may be a storage unit in the chip (for example, a register, a cache, etc.), or may be a storage unit (for example, a read-only memory, a random access memory, etc.) located outside the chip in the terminal device.
  • the sending unit 610 and the receiving unit 620 may be implemented by a transceiver.
  • the processing unit can be implemented by a processor.
  • the storage unit can be realized by a memory.
  • the apparatus 700 for training a model may include a processor 710, a memory 720, and a transceiver 730.
  • the device 700 for training the model may be the first node in the wireless network.
  • the device 600 for training a model shown in FIG. 6 or the device 700 for training a model shown in FIG. 7 can implement the steps performed by the first node in the foregoing embodiment.
  • FIG. 8 shows a schematic structural diagram of an apparatus 800 for training a model provided in this application.
  • the device 800 for training the model may be the second node in the wireless network.
  • the device 800 for training a model includes: a receiving unit 810, a processing unit 820, and a sending unit 830.
  • the receiving unit 810 is configured to receive a first request from a first node in the radio access network.
  • the processing unit 820 is configured to perform local model retraining on the first shared model based on the local data of the second node according to the first request to obtain the first local model.
  • the sending unit 830 is configured to send a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the parameters of the first local model and the first shared model. The increment between the parameters, the model report message is used to update the first shared model.
  • therefore, the first node in the wireless network sends a first request to at least one second node in the wireless network; each second node can perform local retraining on the shared model based on its local data according to the first request, and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through the model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model.
  • the embodiment of the present application can implement joint learning in a wireless network.
  • the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
  • the first node can, based on the size of the training data set corresponding to the first local model and/or the prediction error carried in the model report message, screen out a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold, and then determine the second shared model through the second local model. Since the accuracy or generalization ability of the second local model in the embodiment of the application is higher than that of the first local model, the embodiment of the application can help improve the accuracy and generalization ability of the second shared model.
  • the second node includes local analysis and modeling functions, and the first node includes centralized analysis and modeling functions;
  • the second node receiving the first request from the first node in the radio access network includes:
  • the local analysis and modeling function in the second node obtains the first request from the centralized analysis and modeling function in the first node, where the first request includes the information of the first shared model.
  • the first request further includes the upload strategy of the first local model.
  • the second node can retrain locally according to the upload strategy to obtain the local model and perform corresponding processing operations.
  • the upload strategy of the first local model includes at least one of the following:
  • the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
  • the second node also includes a local adaptive policy function
  • the local analysis and modeling function sends a third request to the local adaptive policy function, where the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request includes the information of the first shared model;
  • the local analysis and modeling function receives the local joint learning strategy sent by the local adaptive policy function, and when the local joint learning strategy indicates that local model retraining is to be performed, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
  • the embodiment of the present application sends the local joint learning strategy to the local analysis and modeling function through the local adaptive policy function, so that the second node can determine whether to participate in the joint learning according to its own computing capability, thereby avoiding that insufficient computing capability of the second node prolongs the joint learning iteration time, and improving the efficiency of joint learning.
  • optionally, the local adaptive policy function may not send the local joint learning strategy to the local analysis and modeling function; instead, the second node always participates in the joint learning when it receives a joint learning training request.
  • the information of the first sharing model includes at least one of the following:
  • the second node receives a model update notification message sent by the first node, where the model update notification message is used to notify the second node of a second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model;
  • the embodiment of the present application judges the prediction error of the second sharing model, and when it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second sharing model, and the second node installs the second sharing model. When it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model, and the second node does not install the second shared model. Based on this, the embodiments of the present application can avoid installing a shared model with low accuracy, and further ensure the accuracy and generalization ability of the updated shared model.
  • the prediction error of the first sharing model may be set as the fourth threshold, which is not limited in the embodiment of the present application.
  • the receiving unit 810 and/or the sending unit 830 may also be collectively referred to as a transceiver unit (module), or a communication unit, which may be respectively used to perform the steps of receiving and sending by the second node in the method embodiment.
  • the processing unit 820 is also configured to generate instructions sent by the sending unit 830, or process instructions received by the receiving unit 810.
  • the communication device 800 may further include a storage unit for storing instructions executed by the communication unit and the processing unit.
  • the device 800 for training the model is the second node in the method embodiment, and may also be a chip in the second node.
  • the processing unit may be a processor, and the sending unit and the receiving unit may be transceivers.
  • the device may further include a storage unit, and the storage unit may be a memory. The storage unit is used to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the communication device executes the foregoing method.
  • the processing unit may be a processor, and the sending unit and the receiving unit may be input/output interfaces, pins, or circuits, etc.; the processing unit executes the instructions stored in the storage unit, so that the communication device performs the operations performed by the network device in the foregoing method embodiments.
  • the storage unit may be a storage unit in the chip (for example, a register, a cache, etc.), or may be a storage unit (for example, a read-only memory, a random access memory, etc.) located outside the chip in the communication device.
  • the sending unit 830 and the receiving unit 810 may be implemented by a transceiver, and the processing unit 820 may be implemented by a processor.
  • the storage unit can be realized by a memory.
  • the apparatus 900 for training a model may include a processor 910, a memory 920, and a transceiver 930.
  • the device 900 for training the model may be the second node in the wireless network.
  • the device 800 for training a model shown in FIG. 8 or the device 900 for training a model shown in FIG. 9 can implement the steps performed by the second node in the foregoing method embodiment.
  • the device for training a model in each of the above device embodiments corresponds to the first node or the second node in the method embodiments, and the corresponding modules or units execute the corresponding steps.
  • the transceiver unit (or the communication unit, or the transceiver) executes the sending and/or receiving steps in the method embodiments (or these steps are executed by the sending unit and the receiving unit respectively), and the steps other than sending and receiving can be performed by the processing unit (processor).
  • the sending unit and the receiving unit may form a transceiver unit, and the transmitter and receiver may form a transceiver to jointly implement the transceiver function in the method embodiment; the processor may be one or more.
  • the above-mentioned first node or second node may be a chip, and the processing unit may be implemented by hardware or software.
  • when implemented by hardware, the processing unit may be a logic circuit, an integrated circuit, or the like.
  • the processing unit can be a general-purpose processor, which can be implemented by reading the software code stored in the storage unit.
  • the storage unit can be integrated in the processor, or can exist independently of the processor.
  • the processing device may be a chip.
  • the processing device may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
  • each step in the method provided in this embodiment can be completed by an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capability.
  • the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the above-mentioned processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processors in the embodiments of the present application may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory or storage unit in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
  • An embodiment of the present application also provides a wireless network, which includes the first node and the second node described above.
  • the embodiments of the present application also provide a computer-readable medium on which a computer program is stored, and when the computer program is executed by a computer, the method in any of the foregoing embodiments is implemented.
  • the embodiments of the present application also provide a computer program product, which implements the method in any of the foregoing embodiments when the computer program product is executed by a computer.
  • the embodiment of the present application also provides a system chip, which includes a communication unit and a processing unit.
  • the processing unit may be a processor, for example.
  • the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute computer instructions so that the chip in the communication device executes any of the methods provided in the foregoing embodiments of the present application.
  • the computer instructions are stored in a storage unit.
  • the size of the sequence numbers of the above-mentioned processes does not mean the order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • the computer instruction can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another through a cable, for example.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium can be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a high-density digital video disc (digital video disc, DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • At least one refers to one or more, and “multiple” refers to two or more.
  • “And/or” describes the association relationship of the associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the associated objects are in an "or” relationship.
  • "The following at least one item (a)” or similar expressions refers to any combination of these items, including any combination of a single item (a) or plural items (a).
  • at least one item (a) of a, b, or c can represent: a, b, c, ab, ac, bc, or abc, where a, b, and c can be singular or plural.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A model training method and apparatus in a wireless network, which can realize joint learning in the wireless network and can help obtain a training model with relatively high accuracy and generalization capability. In the method, a first node in a wireless network sends a first request to at least one second node in the wireless network; each second node can locally retrain a first shared model based on its local data according to the first request, and then report a local model to the first node, such that the first node can determine a second shared model according to the local model reported by the second node.

Description

Method and apparatus for training a model
This application claims priority to Chinese patent application No. 2019101354646, filed with the Chinese Patent Office on February 22, 2019 and entitled "Method and Apparatus for Training a Model", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of artificial intelligence (AI), and more specifically, to a method and apparatus for training a model in the AI field.
Background
AI can be applied to wireless networks. As wireless networks carry more spectrum, more types of services, and more access terminals, the network system becomes more complex, which also requires the architecture of the wireless network and the access network equipment to become more intelligent and automated. Combining the service characteristics, network architectures, and device forms in wireless networks, a wireless intelligent network architecture has already been defined.
In the wireless intelligent network architecture, machine learning can be used to train models. Machine learning model training mainly takes two forms: centralized training and local training. Centralized training requires aggregating the training data at a central node, which causes high communication overhead, and also suffers from problems such as long modeling delay, data privacy issues when user data is uploaded, and high storage and computation pressure on the central node. Local training does not need to report the model, and each local node builds a model using only its local data. Local training suffers from insufficient data, which makes the model inaccurate, and the model also has weak generalization ability.
Therefore, a scheme in which a central node and local nodes cooperate to perform model training is urgently needed in the wireless intelligent network architecture.
Summary of the invention
This application provides a method and apparatus for training a model in a wireless network, which can implement joint learning in the wireless network and help obtain a training model with high accuracy and strong generalization ability.
According to a first aspect, a method for training a model in a wireless network is provided. The method is executed by a first node in the wireless network and includes:
the first node sends a first request to at least one second node in the wireless network, where the first request is used to request the at least one second node to perform local model retraining on a first shared model based on local data of the second node;
the first node obtains a model report message from each of the at least one second node, where the model report message of each second node includes parameters of a first local model, or includes an increment between the parameters of the first local model and parameters of the first shared model, and the first local model is obtained after each second node performs local model retraining on the first shared model based on the first request and the local data;
the first node determines a second shared model according to the model report messages of the at least one second node and the first shared model.
Therefore, in the embodiments of this application, a first node in a wireless network sends a first request to at least one second node in the wireless network; each second node can perform local retraining on the first shared model based on its local data according to the first request, and then report the parameters of the trained local model, or the increment between the parameters of the local model and the parameters of the shared model, to the first node through a model report message; the first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model. Based on this, the embodiments of this application can implement joint learning in a wireless network.
In the embodiments of this application, joint learning means that the first node serves as a centralized node and the second node serves as a local node, and the first node and the second node learn together to train a model without the local data of the second node being uploaded to the first node. Joint learning can overcome some or all of the shortcomings of centralized training, and can also overcome some or all of the shortcomings of local training.
Specifically, compared with the centralized training approach in the prior art, the embodiments of this application do not require local nodes to report training data to the centralized node, which can greatly reduce the communication overhead caused by reporting training data, and can reduce the pressure on the centralized node of storing ultra-large-scale data and training models. In addition, the local nodes in the embodiments of this application perform distributed model training, which can shorten the model training time and protect data privacy.
In addition, compared with the local training approach in the prior art, a local node sends the locally retrained local model to the centralized node, so that the centralized node can update the shared model according to the local models of at least one local node, which helps overcome the problem of insufficient data in local training, thereby improving the accuracy and generalization ability of the trained model.
With reference to the first aspect, in some implementations of the first aspect, the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model;
where the first node determining the second shared model according to the model report messages of the at least one second node and the first shared model includes:
the first node determines a second local model among at least one first local model corresponding to the at least one second node, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold;
the first node determines the second shared model according to the second local model and the first shared model.
Therefore, when the first node determines the second shared model according to the first local models, the embodiments of this application select, from the at least one first local model, a second local model whose training data set size is greater than or equal to a certain threshold and/or whose prediction error is less than or equal to a certain threshold, and then determine the second shared model through the second local model. Since the accuracy of the second local model in the embodiments of this application is higher than that of the first local model, the embodiments of this application can help improve the accuracy of the second shared model.
With reference to the first aspect, in some implementations of the first aspect, the first node determining the second shared model according to the second local model and the first shared model includes:
the first node performs weighted average aggregation on the parameters of the second local model, or on the increment between the parameters of the second local model and the parameters of the first shared model, where the weight parameters used in the weighted average aggregation are determined according to the size of the training data set corresponding to the second local model, and/or the prediction error of the second local model;
the first node determines the second shared model according to the result of the weighted average aggregation and the first shared model.
As an example, the weight parameter may be the reciprocal of the total number of second local models, in which case the weight parameters of all second local models are the same. Alternatively, the weight parameter of each second local model may be the ratio of the size of the training data set corresponding to that second local model to the size of all training data sets, where the size of all training data sets is the sum of the sizes of the training data sets corresponding to the second local models. Alternatively, the weight parameter of each second local model may be the reciprocal of its corresponding MAE.
Optionally, the first node may not select at least one second local model from the at least one first local model, but may directly determine the second shared model according to the at least one first local model. In this case, weighted average aggregation may be performed on the parameters of the first local models, or on the increments between the parameters of the first local models and the parameters of the shared model. As an example, the weight parameter of each first local model may then be the reciprocal of the total number of first local models. The first node then determines the second shared model according to the result of the weighted average aggregation and the first shared model.
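As an illustrative sketch only (not part of the claimed method), the following Python snippet shows the three weight-parameter choices described above; the function names and example values are assumptions.

```python
# Minimal sketch of the three weight-parameter choices described above:
# (a) uniform weights (reciprocal of the number of local models),
# (b) weights proportional to training data set size,
# (c) weights equal to the reciprocal of the prediction error (e.g. MAE).

def uniform_weights(n):
    return [1.0 / n] * n

def size_proportional_weights(dataset_sizes):
    total = sum(dataset_sizes)
    return [s / total for s in dataset_sizes]

def inverse_error_weights(prediction_errors):
    return [1.0 / e for e in prediction_errors]

print(uniform_weights(3))
print(size_proportional_weights([5000, 8000, 2000]))
print(inverse_error_weights([0.05, 0.02, 0.04]))
```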
With reference to the first aspect, in some implementations of the first aspect, the first node includes a centralized adaptive policy function and a centralized analysis and modeling function, and the method further includes:
The centralized adaptive policy function sends a joint learning strategy to the centralized analysis and modeling function, where the joint learning strategy includes at least one of the following items of information:
a joint learning start condition, information about the first shared model, identifiers of the joint learning group members, an upload strategy for the first local model, a screening strategy for the first local model, an aggregation strategy for the first local model, a processing strategy for the first local model, or a shared model update strategy.
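As an illustration only, such a joint learning strategy could be encoded as a simple configuration object. The field names and values below are hypothetical and are not defined by this application; the sketch merely shows the kind of information items listed above.

    # Hypothetical encoding of a joint learning strategy; all field names and values are illustrative.
    joint_learning_strategy = {
        "start_condition": {"training_data_unavailable": True, "max_central_load": 0.8},
        "shared_model_info": {"model_id": "m-001", "model_type": "regression"},
        "group_members": ["second-node-1", "second-node-2"],
        "upload_strategy": {"processing_algorithm": "increment", "upload_time": "per-round"},
        "screening_strategy": {"min_dataset_size": 1000, "max_prediction_error": 0.1},
        "aggregation_strategy": "weighted_average",
        "processing_strategy": "decompress-then-aggregate",
        "model_update_strategy": {"max_prediction_error": 0.05},
    }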
Therefore, in this embodiment of the present application, the joint learning process of the first node and the second node in the wireless network is managed through the joint learning strategy, including orchestration of at least one of the joint learning start condition, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and so on. Based on this, the embodiment of the present application enables the first node and the second node to learn jointly, without the local data of the second node needing to be uploaded to the first node, and to obtain a training model with high accuracy and generalization ability.
Optionally, in this embodiment of the present application, the first node sends the first request for joint learning training to the second node when the joint learning start condition is satisfied.
With reference to the first aspect, in some implementations of the first aspect, the second node includes a local analysis and modeling function;
the first node sending the first request to the at least one second node in the radio access network includes:
When the joint learning start condition is satisfied, the centralized analysis and modeling function in the first node sends the first request to the local analysis and modeling function of each of the at least one second node, where the first request includes the information about the first shared model.
Therefore, by setting the joint learning start condition, this embodiment of the present application can perform model training in a centralized manner when the joint learning start condition is not satisfied, and perform model training through joint learning when the joint learning start condition is satisfied.
With reference to the first aspect, in some implementations of the first aspect, the information about the first shared model includes at least one of the following:
a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
With reference to the first aspect, in some implementations of the first aspect, the first request further includes an upload strategy for the first local model.
In this embodiment of the present application, the first node sends the upload strategy for the first local model to the second node, so that the first node can instruct the second node on how to upload the first local model. Correspondingly, after obtaining the upload strategy for the first local model, the second node can obtain the local model through local retraining according to the upload strategy and perform the corresponding processing operations.
With reference to the first aspect, in some implementations of the first aspect, the upload strategy for the first local model includes at least one of the following:
an identifier of a processing algorithm to be applied to the first local model before uploading, an upload time of the first local model, or information to be carried when the first local model is uploaded, where the carried information includes the size of the training data set of the first local model and/or its prediction error.
With reference to the first aspect, some implementations of the first aspect further include:
The first node determines that the prediction error of the second shared model is less than or equal to a third threshold;
The first node sends a model update notification message to each of the at least one second node, where the model update notification message is used to request each second node to install the second shared model.
Therefore, in this embodiment of the present application, the prediction error of the second shared model is evaluated. When it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model. When it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiment of the present application avoids installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the third threshold; this is not limited in this embodiment of the present application.
In a second aspect, a method for training a model applied to a radio access network is provided. The method is executed by a second node in the radio access network and includes:
The second node receives a first request from a first node in the radio access network;
The second node performs, according to the first request, local model retraining on a first shared model based on local data of the second node, to obtain a first local model;
The second node sends a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the increments between the parameters of the first local model and the parameters of the first shared model, and the model report message is used to update the first shared model.
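A minimal sketch of building such a model report message on the second node side is given below. It assumes the second node has already retrained the first shared model locally and evaluated it, and that model parameters can be represented as NumPy arrays; the function and field names are hypothetical and are not defined by this application.

    import numpy as np

    def build_model_report(shared_params, local_params, dataset_size, prediction_error,
                           report_increment=True):
        """Second node: build the model report message after local model retraining."""
        report = {
            "dataset_size": dataset_size,          # optional carried information
            "prediction_error": prediction_error,  # optional carried information
        }
        if report_increment:
            # Report only the change of the local parameters relative to the first shared model.
            report["increment"] = np.asarray(local_params) - np.asarray(shared_params)
        else:
            report["parameters"] = np.asarray(local_params)
        return report

Reporting the increment rather than the full parameter set is one way to keep the upload small when the local retraining only changes the shared model slightly.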
Therefore, in this embodiment of the present application, the first node in the wireless network sends the first request to at least one second node in the wireless network. According to the first request, each second node can locally retrain the first shared model based on its local data, and then report the parameters of the trained local model, or the increments between the parameters of the local model and the parameters of the shared model, to the first node through the model report message. The first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model. Based on this, the embodiment of the present application can implement joint learning in a wireless network.
In the embodiments of the present application, joint learning means that the first node acts as a centralized node and the second node acts as a local node, and the first node and the second node learn jointly to train a model without the local data of the second node needing to be uploaded to the first node. Joint learning can overcome some or all of the shortcomings of centralized training, and can also overcome some or all of the shortcomings of local training.
Specifically, compared with the centralized training approach in the prior art, this embodiment of the present application does not require the local nodes to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the pressure of storing extremely large-scale data and training models. In addition, in this embodiment of the present application, the local nodes perform distributed model training, which shortens the model training time and protects data privacy.
In addition, compared with the local training approach in the prior art, in this embodiment of the present application a local node sends the locally retrained local model to the centralized node, so that the centralized node can update the shared model according to the local models of at least one local node. This helps overcome the problem of insufficient data in local training, thereby improving the accuracy of the training model and its generalization ability.
With reference to the second aspect, in some implementations of the second aspect, the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
According to the size of the training data set corresponding to the first local model and/or the prediction error carried in the model report message, the first node can select second local models whose training data set size is greater than or equal to a particular threshold and/or whose prediction error is less than or equal to a particular threshold, and then determine the second shared model using the second local models. Because the accuracy of the second local model in this embodiment of the present application is higher than that of the first local model, this embodiment helps improve the accuracy of the second shared model.
With reference to the second aspect, in some implementations of the second aspect, the second node includes a local analysis and modeling function, and the first node includes a centralized analysis and modeling function;
the second node receiving the first request from the first node in the radio access network includes:
The local analysis and modeling function in the second node receives the first request from the centralized analysis and modeling function in the first node, where the first request includes the information about the first shared model.
With reference to the second aspect, in some implementations of the second aspect, the first request further includes an upload strategy for the first local model.
In this embodiment of the present application, the first node sends the upload strategy for the first local model to the second node, so that the first node can instruct the second node on how to upload the first local model. Correspondingly, after obtaining the upload strategy for the first local model, the second node can obtain the local model through local retraining according to the upload strategy and perform the corresponding processing operations.
With reference to the second aspect, in some implementations of the second aspect, the upload strategy for the first local model includes at least one of the following:
an identifier of a processing algorithm to be applied to the first local model before uploading, an upload time of the first local model, or information to be carried when the first local model is uploaded, where the information carried when the first local model is uploaded includes the size of the training data set of the first local model and/or its prediction error.
With reference to the second aspect, in some implementations of the second aspect, the second node further includes a local adaptive policy function, and the method further includes:
The local analysis and modeling function sends a third request to the local adaptive policy function, where the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request includes the information about the first shared model;
The local analysis and modeling function receives the local joint learning strategy sent by the local adaptive policy function;
When the local joint learning strategy indicates that the second node is to perform local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
Therefore, in this embodiment of the present application, the local adaptive policy function sends the local joint learning strategy to the local analysis and modeling function, so that the second node can determine whether to participate in joint learning according to its own computing capability. This avoids prolonged joint learning iterations caused by insufficient computing capability of the second node and improves the efficiency of joint learning.
Optionally, in this embodiment of the present application, the local adaptive policy function may alternatively not send the local joint learning strategy to the local analysis and modeling function; instead, the second node always participates in joint learning whenever it receives a request for joint learning training.
With reference to the second aspect, in some implementations of the second aspect, the information about the first shared model includes at least one of the following:
a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
With reference to the second aspect, some implementations of the second aspect further include:
The second node receives a model update notification message sent by the first node, where the model update notification message is used to notify the second node of the second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model;
The second node installs the second shared model when it determines that the prediction error of the second shared model is less than a fourth threshold.
Therefore, in this embodiment of the present application, the prediction error of the second shared model is evaluated. When it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model. When it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. Based on this, the embodiment of the present application avoids installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the fourth threshold; this is not limited in this embodiment of the present application.
Optionally, in this embodiment of the present application, when the functions of an intelligent network element are co-deployed, the information exchanged between the first node and the second node during joint learning may be transmitted directly over the interface between the first node and the second node.
In a third aspect, an apparatus for training a model is provided. The apparatus may be a first node in a wireless network, or a chip in the first node. As an example, the first node may be a centralized node or a central node. The apparatus has the functions of implementing the first aspect and its various possible implementations. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions.
In a possible design, the apparatus includes a transceiver module, and optionally further includes a processing module. The transceiver module may be, for example, at least one of a transceiver, a receiver, or a transmitter, and may include a radio frequency circuit or an antenna. The processing module may be a processor. Optionally, the apparatus further includes a storage module, which may be, for example, a memory. When the storage module is included, it is used to store instructions. The processing module is connected to the storage module, and the processing module can execute the instructions stored in the storage module or instructions from other sources, so that the apparatus performs the methods of the first aspect and its various possible implementations.
In another possible design, when the apparatus is a chip, the chip includes a transceiver module, and optionally further includes a processing module. The transceiver module may be, for example, an input/output interface, a pin, or a circuit on the chip. The processing module may be, for example, a processor. The processing module can execute instructions so that the chip in the terminal performs the communication methods of the first aspect and any possible implementation thereof. Optionally, the processing module may execute instructions in a storage module, which may be an on-chip storage module such as a register or a cache. The storage module may also be located in the communication device but outside the chip, for example a read-only memory (ROM), another type of static storage device that can store static information and instructions, or a random access memory (RAM).
The processor mentioned anywhere above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of programs of the methods of the first aspect and its various possible implementations.
In a fourth aspect, an apparatus for training a model is provided. The apparatus may be a second node in a wireless network, or a chip in the second node. As an example, the second node may be a local node or a distributed edge node. The apparatus has the functions of implementing the second aspect and its various possible implementations. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions.
In a possible design, the apparatus includes a transceiver module, and optionally further includes a processing module. The transceiver module may be, for example, at least one of a transceiver, a receiver, or a transmitter, and may include a radio frequency circuit or an antenna. The processing module may be a processor. Optionally, the apparatus further includes a storage module, which may be, for example, a memory. When the storage module is included, it is used to store instructions. The processing module is connected to the storage module, and the processing module can execute the instructions stored in the storage module or instructions from other sources, so that the apparatus performs the communication methods of the second aspect and its various possible implementations.
In another possible design, when the apparatus is a chip, the chip includes a transceiver module, and optionally further includes a processing module. The transceiver module may be, for example, an input/output interface, a pin, or a circuit on the chip. The processing module may be, for example, a processor. The processing module can execute instructions so that the chip in the terminal performs the methods of the second aspect and any possible implementation thereof. Optionally, the processing module may execute instructions in a storage module, which may be an on-chip storage module such as a register or a cache. The storage module may also be located in the communication device but outside the chip, for example a read-only memory (ROM), another type of static storage device that can store static information and instructions, or a random access memory (RAM).
The processor mentioned anywhere above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of programs of the second aspect and its various possible implementations.
In a fifth aspect, a computer storage medium is provided. The computer storage medium stores program code, and the program code includes instructions for performing the method in the first aspect or the second aspect or any possible implementation thereof.
In a sixth aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to perform the method in the first aspect or the second aspect or any possible implementation thereof.
In a seventh aspect, a communication system is provided. The communication system includes an apparatus having the functions of implementing the methods and various possible designs of the first aspect, and an apparatus having the functions of implementing the methods and various possible designs of the second aspect.
In an eighth aspect, a processor is provided, configured to be coupled with a memory and to perform the method in the first aspect or the second aspect or any possible implementation thereof.
In a ninth aspect, a chip is provided. The chip includes a processor and a communication interface. The communication interface is used to communicate with an external device or an internal device, and the processor is used to implement the method in the first aspect or the second aspect or any possible implementation thereof.
Optionally, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory or instructions from other sources. When the instructions are executed, the processor implements the method in the first aspect or the second aspect or any possible implementation thereof.
Optionally, the chip may be integrated in the first node or the second node.
Description of the drawings
Fig. 1 shows a schematic diagram of a system architecture to which an embodiment of the present application is applied.
Fig. 2 shows a schematic diagram of an intelligent network architecture to which an embodiment of the present application is applied.
Fig. 3 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
Fig. 4 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
Fig. 5 shows a schematic flowchart of a method for training a model provided by an embodiment of the present application.
Fig. 6 is a schematic block diagram of an apparatus for training a model provided by an embodiment of the present application.
Fig. 7 is a schematic block diagram of another apparatus for training a model provided by an embodiment of the present application.
Fig. 8 is a schematic block diagram of yet another apparatus for training a model provided by an embodiment of the present application.
Fig. 9 is a schematic block diagram of yet another apparatus for training a model provided by an embodiment of the present application.
Detailed description
The technical solutions in the present application are described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of a system architecture 100 to which an embodiment of the present application is applied. As shown in Fig. 1, the system architecture 100 includes a first node 110 and at least one second node 120. Specifically, the system 100 is, for example, a wireless network; the first node 110 may be a centralized node or a central node, and the second node may be a local node or a distributed edge node; this is not limited in the embodiments of the present application.
As an example, the first node 110 or the second node 120 may each be deployed in a radio access network (RAN), in a core network, or in an operations support system (OSS), or the second node 120 may be a terminal device in the wireless network; this is not specifically limited in the embodiments of the present application. In one possible case, the first node 110 and the second node 120 may both be deployed in the RAN. In another possible case, the first node 110 may be deployed in the OSS or the core network, and the second node 120 may be deployed in the RAN. In yet another possible case, the first node 110 may be deployed in the RAN, and the second node 120 may be a terminal device.
Optionally, the first node 110 or the second node 120 in the system architecture 100 may be implemented by one device or jointly by multiple devices, or may be a functional module within one device; this is not specifically limited in the embodiments of the present application. It can be understood that the foregoing functions may be network elements in hardware devices, software functions running on dedicated hardware, or virtualized functions instantiated on a platform (for example, a cloud platform); this is not limited in the embodiments of the present application.
Fig. 2 shows a schematic diagram of an intelligent network architecture 200 to which an embodiment of the present application is applied. The intelligent network architecture 200 is a layered architecture that can meet, on demand, the differentiated requirements of different scenario types for computing resources and execution cycles. The intelligent network architecture 200 may specifically be an intelligent wireless network architecture. As shown in Fig. 2, the intelligent network architecture 200 includes an operations support system (OSS), at least one cloud radio access network (C-RAN), and an evolved NodeB (eNB) or next generation NodeB (gNB). Each C-RAN may include a separate centralized unit (CU) and at least one distributed unit (DU).
In the embodiments of the present application, in terms of logical function or deployment level, the OSS is a more centralized node than the C-RAN, eNB, or gNB; the CU in a C-RAN is a more centralized node than the DUs; and the C-RAN is a more centralized node than the eNB or gNB. Correspondingly, in some possible implementations of the embodiments of the present application, the OSS may be referred to as the centralized node and the C-RAN, eNB, or gNB as local nodes; the CU in a C-RAN may be referred to as the centralized node and the DUs in that C-RAN as local nodes; or the C-RAN may be referred to as the centralized node and the eNB or gNB as a local node.
In the intelligent network architecture 200 shown in Fig. 2, the centralized node may correspond to the first node 110 in Fig. 1, and the local node may correspond to the second node 120 in Fig. 1.
Optionally, in this intelligent network architecture, at least one of the OSS, CU, DU, eNB, and gNB may include a data analysis (DA) function (or network element). As an example, the DA function (or network element) may be deployed at a higher level, for example in the OSS, in which case it may be referred to as operations support system data analysis (OSSDA). The DA function (or network element) may also be deployed in a 5G CU, a 5G DU, a combined 5G gNB, or an eNB, in which case it may be referred to as radio access network data analysis (RANDA). Alternatively, the DA function (or network element) may be deployed independently; this is not limited in the embodiments of the present application. OSSDA or RANDA can provide data integration and programmable feature engineering, an algorithm framework integrating a rich machine learning algorithm library, and a general architecture that supports separating training from execution.
Specifically, AI-based wireless intelligent services mainly consist of a closed loop of data collection, feature engineering, algorithm design and training modeling, model evaluation, and prediction execution. As an example, when these functions are mapped onto the network architecture, the DA functions (or network elements) can be abstracted and merged into four functional modules: a data service function (DSF), an analysis and modeling function (A&MF), a model execution function (MEF), and an adaptive policy function (APF).
The DSF mainly completes steps such as data collection, data preprocessing, and feature engineering, and provides training data and feature vector subscription services to the A&MF and the MEF. The DSF has the programmability to perform customized feature engineering on data, and the ability to perform data collection, preprocessing, and feature engineering according to the requirements of A&MF training algorithms or MEF prediction models.
The role of the A&MF is to execute machine learning training algorithms and generate machine learning models. The A&MF includes a library of commonly used machine learning algorithms, and sends the machine learning models generated by training to the MEF.
The MEF receives and installs the models delivered by the A&MF, subscribes to feature vectors from the DSF according to the A&MF's instructions and completes prediction, and sends the prediction results and the operation instructions corresponding to those results to the APF.
The APF is the final execution step of the process. The APF stores a policy set and converts model prediction results into execution policies. As an example, the policy set includes prediction results, the operation instructions corresponding to the prediction results, and their correspondence to execution policies; when the APF obtains a prediction result and the operation instruction corresponding to that result, it determines and executes the corresponding execution policy according to the correspondence.
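For illustration only, the policy set held by the APF can be thought of as a lookup from a (prediction result, operation instruction) pair to an execution policy. The sketch below is a hypothetical encoding; the keys and policy names are invented for the example and are not defined by this application.

    # Hypothetical policy set: maps (prediction result, operation instruction) to an execution policy.
    POLICY_SET = {
        ("high_load_predicted", "adjust"): "add_carrier",
        ("high_load_predicted", "notify"): "raise_alarm",
        ("normal_load_predicted", "adjust"): "keep_configuration",
    }

    def to_execution_policy(prediction_result, operation_instruction):
        """APF: convert a prediction result and its operation instruction into an execution policy."""
        return POLICY_SET.get((prediction_result, operation_instruction))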
Further, when a logical function is deployed at both a centralized node and a local node, the logical function in the centralized node may be referred to as the central logical function, and the logical function in the local node as the local logical function. For example, the DSF, A&MF, MEF, and APF deployed at the centralized node may be referred to as the centralized DSF (central DSF, C-DSF), centralized A&MF (central A&MF, C-A&MF), centralized MEF (central MEF, C-MEF), and centralized APF (central APF, C-APF), respectively, and the DSF, A&MF, MEF, and APF deployed in a local node as the local DSF (local DSF, L-DSF), local A&MF (local A&MF, L-A&MF), local MEF (local MEF, L-MEF), and local APF (local APF, L-APF).
It should be noted that, in the embodiments of the present application, network elements may be deployed according to service characteristics and computing resources. In that case, the functions deployed on different network elements may differ. For example, all four functions (DSF, A&MF, MEF, and APF) may be deployed on the local node side, while only the DSF, APF, and A&MF are deployed on the centralized node side. In addition, to complete a full training and prediction task, the different functions cooperate within a network element or across network elements.
It should be noted that the names of the foregoing functions in the embodiments of the present application are merely examples. In a specific implementation, the functions in the network architecture 200 may have other names; this is not specifically limited in the embodiments of the present application.
Fig. 3 shows a schematic flowchart of a method 300 for training a model provided by an embodiment of the present application. The method 300 may be applied to the system architecture 100 shown in Fig. 1 or to the intelligent network architecture 200 shown in Fig. 2, but the embodiments of the present application are not limited thereto.
For ease of description, this application describes the method 300 for training a model by taking the first node and the second node as examples. For the implementation of the chip in the first node and the chip in the second node, reference may be made to the specific descriptions of the first node and the second node; the description is not repeated here.
310. The first node sends a first request to at least one second node, where the first request is used to request the at least one second node to perform local model retraining on a first shared model based on the local data of each second node. The first shared model may be obtained by the first node by training the parameters of an initial model with training data.
320. Each of the at least one second node performs, according to the first request, local model retraining on the first shared model based on the local data of that second node, to obtain a first local model. Here, local model retraining means that the second node trains the parameters of the first shared model again based on its local data.
330. Each of the at least one second node sends a model report message to the first node, where the model report message includes the parameters of the first local model, or the increments between the parameters of the first local model and the parameters of the first shared model. Here, the increments between the parameters of the first local model and the parameters of the first shared model are the changes of the parameters of the first local model relative to the parameters of the first shared model.
340. The first node determines a second shared model according to the model report messages of the at least one second node and the first shared model.
Therefore, in this embodiment of the present application, the first node in the wireless network sends the first request to at least one second node in the wireless network. According to the first request, each second node can locally retrain the first shared model based on its local data, and then report the parameters of the trained local model, or the increments between the parameters of the local model and the parameters of the shared model, to the first node through the model report message. The first node can then determine the second shared model according to the content reported by the at least one second node and the first shared model. Based on this, the embodiments of the present application can implement joint learning in a wireless network.
Here, the second shared model is the new shared model obtained through the training in steps 310 to 340, and the first shared model is the old shared model before that training.
In the embodiments of the present application, joint learning means that the first node acts as a centralized node and the second node acts as a local node, and the first node and the second node learn jointly to train a model without the local data of the second node needing to be uploaded to the first node. Joint learning can overcome some or all of the shortcomings of centralized training, and can also overcome some or all of the shortcomings of local training.
Specifically, compared with the centralized training approach in the prior art, this embodiment of the present application does not require the local nodes to report training data to the centralized node, which greatly reduces the communication overhead caused by training data reporting and relieves the centralized node of the pressure of storing extremely large-scale data and training models. In addition, in this embodiment of the present application, the local nodes perform distributed model training, which shortens the model training time and protects data privacy.
In addition, compared with the local training approach in the prior art, in this embodiment of the present application a local node sends the locally retrained local model to the centralized node, so that the centralized node can update the shared model according to the local models of at least one local node. This helps overcome the problem of insufficient data in local training, thereby improving the accuracy of the training model and its generalization ability.
Therefore, the scheme of joint learning to train a model in a wireless network according to the embodiments of the present application helps obtain a training model with high accuracy and generalization ability.
It should be noted that, in the embodiments of the present application, the "parameters of the model" included in the model report message are used to indicate the model that the model report message needs to report. In some descriptions of the embodiments of the present application, "parameters of the model" may be replaced with "model"; the two have equivalent meanings. As an example, the model report message may be described as including the first local model, or including the increment between the first local model and the first shared model.
In an optional embodiment of the present application, the first node sends the first request for joint learning training to the second node when the joint learning start condition is satisfied. As an example, the joint learning start condition may be that the first node cannot obtain training data, or that the computing pressure on the first node exceeds a certain level.
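As a sketch only, such a start condition could be checked as follows; the condition names and the default threshold are assumptions made for the example, not values defined by this application.

    def joint_learning_should_start(can_collect_training_data, compute_load, load_threshold=0.8):
        """First node: decide whether to trigger joint learning instead of centralized training."""
        # Example conditions: the first node cannot obtain training data,
        # or its computing pressure exceeds a given level.
        return (not can_collect_training_data) or (compute_load > load_threshold)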
By setting the joint learning start condition, the embodiments of the present application can perform model training in a centralized manner when the joint learning start condition is not satisfied, and perform model training through joint learning when the joint learning start condition is satisfied.
In an optional embodiment of the present application, the first request may include the information about the first shared model, so that the second node determines the first shared model according to the first request. As an example, the information about the first shared model includes at least one of the following: a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
Optionally, the first request may further include an upload strategy for the first local model. In this way, the first node sends the upload strategy for the first local model to the second node, so that the first node can instruct the second node on how to upload the first local model. Here, the first local model is the local model uploaded by the second node. As an example, the first local model is the model obtained by one of the at least one second node by performing local model retraining on the first shared model according to its local data.
Correspondingly, after obtaining the upload strategy for the first local model, the second node can obtain the local model through local retraining according to the upload strategy and perform the corresponding processing operations.
As an example, the upload strategy for the first local model includes at least one of the following: an identifier of a processing algorithm to be applied to the first local model before uploading, an upload time of the first local model, or information to be carried when the first local model is uploaded. The carried information includes the size of the training data set of the first local model and/or its prediction error.
The processing algorithm applied to the first local model before uploading includes, for example, an incremental operation algorithm that computes the difference between the locally retrained model and the first shared model delivered by the first node, a compression algorithm that compresses the model through parameter pruning, quantization, low-rank decomposition, sparse optimization, or similar techniques, or an encryption algorithm that encrypts the model, for example by obfuscating layers or converting parameters into code; this is not limited in the embodiments of the present application.
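Purely as an illustration of two of these processing steps, the sketch below computes the increment relative to the delivered shared model and then zeroes out small entries, loosely in the spirit of parameter pruning. The function name and the default threshold are assumptions, and real compression or encryption schemes would be considerably more involved.

    import numpy as np

    def process_before_upload(local_params, shared_params, prune_threshold=1e-3):
        """Example pre-upload processing: increment computation followed by simple pruning."""
        # Incremental operation: change of the local model relative to the first shared model.
        increment = np.asarray(local_params) - np.asarray(shared_params)
        # Simple pruning-style compression: drop entries whose magnitude is below the threshold.
        return np.where(np.abs(increment) >= prune_threshold, increment, 0.0)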
The training data set of the first local model is the set of training data used by the second node when it locally retrains the first shared model based on its local data. The size of the training data set is, for example, the amount of training data in the training data set; this is not limited in the embodiments of the present application.
After training the local model, the second node may run predictions on a prediction data set with the local model to obtain the prediction error of the local model. As an example, the prediction error of the local model is, for example, the mean absolute error (MAE) or the mean squared error (MSE) of the local model; this is not limited in the embodiments of the present application.
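For concreteness, MAE and MSE over a prediction data set can be computed as below. These are the standard definitions rather than anything specific to this application, and the sketch assumes the model outputs and the true labels are available as arrays.

    import numpy as np

    def prediction_errors(y_true, y_pred):
        """Mean absolute error and mean squared error over a prediction data set."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        mae = np.mean(np.abs(y_true - y_pred))
        mse = np.mean((y_true - y_pred) ** 2)
        return mae, mse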
Optionally, in this embodiment of the present application, the model report message may further include the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model. In this case, step 340 may specifically be:
The first node determines a second local model from among the at least one first local model corresponding to the at least one second node, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold. Then, the first node determines the second shared model according to the second local model and the first shared model. Here, the number of second local models may be one or more; this is not limited in the embodiments of the present application.
Specifically, when the training data set used for local model retraining contains a sufficient amount of data, the trained local model has high accuracy and strong generalization ability. When the training data set used for local model retraining contains an insufficient amount of data, the trained local model has lower accuracy and weaker generalization ability.
That is, the first node screens out, from the at least one first local model, at least one second local model that satisfies the screening conditions, and the accuracy or generalization ability of the at least one second local model is higher than that of the at least one first local model. In this case, the first node may delete any first local model whose training data set size is less than the first threshold or whose prediction error is greater than the second threshold.
Therefore, in this embodiment of the present application, when the first node determines the second shared model according to the first local models, it selects from the at least one first local model a second local model whose training data set size is greater than or equal to a particular threshold and/or whose prediction error is less than or equal to a particular threshold, and then determines the second shared model using the second local model. Because the accuracy of the second local model is higher than that of the first local model, this embodiment helps improve the accuracy of the second shared model.
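A minimal sketch of this screening step is shown below. It assumes each reported first local model carries its training data set size and prediction error (for example, in the model report message described above); the function and field names are hypothetical, and the two thresholds mirror the first and second thresholds above.

    def select_second_local_models(reports, first_threshold=None, second_threshold=None):
        """First node: keep reports whose data set size and/or prediction error meet the thresholds."""
        selected = []
        for r in reports:
            if first_threshold is not None and r["dataset_size"] < first_threshold:
                continue  # training data set too small: drop this first local model
            if second_threshold is not None and r["prediction_error"] > second_threshold:
                continue  # prediction error too large: drop this first local model
            selected.append(r)
        return selected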
Optionally, in this embodiment of the application, the first node may perform weighted average aggregation on the parameters of the second local models, or on the increments between the parameters of the second local models and the parameters of the first shared model, where the weight parameters used in the weighted average aggregation are determined according to the size of the training data set corresponding to each second local model and/or the prediction error of each second local model. The first node then determines the second shared model according to the result of the weighted average aggregation and the first shared model.
In one possible case, when the model report message includes the parameters of the first local model, the first node performs weighted average aggregation on the parameters of the second local models after the second local models are determined. Correspondingly, the first node may determine the result obtained by the weighted average aggregation of the at least one second local model as the second shared model.
In another possible case, when the model report message includes the increments between the parameters of the first local model and the parameters of the shared model, the first node performs weighted average aggregation on the increments between the parameters of the second local models and the parameters of the shared model after the second local models are determined. Correspondingly, the first node determines the result obtained by the weighted average aggregation of the at least one second local model as the increment of the first shared model, and then determines the sum of this increment and the first shared model as the second shared model.
As an example, the weight parameter may be the reciprocal of the total number of second local models, in which case all second local models have the same weight. Alternatively, the weight parameter of each second local model may be the ratio of the size of its corresponding training data set to the total size of all training data sets, where the total size is the sum of the training data set sizes corresponding to all second local models. Alternatively, the weight parameter of each second local model may be the reciprocal of its corresponding MAE. It should be understood that these weight parameters are merely examples, and the embodiments of this application are not limited thereto.
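By way of a non-limiting illustration, the weighted average aggregation and the three example weighting rules may be sketched as follows. The function names, the parameter dictionaries backed by NumPy arrays, and the normalization of the inverse-MAE weights so that they sum to one are assumptions introduced only for this sketch:

```python
import numpy as np

def aggregate_weighted_average(reports, weighting="by_train_size"):
    """Weighted average aggregation of the reported parameter dictionaries.
    Each report is assumed to expose .parameters (name -> np.ndarray),
    .train_set_size and .prediction_error, as in the screening sketch above."""
    if weighting == "uniform":                    # reciprocal of the model count
        weights = [1.0 / len(reports)] * len(reports)
    elif weighting == "by_train_size":            # ratio of training set sizes
        total = sum(r.train_set_size for r in reports)
        weights = [r.train_set_size / total for r in reports]
    elif weighting == "by_inverse_error":         # reciprocal of MAE, normalized
        inv = [1.0 / r.prediction_error for r in reports]
        weights = [v / sum(inv) for v in inv]
    else:
        raise ValueError(f"unknown weighting rule: {weighting}")

    aggregated = {}
    for name in reports[0].parameters:
        aggregated[name] = sum(w * r.parameters[name]
                               for w, r in zip(weights, reports))
    return aggregated

def apply_increment(shared_params, aggregated_increment):
    """When the reports carry increments, the aggregated increment is added to
    the first shared model to obtain the second shared model."""
    return {name: shared_params[name] + aggregated_increment[name]
            for name in shared_params}
```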
In some optional embodiments of this application, the first node may not screen the at least one first local model for second local models, but instead determine the second shared model directly from the at least one first local model. In that case, weighted average aggregation may be performed on the parameters of the first local models, or on the increments between the parameters of the first local models and the parameters of the shared model. As an example, the weight parameter of each first local model may be the reciprocal of the total number of first local models. The first node then determines the second shared model according to the result of the weighted average aggregation and the first shared model.
Optionally, in this embodiment of the application, after step 340, the first node or the second node may determine whether the prediction error of the second shared model is less than a preset threshold. When it is determined that the prediction error of the second shared model is less than the preset threshold, the accuracy of the second shared model meets the requirement. Here, the accuracy of the second shared model may be determined based on a prediction data set.
In one possible case, the first node may determine whether the prediction error of the second shared model is less than a third threshold.
When it is determined that the prediction error of the second shared model is less than or equal to the third threshold, the first node updates the first shared model to the second shared model and sends a model update notification message to each of the at least one second node, where the model update notification message is used to request each second node to install the second shared model. When it is determined that the prediction error of the second shared model is greater than the third threshold, the first node does not send the model update notification message to the second nodes. In addition, the first node may delete the second shared model and leave its saved first shared model unchanged.
In another possible case, the second node may determine whether the prediction error of the second shared model is less than a fourth threshold.
Specifically, the first node may send a model update notification message to the at least one second node, where the message is used to notify the second nodes of the second shared model. After receiving the model update notification message, the second node may determine whether the prediction error of the second shared model indicated by the message is less than the fourth threshold. When the second node determines that the prediction error of the second shared model is less than or equal to the fourth threshold, it installs the second shared model. When the second node determines that the prediction error of the second shared model is greater than the fourth threshold, it does not install the second shared model.
In this case, the second node does not need to send its locally stored prediction data set to the first node, which reduces the communication signaling overhead between network elements.
In this embodiment of the application, the third threshold and the fourth threshold may be the same or different; this is not limited in this embodiment of the application.
Therefore, this embodiment of the application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs it; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. On this basis, this embodiment avoids installing a shared model with low accuracy and further guarantees the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the third threshold or the fourth threshold; this is not limited in this embodiment of the application.
That is, in some optional embodiments, the prediction error of the second shared model may be compared with the prediction error of the first shared model. If the prediction error of the second shared model is smaller than that of the first shared model, the first node saves the second shared model and the second node installs it. If the prediction error of the second shared model is greater than or equal to that of the first shared model, the first node does not update the first shared model and the second node does not install the second shared model.
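By way of a non-limiting illustration, the update decision described above may be sketched as follows. The function name and the numeric values in the usage comment are assumptions introduced only for this sketch:

```python
def should_update_shared_model(new_error: float,
                               threshold: float = None,
                               old_error: float = None) -> bool:
    """Decide whether the second (new) shared model replaces the first (old) one.
    With an absolute bar (the third/fourth threshold) the new model is kept when
    its prediction error is less than or equal to the bar; when the old shared
    model's prediction error is used as the bar, the new model must be strictly
    better."""
    if threshold is not None:
        return new_error <= threshold
    return new_error < old_error

# Illustrative use at the first node: keep and distribute the second shared
# model only if its prediction error passes the configured bar.
if should_update_shared_model(new_error=0.07, threshold=0.10):
    pass  # save the second shared model and send model update notifications
```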
FIG. 4 shows a schematic flowchart of a method 400 for training a model provided by an embodiment of this application. It should be understood that FIG. 4 shows the steps or operations of the method for training a model, but these steps or operations are only examples; the embodiments of this application may also perform other operations or variations of the operations in FIG. 4. In addition, the steps in FIG. 4 may be performed in an order different from that presented in FIG. 4, and possibly not all of the operations in FIG. 4 are to be performed.
The first node in FIG. 4 includes a centralized APF (C-APF), a centralized DSF (C-DSF), and a centralized A&MF (C-A&MF); the second node includes a local APF (L-APF), a local MEF (L-MEF), a local DSF (L-DSF), and a local A&MF (L-A&MF). For details of each functional module, refer to the description of FIG. 2; to avoid repetition, details are not repeated here.
401. The C-APF in the first node sends a joint learning strategy to the C-A&MF.
Specifically, the C-APF may store a joint learning strategy, which is used to indicate how the first node and the second node perform joint learning. As an example, the joint learning strategy may include at least one of the following: a joint learning start condition, information about the shared model, identifiers of joint learning group members, an upload strategy for the local models, a screening strategy for the local models, an aggregation strategy for the local models, a processing strategy for the local models, or a shared model update strategy.
The joint learning start condition is, for example, that the C-DSF cannot obtain subscription data, or that the computing resource usage of the C-A&MF exceeds a certain threshold; this is not limited in this embodiment of the application. In this embodiment of the application, by setting the joint learning start condition, model training can be performed by centralized training when the joint learning start condition is not met, and by joint learning when the joint learning start condition is met. For example, the centralized training approach can be used when the C-DSF can obtain subscription data, the model training workload is small, the computing resources of the C-A&MF are sufficient, or the computing resources of the L-A&MF are insufficient.
The identifiers of the joint learning group members may include, for example, the identifier of each of the at least one second node participating in the joint learning, or the identifier of the L-A&MF in each of the at least one second node participating in the joint learning; this is not limited in this embodiment of the application.
In the above joint learning strategy, a local model refers to the local model obtained by a second node after model retraining based on local data, for example the first local model described in FIG. 3.
The screening strategy for the local models refers to the strategy used by the first node (or the C-A&MF in the first node) to select, from the at least one first local model, the second local models that satisfy the screening condition. For example, it may include determining, among the first local models, those whose training data set size is greater than or equal to the first threshold and/or whose prediction error is less than or equal to the second threshold. As an example, the local model screening strategy may be a model screening rule identifier; this is not limited in this embodiment of the application.
The aggregation strategy for the local models is used to indicate the aggregation algorithm used by the first node (or the C-A&MF in the first node) when aggregating the local models, and the method for computing the weight parameters. As an example, the aggregation strategy for the local models may be a model aggregation algorithm identifier. In the embodiments of this application, aggregation of local models may also be referred to as fusion of local models; the two terms have the same meaning.
The processing strategy for the local models is used to instruct the first node (or the C-A&MF in the first node) to process the obtained local models. The processing algorithm includes, for example, an incremental operation between the model obtained by local retraining and the shared model delivered by the first node, a compression algorithm that compresses the model through parameter pruning, quantization, low-rank decomposition, sparse optimization, or similar techniques, or an encryption algorithm that encrypts the model through layer obfuscation or by converting parameters into code; this is not limited in this embodiment of the application. As an example, the processing strategy for the local models may be a model processing algorithm identifier.
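By way of a non-limiting illustration, two of the processing steps named above, the incremental operation and a simple form of parameter pruning, may be sketched as follows. The function names are assumptions, the parameters are assumed to be held as NumPy arrays, and real compression or encryption schemes would be considerably more elaborate:

```python
import numpy as np

def parameter_increment(local_params: dict, shared_params: dict) -> dict:
    """Incremental operation: per-tensor difference between the retrained
    local model's parameters and the shared model's parameters."""
    return {name: local_params[name] - shared_params[name]
            for name in shared_params}

def prune_small_parameters(params: dict, keep_ratio: float = 0.5) -> dict:
    """Very simple magnitude pruning as one possible compression step:
    zero out the smallest-magnitude entries of each parameter tensor."""
    pruned = {}
    for name, tensor in params.items():
        flat = np.abs(tensor).ravel()
        k = max(1, int(keep_ratio * flat.size))
        cutoff = np.partition(flat, flat.size - k)[flat.size - k]
        pruned[name] = np.where(np.abs(tensor) >= cutoff, tensor, 0.0)
    return pruned
```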
The shared model update strategy is used to instruct the first node (or the C-A&MF in the first node) to update the shared model. For example, when the prediction error of the new shared model is less than or equal to a certain threshold, the old shared model is updated to the new shared model; or, when the prediction error of the new shared model is less than or equal to the prediction error of the old shared model, the old shared model is updated to the new shared model.
In addition, for the information about the shared model and the upload strategy for the local models, refer to the descriptions above; to avoid repetition, they are not repeated here.
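By way of a non-limiting illustration, the joint learning strategy delivered in step 401 could be represented by a simple structure such as the one below; the field names and the identifier-based encoding are assumptions introduced only for this sketch:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class JointLearningStrategy:
    """Illustrative encoding of the joint learning strategy sent by the C-APF
    to the C-A&MF."""
    start_condition: Optional[str] = None           # e.g. "C-DSF subscription data unavailable"
    shared_model_info: Optional[dict] = None        # model id, type, structure, I/O, initial parameters
    group_member_ids: List[str] = field(default_factory=list)
    upload_strategy_id: Optional[str] = None        # local model upload strategy
    screening_rule_id: Optional[str] = None         # local model screening strategy
    aggregation_algorithm_id: Optional[str] = None  # local model aggregation strategy
    processing_algorithm_id: Optional[str] = None   # local model processing strategy
    update_strategy_id: Optional[str] = None        # shared model update strategy
```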
402. The C-A&MF in the first node sends a joint learning strategy delivery response to the C-APF, indicating that the C-A&MF has received the joint learning strategy.
403. The first node and the second node perform data collection, model training, and model application.
Specifically, the L-DSF in the second node reports the collected data to the C-A&MF in the first node, the C-A&MF performs model training to obtain a shared model, and the C-A&MF then delivers the shared model to the L-MEF for model application.
As an example, the C-A&MF in the first node may send a data subscription request to the L-DSF in the second node; after receiving the data subscription request, the L-DSF sends a data subscription response to the C-A&MF that carries local data. The local data may include a training data set or a prediction data set; this is not limited in this embodiment of the application.
It should be understood that step 403 is performed to obtain a shared model through training; this shared model may also be referred to as the initial shared model. The subsequent steps 404 to 422 update the initial shared model generated in step 403 to compensate for the poor accuracy of the initial shared model caused by insufficient or incomplete data, so that the shared model still has high accuracy and generalization ability when the network state in the wireless network changes.
404. The C-A&MF in the first node sends a joint learning training request to the L-A&MF in the second node.
Specifically, the C-A&MF in the first node determines, according to the joint learning start condition indicated in the joint learning strategy, whether the joint learning start condition is met. For example, when the C-A&MF determines that subscription data cannot be obtained from the L-DSF of the second node, or that the computing resource usage of the C-A&MF exceeds a preset threshold, it determines that the joint learning start condition is met. When the joint learning start condition is met, the C-A&MF sends the joint learning training request to the L-A&MF. Otherwise, when the joint learning start condition is not met, the model is trained centrally as in step 403.
Here, the joint learning training request may correspond to a specific example of the first request in FIG. 3. For details, refer to the description of the first request in FIG. 3; to avoid repetition, details are not repeated here.
405. The L-A&MF in the second node sends a local joint learning strategy request to the L-APF.
Here, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the shared model. The local joint learning strategy request includes the information about the shared model. As an example, the information about the shared model may be obtained from the joint learning training request in step 404. For details of the information about the shared model, refer to the description above; for brevity, it is not described here again.
406. The L-APF in the second node sends a local joint learning strategy response to the L-A&MF.
Specifically, the L-APF determines, based on the utilization of local computing resources, whether local model retraining is to be performed on the shared model, that is, whether the second node participates in the local joint learning. As an example, the local joint learning strategy response may be an indicator of whether to participate in the local joint learning.
Optionally, the local joint learning strategy may further include a model update strategy. For example, the model update strategy may indicate that the old shared model is updated when the prediction error of the new shared model is less than the prediction error of the old shared model, or less than or equal to a certain preset threshold, and that the old shared model continues to be used otherwise. As an example, the old shared model is, for example, the initial shared model obtained in step 404, and the new shared model is, for example, the shared model obtained in step 413.
In this embodiment of the application, when the local joint learning strategy indicates that the second node performs local model retraining on the shared model, steps 407 to 421 below are performed. Otherwise, steps 407 to 421 below are not performed.
Therefore, in this embodiment of the application, by having the L-APF send the local joint learning strategy to the L-A&MF, the second node can determine, based on its own computing capability, whether to participate in the joint learning, which avoids prolonged joint learning iterations caused by insufficient computing capability of the second node and improves joint learning efficiency.
It should be noted that, in this embodiment of the application, steps 405 and 406 may also be skipped, in which case the second node always participates in the joint learning upon receiving the joint learning training request; this is not limited in this embodiment of the application.
407. The L-A&MF in the second node sends a joint learning training request response to the C-A&MF in the first node.
408. The L-A&MF in the second node sends a data subscription request to the L-DSF.
As an example, the L-A&MF may send the data subscription request to the L-DSF according to the model input and output in the joint learning training request and the training data collection duration; the data subscription request may carry a data identifier and the data collection time.
409. The L-DSF in the second node sends a data subscription response to the L-A&MF.
Specifically, the L-DSF collects data according to the data subscription request in step 408 and sends the collected data to the L-A&MF.
410. The L-A&MF in the second node performs model retraining and model processing.
Specifically, the L-A&MF retrains the shared model locally based on the information about the shared model delivered in step 404 and the training data obtained in step 409.
Optionally, the L-A&MF may further process the local model according to the identifier, delivered in step 404, of the processing algorithm applied before local model upload, for example by performing an incremental operation between the local model obtained by local retraining and the shared model delivered in step 404 (as an example, the incremental operation may be performed on the parameters of the local model and the parameters of the shared model). The model may then be compressed through algorithms such as parameter pruning, quantization, low-rank decomposition, or sparse optimization, and encrypted through layer obfuscation or by converting parameters into code.
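By way of a non-limiting illustration, the local retraining and the incremental operation of step 410 may be sketched for a simple linear model trained by gradient descent. The model form, the function name, and the hyperparameters are assumptions introduced only for this sketch and stand in for whatever model type and training procedure the shared model actually uses:

```python
import numpy as np

def local_retrain(shared_weights: np.ndarray,
                  X_local: np.ndarray, y_local: np.ndarray,
                  lr: float = 0.01, epochs: int = 20):
    """Start from the shared model's weights, retrain on local data only, and
    return both the retrained local model and its increment relative to the
    shared model (the quantity that may be uploaded in step 411)."""
    w = shared_weights.copy()
    for _ in range(epochs):
        pred = X_local @ w
        grad = X_local.T @ (pred - y_local) / len(y_local)
        w -= lr * grad
    return w, w - shared_weights
```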
411. The L-A&MF in the second node sends a local model upload notification to the C-A&MF in the first node. As an example, the local model upload notification includes the model identifier of the shared model, the processed local model, the size of the training data set corresponding to the local model, the prediction error of the local model, and so on; this is not limited in this embodiment of the application.
Specifically, the local model upload notification may correspond to an example of the model report message in FIG. 3. For details, refer to the description of the model report message in FIG. 3; to avoid repetition, details are not repeated here.
412. The C-A&MF in the first node sends a local model upload notification response to the L-A&MF in the second node.
413. The C-A&MF in the first node performs model screening, aggregation, and processing.
Specifically, the C-A&MF selects, according to the local model screening strategy indicated in step 401 and from the at least one received local model (for example, the at least one first local model described above), at least one local model that satisfies the condition (for example, the at least one second local model described above). For details of how the C-A&MF performs the screening, refer to the description above; for brevity, it is not described in detail here.
Then, the C-A&MF may aggregate the selected local models according to the local model aggregation strategy, for example by weighted average aggregation. For details of how the C-A&MF performs the aggregation, refer to the description above; for brevity, it is not described in detail here.
Then, the C-A&MF may process the aggregated model according to the local model processing strategy, for example by compression or encryption. For details of how the C-A&MF performs the processing, refer to the description above; for brevity, it is not described in detail here.
In one possible case of this application, the steps shown in part 4A of FIG. 4 are performed, where 4A includes steps 414 to 416.
414. The C-A&MF in the first node sends a model update request #1 to the L-MEF in the second node.
In this embodiment of the application, the C-A&MF may test the shared model obtained in step 413 using the test data in a test data set to determine the prediction error of the shared model obtained in step 413. The shared model obtained in step 413 may also be referred to as the new shared model, and may correspond to an example of the second shared model in FIG. 3.
When the C-A&MF determines that the prediction error of the new shared model is less than a certain threshold, it sends the model update request #1, which may include the model identifier of the new shared model and the parameters of the fused model. As an example, when the training model is a neural network model, the model parameters may include at least one of the weight parameters, the bias parameters, or the activation function information of each layer.
Here, the model update request #1 may be an example of the model update notification message sent when the first node in FIG. 3 determines that the prediction error of the second shared model is less than the third threshold.
When the C-A&MF determines that the prediction error of the new shared model is greater than or equal to the threshold, steps 414 to 416 may be skipped.
415. The L-MEF performs model update installation. Specifically, the L-MEF replaces the current parameters of the model with the parameters of the model delivered in step 414.
Taking a neural network model as an example, the L-MEF may replace the weight parameters, bias parameters, or activation functions of each layer of the neural network with the parameters of the model delivered in step 414.
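By way of a non-limiting illustration, the installation of the delivered parameters may be sketched as follows; the per-layer dictionary representation and the function name are assumptions introduced only for this sketch:

```python
def install_model_update(current_params: dict, delivered_params: dict) -> dict:
    """Replace the currently installed per-layer parameters (weights, biases,
    activation identifiers) with the parameters carried in the model update
    request, as in steps 415 and 420."""
    installed = dict(current_params)
    for layer_name, layer_params in delivered_params.items():
        installed[layer_name] = layer_params
    return installed
```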
416. The L-MEF in the second node sends a model update response #1 to the C-A&MF in the first node. Here, the model update response #1 may indicate that the model update has been completed.
In another possible case of this application, the steps shown in part 4A of FIG. 4 may be replaced with the steps shown in part 4B of FIG. 4, where 4B includes steps 417 to 421.
417. The C-A&MF in the first node sends a model update request #2 to the L-A&MF in the second node.
In this embodiment of the application, after obtaining the new shared model in step 413, the C-A&MF may send the model update request #2 to the L-A&MF, which may include the model identifier of the new shared model and the parameters of the fused model. As an example, when the training model is a neural network model, the model parameters may include at least one of the weight parameters, the bias parameters, or the activation function information of each layer.
Here, the model update request #2 may be an example of the model update notification message sent by the first node to the second node before the second node in FIG. 3 determines whether the prediction error of the updated shared model is less than the fourth threshold.
418. The L-A&MF in the second node sends a model update response #2 to the C-A&MF in the first node. Here, the model update response #2 may be used to notify the first node that the model update request #2 has been received.
419. The L-A&MF in the second node sends a model installation request to the L-MEF.
For example, the L-A&MF may determine, according to the model update strategy, whether the prediction error of the new shared model is greater than a preset threshold or greater than that of the old shared model. As an example, when the prediction error of the new shared model is greater than or equal to that of the old shared model, the second node does not install the new shared model, and steps 419 to 421 are not performed. When the prediction error of the new shared model is less than that of the old shared model, the second node updates to and installs the new shared model, and steps 419 to 421 are performed.
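By way of a non-limiting illustration, the second node's local evaluation that supports this decision may be sketched as follows. The function name and the use of MAE are assumptions introduced only for this sketch, and the decision itself can reuse the update rule sketched earlier:

```python
import numpy as np

def local_prediction_error(predict_fn, X_pred: np.ndarray, y_pred: np.ndarray) -> float:
    """Prediction error (here MAE) of a shared model measured on the second
    node's locally stored prediction data set, so the data never leaves the node."""
    return float(np.mean(np.abs(predict_fn(X_pred) - y_pred)))

# Illustrative decision at the L-A&MF: install the new shared model only if it
# beats the old shared model on the local prediction data set.
# new_err = local_prediction_error(new_model.predict, X_pred, y_pred)
# old_err = local_prediction_error(old_model.predict, X_pred, y_pred)
# install = new_err < old_err
```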
The model installation request may carry the model identifier of the new shared model and the parameters of the fused model. For details of the model identifier and the parameters of the fused model, refer to the description above; for brevity, they are not described here.
420. The L-MEF in the second node performs model update installation.
Specifically, for step 420, refer to the description of step 415; for brevity, it is not described here again.
421. The L-MEF in the second node sends a model installation response to the L-A&MF; the model installation response may indicate that the model update has been completed.
422. The second node performs model application.
Specifically, the L-MEF in the second node subscribes, from the L-DSF, to the data required for model prediction and performs model prediction. The prediction result is then sent to the local APF for policy execution.
It should be noted that, in an optional embodiment of this application, once the first node and the second node have started the joint learning, the joint learning steps may be executed cyclically.
In another optional embodiment of this application, a joint learning stop condition may be set. When the joint learning stop condition is met, the first node and the second node may stop the joint learning. As an example, the joint learning stop condition may include a joint learning execution duration or resource constraints at the second node. That is, in this embodiment of the application, the joint learning may be stopped once that execution duration has elapsed after the joint learning was started, or when the resources of some or all of the second nodes are constrained.
Optionally, in this embodiment of the application, the joint learning stop condition may be included in the joint learning strategy, or preconfigured in the first node or the second node; this is not limited in this embodiment of the application.
Therefore, the embodiments of this application manage the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, including orchestrating and managing at least one of the joint learning start condition, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and so on. On this basis, the embodiments of this application enable the first node and the second node to learn jointly, without the second node's local data being uploaded to the first node, and to obtain a training model with high accuracy and generalization ability.
FIG. 5 shows a schematic flowchart of a method 500 for training a model provided by an embodiment of this application. It should be understood that FIG. 5 shows the steps or operations of the method for training a model, but these steps or operations are only examples; the embodiments of this application may also perform other operations or variations of the operations in FIG. 5. In addition, the steps in FIG. 5 may be performed in an order different from that presented in FIG. 5, and possibly not all of the operations in FIG. 5 are to be performed.
FIG. 5 is described using a CUDA and at least one DUDA as an example; the CUDA may correspond to an example of the first node above, and a DUDA may correspond to an example of the second node above. In this embodiment of the application, when the functions of the intelligent network elements are deployed together, the interaction information exchanged during joint learning between the first node and the second node can be transmitted directly over the interface between the first node and the second node.
It should be noted that the embodiments of this application take the CUDA and the DUDA as an example, but the embodiments of this application are not limited thereto. For example, the CUDA may also be replaced by a gNB, an eNB, or a cell, with the DUDA replaced by a terminal device served by the gNB, eNB, or cell. For another example, the CUDA may be replaced by a CU, with the DUDA replaced by a DU managed by the CU. For another example, the CUDA may be replaced by a C-RAN, with the DUDA replaced by an eNB or gNB managed by the C-RAN. For another example, the CUDA may be replaced by an eNB or gNB, with the DUDA replaced by a cell managed by the eNB or gNB. For another example, the CUDA may be replaced by an OSS, with the DUDA replaced by a network element managed by the OSS. For another example, both the CUDA and the DUDA may be replaced by gNBs or eNBs. This is not specifically limited in the embodiments of this application.
501. The CUDA sends a joint learning training request to the at least one DUDA.
As an example, the CUDA may send the joint learning training request to each of the at least one DUDA when the joint learning start condition is met. For details of the joint learning training request, refer to the description above; for brevity, it is not repeated here.
502. Each of the at least one DUDA sends a joint learning training request response to the CUDA.
503. Each DUDA performs local model training and processing.
Specifically, each DUDA performs data subscription, local model training, and processing according to the indication in step 501. For details, refer to the description of step 410 in FIG. 4 above; for brevity, it is not repeated here.
504. Each DUDA sends a local model upload notification to the CUDA.
505. The CUDA sends a local model upload notification response to each DUDA.
Specifically, for steps 504 and 505, refer to the descriptions of steps 411 and 412 in FIG. 4 above; for brevity, they are not repeated here.
506. The CUDA performs model screening, fusion, and processing.
Specifically, for step 506, refer to the description of step 413 in FIG. 4 above; for brevity, it is not repeated here.
507. Each DUDA performs model update installation and model application.
Specifically, for step 507, refer to the descriptions of steps 414 to 421 in FIG. 4; for brevity, they are not repeated here.
508. Steps 501 to 507 are repeated.
Therefore, in this embodiment of the application, the CUDA sends a joint learning training request to the at least one DUDA; each DUDA can locally retrain the shared model indicated by the CUDA according to the joint learning training request and then report the locally trained model to the CUDA; and the CUDA can fuse and process the local models reported by the at least one DUDA to determine a new shared model. On this basis, this embodiment of the application can transmit the interaction information of joint learning over the interface between the CU and the DU under an architecture in which the CU and the DU are separated, and thereby obtain a shared model with high accuracy and generalization ability.
Based on the methods of the foregoing embodiments, the communication apparatus provided in this application is described below.
FIG. 6 shows a schematic structural diagram of an apparatus 600 for training a model in a wireless network provided by this application. The apparatus 600 for training a model may be the first node in the wireless network. The apparatus 600 for training a model includes a sending unit 610, a receiving unit 620, and a determining unit 630.
The sending unit 610 is configured to send a first request to at least one second node in the wireless network, where the first request is used to request the at least one second node to perform local model retraining on a first shared model based on the local data of each second node.
The receiving unit 620 is configured to obtain a model report message from each of the at least one second node, where the model report message of each second node includes the parameters of a first local model, or includes the increments between the parameters of the first local model and the parameters of the first shared model, and the first local model is obtained by each second node through local model retraining of the first shared model based on the first request and the local data.
The determining unit 630 is configured to determine a second shared model according to the model report messages of the at least one second node and the first shared model.
Therefore, in this embodiment of the application, the first node in the wireless network sends a first request to at least one second node in the wireless network; each second node can, according to the first request, retrain the first shared model locally based on its local data, and then report the parameters of the trained local model, or the increments between the parameters of the local model and the parameters of the shared model, to the first node through a model report message; the first node can then determine the second shared model based on the content reported by the at least one second node and the first shared model. On this basis, the embodiments of this application can implement joint learning in a wireless network.
Optionally, the model report message further includes the size of the training data set corresponding to the first local model and/or the prediction error of the first local model.
The determining unit 630 is specifically configured to:
determine a second local model among the at least one first local model corresponding to the at least one second node, where the size of the training data set corresponding to the second local model is greater than or equal to a first threshold and/or the prediction error of the second local model is less than or equal to a second threshold; and
determine the second shared model according to the second local model and the first shared model.
Therefore, in this embodiment of the application, when the first node determines the second shared model based on the first local models, it selects, from the at least one first local model, the second local models whose training data set size is greater than or equal to a specific threshold and/or whose prediction error is less than or equal to a specific threshold, and then determines the second shared model through the second local models. Because the accuracy or generalization ability of the second local models in this embodiment is higher than that of the first local models, this embodiment helps improve the accuracy and generalization ability of the second shared model.
Optionally, the first node may not screen the at least one first local model for second local models, but instead determine the second shared model directly from the at least one first local model.
Optionally, the determining unit 630 is specifically configured to:
perform weighted average aggregation on the parameters of the second local model, or on the increments between the parameters of the second local model and the parameters of the first shared model, where the weight parameters used in the weighted average aggregation are determined according to the size of the training data set corresponding to the second local model and/or the prediction error of the second local model; and
determine the second shared model according to the result of the weighted average aggregation and the first shared model.
Optionally, the first node includes a centralized adaptive policy function and a centralized analysis and modeling function, where:
the centralized adaptive policy function sends a joint learning strategy to the centralized analysis and modeling function, and the joint learning strategy includes at least one of the following pieces of information:
a joint learning start condition, information about the first shared model, identifiers of joint learning group members, an upload strategy for the first local model, a screening strategy for the first local model, an aggregation strategy for the first local model, a processing strategy for the first local model, or a shared model update strategy.
Therefore, the embodiments of this application manage the joint learning process of the first node and the second node in the wireless network through a joint learning strategy, including orchestrating and managing at least one of the joint learning start condition, the model upload strategy, the model screening strategy, the model aggregation strategy, the model processing strategy, and so on. On this basis, the embodiments of this application enable the first node and the second node to learn jointly, without the second node's local data being uploaded to the first node, and to obtain a training model with high accuracy and generalization ability.
Optionally, the second node includes a local analysis and modeling function;
the sending unit 610 is specifically configured to:
when the joint learning start condition is met, send, through the centralized analysis and modeling function, the first request to the local analysis and modeling function of each of the at least one second node, where the first request includes the information about the first shared model.
Optionally, the information about the first shared model includes at least one of the following:
the model identifier, model type, model structure, input and output, initial model parameters, or training data collection duration of the first shared model.
By setting a joint learning start condition, the embodiments of this application can perform model training by centralized training when the joint learning start condition is not met, and by joint learning when the joint learning start condition is met.
Optionally, the first request further includes the upload strategy for the first local model.
Optionally, the upload strategy for the first local model includes at least one of the following:
the identifier of the processing algorithm applied before the first local model is uploaded, the upload time of the first local model, or the information carried when the first local model is uploaded;
where the carried information includes the size of the training data set of the first local model and/or the prediction error.
Optionally, the determining unit 630 is further configured to determine that the prediction error of the second shared model is less than or equal to a third threshold.
When the prediction error of the second shared model is less than or equal to the third threshold, the sending unit 610 is further configured to send a model update notification message to each of the at least one second node, where the model update notification message is used to request each second node to install the second shared model.
Therefore, this embodiment of the application evaluates the prediction error of the second shared model: when the prediction error is determined to be less than or equal to the preset threshold, the first node saves the second shared model and the second node installs it; when the prediction error is determined to be greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. On this basis, this embodiment avoids installing a shared model with low accuracy and further guarantees the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the third threshold; this is not limited in this embodiment of the application.
Optionally, the sending unit 610 and/or the receiving unit 620 may also be collectively referred to as a transceiver unit (module) or a communication unit, which may be used to perform the receiving and sending steps of the first node in the method embodiments. The processing unit 630 is configured to generate the instructions sent by the sending unit 610 or to process the instructions received by the receiving unit 620. Optionally, the apparatus 600 for training a model may further include a storage unit, which is configured to store the instructions executed by the sending unit, the receiving unit, and the processing unit.
The apparatus 600 for training a model is the first node in the method embodiments, or may be a chip within the first node. When the apparatus 600 for training a model is the first node, the processing unit may be a processor, and the sending unit and the receiving unit may be a transceiver. The apparatus for training a model may further include a storage unit, which may be a memory. The storage unit is configured to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the communication device performs the foregoing methods. When the apparatus 600 for training a model is a chip within the first node, the processing unit may be a processor, and the sending unit and the receiving unit may be an input/output interface, pins, or a circuit; the processing unit executes the instructions stored in the storage unit, so that the communication apparatus performs the operations performed by the first node in the foregoing method embodiments. The storage unit may be a storage unit within the chip (for example, a register or a cache), or a storage unit within the first node located outside the chip (for example, a read-only memory or a random access memory).
Those skilled in the art can clearly understand that, for the steps performed by the apparatus 600 for training a model and the corresponding beneficial effects, reference may be made to the related description of the first node in the foregoing method embodiments; for brevity, details are not repeated here.
It should be understood that the sending unit 610 and the receiving unit 620 may be implemented by a transceiver, the processing unit may be implemented by a processor, and the storage unit may be implemented by a memory. As shown in FIG. 7, an apparatus 700 for training a model may include a processor 710, a memory 720, and a transceiver 730. The apparatus 700 for training a model may be the first node in the wireless network.
The apparatus 600 for training a model shown in FIG. 6 or the apparatus 700 for training a model shown in FIG. 7 can implement the steps performed by the first node in the foregoing embodiments. For similar descriptions, refer to the descriptions in the corresponding foregoing methods; to avoid repetition, details are not repeated here.
图8示出了本申请提供的训练模型的装置800的结构示意图。该训练模型的装置800可以为无线网络中的第二节点。该训练模型的装置800包括:接收单元810、处理单元820和发送单元830。FIG. 8 shows a schematic structural diagram of an apparatus 800 for training a model provided in this application. The device 800 for training the model may be the second node in the wireless network. The device 800 for training a model includes: a receiving unit 810, a processing unit 820, and a sending unit 830.
接收单元810,用于从所述无线接入网中的第一节点接收第一请求。The receiving unit 810 is configured to receive a first request from a first node in the radio access network.
处理单元820,用于根据所述第一请求,基于所述第二节点的本地数据对第一共享模型进行本地模型再训练,以得到第一本地模型。The processing unit 820 is configured to perform local model retraining on the first shared model based on the local data of the second node according to the first request to obtain the first local model.
发送单元830,用于向所述第一节点发送模型上报消息,所述模型上报消息包括所述第一本地模型的参数,或者包括所述第一本地模型的参数与所述第一共享模型的参数之间的增量,所述模型上报消息用于所述第一共享模型的更新。The sending unit 830 is configured to send a model report message to the first node, where the model report message includes the parameters of the first local model, or includes the parameters of the first local model and the first shared model. The increment between the parameters, the model report message is used to update the first shared model.
因此,本申请实施例中,无线网络中的第一节点向该无线网络中的至少一个第二节点发送第一请求,每个第二节点能够根据该第一请求,基于其本地数据对第一共享模型进行本地再训练,然后通过模型上报消息将训练得到的本地模型的参数,或本地模型的参数与共享模型的参数之间的增量上报给第一节点,进而第一节点能够根据该至少一个第二节点上报内容和第一共享模型,确定第二共享模型,基于此本申请实施例能够在无线网络中实现联合学习。Therefore, in this embodiment of the present application, a first node in the wireless network sends a first request to at least one second node in the wireless network, and each second node can respond to the first request based on its local data. The shared model is retrained locally, and then the parameters of the trained local model or the increment between the parameters of the local model and the parameters of the shared model are reported to the first node through the model report message, and the first node can then report the A second node reports content and the first sharing model, and determines the second sharing model. Based on this, the embodiment of the present application can implement joint learning in a wireless network.
Optionally, the model report message further includes the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
In this way, according to the size of the training data set corresponding to the first local model and/or the prediction error carried in the model report message, the first node can select second local models whose training data set size is greater than or equal to a specific threshold and/or whose prediction error is less than or equal to a specific threshold, and then determine the second shared model based on the selected second local models. Because the accuracy or generalization ability of a second local model is higher than that of a first local model in the embodiments of this application, the embodiments of this application help improve the accuracy and generalization ability of the second shared model.
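As a sketch only, the screening and weighted aggregation described above could look as follows in Python; the dictionary keys are assumptions, checking both thresholds jointly is a simplification of the "and/or" wording, and weighting by training-set size is just one of the weighting options mentioned.

    import numpy as np

    def select_second_local_models(reports, first_threshold, second_threshold):
        # First node: keep only reports whose training-set size and prediction error meet the thresholds.
        return [r for r in reports
                if r.get("dataset_size", 0) >= first_threshold
                and r.get("prediction_error", float("inf")) <= second_threshold]

    def weighted_aggregate(shared_params, selected_reports):
        # Weight each retained increment by the size of its training data set.
        sizes = np.array([r["dataset_size"] for r in selected_reports], dtype=float)
        weights = sizes / sizes.sum()
        increments = np.stack([r["increment"] for r in selected_reports])
        return np.asarray(shared_params, dtype=float) + weights @ increments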
Optionally, the second node includes a local analysis and modeling function, and the first node includes a centralized analysis and modeling function.
The receiving, by the second node, of the first request from the first node in the radio access network includes:
receiving, by the local analysis and modeling function in the second node, the first request from the centralized analysis and modeling function in the first node, where the first request includes information about the first shared model.
Optionally, the first request further includes an upload strategy for the first local model. Correspondingly, after obtaining the upload strategy for the first local model, the second node can perform, according to the upload strategy, the corresponding processing operations on the local model obtained through local retraining.
Optionally, the upload strategy of the first local model includes at least one of the following:
an identifier of a processing algorithm applied to the first local model before uploading, an upload time of the first local model, or information carried when the first local model is uploaded;
where the carried information includes the size of the training data set of the first local model and/or the prediction error of the first local model.
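For illustration only, the upload strategy could be carried as a simple structure such as the one below; the field names are assumptions made for this sketch and are not defined by this application.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class UploadStrategy:
        # Illustrative container for the optional items listed above.
        preprocessing_algorithm_id: Optional[str] = None   # identifier of the algorithm applied before upload
        upload_time: Optional[float] = None                # when the first local model should be uploaded
        carry_dataset_size: bool = False                   # carry the training data set size in the report
        carry_prediction_error: bool = False               # carry the prediction error in the report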
Optionally, the second node further includes a local adaptive strategy function.
The local analysis and modeling function sends a third request to the local adaptive strategy function, where the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request includes information about the first shared model.
The local analysis and modeling function receives the local joint learning strategy sent by the local adaptive strategy function.
When the local joint learning strategy indicates that the second node performs local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
Therefore, in the embodiments of this application, the local adaptive strategy function sends the local joint learning strategy to the local analysis and modeling function, so that the second node can determine, according to its own computing capability, whether to participate in joint learning. This avoids prolonging the joint learning iteration time because a second node has insufficient computing capability, and improves joint learning efficiency.
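As an illustration of this decision only, assuming a hypothetical helper get_local_joint_learning_strategy on the local adaptive strategy function and reusing local_retrain from the earlier sketch, the local analysis and modeling function might proceed as follows:

    def handle_first_request(first_request, local_adaptive_strategy, local_data):
        # Local analysis and modeling function: ask the local adaptive strategy function whether to retrain.
        third_request = {"model_info": first_request["model_info"]}    # carries the shared model information
        strategy = local_adaptive_strategy.get_local_joint_learning_strategy(third_request)
        if not strategy.get("retrain", False):   # e.g. computing capability is currently insufficient
            return None                          # skip this round of joint learning
        initial_params = first_request["model_info"]["initial_params"]
        return local_retrain(initial_params, local_data)               # from the earlier sketch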
Optionally, in the embodiments of this application, the local adaptive strategy function may alternatively not send the local joint learning strategy to the local analysis and modeling function; instead, the second node always participates in joint learning when it receives a request for joint learning training.
Optionally, the information about the first shared model includes at least one of the following:
a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
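For illustration, the information about the first shared model and the first request carrying it could be represented as below; all field names are assumptions made for this sketch.

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class SharedModelInfo:
        model_id: str
        model_type: Optional[str] = None                    # e.g. "neural_network"
        model_structure: Optional[Any] = None               # e.g. layer sizes
        inputs_outputs: Optional[Any] = None                # names/shapes of the model inputs and outputs
        initial_params: Optional[Any] = None
        data_collection_duration: Optional[float] = None    # how long to collect training data

    @dataclass
    class FirstRequest:
        model_info: SharedModelInfo
        upload_strategy: Optional[Any] = None               # e.g. the UploadStrategy sketched earlier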
Optionally, the second node receives a model update notification message sent by the first node, where the model update notification message is used to notify the second node of a second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model.
When the second node determines that the prediction error of the second shared model is less than a fourth threshold, the second node installs the second shared model.
Therefore, in the embodiments of this application, the prediction error of the second shared model is evaluated. When it is determined that the prediction error is less than or equal to the preset threshold, the first node saves the second shared model and the second node installs the second shared model; when it is determined that the prediction error is greater than the preset threshold, the first node does not update the first shared model and the second node does not install the second shared model. On this basis, the embodiments of this application can avoid installing a shared model with low accuracy, further ensuring the accuracy and generalization ability of the updated shared model.
In a possible implementation, the prediction error of the first shared model may be set as the fourth threshold; this is not limited in the embodiments of this application.
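A minimal sketch of this installation check, assuming hypothetical evaluate and install helpers on the second node, is given below; when no fourth threshold is configured, it falls back to the prediction error of the current (first) shared model, which is the option mentioned above.

    def maybe_install_second_shared_model(node, second_shared_model, validation_data, fourth_threshold=None):
        # Second node: install the new shared model only if its prediction error is below the threshold.
        if fourth_threshold is None:
            fourth_threshold = node.evaluate(node.current_model, validation_data)   # hypothetical helper
        error = node.evaluate(second_shared_model, validation_data)
        if error < fourth_threshold:
            node.install(second_shared_model)                                       # hypothetical helper
            return True
        return False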
Optionally, the receiving unit 810 and/or the sending unit 830 may also be collectively referred to as a transceiver unit (module) or a communication unit, and may be respectively configured to perform the receiving and sending steps of the second node in the method embodiments. The processing unit 820 is further configured to generate instructions to be sent by the sending unit 830, or to process instructions received by the receiving unit 810. Optionally, the communication apparatus 800 may further include a storage unit, and the storage unit is configured to store instructions executed by the communication unit and the processing unit.
The apparatus 800 for training a model is the second node in the method embodiments, or may be a chip in the second node. When the apparatus 800 for training a model is the second node, the processing unit may be a processor, and the sending unit and the receiving unit may be a transceiver. The apparatus may further include a storage unit, and the storage unit may be a memory. The storage unit is configured to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the communication device performs the foregoing methods. When the apparatus 800 for training a model is a chip in the second node, the processing unit may be a processor, and the sending unit and the receiving unit may be an input/output interface, a pin, a circuit, or the like; the processing unit executes the instructions stored in the storage unit, so that the communication device performs the operations performed by the network device in the foregoing method embodiments. The storage unit may be a storage unit in the chip (for example, a register or a cache), or may be a storage unit in the communication device that is located outside the chip (for example, a read-only memory or a random access memory).
Those skilled in the art can clearly understand that, for the steps performed by the apparatus 800 for training a model and the corresponding beneficial effects, reference may be made to the related description of the second node in the foregoing method embodiments; for brevity, details are not described herein again.
It should be understood that the sending unit 830 and the receiving unit 810 may be implemented by a transceiver, the processing unit 820 may be implemented by a processor, and the storage unit may be implemented by a memory. As shown in FIG. 9, an apparatus 900 for training a model may include a processor 910, a memory 920, and a transceiver 930. The apparatus 900 for training a model may be the second node in the wireless network.
The apparatus 800 for training a model shown in FIG. 8 or the apparatus 900 for training a model shown in FIG. 9 can implement the steps performed by the second node in the foregoing method embodiments. For similar descriptions, reference may be made to the corresponding method embodiments. To avoid repetition, details are not described herein again.
The apparatuses for training a model in the foregoing apparatus embodiments correspond to the first node or the second node in the method embodiments, and the corresponding steps are performed by corresponding modules or units. For example, the transceiver unit (or the communication unit, or the transceiver) performs the sending and/or receiving steps in the method embodiments (or these steps are performed by the sending unit and the receiving unit respectively), and steps other than sending and receiving may be performed by the processing unit (processor). For the functions of specific units, reference may be made to the corresponding method embodiments. The sending unit and the receiving unit may form a transceiver unit, and a transmitter and a receiver may form a transceiver, jointly implementing the transceiver functions in the method embodiments; there may be one or more processors.
It should be understood that the foregoing division into units is merely a functional division, and other division methods may be used in actual implementation.
The foregoing first node or second node may be a chip, and the processing unit may be implemented by hardware or software. When implemented by hardware, the processing unit may be a logic circuit, an integrated circuit, or the like. When implemented by software, the processing unit may be a general-purpose processor, implemented by reading software code stored in a storage unit, and the storage unit may be integrated in the processor or may exist independently outside the processor.
It should be understood that the foregoing processing apparatus may be a chip. For example, the processing apparatus may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
In an implementation process, the steps of the methods provided in the embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The steps of the methods disclosed in the embodiments of this application may be directly performed and completed by a hardware processor, or performed and completed by a combination of hardware and software modules in the processor.
It should be noted that the processor in the embodiments of this application may be an integrated circuit chip with a signal processing capability. In an implementation process, the steps of the foregoing method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The foregoing processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor in the embodiments of this application may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It can be understood that the memory or storage unit in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, for example, a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but are not limited to, these and any other suitable types of memories.
An embodiment of this application further provides a wireless network, including the foregoing first node and second node.
An embodiment of this application further provides a computer-readable medium storing a computer program, and when the computer program is executed by a computer, the method in any one of the foregoing embodiments is implemented.
An embodiment of this application further provides a computer program product, and when the computer program product is executed by a computer, the method in any one of the foregoing embodiments is implemented.
An embodiment of this application further provides a system chip, including a communication unit and a processing unit. The processing unit may be, for example, a processor. The communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer instructions, so that a chip in the communication apparatus performs any one of the methods provided in the foregoing embodiments of this application.
Optionally, the computer instructions are stored in a storage unit.
The embodiments in this application may be used independently or jointly, which is not limited herein.
It should be understood that descriptions such as "first" and "second" in the embodiments of this application are merely used for illustration and for distinguishing between described objects, do not indicate an order, do not represent a special limitation on the quantity of devices in the embodiments of this application, and do not constitute any limitation on the embodiments of this application.
It should also be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, over infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
In this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b, or c may indicate: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may be singular or plural.
It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that specific features, structures, or characteristics related to the embodiment are included in at least one embodiment of this application. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of this application.

Claims (28)

  1. A method for training a model applied to a wireless network, characterized in that the method is performed by a first node in the wireless network and comprises:
    sending, by the first node, a first request to at least one second node in the wireless network, wherein the first request is used to request each of the at least one second node to perform local model retraining on a first shared model based on local data of the second node;
    obtaining, by the first node, a model report message from each of the at least one second node, wherein the model report message of each second node comprises parameters of a first local model, or comprises increments between the parameters of the first local model and parameters of the first shared model, and the first local model is obtained after the second node performs local model retraining on the first shared model based on the first request and the local data; and
    determining, by the first node, a second shared model according to the model report messages of the at least one second node and the first shared model.
  2. The method according to claim 1, characterized in that the model report message further comprises the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model; and
    the determining, by the first node, a second shared model according to the model report messages of the at least one second node and the first shared model comprises:
    determining, by the first node, a second local model from at least one first local model corresponding to the at least one second node, wherein the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold; and
    determining, by the first node, the second shared model according to the second local model and the first shared model.
  3. The method according to claim 2, characterized in that the determining, by the first node, the second shared model according to the second local model and the first shared model comprises:
    performing, by the first node, weighted average aggregation on the parameters of the second local model, or on the increments between the parameters of the second local model and the parameters of the first shared model, wherein a weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model, and/or the prediction error of the second local model; and
    determining, by the first node, the second shared model according to a result of the weighted average aggregation and the first shared model.
  4. The method according to any one of claims 1 to 3, characterized in that the first node comprises a centralized adaptive strategy function and a centralized analysis and modeling function, and the method further comprises:
    sending, by the centralized adaptive strategy function, a joint learning strategy to the centralized analysis and modeling function, wherein the joint learning strategy comprises at least one of the following information:
    a joint learning start condition, information about the first shared model, identifiers of joint learning group members, an upload strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model, or a shared model update strategy.
  5. The method according to claim 4, characterized in that the second node comprises a local analysis and modeling function; and
    the sending, by the first node, a first request to at least one second node in the radio access network comprises:
    when the joint learning start condition is met, sending, by the centralized analysis and modeling function in the first node, the first request to the local analysis and modeling function of each of the at least one second node, wherein the first request comprises the information about the first shared model.
  6. The method according to claim 4 or 5, characterized in that the information about the first shared model comprises at least one of the following:
    a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
  7. The method according to claim 5 or 6, characterized in that the first request further comprises the upload strategy of the first local model.
  8. The method according to claim 7, characterized in that the upload strategy of the first local model comprises at least one of the following:
    an identifier of a processing algorithm applied to the first local model before uploading, an upload time of the first local model, or information carried when the first local model is uploaded;
    wherein the carried information comprises the size of the training data set of the first local model and/or the prediction error of the first local model.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    determining, by the first node, that the prediction error of the second shared model is less than or equal to a third threshold; and
    sending, by the first node, a model update notification message to each of the at least one second node, wherein the model update notification message is used to request each second node to install the second shared model.
  10. A method for training a model applied to a radio access network, characterized in that the method is performed by a second node in the radio access network and comprises:
    receiving, by the second node, a first request from a first node in the radio access network;
    performing, by the second node according to the first request, local model retraining on a first shared model based on local data of the second node, to obtain a first local model; and
    sending, by the second node, a model report message to the first node, wherein the model report message comprises parameters of the first local model, or comprises increments between the parameters of the first local model and parameters of the first shared model, and the model report message is used for updating the first shared model.
  11. The method according to claim 10, characterized in that the model report message further comprises the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
  12. The method according to claim 10 or 11, characterized in that the second node comprises a local analysis and modeling function, and the first node comprises a centralized analysis and modeling function; and
    the receiving, by the second node, a first request from the first node in the radio access network comprises:
    receiving, by the local analysis and modeling function in the second node, the first request from the centralized analysis and modeling function in the first node, wherein the first request comprises information about the first shared model.
  13. The method according to claim 12, characterized in that the first request further comprises an upload strategy of the first local model.
  14. The method according to claim 13, characterized in that the upload strategy of the first local model comprises at least one of the following:
    an identifier of a processing algorithm applied to the first local model before uploading, an upload time of the first local model, or information carried when the first local model is uploaded;
    wherein the carried information comprises the size of the training data set of the first local model and/or the prediction error of the first local model.
  15. The method according to claim 12, characterized in that the second node further comprises a local adaptive strategy function, and the method further comprises:
    sending, by the local analysis and modeling function, a third request to the local adaptive strategy function, wherein the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request comprises the information about the first shared model;
    receiving, by the local analysis and modeling function, the local joint learning strategy sent by the local adaptive strategy function; and
    when the local joint learning strategy indicates that the second node performs local model retraining on the first shared model, performing, by the local analysis and modeling function, local model retraining on the first shared model based on the local data.
  16. The method according to any one of claims 12 to 15, characterized in that the information about the first shared model comprises at least one of the following:
    a model identifier, a model type, a model structure, inputs and outputs, initial model parameters, or a training data collection duration of the first shared model.
  17. The method according to any one of claims 12 to 16, characterized in that the method further comprises:
    receiving, by the second node, a model update notification message sent by the first node, wherein the model update notification message is used to notify the second node of a second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model; and
    installing, by the second node, the second shared model when the second node determines that the prediction error of the second shared model is less than a fourth threshold.
  18. An apparatus for training a model applied to a wireless network, characterized in that the apparatus is a first node in the wireless network and comprises:
    a sending unit, configured to send a first request to at least one second node in the wireless network, wherein the first request is used to request each of the at least one second node to perform local model retraining on a first shared model based on local data of the second node;
    a receiving unit, configured to obtain a model report message from each of the at least one second node, wherein the model report message of each second node comprises parameters of a first local model, or comprises increments between the parameters of the first local model and parameters of the first shared model, and the first local model is obtained after the second node performs local model retraining on the first shared model based on the first request and the local data; and
    a determining unit, configured to determine a second shared model according to the model report messages of the at least one second node and the first shared model.
  19. The apparatus according to claim 18, characterized in that the model report message further comprises the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model; and
    the determining unit is specifically configured to:
    determine a second local model from at least one first local model corresponding to the at least one second node, wherein the size of the training data set corresponding to the second local model is greater than or equal to a first threshold, and/or the prediction error of the second local model is less than or equal to a second threshold; and
    determine the second shared model according to the second local model and the first shared model.
  20. The apparatus according to claim 19, characterized in that the determining unit is specifically configured to:
    perform weighted average aggregation on the parameters of the second local model, or on the increments between the parameters of the second local model and the parameters of the first shared model, wherein a weight parameter used in the weighted average aggregation is determined according to the size of the training data set corresponding to the second local model, and/or the prediction error of the second local model; and
    determine the second shared model according to a result of the weighted average aggregation and the first shared model.
  21. The apparatus according to any one of claims 18 to 20, characterized in that the first node comprises a centralized adaptive strategy function and a centralized analysis and modeling function, wherein:
    the centralized adaptive strategy function sends a joint learning strategy to the centralized analysis and modeling function, and the joint learning strategy comprises at least one of the following information:
    a joint learning start condition, information about the first shared model, identifiers of joint learning group members, an upload strategy of the first local model, a screening strategy of the first local model, an aggregation strategy of the first local model, a processing strategy of the first local model, or a shared model update strategy.
  22. The apparatus according to claim 21, characterized in that the second node comprises a local analysis and modeling function; and
    the sending unit is specifically configured to:
    when the joint learning start condition is met, send, through the centralized analysis and modeling function, the first request to the local analysis and modeling function of each of the at least one second node, wherein the first request comprises the information about the first shared model.
  23. The apparatus according to any one of claims 18 to 22, characterized in that:
    the determining unit is further configured to determine that the prediction error of the second shared model is less than or equal to a third threshold; and
    the sending unit is further configured to send a model update notification message to each of the at least one second node, wherein the model update notification message is used to request each second node to install the second shared model.
  24. An apparatus for training a model applied to a radio access network, characterized in that the apparatus is a second node in the radio access network and comprises:
    a receiving unit, configured to receive a first request from a first node in the radio access network;
    a processing unit, configured to perform, according to the first request, local model retraining on a first shared model based on local data of the second node, to obtain a first local model; and
    a sending unit, configured to send a model report message to the first node, wherein the model report message comprises parameters of the first local model, or comprises increments between the parameters of the first local model and parameters of the first shared model, and the model report message is used for updating the first shared model.
  25. The apparatus according to claim 24, characterized in that the model report message further comprises the size of the training data set corresponding to the first local model, and/or the prediction error of the first local model.
  26. The apparatus according to claim 24 or 25, characterized in that the second node comprises a local analysis and modeling function, and the first node comprises a centralized analysis and modeling function; and
    the receiving, by the second node, of the first request from the first node in the radio access network comprises:
    receiving, by the local analysis and modeling function in the second node, the first request from the centralized analysis and modeling function in the first node, wherein the first request comprises information about the first shared model.
  27. The apparatus according to claim 26, characterized in that the second node further comprises a local adaptive strategy function, wherein:
    the local analysis and modeling function sends a third request to the local adaptive strategy function, the third request is used to request a local joint learning strategy corresponding to the first shared model, the local joint learning strategy is used to indicate whether the second node performs local model retraining on the first shared model, and the third request comprises the information about the first shared model;
    the local analysis and modeling function receives the local joint learning strategy sent by the local adaptive strategy function; and
    when the local joint learning strategy indicates that the second node performs local model retraining on the first shared model, the local analysis and modeling function performs local model retraining on the first shared model based on the local data.
  28. The apparatus according to claim 26 or 27, characterized in that:
    the second node receives a model update notification message sent by the first node, wherein the model update notification message is used to notify the second node of a second shared model, and the second shared model is determined by the first node according to the model report message and the first shared model; and
    the second node installs the second shared model when determining that the prediction error of the second shared model is less than a fourth threshold.
PCT/CN2019/118762 2019-02-22 2019-11-15 Model training method and apparatus WO2020168761A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910135464.6 2019-02-22
CN201910135464.6A CN111612153B (en) 2019-02-22 2019-02-22 Method and device for training model

Publications (1)

Publication Number Publication Date
WO2020168761A1 true WO2020168761A1 (en) 2020-08-27

Family

ID=72143917

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118762 WO2020168761A1 (en) 2019-02-22 2019-11-15 Model training method and apparatus

Country Status (2)

Country Link
CN (1) CN111612153B (en)
WO (1) WO2020168761A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100145A (en) * 2020-09-02 2020-12-18 南京三眼精灵信息技术有限公司 Digital model sharing learning system and method
CN112232519A (en) * 2020-10-15 2021-01-15 成都数融科技有限公司 Joint modeling method based on federal learning
CN112329557A (en) * 2020-10-21 2021-02-05 杭州趣链科技有限公司 Model application method and device, computer equipment and storage medium
WO2023151454A1 (en) * 2022-02-14 2023-08-17 大唐移动通信设备有限公司 Model monitoring method, monitoring end, device, and storage medium
CN118052302A (en) * 2024-04-11 2024-05-17 北京钢研新材科技有限公司 Federal learning method and device for material data model
WO2024113822A1 (en) * 2022-11-30 2024-06-06 华为技术有限公司 Distributed machine learning method, device, storage medium and program product

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269794A (en) * 2020-09-16 2021-01-26 连尚(新昌)网络科技有限公司 Method and equipment for violation prediction based on block chain
WO2022252162A1 (en) * 2021-06-02 2022-12-08 北京小米移动软件有限公司 Model training method, model training apparatus and storage medium
CN116233857A (en) * 2021-12-02 2023-06-06 华为技术有限公司 Communication method and communication device
CN116887290A (en) * 2022-03-28 2023-10-13 华为技术有限公司 Communication method and device for training machine learning model
CN117196071A (en) * 2022-05-27 2023-12-08 华为技术有限公司 Model training method and device
CN117221944A (en) * 2022-06-02 2023-12-12 华为技术有限公司 Communication method and device
WO2024026846A1 (en) * 2022-08-05 2024-02-08 华为技术有限公司 Artificial intelligence model processing method and related device
WO2024065682A1 (en) * 2022-09-30 2024-04-04 Shenzhen Tcl New Technology Co., Ltd. Communication devices and methods for machine learning model training
CN117993516A (en) * 2022-11-04 2024-05-07 华为技术有限公司 Communication method and device
CN116566846B (en) * 2023-07-05 2023-09-22 中国电信股份有限公司 Model management method and system, shared node and network node

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150242760A1 (en) * 2014-02-21 2015-08-27 Microsoft Corporation Personalized Machine Learning System
CN105575389A (en) * 2015-12-07 2016-05-11 百度在线网络技术(北京)有限公司 Model training method, system and device
CN106062786A (en) * 2014-09-12 2016-10-26 微软技术许可有限责任公司 Computing system for training neural networks
CN107871160A (en) * 2016-09-26 2018-04-03 谷歌公司 Communicate efficient joint study

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9160760B2 (en) * 2014-01-06 2015-10-13 Cisco Technology, Inc. Anomaly detection in a computer network
US20150324690A1 (en) * 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
US10346944B2 (en) * 2017-04-09 2019-07-09 Intel Corporation Machine learning sparse computation mechanism
CN108345661B (en) * 2018-01-31 2020-04-28 华南理工大学 Wi-Fi clustering method and system based on large-scale Embedding technology
CN108596345A (en) * 2018-04-23 2018-09-28 薛泽 Machine learning based on block chain and make a mistake prior-warning device and method
CN109145984B (en) * 2018-08-20 2022-03-25 联想(北京)有限公司 Method and apparatus for machine training
CN109325541A (en) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for training pattern

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150242760A1 (en) * 2014-02-21 2015-08-27 Microsoft Corporation Personalized Machine Learning System
CN106062786A (en) * 2014-09-12 2016-10-26 微软技术许可有限责任公司 Computing system for training neural networks
CN105575389A (en) * 2015-12-07 2016-05-11 百度在线网络技术(北京)有限公司 Model training method, system and device
CN107871160A (en) * 2016-09-26 2018-04-03 谷歌公司 Communicate efficient joint study

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100145A (en) * 2020-09-02 2020-12-18 南京三眼精灵信息技术有限公司 Digital model sharing learning system and method
CN112100145B (en) * 2020-09-02 2023-07-04 南京三眼精灵信息技术有限公司 Digital model sharing learning system and method
CN112232519A (en) * 2020-10-15 2021-01-15 成都数融科技有限公司 Joint modeling method based on federal learning
CN112232519B (en) * 2020-10-15 2024-01-09 成都数融科技有限公司 Joint modeling method based on federal learning
CN112329557A (en) * 2020-10-21 2021-02-05 杭州趣链科技有限公司 Model application method and device, computer equipment and storage medium
WO2023151454A1 (en) * 2022-02-14 2023-08-17 大唐移动通信设备有限公司 Model monitoring method, monitoring end, device, and storage medium
WO2024113822A1 (en) * 2022-11-30 2024-06-06 华为技术有限公司 Distributed machine learning method, device, storage medium and program product
CN118052302A (en) * 2024-04-11 2024-05-17 北京钢研新材科技有限公司 Federal learning method and device for material data model

Also Published As

Publication number Publication date
CN111612153A (en) 2020-09-01
CN111612153B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
WO2020168761A1 (en) Model training method and apparatus
US11824784B2 (en) Automated platform resource management in edge computing environments
WO2016161677A1 (en) Traffic offload method and system
JP7159347B2 (en) MODEL UPDATE METHOD AND APPARATUS, AND SYSTEM
EP4158558A1 (en) Federated learning optimizations
CN113014415A (en) End-to-end quality of service in an edge computing environment
CN108667657B (en) SDN-oriented virtual network mapping method based on local feature information
CN112153700A (en) Network slice resource management method and equipment
CN110365568A (en) A kind of mapping method of virtual network based on deeply study
CN113194489B (en) Minimum-maximum cost optimization method for effective federal learning in wireless edge network
US20190044806A1 (en) Systems and methods for managing a cloud deployed service
CN113095512A (en) Federal learning modeling optimization method, apparatus, medium, and computer program product
WO2019206100A1 (en) Feature engineering programming method and apparatus
EP4002231A1 (en) Federated machine learning as a service
CN103368910B (en) Virtual radio communications network system and method for building up thereof
US11483177B2 (en) Dynamic intelligent analytics VPN instantiation and/or aggregation employing secured access to the cloud network device
CN115882981A (en) Unlicensed spectrum acquisition with cooperative spectrum sensing in next generation networks
WO2022001941A1 (en) Network element management method, network management system, independent computing node, computer device, and storage medium
US20220321408A1 (en) Change deployment system
CN114548416A (en) Data model training method and device
Koudouridis et al. An architecture and performance evaluation framework for artificial intelligence solutions in beyond 5G radio access networks
WO2021238508A1 (en) Data processing method, apparatus, and device
Donatti et al. Survey on Machine Learning-Enabled Network Slicing: Covering the Entire Life Cycle
CN115460617A (en) Network load prediction method and device based on federal learning, electronic equipment and medium
WO2024011908A1 (en) Network prediction system and method, and electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19916101

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19916101

Country of ref document: EP

Kind code of ref document: A1