WO2022061940A1 - Model data transmission method and communication device - Google Patents

Model data transmission method and communication device

Info

Publication number
WO2022061940A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
model
data set
network element
information
Prior art date
Application number
PCT/CN2020/118593
Other languages
English (en)
French (fr)
Inventor
黄谢田
秦东润
于益俊
杨水根
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to PCT/CN2020/118593
Publication of WO2022061940A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W24/00: Supervisory, monitoring or testing arrangements
    • H04W24/02: Arrangements for optimising operational condition
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W24/00: Supervisory, monitoring or testing arrangements
    • H04W24/08: Testing, supervising or monitoring using real traffic

Definitions

  • the embodiments of the present application relate to the field of communications, and in particular, to a model data transmission method and a communication device.
  • the 5th generation (5G) communication system has made a major leap in key performance such as network speed and network delay, and can adapt to a variety of scenarios and differentiated service requirements.
  • Artificial intelligence (AI) technology and machine learning (ML) technology are also gradually applied in 5G communication systems.
  • For example, in the enabler of network automation (eNA) architecture, the network data analytics function (NWDAF) can train a model, so that the model can be used for service prediction, speech recognition, face recognition, object detection, and so on.
  • NWDAF can request data for training the model from the data collection coordination function (DCCF) network element by means of an event identifier (event ID), where each event ID corresponds to a data type.
  • NWDAF can also request data from the DCCF network element through the same data type for evaluating the trained model.
  • the data returned by DCCF to NWDAF according to the same data type may overlap, that is, the data used for model training and model evaluation may overlap, which may lead to inaccurate evaluation results.
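  • The effect of such overlap can be illustrated with a minimal Python sketch. It is an illustration only and not part of the application: the lookup-table "model", the parity labels, and the index ranges are all assumptions chosen to make the effect visible.

```python
def train_lookup_model(training_set):
    """A deliberately naive model that memorises (input -> label) pairs."""
    table = dict(training_set)
    # Unseen inputs fall back to a fixed guess of 0.
    return lambda x: table.get(x, 0)


def accuracy(model, test_set):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in test_set) / len(test_set)


# Toy data: the label is the parity of the input.
data = [(x, x % 2) for x in range(10)]

training_set = data[:6]        # inputs 0..5
overlapping_test = data[4:8]   # inputs 4..7, two of which were used for training
disjoint_test = data[6:]       # inputs 6..9, none of which were used for training

model = train_lookup_model(training_set)
acc_overlap = accuracy(model, overlapping_test)
acc_disjoint = accuracy(model, disjoint_test)
```

  • In this toy setting the overlapping test set scores higher than the disjoint one only because two of its samples were already seen during training, which is the kind of inflated evaluation result the application aims to avoid.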
  • Embodiments of the present application provide a model data transmission method and a communication device, which can improve the accuracy of model evaluation.
  • A first aspect provides a model data transmission method, comprising: a first network element determines a first data set; the first network element receives first information and second information from a second network element, where the first information is used to indicate a first model, the second information is used to request a second data set, and the second data set is used to train the first model or to test the first model; and the first network element sends the second data set to the second network element, where the second data set is a subset of the first data set.
  • The data management function module (for example, the first network element) may determine, according to the first information and the second information sent by the second network element (for example, the model training function module or the model evaluation function module), whether the second network element requests a training set or a test set. The data management function module can thus ensure that the data returned to the model training function module does not overlap with the data returned to the model evaluation function module, which prevents the data used for model training and the data used for model evaluation from being fully or partially identical and improves the accuracy of model evaluation.
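  • The request handling described in the first aspect can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the class name `DataManagementFunction`, the string values "training" and "test", and the fixed-ratio split are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataManagementFunction:
    """Hypothetical first network element holding one first data set per model."""
    first_data_sets: dict                        # model_id -> list of records
    train_ratio: float = 0.8                     # assumed split ratio
    _splits: dict = field(default_factory=dict)  # model_id -> (training, test)

    def _split(self, model_id):
        # Partition once and cache the result, so that repeated requests for
        # the same model always see the same, non-overlapping subsets.
        if model_id not in self._splits:
            records = self.first_data_sets[model_id]
            cut = int(len(records) * self.train_ratio)
            self._splits[model_id] = (records[:cut], records[cut:])
        return self._splits[model_id]

    def handle_request(self, model_id, set_type):
        """model_id plays the role of the first information, set_type the second."""
        training, test = self._split(model_id)
        if set_type == "training":
            return training
        if set_type == "test":
            return test
        raise ValueError(f"unknown data set type: {set_type!r}")


dmf = DataManagementFunction({"model-1": list(range(10))})
train_set = dmf.handle_request("model-1", "training")  # e.g. requested for training
test_set = dmf.handle_request("model-1", "test")       # e.g. requested for evaluation
```

  • Because both subsets come from a single cached partition of the first data set, the data returned for training and the data returned for testing cannot intersect.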
  • The second information is used to indicate the type of the second data set, where the type of the second data set is a training set or a test set, the training set is used to train the first model, and the test set is used to test the first model; or, the second information is used to indicate the range of the second data set.
  • the present application provides a specific implementation of the second information.
  • In one case, the training set and the test set may be divided by the first network element, and the first network element may determine whether the second network element requests the training set or the test set according to the data set type indicated by the second information.
  • In another case, the training set and the test set may be divided by the model management function module (for example, the third network element), and the first network element may determine, through the range of the data set, whether the second network element requests the training set or the test set.
  • The range of the second data set includes one or more of the following: the range of key values of the data in the second data set, the time range of the data distribution in the second data set, and the range of the network area in which the data in the second data set is distributed.
  • This application provides a specific implementation of the range of training sets and test sets, and the range of data sets can be divided according to scenarios or business characteristics, thereby improving the accuracy of model training and model evaluation.
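  • A data set range of this kind can be sketched as a small filter. The following is a hedged sketch only: the class `DataSetRange`, the record fields `key`, `time`, and `cell`, and the inclusive bounds are assumptions made for illustration and are not defined by the application.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class DataSetRange:
    """Hypothetical encoding of a data set range; None means 'no restriction'."""
    key_range: Optional[Tuple[int, int]] = None   # inclusive bounds on the record key
    time_range: Optional[Tuple[int, int]] = None  # inclusive bounds on the timestamp
    cells: Optional[FrozenSet[str]] = None        # network area as a set of cell IDs

    def contains(self, record: dict) -> bool:
        if self.key_range is not None:
            low, high = self.key_range
            if not low <= record["key"] <= high:
                return False
        if self.time_range is not None:
            start, end = self.time_range
            if not start <= record["time"] <= end:
                return False
        if self.cells is not None and record["cell"] not in self.cells:
            return False
        return True


def select(first_data_set, requested_range):
    """Return the records of the first data set that fall inside the range."""
    return [r for r in first_data_set if requested_range.contains(r)]


first_data_set = [
    {"key": 1, "time": 100, "cell": "cell-A"},
    {"key": 2, "time": 200, "cell": "cell-B"},
    {"key": 3, "time": 300, "cell": "cell-A"},
]
training_range = DataSetRange(time_range=(0, 200))
test_range = DataSetRange(time_range=(201, 400), cells=frozenset({"cell-A"}))
training_set = select(first_data_set, training_range)
test_set = select(first_data_set, test_range)
```

  • With non-overlapping time ranges, as in this example, the two selections cannot share a record, so the requested range alone tells the first network element which subset to return.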
  • The method further includes: the first network element determines the second data set according to the first information and the second information.
  • The first network element may determine the first data set according to the first information, and may also divide the training set or the test set from the first data set and determine, according to the second information, whether the second network element requests the training set or the test set, thereby avoiding an intersection between the training set and the test set and improving the accuracy of model evaluation.
  • The first network element determining the second data set according to the first information and the second information includes: the first network element determines a training set of the first model and/or a test set of the first model from the first data set according to a data division strategy.
  • the present application provides a specific implementation of dividing the training set and the test set, which can more reasonably divide the training set and the test set according to the data division strategy, thereby improving the accuracy of model training and model evaluation.
  • The data division strategy is any one of the following: division according to the time of data distribution, division according to the network area of data distribution, or division according to a specified ratio.
  • the present application also provides a specific implementation of the data partitioning strategy. Different partitioning strategies are suitable for different scenarios or business requirements, and the training set and/or the test set can be more reasonably divided according to the data partitioning strategy.
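  • The three division strategies named above can be sketched as three small functions. This is an illustrative sketch, assuming records are dictionaries with hypothetical `time` and `cell` fields; the function names and the toy records are not from the application.

```python
def split_by_time(records, cutoff):
    """Division by time: records before the cutoff train, the rest test."""
    training = [r for r in records if r["time"] < cutoff]
    test = [r for r in records if r["time"] >= cutoff]
    return training, test


def split_by_area(records, training_cells):
    """Division by network area: records from the given cells train, the rest test."""
    training = [r for r in records if r["cell"] in training_cells]
    test = [r for r in records if r["cell"] not in training_cells]
    return training, test


def split_by_ratio(records, train_ratio):
    """Division by a specified ratio: the first share trains, the remainder tests."""
    cut = int(len(records) * train_ratio)
    return records[:cut], records[cut:]


records = [
    {"id": i, "time": 10 * i, "cell": "cell-A" if i % 2 == 0 else "cell-B"}
    for i in range(4)
]
by_time = split_by_time(records, cutoff=20)
by_area = split_by_area(records, training_cells={"cell-A"})
by_ratio = split_by_ratio(records, train_ratio=0.75)
```

  • Each function assigns every record to exactly one side, so whichever strategy is configured, the training set and the test set remain disjoint.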
  • The method further includes: the first network element receives a data division strategy from a third network element; or, the first network element determines a data division strategy.
  • The present application also provides a configuration method for the data division strategy, which may be configured for the first network element by a third network element (for example, a model management function (MMF) module), or may be stored locally by the first network element.
  • The method further includes: the first network element sends, to the third network element, one or more data types corresponding to the first model and the range of the first data set corresponding to the one or more data types.
  • When the third network element divides the test set and the training set, the first network element needs to report to the third network element the range of the data collected according to the requirements (for example, data types) of the first model, so that the third network element can divide the range of the training set and the range of the test set according to the range of the data collected by the first network element.
  • The method further includes: the first network element receives third information from the second network element, where the third information includes one or more of the following: one or more data types of the first model, and the collection object of the data required by the first model, where the collection object includes at least one of the following: one or more user equipments (UEs), or one or more cells.
  • The first network element may also receive the data requirements of the second network element (for example, the requirements indicated by the third information) from the second network element, so as to subscribe to data that the second network element uses to train or evaluate a model.
  • The first network element determining the first data set includes: the first network element obtains the first data set from a fourth network element according to the third information; or, the first network element obtains a third data set, or information of the third data set, from a fifth network element according to the third information, where the information of the third data set is used to indicate the range of the third data set, obtains a fourth data set from the fourth network element according to the third information, and determines the first data set according to the third data set and the fourth data set.
  • the present application provides a specific implementation for the first network element to determine the first data set.
  • For example, the first network element can subscribe to data from the access network device (for example, the fourth network element) according to the data requirements of the second network element, and the subscribed data may constitute the first data set.
  • Alternatively, the first network element can obtain data from the fifth network element according to the data requirements of the second network element, collect further data from the fourth network element, and then combine and deduplicate the collected data and the data obtained from the fifth network element. In this way, it can still be ensured that the training set and the test set of the model have no intersection, which improves the accuracy of model evaluation.
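  • The combine-and-deduplicate step can be sketched as below. This is a sketch under an explicit assumption: the application does not say how duplicate records are recognised, so the identity tuple (UE, timestamp, data type) and the field names used here are hypothetical.

```python
def merge_and_deduplicate(third_data_set, fourth_data_set):
    """Combine two data sets, keeping the first occurrence of each record."""
    seen = set()
    merged = []
    for record in [*third_data_set, *fourth_data_set]:
        # Assumed record identity: same UE, same timestamp, same data type.
        identity = (record["ue"], record["time"], record["data_type"])
        if identity not in seen:
            seen.add(identity)
            merged.append(record)
    return merged


third_data_set = [   # e.g. obtained from the fifth network element
    {"ue": "ue-1", "time": 100, "data_type": "rsrp", "value": -90},
    {"ue": "ue-2", "time": 100, "data_type": "rsrp", "value": -85},
]
fourth_data_set = [  # e.g. collected from the fourth network element
    {"ue": "ue-2", "time": 100, "data_type": "rsrp", "value": -85},  # duplicate
    {"ue": "ue-3", "time": 110, "data_type": "rsrp", "value": -80},
]
first_data_set = merge_and_deduplicate(third_data_set, fourth_data_set)
```

  • Deduplicating before the division matters: if the same record survived twice, one copy could land in the training set and the other in the test set, reintroducing the overlap.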
  • A second aspect provides a model data transmission method, comprising: a second network element sends first information and second information to a first network element, where the first information is used to indicate a first model, the second information is used to request a second data set, and the second data set is used for training the first model or for testing the first model; and the second network element receives the second data set from the first network element, where the second data set is a subset of a first data set.
  • The model training function module (for example, the second network element) may request data for model training from the data management function module (for example, the first network element), and the model evaluation function module (for example, the second network element) may request data for model evaluation from the data management function module. The transmission method provided in the embodiments of the present application can ensure that the data returned to the model training function module and the data returned to the model evaluation function module have no intersection, which prevents the data used for model training and the data used for model evaluation from being completely or partially the same and thereby improves the accuracy of model evaluation.
  • The second information is used to indicate the type of the second data set, where the type of the second data set is a training set or a test set, the training set is used to train the first model, and the test set is used to test the first model; or, the second information is used to indicate the range of the second data set.
  • The present application provides a specific implementation of the second information.
  • In one case, the training set and the test set may be divided by the first network element, and the first network element may determine whether the second network element requests the training set or the test set according to the data set type indicated by the second information.
  • In another case, the training set and the test set may be divided by the model management function module (for example, the third network element), and the first network element may determine, through the range of the data set, whether the second network element requests the training set or the test set.
  • The range of the second data set includes one or more of the following: the range of key values of the data in the second data set, the time range of the data distribution in the second data set, and the range of the network area in which the data in the second data set is distributed.
  • This application provides a specific implementation of the range of training sets and test sets, and the range of data sets can be divided according to scenarios or business characteristics, thereby improving the accuracy of model training and model evaluation.
  • The method further includes: the second network element sends third information to the first network element, where the third information includes one or more of the following: one or more data types of the first model, and the collection object of the data required by the first model, where the collection object includes at least one of the following: one or more user equipments (UEs), or one or more cells.
  • The second network element may also initiate a data subscription process and send third information to the first network element to indicate the data requirements of the second network element, so that the first network element can collect data according to those requirements for the second network element to train or evaluate the model.
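  • The information elements exchanged in the second aspect can be sketched as plain data structures. This is a hypothetical encoding for illustration: the class names, field names, and the use of a time range as the only range form are assumptions, and the application does not define a concrete message format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ThirdInformation:
    """Data requirements sent when the second network element subscribes to data."""
    data_types: List[str] = field(default_factory=list)  # data types of the first model
    ues: List[str] = field(default_factory=list)         # collection objects: UEs
    cells: List[str] = field(default_factory=list)       # collection objects: cells


@dataclass
class DataSetRequest:
    """First information plus second information, sent to request a second data set."""
    model_id: str                                 # first information: which model
    set_type: Optional[str] = None                # second information as a type
    time_range: Optional[Tuple[int, int]] = None  # second information as a range

    def __post_init__(self):
        # The second information must indicate either a type or a range.
        if self.set_type is None and self.time_range is None:
            raise ValueError("second information must give a type or a range")
        if self.set_type not in (None, "training", "test"):
            raise ValueError(f"unknown data set type: {self.set_type!r}")


subscription = ThirdInformation(data_types=["rsrp"], cells=["cell-A", "cell-B"])
training_request = DataSetRequest(model_id="model-1", set_type="training")
test_request = DataSetRequest(model_id="model-1", time_range=(201, 400))
```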
  • A third aspect provides a model data transmission method, comprising: a third network element determines a data division strategy, where the data division strategy is used to determine a second data set from a first data set, and the second data set is used to train a first model or to test the first model; and the third network element sends the data division strategy to a first network element.
  • The present application also provides a configuration method for the data division strategy, which may be configured for the first network element by a third network element (for example, a model management function (MMF) module).
  • The model management function module (for example, the third network element) may configure a data division strategy for the data management function module (for example, the first network element). While guaranteeing that the training set and the test set have no intersection, the first network element can divide the training set and the test set reasonably, which further improves the accuracy of model training and model evaluation.
  • the data division strategy is any one of the following: division according to time of data distribution, division according to network area of data distribution, or division according to a specified ratio.
  • A fourth aspect provides a model data transmission method, comprising: a third network element determines the range of a second data set according to the range of a first data set, where the second data set is a subset of the first data set and is used for training a first model or for testing the first model; and the third network element sends the range of the second data set to a second network element, where the range of the second data set is used by the second network element to request the second data set from a first network element.
  • The model management function module (for example, the third network element) can divide the range of the training set and the range of the test set. While ensuring that there is no intersection between the training set and the test set, the training set and the test set can be divided reasonably, which further improves the accuracy of model training and model evaluation.
  • The range of the second data set includes one or more of the following: the range of key values of the data in the second data set, the time range of the data distribution in the second data set, and the range of the network area in which the data in the second data set is distributed.
  • The method further includes: the third network element receives, from the first network element, one or more data types corresponding to the first model and the range of the first data set corresponding to the one or more data types, where the range of the first data set includes one or more of the following: the range of key values of the data in the first data set, the time range of the data distribution in the first data set, and the range of the network area in which the data in the first data set is distributed.
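  • The range division performed by the third network element can be sketched for the time-range case. This is a minimal sketch, assuming half-open integer time ranges and a hypothetical ratio parameter; the application does not prescribe how the reported range is cut.

```python
def split_time_range(first_range, train_ratio):
    """Cut the reported time range of the first data set into two sub-ranges.

    Ranges are treated as half-open intervals [start, end), so the training
    range and the test range share no timestamp.
    """
    start, end = first_range
    cut = start + int((end - start) * train_ratio)
    return (start, cut), (cut, end)


def ranges_overlap(a, b):
    """True if two half-open ranges share at least one point."""
    return a[0] < b[1] and b[0] < a[1]


# Range reported by the first network element for the first data set (assumed values).
first_data_set_range = (0, 1000)
training_range, test_range = split_time_range(first_data_set_range, train_ratio=0.75)
```

  • The third network element would then send `training_range` to the network element that trains the model and `test_range` to the one that evaluates it, and each would use its range to request the corresponding second data set.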
  • A fifth aspect provides a communication apparatus, which may be a first network element, the communication apparatus including: a processing unit configured to determine a first data set; and a communication unit configured to receive first information and second information from a second network element, where the first information is used to indicate a first model, the second information is used to request a second data set, and the second data set is used to train the first model or to test the first model; the communication unit is further configured to send the second data set to the second network element, where the second data set is a subset of the first data set.
  • The second information is used to indicate the type of the second data set, where the type of the second data set is a training set or a test set, the training set is used to train the first model, and the test set is used to test the first model; or, the second information is used to indicate the range of the second data set.
  • The range of the second data set includes one or more of the following: the range of key values of the data in the second data set, the time range of the data distribution in the second data set, and the range of the network area in which the data in the second data set is distributed.
  • The processing unit is further configured to determine the second data set according to the first information and the second information.
  • The data division strategy is any one of the following: division according to the time of data distribution, division according to the network area of data distribution, or division according to a specified ratio.
  • The communication unit is further configured to receive a data division strategy from a third network element; or, the first network element determines the data division strategy.
  • The communication unit is further configured to send, to the third network element, one or more data types corresponding to the first model and the range of the first data set corresponding to the one or more data types, where the range of the first data set includes one or more of the following: the range of key values of the data in the first data set, the time range of the data distribution in the first data set, and the range of the network area in which the data in the first data set is distributed.
  • The communication unit is further configured to receive third information from the second network element, where the third information includes one or more of the following: one or more data types of the first model, and the collection object of the data required by the first model, where the collection object includes at least one of the following: one or more user equipments (UEs), or one or more cells.
  • The processing unit is specifically configured to acquire the first data set from the fourth network element according to the third information; or, to obtain a third data set or information of the third data set from a fifth network element according to the third information, where the information of the third data set is used to indicate the range of the third data set, obtain a fourth data set from the fourth network element according to the third information, and determine the first data set according to the third data set and the fourth data set.
  • A sixth aspect provides a communication apparatus, which may be a second network element, comprising: a processing unit configured to determine first information and second information, where the first information is used to indicate a first model, the second information is used to request a second data set, and the second data set is used to train the first model or to test the first model; and a communication unit configured to send the first information and the second information to a first network element; the communication unit is further configured to receive the second data set from the first network element, where the second data set is a subset of a first data set.
  • The second information is used to indicate the type of the second data set, where the type of the second data set is a training set or a test set, the training set is used to train the first model, and the test set is used to test the first model; or, the second information is used to indicate the range of the second data set.
  • The range of the second data set includes one or more of the following: the range of key values of the data in the second data set, the time range of the data distribution in the second data set, and the range of the network area in which the data in the second data set is distributed.
  • The communication unit is further configured to send third information to the first network element, where the third information includes one or more of the following: one or more data types of the first model, and the collection object of the data required by the first model, where the collection object includes at least one of the following: one or more user equipments (UEs), or one or more cells.
  • A seventh aspect provides a communication apparatus, which may be a third network element, comprising: a processing unit configured to determine a data division strategy, where the data division strategy is used to determine a second data set from a first data set, and the second data set is used for training a first model or for testing the first model; and a communication unit configured to send the data division strategy to a first network element.
  • the data division strategy is any of the following: division according to time of data distribution, division according to network area of data distribution, or division according to a specified ratio.
  • An eighth aspect provides a communication apparatus, which may be a third network element, including: a processing unit configured to determine the range of a second data set according to the range of a first data set, where the second data set is a subset of the first data set and is used for training a first model or for testing the first model; and a communication unit configured to send the range of the second data set to a second network element, where the range of the second data set is used by the second network element to request the second data set from a first network element.
  • The range of the second data set includes one or more of the following: the range of key values of the data in the second data set, the time range of the data distribution in the second data set, and the range of the network area in which the data in the second data set is distributed.
  • The communication unit is further configured to receive, from the first network element, one or more data types corresponding to the first model and the range of the first data set corresponding to the one or more data types, where the range of the first data set includes one or more of the following: the range of key values of the data in the first data set, the time range of the data distribution in the first data set, and the range of the network area in which the data in the first data set is distributed.
  • A ninth aspect provides a communication device, comprising at least one processor and a memory, where the at least one processor is coupled with the memory; the memory is used to store a computer program; and the at least one processor is configured to execute the computer program stored in the memory, so that the apparatus executes the method according to the first aspect or any implementation of the first aspect, the second aspect or any implementation of the second aspect, the third aspect or any implementation of the third aspect, or the fourth aspect or any implementation of the fourth aspect.
  • A tenth aspect provides a computer-readable storage medium storing instructions; when the instructions are run on the communication apparatus according to the fifth aspect or any implementation of the fifth aspect, the communication apparatus is caused to execute the communication method according to the first aspect or any implementation of the first aspect.
  • An eleventh aspect provides a computer-readable storage medium storing instructions; when the instructions are run on the communication apparatus according to the sixth aspect or any implementation of the sixth aspect, the communication apparatus is caused to execute the communication method according to the second aspect or any implementation of the second aspect.
  • A twelfth aspect provides a computer-readable storage medium storing instructions; when the instructions are run on the communication apparatus according to the seventh aspect or any implementation of the seventh aspect, the communication apparatus is caused to execute the communication method according to the third aspect or any implementation of the third aspect.
  • A thirteenth aspect provides a computer-readable storage medium storing instructions; when the instructions are run on the communication apparatus according to the eighth aspect or any implementation of the eighth aspect, the communication apparatus is caused to execute the communication method according to the fourth aspect or any implementation of the fourth aspect.
  • A fourteenth aspect provides a wireless communication apparatus including a processor, for example applied in a communication apparatus, for implementing the method according to the first aspect or any implementation of the first aspect, the second aspect or any implementation of the second aspect, the third aspect or any implementation of the third aspect, or the fourth aspect or any implementation of the fourth aspect.
  • the communication device may be, for example, a system-on-chip.
  • the chip system further includes a memory, and the memory is used for storing necessary program instructions and data to implement the functions of the method in the first aspect.
  • the chip system in the above aspects may be a system on chip (system on chip, SOC), or a baseband chip, etc.
  • the baseband chip may include a processor, a channel encoder, a digital signal processor, a modem, an interface module, and the like.
  • a fifteenth aspect provides a communication system, where the communication system includes the first network element, the second network element, and the third network element described in any one of the foregoing implementation manners.
  • the communication system further includes a fourth network element and a fifth network element.
  • the fourth network element may be an access network device, and the fifth network element may be a data management function module.
  • FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another network architecture provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another network architecture provided by an embodiment of the present application.
  • FIG. 4a is a schematic structural diagram of a communication device provided by an embodiment of the present application.
  • FIG. 4b is another schematic structural diagram of a communication device provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a model data transmission method provided by an embodiment of the present application.
  • FIG. 6a is a schematic diagram of a training set and a test set provided by an embodiment of the present application.
  • FIG. 6b is another schematic diagram of a training set and a test set provided by an embodiment of the present application.
  • FIG. 7 to FIG. 15 are further schematic flowcharts of a model data transmission method provided by an embodiment of the present application.
  • FIG. 16 and FIG. 17 are further structural block diagrams of a communication apparatus provided by an embodiment of the present application.
  • This network architecture comprises a model management function (MMF) module 10, a model training function (MTF) module 20, a data management function (DMF) module 30, a model evaluation function (MEF) module 40, and access network equipment 50.
  • the network architecture supports the application of wireless artificial intelligence (AI) technology and machine learning (ML) technology in wireless communication networks.
  • the access network device is a device in the network for connecting the terminal device to the wireless network.
  • The access network device may be a node in a radio access network (RAN), and may also be referred to as a base station or as a RAN node (or device).
  • The network equipment may include an evolved NodeB (eNB or e-NodeB) in a long term evolution (LTE) system or an evolved LTE system (LTE-Advanced, LTE-A), such as a traditional macro base station eNB and a micro base station eNB in heterogeneous network scenarios; or may include the next generation NodeB (gNB) in a fifth generation (5G) new radio (NR) system; or may include a radio network controller (RNC), a NodeB (NB), a base station controller (BSC), a base transceiver station (BTS), a transmission reception point (TRP), a home base station (for example, a home evolved NodeB (HeNB) or a home NodeB (HNB)), a baseband unit (BBU), a baseband pool (BBU pool), or a WiFi access point (AP), etc., or
  • the CU supports protocols such as radio resource control (RRC), packet data convergence protocol (PDCP), and service data adaptation protocol (SDAP);
  • the DU mainly supports the radio link control (RLC) layer, the media access control (MAC) layer, and physical layer protocols.
  • MTF is responsible for training the model
  • MEF is responsible for evaluating the performance of the trained model
  • MMF is responsible for managing the model, for example, life cycle management, triggering model training or model evaluation, etc.
  • DMF is responsible for subscribing and storing the data required by the model, and providing data to MTF and MEF.
  • the DMF can collect data from the RAN; the DMF can send data to the MTF for the MTF to train a model, and the DMF can send data to the MEF for the MEF to evaluate or test the performance of the model.
  • FIG. 1 shows a scenario where model training and evaluation functions are deployed separately, that is, MTF and MEF are deployed in different network elements.
  • FIG. 1 only shows the functional modules involved in the embodiment of the present application, and the system shown in FIG. 1 may further include other network elements or functional modules, which are not limited in the embodiment of the present application.
  • the network architecture shown in FIG. 1 can be applied to an enabler of network automation (eNA) architecture.
  • the eNA architecture is an intelligent network architecture based on the network data analytics function (NWDAF).
  • NWDAF can request data from DCCF
  • DCCF can collect data from NF.
  • MTF and MEF may be implemented by two different NWDAFs.
  • MTF and MEF are NWDAF1 and NWDAF2 shown in FIG. 2 , respectively.
  • DMF can be implemented by DCCF.
  • MMF can be implemented by another NWDAF (NWDAF3), or can be co-deployed with MTF in NWDAF1, or co-deployed with MEF in NWDAF2.
  • the network architecture shown in FIG. 1 may also be applied to the network architecture shown in FIG. 3 .
  • the network architecture includes an operations, administration and maintenance (OAM) module, a first wireless controller and a second wireless controller.
  • the first wireless controller is mainly used to provide the function of the wireless network control plane
  • the second wireless controller and the OAM are mainly used to provide the function of the management plane.
  • the first wireless controller and the second wireless controller can implement functional services by deploying different service function modules, and the OAM and the first wireless controller collect data from the RAN through different interfaces.
  • the MTF and the MEF may be implemented by different functional modules shown in FIG. 3 .
  • MTF can be deployed in the second wireless controller
  • MEF can be deployed in the first wireless controller
  • MMF can be deployed in the OAM or in the second wireless controller
  • both the OAM and the first wireless controller are deployed with DMF.
  • the embodiment of the present application provides a method for transmitting model data.
  • the model training functional module can request data for model training from the data management functional module
  • the model evaluation functional module can request data for model evaluation from the data management functional module.
  • the transmission method provided by the embodiment of the present application can ensure that the data returned to the model training functional module and the data returned to the model evaluation functional module have no intersection. This prevents the data used for model training and the data used for model evaluation from being completely or partially identical, and thereby improves the accuracy of model evaluation.
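  • As an illustrative sketch only, the no-intersection property described above could be checked as follows. The record keys and the helper name `split_is_disjoint` are assumptions made for this example and are not defined by the embodiment:

```python
def split_is_disjoint(training_keys, test_keys):
    """Return True when no record key appears in both data sets."""
    return set(training_keys).isdisjoint(test_keys)


# Hypothetical record keys: one key per collected measurement.
training_keys = ["ue1-t0", "ue1-t1", "ue2-t0"]
test_keys = ["ue2-t1", "ue3-t0"]

if split_is_disjoint(training_keys, test_keys):
    print("training data and evaluation data do not overlap")
```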
  • the model may be an artificial intelligence (AI) model or a machine learning (ML) model.
  • a model can be thought of as an algorithm that realizes the automatic "learning" of a computer.
  • the network element may implement a specific service function by using the ML/AI model.
  • the model is used for fault prediction, service type/mode prediction, user trajectory/location prediction, service perception prediction, interference prediction, network key performance indicator (KPI) prediction, etc. Based on these predictions, proactive network management and control can be achieved, effectively improving network operation and maintenance efficiency and network resource utilization efficiency, and providing personalized and differentiated network service capabilities.
  • the index and the resource utilization rate of the cell predict the performance of the UE in the cell, for example, the throughput rate of the UE, and select to access (or switch to) the cell with the best performance according to the prediction result.
  • the UE uses the ML/AI model to perform face recognition, predict vehicle driving information, and the like.
  • the data type can be called data Type, and different data can be identified by the data type.
  • the data type can be reference signal received power (RSRP), reference signal received quality (RSRQ), downlink data volume (Data Volume in DL), etc.
  • the data type corresponding to the model can be used to indicate the data needed to train the model and evaluate the model.
  • the data type corresponding to the model may be RSRP
  • the model may be trained by using the RSRP data of the UE, and after the model is trained, the performance of the model may also be evaluated by using the RSRP data of the UE.
  • the data in the training set of the model is used to train the model, and the type of the data in the training set is the data type corresponding to the model. For example, input the data in the training set into the initial model to determine the parameters of the model.
  • the parameters of the model may be weights, biases, gradient values, etc. of the network, which are not limited in this embodiment of the present application.
  • the data in the test set of the model is used to evaluate (or test) the model, and the type of the data in the test set is the data type corresponding to the model.
  • the data in the test set is fed into the trained model and the performance of the model is evaluated. For example, whether the output result of the model is accurate can be verified according to the comparison between the output result of the model and the actual result, so that the performance of the model can be evaluated.
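  • A minimal sketch of the comparison described above, assuming that performance is measured as the fraction of model outputs that match the actual results. The embodiment does not prescribe a particular metric, and the function name `accuracy` is hypothetical:

```python
def accuracy(predicted, actual):
    """Fraction of model outputs that equal the corresponding actual results."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("the test set must not be empty")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical outputs of a trained model on four test-set records.
predicted = [1, 0, 1, 1]
actual = [1, 0, 0, 1]
score = accuracy(predicted, actual)
```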
  • the data set type (or type of data set) is either training set or test set.
  • the type of data set is training set, indicating that the data set is the training set of the model.
  • the type of dataset is test set, indicating that the dataset is the test set of the model.
  • FIG. 4a is a schematic diagram of a hardware structure of a communication apparatus 410 provided by an embodiment of the present application.
  • the communication device 410 includes a processor 4101 and at least one communication interface (FIG. 4a shows the communication interface 4103 merely as an example), and optionally also includes a memory 4102 .
  • the processor 4101, the memory 4102 and the communication interface 4103 are connected to each other.
  • the processor 4101 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the program of the present application.
  • the communication interface 4103 uses any transceiver-like device for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN), etc.
  • the memory 4102 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without limitation.
  • the memory can exist independently or be connected to the processor.
  • the memory can also be integrated with the processor.
  • the memory 4102 is used for storing computer-executed instructions for executing the solution of the present application, and the execution is controlled by the processor 4101 .
  • the processor 4101 is configured to execute the computer-executed instructions stored in the memory 4102, thereby implementing the model data transmission method provided by the following embodiments of the present application.
  • the computer-executed instructions in the embodiment of the present application may also be referred to as application code, which is not specifically limited in the embodiment of the present application.
  • the processor 4101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 4a.
  • the communication apparatus 410 may include multiple processors, such as the processor 4101 and the processor 4106 in FIG. 4a. Each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • the communication apparatus 410 may further include an output device 4104 and an input device 4105 .
  • the output device 4104 is in communication with the processor 4101 and can display information in a variety of ways.
  • the output device 4104 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, etc.
  • Input device 4105 is in communication with processor 4101 and can receive user input in a variety of ways.
  • the input device 4105 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.
  • the above-mentioned communication apparatus 410 may be a general-purpose device or a dedicated device.
  • the communication device 410 may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device with a structure similar to that in FIG. 4a.
  • This embodiment of the present application does not limit the type of the communication device 410 .
  • the communication device 410 may be a complete terminal, may also be a functional part or component that implements the terminal, or may be a communication chip, such as a baseband chip.
  • the communication interface may be a radio frequency module.
  • the communication interface 4103 may be an input/output interface circuit of the chip, and the input/output interface circuit is used to read in and output baseband signals.
  • the communication device includes at least one processor 4201 , at least one transceiver 4203 , at least one network interface 4204 and one or more antennas 4205 .
  • at least one memory 4202 is also included.
  • the processor 4201, the memory 4202, the transceiver 4203 and the network interface 4204 are connected, for example, through a bus.
  • the antenna 4205 is connected to the transceiver 4203.
  • the network interface 4204 is used for the communication device to be connected with other communication devices through a communication link, for example, the communication device is connected to the core network element through the S1 interface.
  • the connection may include various types of interfaces, transmission lines, or buses, which are not limited in this embodiment.
  • the processor in this embodiment of the present application may include at least one of the following types: a general-purpose central processing unit (CPU), a digital signal processor (DSP), a microprocessor, an application-specific integrated circuit (ASIC), a microcontroller (MCU), a field programmable gate array (FPGA), or an integrated circuit for implementing logic operations.
  • the processor 4201 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. At least one processor 4201 may be integrated in one chip or located on multiple different chips.
  • the memory in this embodiment of the present application may include at least one of the following types: read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, or electrically erasable programmable read-only memory (EEPROM).
  • the memory may also be compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.) , a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without limitation.
  • the memory 4202 may exist independently and be connected to the processor 4201 .
  • the memory 4202 can also be integrated with the processor 4201, for example, in one chip.
  • the memory 4202 can store program codes for implementing the technical solutions of the embodiments of the present application, and is controlled and executed by the processor 4201 .
  • the processor 4201 is configured to execute the computer program codes stored in the memory 4202, thereby implementing the technical solutions in the embodiments of the present application.
  • the transceiver 4203 may be used to support the reception or transmission of radio frequency signals between the communication device and other network elements, and the transceiver 4203 may be connected to the antenna 4205 .
  • one or more antennas 4205 can receive radio frequency signals, and the transceiver 4203 can be used to receive the radio frequency signals from the antennas, convert the radio frequency signals into digital baseband signals or digital intermediate frequency signals, and provide the digital baseband signals or digital intermediate frequency signals to the processor 4201, so that the processor 4201 performs further processing on the digital baseband signal or the digital intermediate frequency signal, such as demodulation processing and decoding processing.
  • the transceiver 4203 can also be used to receive a modulated digital baseband signal or digital intermediate frequency signal from the processor 4201, convert the modulated digital baseband signal or digital intermediate frequency signal into a radio frequency signal, and transmit the radio frequency signal through one or more antennas 4205.
  • the transceiver 4203 can selectively perform one or more stages of down-mixing processing and analog-to-digital conversion processing on the radio frequency signal to obtain a digital baseband signal or a digital intermediate frequency signal. The order of the down-mixing processing and the analog-to-digital conversion processing is adjustable.
  • the transceiver 4203 can selectively perform one or more stages of up-mixing processing and digital-to-analog conversion processing on the modulated digital baseband signal or digital intermediate frequency signal to obtain a radio frequency signal. The order of the up-mixing processing and the digital-to-analog conversion processing is adjustable.
  • Digital baseband signals and digital intermediate frequency signals can be collectively referred to as digital signals.
  • a transceiver may be referred to as a transceiver circuit, a transceiver unit, a transceiver device, a transmission circuit, a transmission unit, or a transmission device, and the like.
  • the communication device 420 may be a whole communication device, a part or component that realizes the function of the communication device, or a communication chip.
  • the transceiver 4203 may be an interface circuit of the chip, and the interface circuit is used to read in and output baseband signals.
  • An embodiment of the present application provides a model data transmission method. As shown in FIG. 5 , the method includes the following steps:
  • a first network element determines a first data set.
  • the first network element may also be called a data management function module, which is used to collect and manage data.
  • the first network element may be the DMF in the network architecture shown in FIG. 1 , or the DCCF in the network architecture shown in FIG. 2 , or a functional module in the DCCF for implementing data collection and data management, or as shown in FIG. 3 .
  • the second network element may train the model and evaluate the trained model.
  • the second network element may be a model training function module or a model evaluation function module, and the second network element may acquire data from the first network element.
  • the second network element may be an MTF or MEF in the network architecture shown in FIG. 1 , or may be an NWDAF1 in the network architecture shown in FIG. 2 , and the NWDAF1 may be responsible for model training.
  • the second network element may be a functional module in NWDAF1 shown in FIG. 2 for implementing model training.
  • the second network element is the NWDAF2 in the network architecture shown in FIG. 2, and the NWDAF2 may be responsible for model evaluation.
  • the second network element may be a functional module for implementing model evaluation in the NWDAF2 shown in FIG. 2 .
  • the second network element is the second wireless controller in the network architecture shown in FIG. 3 , or is a functional module in the second wireless controller for implementing model training.
  • the second network element is the first wireless controller in the network architecture shown in FIG. 3 , or is a functional module in the first wireless controller for implementing model evaluation.
  • the first network element may determine the first data set in the following two ways:
  • the access network device can perform data collection and data recording, and the first network element can obtain the data required by the model from the access network device through the data subscription process, wherein the data required by the model can be used for model training or Model evaluation.
  • the second network element may send a data subscription request to the first network element, where the data subscription request is used to indicate the data requirement of the second network element, and the data requirement characterizes the data required for training the first model.
  • the data subscription request includes third information, where the third information includes one or more of the following: one or more data types corresponding to the first model, and a collection object of the data required by the first model. The collection object includes at least one of the following: one or more user equipment (UEs), and one or more cells.
  • the first network element receives the data subscription request sent by the second network element, and subscribes data to the access network device according to the data requirement of the second network element. For example, the first network element sends the third information to the access network device.
  • the access network device may determine the data subscribed by the second network element according to the third information, and the access network device may also send the data subscribed by the second network element to the first network element.
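  • The subscription exchange above can be sketched as follows. This is a hedged illustration: the class `ThirdInformation`, its field names, the record layout and the matching rule in `is_subscribed` are all assumptions introduced for the example, since the embodiment only lists what the third information may contain:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ThirdInformation:
    """Hypothetical container for the third information of a subscription."""
    data_types: List[str] = field(default_factory=list)  # e.g. ["RSRP"]
    ue_ids: List[str] = field(default_factory=list)      # collection objects: UEs
    cell_ids: List[str] = field(default_factory=list)    # collection objects: cells


def is_subscribed(record: Dict[str, str], info: ThirdInformation) -> bool:
    """Assumed rule: the data type must match, and if any collection object
    is listed, the record must come from a listed UE or a listed cell."""
    type_ok = not info.data_types or record["data_type"] in info.data_types
    has_objects = bool(info.ue_ids or info.cell_ids)
    object_ok = (
        not has_objects
        or record.get("ue_id") in info.ue_ids
        or record.get("cell_id") in info.cell_ids
    )
    return type_ok and object_ok


def collect_subscribed(records, info):
    """What an access network device might return to the first network element."""
    return [r for r in records if is_subscribed(r, info)]


records = [
    {"data_type": "RSRP", "ue_id": "ue1", "cell_id": "cellA"},
    {"data_type": "RSRQ", "ue_id": "ue1", "cell_id": "cellA"},
    {"data_type": "RSRP", "ue_id": "ue2", "cell_id": "cellB"},
]
info = ThirdInformation(data_types=["RSRP"], ue_ids=["ue1"])
subscribed = collect_subscribed(records, info)
```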
  • the specific implementation of determining the first data set by the first network element in step 501 includes: the first network element receives the first data set from the fourth network element.
  • the fourth network element is an access network device.
  • the first network element may combine and deduplicate data from other network elements to determine the first data set.
  • the embodiment of the present application is applicable to a scenario in which the model training function and the model evaluation function are deployed separately.
  • multiple network elements responsible for data management may also be deployed.
  • a network element responsible for data management can be deployed in an area close to the model training function module, and the model training function module can obtain data from this network element;
  • a network element responsible for data management can be deployed in an area close to the model evaluation function module , the model evaluation function module can obtain data from the network element.
  • the network elements responsible for data management may be the first network element and the fifth network element described in the embodiments of the present application.
  • the first network element may be deployed close to the model training functional module, and when the second network element is a functional module responsible for model training, the second network element may acquire data from the first network element.
  • the fifth network element may be deployed close to the model evaluation function module, and when the second network element is the function module responsible for model evaluation, the second network element may acquire data from the fifth network element.
  • one of the network elements that manage data can summarize and deduplicate the data collected by the other network elements responsible for data management, and send the result to the other network elements that manage data. For example, the first network element is responsible for summarizing and deduplicating data.
  • the first network element may acquire the fourth data set from the access network device according to the requirements of the second network element.
  • the first network element may also acquire the third data set or information of the third data set from the fifth network element according to the requirements of the second network element. For example, the first network element sends the third information to the fifth network element, which is used to indicate the data requirement of the second network element. After receiving the third information sent by the first network element, the fifth network element determines a third data set that meets the requirements of the second network element according to the third information, and may also send the third data set to the first network element.
  • the fifth network element sends information of the third data set to the first network element, where the information of the third data set may be the range of the third data set, and the range of the third data set may be one or more of the following : The value range of the key value of the data in the third data set, the time range of the data distribution in the third data set, and the network area range of the data distribution in the third data set.
  • the first network element summarizes and deduplicates the data in the fourth data set and the third data set, and obtains a data set finally used for model training and model evaluation, that is, the first data set described in the embodiment of the present application.
  • the specific implementation of determining the first data set by the first network element includes: the first network element receives the third data set or information of the third data set from the fifth network element.
  • the first network element may also receive a fourth data set from a fourth network element (for example, the access network device described above), and determine the first data set according to the third data set and the fourth data set. For example, the first network element may combine the data in the third data set and the fourth data set, and then remove duplicate data to obtain the first data set.
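  • A minimal sketch of the combine-then-deduplicate step, assuming that each record is a dictionary and that two records are duplicates when they share the same UE, timestamp and data type. The record layout and the default `key` function are assumptions for this example:

```python
def merge_and_deduplicate(
    third_data_set,
    fourth_data_set,
    key=lambda r: (r["ue_id"], r["timestamp"], r["data_type"]),
):
    """Combine two data sets and keep the first occurrence of each record key."""
    seen = set()
    first_data_set = []
    for record in list(fourth_data_set) + list(third_data_set):
        record_key = key(record)
        if record_key not in seen:
            seen.add(record_key)
            first_data_set.append(record)
    return first_data_set


fourth_data_set = [
    {"ue_id": "ue1", "timestamp": 10, "data_type": "RSRP", "value": -95},
    {"ue_id": "ue1", "timestamp": 20, "data_type": "RSRP", "value": -97},
]
third_data_set = [
    {"ue_id": "ue1", "timestamp": 20, "data_type": "RSRP", "value": -97},  # duplicate
    {"ue_id": "ue2", "timestamp": 10, "data_type": "RSRP", "value": -101},
]
first_data_set = merge_and_deduplicate(third_data_set, fourth_data_set)
```

  • Keeping insertion order makes the result deterministic, which matters if the merged data set is later divided by position.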
  • a fourth network element for example, the access network device described above
  • the first network element may further record the correspondence between one or more data types corresponding to the first model and the first data set.
  • the first network element may obtain the correspondence between the identifier of the first model and one or more data types corresponding to the first model, and after determining the first data set, the first network element maintains the correspondence between the first data set and the identifier of the first model.
  • the first network element receives from the second network element the correspondence between the identifier of the first model and one or more data types corresponding to the first model.
  • the correspondence between the identifier of the first model and one or more data types corresponding to the first model is received from a third network element (eg, a model management function module).
  • the second network element sends the first information and the second information to the first network element.
  • the first information is used to indicate a first model
  • the second information is used to request a second data set
  • the second data set is used to train the first model or to test the first model.
  • a training set may be requested from the first network element; when the second network element needs to perform model evaluation, a test set may be requested from the first network element.
  • the second network element may send a data request message (data query) to the first network element, where the data request message includes the first information and the second information.
  • the first information is used to indicate a model trained or evaluated by the second network element, and the second information is used to request a training set or a test set of the model.
  • the first information may be the identification of the first model, for example, the first information is the model ID of the first model.
  • the second information can be implemented in the following two ways:
  • the training set and test set of the model can be divided by the first network element.
  • the second network element can request the training set or test set of the model from the first network element through the type of the data set.
  • the second information sent by the second network element is used to indicate the type of the second data set.
  • the type of the second data set includes a training set (train) or a test set (test), the training set being used to train the first model, and the test set being used to test the first model.
  • a training set and a test set of the model may be divided by a third network element (for managing the model, for example, the model management function module MMF described in the embodiment of the present application).
  • the third network element may also notify the second network element of the division result. Specifically, the third network element may notify the second network element of the range of the training set or the range of the test set.
  • the range of the training set may be the value range of the key value of the data in the training set, or the time range of the data distribution in the training set, or the range of the network area where the data in the training set is distributed.
  • the range of the test set can be the value range of the key value of the data in the test set, or the range of the time when the data in the test set is distributed, or the range of the network area where the data in the test set is distributed.
  • the second network element may request the training set from the first network element through the range of the training set, or request the test set from the first network element through the range of the test set.
  • the second information sent by the second network element to the first network element is used to indicate the range of the second data set; the second data set is the training set of the first model or the first model the test set.
  • the range of the second data set includes one or more of the following: the range of key values of the data in the second data set, the time range of the data distribution in the second data set, the range of the data in the second data set The extent of the network area in which the data is distributed.
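  • As a hedged sketch, a range of this kind could be represented and applied as follows. The class `DataSetRange`, the record fields `key`, `timestamp` and `cell_id`, and the use of inclusive bounds are assumptions for illustration, since the embodiment does not define an encoding for the range:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class DataSetRange:
    """Hypothetical range descriptor; a criterion left as None is not applied."""
    key_range: Optional[Tuple[int, int]] = None        # inclusive key bounds
    time_range: Optional[Tuple[float, float]] = None   # inclusive time bounds
    areas: Optional[FrozenSet[str]] = None             # cells forming the network area

    def contains(self, record: dict) -> bool:
        if self.key_range is not None:
            low, high = self.key_range
            if not low <= record["key"] <= high:
                return False
        if self.time_range is not None:
            start, end = self.time_range
            if not start <= record["timestamp"] <= end:
                return False
        if self.areas is not None and record["cell_id"] not in self.areas:
            return False
        return True


def select_by_range(first_data_set, data_set_range):
    """Return the records of the first data set that fall inside the range."""
    return [r for r in first_data_set if data_set_range.contains(r)]


first_data_set = [
    {"key": 1, "timestamp": 100.0, "cell_id": "cellA"},
    {"key": 2, "timestamp": 200.0, "cell_id": "cellA"},
    {"key": 3, "timestamp": 300.0, "cell_id": "cellB"},
]
training_range = DataSetRange(time_range=(0.0, 200.0))
test_range = DataSetRange(time_range=(201.0, 400.0))
training_set = select_by_range(first_data_set, training_range)
test_set = select_by_range(first_data_set, test_range)
```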
  • the third network element may further notify the first network element of the result of dividing the first data set.
  • the first network element may record the correspondence between one or more data types corresponding to the first data set and the division result. For example, it records the correspondence between one or more data types corresponding to the first data set, the range of the training set, and the range of the test set.
  • the one or more data types corresponding to the first data set are one or more data types corresponding to the first model.
  • the second network element may request the training set or the test set of the model from the first network element through the data set type request.
  • the second information sent by the second network element is used to indicate the type of the second data set.
  • the type of the second data set includes a training set (train) or a test set (test), the training set being used to train the first model, and the test set being used to test the first model.
  • the data request message sent by the second network element may further include a data type (data Type), where the data type is one or more data types corresponding to the first model. The one or more data types corresponding to the first model are used to characterize the type of data required to train and evaluate the first model.
  • the first network element receives the first information and the second information from the second network element, and sends the second data set to the second network element according to the first information and the second information.
  • the second dataset is a subset of the first dataset.
  • the training set and/or the test set of the model are divided by the first network element, and the second network element may request the training set or test set of the model from the first network element through the data set type (for example, training set or test set). Specifically, after receiving the first information and the second information from the second network element, the first network element determines the first data set according to the first information, and then determines the second data set from the first data set according to the data set type indicated by the second information.
  • the first information may be an identifier of the first model, and the first network element may determine a data type associated with (corresponding to) the identifier of the first model, that is, the data type of data used for training and evaluating the first model.
  • the first network element may also determine corresponding data according to the determined data type, and these data constitute the first data set.
  • the first network element maintains the correspondence between the first data set and the identifier of the first model. After receiving the first information in step 503, the first network element determines the identifier of the first model according to the first information, and determines the first data set corresponding to the first model according to the identifier of the first model. Further, if the second information indicates that the second network element requests the training set of the first model, for example, the value of the second information is "train", the first network element divides a subset from the first data set as the training set of the first model and sends the subset to the second network element.
  • if the second information indicates that the second network element requests the test set of the first model, for example, the value of the second information is "test", the first network element divides a subset from the first data set as the test set of the first model and sends the subset to the second network element.
  • the first network element may divide the training set of the first model and/or the test set of the first model from the first data set according to the data division strategy.
  • the data division strategy is any one of the following: division according to time of data distribution, division according to network area of data distribution, or division according to a specified ratio.
  • the first network element receives the data division policy from a third network element; or, the first network element determines the data division policy.
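The three division strategies listed above can be sketched as follows. This is only an illustration: the record layout, the function names, and the sample values (including the year 2020) are assumptions made for the example and are not defined in this application.

```python
from datetime import date

# Hypothetical record layout: (key, collection_date, cell_id, value).
records = [
    (1, date(2020, 4, 1), "cell 1", -90.0),
    (2, date(2020, 5, 15), "cell 2", -85.5),
    (3, date(2020, 6, 28), "cell 1", -92.0),
    (4, date(2020, 6, 30), "cell 2", -88.0),
]

def split_by_time(data, cutoff):
    """Records dated on or before `cutoff` form the training set; later ones the test set."""
    train = [r for r in data if r[1] <= cutoff]
    test = [r for r in data if r[1] > cutoff]
    return train, test

def split_by_area(data, train_cells):
    """Records from cells in `train_cells` form the training set; the rest the test set."""
    train = [r for r in data if r[2] in train_cells]
    test = [r for r in data if r[2] not in train_cells]
    return train, test

def split_by_ratio(data, train_fraction):
    """The first `train_fraction` of the records form the training set."""
    n_train = int(len(data) * train_fraction)
    return data[:n_train], data[n_train:]

train, test = split_by_time(records, date(2020, 6, 27))
```

In each function, every record lands in exactly one of the two returned lists, so the training set and the test set cannot overlap.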
  • the third network element divides the training set of the model and/or the test set of the model.
  • the second network element may receive the division result from the third network element, eg, the range of the training set of the first model or the range of the test set of the first model.
  • the second network element may also request a training set of the first model from the first network element according to the range of the training set, and may also request a test set of the first model from the first network element according to the range of the test set.
  • the first network element determines the first data set according to the first information, and may also determine the second data set from the first data set according to the range indicated by the second information.
  • the first information may be the identifier of the first model
  • the second information may be the range of the test set or the range of the training set.
  • the first network element maintains the correspondence between the first data set and the identifier of the first model. After receiving the first information in step 503, the first network element determines the identifier of the first model according to the first information. The first data set corresponding to the first model is determined according to the identification of the first model. The first network element may further divide a subset from the first data set according to the range indicated by the second information, as the second data set.
  • the key value range of the training set divided by the third network element is (x to y), and the key value range of the test set is (w to z).
  • the first information may be the identifier of the first model, and the second information may be the range "(x to y)" of the training set.
  • the first network element may use the data whose key value range is (x to y) in the first data set as the second data set, that is, the training set requested by the second network element.
  • the first information may be the identifier of the first model, and the second information may be the range "(w to z)" of the test set.
  • the first network element may use the data whose key value range is (w to z) in the first data set as the second data set, that is, the test set requested by the second network element.
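The range-based selection described above can be sketched as a simple filter on the key value. The dictionary layout, the function name, the concrete values of x, y, w and z, and the choice of inclusive bounds are all assumptions made for this illustration.

```python
def select_by_key_range(first_data_set, low, high):
    """Return the records whose key lies in the inclusive range [low, high]."""
    return {k: v for k, v in first_data_set.items() if low <= k <= high}

# Hypothetical first data set: key value -> measured value.
first_data_set = {key: -100.0 + key for key in range(1, 11)}

# Assumed ranges: (x to y) = (1 to 8) for training, (w to z) = (9 to 10) for testing.
training_set = select_by_key_range(first_data_set, 1, 8)
test_set = select_by_key_range(first_data_set, 9, 10)
```

Because the two ranges do not overlap, the two returned subsets share no key.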
  • the second network element may also send one or more data types corresponding to the first model to the first network element.
  • the data request message sent by the second network element further includes one or more data types corresponding to the first model.
  • the first network element maintains the correspondence between the one or more data types corresponding to the first model and the first data set.
  • the first network element can also index to the first data set according to the one or more data types corresponding to the first model.
  • the second network element requests the training set or the test set from the first network element through the range and data type of the data set.
  • the second network element sends one or more data types corresponding to the first model and the range of the training set to the first network element.
  • the first network element determines the first data set according to the correspondence between the one or more data types corresponding to the first model and the first data set.
  • a training set of the first model is divided from the first data set according to the range of the training set.
  • the second network element sends one or more data types corresponding to the first model and the range of the test set to the first network element, and the first network element determines the first data set according to the correspondence between the one or more data types corresponding to the first model and the first data set.
  • a test set of the first model is divided from the first data set according to the range of the test set.
  • the method shown in FIG. 5 further includes: the second network element subscribes to data from the first network element; specifically, the second network element sends third information to the first network element, where the third information is used to represent data requirements of the first network element.
  • the third information includes one or more of the following: one or more data types corresponding to the first model, and a collection object of data required by the first model, where the collection object includes at least one of the following: one or more user equipments (UEs), one or more cells.
  • the first network element may also report the range of data required by the first model to the third network element, so that the third network element can divide the data according to the range.
  • the data required by the first model may be data subscribed by the first network element from an access network device according to one or more data types corresponding to the first model.
  • the method shown in FIG. 5 further includes: the first network element sending, to a third network element, one or more data types corresponding to the first model and the range of the first data set corresponding to the one or more data types.
  • the range of the first data set includes one or more of the following: the range of the key value of the data in the first data set, the range of the time over which the data in the first data set is distributed, and the range of the network area in which the data in the first data set is distributed.
  • the main management network element can combine and deduplicate the data collected by other network elements, and divide the training set and/or test set according to the processed data set.
  • the main management network element can also send the divided training sets or test sets to other network elements, so that the model training function module can obtain the training set from a network element deployed closer to it, and the model evaluation function module can obtain the test set from a network element deployed closer to it, which shortens the transmission delay of model data.
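The combining and deduplication performed by the main management network element might look like the sketch below. The record layout (key, value), the function name, and the rule that the first occurrence of a key is kept are assumptions for illustration only; this application does not specify how duplicates are detected.

```python
def merge_and_deduplicate(*sources):
    """Combine records from several network elements, keeping one copy per key."""
    merged = {}
    for source in sources:
        for key, value in source:
            # Assumption: records with the same key are duplicates; keep the first seen.
            merged.setdefault(key, value)
    return sorted(merged.items())

# Hypothetical data collected by two network elements.
element_a = [(1, -90.0), (2, -85.0)]
element_b = [(2, -85.0), (3, -92.0)]

combined = merge_and_deduplicate(element_a, element_b)
```

The training set and/or test set would then be divided from `combined` rather than from each source separately.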
  • the network elements responsible for data management are the first network element and the fifth network element described in this embodiment of the application.
  • the first network element is responsible for merging and deduplicating data
  • the first network element is deployed close to the model training function module
  • the fifth network element is deployed close to the model evaluation function module.
  • the model training function module may request the training set from the first network element.
  • the first network element may also send the information of the test set or the test set to the fifth network element
  • the fifth network element may receive the information of the test set or the test set from the first network element
  • the model evaluation function module may request the test set from the fifth network element.
  • the first network element is responsible for combining and deduplicating data
  • the fifth network element is deployed close to the model training function module
  • the first network element is deployed close to the model evaluation function module.
  • the model evaluation function module may request the test set from the first network element.
  • the first network element can also send the training set or training set information to the fifth network element
  • the fifth network element can receive the training set or training set information from the first network element
  • the model training function module can request the training set from the fifth network element.
  • the first data set can be divided into two parts: training set and test set, that is, the data of the training set and the test set do not have an intersection, and the sum of the data of the training set and the test set constitutes the first data set.
  • both the test set and the training set are subsets of the first data set, and the sum of the data of the training set and the test set is smaller than the first data set.
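The two cases above (the training set and test set together make up the whole first data set, or together cover only part of it) can be told apart with basic set operations. The function name and the returned labels are hypothetical and used only for this illustration.

```python
def describe_division(first_data_set, training_set, test_set):
    """Classify how a training set and a test set relate to the first data set."""
    first, train, test = set(first_data_set), set(training_set), set(test_set)
    if not (train <= first and test <= first):
        return "invalid"             # some record is not in the first data set
    if train & test:
        return "overlapping"         # the two sets share records
    if train | test == first:
        return "complete partition"  # no intersection, union is the first data set
    return "partial"                 # no intersection, union is a proper subset

keys = range(10)
case_1 = describe_division(keys, range(0, 8), range(8, 10))
case_2 = describe_division(keys, range(0, 6), range(8, 10))
```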
  • the following describes the model data transmission method provided by the embodiment of the present application in detail by taking the first network element as DMF, the second network element as MTF or MEF, and the third network element as MMF as an example.
  • MTF and MEF can add two parameters in the data request message: model identifier and data set type, so that DMF can distinguish whether the requested data is test data or training data according to these two parameters. There is no intersection between the training set returned to the MTF and the test set returned to the MEF, which can improve the accuracy of model evaluation.
  • the method includes the following steps:
  • MTF and MEF perform model training and model evaluation, respectively, and MTF and MEF can also subscribe to DMF for data required for training a model and data required for evaluating a model, respectively.
  • the MTF may send data requirements to the DMF, where the data requirements are used to indicate one or more data types (datatypes) corresponding to the model being trained (hereinafter referred to as the first model) and data collection objects, where the collection objects include at least one of the following: one or more user equipments (UEs), one or more cells.
  • the MEF sends to the DMF one or more data types corresponding to the evaluation model and a collection object of the corresponding data type, where the collection object includes at least one of the following: one or more user equipments UE, one or more cells .
  • the DMF can receive, from the MTF and the MEF, the data type (datatype) of the data required for training and evaluating the model, and the collection object corresponding to the data type.
  • the DMF can also request data from the RAN according to the data type and the data collection object to complete the data collection. After the DMF receives the data from the RAN, it can also record the data collected from the RAN in the form of a dataset. For example, the MTF initiates data subscription for Model 1, and indicates to the DMF the data type "RSRP" corresponding to Model 1 and the data collection object "cell 1" during the data subscription process. The DMF initiates data collection to the RAN according to "RSRP" and "cell 1", and the DMF receives the collected data from the RAN.
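The subscription and collection step can be sketched as follows. The class, its method names, and the way RAN reports are delivered are hypothetical; only the data type "RSRP" and the collection object "cell 1" come from the example above.

```python
from collections import defaultdict

class DataManagementFunction:
    """Toy DMF that records collected data per data type (illustrative only)."""

    def __init__(self):
        self.subscriptions = []             # list of (data_type, collection_object)
        self.data_sets = defaultdict(list)  # data_type -> collected records

    def subscribe(self, data_type, collection_object):
        """Register a data requirement received from the MTF or MEF."""
        self.subscriptions.append((data_type, collection_object))

    def on_ran_report(self, data_type, collection_object, value):
        """Store a RAN measurement only if it matches a subscription."""
        if (data_type, collection_object) in self.subscriptions:
            self.data_sets[data_type].append((collection_object, value))

dmf = DataManagementFunction()
dmf.subscribe("RSRP", "cell 1")
dmf.on_ran_report("RSRP", "cell 1", -90.0)
dmf.on_ran_report("RSRP", "cell 2", -80.0)  # no subscription for cell 2, so ignored
```

Keeping the records indexed by data type is what later allows the DMF to find the data set from the data type alone.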
  • the MTF starts model training after receiving the message for starting model training.
  • the MMF sends an instruction message to the MTF to trigger the MTF to perform model training, where the instruction message includes the model identifier of the first model, such as model ID.
  • the MTF sends a data request message 1 to the DMF, where the data request message 1 includes a model identifier, a data type, and a data set type.
  • the model identifier in the data request message 1 is used to indicate the model trained by the MTF, for example, the first model described above, and the model identifier can be a model ID; the data type in the data request message 1 is used to indicate the type of data required to train the first model.
  • the data type can be "data Type”; the data set type is used to indicate the data set requested by the MTF, and the data set type can be "dataset Type”.
  • the value of the dataset type "dataset Type" in the data request message 1 may be "train”.
  • the data request message 1 may further include data subset information, which is used to indicate the detailed information of the data subset requested by the MTF, for example, the size of the data subset, where the size of the data subset is used to characterize the amount of data in the data subset; for example, the size of the data subset is 1000, that is, the data subset includes 1000 records.
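The content of data request message 1 can be pictured as a small record type. The class name, the field names, and the validation rule are assumptions for illustration; this application only states that the message carries a model identifier, a data type, a data set type, and optional data subset information.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataRequestMessage:
    """Illustrative layout of a data request sent to the DMF."""
    model_id: str                      # model identifier (model ID)
    data_type: Optional[str]           # data type, e.g. "RSRP"; may be absent
    dataset_type: str                  # data set type: "train" or "test"
    subset_size: Optional[int] = None  # optional data subset information

    def __post_init__(self):
        if self.dataset_type not in ("train", "test"):
            raise ValueError(f"unknown data set type: {self.dataset_type!r}")

# Request from the MTF for a 1000-record training subset of a hypothetical model.
request_1 = DataRequestMessage("model1", "RSRP", "train", subset_size=1000)
```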
  • the DMF determines the data set 1 according to the data type, and divides the data set 1 into a training set and a test set.
  • the DMF determines a matching data set according to the data type, and the data set is hereinafter referred to as data set 1.
  • the DMF receives data from the RAN, denoting it as data set 1, and records the correspondence between the data set 1 and the data type "RSRP".
  • the DMF may index into data set 1 according to the data type "RSRP".
  • starting from step 701, the DMF continues to collect data from the RAN until it receives a data request message from the MTF in step 703. During this period, the data collected by the DMF according to the data type "RSRP" constitutes data set 1, and the DMF may also record the correspondence between data set 1 and the data type "RSRP". In step 704 the DMF may index into data set 1 according to the data type "RSRP".
  • the data request message sent by the MTF in step 703 may indicate a time range, and the DMF forms the data set 1 according to the data collected within the time range, and the DMF may also record the correspondence between the data set 1 and the data type "RSRP".
  • the DMF may index into data set 1 according to the data type "RSRP".
  • DMF can determine the data set division method by itself, or can pre-configure the data set division method.
  • the data set division method can be random division, division according to the time of data distribution, and so on. For example, if the data matched according to the data type spans 3 months (April 1 to June 30), the DMF can divide the data of the first 88 days (April 1 to June 27) into a training set for training the model, and divide the data of the last 3 days (June 28 to June 30) into a test set for model evaluation.
  • after the DMF has divided the training set and the test set, it can also manage the divided data sets according to the model ID (modelID), for example, by labeling the training set data with "modelID:train" and the test set data with "modelID:test".
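The time-based division and the "modelID:train" / "modelID:test" labeling can be sketched together. The function name, the one-record-per-day layout, the year 2020, and the model ID "model1" are assumptions made only for this example.

```python
from datetime import date, timedelta

def split_and_label(records, model_id, last_training_day):
    """Divide dated records by a cutoff day and file them under labeled keys."""
    labelled = {f"{model_id}:train": [], f"{model_id}:test": []}
    for day, value in records:
        kind = "train" if day <= last_training_day else "test"
        labelled[f"{model_id}:{kind}"].append((day, value))
    return labelled

# Hypothetical data set 1: one RSRP value per day from April 1 to June 30.
start = date(2020, 4, 1)
data_set_1 = [(start + timedelta(days=i), -90.0) for i in range(91)]

# Data up to June 27 is used for training; June 28 to June 30 for testing.
labelled = split_and_label(data_set_1, "model1", date(2020, 6, 27))
```

A later request that carries the model ID and the data set type can then be answered by a direct lookup of the corresponding label.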
  • if the DMF does not find the data requested by the MTF, it returns a negative acknowledgement (NACK) message that includes the reason for the error.
  • the DMF returns the training set to the MTF.
  • the DMF returns the training set to the MTF because the value of the data set type (datasetType) in the data request message 1 is "train".
  • the MTF uses the training set to perform model training.
  • the MEF starts the evaluation process.
  • the MEF starts model evaluation after receiving a message for starting model evaluation.
  • the MMF sends an indication message to the MEF to trigger the MEF to perform model evaluation, where the indication message includes a model identifier, such as a model ID.
  • the MEF sends a data request message 2 to the DMF, where the data request message 2 includes the model identifier and the data set type.
  • the model identifier is used to indicate the model evaluated by MEF, and the model identifier can be model ID; the data set type is used to indicate the data set requested by MEF, and the data set type can be "dataset Type".
  • the value of the dataset type "dataset Type" in the data request message 2 may be "test”.
  • the training set and the test set can be marked by using the model identifier and the data set type. Therefore, when the MEF requests the test set, the data request message 2 may not carry the data type, but only needs to carry the model identifier and the data set type.
  • the DMF determines the test set according to the model identifier and the data set type in the data request message 2.
  • DMF can find the test set requested by MEF according to the values of model ID and dataset Type. For example, the value of dataset Type in data request message 2 is "test", and the data type is "RSRP". DMF can index into the test set divided from dataset 1.
  • the DMF sends a test set to the MEF.
  • MEF uses the test set for model evaluation.
  • the model that uses the data is indicated by the model identifier, and the data set type is used to indicate whether the requested data is a training set or a test set.
  • the DMF can identify from the data set type whether the requested data subset is a training set or a test set, and return the training set to the MTF and the test set to the MEF, which ensures that there is no intersection between the issued training set and test set and thus ensures the accuracy of the model evaluation results.
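The lookup that the DMF performs on a data request can be sketched as a table keyed by model identifier and data set type, with a NACK when nothing matches. The table contents, the function name, the response fields, and the "ACK" status are assumptions for illustration.

```python
# Hypothetical result of an earlier division, keyed by (model ID, data set type).
labelled_sets = {
    ("model1", "train"): [1, 2, 3, 4],
    ("model1", "test"): [5],
}

def handle_data_request(model_id, dataset_type):
    """Return the requested subset, or a NACK with a cause if none is found."""
    subset = labelled_sets.get((model_id, dataset_type))
    if subset is None:
        return {"status": "NACK", "cause": "no matching data set"}
    return {"status": "ACK", "data": subset}

to_mtf = handle_data_request("model1", "train")  # request from the MTF
to_mef = handle_data_request("model1", "test")   # request from the MEF
missing = handle_data_request("model2", "test")  # unknown model
```

Since the MTF and the MEF read different entries of the same table, the data they receive cannot intersect as long as the stored subsets themselves are disjoint.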
  • the MMF may determine the data division strategy, and send the determined data division strategy to the DMF, and the DMF divides the training set and the test set according to the division strategy issued by the MMF.
  • the method includes the following steps:
  • the MMF sends a data information query request to the DMF.
  • the MMF queries the DMF for data information according to the data type (data Type) corresponding to the model, and the data information query request includes one or more data types queried by the MMF and the data information to be queried.
  • the queried data information may be one or more of the following: the data volume of the data set corresponding to the one or more data types, the data distribution range of the data set corresponding to the one or more data types, and the range of the key of the data in the data set corresponding to the one or more data types.
  • the data distribution range may be a time period of data distribution or a network area of data distribution, or the like.
  • the DMF sends data information to the MMF.
  • the MMF determines a data division policy (split Policy), and sends the data division policy to the DMF.
  • step 802 and step 803 are optional steps, and the MMF may determine the data division strategy according to the data information.
  • the DMF may also determine the data division strategy without relying on data information.
  • the DMF data division strategy may be a commonly used division strategy, such as random division or division according to a specified ratio.
  • the split Policy can include split Method and split Ratio.
  • the split Method can be a random division, and the split Ratio is used to indicate the ratio of the data volume of the training set to that of the test set, such as 4:1, that is, after determining the data required by the model, 80% of the data is divided into the training set, and the remaining 20% of the data is divided into the test set.
  • Split Policy can also be fixed by time, that is, the first x% of the data is taken as the training set in chronological order, and the rest of the data is used as the test set.
  • x is a value determined according to split Ratio.
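A division policy with a method and a ratio can be applied as sketched below. The function signature, the method names "random" and "time", the tuple form of the ratio, and the fixed seed are assumptions for illustration; this application names the parameters split Method and split Ratio but does not define their encoding.

```python
import random

def apply_split_policy(data, split_method, split_ratio, seed=0):
    """Divide `data` into (training set, test set) according to a policy.

    split_ratio is a (train_parts, test_parts) tuple, e.g. (4, 1) for 4:1.
    """
    train_parts, test_parts = split_ratio
    n_train = len(data) * train_parts // (train_parts + test_parts)
    ordered = list(data)
    if split_method == "random":
        random.Random(seed).shuffle(ordered)  # seeded for a reproducible split
    elif split_method != "time":
        raise ValueError(f"unknown split method: {split_method!r}")
    # For "time", `data` is assumed to be in chronological order already.
    return ordered[:n_train], ordered[n_train:]

train, test = apply_split_policy(range(10), "random", (4, 1))
```

With a 4:1 ratio and 10 records, 8 records go to the training set and 2 to the test set, matching the 80% and 20% shares described above.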
  • the message sent in 804 also includes a model identifier and/or a data type, and the DMF maintains the corresponding relationship between the model identifier or data type and the partitioning strategy.
  • the MMF sends a message to the MTF to trigger the MTF to perform model training.
  • the message sent by the MMF includes the model identifier.
  • the MTF sends a data request message 1 to the DMF, where the data request message 1 includes a model identifier, a data type, and a data set type.
  • the model identifier is used to indicate the model trained by MTF, and the model identifier can be model ID; the data type is used to indicate the type of data required by MTF to train the model, and the data type can be "data Type"; the data set type is used to indicate the data set requested by MTF, and the data set type can be "dataset Type".
  • the value of the dataset type "dataset Type" in the data request message 1 may be "train”.
  • the DMF divides the data set into a training set and a test set according to the data division strategy sent by the MMF.
  • the DMF first determines the data corresponding to the data type, and constructs a data set according to the data. Then, according to the corresponding relationship between the model identifier or the data type and the partitioning strategy in 804, the partitioning strategy of the data set is indexed, and the partitioning is completed according to the specified strategy.
  • Steps 808 to 814 are the same as steps 705 to 711 described above, and are not repeated here.
  • the method shown in FIG. 8 is applicable to the scenario in which the model management function may be deployed separately from the model training function, and is also applicable to the scenario in which the model management function and the model training function may be co-deployed.
  • the DMF performs data set division after receiving the division policy, that is, step 807 is performed before step 806 and after step 804 , which can reduce the waiting time after step 806 .
  • MMF can determine a more reasonable data division strategy according to expert experience, data information, scene characteristics, etc., and send the division strategy to DMF to provide a basis for DMF to divide the data set. While ensuring that there is no intersection between the test set and the training set, the model training and evaluation effects are further improved by dividing the data set reasonably.
  • the training set and the test set are directly divided by the MMF, and the division result is notified to the MTF and the MEF, and the MTF and the MEF each request data from the DMF.
  • the method includes the following steps:
  • the MMF sends a data information query request to the DMF.
  • the data information query request includes a data type list (data Type List) and data information.
  • the list of data types includes one or more data types of the model (for example, the first model described in the embodiments of the present application).
  • the data information is the data information that MMF expects to query, for example, the data volume of the data set corresponding to the one or more data types, the data distribution range of the data set corresponding to the one or more data types, and the range of the key of the data in the data set corresponding to the one or more data types.
  • the data distribution range may be a time period of data distribution or a network area of data distribution, or the like.
  • the DMF returns the specified data information to the MMF.
  • the MMF determines the range of the training set and the range of the test set.
  • the training set and the test set are divided according to the obtained data information.
  • if the MMF has queried the distribution range of the data, the MMF can divide the data distribution range into two parts, which respectively correspond to the training set and the test set. For example, if the MMF finds that the DMF has collected data for 3 months (April 1-June 30), the data division strategy determined by the MMF can be: the data of the first 88 days (April 1-June 27) is used for model training, and the data of the last 3 days (June 28-June 30) is used for model testing.
  • if the MMF has queried the range of the key of the data, the MMF can divide the key range into two parts, which respectively correspond to the training set and the test set. For example, if the MMF finds that the key value of the data currently collected by the DMF is 100-1000, the data division strategy determined by the MMF can be: the data whose key value is in the range of 100-900 is used for model training, and the data whose key value is in the range of 901-1000 is used for model evaluation.
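The key-range division can be sketched as computing two adjacent, non-overlapping ranges from the queried key range. The function name and the idea of reserving a fixed number of trailing keys for the test set are assumptions; this application leaves the division rule to the MMF.

```python
def divide_key_range(low, high, test_size):
    """Reserve the last `test_size` keys of [low, high] for the test set."""
    if not 0 < test_size <= high - low:
        raise ValueError("test_size must leave at least one key for training")
    boundary = high - test_size
    return (low, boundary), (boundary + 1, high)

# Keys 100-1000 were reported by the DMF; the last 100 keys are kept for evaluation.
train_range, test_range = divide_key_range(100, 1000, 100)
```

The two returned ranges share no key, so a training set and a test set selected by them have no intersection.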
  • the MMF may perform the division according to scenarios, data characteristics, or experience, which is not limited in this embodiment of the present application.
  • the MMF sends a message to the MTF to trigger model training, where the message includes the model identifier and the range of the training set.
  • the message sent by the MMF to the MTF includes the identification of the model and the range of training data of the model, for example, "April 1-June 27", that is, the data of "April 1-June 27" for training the model.
  • the MTF sends a data request message to the DMF, where the message includes the data type and the range of the training set.
  • the MTF can maintain the correspondence between the data type (data Type) and the model identifier (model ID). After the MTF receives the model identifier and the range of the training set from the MMF, it can determine the data type corresponding to the model identifier, so that the correspondence among the data type, the model identifier, and the range of the training set can be determined.
  • the MTF can also request training data from the DMF based on the data type and the extent of the training set.
  • the DMF returns the training set to the MTF.
  • the DMF determines the data corresponding to the data type sent by the MTF, and then divides the training set from the data according to the range of the training set. Specifically, the DMF can determine the matching dataset according to the data type, which is referred to as dataset 1 below.
  • the DMF receives data from the RAN, denoting it as data set 1, and records the correspondence between the data set 1 and the data type "RSRP".
  • the DMF may index into data set 1 according to the data type "RSRP".
  • starting from step 901, the DMF continues to collect data from the RAN until it receives a data request message from the MTF in step 906. During this period, the data collected by the DMF according to the data type "RSRP" constitutes data set 1, and the DMF may also record the correspondence between data set 1 and the data type "RSRP". In step 907 the DMF may index into data set 1 according to the data type "RSRP".
  • the data request message sent by the MTF in step 906 may indicate a time range, and the DMF forms the data set 1 according to the data collected within the time range, and the DMF may also record the correspondence between the data set 1 and the data type "RSRP".
  • the DMF may index into data set 1 according to the data type "RSRP".
  • the data type sent by the MTF is "RSRP”
  • the DMF determines the RSRP data of "April 1-June 30" according to the data type "RSRP". Since the range of the training set is "April 1-June 27", the DMF sends the RSRP data of "April 1-June 27" as a training set to the MTF.
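The exchange above, in which the MTF maps the model identifier to a data type and the DMF filters by the requested range, can be sketched as follows. The dictionaries, function names, request fields, sample values, and the year 2020 are assumptions for illustration.

```python
from datetime import date

# Hypothetical MTF-side table: model identifier -> data type.
model_to_data_type = {"model1": "RSRP"}

# Hypothetical DMF-side store: data type -> list of (collection_date, value).
data_sets = {
    "RSRP": [
        (date(2020, 4, 1), -90.0),
        (date(2020, 6, 27), -88.0),
        (date(2020, 6, 29), -91.0),
    ]
}

def build_request(model_id, first_day, last_day):
    """MTF or MEF side: turn a model ID and a range into a data request."""
    return {"dataType": model_to_data_type[model_id], "range": (first_day, last_day)}

def handle_request(request):
    """DMF side: index the data set by data type, then filter by the range."""
    first_day, last_day = request["range"]
    data_set = data_sets.get(request["dataType"], [])
    return [r for r in data_set if first_day <= r[0] <= last_day]

request = build_request("model1", date(2020, 4, 1), date(2020, 6, 27))
training_set = handle_request(request)
```

In this sketch the DMF never needs to know whether the range belongs to a training set or a test set; it only applies the range it is given.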
  • the MTF uses the training set for model training.
  • the MMF sends a message to the MEF to trigger the evaluation, and the content of the message includes the model identifier and the range of the test set.
  • the message sent by the MMF to the MEF includes the identification of the model and the range of test data of the model, for example, "June 28-June 30", that is, the data of "June 28-June 30" is used to evaluate the model.
  • the MEF sends a data request message to the DMF, where the message includes the data type and the range of the test set.
  • the MEF can maintain the correspondence between the data type (data Type) and the model identifier (model ID). After the MEF receives the model identifier and the range of the test set from the MMF, it can determine the data type corresponding to the model identifier, so that the correspondence among the data type, the model identifier, and the range of the test set can be determined.
  • the MEF can also request test data from the DMF based on the data type and the scope of the test set.
  • the DMF returns the test set to the MEF.
  • the DMF determines the data corresponding to the data type sent by the MEF, and then divides the test set from the data according to the range of the test set. Specifically, the DMF determines the data set 1 according to the data type, and then divides the test set from the data set 1 according to the range of the test set.
  • the data set found in step 911 according to the data type sent by the MEF and the data set found in step 907 according to the data type sent by the MTF are the same, for example, the data set 1 described in this embodiment of the present application.
  • the data type sent by the MEF is "RSRP”
  • the DMF determines the RSRP data according to the data type "RSRP”. Since the range of the test set is "June 28-June 30", DMF sends the RSRP data of "June 28-June 30" as a test set to MEF.
  • MEF uses the test set for model evaluation.
  • the MMF divides the training set and the test set after querying the DMF for data information, and informs the MTF and the MEF of the division results, and the MTF and the MEF each request data from the DMF.
  • because the MMF divides the data set, it can be ensured that there is no intersection between the training set and the test set, thus ensuring the accuracy of the model evaluation results.
  • DMF does not need to distinguish between training set and test set, which simplifies the internal operation of DMF, and there is no need to add parameters in the data request message.
  • the data type is added to the message for issuing the data division policy, and the MTF and MEF do not need to transmit the data type when subsequently requesting data, and the DMF can index the data type corresponding to the model according to the model identifier.
  • the method includes the following steps:
  • the MMF sends a data information query request to the DMF.
  • the MMF may initiate a data query request for a certain model (for example, the first model described in the embodiment of the present application), so as to trigger a subsequent process to divide the training set and/or test set of the model.
  • the MMF can query the DMF for data information according to the data type (data Type) corresponding to the model, and the data information query request includes one or more data types and data information queried by the MMF.
  • the queried data information may be one or more of the following: the data volume of the data set corresponding to the one or more data types, the data distribution range of the data set corresponding to the one or more data types , the range of the key of the data in the dataset corresponding to the one or more data types.
  • the data distribution range may be a time period of data distribution or a network area of data distribution, or the like.
  • the DMF sends data information to the MMF.
  • step 1002 and step 1003 are optional steps, and the MMF may directly determine the data division strategy without performing the step of querying data information.
  • the MMF sends a model configuration message to the DMF, where the message includes a model ID (model ID), a data type (data Type), and a data division policy (split Policy).
  • The data type in the model configuration message is one or more data types corresponding to the model. The data division policy in the model configuration message corresponds to the model identifier in that message and is used to determine the training set and/or test set of the model indicated by the model identifier.
  • the MMF sends a message to the MTF to trigger model training, where the message includes the model identifier.
  • The model identifier in the model training trigger message instructs the MTF to start the training process of the model indicated by the model identifier.
  • The DMF can determine the data type corresponding to the model identifier, that is, it can index the data type corresponding to the model by the model identifier. Therefore, in step 1006, the MTF does not need to send the data type corresponding to the model; it only needs to indicate the model identifier and the data set type.
  • Steps 1007 to 1014 are the same as steps 704 to 711 described above, and the specific implementation is referred to the above, which is not repeated here.
  • the data type corresponding to the model is sent to the DMF in advance, and the MTF or MEF does not need to send the data type corresponding to the model when requesting data, which saves the transmission overhead of signaling.
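  • The lookup described above can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the class names `ModelConfig` and `Dmf`, the field names, and the example values ("model-1", "RSRP", a ratio-style policy) are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    """Contents of a model configuration message (field names are illustrative)."""
    model_id: str
    data_types: list[str]
    split_policy: dict


@dataclass
class Dmf:
    """Toy DMF that remembers which data types belong to each model."""
    configs: dict[str, ModelConfig] = field(default_factory=dict)

    def on_model_config(self, msg: ModelConfig) -> None:
        # Store the configuration so later requests can omit the data type.
        self.configs[msg.model_id] = msg

    def data_types_for(self, model_id: str) -> list[str]:
        # Index the data types by the model identifier alone.
        return self.configs[model_id].data_types


dmf = Dmf()
dmf.on_model_config(ModelConfig("model-1", ["RSRP"], {"train_ratio": 0.8}))

# A later data request carries only the model ID and the data set type.
request = {"model_id": "model-1", "dataset_type": "train"}
resolved = dmf.data_types_for(request["model_id"])
```

  • In this sketch the request dictionary never contains a data type; the DMF recovers it from the stored configuration, which is the signaling saving the text describes.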
  • the MMF sends the data division policy (split Policy) and data type (data Type) corresponding to the model to the DMF through different messages.
  • the method includes the following steps:
  • the MMF sends the model identifier and data type to the DMF.
  • the MMF may initiate step 1102 for a certain model (for example, the first model described in the embodiment of the present application), so as to trigger a subsequent process to determine the data division policy corresponding to the model.
  • The model identifier sent by the MMF to the DMF in step 1102 indicates the model, and the data type sent by the MMF to the DMF is one or more data types corresponding to the model.
  • Steps 1103 to 1104 are optional steps, which are the same as steps 802 to 803 described above, and are not repeated here.
  • the MMF sends a model configuration message to the DMF, where the message includes a model ID (model ID) and a data division policy (split Policy).
  • The MMF sends a message to the MTF to trigger training, where the message includes the model identifier.
  • Steps 1108 to 1115 are the same as the steps 704 to 711 described above, and the specific implementation is referred to the above, which will not be repeated here.
  • Step 1102 may be executed before step 1107.
  • the embodiment of the present application also provides a model data transmission method.
  • the MTF and the MEF can obtain data from different DMFs, and data synchronization can be performed between the two DMFs first, and then the data sets can be divided after the data synchronization.
  • DMF1 can provide data to MTF
  • DMF2 can provide data to MEF.
  • the method includes the following steps:
  • MTF and MEF respectively initiate data subscription requests to DMF1 and DMF2, including the data requirements to be subscribed by the model (for example, the first model described in the embodiment of this application), and DMF1 and DMF2 send data subscription requests to the access network device according to the data requirements.
  • For the process of subscribing to data, refer to step 701 described above; details are not repeated here.
  • the MTF starts model training.
  • The MMF sends a model training trigger message to the MTF, where the message includes the model ID (model ID).
  • model training trigger message is used to trigger the MTF to start the training of the model (for example, the first model), and the model ID in the message is used to indicate the model.
  • the value of the dataset Type (that is, the second information described in the embodiment of the present application) in the data request message 1 sent by the MTF is "train", indicating that the MTF requests the training set.
  • DMF1 sends one or more data types corresponding to the model to DMF2.
  • DMF1 can also send other data requirements of MTF in the data subscription process to DMF2, for example, the collection object of the data required by the model, and the collection object includes at least one of the following: one or more user equipment UE, one or more cells.
  • DMF2 determines data according to the one or more data types, and sends the determined data to DMF1.
  • DMF1 receives data from DMF2, merges and deduplicates the data, and obtains data set 1.
  • DMF1 obtains a part of data from the access network device according to the data requirements of MTF in the data subscription process, and after receiving data from DMF2, merges and deduplicates the two parts of data to obtain data set 1.
  • Data set 1 is used to divide the training set and test set of the model.
  • DMF1 divides the data set 1 into a training set and a test set.
  • DMF1 can also manage the divided training set and test set according to the model ID, so that the training set and test set divided from data set 1 can be found later according to the model ID.
  • the data set division method can be determined by DMF itself, for example, random division according to a specified ratio, fixed division by time, etc.
  • DMF1 sends the training set to the MTF.
  • DMF1 sends a test set to DMF2.
  • Step 1209 may be executed after step 1207 and before step 1212.
  • the test set sent by DMF1 is bound to the model ID mentioned above, and the specific label can be "model ID: test", indicating that the test is the test set corresponding to the model indicated by the model ID.
  • the MMF sends a message to the MEF to start the model evaluation.
  • the message includes the model identification.
  • step 1210 may be executed after step 1208.
  • the MEF sends a data request message 2 to the DMF2, where the message includes a model ID (model ID) and a dataset type (dataset Type).
  • the value of the dataset Type (that is, the second information described in the embodiment of the present application) in the data request message sent by the MEF is "test", indicating that the MEF requests the test set.
  • DMF2 sends a test set to the MEF.
  • If DMF2 does not find the data requested by the MEF, it replies to the MEF with a NACK message that includes the error reason.
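  • The merge, deduplication, division, and lookup steps above can be sketched as follows. This is a hedged illustration under stated assumptions: records are dictionaries with a unique `key` field, the division is a seeded random split by ratio (one of the options the text mentions), and the function names, the "model-1" identifier, and the ACK/NACK dictionary shape are hypothetical.

```python
import random


def merge_and_dedupe(own, received):
    """Merge two record lists, keeping one record per key."""
    merged = {}
    for record in own + received:
        merged.setdefault(record["key"], record)
    return [merged[k] for k in sorted(merged)]


def split_by_ratio(records, train_ratio, seed=0):
    """Randomly split records into disjoint training and test lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def handle_request(store, model_id, dataset_type):
    """Return the requested set, or a NACK with an error reason."""
    label = f"{model_id}:{dataset_type}"
    if label not in store:
        return {"status": "NACK", "reason": f"no data for {label}"}
    return {"status": "ACK", "data": store[label]}


dmf1_data = [{"key": k, "rsrp": -90 - k} for k in range(0, 6)]
dmf2_data = [{"key": k, "rsrp": -90 - k} for k in range(4, 10)]  # keys 4 and 5 overlap

data_set_1 = merge_and_dedupe(dmf1_data, dmf2_data)
train, test = split_by_ratio(data_set_1, train_ratio=0.8)

# Sets are stored under a "model ID:data set type" label for later lookup.
store = {"model-1:train": train, "model-1:test": test}
```

  • Because both sets are slices of one shuffled, deduplicated list, no record can appear in both, which is the property the method relies on for a fair evaluation.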
  • DMF1 can initiate data synchronization, and DMF1 can complete data merging and data set division.
  • data synchronization is initiated by DMF2.
  • the embodiments of the present application also provide the following two data synchronization solutions:
  • DMF1 sends to DMF2 the data type requested by the MTF (for example, one or more data types corresponding to the first model) and the information of the data collected by DMF1.
  • the data collected by DMF1 is the data collected by DMF1 according to the data type requested by the MTF.
  • the information of the data may be the range of the data, for example, the range of the key value of the data, or the time range of the data distribution, or the range of the network area in which the data is distributed.
  • DMF2 determines the data that does not exist in DMF1 in the collected data and sends it to DMF1.
  • Data synchronization can be performed between two DMFs by transmitting data information without data transmission.
  • DMF1 sends DMF2 the data type of the MTF request.
  • DMF2 determines the information that DMF2 has collected data according to the data type and sends it to DMF1. The two parts of data information are combined and deduplicated by DMF1.
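  • The first synchronization solution above, in which DMF1 reports the range of its data and DMF2 returns only what DMF1 lacks, might look like the sketch below. It assumes that each record has an integer `key` and that DMF1 holds every record inside the range it reports; both assumptions, and the function names, are introduced here for illustration.

```python
def data_range(records):
    """Summarize collected data as a (min_key, max_key) range."""
    keys = [r["key"] for r in records]
    return (min(keys), max(keys))


def missing_from_peer(own_records, peer_range):
    """Select records whose key falls outside the peer's reported range."""
    low, high = peer_range
    return [r for r in own_records if not (low <= r["key"] <= high)]


dmf1_records = [{"key": k} for k in range(1, 6)]  # keys 1..5
dmf2_records = [{"key": k} for k in range(4, 9)]  # keys 4..8

# DMF1 sends its range; DMF2 replies only with the records DMF1 lacks.
to_send = missing_from_peer(dmf2_records, data_range(dmf1_records))
synced = dmf1_records + to_send
```

  • Sending a range instead of the full data keeps the first message small; the trade-off is that a range cannot describe gaps inside DMF1's data, so a real design would need a finer-grained summary where gaps are possible.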
  • the method shown in Figure 12 is suitable for multiple DMF scenarios. After data synchronization between DMFs, the training set and the test set are divided, which can ensure that there is no intersection between the training set and the test set, and improve the accuracy of model evaluation.
  • The embodiment of the present application also provides a model data transmission method, which differs from the method shown in FIG. 12 in that the MMF can issue the correspondence between the model ID and the data Type to DMF1 in advance, before the MTF requests the training set from DMF1 in step 1203.
  • DMF1 can find the corresponding data Type according to the model ID, thereby determining the data set corresponding to the data Type, and further divide the training set from the data set according to the data set type sent by MTF.
  • the embodiment of the present application also provides a model data transmission method, which is suitable for a scenario of multiple DMFs, and the MMF issues a data division policy. As shown in Figure 13, the method includes the following steps:
  • For this step, refer to step 1201 described above; details are not repeated here.
  • the MMF determines a data division policy, and sends a data division policy (split Policy) to the DMF1.
  • MMF sends a configuration message to DMF1, and the configuration message includes split Policy and model ID.
  • Split Policy is applicable to the division of the training set or the test set of the model indicated by the model ID.
  • MMF can use expert experience, data information, scenarios, etc. to determine a reasonable data division strategy to improve model training and evaluation results.
  • the information of the data in DMF1 and DMF2 can be queried respectively, and the MMF can also determine the split Policy with reference to the queried data information.
  • Steps 1303 to 1313 are the same as steps 1202 to 1212 described above, wherein the DMF1 divides the data set according to the data division strategy issued by the MMF.
  • MMF can determine a more reasonable data division strategy, and in the scenario of multiple DMFs, the model training and evaluation effects can be improved.
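  • As an illustration of DMF1 applying a split Policy issued by the MMF, the sketch below uses a fixed division by time. The `SplitPolicy` fields, the integer timestamps, and the "model-1" identifier are assumptions made for this example; the source does not define the contents of the policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitPolicy:
    """Hypothetical split policy: a fixed split at a time boundary."""
    model_id: str
    test_starts_at: int  # records at or after this timestamp go to the test set


def apply_policy(records, policy):
    """Divide records into training and test sets according to the policy."""
    train = [r for r in records if r["t"] < policy.test_starts_at]
    test = [r for r in records if r["t"] >= policy.test_starts_at]
    return train, test


records = [{"t": t, "value": t * 10} for t in range(10)]
policy = SplitPolicy(model_id="model-1", test_starts_at=8)
train, test = apply_policy(records, policy)
```

  • A time boundary keeps every test record later than every training record, which suits time-series data such as signal measurements; a ratio-based policy would be an equally valid choice for the MMF.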
  • The embodiment of the present application also provides a model data transmission method, which differs from the method shown in FIG. 13 in that the MMF can issue the correspondence between the model ID and the data Type to DMF1 in advance, before the MTF requests the training set from DMF1 in step 1304.
  • DMF1 can find the corresponding data Type according to the model ID, thereby determining the data set corresponding to the data Type, and further divide the training set from the data set according to the data set type sent by MTF.
  • the embodiment of the present application also provides a model data transmission method, which is suitable for a scenario of multiple DMFs, and data synchronization is performed between different DMFs by transmitting data information.
  • the method includes the following steps:
  • For this step, refer to step 1201 described above; details are not repeated here.
  • the MMF sends a data information query request to the DMF1, where the request includes one or more data types (data Types) corresponding to the model of the data and the data information to be queried.
  • the data information to be queried may be the amount of data and the range of the data, and the range of the data may be the range of the key value of the data, or the time range of the data distribution, or the range of the network area where the data is distributed.
  • DMF1 returns data information to the MMF.
  • DMF1 determines data set 1 according to the one or more data types sent by the MMF (the data collected by DMF1 constitutes data set 1), and returns information about data set 1 to the MMF, for example, the amount of data collected for the one or more data types and the range of the collected data.
  • If DMF1 does not find data information corresponding to the one or more data types, it returns a NACK message to the MMF that includes the cause of the error.
  • the MMF sends a data information query request to the DMF2, where the request includes one or more data types (data Types) corresponding to the model of the data and the data information to be queried.
  • the data information to be queried may be the amount of data and the range of the data, and the range of the data may be the range of the key value of the data, or the time range of the data distribution, or the range of the network area where the data is distributed.
  • DMF2 returns data information to MMF.
  • DMF2 determines data set 2 according to the one or more data types sent by the MMF (the data collected by DMF2 constitutes data set 2), and returns information about data set 2 to the MMF, for example, the amount of data collected for the one or more data types and the range of the collected data.
  • If DMF2 does not find data information corresponding to the one or more data types, it returns a NACK message to the MMF that includes the cause of the error.
  • the MMF divides the data set according to the specific scenario and the data information obtained from DMF1 and DMF2, and determines the range of the training set and the range of the test set.
  • the MMF integrates the data information obtained from DMF1 and DMF2, divides the range of the data, and determines the range of the training set and the range of the test set.
  • For example, DMF1 collects the RSRP data of cell 1 from June 1 to July 31, and DMF2 collects the RSRP data of cell 1 from August 1 to August 31. The time range of the integrated data is June 1 to August 31. The data from June 1 to August 20 is used to train the model, that is, the range of the training set is "June 1 to August 20"; the data from August 21 to August 31 is used to evaluate the model, that is, the range of the test set is "August 21 to August 31".
  • the MMF sends a model training trigger message to the MTF, where the message includes the model identifier and the range of the training set.
  • the range of the training set may be the range of key values of the data in the training set, or the time range of the data distribution in the training set, or the range of the network area in which the data in the training set is distributed.
  • the MTF sends a data request message 1 to the DMF1, where the message includes one or more data types corresponding to the model and the range of the training set.
  • DMF1 sends the training set to the MTF.
  • DMF1 can determine the data set formed by the collected data according to the one or more data types sent by the MTF, and then obtain the training set of the model from the data set according to the range of the training set sent by the MTF.
  • the MTF uses the training set for model training.
  • the MMF sends a model evaluation trigger message to the MEF, where the message includes the model identifier and the range of the test set.
  • the range of the test set may be the range of key values of the data in the test set, or the time range of the data distribution in the test set, or the range of the network area in which the data in the test set is distributed.
  • the MEF sends a data request message 2 to the DMF2, where the message includes one or more data types corresponding to the model and the range of the test set.
  • DMF2 sends a test set to the MEF.
  • DMF2 may determine the data set composed of the collected data according to the one or more data types sent by the MEF, and then obtain the test set of the model from the data set according to the range of the test set sent by the MEF.
  • the MEF uses the test set for model evaluation.
  • the method shown in FIG. 14 is applicable to the scenario of multiple DMFs, and data synchronization between DMFs is not required, which simplifies the operation of DMFs and at the same time ensures the privacy of data.
  • Because the MMF divides the data set, it can ensure that the training set and the test set do not intersect, which ensures the accuracy of the model evaluation results.
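  • The range division performed by the MMF in the RSRP example of this method can be sketched as follows. The year 2020 is an assumption (the example gives no year), and the helper names and the choice to express the test set as "the last 11 days" are illustrative only.

```python
from datetime import date, timedelta

YEAR = 2020  # assumed; the example in the text gives no year


def integrate(ranges):
    """Combine per-DMF (start, end) ranges into one overall range."""
    return (min(s for s, _ in ranges), max(e for _, e in ranges))


def split_range(overall, test_days):
    """Reserve the last `test_days` days for testing; the rest is for training."""
    start, end = overall
    test_start = end - timedelta(days=test_days - 1)
    return (start, test_start - timedelta(days=1)), (test_start, end)


dmf1_range = (date(YEAR, 6, 1), date(YEAR, 7, 31))
dmf2_range = (date(YEAR, 8, 1), date(YEAR, 8, 31))

overall = integrate([dmf1_range, dmf2_range])
train_range, test_range = split_range(overall, test_days=11)
```

  • The MMF only ever handles ranges here, never the measurements themselves, which matches the point that no data needs to move between the DMFs.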
  • The embodiment of the present application also provides a model data transmission method, which differs from the method shown in FIG. 14 in that the MMF can issue the correspondence between the model ID and the data Type to DMF1 and DMF2 in advance. In that case, in step 1408, when the MTF requests the training set from DMF1, it may omit the data Type and send only the model ID and the range of the training set. Likewise, when the MEF requests the test set from DMF2, it may omit the data Type and send only the model ID and the range of the test set.
  • DMF1 can find the corresponding data Type according to the model ID, thereby determining the data set corresponding to the data Type, and further divide the training set from the data set according to the data set type sent by MTF.
  • DMF2 can find the corresponding data Type according to the model ID, thereby determining the data set corresponding to the data Type, and further divide the test set from the data set according to the data set type sent by MEF.
  • the present application also provides a model data transmission method, which is different from the method shown in FIG. 14 in that the MMF divides the training set and the test set and then notifies DMF1 and DMF2 of the division results.
  • MTF, MEF can request data from DMF by data type and data set type. As shown in Figure 15, the method includes the following steps:
  • For this step, refer to step 1201 described above; details are not repeated here.
  • the MMF sends a data information query request to the DMF1, where the request includes one or more data types (data Types) corresponding to the model of the data and the data information to be queried.
  • the data information to be queried may be the amount of data and the range of the data, and the range of the data may be the range of the key value of the data, or the time range of the data distribution, or the range of the network area where the data is distributed.
  • DMF1 returns data information to the MMF.
  • DMF1 determines data set 1 according to the one or more data types sent by the MMF (the data collected by DMF1 constitutes data set 1), and returns information about data set 1 to the MMF, for example, the amount of data collected for the one or more data types and the range of the collected data.
  • If DMF1 does not find data information corresponding to the one or more data types, it returns a NACK message to the MMF that includes the cause of the error.
  • the MMF sends a data information query request to the DMF2, where the request includes one or more data types (data Types) corresponding to the model of the data and the data information to be queried.
  • the data information to be queried may be the amount of data and the range of the data, and the range of the data may be the range of the key value of the data, or the time range of the data distribution, or the range of the network area where the data is distributed.
  • DMF2 returns data information to MMF.
  • DMF2 determines data set 2 according to the one or more data types sent by the MMF (the data collected by DMF2 constitutes data set 2), and returns information about data set 2 to the MMF, for example, the amount of data collected for the one or more data types and the range of the collected data.
  • If DMF2 does not find data information corresponding to the one or more data types, it returns a NACK message to the MMF that includes the cause of the error.
  • the MMF divides the data set according to the specific scene and the data information obtained from DMF1 and DMF2, and determines the range of the training set and the range of the test set.
  • the MMF integrates the data information obtained from DMF1 and DMF2, divides the range of the data, and determines the range of the training set and the range of the test set.
  • For example, DMF1 collects the RSRP data of cell 1 from June 1 to July 31, and DMF2 collects the RSRP data of cell 1 from August 1 to August 31. The time range of the integrated data is June 1 to August 31. The data from June 1 to August 20 is used to train the model, that is, the range of the training set is "June 1 to August 20"; the data from August 21 to August 31 is used to evaluate the model, that is, the range of the test set is "August 21 to August 31".
  • MMF can deduplicate the data before dividing.
  • the MMF sends the range of the training set and the model identifier to DMF1.
  • the range of the training set may be the range of key values of the data in the training set, or the time range of the data distribution in the training set, or the range of the network area in which the data in the training set is distributed.
  • the DMF1 After the DMF1 receives the range of the training set and the model ID, it can also record the correspondence between the range of the training set and the model ID.
  • the MTF sends a data request message 1 to DMF1, where the message includes the model identifier of the model and the data set type.
  • the value of the data set type in the data request message 1 indicates that the MTF requests a training set, for example, the data set type in the data request message 1 may be "train”.
  • DMF1 sends the training set to the MTF.
  • DMF1 may determine the range of the corresponding training set according to the model identifier sent by the MTF, determine from the data set type that the MTF is requesting the training set, and then divide out the training set according to the range of the training set.
  • DMF1 may find a data set determined according to the data type corresponding to the model according to the model identifier, and then divide the training set from the data set according to the scope of the training set.
  • the MTF uses the training set for model training.
  • the MMF sends the range of the test set and the model identifier to the DMF2.
  • the range of the test set may be the range of key values of the data in the test set, or the time range of the data distribution in the test set, or the range of the network area in which the data in the test set is distributed.
  • DMF2 After DMF2 receives the range of the test set and the model ID, it can also record the correspondence between the range of the test set and the model ID.
  • the MEF sends a data request message 2 to the DMF2, where the message includes the model identifier of the model and the data set type.
  • The value of the data set type in the data request message 2 indicates that the MEF requests a test set; for example, the data set type in the data request message 2 may be "test".
  • DMF2 sends a test set to the MEF.
  • the DMF2 may determine the range of the corresponding test set according to the model identifier sent by the MEF, determine that the MEF request is the test set according to the data set type, and then divide the test set according to the range of the test set.
  • DMF2 may find a data set determined according to the data type corresponding to the model according to the model identifier, and then divide the test set from the data set according to the scope of the test set.
  • MEF uses the test set for model evaluation.
  • In steps 1507 and 1511, when delivering the ranges of the training set and the test set, the MMF can include the data Type in the message instead of the model ID. Then, in step 1508, the MTF can request the training set from DMF1 through the data Type and the range of the training set.
  • the MEF can request the test set from DMF2 through the data Type and the range of the test set.
  • DMF1 directly determines the data set corresponding to the data Type according to the data Type, and further divides the training set from the data set according to the data set type sent by the MTF.
  • DMF2 directly determines the data set corresponding to the data Type according to the data Type, and further divides the test set from the data set according to the data set type sent by the MEF.
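  • The bookkeeping in the method of FIG. 15, where each DMF records the range delivered by the MMF and later serves a request that carries only the model identifier and the data set type, can be sketched as follows. The `RangeDmf` class, its method names, and the integer key ranges are hypothetical and chosen only for illustration.

```python
class RangeDmf:
    """Toy DMF that stores a data set range per model and serves it on request."""

    def __init__(self, records):
        self.records = records
        self.ranges = {}  # (model_id, dataset_type) -> (low, high)

    def on_range(self, model_id, dataset_type, key_range):
        # Record the correspondence between the model ID and the range.
        self.ranges[(model_id, dataset_type)] = key_range

    def on_data_request(self, model_id, dataset_type):
        # Look up the range by model ID and data set type, then slice the data.
        key_range = self.ranges.get((model_id, dataset_type))
        if key_range is None:
            return None  # a real DMF would answer with a NACK here
        low, high = key_range
        return [r for r in self.records if low <= r["key"] <= high]


dmf1 = RangeDmf([{"key": k} for k in range(1, 11)])
dmf2 = RangeDmf([{"key": k} for k in range(1, 11)])
dmf1.on_range("model-1", "train", (1, 8))   # MMF -> DMF1
dmf2.on_range("model-1", "test", (9, 10))   # MMF -> DMF2

train = dmf1.on_data_request("model-1", "train")  # MTF -> DMF1
test = dmf2.on_data_request("model-1", "test")    # MEF -> DMF2
```

  • Since the MMF chose non-overlapping ranges up front, the two DMFs return disjoint sets without ever exchanging data with each other.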
  • FIG. 16 shows a possible schematic structural diagram of the communication device involved in the above embodiment.
  • The communication device shown in FIG. 16 may be the first network element, the second network element, or the third network element described in the embodiments of the present application; it may be a component in the first network element, the second network element, or the third network element that implements the above method; or it may be a chip applied to the first network element, the second network element, or the third network element.
  • the chip may be a System-On-a-Chip (SOC) or a baseband chip with a communication function, or the like.
  • The communication apparatus includes a processing unit 1601 and a communication unit 1602.
  • the processing unit may be one or more processors, and the communication unit may be a transceiver or a communication interface.
  • The processing unit 1601 may be configured to support the communication device in performing the processing actions in the above method embodiments, for example, to support the first network element in performing step 501, to support the second network element (for example, the MTF) in performing steps 702 and 706, to support the second network element (for example, the MEF) in performing steps 707 and 711, to support the third network element in performing step 1406, and/or to support other processes for the techniques described herein.
  • The communication unit 1602 is used to support communication between the communication device and other apparatuses (or devices), for example, to support the first network element in performing step 502, to support the second network element in performing steps 703, 705, 708, and 710, to support the third network element in performing step 905, and/or to support other processes for the techniques described herein.
  • the communication device may further include a storage unit 1603, where the storage unit 1603 is configured to store program codes and/or data of the communication device.
  • the processing unit 1601 may include at least one processor, the communication unit 1602 may be a transceiver or a communication interface, and the storage unit 1603 may include a memory.
  • each unit may also be called a module, a component, or a circuit, etc. accordingly.
  • An embodiment of the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; the instructions are used to execute the method shown in FIG. 5 or FIG. 7 to FIG. 15 .
  • Embodiments of the present application provide a computer program product including instructions, which, when executed on a communication device, cause the communication device to execute the method shown in FIG. 5 or FIG. 7 to FIG. 15 .
  • An embodiment of the present application provides a wireless communication device in which instructions are stored; when the wireless communication device runs on the communication device shown in FIG. 4a, FIG. 4b, FIG. 16, and FIG. …, the communication device is caused to execute the method shown in FIG. 5 or FIG. 7 to FIG. 15.
  • the wireless communication device may be a chip.
  • the embodiment of the present application further provides a communication system, including: a terminal device and an access network device.
  • The terminal device may be the communication device shown in FIG. 5a, FIG. 9, and FIG. 10, and the access network device may be the communication device shown in FIG. 5b, FIG. 11, and FIG. 12.
  • The processors in the embodiments of the present application may include, but are not limited to, at least one of the following types of computing devices that run software: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), or an artificial intelligence processor. Each computing device may include one or more cores for executing software instructions to perform operations or processing.
  • The processor can be a separate semiconductor chip, or can be integrated with other circuits into one semiconductor chip. For example, it can form a system-on-chip (SoC) with other circuits (such as codec circuits, hardware acceleration circuits, or various bus and interface circuits).
  • The processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements dedicated logic operations.
  • The memory in this embodiment of the present application may include at least one of the following types: read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, or electrically erasable programmable read-only memory (EEPROM).
  • The memory may also be compact disc read-only memory (CD-ROM) or other compact disc storage, optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without limitation.
  • At least one means one or more.
  • “Plural” means two or more.
  • "And/or" which describes the association relationship of the associated objects, means that there can be three kinds of relationships, for example, A and/or B, which can mean that A exists alone, A and B exist at the same time, and B exists alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the associated objects are an “or” relationship.
  • “At least one item(s) below” or similar expressions thereof refer to any combination of these items, including any combination of single item(s) or plural items(s).
  • At least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple.
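  • The seven cases covered by "at least one of a, b, or c" are exactly the non-empty subsets of the three items. The short sketch below enumerates them; the variable names are illustrative.

```python
from itertools import combinations

items = ["a", "b", "c"]

# Every non-empty subset of the items, smallest subsets first.
at_least_one = [
    combo
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
]
```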
  • Words such as “first” and “second” are used to distinguish the same or similar items that have basically the same function and effect. Those skilled in the art can understand that the words “first”, “second”, and the like do not limit the quantity or the execution order, and that items labeled “first” and “second” are not necessarily different.
  • the disclosed apparatus and method for accessing a database may be implemented in other manners.
  • the embodiments of the database access apparatus described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection of database access devices or units through some interfaces, which may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the part that contributes over the prior art, or all or part of the technical solutions, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

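Embodiments of the present application disclose a model data transmission method and a communication apparatus, relate to the field of communications, and can improve the accuracy of model evaluation. The method includes: a first network element determines a first data set; the first network element receives first information and second information from a second network element, where the first information indicates a first model, the second information requests a second data set, and the second data set is used to train the first model or to test the first model; and the first network element sends the second data set to the second network element, where the second data set is a subset of the first data set.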

Description

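Model data transmission method and communication apparatus
Technical Field
The embodiments of the present application relate to the field of communications, and in particular, to a model data transmission method and a communication apparatus.
Background
The 5th generation (5G) communication system has made a major leap in key performance indicators such as network speed and network latency, and can adapt to a wide variety of scenarios and differentiated service requirements. Artificial intelligence (AI) and machine learning (ML) technologies are also gradually being applied in 5G communication systems. For example, in the enabler of network automation (eNA) architecture, the network data analytics function (NWDAF) network element can train models, which can then be used for service prediction, speech recognition, face recognition, object detection, and so on.
In the eNA architecture, the NWDAF can use an event identifier (event ID) to request data from the data collection coordination function (DCCF) network element for training a model, where one event ID corresponds to one data type (data Type). The NWDAF can also use the same data type to request data from the DCCF network element for evaluating the trained model. The data that the DCCF returns to the NWDAF for two successive requests with the same data type may intersect; that is, the data used for model training and the data used for model evaluation may overlap, which can make the evaluation result inaccurate.
Summary
The embodiments of the present application provide a model data transmission method and a communication apparatus that can improve the accuracy of model evaluation.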
第一方面,提供了一种模型数据传输方法,包括:第一网元确定第一数据集;第一网元从第二网元接收第一信息与第二信息,第一信息用于指示第一模型,第二信息用于请求第二数据集,第二数据集用于训练第一模型或者用于测试第一模型;第一网元向第二网元发送第二数据集,第二数据集为第一数据集的子集。
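In the method provided in the present application, the data management function module (for example, the first network element) can determine, from the first information and the second information sent by the second network element (for example, a model training function module or a model evaluation function module), whether the second network element is requesting a training set or a test set. The data management function module can therefore ensure that the data returned to the model training function module and the data returned to the model evaluation function module do not intersect, which prevents the data used for model training from being fully or partly identical to the data used for model evaluation and improves the accuracy of model evaluation.
With reference to the first aspect, in a first possible implementation of the first aspect, the second information indicates the type of the second data set, where the type of the second data set is training set or test set, the training set is used to train the first model, and the test set is used to test the first model; or the second information indicates the range of the second data set.
The present application provides specific implementations of the second information. In one, the first network element partitions the training set and the test set, and determines from the data set type whether the second network element is requesting the training set or the test set. In the other, the model management function module (for example, the third network element) partitions the training set and the test set, and the first network element determines from the data set range whether the second network element is requesting the training set or the test set.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the range of the second data set includes one or more of the following: a range of key values of the data in the second data set, a time range over which the data in the second data set is distributed, and a range of network areas over which the data in the second data set is distributed.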
本申请提供了训练集、测试集范围的具体实现,可以根据场景或业务特性划分数据集的范围,从而可以提高模型训练、模型评估的准确性。
结合第一方面或第一方面的第一或第二种可能的实现方式,在第一方面的第三种可能的实现方式中,所述方法还包括:第一网元根据第一信息和第二信息确定第二数据集。
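In the present application, the first network element can determine the first data set according to the first information and can partition the training set or the test set from the first data set according to the second information. In other words, the first network element can determine from the first information and the second information whether the second network element is requesting the training set or the test set, which prevents the training set and the test set from intersecting and improves the accuracy of model evaluation.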
结合第一方面的第三种可能的实现方式,在第一方面的第四种可能的实现方式中,第一网元根据第一信息和第二信息确定第二数据集,包括:第一网元根据数据划分策略从第一数据集中确定第一模型的训练集和/或第一模型的测试集。
本申请提供了划分训练集、测试集的具体实现,可以根据数据划分策略更合理地划分训练集、测试集,从而可以提高模型训练、模型评估的准确性。
结合第一方面的第四种可能的实现方式,在第一方面的第五种可能的实现方式中,所述数据划分策略为以下任意一项:根据数据分布的时间划分、根据数据分布的网络区域进行划分或根据指定比例划分。
本申请还提供了数据划分策略的具体实现,不同的划分策略适用于不同的场景或业务需求,根据数据划分策略更合理地划分训练集和/或测试集。
结合第一方面的第四或第五种可能的实现方式,在第一方面的第六种可能的实现方式中,所述方法还包括:第一网元从第三网元接收数据划分策略;或者,第一网元确定数据划分策略。
本申请还提供了数据划分策略的配置方式,可以是第三网元(例如,模型管理功能模块MMF)为第一网元配置的,也可以是第一网元存储在本地的。
结合第一方面,在第一方面的第七种可能的实现方式中,所述方法还包括:第一网元向第三网元发送与第一模型对应的一个或多个数据类型、与一个或多个数据类型对应的第一数据集的范围,第一数据集的范围包括以下一项或多项:第一数据集中数据的key值的范围、第一数据集中数据分布的时间的范围、第一数据集中数据分布的网络区域的范围。
本申请中,当第三网元划分测试集、训练集,第一网元需要向第三网元上报根据第一模型的需求(例如,数据类型)收集到的数据的范围,以便第三网元根据第一网元收集到的数据的范围划分训练集的范围、测试集的范围。
结合第一方面或第一方面的第一至第七种可能的实现方式中的任意一种可能的实现方式,在第一方面的第八种可能的实现方式中,所述方法还包括:第一网元从第二网元接收第三信息,第三信息包括以下一项或多项:第一模型的一个或多个数据类型、第一模型所需数据的采集对象,采集对象包括以下至少一项:一个或多个用户设备UE、一个或多个小区cell。
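In the present application, the first network element can also receive the data requirement of the second network element (for example, the requirement indicated by the third information) from the second network element, so as to subscribe to data on behalf of the second network element for the second network element to train or evaluate the model.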
结合第一方面的第八种可能的实现方式,在第一方面的第九种可能的实现方式中,第一网元确定第一数据集,包括:第一网元根据第三信息从第四网元获取第一数据集;或者,第一网元根据第三信息从第五网元获取第三数据集或者第三数据集的信息,第三数据集的信息用于指示第三数据集的范围,根据第三信息从第四网元获取第四数据集,根据第三数据集和第四数据集确定第一数据集。
本申请提供了第一网元确定第一数据集的具体实现,第一网元可以在第二网元发起数据订阅后,根据第二网元的数据需求向接入网设备(例如,第四网元)订阅数据,订阅的数据可以构成第一数据集。或者,存在多个数据管理功能模块的场景下,例如,存在第一网元和第五网元,第一网元可以根据第二网元的数据需求向第五网元收集数据,再将自身收集到的数据和从第五网元获取的数据进行合并、去重,在存在多个数据管理功能模块的场景下,根据本申请提供的方法,依然能够保证模型的训练集、测试集不存在交集,提高模型评估的准确性。
第二方面,提供一种模型数据传输方法,所述方法包括:第二网元向第一网元发送第一信息与第二信息,第一信息用于指示第一模型,第二信息用于请求第二数据集,第二数据集用于训练第一模型或者用于测试第一模型;第二网元从第一网元接收第二数据集,第二数据集为第一数据集的子集。
本申请提供的方法中,模型训练功能模块(例如,所述第二网元)可以向数据管理功能模块(例如,所述第一网元)请求用于模型训练的数据,模型评估功能模块(例如,所述第二网元)可以向数据管理功能模块请求用于模型评估的数据,本申请实施例提供的传输方法可以保证向模型训练功能模块返回的数据和向模型评估功能模块返回的数据不存在交集,从而可以避免用于模型训练的数据和用于模型评估的数据完全相同或者部分相同,提高模型评估的准确性。
结合第二方面,在第二方面的第一种可能的实现方式中,第二信息用于指示第二数据集的类型,第二数据集的类型包括训练集或者测试集,训练集用于训练第一模型,测试集用于测试第一模型;或者,第二信息用于指示第二数据集的范围。
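The present application provides specific implementations of the second information. In one, the first network element partitions the training set and the test set, and determines from the data set type whether the second network element is requesting the training set or the test set. In the other, the model management function module (for example, the third network element) partitions the training set and the test set, and the first network element determines from the data set range whether the second network element is requesting the training set or the test set.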
结合第二方面的第一种可能的实现方式,在第二方面的第二种可能的实现方式中,第二数据集的范围包括以下一项或多项:第二数据集中数据的key值的范围、第二数据集中数据分布的时间的范围、第二数据集中数据分布的网络区域的范围。
本申请提供了训练集、测试集范围的具体实现,可以根据场景或业务特性划分数据集的范围,从而可以提高模型训练、模型评估的准确性。
结合第二方面或第二方面的第一或第二种可能的实现方式,方法还包括:
第二网元向第一网元发送第三信息,第三信息包括以下一项或多项:第一模型的一个或多个数据类型、第一模型所需数据的采集对象,采集对象包括以下至少一项:一个或多个用户设备UE、一个或多个小区cell。
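In the present application, the second network element can also initiate a data subscription procedure by sending the third information to the first network element to indicate its data requirement, so that the first network element collects data according to that requirement for the second network element to train or evaluate the model.
In a third aspect, a model data transmission method is provided. The method includes: a third network element determines a data partition strategy, where the data partition strategy is used to determine a second data set from a first data set, and the second data set is used to train a first model or to test the first model; and the third network element sends the data partition strategy to a first network element.
The present application also provides a way of configuring the data partition strategy: the third network element (for example, the model management function module MMF) can configure it for the first network element.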
本申请提供的方法中,模型管理功能模块(例如,第三网元)可以为数据管理功能模块(例如,所述第一网元)配置数据划分策略,第一网元在保证训练集、测试集不存在交集的同时,可以合理地划分训练集、测试集,进一步提高模型训练、模型评估的准确性。
结合第三方面,在第三方面的第一种可能的实现方式中,数据划分策略为以下任意一项:根据数据分布的时间划分、根据数据分布的网络区域进行划分或根据指定比例划分。
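In a fourth aspect, a model data transmission method is provided. The method includes: a third network element determines a range of a second data set according to a range of a first data set, where the second data set is a subset of the first data set and is used to train a first model or to test the first model; and the third network element sends the range of the second data set to a second network element, where the range of the second data set is used by the second network element to request the second data set from a first network element.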
本申请提供的方法中,模型管理功能模块(例如,第三网元)可以划分训练集的范围、测试集的范围,在保证训练集、测试集不存在交集的同时,可以合理地划分训练集、测试集,进一步提高模型训练、模型评估的准确性。
结合第四方面,在第四方面的第一种可能的实现方式中,第二数据集的范围包括以下一项或多项:第二数据集中数据的key值的范围、第二数据集中数据分布的时间的范围、第二数据集中数据分布的网络区域的范围。
结合第四方面或第四方面的第一种可能的实现方式中,所述方法还包括:
从第一网元接收与第一模型对应的一个或多个数据类型、与一个或多个数据类型对应的第一数据集的范围,数据集的范围包括以下一项或多项:第一数据集中数据的key值的范围、第一数据集中数据分布的时间的范围、第一数据集中数据分布的网络区域的范围。
第五方面,提供一种通信装置,该通信装置可以是第一网元,该通信装置包括:处理单元,用于确定第一数据集;通信单元,用于从第二网元接收第一信息与第二信息,第一信息用于指示第一模型,第二信息用于请求第二数据集,第二数据集用于训练第一模型或者用于测试第一模型;通信单元还用于,向第二网元发送第二数据集,第二数据集为第一数据集的子集。
结合第五方面,在第五方面的第一种可能的实现方式中,第二信息用于指示第二数据集的类型,第二数据集的类型包括训练集或者测试集,训练集用于训练第一模型,测试集用于测试第一模型;或者,第二信息用于指示第二数据集的范围。
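With reference to the first possible implementation of the fifth aspect, in a second possible implementation of the fifth aspect, the range of the second data set includes one or more of the following: a range of key values of the data in the second data set, a time range over which the data in the second data set is distributed, and a range of network areas over which the data in the second data set is distributed.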
结合第五方面或第五方面的第一或第二种可能的实现方式,在第五方面的第三种可能的实现方式中,处理单元还用于,根据第一信息和第二信息确定第二数据集。
结合第五方面的第三种可能的实现方式,在第五方面的第四种可能的实现方式中,数据划分策略为以下任意一项:根据数据分布的时间划分、根据数据分布的网络区域进行划分或根据指定比例划分。
结合第五方面的第三或第四种可能的实现方式,在第五方面的第五种可能的实现方式中,通信单元还用于,从第三网元接收数据划分策略;或者,第一网元确定数据划分策略。
结合第五方面,在第五方面的第六种可能的实现方式中,通信单元还用于,向第三网元发送与第一模型对应的一个或多个数据类型、与一个或多个数据类型对应的第一数据集的范围,第一数据集的范围包括以下一项或多项:第一数据集中数据的key值的范围、第一数据集中数据分布的时间的范围、第一数据集中数据分布的网络区域的范围。
结合第五方面或第五方面的第一至第六种可能的实现方式中的任意一种可能的实现方式,在第五方面的第七种可能的实现方式中通信单元还用于,从第二网元接收第三信息,第三信息包括以下一项或多项:第一模型的一个或多个数据类型、第一模型所需数据的采集对象,采集对象包括以下至少一项:一个或多个用户设备UE、一个或多个小区cell。
结合第五方面的第七种可能的实现方式,在第五方面的第八种可能的实现方式中,处理单元具体用于,根据第三信息从第四网元获取第一数据集;或者,
根据第三信息从第五网元获取第三数据集或者第三数据集的信息,第三数据集的信息用于指示第三数据集的范围,根据第三信息从第四网元获取第四数据集,根据第三数据集和第四数据集确定第一数据集。
第六方面,提供了一种通信装置,该通信装置可以是第二网元,包括:处理单元,用于确定第一信息与第二信息,第一信息用于指示第一模型,第二信息用于请求第二数据集,第二数据集用于训练第一模型或者用于测试第一模型;通信单元,用于向第一网元发送第一信息与第二信息;通信单元,还用于从第一网元接收第二数据集,第二数据集为第一数据集的子集。
结合第六方面,在第六方面的第一种可能的实现方式中,第二信息用于指示第二数据集的类型,第二数据集的类型包括训练集或者测试集,训练集用于训练第一模型,测试集用于测试第一模型;或者,第二信息用于指示第二数据集的范围。
结合第六方面的第一种可能的实现方式,在第六方面的第二种可能的实现方式中,第二数据集的范围包括以下一项或多项:第二数据集中数据的key值的范围、第二数据集中数据分布的时间的范围、第二数据集中数据分布的网络区域的范围。
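With reference to the sixth aspect or the first or second possible implementation of the sixth aspect, in a third possible implementation of the sixth aspect, the communication unit is further configured to send third information to the first network element, where the third information includes one or more of the following: one or more data types of the first model, and collection objects of the data required by the first model, where the collection objects include at least one of the following: one or more user equipments (UEs), and one or more cells.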
第七方面,提供了一种通信装置,该装置可以是第三网元,包括:处理单元,用于确定数据划分策略,数据划分策略用于从第一数据集中确定第二数据集,第二数据集用于训练第一模型或者第二数据集用于测试第一模型;通信单元,用于向第一网元发送数据划分策略。
结合第七方面,在第七方面的第一种可能的实现方式中,数据划分策略为以下任意一项:根据数据分布的时间划分、根据数据分布的网络区域进行划分或根据指定比例划分。
第八方面,提供了一种通信装置,该装置可以是第三网元,包括:处理单元,用于根据第一数据集的范围确定第二数据集的范围,第二数据集为第一数据集的子集,第二数据集用于训练第一模型或者用于测试第一模型;通信单元,向第二网元发送第二数据集的范围,第二数据集的范围用于第二网元从第一网元请求第二数据集。
结合第八方面,在第八方面的第一种可能的实现方式中,第二数据集的范围包括以下一项或多项:第二数据集中数据的key值的范围、第二数据集中数据分布的时间的范围、第二数据集中数据分布的网络区域的范围。
结合第八方面或第八方面的第一种可能的实现方式,在第八方面的第二种可能的实现方式中,通信单元还用于,从第一网元接收与第一模型对应的一个或多个数据类型、与一个或多个数据类型对应的第一数据集的范围,数据集的范围包括以下一项或多项:第一数据集中数据的key值的范围、第一数据集中数据分布的时间的范围、第一数据集中数据分布的网络区域的范围。
第九方面,提供了一种通信装置,包括至少一个处理器和存储器,所述至少一个处理器与所述存储器耦合;所述存储器,用于存储计算机程序;
所述至少一个处理器,用于执行所述存储器中存储的计算机程序,以使得所述装置执行如上述第一方面以及第一方面任意一种实现方式所述的方法,或上述第二方面以及第二方面任意一种实现方式所述的方法,或上述第三方面以及第三方面任意一种实现方式所述的方法,或上述第四方面以及第四方面任意一种实现方式所述的方法。
第十方面,提供了一种计算机可读存储介质,包括:计算机可读存储介质中存储有指令;当计算机可读存储介质在上述第五方面以及第五方面任意一种实现方式所述的通信装置上运行时,使得通信装置执行如上述第一方面以及第一方面任意一种实现方式所述的通信方法。
第十一方面,提供了一种计算机可读存储介质,包括:计算机可读存储介质中存储有指令;当计算机可读存储介质在上述第六方面以及第六方面任意一种实现方式所述的通信装置上运行时,使得通信装置执行如上述第二方面以及第二方面任意一种实现方式所述的通信方法。
第十二方面,提供了一种计算机可读存储介质,包括:计算机可读存储介质中存储有指令;当计算机可读存储介质在上述第七方面以及第七方面任意一种实现方式所述的通信装置上运行时,使得通信装置执行如上述第三方面以及第三方面任意一种实现方式所述的通信方法。
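In a thirteenth aspect, a computer-readable storage medium is provided, where the computer-readable storage medium stores instructions; when the instructions are run on the communication apparatus according to the eighth aspect or any implementation of the eighth aspect, the communication apparatus is caused to perform the communication method according to the fourth aspect or any implementation of the fourth aspect.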
第十四方面,提供了一种无线通信装置,该通信装置包括处理器,例如,应用于通信装置中,用于实现上述第一方面以及第一方面任意一种实现方式所述的方法,或上述第二方面以及第二方面任意一种实现方式所述的方法,或上述第三方面以及第三方面任意一种实现方式所述的方法,或上述第四方面以及第四方面任意一种实现方式所述的方法。该通信装置例如可以是芯片系统。在一种可行的实现方式中,所述芯片系统还包括存储器,所述存储器,用于保存实现上述第一方面所述方法的功能必要的程序指令和数据。
上述方面中的芯片系统可以是片上系统(system on chip,SOC),也可以是基带芯片等,其中基带芯片可以包括处理器、信道编码器、数字信号处理器、调制解调器和接口模块等。
第十五方面,提供了一种通信系统,所述通信系统包括上述任意一种实现方式所述的第一网元、第二网元以及第三网元。
结合第十五方面,在第十五方面的第一种可能的实现方式中,所述通信系统还包括第四网元、第五网元。第四网元可以是接入网设备,第五网元可以是数据管理功能模块。
附图说明
图1为本申请实施例提供的网络架构的示意图;
图2为本申请实施例提供的另一网络架构的示意图;
图3为本申请实施例提供的另一网络架构的示意图;
图4a为本申请实施例提供的通信装置的结构示意图;
图4b为本申请实施例提供的通信装置的另一结构示意图;
图5为本申请实施例提供的模型数据传输方法的流程示意图;
图6a为本申请实施例提供的训练集、测试集的示意图;
图6b为本申请实施例提供的训练集、测试集的另一示意图;
图7~图15为本申请实施例提供的模型数据传输方法的另一流程示意图;
图16~图17为本申请实施例提供的通信装置的另一结构框图。
具体实施方式
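An embodiment of the present application provides a model-related system architecture. Referring to FIG. 1, the network architecture includes a model management function (MMF) module 10, a model training function (MTF) module 20, a data management function (DMF) module 30, a model evaluation function (MEF) module 40, and an access network device 50. The network architecture supports applying artificial intelligence (AI) and machine learning (ML) technologies in a wireless communication network.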
其中,接入网设备是网络中用于将终端设备接入到无线网络的设备。所述接入网设备可以为无线接入网中的节点,又可以称为基站,还可以称为无线接入网(radio access network,RAN)节点(或设备)。网络设备可以包括长期演进(long term evolution, LTE)系统或演进的LTE系统(LTE-Advanced,LTE-A)中的演进型基站(NodeB或eNB或e-NodeB,evolutional Node B),如传统的宏基站eNB和异构网络场景下的微基站eNB,或者也可以包括第五代移动通信技术(5th generation mobile networks,5G)新无线(new radio,NR)系统中的下一代节点B(next generation node B,gNB),或者还可以包括无线网络控制器(radio network controller,RNC)、节点B(Node B,NB)、基站控制器(base station controller,BSC)、基站收发台(base transceiver station,BTS)、传输接收点(transmission reception point,TRP)、家庭基站(例如,home evolved NodeB,HeNB或home Node B,HNB)、基带单元(base band unit,BBU)、基带池BBU pool,或WiFi接入点(access point,AP)等,再或者还可以包括云接入网(cloud radio access network,CloudRAN)系统中的集中式单元(centralized unit,CU)和分布式单元(distributed unit,DU),本申请实施例并不限定。在接入网设备包括CU和DU的分离部署场景中,CU支持无线资源控制(radio resource control,RRC)、分组数据汇聚协议(packet data convergence protocol,PDCP)、业务数据适配协议(service data adaptation protocol,SDAP)等协议;DU主要支持无线链路控制层(radio link control,RLC)、媒体接入控制层(media access control,MAC)和物理层协议。
另外,MTF负责训练模型;MEF负责评估训练好的模型的性能;MMF负责对模型进行管理,例如,生命周期管理、触发模型训练或模型评估等。DMF负责订阅和存储模型需要的数据,向MTF、MEF提供数据。例如,DMF可以从RAN收集数据;DMF可以向MTF发送数据用于MTF训练模型,DMF可以向MEF发送数据用于MEF评估或测试模型的性能。
需要说明的是,图1所示网络架构中,不同的功能模块可以分设在不同的设备上,也可以合设在同一个设备。例如,图1所示为模型训练和评估功能分离部署的场景,即MTF和MEF部署在不同的网元中。此外,图1仅示出了本申请实施例涉及的功能模块,图1所示系统还可以包括其他网元或功能模块,本申请实施例对此不做限制。
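In a possible implementation, the network architecture shown in FIG. 1 can be applied to an enabler of network automation (eNA) architecture. The eNA architecture is an intelligent network architecture based on the network data analytics function (NWDAF). As shown in FIG. 2, the eNA architecture includes an NWDAF module, a data collection coordination function (DCCF) module, and a network function (NF) module. The NWDAF can request data from the DCCF, and the DCCF can collect data from the NF.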
本申请实施例中,MTF和MEF可以由两个不同的NWDAF实现,例如,MTF和MEF分别是图2所示的NWDAF1、NWDAF2。DMF可以由DCCF实现。MMF可以由另一个NWDAF实现(NWDAF3),也可以与MTF共部署在NWDAF1中,或与MEF共部署在NWDAF2中。
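In another possible implementation, the network architecture shown in FIG. 1 can also be applied to the network architecture shown in FIG. 3. Referring to FIG. 3, the network architecture includes an operations, administration and maintenance (OAM) module, a first radio controller, and a second radio controller. The first radio controller mainly provides control-plane functions of the radio network, while the second radio controller and the OAM mainly provide management-plane functions. The first radio controller and the second radio controller can implement service functions by deploying different service function modules, and the OAM and the first radio controller collect data from the RAN through different interfaces.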
本申请实施例,MTF和MEF可以分别由图3中不同的功能模块来实现。例如,MTF可以部署在第二无线控制器中,MEF可以部署在第一无线控制器中,MMF可以部署在OAM中或第二无线控制器中,OAM和第一无线控制器都部署有DMF。
本申请实施例提供一种模型数据的传输方法,模型训练功能模块可以向数据管理功能模块请求用于模型训练的数据,模型评估功能模块可以向数据管理功能模块请求用于模型评估的数据,本申请实施例提供的传输方法可以保证向模型训练功能模块返回的数据和向模型评估功能模块返回的数据不存在交集,从而可以避免用于模型训练的数据和用于模型评估的数据完全相同或者部分相同,提高模型评估的准确性。
首先,对本申请实施例涉及的术语进行解释说明:
(1)模型
模型可以是人工智能(artificial intelligence,AI)模型、机器学习(machine learning,ML)模型。模型可以认为是实现计算机自动“学习”的算法。本申请实施例中,网元可以利用ML/AI模型实现特定业务功能。例如,利用模型进行故障预测、业务类型/模式预测、用户轨迹/位置预测、业务感知预测、干扰预测、网络关键绩效指标(key performance indicators,KPI)预测等。基于这些预测,可实现主动式的网络管理和控制,有效提升网络运维效率和网络资源利用效率,并提供个性化、差异化的网络服务能力。
示例的,根据UE上报的参考信号接收功率(reference signal receiving power,RSRP)、参考信号接收质量(reference signal receiving quality,RSRQ)或信号与干扰加噪声比(signal to interference plus noise ratio,SINR)等指标以及小区的资源利用率预测UE在该小区的性能,例如,UE的吞吐率,根据预测结果选择接入(或切换至)性能最优的小区。或者,UE利用ML/AI模型进行人脸识别、预测车辆行驶信息等。
(2)模型对应的数据类型
数据类型可以称为data Type,通过数据类型可以识别不同的数据。数据类型可以是参考信号接收功率(reference signal receiving power,RSRP),参考信号接收质量(reference signal receiving quality,RSRQ)、下行数据量(Data Volume in DL)等,模型对应的数据类型可以用来指示训练模型、评估模型所需的数据。示例的,模型对应的数据类型可以是RSRP,可以利用UE的RSRP数据来训练模型,模型训练好之后也可以利用UE的RSRP数据来评估模型的性能。
(3)模型的训练集
模型的训练集中的数据用于训练模型,训练集中数据的类型为模型对应的数据类型。例如,将训练集中的数据输入初始的模型,确定模型的参数。其中,模型的参数可以是网络的权重、偏置、梯度值等,本申请实施例对此不作限制。
(4)模型的测试集
模型的测试集中的数据用于评估(或测试)模型,测试集中数据的类型为模型对应的数据类型。例如,将测试集中的数据输入训练好的模型,评估模型的性能。示例的,可以根据模型的输出结果和实际结果的比较验证模型输出结果是否准确,从而可以对模型性能的高低进行评估。
(5)数据集类型
本申请实施例中,数据集类型(或数据集的类型)包括训练集、测试集。其中,数据集的类型为训练集,表明数据集为模型的训练集。数据集的类型为测试集,表明数据集为模型的测试集。
本申请实施例所述的网元,可以通过图4a中的通信装置410来实现。图4a所示为本申请实施例提供的通信装置410的硬件结构示意图。该通信装置410包括处理器4101以及至少一个通信接口(图4a中仅是示例性的以包括通信接口4103为例进行说明),可选的,还包括存储器4102。其中,处理器4101、存储器4102以及通信接口4103之间互相连接。
处理器4101可以是一个通用中央处理器(central processing unit,CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本申请方案程序执行的集成电路。
通信接口4103,使用任何收发器一类的装置,用于与其他设备或通信网络进行通信,如以太网,无线接入网(radio access network,RAN),无线局域网(wireless local area networks,WLAN)等。
存储器4102可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,也可以与处理器相连接。存储器也可以和处理器集成在一起。
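The memory 4102 is configured to store computer-executable instructions for executing the solutions of the present application, and execution is controlled by the processor 4101. The processor 4101 is configured to execute the computer-executable instructions stored in the memory 4102, thereby implementing the model data transmission method provided in the following embodiments of the present application.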
可选的,本申请实施例中的计算机执行指令也可以称之为应用程序代码,本申请实施例对此不作具体限定。
在具体实现中,作为一种实施例,处理器4101可以包括一个或多个CPU,例如图4a中的CPU0和CPU1。
在具体实现中,作为一种实施例,通信装置410可以包括多个处理器,例如图4a中的处理器4101和处理器4106。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,通信装置410还可以包括输出设备4104和输入设备4105。输出设备4104和处理器4101通信,可以以多种方式来显示信息。例如,输出设备4104可以是液晶显示器(liquid crystal display,LCD),发光二级管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或 投影仪(projector)等。输入设备4105和处理器4101通信,可以以多种方式接收用户的输入。例如,输入设备4105可以是鼠标、键盘、触摸屏设备或传感设备等。
上述的通信装置410可以是一个通用设备或者是一个专用设备。在具体实现中,通信装置410可以是台式机、便携式电脑、网络服务器、掌上电脑(personal digital assistant,PDA)、移动手机、平板电脑、无线终端装置、嵌入式设备或有图4a中类似结构的设备。本申请实施例不限定通信装置410的类型。
需要说明的是,通信装置410可以是终端整机,也可以是实现终端上的功能部件或组件,也可以是通信芯片,例如基带芯片等。通信装置410是终端整机时,通信接口可以是射频模块。当通信装置410为通信芯片,通信接口4103可以是该芯片的输入输出接口电路,输入输出接口电路用于读入和输出基带信号。
本申请实施例所述的网元还可以通过图4b所示的通信装置来实现。参考图4b,通信装置包括至少一个处理器4201、至少一个收发器4203、至少一个网络接口4204和一个或多个天线4205。可选的,还包括至少一个存储器4202。处理器4201、存储器4202、收发器4203和网络接口4204相连,例如通过总线相连。天线4205与收发器4203相连。网络接口4204用于通信装置通过通信链路与其它通信设备相连,例如通信装置通过S1接口与核心网网元相连。在本申请实施例中,所述连接可包括各类接口、传输线或总线等,本实施例对此不做限定。
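The processor in the embodiments of the present application, for example the processor 4201, may include at least one of the following types: a general-purpose central processing unit (CPU), a digital signal processor (DSP), a microprocessor, an application-specific integrated circuit (ASIC), a microcontroller unit (MCU), a field programmable gate array (FPGA), or an integrated circuit for implementing logic operations. For example, the processor 4201 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. The at least one processor 4201 may be integrated in one chip or located on multiple different chips.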
本申请实施例中的存储器,例如存储器4202,可以包括如下至少一种类型:只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是EEPROM。在某些场景下,存储器还可以是只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。
存储器4202可以是独立存在,与处理器4201相连。可选的,存储器4202也可以和处理器4201集成在一起,例如集成在一个芯片之内。其中,存储器4202能够存储执行本申请实施例的技术方案的程序代码,并由处理器4201来控制执行,被执行的各类计算机程序代码也可被视为是处理器4201的驱动程序。例如,处理器4201用于执行存储器4202中存储的计算机程序代码,从而实现本申请实施例中的技术方案。
收发器4203可以用于支持通信装置与其他网元之间射频信号的接收或者发送,收发器4203可以与天线4205相连。具体地,一个或多个天线4205可以接收射频信号, 该收发器4203可以用于从天线接收所述射频信号,并将射频信号转换为数字基带信号或数字中频信号,并将该数字基带信号或数字中频信号提供给所述处理器4201,以便处理器4201对该数字基带信号或数字中频信号做进一步的处理,例如解调处理和译码处理。此外,收发器4203可以用于从处理器4201接收经过调制的数字基带信号或数字中频信号,并将该经过调制的数字基带信号或数字中频信号转换为射频信号,并通过一个或多个天线4205发送所述射频信号。具体地,收发器4203可以选择性地对射频信号进行一级或多级下混频处理和模数转换处理以得到数字基带信号或数字中频信号,所述下混频处理和模数转换处理的先后顺序是可调整的。收发器4203可以选择性地对经过调制的数字基带信号或数字中频信号时进行一级或多级上混频处理和数模转换处理以得到射频信号,所述上混频处理和数模转换处理的先后顺序是可调整的。数字基带信号和数字中频信号可以统称为数字信号。收发器可以称为收发电路、收发单元、收发器件、发送电路、发送单元或者发送器件等等。
需要说明的是,通信装置420可以是通信装置整机,也可以是实现通信装置功能的部件或组件,也可以是通信芯片。当通信装置420为通信芯片,收发器4203可以是该芯片的接口电路,该接口电路用于读入和输出基带信号。
本申请实施例提供一种模型数据传输方法,如图5所示,所述方法包括以下步骤:
501、第一网元确定第一数据集。
其中,第一网元还可以称为数据管理功能模块,用于收集数据、管理数据。第一网元可以是图1所示网络架构中的DMF,或者是图2所示网络架构中的DCCF,或是DCCF中用于实现数据收集、数据管理的功能模块,或者是图3所示网络架构中的OAM,或是OAM中用于实现数据收集、数据管理的功能模块。
需要说明的是,机器学习技术、人工智能技术需要依靠大量的数据进行模型训练,训练结束后对模型性能进行评估,通过测试的模型才会上线用于相关的业务。本申请实施例中,第二网元可以训练模型、对训练好的模型进行评估。
其中,第二网元可以是模型训练功能模块或模型评估功能模块,第二网元可以从第一网元获取数据。本申请实施例中,第二网元可以是图1所示网络架构中的MTF或MEF,也可以是图2所示网络架构中的NWDAF1,所述NWDAF1可以负责模型训练。或者,第二网元可以是图2所示NWDAF1中用于实现模型训练的功能模块。或者,第二网元是图2所示网络架构中的NWDAF2,所述NWDAF2可以负责模型评估。或者,第二网元可以是图2所示NWDAF2中用于实现模型评估的功能模块。或者,第二网元是图3所示网络架构中的第二无线控制器,或者是第二无线控制器中用于实现模型训练的功能模块。或者,第二网元是图3所示网络架构中的第一无线控制器,或者是第一无线控制器中用于实现模型评估的功能模块。
本申请实施例中,第一网元可以通过以下两种方式确定第一数据集:
第一种、接入网设备可以进行数据采集、数据记录,第一网元可以通过数据订阅流程从接入网设备获取模型所需的数据,其中,模型所需的数据可以用于模型训练或模型评估。
具体地,当第二网元启动某个模型(例如,本申请实施例所述的第一模型)的训练流程,第二网元可以向第一网元发送数据订阅请求,所述数据订阅请求用于指示第 二网元的数据需求,所述数据需求用于表征训练所述第一模型所需的数据。例如,所述数据订阅请求包括第三信息,所述第三信息包括以下一项或多项:所述第一模型对应的一个或多个数据类型、所述第一模型所需数据的采集对象,所述采集对象包括以下至少一项:一个或多个用户设备UE、一个或多个小区cell。
第一网元接收第二网元发送的数据订阅请求,根据第二网元的数据需求向接入网设备订阅数据。例如,第一网元向接入网设备发送第三信息。接入网设备可以根据第三信息确定第二网元订阅的数据,接入网设备还可以向第一网元发送所述第二网元订阅的数据。
一种可能的实现方式中,步骤501中第一网元确定第一数据集的具体实现包括:所述第一网元从第四网元接收所述第一数据集。其中,第四网元为接入网设备。
第二种、第一网元可以对来自其他网元的数据进行合并、去重处理,确定第一数据集。
本申请实施例适用于模型训练功能、模型评估功能分离部署的场景,在此场景中,还可以部署多个负责数据管理的网元。具体地,可以在靠近模型训练功能模块的区域部署一个负责数据管理的网元,模型训练功能模块可以从该网元获取数据;可以在靠近模型评估功能模块的区域部署一个负责数据管理的网元,模型评估功能模块可以从该网元获取数据。例如,负责数据管理的网元可以是本申请实施例所述的第一网元和第五网元。其中,第一网元可以靠近模型训练功能模块部署,当第二网元为负责模型训练的功能模块,第二网元可以从第一网元获取数据。第五网元可以靠近模型评估功能模块部署,当第二网元为负责模型评估的功能模块,第二网元可以从第五网元获取数据。
需要说明的是,多个管理数据的网元中有一个网元(以下简称主管理网元)可以对其他负责数据管理的网元收集到的数据进行汇总、去重后下发给其他管理数据的网元。例如,第一网元负责对数据进行汇总、去重。
具体地,第一网元在数据订阅流程中可以根据第二网元的需求从接入网设备获取第四数据集。
第一网元还可以根据第二网元的需求从第五网元获取第三数据集或第三数据集的信息。例如,第一网元向第五网元发送第三信息,用于指示第二网元的数据需求。第五网元接收第一网元发送的第三信息后,根据第三信息确定符合第二网元需求的第三数据集,还可以向第一网元发送第三数据集。
或者,第五网元向第一网元发送第三数据集的信息,其中,第三数据集的信息可以是第三数据集的范围,第三数据集的范围可以是以下一项或多项:第三数据集中数据的键key值的取值范围、第三数据集中数据分布的时间的范围、第三数据集中数据分布的网络区域的范围。
此外,第一网元对第四数据集和第三数据集中的数据进行汇总、去重,获得最终用于模型训练、模型评估的数据集,即本申请实施例所述的第一数据集。
一种可能的实现方式中,第一网元确定第一数据集的具体实现包括:所述第一网元从第五网元接收第三数据集或者第三数据集的信息。
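The first network element can also receive a fourth data set from the fourth network element (for example, the access network device described above) and determine the first data set according to the third data set and the fourth data set. For example, the first network element can merge the data in the third data set and the fourth data set and then remove duplicate data to obtain the first data set.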
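The merge-and-deduplicate step can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how duplicates are recognized, so the sketch assumes every record carries a unique `key`, and the names `Record` and `merge_and_deduplicate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: int        # assumed unique key of a measurement record
    data_type: str  # for example "RSRP"
    value: float


def merge_and_deduplicate(third_set, fourth_set):
    """Merge two collections of records and keep one record per key."""
    merged = {}
    for record in list(fourth_set) + list(third_set):
        # setdefault keeps the first record seen for a key, so the locally
        # collected copy (fourth_set) wins over the copy received from the peer.
        merged.setdefault(record.key, record)
    return [merged[k] for k in sorted(merged)]


# Third data set (from the fifth network element) and fourth data set
# (from the access network device) share the record with key 2.
third = [Record(2, "RSRP", -91.0), Record(3, "RSRP", -88.5)]
fourth = [Record(1, "RSRP", -95.0), Record(2, "RSRP", -91.0)]
first_data_set = merge_and_deduplicate(third, fourth)
print([r.key for r in first_data_set])
```

Because the training set and the test set are both cut from the deduplicated result, a record that reached both data management network elements cannot end up in both sets.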
需要说明的是,第一网元确定第一数据集后,还可以记录所述第一模型对应的一个或多个数据类型、所述第一数据集之间的对应关系。
或者,第一网元可以获取到第一模型的标识与所述第一模型对应的一个或多个数据类型之间的对应关系,第一网元确定第一数据集后,维护第一数据集和第一模型的标识之间的对应关系。
一种可能的实现方式中,第一网元从第二网元接收第一模型的标识与所述第一模型对应的一个或多个数据类型之间的对应关系。或者,从第三网元(例如,模型管理功能模块)接收第一模型的标识与所述第一模型对应的一个或多个数据类型之间的对应关系。
502、第二网元向第一网元发送第一信息和第二信息。所述第一信息用于指示第一模型,所述第二信息用于请求第二数据集,所述第二数据集用于训练所述第一模型或者用于测试所述第一模型。
本申请实施例中,当第二网元需要进行模型训练时,可以向第一网元请求训练集;当第二网元需要进行模型评估时,可以向第一网元请求测试集。
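In a specific implementation, the second network element can send a data request message (data query) to the first network element, where the data request message includes the first information and the second information. The first information indicates the model that the second network element trains or evaluates, and the second information requests the training set or the test set of the model.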
具体地,第一信息可以是第一模型的标识,例如,第一信息为第一模型的model ID。区别于划分数据集的不同网元,第二信息有以下两种实现可能:
第一种、可以由第一网元划分模型的训练集、模型的测试集,在这种方式中,第二网元可以通过数据集的类型向第一网元请求模型的训练集或测试集。
示例的,第二网元发送的第二信息用于指示所述第二数据集的类型。在这种实现方式中,第二数据集的类型包括训练集(train)或者测试集(test),所述训练集用于训练所述第一模型,所述测试集用于测试所述第一模型。
第二种、可以由第三网元(用于对模型进行管理,例如,本申请实施例所述的模型管理功能模块MMF)划分模型的训练集、测试集。
一种可能的实现方式中,第三网元还可以向第二网元通知划分的结果。具体地,第三网元可以向第二网元通知训练集的范围或测试集的范围。其中,训练集的范围可以是训练集中数据的key值的取值范围,或者是训练集中数据分布的时间的范围,或者是训练集中数据分布的网络区域的范围。测试集的范围可以是测试集中数据的key值的取值范围,或者是测试集中数据分布的时间的范围,或者是测试集中数据分布的网络区域的范围。
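In this approach, the second network element can request the training set from the first network element by using the range of the training set, or request the test set by using the range of the test set. For example, the second information sent by the second network element to the first network element indicates the range of the second data set, where the second data set is the training set of the first model or the test set of the first model.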
其中,所述第二数据集的范围包括以下一项或多项:所述第二数据集中数据的key值的范围、所述第二数据集中数据分布的时间的范围、所述第二数据集中数据分布的网络区域的范围。
另一种可能的实现方式中,第三网元划分模型的训练集、测试集后,还可以向第一网元通知对第一数据集划分的结果。
第一网元从第三网元接收划分结果后,可以记录第一数据集对应的一个或多个数据类型、划分结果之间的对应关系。例如,记录第一数据集对应的一个或多个数据类型、训练集的范围、测试集的范围之间的对应关系。其中,第一数据集对应的一个或多个数据类型即第一模型对应的一个或多个数据类型。
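In this approach, the second network element can request the training set or the test set of the model from the first network element by using the data set type.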
示例的,第二网元发送的第二信息用于指示所述第二数据集的类型。在这种实现方式中,第二数据集的类型包括训练集(train)或者测试集(test),所述训练集用于训练所述第一模型,所述测试集用于测试所述第一模型。
需要说明的是,第二网元发送的数据请求消息还可以包括数据类型(data Type),所述数据类型为所述第一模型对应的一个或多个数据类型,所述第一模型对应的一个或多个数据类型用于表征训练、评估所述第一模型所需数据的类型。
503、所述第一网元从第二网元接收第一信息与第二信息,根据所述第一信息和第二信息向所述第二网元发送所述第二数据集,所述第二数据集为所述第一数据集的子集。
一种可能的实现方式中,由第一网元划分模型的训练集和/或模型的测试集,第二网元可以通过数据集类型(例如,训练集或测试集)向第一网元请求模型的训练集或测试集。具体地,第一网元从第二网元接收第一信息和第二信息后,根据第一信息确定第一数据集,还可以根据第二信息指示的数据集类型从第一数据集中确定第二数据集。其中,第一信息可以是第一模型的标识,第一网元可以确定第一模型的标识关联(对应)的数据类型,即用于训练、评估第一模型的数据的数据类型。第一网元还可以根据确定的数据类型确定相应的数据,这些数据构成第一数据集。
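For example, the first network element maintains the correspondence between the first data set and the identifier of the first model. In step 503, after receiving the first information, the first network element determines the identifier of the first model according to the first information and then determines the corresponding first data set according to that identifier. Further, if the second information indicates that the second network element is requesting the training set of the first model, for example the value of the second information is "train", the first network element partitions a subset from the first data set as the training set of the first model and sends that subset to the second network element.
If the second information indicates that the second network element is requesting the test set of the first model, for example the value of the second information is "test", the first network element partitions a subset from the first data set as the test set of the first model and sends that subset to the second network element.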
需要说明的是,第一网元可以根据数据划分策略从第一数据集中划分第一模型的训练集和/或第一模型的测试集。
其中,所述数据划分策略为以下任意一项:根据数据分布的时间划分、根据数据分布的网络区域进行划分或根据指定比例划分。
具体实现中,所述第一网元从第三网元接收所述数据划分策略;或者,所述第一网元确定所述数据划分策略。
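Two of the data partition strategies named above, partitioning by time and partitioning by network area, can be sketched as follows. The record layout (`day`, `cell`), the function names, and the year 2020 are assumptions made only for illustration; the source does not define a record format.

```python
from datetime import date


def split_by_time(records, cutoff):
    """Records dated on or before `cutoff` train; later records test."""
    train = [r for r in records if r["day"] <= cutoff]
    test = [r for r in records if r["day"] > cutoff]
    return train, test


def split_by_area(records, test_cells):
    """Records collected in `test_cells` test; all other records train."""
    train = [r for r in records if r["cell"] not in test_cells]
    test = [r for r in records if r["cell"] in test_cells]
    return train, test


records = [
    {"key": 1, "day": date(2020, 4, 1), "cell": "cell 1"},
    {"key": 2, "day": date(2020, 5, 15), "cell": "cell 2"},
    {"key": 3, "day": date(2020, 6, 29), "cell": "cell 1"},
]
time_train, time_test = split_by_time(records, date(2020, 6, 27))
area_train, area_test = split_by_area(records, {"cell 2"})
```

Both functions send each record to exactly one side, so the resulting training set and test set cannot intersect regardless of which strategy is configured.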
另一种可能的实现方式中,由第三网元划分模型的训练集和/或模型的测试集。第二网元可以从第三网元接收划分结果,例如,第一模型的训练集的范围或第一模型的测试集的范围。第二网元还可以根据训练集的范围向第一网元请求第一模型的训练集,还可以根据测试集的范围向第一网元请求第一模型的测试集。
具体地,第一网元接收第二网元发送的第一信息和第二信息后,根据第一信息确定第一数据集,还可以根据第二信息指示的范围从第一数据集中确定第二数据集。其中,第一信息可以是第一模型的标识,第二信息可以是测试集的范围或训练集的范围。
示例的,第一网元维护了第一数据集和第一模型的标识之间的对应关系,步骤503第一网元接收第一信息后,根据第一信息确定第一模型的标识,还可以根据第一模型的标识确定与其对应的第一数据集。第一网元还可以根据第二信息指示的范围从第一数据集中划分出一个子集,作为第二数据集。
示例的,第三网元划分的训练集的key值范围是(x~y),测试集的key值范围是(w~z)。当第二网元向第一网元请求第一模型的训练集,第一信息可以是第一模型的标识,第二信息可以是训练集的范围“(x~y)”。第一网元可以将第一数据集中key值范围是(x~y)的数据作为第二数据集,即第二网元所请求的训练集。
当第二网元向第一网元请求第一模型的测试集,第一信息可以是第一模型的标识,第二信息可以是测试集的范围“(w~z)”。第一网元可以将第一数据集中key值范围是(w~z)的数据作为第二数据集,即第二网元所请求的测试集。
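A range-based request of this kind can be sketched as follows. The store `FIRST_DATA_SETS`, the function names, and the concrete ranges standing in for (x~y) and (w~z) are hypothetical; the sketch assumes inclusive key ranges.

```python
# Hypothetical store on the first network element: model ID -> first data set.
FIRST_DATA_SETS = {
    "model-1": [{"key": k, "rsrp": -100 + k} for k in range(10)],
}


def ranges_overlap(a, b):
    """True if two inclusive (low, high) ranges share at least one key."""
    return a[0] <= b[1] and b[0] <= a[1]


def handle_range_request(model_id, key_range):
    """Return the records of the model's first data set whose key is in range."""
    low, high = key_range
    first_data_set = FIRST_DATA_SETS[model_id]
    return [r for r in first_data_set if low <= r["key"] <= high]


train_range, test_range = (0, 7), (8, 9)  # stand-ins for (x~y) and (w~z)
assert not ranges_overlap(train_range, test_range)
training_set = handle_range_request("model-1", train_range)
test_set = handle_range_request("model-1", test_range)
```

In this variant the first network element does not need to know whether a request is for training or testing; disjointness follows from the third network element choosing non-overlapping ranges.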
需要说明的是,当第二网元向第一网元请求数据时,第二网元还可以向第一网元发送所述第一模型对应的一个或多个数据类型。例如,第二网元发送的数据请求消息中除了第一信息、第二信息外,还包括所述第一模型对应的一个或多个数据类型。当步骤501中,第一网元维护所述第一模型对应的一个或多个数据类型、所述第一数据集的对应关系,第一网元接收所述数据请求消息后还可以根据所述第一模型对应的一个或多个数据类型索引到第一数据集。
可选的,在第三网元划分训练集、测试集,向第二网元通知划分的结果的实现方式中,第二网元通过数据集的范围和数据类型向第一网元请求训练集或测试集。
例如,第二网元向第一网元发送第一模型对应的一个或多个数据类型以及训练集的范围,第一网元根据所述第一模型对应的一个或多个数据类型、所述第一数据集的对应关系,确定第一数据集。根据训练集的范围从所述第一数据集中划分出第一模型的训练集。
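Alternatively, the second network element sends one or more data types corresponding to the first model and the range of the test set to the first network element. The first network element determines the first data set according to the correspondence between the one or more data types corresponding to the first model and the first data set, and partitions the test set of the first model from the first data set according to the range of the test set.
Optionally, the method shown in FIG. 5 further includes: the second network element subscribes to data from the first network element. Specifically, the second network element sends third information to the first network element, where the third information characterizes the data requirement of the second network element.
Specifically, the third information includes one or more of the following: one or more data types corresponding to the first model, and collection objects of the data required by the first model, where the collection objects include at least one of the following: one or more user equipments (UEs), and one or more cells.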
可选的,在第三网元划分训练集、测试集的场景中,第一网元还可以向第三网元上报第一模型所需数据的范围,以便第三网元根据数据的范围划分训练集的范围和/或测试集的范围。其中,所述第一模型所需数据可以是所述第一网元根据所述第一模型对应的一个或多个数据类型从接入网设备订阅的数据。
示例的,图5所示的方法还包括:所述第一网元向第三网元发送与所述第一模型对应的一个或多个数据类型、与所述一个或多个数据类型对应的所述第一数据集的范围,所述第一数据集的范围包括以下一项或多项:所述第一数据集中数据的key值的范围、所述第一数据集中数据分布的时间的范围、所述第一数据集中数据分布的网络区域的范围。
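When there are multiple network elements responsible for data management, the master management network element can merge and deduplicate the data collected by the other network elements and partition the training set and/or the test set from the processed data set. The master management network element can also send the partitioned training set or test set to the other network elements, so that the model training function module can obtain the training set from a nearby network element and the model evaluation function module can obtain the test set from a nearby network element, which shortens the transmission delay of the model data. For example, the network elements responsible for data management are the first network element and the fifth network element described in the embodiments of the present application. Assume that the first network element is responsible for merging and deduplicating the data, that the first network element is deployed close to the model training function module, and that the fifth network element is deployed close to the model evaluation function module. After the first network element partitions the training set and the test set, the model training function module can request the training set from the first network element. The first network element can also send the test set or information about the test set to the fifth network element; the fifth network element can receive the test set or the information about the test set from the first network element, and the model evaluation function module can request the test set from the fifth network element.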
或者,第一网元负责对数据进行合并、去重处理,且第五网元靠近模型训练功能模块部署,第一网元靠近模型评估功能模块部署。第一网元划分训练集和测试集后,模型评估功能模块可以向第一网元请求测试集。第一网元还可以向第五网元发送训练集或训练集的信息,第五网元可以从第一网元接收训练集或训练集的信息,模型训练功能模块可以向第五网元请求训练集。
参考图6a,第一数据集可以划分成训练集、测试集两部分,即训练集、测试集的数据不存在交集,且训练集、测试集的数据总和构成第一数据集。或者,参考图6b,测试集、训练集均为第一数据集的子集,且训练集、测试集的数据总和小于第一数据集。
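The two layouts in FIG. 6a and FIG. 6b can be told apart with plain set operations. The sketch below is illustrative only; it assumes records are identified by hashable keys, and the function name `classify_split` is hypothetical.

```python
def classify_split(first, train, test):
    """Return "6a" if train and test exactly cover the first data set,
    "6b" if they are disjoint subsets that leave some data unused."""
    first, train, test = set(first), set(train), set(test)
    if not (train <= first and test <= first):
        raise ValueError("training and test sets must be subsets of the first data set")
    if train & test:
        raise ValueError("training and test sets must not intersect")
    return "6a" if train | test == first else "6b"


first_data_set = range(10)
layout_a = classify_split(first_data_set, range(0, 8), range(8, 10))
layout_b = classify_split(first_data_set, range(0, 6), range(8, 10))
```

In both layouts the invariant that matters for evaluation accuracy is the empty intersection; whether the two sets cover the whole first data set is a deployment choice.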
以下以第一网元为DMF,第二网元为MTF或MEF,第三网元为MMF为例,详细介绍本申请实施例提供的模型数据传输方法。
图7所示的方法中,MTF、MEF可以在数据请求消息中增加两个参数:模型标识和数据集类型,使得DMF可以根据这两个参数区分所请求的是测试数据还是训练数据,DMF向MTF、MEF返回的训练集和测试集不存在交集,从而可以提高模型评估的准确性。如图7所示,所述方法包括以下步骤:
701、数据订阅和收集流程。
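Specifically, the MTF and the MEF perform model training and model evaluation respectively, and can each subscribe from the DMF to the data required for training the model and the data required for evaluating the model. For example, the MTF can send a data requirement to the DMF, where the data requirement indicates one or more data types (data Type) corresponding to the model to be trained (hereinafter the first model) and the data collection objects, and the collection objects include at least one of the following: one or more user equipments (UEs), and one or more cells. The MEF sends to the DMF one or more data types (data Type) for evaluating the model and the collection objects for those data types, where the collection objects include at least one of the following: one or more UEs, and one or more cells.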
DMF可以从MTF、MEF接收训练模型、评估模型所需要数据的数据类型(datatype)以及对应数据类型的采集对象。
DMF还可以根据数据类型(data type)以及数据的采集对象向RAN请求数据,完成数据收集。DMF从RAN接收数据后,还可以以数据集的形式记录从RAN收集到的数据。例如,MTF为模型1发起数据订阅,在数据订阅过程中向DMF指示模型1对应的数据类型“RSRP”以及数据采集对象“cell 1”。DMF根据“RSRP”、“cell 1”向RAN发起数据收集,DMF从RAN接收收集到的数据。
702、启动模型训练流程。
具体实现中,MTF接收启动模型训练的消息之后启动模型训练。例如,MMF向MTF发送指示消息触发MTF进行模型训练,该指示消息包括第一模型的模型标识,例如model ID。
703、MTF向DMF发送数据请求消息1,数据请求消息1包括模型标识、数据类型以及数据集类型。
其中,数据请求消息1中的模型标识用于指示MTF所训练的模型,例如,前文所述的第一模型,模型标识可以是model ID;数据请求消息1中的数据类型用于指示第一模型对应的一个或多个数据类型,数据类型可以是“data Type”;数据集类型用于指示MTF所请求的数据集,数据集类型可以是“dataset Type”。例如,数据请求消息1中的数据集类型“dataset Type”的值可以是“train”。
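Optionally, data request message 1 may also include data subset information indicating details of the data subset requested by the MTF, for example the size of the data subset. The size of the data subset characterizes the amount of data in the subset; for example, a size of 1000 means that the data subset contains 1000 records.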
704、DMF根据数据类型确定数据集1,将该数据集1划分为训练集和测试集。
具体实现中,DMF根据数据类型确定匹配的数据集,以下将该数据集称为数据集1。
例如,步骤701中DMF从RAN接收数据,记为数据集1,并记录数据集1与数据类型“RSRP”的对应关系。在步骤704中DMF可以根据数据类型“RSRP”索引到数据集1。
或者,步骤701执行后,DMF持续从RAN收集数据,直至步骤703接收到MTF的数据请求消息,在此期间DMF根据数据类型“RSRP”收集到的数据构成数据集1,DMF还可以记录数据集1与数据类型“RSRP”的对应关系。在步骤704中DMF可以根据数据类型“RSRP”索引到数据集1。
或者,步骤703中MTF发送的数据请求消息可以指示一个时间范围,DMF根据该时间范围内收集到的数据构成数据集1,DMF还可以记录数据集1与数据类型“RSRP”的对应关系。在步骤704中DMF可以根据数据类型“RSRP”索引到数据集1。
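In addition, the DMF can determine the data set partition method by itself, or the data set partition method can be preconfigured. The partition method can be random partitioning, partitioning by the time at which the data is distributed, and so on. For example, if three months of data (April 1 to June 30) are matched according to the data Type, the DMF can assign the data of the first 88 days (April 1 to June 27) to the training set for training the model, and the data of the last 3 days (June 28 to June 30) to the test set for model evaluation.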
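The date arithmetic behind this example can be checked with a short sketch. The year 2020 is an assumption (the source gives only month and day), and the variable names are illustrative.

```python
from datetime import date, timedelta

start, end = date(2020, 4, 1), date(2020, 6, 30)  # year is an assumption
test_day_count = 3

# Last training day: three days before the end of the window (June 27).
split_day = end - timedelta(days=test_day_count)

all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
train_days = [d for d in all_days if d <= split_day]
test_days = [d for d in all_days if d > split_day]

print(len(all_days), len(train_days), len(test_days))  # 91 88 3
```

Counting both endpoints, April 1 through June 30 is 91 days, of which April 1 through June 27 is 88 days and June 28 through June 30 is 3 days.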
此外,DMF划分好训练集和测试集之后,还可以根据模型标识(modelID)管理划分后的数据集。例如,用标签“modelID:train”来标记训练集数据,用标签“modelID:test”来标记测试集数据。
需要说明的是,如果DMF没有查询到MTF所请求的数据,则返回NACK消息,包含错误原因。
705、DMF向MTF返回训练集。
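In a specific implementation, because the value of the data set type (dataset Type) in data request message 1 is "train", the DMF returns the training set to the MTF.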
需要说明的是,如果DMF中没有查找到MTF所请求的训练集,则向MTF返回否定应答(negative acknowledgement,NACK)消息,NACK消息可以包含错误原因。
706、MTF利用训练集进行模型训练。
707、MEF启动评估流程。
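In a specific implementation, the MEF starts model evaluation after receiving a message that starts model evaluation. For example, the MMF sends an indication message to the MEF to trigger the MEF to perform model evaluation, where the indication message includes the model identifier, for example the model ID.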
708、MEF向DMF发送数据请求消息2,数据请求消息2包括模型标识以及数据集类型。
其中,模型标识用于指示MEF所评估的模型,模型标识可以是model ID;数据集类型用于指示MEF所请求的数据集,数据集类型可以是“dataset Type”。例如,数据请求消息2中的数据集类型“dataset Type”的值可以是“test”。
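It should be noted that after partitioning the training set and the test set in step 704, the DMF can label the training set and the test set with the model identifier and the data set type. Therefore, when the MEF requests the test set, data request message 2 need not carry the data type; carrying the model identifier and the data set type is sufficient.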
709、DMF根据数据请求消息2中的模型标识以及数据集类型确定测试集。
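In a specific implementation, the DMF can find the test set requested by the MEF according to the values of the model ID and the dataset Type. For example, the value of dataset Type in data request message 2 is "test", and the model ID identifies the first model, whose data type is "RSRP". The DMF can then index to the test set partitioned from data set 1.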
需要说明的是,如果DMF中没有查找到MEF所请求的测试集,则向MEF返回NACK消息,NACK消息可以包含错误原因。
710、DMF向MEF发送测试集。
711、MEF利用测试集进行模型评估。
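In the method shown in FIG. 7, a data request uses the model identifier to indicate the model that will use the data and the data set type to indicate whether the requested data is the training set or the test set. After receiving a data request message, the DMF can tell from the data set type whether the training set or the test set is requested, return the training set to the MTF, and return the test set to the MEF. This ensures that the delivered training set and test set do not intersect and that the model evaluation result is accurate.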
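The DMF-side handling in FIG. 7 can be sketched as a small request handler. This is a simplified illustration, not the claimed implementation: the class name `Dmf`, the dictionary-shaped replies, and the fixed 4:1 split in stored order are all assumptions; only the parameters model ID, dataset Type, data Type, the "modelID:train" style labels, and the NACK reply come from the description.

```python
class Dmf:
    """Toy data management function for the FIG. 7 flow."""

    def __init__(self, collected):
        self._collected = collected  # data type -> ordered list of records
        self._labelled = {}          # "modelID:train" / "modelID:test" -> records

    def handle_request(self, model_id, dataset_type, data_type=None):
        if dataset_type not in ("train", "test"):
            return {"status": "NACK", "cause": "unknown dataset type"}
        label = f"{model_id}:{dataset_type}"
        if label not in self._labelled:
            # First request for this model: find the data set by data type
            # and partition it once (step 704).
            records = self._collected.get(data_type, [])
            if not records:
                return {"status": "NACK", "cause": "no matching data"}
            cut = len(records) * 4 // 5  # assumed 4:1 split
            self._labelled[f"{model_id}:train"] = records[:cut]
            self._labelled[f"{model_id}:test"] = records[cut:]
        return {"status": "ACK", "data": self._labelled[label]}


dmf = Dmf({"RSRP": list(range(10))})
train_reply = dmf.handle_request("model-1", "train", data_type="RSRP")  # message 1
test_reply = dmf.handle_request("model-1", "test")                      # message 2
unknown_reply = dmf.handle_request("model-2", "test")
```

Because both labels are written in the same partitioning step, the later test-set request needs only the model ID and the data set type, which matches the reduced content of data request message 2.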
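In the method shown in FIG. 8, the MMF can determine a data partition strategy and send the determined strategy to the DMF, and the DMF partitions the training set and the test set according to the strategy delivered by the MMF. As shown in FIG. 8, the method includes the following steps: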
801、数据订阅和收集流程。
具体实现参考前文步骤701的相关描述,在此不做赘述。
802、MMF向DMF发送数据信息查询请求。
具体地,MMF根据模型对应的数据类型(data Type)向DMF查询数据信息,数据信息查询请求包括MMF查询的一个或多个数据类型以及需要查询的数据信息。其中,查询的数据信息可以是以下一项或多项:与数据类型对应的数据集的数据量、与所述一个或多个数据类型对应的数据集的数据分布范围、与所述一个或多个数据类型对应的数据集中数据的key值的范围。其中,数据分布范围可以是数据分布的时间段或数据分布的网络区域等。
803、DMF向MMF发送数据信息。
804、MMF确定数据划分策略(split Policy),将数据划分策略发送给DMF。
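It should be noted that steps 802 and 803 are optional; when they are performed, the MMF can determine the data partition strategy according to the data information.
When steps 802 and 803 are not performed, the MMF can also determine the data partition strategy without relying on the data information. In that case, the data partition strategy can be a commonly used one, such as random partitioning or partitioning by a specified ratio.
The split Policy can include a split method (split Method) and a split ratio (split Ratio). The split Method can be random partitioning, and the split Ratio indicates the ratio of the amount of data in the training set to that in the test set, for example 4:1: after the data required by the model is determined, 80% of it is assigned to the training set and the remaining 20% to the test set.
The split Policy can also be a fixed partition by time, that is, the first x% of the data in time order is taken as the training set and the rest as the test set, where x is a value determined from the split Ratio.
In addition to the partition strategy, the message sent in step 804 also contains the model identifier and/or the data type, and the DMF maintains the correspondence between the model identifier or data type and the partition strategy.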
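A split Policy with a split Method and a split Ratio can be sketched as follows. The field names, the method strings "random" and "time", the `time` field of a record, and the fixed random seed are assumptions for illustration; the source specifies only that the policy carries a method and a ratio such as 4:1.

```python
import random
from dataclasses import dataclass


@dataclass
class SplitPolicy:
    split_method: str   # assumed values: "random" or "time"
    split_ratio: tuple  # (train, test), for example (4, 1)


def apply_policy(records, policy, seed=0):
    """Partition records into (training set, test set) according to the policy."""
    train_part, test_part = policy.split_ratio
    cut = len(records) * train_part // (train_part + test_part)
    ordered = list(records)
    if policy.split_method == "random":
        random.Random(seed).shuffle(ordered)    # reproducible shuffle
    elif policy.split_method == "time":
        ordered.sort(key=lambda r: r["time"])   # earliest records train
    else:
        raise ValueError(f"unsupported split method: {policy.split_method}")
    return ordered[:cut], ordered[cut:]


records = [{"key": k, "time": 9 - k} for k in range(10)]
time_train, time_test = apply_policy(records, SplitPolicy("time", (4, 1)))
rand_train, rand_test = apply_policy(records, SplitPolicy("random", (4, 1)))
```

With a 4:1 ratio and ten records, the cut falls after the eighth record, so x is 80 in the time-ordered variant.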
805、MMF给MTF发送消息触发MTF进行模型训练。
其中,MMF发送的消息包括模型标识。
806、MTF向DMF发送数据请求消息1,数据请求消息1包括模型标识、数据类型以及数据集类型。
其中,模型标识用于指示MTF所训练的模型,模型标识可以是model ID;数据类型用于指示MTF训练模型所需数据的类型,数据类型可以是“data Type”;数据集类型用于指示MTF所请求的数据集,数据集类型可以是“dataset Type”。例如,数据请求消息1中的数据集类型“dataset Type”的值可以是“train”。
807、DMF根据MMF发送的数据划分策略将数据集划分为训练集和测试集。
具体实现中,DMF首先确定与数据类型对应的数据,根据这些数据构建数据集。然后,根据804中模型标识或数据类型与划分策略的对应关系索引到数据集的划分策略,按照指定的策略完成划分。
步骤808~步骤814同前文所述的步骤705~步骤711,在此不做赘述。
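It should be noted that the method shown in FIG. 8 applies both to scenarios in which the model management function and the model training function are deployed separately and to scenarios in which they are co-deployed. Optionally, the DMF partitions the data set as soon as it receives the partition strategy, that is, step 807 is performed after step 804 and before step 806, which reduces the waiting time after step 806.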
图8所示方法中,MMF可以根据专家经验、数据信息、场景特性等确定更合理的数据划分策略,并将划分策略发送给DMF,为DMF划分数据集提供依据。保证测试集与训练集不存在交集的同时,通过合理的划分数据集,进一步改善模型训练和评估效果。
图9所示的方法中,由MMF直接划分训练集和测试集,将划分结果告知MTF和MEF,MTF和MEF各自向DMF请求数据。如图9所示,所述方法包括以下步骤:
901、数据订阅和收集流程。
具体实现参考前文步骤701的相关描述,在此不做赘述。
902、MMF向DMF发送数据信息查询请求。
其中,数据信息查询请求包括数据类型列表(data Type List)和数据信息。数据类型列表包括模型(例如,本申请实施例所述的第一模型)的一个或多个数据类型。数据信息是MMF期望查询的数据信息,例如,与所述一个或多个数据类型对应的数据集的数据量、与所述一个或多个数据类型对应的数据集的数据分布范围、与所述一个或多个数据类型对应的数据集中数据的key的范围。其中,数据分布范围可以是数据分布的时间段或数据分布的网络区域等。
903、DMF向MMF返回指定的数据信息。
需要说明的是,如果DMF没有MMF所查询data Type对应的数据。则向MMF返回NACK消息,包含错误原因。
904、MMF确定训练集的范围以及测试集的范围。
具体实现中,根据获取的数据信息划分训练集和测试集。
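If the MMF has obtained the distribution range of the data, the MMF can divide that range into two parts, corresponding to the training set and the test set respectively. For example, if the MMF finds that the DMF has so far collected three months of data (April 1 to June 30), the data partition determined by the MMF can be: the data of the first 88 days (April 1 to June 27) is used for model training, and the data of the last 3 days (June 28 to June 30) is used for model testing.
Alternatively, if the MMF has obtained the keys of the data, the MMF can divide the keys into two parts, corresponding to the training set and the test set respectively. For example, if the MMF finds that the key values of the data currently collected by the DMF are 100 to 1000, the data partition determined by the MMF can be: data with key values in the range 100 to 900 is used for model training, and data with key values in the range 901 to 1000 is used for model evaluation.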
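The key-range partition in this example can be sketched as a function on the MMF side. The function name `split_key_range` and the choice to pass the last training key explicitly are assumptions; the sketch treats both ranges as inclusive.

```python
def split_key_range(low, high, last_train_key):
    """Split the inclusive key range [low, high] into two adjacent,
    non-overlapping ranges: training first, then test."""
    if not low <= last_train_key < high:
        raise ValueError("last_train_key must leave both ranges non-empty")
    train_range = (low, last_train_key)
    test_range = (last_train_key + 1, high)
    return train_range, test_range


# Keys 100..1000 reported by the DMF; keys up to 900 are used for training.
train_range, test_range = split_key_range(100, 1000, 900)
print(train_range, test_range)  # (100, 900) (901, 1000)
```

Starting the test range at the key immediately after the last training key is what guarantees that no record can fall into both ranges.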
此外,MMF可以根据场景、数据特性或经验进行划分,本申请实施例对此不做限制。
905、MMF向MTF发送消息触发模型训练,该消息包括模型标识以及训练集的范围。
示例的,MMF向MTF发送的消息包括模型的标识以及该模型的训练数据的范围,例如,“4月1日-6月27日”,即“4月1日-6月27日”的数据用于训练模型。
906、MTF向DMF发送数据请求消息,该消息包括数据类型和训练集的范围。
需要说明的是,MTF可以维护数据类型data type和模型标识model ID之间的对应关系,MTF从MMF接收模型标识以及训练集的范围后,可以确定与模型标识对应的数据类型,从而可以确定数据类型、模型标识以及训练集的范围之间的对应关系。
MTF还可以根据数据类型和训练集的范围向DMF请求训练数据。
907、DMF向MTF返回训练集。
具体实现中,DMF确定与MTF所发送的数据类型对应的数据,再根据训练集的范围从这些数据中划分出训练集。具体地,DMF可以根据数据类型确定匹配的数据集,以下将该数据集称为数据集1。
例如,步骤901中DMF从RAN接收数据,记为数据集1,并记录数据集1与数据类型“RSRP”的对应关系。在步骤907中DMF可以根据数据类型“RSRP”索引到数据集1。
或者,步骤901执行后,DMF持续从RAN收集数据,直至步骤906接收到MTF的数据请求消息,在此期间DMF根据数据类型“RSRP”收集到的数据构成数据集1,DMF还可以记录数据集1与数据类型“RSRP”的对应关系。在步骤907中DMF可以根据数据类型“RSRP”索引到数据集1。
或者,步骤906中MTF发送的数据请求消息可以指示一个时间范围,DMF根据该时间范围内收集到的数据构成数据集1,DMF还可以记录数据集1与数据类型“RSRP”的对应关系。在步骤907中DMF可以根据数据类型“RSRP”索引到数据集1。
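For example, the data type sent by the MTF is "RSRP", and the DMF determines the RSRP data of "April 1 to June 30" according to the data type "RSRP". Because the range of the training set is "April 1 to June 27", the DMF sends the RSRP data of "April 1 to June 27" to the MTF as the training set.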
908、MTF利用训练集进行模型训练。
909、MMF给MEF发送消息触发评估,消息内容包括模型标识以及测试集的范围。
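For example, the message sent by the MMF to the MEF includes the identifier of the model and the range of the test data of that model, for example "June 28 to June 30", meaning that the data of "June 28 to June 30" is used to evaluate the model.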
910、MEF向DMF发送数据请求消息,该消息包括数据类型和测试集的范围。
需要说明的是,MEF可以维护数据类型data type和模型标识model ID之间的对应关系,MEF从MMF接收模型标识以及测试集的范围后,可以确定与模型标识对应的数据类型,从而可以确定数据类型、模型标识以及测试集的范围之间的对应关系。
MEF还可以根据数据类型和测试集的范围向DMF请求测试数据。
911、DMF向MEF返回测试集。
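In a specific implementation, the DMF determines the data corresponding to the data type sent by the MEF and then partitions the test set from that data according to the range of the test set. Specifically, the DMF determines data set 1 according to the data type and then partitions the test set from that data set according to the range of the test set.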
需要说明的是,步骤911根据MEF所发送的数据类型查找到的数据集、步骤907根据MTF所发送数据类型查找到的数据集相同,例如,本申请实施例所述的数据集1。
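For example, the data type sent by the MEF is "RSRP", and the DMF determines the RSRP data according to the data type "RSRP". Because the range of the test set is "June 28 to June 30", the DMF sends the RSRP data of "June 28 to June 30" to the MEF as the test set.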
912、MEF利用测试集进行模型评估。
图9所示的方法中,MMF向DMF查询数据信息后划分训练集和测试集,将划分结果告知MTF和MEF,MTF和MEF各自向DMF请求数据。MMF进行数据集划分时可以保证训练集和测试集之间不存在交集,因此确保了模型评估结果的准确性。
此外,图9所示的方法中,DMF无需区分训练集和测试集,简化了DMF内部操作,且数据请求消息中无需增加参数。
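In the method shown in Figure 10, the data type is added to the message that delivers the data split policy. The MTF and the MEF then do not need to transmit the data type when they subsequently request data, because the DMF can look up the data type corresponding to the model by the model identifier. As shown in Figure 10, the method includes the following steps:
1001. Data subscription and collection procedure.
For the specific implementation, refer to the description of step 701 above. Details are not repeated here.
1002. The MMF sends a data information query request to the DMF.
Specifically, the MMF may initiate a data query request for a particular model (for example, the first model described in the embodiments of this application), so as to trigger the subsequent procedure that determines the training set and/or the test set of the model.
The MMF may query the DMF for data information according to the data type (data Type) corresponding to the model. The data information query request includes the one or more data types queried by the MMF and the data information to be queried. The queried data information may be one or more of the following: the data volume of the data set corresponding to the one or more data types, the data distribution range of that data set, and the range of the keys of the data in that data set. The data distribution range may be the time period over which the data is distributed, the network area over which the data is distributed, or the like.
1003. The DMF sends the data information to the MMF.
It should be noted that steps 1002 and 1003 are optional. The MMF may also determine the data split policy directly, without querying the data information.
1004. The MMF sends a model configuration message to the DMF. The message includes the model identifier (model ID), the data type (data Type), and the data split policy (split Policy).
Specifically, the data type in the model configuration message is the one or more data types corresponding to the model. The data split policy in the model configuration message corresponds to the model identifier in that message and is used to determine the training set and/or the test set of the model indicated by the model identifier.
1005. The MMF sends a message to the MTF to trigger model training. The message includes the model identifier.
The model identifier in this message instructs the MTF to trigger the training procedure of the model indicated by the model identifier.
1006. The MTF sends data request message 1 to the DMF. The message includes the model identifier (model ID) and the dataset type "dataset Type=train".
It should be noted that the DMF can determine the data type corresponding to the model identifier from the model configuration message of step 1004. In step 1006, the MTF therefore does not need to send the data type corresponding to the model. It only needs to indicate the model identifier and the dataset type, and the DMF can look up the data type corresponding to the model by the model identifier.
Steps 1007 to 1014 are the same as steps 704 to 711 described above. For the specific implementation, refer to the foregoing description. Details are not repeated here.
In the method shown in Figure 10, the data type corresponding to the model is sent to the DMF in advance, so the MTF or the MEF does not need to send it when requesting data, which reduces signaling overhead.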
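The interplay between the model configuration message (step 1004) and a later data request that carries only the model ID and the dataset type (step 1006) can be sketched as below. The class `Dmf`, its method names, the single data type per model, and the use of a fixed train ratio as the split policy are assumptions for illustration. The source allows one or more data types and several kinds of split policy.

```python
class Dmf:
    """Hypothetical DMF that remembers the per-model configuration."""

    def __init__(self, datasets):
        self.datasets = datasets   # data type -> ordered list of records
        self.model_config = {}     # model ID -> (data type, train ratio)

    def configure_model(self, model_id, data_type, train_ratio):
        # Step 1004: store data type and split policy under the model ID.
        self.model_config[model_id] = (data_type, train_ratio)

    def handle_data_request(self, model_id, dataset_type):
        # Step 1006 onward: the request carries no data type, so the DMF
        # looks it up by model ID.
        if model_id not in self.model_config:
            raise KeyError(f"model {model_id!r} has not been configured")
        data_type, train_ratio = self.model_config[model_id]
        records = self.datasets[data_type]
        cut = int(len(records) * train_ratio)
        if dataset_type == "train":
            return records[:cut]
        if dataset_type == "test":
            return records[cut:]
        raise ValueError(f"unknown dataset type {dataset_type!r}")


dmf = Dmf({"RSRP": list(range(10))})
dmf.configure_model("model-1", "RSRP", train_ratio=0.8)
train = dmf.handle_data_request("model-1", "train")
test = dmf.handle_data_request("model-1", "test")
```

Keeping the configuration keyed by model ID is what lets the request messages stay small: the data type travels once, in the configuration message, rather than in every request.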
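In the method shown in Figure 11, the MMF sends the data split policy (split Policy) and the data type (data Type) corresponding to the model to the DMF in separate messages. As shown in Figure 11, the method includes the following steps:
1101. Data subscription and collection procedure.
For the specific implementation, refer to the description of step 701 above. Details are not repeated here.
1102. The MMF sends the model identifier and the data type to the DMF.
In the embodiments of this application, the MMF may initiate step 1102 for a particular model (for example, the first model described in the embodiments of this application), so as to trigger the subsequent procedure that determines the data split policy corresponding to the model.
It should be noted that the model identifier sent by the MMF to the DMF in step 1102 indicates the model, and the data type sent by the MMF to the DMF is the one or more data types corresponding to the model.
Steps 1103 and 1104 are optional and are the same as steps 802 and 803 described above. Details are not repeated here.
1105. The MMF sends a model configuration message to the DMF. The message includes the model identifier (model ID) and the data split policy (split Policy).
1106. The MMF sends a message to the MTF to trigger training. The message includes the model identifier.
1107. The MTF sends a data request message to the DMF. The message includes the model identifier (model ID) and the dataset type (dataset Type="train").
Steps 1108 to 1115 are the same as steps 704 to 711 described above. For the specific implementation, refer to the foregoing description. Details are not repeated here.
In the method shown in Figure 11, the data type and the data split policy corresponding to the model are delivered separately. Step 1102 only needs to be performed before step 1107.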
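An embodiment of this application further provides a model data transmission method in which the MTF and the MEF may obtain data from different DMFs. The two DMFs may first synchronize their data and then split the data set. DMF1 may provide data to the MTF, and DMF2 may provide data to the MEF. As shown in Figure 12, the method includes the following steps:
1201. Data subscription and collection procedure.
In a specific implementation, the MTF and the MEF initiate data subscription requests to DMF1 and DMF2 respectively. Each request contains the data requirements to be subscribed to for the model (for example, the first model described in the embodiments of this application). DMF1 and DMF2 subscribe to data from the access network device according to the data requirements. For the specific implementation, refer to step 701 described above. Details are not repeated here.
1202. The MTF starts model training.
In a possible implementation, the MMF sends a model training trigger message to the MTF. The message includes the model identifier (model ID).
It should be noted that the model training trigger message triggers the MTF to start training the model (for example, the first model), and the model ID in the message indicates that model.
1203. The MTF sends data request message 1 to DMF1. The message includes the model identifier (model ID), the dataset type (dataset Type="train"), and the data type (data Type).
The value of dataset Type (that is, the second information described in the embodiments of this application) in data request message 1 sent by the MTF is "train", indicating that the MTF requests the training set.
1204. DMF1 sends the one or more data types corresponding to the model to DMF2.
It should be noted that DMF1 may also send DMF2 the other data requirements that the MTF specified in the data subscription procedure, for example, the collection objects of the data required by the model. The collection objects include at least one of the following: one or more user equipments (UE), or one or more cells (cell).
1205. DMF2 determines data according to the one or more data types and sends the determined data to DMF1.
1206. DMF1 receives the data from DMF2, then merges and deduplicates the data to obtain data set 1.
In a specific implementation, DMF1 has obtained part of the data from the access network device in the data subscription procedure according to the data requirements of the MTF. After receiving data from DMF2, DMF1 merges and deduplicates the two parts to obtain data set 1. Data set 1 is used to determine the training set and the test set of the model.
1207. DMF1 splits data set 1 into a training set and a test set.
It should be noted that DMF1 may also manage the resulting training set and test set by model ID, so that the training set and the test set split from data set 1 can later be found by model ID.
For example, the training set data and the test set data are labeled "model ID:train" and "model ID:test" respectively.
In addition, the data set splitting method may be determined by the DMF itself, for example, random splitting by a specified ratio or fixed splitting by time.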
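1208. DMF1 sends the training set to the MTF.
1209. DMF1 sends the test set to DMF2.
It should be noted that step 1209 only needs to be performed after step 1207 and before step 1212.
The test set sent by DMF1 is bound to the model ID described above. The label may be "model ID:test", indicating that the test set is the one corresponding to the model indicated by that model ID.
1210. The MEF starts model evaluation.
In a possible implementation, the MMF sends a message to the MEF to start model evaluation. Specifically, the message includes the model identifier.
It should be noted that step 1210 only needs to be performed after step 1208.
1211. The MEF sends data request message 2 to DMF2. The message includes the model identifier (model ID) and the dataset type (dataset Type).
It should be noted that the value of dataset Type (that is, the second information described in the embodiments of this application) in the data request message sent by the MEF is "test", indicating that the MEF requests the test set.
1212. DMF2 sends the test set to the MEF.
In a specific implementation, DMF2 looks up the test set according to the model ID and dataset Type="test" sent by the MEF in step 1211, and returns the test set to the MEF.
It should be noted that if DMF2 does not find the data requested by the MEF, it replies to the MEF with a NACK message that contains the error cause.
It can be understood that the DMF close to the MTF holds more data. In the method shown in Figure 12, DMF1 may therefore initiate data synchronization and complete the data merging and the data set splitting. In another possible implementation, DMF2 initiates data synchronization. The embodiments of this application further provide the following two data synchronization schemes:
Scheme 1: DMF1 sends DMF2 the data type requested by the MTF (for example, the one or more data types corresponding to the first model) and information about the data that DMF1 has already collected. The data already collected by DMF1 is the data that DMF1 has collected according to the data type requested by the MTF. The information about the data may be the range of the data, for example, the range of the key values of the data, the time range over which the data is distributed, or the range of the network area over which the data is distributed.
According to the data type and the information about the data already collected by DMF1, DMF2 determines which of its collected data is not present in DMF1 and sends that data to DMF1.
In scheme 1, no duplicate data needs to be transmitted between the DMFs, which saves transmission resources.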
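A minimal sketch of scheme 1, as seen from DMF2, follows. It assumes that the "information about collected data" is an inclusive key range and that DMF1 holds every key inside the range it reports. The function name `records_missing_at_peer` and the dictionary layout are hypothetical.

```python
def records_missing_at_peer(local_records, peer_key_range):
    """Return the local records whose key lies outside the peer's key range.

    local_records: dict mapping key -> value, as held by DMF2.
    peer_key_range: (low, high), the inclusive key range DMF1 reports
    as already collected. Assumes DMF1 holds every key in that range.
    """
    low, high = peer_key_range
    return {
        key: value
        for key, value in local_records.items()
        if not (low <= key <= high)
    }


# DMF1 reports keys 1..4 as collected, so DMF2 sends only keys 5 and 6.
dmf2_records = {3: "c", 4: "d", 5: "e", 6: "f"}
to_send = records_missing_at_peer(dmf2_records, (1, 4))
```

If DMF1's data had gaps inside the reported range, a range alone would not be enough to detect them, and a finer description (for example, several ranges) would be needed.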
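Scheme 2: The two DMFs may synchronize by exchanging data information only, without transmitting the data itself. For example, DMF1 sends DMF2 the data type requested by the MTF.
DMF2 determines the information about the data it has already collected according to the data type and sends that information to DMF1. DMF1 then merges and deduplicates the two parts of data information.
In scheme 2, no data needs to be transmitted, which saves transmission bandwidth.
The method shown in Figure 12 is applicable to scenarios with multiple DMFs. The DMFs synchronize their data before the training set and the test set are split, which ensures that the training set and the test set do not overlap and improves the accuracy of model evaluation.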
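Steps 1206 and 1207 (merge, deduplicate, split, and label by model ID) can be sketched as follows. The sketch assumes that records are identified by a key, that deduplication means keeping one record per key, and that the split is random by a specified ratio, which is one of the options the text mentions. The function name, the default ratio, and the fixed seed are illustrative assumptions.

```python
import random


def merge_and_split(own, received, model_id, train_ratio=0.8, seed=0):
    """Merge two keyed record sets, drop duplicate keys, and split by ratio.

    own, received: dicts mapping key -> value (DMF1's data and DMF2's data).
    Returns a dict labeled "<model ID>:train" and "<model ID>:test".
    """
    merged = {**received, **own}   # one record per key; DMF1's copy wins
    keys = sorted(merged)
    random.Random(seed).shuffle(keys)   # seeded so the split is repeatable
    cut = int(len(keys) * train_ratio)
    return {
        f"{model_id}:train": {key: merged[key] for key in keys[:cut]},
        f"{model_id}:test": {key: merged[key] for key in keys[cut:]},
    }


own = {1: "a", 2: "b", 3: "c"}
received = {3: "c", 4: "d", 5: "e"}   # key 3 is a duplicate
split = merge_and_split(own, received, "model-1")
train_keys = set(split["model-1:train"])
test_keys = set(split["model-1:test"])
```

Because every key ends up in exactly one of the two labeled subsets, the training set and the test set cannot overlap, which is the property the method relies on for accurate evaluation.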
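An embodiment of this application further provides a model data transmission method that differs from the method shown in Figure 12 in that the MMF may deliver the correspondence between model ID and data Type to DMF1 in advance. In step 1203, the MTF then does not need to send data Type when requesting the training set from DMF1, and sends only the model ID and the dataset type. DMF1 can look up the corresponding data Type by model ID, determine the data set corresponding to that data Type, and then select the training set from that data set according to the dataset type sent by the MTF.
An embodiment of this application further provides a model data transmission method that is applicable to scenarios with multiple DMFs and in which the MMF delivers the data split policy. As shown in Figure 13, the method includes the following steps:
1301. Data subscription and collection procedure.
For the specific implementation, refer to step 1201 described above. Details are not repeated here.
1302. The MMF determines the data split policy and sends the data split policy (split Policy) to DMF1.
In a specific implementation, the MMF sends a configuration message to DMF1. The configuration message includes split Policy and model ID, and split Policy applies to the splitting of the training set or the test set of the model indicated by model ID.
Specifically, the MMF may use expert experience, data information, the scenario, and so on to determine a reasonable data split policy, which improves model training and evaluation.
Optionally, before step 1302, the MMF may query the information about the data in DMF1 and in DMF2 separately, and may then also take the queried data information into account when determining split Policy.
For a detailed description of split Policy, refer to the description of step 804 above. Details are not repeated here.
Steps 1303 to 1313 are the same as steps 1202 to 1212 described above, except that DMF1 splits the data set according to the data split policy delivered by the MMF.
The data synchronization schemes proposed above also apply to the method shown in Figure 13. In the method shown in Figure 13, the MMF can determine a more reasonable data split policy, which can improve model training and evaluation in scenarios with multiple DMFs.
An embodiment of this application further provides a model data transmission method that differs from the method shown in Figure 13 in that the MMF may deliver the correspondence between model ID and data Type to DMF1 in advance. In step 1304, the MTF then does not need to send data Type when requesting the training set from DMF1, and sends only the model ID and the dataset type. DMF1 can look up the corresponding data Type by model ID, determine the data set corresponding to that data Type, and then select the training set from that data set according to the dataset type sent by the MTF.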
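An embodiment of this application further provides a model data transmission method that is applicable to scenarios with multiple DMFs and in which the DMFs are coordinated by exchanging data information only. As shown in Figure 14, the method includes the following steps:
1401. Data subscription and collection procedure.
For the specific implementation, refer to step 1201 described above. Details are not repeated here.
1402. The MMF sends a data information query request to DMF1. The request includes the one or more data types (data Type) corresponding to the model and the data information to be queried.
The data information to be queried may be the data volume or the range of the data. The range of the data may be the range of the key values of the data, the time range over which the data is distributed, or the range of the network area over which the data is distributed.
1403. DMF1 returns the data information to the MMF.
Specifically, DMF1 determines data set 1 according to the one or more data types sent by the MMF (the data already collected by DMF1 forms data set 1), and returns the information about data set 1 to the MMF, for example, the data volume and the range of the data collected for the one or more data types.
If DMF1 finds no data information corresponding to the one or more data types, it returns a NACK message that contains the error cause to the MMF.
1404. The MMF sends a data information query request to DMF2. The request includes the one or more data types (data Type) corresponding to the model and the data information to be queried.
The data information to be queried may be the data volume or the range of the data. The range of the data may be the range of the key values of the data, the time range over which the data is distributed, or the range of the network area over which the data is distributed.
1405. DMF2 returns the data information to the MMF.
Specifically, DMF2 determines data set 2 according to the one or more data types sent by the MMF (the data already collected by DMF2 forms data set 2), and returns the information about data set 2 to the MMF, for example, the data volume and the range of the data collected for the one or more data types.
If DMF2 finds no data information corresponding to the one or more data types, it returns a NACK message that contains the error cause to the MMF.
1406. The MMF splits the data set according to the specific scenario and the data information obtained from DMF1 and DMF2, and determines the range of the training set and the range of the test set.
In a specific implementation, the MMF integrates the data information obtained from DMF1 and DMF2, splits the range of the data, and determines the range of the training set and the range of the test set. For example, the query shows that DMF1 has collected the RSRP data of cell 1 from June 1 to July 31, and DMF2 has collected the RSRP data of cell 1 from August 1 to August 31. The integrated data covers the time range June 1 to August 31. The data of June 1 to August 20 is used to train the model, so the range of the training set is "June 1 to August 20". The data of August 21 to August 31 is used to evaluate the model, so the range of the test set is "August 21 to August 31".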
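1407. The MMF sends a model training trigger message to the MTF. The message includes the model identifier and the range of the training set.
The range of the training set may be the range of the key values of the data in the training set, the time range over which the data in the training set is distributed, or the range of the network area over which the data in the training set is distributed.
1408. The MTF sends data request message 1 to DMF1. The message includes the one or more data types corresponding to the model and the range of the training set.
1409. DMF1 sends the training set to the MTF.
In a specific implementation, DMF1 may determine, according to the one or more data types sent by the MTF, the data set formed by the data it has already collected, and then obtain the training set of the model from that data set according to the range of the training set sent by the MTF.
1410. The MTF performs model training using the training set.
1411. The MMF sends a model evaluation trigger message to the MEF. The message includes the model identifier and the range of the test set.
The range of the test set may be the range of the key values of the data in the test set, the time range over which the data in the test set is distributed, or the range of the network area over which the data in the test set is distributed.
1412. The MEF sends data request message 2 to DMF2. The message includes the one or more data types corresponding to the model and the range of the test set.
1413. DMF2 sends the test set to the MEF.
In a specific implementation, DMF2 may determine, according to the one or more data types sent by the MEF, the data set formed by the data it has already collected, and then obtain the test set of the model from that data set according to the range of the test set sent by the MEF.
1414. The MEF performs model evaluation using the test set.
The method shown in Figure 14 is applicable to scenarios with multiple DMFs. The DMFs do not need to synchronize data with each other, which simplifies the operation of the DMFs and also protects data privacy. In addition, when splitting the data set, the MMF can ensure that the training set and the test set do not overlap, which protects the accuracy of the model evaluation result.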
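The range integration and split performed by the MMF in step 1406 can be sketched with the dates from the example above. The helper names `merge_ranges` and `split_by_time`, the rule "reserve the last N days for testing", and the year 2020 are assumptions for illustration. The source states only the resulting ranges, not how the MMF chooses them.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_ranges(ranges):
    """Merge inclusive (start, end) date ranges that overlap or are adjacent."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + ONE_DAY:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_by_time(full_range, test_days):
    """Reserve the last test_days days for testing, the rest for training."""
    start, end = full_range
    test_start = end - timedelta(days=test_days - 1)
    if test_start <= start:
        raise ValueError("test range would leave no training data")
    return (start, test_start - ONE_DAY), (test_start, end)


dmf1_range = (date(2020, 6, 1), date(2020, 7, 31))   # reported by DMF1
dmf2_range = (date(2020, 8, 1), date(2020, 8, 31))   # reported by DMF2
(full_range,) = merge_ranges([dmf1_range, dmf2_range])
train_range, test_range = split_by_time(full_range, test_days=11)
```

With these inputs the training range is June 1 to August 20 and the test range is August 21 to August 31, matching the example. Merging overlapping ranges before splitting also covers the case where the two DMFs report overlapping data.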
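An embodiment of this application further provides a model data transmission method that differs from the method shown in Figure 14 in that the MMF may deliver the correspondence between model ID and data Type to DMF1 and DMF2 in advance. In step 1408, the MTF then does not need to send data Type when requesting the training set from DMF1, and sends only the model ID and the range of the training set. Similarly, in step 1412, the MEF does not need to send data Type when requesting the test set from DMF2, and sends only the model ID and the range of the test set. DMF1 can look up the corresponding data Type by model ID, determine the data set corresponding to that data Type, and then select the training set from that data set according to the range of the training set sent by the MTF. DMF2 can look up the corresponding data Type by model ID, determine the data set corresponding to that data Type, and then select the test set from that data set according to the range of the test set sent by the MEF.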
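This application further provides a model data transmission method that differs from the method shown in Figure 14 in that the MMF notifies DMF1 and DMF2 of the split result after determining the training set and the test set. The MTF and the MEF may request data from the DMFs by using the model identifier and the dataset type. As shown in Figure 15, the method includes the following steps:
1501. Data subscription and collection procedure.
For the specific implementation, refer to step 1201 described above. Details are not repeated here.
1502. The MMF sends a data information query request to DMF1. The request includes the one or more data types (data Type) corresponding to the model and the data information to be queried.
The data information to be queried may be the data volume or the range of the data. The range of the data may be the range of the key values of the data, the time range over which the data is distributed, or the range of the network area over which the data is distributed.
1503. DMF1 returns the data information to the MMF.
Specifically, DMF1 determines data set 1 according to the one or more data types sent by the MMF (the data already collected by DMF1 forms data set 1), and returns the information about data set 1 to the MMF, for example, the data volume and the range of the data collected for the one or more data types.
If DMF1 finds no data information corresponding to the one or more data types, it returns a NACK message that contains the error cause to the MMF.
1504. The MMF sends a data information query request to DMF2. The request includes the one or more data types (data Type) corresponding to the model and the data information to be queried.
The data information to be queried may be the data volume or the range of the data. The range of the data may be the range of the key values of the data, the time range over which the data is distributed, or the range of the network area over which the data is distributed.
1505. DMF2 returns the data information to the MMF.
Specifically, DMF2 determines data set 2 according to the one or more data types sent by the MMF (the data already collected by DMF2 forms data set 2), and returns the information about data set 2 to the MMF, for example, the data volume and the range of the data collected for the one or more data types.
If DMF2 finds no data information corresponding to the one or more data types, it returns a NACK message that contains the error cause to the MMF.
1506. The MMF splits the data set according to the specific scenario and the data information obtained from DMF1 and DMF2, and determines the range of the training set and the range of the test set.
In a specific implementation, the MMF integrates the data information obtained from DMF1 and DMF2, splits the range of the data, and determines the range of the training set and the range of the test set. For example, the query shows that DMF1 has collected the RSRP data of cell 1 from June 1 to July 31, and DMF2 has collected the RSRP data of cell 1 from August 1 to August 31. The integrated data covers the time range June 1 to August 31. The data of June 1 to August 20 is used to train the model, so the range of the training set is "June 1 to August 20". The data of August 21 to August 31 is used to evaluate the model, so the range of the test set is "August 21 to August 31".
It should be noted that if the data information obtained from DMF1 and DMF2 overlaps, that is, if the data matching the data type "RSRP" in DMF1 and in DMF2 overlaps, the MMF may deduplicate the data before splitting it.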
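1507. The MMF sends the range of the training set and the model identifier to DMF1.
The range of the training set may be the range of the key values of the data in the training set, the time range over which the data in the training set is distributed, or the range of the network area over which the data in the training set is distributed.
After receiving the range of the training set and the model identifier, DMF1 may also record the correspondence between the range of the training set and the model identifier.
1508. The MTF sends data request message 1 to DMF1. The message includes the model identifier of the model and the dataset type.
The value of the dataset type in data request message 1 indicates that the MTF requests the training set. For example, the dataset type in data request message 1 may be "train".
1509. DMF1 sends the training set to the MTF.
In a specific implementation, DMF1 may determine the corresponding range of the training set according to the model identifier sent by the MTF, determine from the dataset type that the MTF requests the training set, and then determine the training set according to the range of the training set.
In a possible implementation, DMF1 may use the model identifier to find the data set determined according to the data type corresponding to the model, and then select the training set from that data set according to the range of the training set.
1510. The MTF performs model training using the training set.
1511. The MMF sends the range of the test set and the model identifier to DMF2.
The range of the test set may be the range of the key values of the data in the test set, the time range over which the data in the test set is distributed, or the range of the network area over which the data in the test set is distributed.
After receiving the range of the test set and the model identifier, DMF2 may also record the correspondence between the range of the test set and the model identifier.
1512. The MEF sends data request message 2 to DMF2. The message includes the model identifier of the model and the dataset type.
The value of the dataset type in data request message 2 indicates that the MEF requests the test set. For example, the dataset type in data request message 2 may be "test".
1513. DMF2 sends the test set to the MEF.
In a specific implementation, DMF2 may determine the corresponding range of the test set according to the model identifier sent by the MEF, determine from the dataset type that the MEF requests the test set, and then select the test set according to the range of the test set.
In a possible implementation, DMF2 may use the model identifier to find the data set determined according to the data type corresponding to the model, and then select the test set from that data set according to the range of the test set.
1514. The MEF performs model evaluation using the test set.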
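The bookkeeping that a DMF performs in Figure 15 (record a range under a model identifier, then serve a request that names only the model identifier and the dataset type) can be sketched as below. The class `RangeDmf`, the key-range representation, and the response dictionaries are hypothetical. Replying with a NACK and an error cause when nothing is recorded is borrowed from the behavior described for Figure 12 and is an assumption here.

```python
class RangeDmf:
    """Hypothetical DMF that stores a key range per (model ID, dataset type)."""

    def __init__(self, records):
        self.records = records   # list of (key, value) pairs
        self.ranges = {}         # (model ID, dataset type) -> (low, high)

    def record_range(self, model_id, dataset_type, key_range):
        # Steps 1507 and 1511: remember the range delivered by the MMF.
        self.ranges[(model_id, dataset_type)] = key_range

    def handle_data_request(self, model_id, dataset_type):
        # Steps 1509 and 1513: resolve the range, then select the records.
        key_range = self.ranges.get((model_id, dataset_type))
        if key_range is None:
            return {"status": "NACK", "cause": "no range recorded for request"}
        low, high = key_range
        data = [record for record in self.records if low <= record[0] <= high]
        return {"status": "ACK", "data": data}


dmf1 = RangeDmf([(1, "a"), (2, "b"), (3, "c")])
dmf1.record_range("model-1", "train", (1, 2))
ok = dmf1.handle_data_request("model-1", "train")
missing = dmf1.handle_data_request("model-1", "test")
```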
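An embodiment of this application further provides a model data transmission method that differs from the method shown in Figure 15 in that, when the MMF delivers the ranges of the training set and the test set in steps 1507 and 1511, the message may contain data Type instead of model ID. In step 1508, the MTF may then request the training set from DMF1 by using data Type and the range of the training set. Similarly, in step 1512, the MEF may request the test set from DMF2 by using data Type and the range of the test set. DMF1 directly determines the data set corresponding to the data Type, and then selects the training set from that data set according to the range of the training set sent by the MTF. DMF2 directly determines the data set corresponding to the data Type, and then selects the test set from that data set according to the range of the test set sent by the MEF.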
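When functional modules are divided according to corresponding functions, Figure 16 shows a possible schematic structural diagram of the communication apparatus involved in the foregoing embodiments. The communication apparatus shown in Figure 16 may be the first network element, the second network element, or the third network element described in the embodiments of this application, may be a component in the first, second, or third network element that implements the foregoing methods, or may be a chip applied in the first, second, or third network element. The chip may be a system-on-a-chip (SOC), a baseband chip with a communication function, or the like. As shown in Figure 16, the communication apparatus includes a processing unit 1601 and a communication unit 1602. The processing unit may be one or more processors, and the communication unit may be a transceiver or a communication interface.
The processing unit 1601 may be configured to support the communication apparatus in performing the processing actions in the foregoing method embodiments, for example, to support the first network element in performing step 501, to support the second network element (for example, the MTF) in performing steps 702 and 706, to support the second network element (for example, the MEF) in performing steps 707 and 711, to support the third network element in performing step 1406, and/or to support other processes of the techniques described herein.
The communication unit 1602 is configured to support communication between the communication apparatus and other devices (or apparatuses), for example, to support the first network element in performing step 502, to support the second network element in performing steps 703, 705, 708, and 710, to support the third network element in performing step 905, and/or to support other processes of the techniques described herein.
It should be noted that all related content of the steps in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules. Details are not repeated here.
As shown in Figure 17, the communication apparatus may further include a storage unit 1603, which is configured to store program code and/or data of the communication apparatus.
The processing unit 1601 may include at least one processor, the communication unit 1602 may be a transceiver or a communication interface, and the storage unit 1603 may include a memory.
It should be noted that in the foregoing communication apparatus embodiments, each unit may also be called a module, a component, a circuit, or the like.
An embodiment of this application provides a computer-readable storage medium that stores instructions. The instructions are used to perform the method shown in Figure 5 or Figures 7 to 15.
An embodiment of this application provides a computer program product that includes instructions. When the computer program product runs on a communication apparatus, the communication apparatus is enabled to perform the method shown in Figure 5 or Figures 7 to 15.
An embodiment of this application provides a wireless communication apparatus in which instructions are stored. When the wireless communication apparatus runs on the communication apparatus shown in Figure 4a, Figure 4b, Figure 16, or Figure 17, the communication apparatus is enabled to perform the method shown in Figure 5 or Figures 7 to 15. The wireless communication apparatus may be a chip.
An embodiment of this application further provides a communication system that includes a terminal device and an access network device. For example, the terminal device may be the communication apparatus shown in Figure 5a, Figure 9, or Figure 10, and the access network device may be the communication apparatus shown in Figure 5b, Figure 11, or Figure 12.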
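From the foregoing description of the implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, the division into the foregoing functional modules is used only as an example. In practice, the foregoing functions may be allocated to different functional modules as required. That is, the internal structure of the communication apparatus may be divided into different functional modules to implement all or some of the functions described above.
The processor in the embodiments of this application may include, but is not limited to, at least one of the following computing devices that run software: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), an artificial intelligence processor, or the like. Each computing device may include one or more cores for executing software instructions to perform operations or processing. The processor may be a separate semiconductor chip, or may be integrated with other circuits into one semiconductor chip. For example, it may form an SoC (system on chip) together with other circuits (such as codec circuits, hardware acceleration circuits, or various bus and interface circuits), or may be integrated into an ASIC as a built-in processor of the ASIC. The ASIC that integrates the processor may be packaged separately or together with other circuits. In addition to the cores for executing software instructions to perform operations or processing, the processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a PLD (programmable logic device), or a logic circuit that implements dedicated logic operations.
The memory in the embodiments of this application may include at least one of the following types: a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM). In some scenarios, the memory may also be a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
In this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate that A exists alone, that both A and B exist, or that B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c may represent a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple. In addition, to describe the technical solutions of the embodiments of this application clearly, words such as "first" and "second" are used in the embodiments of this application to distinguish between identical or similar items whose functions and effects are basically the same. A person skilled in the art can understand that words such as "first" and "second" do not limit quantity or execution order, and do not require the items to be different.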
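As a small illustration of the "at least one of a, b, or c" convention defined above, the following sketch enumerates every non-empty combination of the listed items. The function name `at_least_one_of` is hypothetical, and the hyphenated output format simply mirrors the notation used in the paragraph.

```python
from itertools import combinations


def at_least_one_of(items):
    """List every non-empty combination of items, joined with hyphens."""
    return [
        "-".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


options = at_least_one_of(["a", "b", "c"])
print(options)
# ['a', 'b', 'c', 'a-b', 'a-c', 'b-c', 'a-b-c']
```

For three items this gives the seven cases listed in the text.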
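In the several embodiments provided in this application, it should be understood that the disclosed database access apparatus and method may be implemented in other ways. For example, the database access apparatus embodiments described above are merely illustrative. The division into modules or units is merely a division by logical function, and other divisions are possible in an actual implementation. For example, a plurality of units or components may be combined or integrated into another apparatus, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between database access apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and a component shown as a unit may be one physical unit or a plurality of physical units. That is, it may be located in one place or distributed over a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence or in the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing is merely the specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.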

Claims (29)

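  1. A model data transmission method, characterized by comprising:
    determining, by a first network element, a first data set;
    receiving, by the first network element, first information and second information from a second network element, wherein the first information indicates a first model, the second information is used to request a second data set, and the second data set is used to train the first model or to test the first model; and
    sending, by the first network element, the second data set to the second network element, wherein the second data set is a subset of the first data set.
  2. The method according to claim 1, characterized in that
    the second information indicates a type of the second data set, the type of the second data set comprises a training set or a test set, the training set is used to train the first model, and the test set is used to test the first model; or
    the second information indicates a range of the second data set.
  3. The method according to claim 2, characterized in that the range of the second data set comprises one or more of the following: a range of key values of data in the second data set, a time range over which data in the second data set is distributed, or a range of a network area over which data in the second data set is distributed.
  4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
    determining, by the first network element, the second data set according to the first information and the second information.
  5. The method according to claim 4, characterized in that the determining, by the first network element, the second data set according to the first information and the second information comprises:
    determining, by the first network element, a training set of the first model and/or a test set of the first model from the first data set according to a data split policy.
  6. The method according to claim 5, characterized in that the data split policy is any one of the following:
    splitting according to the time over which data is distributed, splitting according to the network area over which data is distributed, or splitting according to a specified ratio.
  7. The method according to claim 5 or 6, characterized in that the method further comprises:
    receiving, by the first network element, the data split policy from a third network element; or
    determining, by the first network element, the data split policy.
  8. The method according to claim 1, characterized in that the method further comprises:
    sending, by the first network element to a third network element, one or more data types corresponding to the first model and a range of the first data set corresponding to the one or more data types, wherein the range of the first data set comprises one or more of the following: a range of key values of data in the first data set, a time range over which data in the first data set is distributed, or a range of a network area over which data in the first data set is distributed.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    receiving, by the first network element, third information from the second network element, wherein the third information comprises one or more of the following: one or more data types of the first model, or collection objects of data required by the first model, and the collection objects comprise at least one of the following: one or more user equipments (UE), or one or more cells (cell).
  10. The method according to claim 9, characterized in that the determining, by a first network element, a first data set comprises:
    obtaining, by the first network element, the first data set from a fourth network element according to the third information; or
    obtaining, by the first network element, a third data set or information about the third data set from a fifth network element according to the third information, wherein the information about the third data set indicates a range of the third data set, obtaining a fourth data set from a fourth network element according to the third information, and determining the first data set according to the third data set and the fourth data set.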
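  11. A model data transmission method, characterized in that the method comprises:
    sending, by a second network element, first information and second information to a first network element, wherein the first information indicates a first model, the second information is used to request a second data set, and the second data set is used to train the first model or to test the first model; and
    receiving, by the second network element, the second data set from the first network element, wherein the second data set is a subset of a first data set.
  12. The method according to claim 11, characterized in that the second information indicates a type of the second data set, the type of the second data set comprises a training set or a test set, the training set is used to train the first model, and the test set is used to test the first model; or
    the second information indicates a range of the second data set.
  13. The method according to claim 12, characterized in that the range of the second data set comprises one or more of the following: a range of key values of data in the second data set, a time range over which data in the second data set is distributed, or a range of a network area over which data in the second data set is distributed.
  14. The method according to any one of claims 11 to 13, characterized in that the method further comprises:
    sending, by the second network element, third information to the first network element, wherein the third information comprises one or more of the following: one or more data types of the first model, or collection objects of data required by the first model, and the collection objects comprise at least one of the following: one or more user equipments (UE), or one or more cells (cell).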
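  15. A communication apparatus, characterized by comprising:
    a processing unit, configured to determine a first data set; and
    a communication unit, configured to receive first information and second information from a second network element, wherein the first information indicates a first model, the second information is used to request a second data set, and the second data set is used to train the first model or to test the first model;
    wherein the communication unit is further configured to send the second data set to the second network element, and the second data set is a subset of the first data set.
  16. The apparatus according to claim 15, characterized in that the second information indicates a type of the second data set, the type of the second data set comprises a training set or a test set, the training set is used to train the first model, and the test set is used to test the first model; or
    the second information indicates a range of the second data set.
  17. The apparatus according to claim 16, characterized in that the range of the second data set comprises one or more of the following: a range of key values of data in the second data set, a time range over which data in the second data set is distributed, or a range of a network area over which data in the second data set is distributed.
  18. The apparatus according to any one of claims 15 to 17, characterized in that
    the processing unit is further configured to determine the second data set according to the first information and the second information.
  19. The apparatus according to claim 18, characterized in that the data split policy is any one of the following:
    splitting according to the time over which data is distributed, splitting according to the network area over which data is distributed, or splitting according to a specified ratio.
  20. The apparatus according to claim 18 or 19, characterized in that the communication unit is further configured to receive the data split policy from a third network element; or
    the communication apparatus determines the data split policy.
  21. The apparatus according to claim 15, characterized in that the communication unit is further configured to send, to a third network element, one or more data types corresponding to the first model and a range of the first data set corresponding to the one or more data types, wherein the range of the first data set comprises one or more of the following: a range of key values of data in the first data set, a time range over which data in the first data set is distributed, or a range of a network area over which data in the first data set is distributed.
  22. The apparatus according to any one of claims 15 to 21, characterized in that the communication unit is further configured to receive third information from the second network element, wherein the third information comprises one or more of the following: one or more data types of the first model, or collection objects of data required by the first model, and the collection objects comprise at least one of the following: one or more user equipments (UE), or one or more cells (cell).
  23. The apparatus according to claim 22, characterized in that the processing unit is specifically configured to obtain the first data set from a fourth network element according to the third information; or
    to obtain a third data set or information about the third data set from a fifth network element according to the third information, wherein the information about the third data set indicates a range of the third data set, obtain a fourth data set from a fourth network element according to the third information, and determine the first data set according to the third data set and the fourth data set.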
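  24. A communication apparatus, characterized by comprising:
    a processing unit, configured to determine first information and second information, wherein the first information indicates a first model, the second information is used to request a second data set, and the second data set is used to train the first model or to test the first model; and
    a communication unit, configured to send the first information and the second information to a first network element;
    wherein the communication unit is further configured to receive the second data set from the first network element, and the second data set is a subset of a first data set.
  25. The apparatus according to claim 24, characterized in that the second information indicates a type of the second data set, the type of the second data set comprises a training set or a test set, the training set is used to train the first model, and the test set is used to test the first model; or
    the second information indicates a range of the second data set.
  26. The apparatus according to claim 25, characterized in that the range of the second data set comprises one or more of the following: a range of key values of data in the second data set, a time range over which data in the second data set is distributed, or a range of a network area over which data in the second data set is distributed.
  27. The apparatus according to any one of claims 24 to 26, characterized in that the communication unit is further configured to send third information to the first network element, wherein the third information comprises one or more of the following: one or more data types of the first model, or collection objects of data required by the first model, and the collection objects comprise at least one of the following: one or more user equipments (UE), or one or more cells (cell).
  28. A communication apparatus, characterized by comprising a processor, wherein the processor is coupled to a memory;
    the memory is configured to store a computer program; and
    the processor is configured to execute the computer program stored in the memory, so that the apparatus performs the method according to any one of claims 1 to 14.
  29. A computer-readable storage medium, comprising a program or instructions, wherein when the program or instructions are run by a processor, the method according to any one of claims 1 to 14 is performed.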
Application number: PCT/CN2020/118593 (listed both as the priority application and as the application claiming priority)
Priority date: 2020-09-28
Filing date: 2020-09-28
Title: Model data transmission method and communication apparatus
Publication: WO2022061940A1 (zh), published 2022-03-31
Family ID: 80846099
Country status: WO (1), WO2022061940A1 (zh)

Legal Events

Date Code Title Description
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 20954787; Country of ref document: EP; Kind code of ref document: A1