WO2022100514A1 - Decision-making method and decision-making apparatus - Google Patents

Decision-making method and decision-making apparatus

Info

Publication number
WO2022100514A1
Authority
WO
WIPO (PCT)
Prior art keywords
communication device
decision
group
network element
communication
Prior art date
Application number
PCT/CN2021/128808
Other languages
English (en)
French (fr)
Inventor
叶德仕
孙武杰
王坚
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022100514A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H04W24/06 Testing, supervising or monitoring using simulated traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present application relates to the field of wireless communication, and in particular, to a decision-making method and a decision-making apparatus.
  • the tasks of wireless communication systems have evolved from a single voice transmission task to tasks such as detection, cooperation, control, decision-making, and optimization. As a result, wireless communication systems contain a large number of decision-making tasks, such as radio resource scheduling and power control.
  • the decision-making problem is usually modeled as an optimization problem, and the decision-making action is obtained by solving that optimization problem.
  • the decision-making problem can also be modeled as a Markov decision process and solved by artificial intelligence to obtain the decision action.
  • the full multi-agent reinforcement learning method is used to solve the Markov decision process, and the decision action is obtained.
  • the full multi-agent reinforcement learning method trains a separate reinforcement learning model for each decision-making subject.
  • the complexity of training the reinforcement learning model is high, and the decision-making subjects will affect each other, resulting in system instability.
  • the Markov decision-making process will change, and the reinforcement learning model needs to be retrained, resulting in poor scalability of the method.
  • the present application provides a decision-making method and a decision-making apparatus, which help reduce the complexity of training a decision-making model and improve the flexibility and scalability of the decision-making system.
  • a decision-making method comprising: a first communication device receives a decision model from a first network element, where the decision model is determined based on a group of the first communication device; and the first communication device makes task decisions based on the decision model.
  • the first network element may be a server. Here, the network device may be a base station, and the server may be a server with a storage function deployed on a core network or on a centralized unit (CU) of a base station, or a third-party server independent of the communication system, for example, a server dedicated to model storage.
  • when the first communication device is a terminal device, the first network element may be a server, a network device, or a core network element.
  • the first network element may determine one decision model for the first communication device from the multiple different decision models.
  • the first network element may actively determine a decision model for the first communication device based on its own policy, or may determine a decision model for the first communication device according to a request of the first communication device.
  • the decision model of the first communication apparatus may be determined by the first network element, or may be directly and rigidly specified by a standard, a protocol or an equipment installation manual, and then the first network element sends the specified decision model to the first communication apparatus.
  • the first network element can determine the decision model according to the group of the first communication device, which helps reduce the complexity of training the decision model. When a new communication device is added, the first network element can match an appropriate decision model to the group of the new communication device from the decision models that have already been trained, which makes the decision model more flexible and scalable.
  • the decision models of different groups of communication devices are different, so that communication devices of multiple different groups can cooperate and compete better.
  • the above method further includes: the first communication device sends a first request message to the first network element, where the first request message is used to request the decision model of the first communication device.
  • the first network element may determine a decision model for the first communication device according to the request of the first communication device, and may determine a more suitable decision model for the first communication device.
  • the above-mentioned first request message may include a group of the first communication apparatus.
  • the above-mentioned first request message may include one or more of the following information: the physical cell identifier of the first communication device, the neighbor relationship table of the first communication device, the type of the first communication device, and the location information of the first communication device.
  • the above method further includes: the first communication device determines the group of the first communication device according to the physical cell identifier of the first communication device, the neighbor relationship table of the first communication device, the category of the first communication device, or the location information of the first communication device.
  • the first communication device determines the group of the first communication device according to the physical cell identifier of the first communication device, which may include: the first communication device performs a remainder operation on the physical cell identifier and the total number of groups to obtain a remainder; the first communication device determines the group of the first communication device according to the remainder.
  • the first communication device may query the correspondence between the physical cell identification and the group of the communication device according to the identification of the physical cell of the first communication device, so as to obtain the group of the first communication device.
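  • as a minimal sketch of the two options above (a remainder operation, or a lookup in a correspondence table), assuming integer physical cell identifiers and integer group indices. The function names `group_from_pci` and `group_from_table` and the example table are hypothetical and do not appear in the application:

```python
from typing import Mapping


def group_from_pci(pci: int, total_groups: int) -> int:
    """Derive a group index as the remainder of the PCI divided by the group count."""
    if total_groups <= 0:
        raise ValueError("total_groups must be positive")
    return pci % total_groups


def group_from_table(pci: int, pci_to_group: Mapping[int, int]) -> int:
    """Look the group up in a preconfigured PCI-to-group correspondence table."""
    return pci_to_group[pci]


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(group_from_pci(pci=7, total_groups=3))
    print(group_from_table(7, {7: 2, 8: 0}))
```

  • the remainder variant needs no stored table, whereas the lookup variant lets an operator assign groups freely; which of the two is used is left open by the text above.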
  • the first communication device determines the group of the first communication device according to the neighbor relationship table of the first communication device, which may include: the first communication device performs a remainder operation on each physical cell identifier in its neighbor relationship table and the total number of groups to obtain the remainder corresponding to each physical cell identifier; the first communication device determines the group of the first communication device according to the number of different remainders corresponding to the physical cell identifiers.
  • the first communication device queries, according to its neighbor relationship table, the correspondence between each physical cell identifier in the table and the groups of communication devices to obtain the group corresponding to each physical cell identifier, and then obtains the group of the first communication device according to the number of different groups corresponding to the physical cell identifiers.
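  • the neighbor-table variant can be sketched as follows. The text above only says that the group is determined from the counts of the different remainders; it does not say how. The selection rule used here (pick the group that occurs least often among the neighbors, ties broken by the lower index) is purely an assumption for illustration, and the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable


def remainder_counts(neighbor_pcis: Iterable[int], total_groups: int) -> Counter:
    """Count how many neighbor cells fall into each remainder (candidate group)."""
    return Counter(pci % total_groups for pci in neighbor_pcis)


def pick_group_least_used(neighbor_pcis: Iterable[int], total_groups: int) -> int:
    """ASSUMED rule: choose the group least represented among the neighbors."""
    counts = remainder_counts(neighbor_pcis, total_groups)
    # Counter returns 0 for groups that no neighbor maps to.
    return min(range(total_groups), key=lambda g: (counts[g], g))


if __name__ == "__main__":
    # Hypothetical neighbor relationship table holding three PCIs.
    print(pick_group_least_used([3, 6, 10], total_groups=3))
```

  • a different rule (for example, joining the most common group) would only change the `min` call; the counting step stays the same.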
  • the above method may further include: the first communication device adjusts the above decision model based on a training sample of the first communication device, where the training sample includes state information, action information, and reward information of the decision model.
  • by adjusting the decision model, the first communication device can make the model better suited to the scenario in which it is located, so that it can take better decision actions based on real-time information from that scenario, which helps improve resource utilization, network performance, and the like.
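  • one way to picture this local adjustment is a tabular Q-learning update over (state, action, reward) samples. The application does not name a learning algorithm, so the Q-table representation, the `next_state` field, the learning rate `alpha`, and the discount factor `gamma` are all assumptions made for this sketch:

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Sequence, Tuple


@dataclass(frozen=True)
class TrainingSample:
    state: Hashable
    action: Hashable
    reward: float
    next_state: Hashable  # assumption: not listed in the text, needed for Q-learning


QTable = Dict[Tuple[Hashable, Hashable], float]


def fine_tune(q: QTable, samples: Iterable[TrainingSample],
              actions: Sequence[Hashable],
              alpha: float = 0.1, gamma: float = 0.9) -> QTable:
    """Return a locally adjusted copy of the received model (here: a Q-table)."""
    tuned = dict(q)  # leave the model received from the network element untouched
    for s in samples:
        best_next = max((tuned.get((s.next_state, a), 0.0) for a in actions),
                        default=0.0)
        old = tuned.get((s.state, s.action), 0.0)
        tuned[(s.state, s.action)] = old + alpha * (s.reward + gamma * best_next - old)
    return tuned


if __name__ == "__main__":
    sample = TrainingSample(state="s0", action="a", reward=1.0, next_state="s1")
    print(fine_tune({}, [sample], actions=["a", "b"]))
```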
  • the grouping basis of the first communication devices may be an interference relationship between cells, a type of the first communication device, or location information of the first communication device.
  • the above decision model is determined by the first network element from multiple decision models, the multiple decision models correspond to multiple groups of communication devices, and the decision model corresponding to each group is obtained by training based on the training samples of the communication devices included in that group.
  • another decision-making method including: the first network element determines a decision model for the first communication device, where the decision model is determined based on the group of the first communication device; and the first network element sends the decision model to the first communication device.
  • the first network element can determine the decision model according to the group of the first communication device, which helps reduce the complexity of training the decision model. When a new communication device is added, the first network element can match an appropriate decision model to the group of the new communication device from the decision models that have already been trained, so that the decision model is more flexible and scalable.
  • the decision models of different groups of communication devices are different, so that communication devices of multiple different groups can cooperate and compete better.
  • the above method further includes: the first network element receives a first request message from the first communication device, where the first request message is used to request the decision model of the first communication device; the first network element determining the decision model for the first communication device includes: the first network element determining the decision model for the first communication device according to the first request message.
  • the first network element may determine a decision model for the first communication device according to a request of the first communication device, and may determine a more suitable decision model for the first communication device.
  • the above-mentioned first request message includes a group of the first communication device.
  • the first network element may directly determine a suitable decision model for the first communication device according to the group of the first communication device included in the first request message, which helps improve the efficiency with which the first network element determines the decision model for the first communication device.
  • the above-mentioned first request message includes one or more of the following information: the physical cell identifier of the first communication device, the neighbor relationship table of the first communication device, the category of the first communication device, and the location information of the first communication device.
  • the first network element receives the request message including the information of the first communication device, and determines the decision model of the first communication device according to the information of the first communication device.
  • this helps the first network element learn of changes in the information of the first communication device in time and update the decision model for the first communication device accordingly.
  • the first network element determines a decision model for the first communication device according to the first request message, including: the first network element performs a remainder operation on the physical cell identifier of the first communication device and the total number of groups to obtain a remainder; the first network element determines the group of the first communication device according to the remainder; the first network element determines a decision model for the first communication device according to the group of the first communication device.
  • the first network element determines a decision model for the first communication device according to the first request message, including: the first network element performs a remainder operation on each physical cell identifier in the neighbor relationship table of the first communication device and the total number of groups to obtain the remainder corresponding to each physical cell identifier; the first network element determines the group of the first communication device according to the number of different remainders corresponding to the physical cell identifiers; the first network element determines a decision model for the first communication device according to the group of the first communication device.
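  • the request handling on the first network element can be sketched as below, covering two of the cases described above: the request carries the group directly, or it carries a physical cell identifier from which the group is derived by a remainder operation. The `FirstRequest` type, its field names, and the use of a plain dictionary as the model store are hypothetical choices for illustration:

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FirstRequest:
    group: Optional[int] = None  # present when the device already knows its group
    pci: Optional[int] = None    # physical cell identifier, used otherwise


def select_model(request: FirstRequest, models: Mapping[int, Any],
                 total_groups: int) -> Any:
    """Pick the pre-trained decision model that matches the requester's group."""
    if request.group is not None:
        group = request.group
    elif request.pci is not None:
        group = request.pci % total_groups
    else:
        raise ValueError("request carries neither a group nor a PCI")
    return models[group]


if __name__ == "__main__":
    # Placeholder strings stand in for trained decision models.
    store = {0: "model-0", 1: "model-1", 2: "model-2"}
    print(select_model(FirstRequest(pci=7), store, total_groups=3))
```

  • because the lookup only needs the group, a newly added communication device can be served from the existing store without any retraining.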
  • the above-mentioned method further includes: the first network element determines the above-mentioned decision model from a plurality of decision models, where the plurality of decision models correspond to a plurality of groups of communication devices, and the decision model corresponding to each group is obtained by training based on the training samples of the communication devices included in that group.
  • the above method further includes: the first network element groups the communication devices in the network according to the information of the communication devices in the network to obtain at least one group, where each group includes at least one communication device, and the at least one group includes the group to which the first communication device belongs; the first network element obtains training samples of the communication devices included in each group; the first network element trains the decision model corresponding to each group based on the training samples of the communication devices included in that group.
  • the first network element trains the decision model corresponding to each group according to the grouping of the communication devices in the network, which helps reduce the complexity of training the models.
  • the first network element groups the communication devices in the network according to the information of the communication devices in the network to obtain at least one group, including: the first network element groups the communication devices in the network according to the interference relationship between the cells of the communication devices in the network, the type of the communication devices in the network, or the location information of the communication devices in the network, to obtain at least one group.
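  • the group-then-train flow above can be sketched as pooling the training samples of all devices in a group and training one model per group. The grouping is taken as given (how it is derived from interference, type, or location is not modeled here), and `train_fn` is a hypothetical stand-in for whatever training procedure is actually used:

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Mapping, Sequence


def train_group_models(device_groups: Mapping[Hashable, int],
                       device_samples: Mapping[Hashable, Sequence[Any]],
                       train_fn: Callable[[List[Any]], Any]) -> Dict[int, Any]:
    """Train one decision model per group from the pooled samples of its devices."""
    pooled: Dict[int, List[Any]] = defaultdict(list)
    for device, group in device_groups.items():
        pooled[group].extend(device_samples.get(device, []))
    return {group: train_fn(samples) for group, samples in pooled.items()}


if __name__ == "__main__":
    groups = {"bs1": 0, "bs2": 1, "bs3": 0}           # hypothetical grouping
    samples = {"bs1": [1, 2], "bs2": [3], "bs3": [4]}  # hypothetical samples
    # len() stands in for a real training routine.
    print(train_group_models(groups, samples, train_fn=len))
```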
  • the above method further includes: the first network element receives one or more decision models from the second network element, where the one or more decision models correspond to one or more groups of communication devices.
  • the decision model corresponding to each group is obtained by training based on the training samples of the communication devices included in each group.
  • the first network element does not need to train the decision-making model, and only needs to have the functions of storage and communication, which is beneficial to reducing the cost of the system.
  • a training method for a decision model, including: a first network element groups communication devices in the network according to information of the communication devices in the network to obtain at least one group, where each group includes at least one communication device, and the at least one group includes the group to which the first communication device belongs; the first network element obtains training samples of the communication devices included in each group; the first network element trains the decision model corresponding to each group based on the training samples of the communication devices included in that group.
  • the first network element groups the communication devices in the network according to the information of the communication devices in the network to obtain at least one group, including: the first network element groups the communication devices in the network according to the interference relationship between the cells of the communication devices in the network, the type of the communication devices in the network, or the location information of the communication devices in the network, to obtain at least one group.
  • multiple decision models correspond to multiple groups of communication devices, and the decision model corresponding to each group is obtained by training based on the training samples of the communication devices included in that group.
  • a decision-making apparatus configured to execute the method in the first aspect or any possible implementation manner of the first aspect.
  • the apparatus includes a unit for performing the method in the above-mentioned first aspect or any possible implementation manner of the first aspect.
  • the apparatus may include modules that correspond one-to-one to the methods/operations/steps/actions described in the first aspect, and the modules may be implemented by hardware circuits, by software, or by hardware circuits combined with software.
  • the decision-making apparatus includes a transceiver unit and a processing unit.
  • the transceiver unit is configured to receive a decision model from the first network element, where the decision model is determined based on the group of the first communication device.
  • the processing unit is used to make task decisions according to the decision model.
  • the device is a communication chip, which may include input circuits or interfaces for transmitting information or data, and output circuits or interfaces for receiving information or data.
  • the apparatus is a communication device that may include a transmitter for transmitting information or data and a receiver for receiving information or data.
  • the device is configured to execute the method in the first aspect or any possible implementation manner of the first aspect; the device may be configured in the first communication device, or the device itself is the first communication device.
  • another decision-making apparatus for executing the method in the second aspect or any possible implementation manner of the second aspect.
  • the apparatus includes a unit for performing the method in the above-mentioned second aspect or any possible implementation manner of the second aspect.
  • the apparatus may include modules that correspond one-to-one to the methods/operations/steps/actions described in the second aspect, and the modules may be implemented by hardware circuits, by software, or by hardware circuits combined with software.
  • the decision-making apparatus includes a processing unit and a transceiver unit.
  • the processing unit is configured to determine a decision model for the first communication device, where the decision model is determined based on the group of the first communication device.
  • the transceiver unit is used for sending the decision model to the first communication device.
  • the device is a communication chip, which may include an input circuit or interface for sending information or data, and an output circuit or interface for receiving information or data.
  • the apparatus is a communication device that may include a transmitter for transmitting information or data and a receiver for receiving information or data.
  • the apparatus is configured to execute the method in the second aspect or any possible implementation manner of the second aspect; the apparatus may be configured in the first network element, or the apparatus itself is the first network element.
  • a first communication device including a processor and a transceiver, where the transceiver is used for communicating with other devices, the processor is coupled with a memory, the memory is used for storing a computer program, and when the processor calls the computer program, the first communication apparatus is caused to execute the method in any one of the possible implementation manners of the first aspect.
  • a first network element including a processor and a transceiver, where the transceiver is used for communicating with other devices, the processor is coupled with a memory, the memory is used for storing a computer program, and when the processor calls the computer program, the first network element is caused to execute the method in any one of the possible implementation manners of the second aspect.
  • another decision-making apparatus including a processor and a memory, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that the apparatus executes the method in any possible implementation of any of the above aspects.
  • there are one or more processors and one or more memories.
  • the memory may be integrated with the processor, or the memory may be provided separately from the processor.
  • the communication device further includes a transmitter and a receiver.
  • the transmitter and receiver can be provided separately or integrated together, in which case they are called a transceiver.
  • a communication system including an apparatus for implementing the method of the first aspect or any possible implementation of the first aspect, and an apparatus for implementing the method of the second aspect or any possible implementation of the second aspect.
  • the communication system may further include other devices that interact with the first communication apparatus and/or the first network element in the solutions provided in the embodiments of the present application.
  • another first communication device comprising an input-output interface and a logic circuit, where the input-output interface is used for receiving a decision model from the first network element, and the logic circuit is used for making task decisions according to the decision model, so that the communication apparatus performs the method in any one of the possible implementation manners of the above-mentioned first aspect.
  • another first network element including an input-output interface and a logic circuit, where the logic circuit is used to determine a decision model for the first communication device, and the input-output interface is used to send the decision model, so that the communication apparatus performs the method in any one of the possible implementation manners of the second aspect above.
  • a twelfth aspect provides a computer-readable medium storing a computer program (also referred to as code, or instructions) that, when executed on a computer, causes the computer to perform the method in any possible implementation of any of the above aspects.
  • a thirteenth aspect provides a computer program product comprising a computer program (also referred to as code, or instructions) that, when executed, causes a computer to perform the method in any possible implementation of any of the above aspects.
  • FIG. 1 is a schematic diagram of a communication system provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a decision-making method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a decision-making method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a decision-making method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a communication device grouping provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a decision-making method provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a decision-making method provided by an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a decision-making apparatus provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of another decision-making apparatus provided by an embodiment of the present application.
  • narrowband Internet of Things (NB-IoT)
  • long term evolution (LTE)
  • LTE frequency division duplex (FDD)
  • LTE time division duplex (TDD)
  • fifth-generation (5G) systems usually include the following three application scenarios: enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC), and massive machine-type communications (mMTC).
  • the terminal equipment in the embodiments of the present application may also be referred to as user equipment (UE), a mobile station (MS), a mobile terminal (MT), an access terminal, a subscriber unit, a subscriber station, a remote station, a remote terminal, a mobile device, a user terminal, a terminal, a wireless communication device, a user agent, and so on.
  • the terminal device may be a device that provides voice/data connectivity to the user, such as a handheld device with a wireless connection function, a vehicle-mounted device, and the like.
  • some examples of terminal devices are: a mobile phone, a tablet computer, a notebook computer, a PDA, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with a wireless communication function, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, a terminal device in a 5G network, or a terminal device in a future evolved public land mobile network (PLMN), which is not limited in the embodiments of the present application.
  • the terminal device may also be a terminal device in an Internet of Things (IoT) system.
  • IoT is an important part of the future development of information technology. Its main technical feature is that objects are connected to the network through communication technology, realizing an intelligent network of human-machine interconnection and thing-to-thing interconnection.
  • the network device in this embodiment of the present application may be a device that provides a wireless communication function for a terminal device. The network device may also be called an access network device or a radio access network device, and may be a transmission reception point (TRP), an evolved NodeB (eNB or eNodeB) in an LTE system, a home base station (for example, a home evolved NodeB or home Node B, HNB), a baseband unit (BBU), or a wireless controller in a cloud radio access network (CRAN) scenario. Alternatively, the network device may be a relay station, an access point, an in-vehicle device, a wearable device, a network device in a 5G network, or a network device in a future evolved PLMN, and may be an access point (AP) in a WLAN, a gNB in a new radio (NR) system, or
  • a network device may include a centralized unit (CU) node, or a distributed unit (DU) node, or a RAN device including a CU node and a DU node, or a control plane CU node (CU).
  • user plane CU nodes (CU-UP nodes)
  • the network equipment provides services for the terminal equipment in the cell, and the terminal equipment communicates with the network equipment or other equipment corresponding to the cell through the transmission resources (for example, frequency domain resources, or spectrum resources) allocated by the network equipment.
  • a macro base station (for example, a macro eNB or a macro gNB)
  • the embodiments of the present application do not specifically limit the structure of the execution body of the methods provided by the embodiments of the present application, as long as communication can be performed according to those methods by running a program that records the code of the methods.
  • the execution subject of the method provided by the embodiment of the present application may be a terminal device or a network device, or a functional module in the terminal device or network device that can call and execute a program.
  • various aspects or features of the present application may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques.
  • article of manufacture encompasses a computer program accessible from any computer-readable device, carrier or media.
  • computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, or magnetic tapes), optical disks (e.g., compact discs (CDs) and digital versatile discs (DVDs)), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
  • various storage media described herein can represent one or more devices and/or other machine-readable media for storing information.
  • the term "machine-readable medium” may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • to facilitate understanding of the embodiments of the present application, a communication system applicable to the embodiments of the present application is first described in detail with reference to FIG. 1.
  • FIG. 1 is a schematic diagram of a communication system 100 according to an embodiment of the present application.
  • the communication system 100 may include one or more cells.
  • the area where the regular hexagon is located represents the coverage area of the cell, and it should be understood that the use of the regular hexagon to represent the cell is just an example.
  • the cell includes a network device 101 and at least one terminal device 102 .
  • the network device 101 may provide communication services for at least one terminal device 102 .
  • the network device 101 can allocate resources such as available wireless spectrum according to the channel quality and service quality requirements of the terminal device 102, and can also perform power control and other decisions on the terminal device 102.
  • the allocation problem is usually modeled as an optimization problem, and the decision action is obtained by solving that problem. The allocation problem can also be modeled as a Markov decision process, which is then solved using reinforcement learning.
  • the network device can use the shared parameter multi-agent reinforcement learning method and the complete multi-agent reinforcement learning method to solve the Markov decision process to obtain the decision action.
  • the shared parameter multi-agent reinforcement learning method is that the network equipment of each cell uses the same multi-agent reinforcement learning (MARL) model to solve the Markov decision process and obtain the decision action.
  • the complete multi-agent reinforcement learning method is that the network devices of each cell use different multi-agent reinforcement learning models to solve the Markov decision process and obtain decision actions.
  • in the shared parameter multi-agent reinforcement learning method, the decision-making strategies of all cells are the same, and the interference that may be caused by the decisions of other cells is not considered. For example, the signals transmitted by the network devices of two adjacent cells (or referred to as the network devices corresponding to the two adjacent cells) may interfere with each other. Since the decision-making strategies of the two cells are the same, coordination cannot be achieved by adjusting the profit function, so this method is not suitable for scenarios in which multiple cells cooperate or compete.
  • in the complete multi-agent reinforcement learning method, the decision-making strategies of all cells are different, which makes training the multi-agent reinforcement learning models highly complex. In addition, if an area is covered by two cells at the same time, the different decision-making strategies of the two cells may leave the communication system in an unstable state. Furthermore, if the number of cells changes, the optimization problem must be re-established and the multi-agent reinforcement learning model retrained, so the method scales poorly.
  • the present application provides a decision-making method and a decision-making device.
  • by grouping the network devices so that each group includes multiple network devices, and training a multi-agent reinforcement learning model for each group of network devices, the complexity of training the reinforcement learning models is reduced and network devices can cooperate and compete better. When a new network device is added, there is no need to retrain the learning model; an appropriate learning model is matched to the new device from the trained models, which makes the learning model more flexible and extensible.
  • FIG. 2 is a schematic flowchart of a decision-making method provided by an embodiment of the present application. This method can be applied to the communication system shown in FIG. 1 , but the embodiment of the present application is not limited thereto. As shown in Figure 2, the method may include the following steps:
  • the first network element determines a decision model for the first communication device, where the decision model is determined based on the group of the first communication device.
  • the first network element sends the decision model to the first communication device, and correspondingly, the first communication device receives the decision model.
  • the first communication device makes a task decision according to the decision model.
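  • as an illustration of the three steps above, the following is a minimal Python sketch, not the implementation of the embodiment. The class names, the `DecisionModel` type, and the representation of a model as a plain callable are all assumptions made for the example; the source only states that the model is determined based on the group, sent, and then used for a task decision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical type: a decision model maps state information to an action.
DecisionModel = Callable[[List[float]], int]


@dataclass
class FirstNetworkElement:
    # One trained decision model per group (assumed to be available already).
    models_by_group: Dict[int, DecisionModel]

    def determine_model(self, group: int) -> DecisionModel:
        """Determine the decision model based on the device's group."""
        return self.models_by_group[group]


@dataclass
class FirstCommunicationDevice:
    group: int
    model: Optional[DecisionModel] = None

    def receive_model(self, model: DecisionModel) -> None:
        """Store the decision model sent by the first network element."""
        self.model = model

    def make_task_decision(self, state: List[float]) -> int:
        """Make a task decision according to the received decision model."""
        if self.model is None:
            raise RuntimeError("no decision model received yet")
        return self.model(state)
```

  • in this sketch, sending the model is reduced to a method call; in a real system the model would be serialized and carried over the network.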
  • the first communication apparatus may be a network device or a terminal device.
  • the terminal device may be replaced with a device or chip capable of implementing functions similar to the terminal device
  • the network device may also be replaced by a device or chip capable of implementing functions similar to the network device, the names of which are not limited in this embodiment of the present application.
  • the first network element may be a server. The network device may be a base station, and the server may be a server with a storage function deployed on a core network or on a centralized unit (CU) of a base station, or a third-party server independent of the communication system, for example, a server dedicated to model storage.
  • when the first communication device is a terminal device, the first network element may be a server, a network device, or a core network element.
  • in a device-to-device (D2D) scenario, the first communication apparatus and the first network element may both be terminal equipment.
  • the first network element may be a cluster head sensor node, and the first communication device may be another node in the cluster.
  • the decision-making method provided by the embodiment of the present application can be applied to decision tasks such as resource scheduling and power control in a wireless communication system, and can also be applied to connection and switching problems between terminal equipment and remote radio units (RRUs) in a user-centric unbounded network (UE centric no cell, UCNC).
  • the decision-making model may include any model for realizing decision-making, which is not limited in this embodiment of the present application.
  • the decision model may be an agent model in multi-agent reinforcement learning or an agent model in multi-agent deep reinforcement learning, such as an agent model in an actor-critic algorithm or an agent model in the Deep Deterministic Policy Gradient (DDPG) algorithm.
  • the first network element may determine one decision model for the first communication device from the multiple different decision models.
  • the first network element may actively determine a decision model for the first communication device based on its own policy, or may determine a decision model for the first communication device according to a request of the first communication device.
  • the first network element determines a decision model for the first communication device.
  • the entry of the first communication device into the network means that the first communication device joins the network for the first time or rejoins the network after the device is restarted. For example, when a new base station is added because network capacity is expanded, the base station requests a decision model from the server, or the server detects that a new base station has been added and determines a decision model for it.
  • the change of the neighbor relationship of the first communication device means that other communication devices adjacent to the first communication device join the network or leave the network.
  • the first network element will re-determine a decision model for the first communication device. It should be understood that if the group of the first communication device changes, the first network element will re-determine a decision model for the first communication device; the embodiment of the present application does not limit the cause of the change in the group of the first communication device.
  • the decision model of the first communication apparatus may be determined by the first network element, or may be directly and rigidly specified by a standard, a protocol or an equipment installation manual, and then the first network element sends the specified decision model to the first communication apparatus.
  • the first network element may determine a decision model for the first communication device from a plurality of different decision models according to the group of the first communication device. Correspondingly, the first network element needs to acquire the group of the first communication device.
  • the first network element acquires the group of the first communication device.
  • the first communication device determines the group and sends the group to the first network element.
  • the first network element itself determines the group of the first communication device.
  • the manner in which the first network element and the first communication device determine the group of the first communication device may be the same or different.
  • both the first network element and the first communication device can determine the group of the first communication device according to the neighbor relationship table of the first communication device, the category of the first communication device, or the location information of the first communication device.
  • the first network element determines the group of the first communication device according to the physical cell identifier of the first communication device, and the first communication device determines the group of the first communication device according to the neighbor relationship table.
  • the physical cell identifier of the first communication device, the neighbor relationship table of the first communication device, the category of the first communication device, or the location information of the first communication device may be sent by the first communication device to the first network element, or may be obtained by the first network element from other devices, which is not limited in this embodiment.
  • the physical cell identifier is used to distinguish different cells, and when the first communication device is a base station, the physical cell identifier may be the identifier of the base station.
  • the neighbor relationship table of the first communication device includes adjacent cells of the first communication device and physical cell identifiers corresponding to the adjacent cells.
  • the first network element and the first communication device determine the group of the first communication device according to the physical cell identifier or the neighbor relationship table, which is conducive to reducing interference between cells, and can be applied to decision-making tasks related to interference management, such as power control, user scheduling, etc.
  • the category of the first communication device can generally describe the level of the communication device in the network, its own hardware processing capability and its service type. Exemplarily, if the first communication device is a network device, the categories may be macro base stations, micro cells, pico cells, home base stations, etc.; if the first communication device is a terminal device, the categories may be mobile phones, tablets, desktop computers, laptops, etc.
  • the first network element and the first communication device determine the group of the first communication device according to the category of the first communication device, and the method can be applied to a multi-level heterogeneous network and a communication system with multiple service types.
  • the location information of the first communication device may be geographic location information of the first communication device, such as longitude and latitude information of the first communication device.
  • the location information of the first communication device may indirectly reflect the inter-cell interference situation, for example, base stations located relatively close to each other interfere strongly with each other. The location information may also reflect the service type, for example, the service type of a communication device in a given area may differ from the service type of a communication device located in a factory.
  • the first network element and the first communication device determine the group of the first communication device according to the location information of the first communication device. This method can be applied to decision tasks related to interference management and to communication systems with multiple service types.
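  • the category-based and location-based grouping described above could be sketched as follows. This is only one possible reading: the text does not say how a category or a location is mapped to a group, so the normalization of category names, the grid-based bucketing, and the `grid_deg` parameter are assumptions introduced for illustration.

```python
import math
from typing import Tuple


def group_by_category(category: str) -> str:
    """Use the (normalized) device category itself as the group key."""
    return category.strip().lower()


def group_by_location(lat: float, lon: float, grid_deg: float = 0.1) -> Tuple[int, int]:
    """Bucket latitude/longitude into a square grid; each grid square is one group.

    grid_deg is a hypothetical tuning parameter (grid size in degrees).
    """
    return (math.floor(lat / grid_deg), math.floor(lon / grid_deg))
```

  • with this sketch, two base stations whose coordinates fall in the same grid square share a group, while a macro base station and a pico cell never share a category-based group.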
  • the multiple different decision models in the first network element may be obtained by training of the first network element, or may be obtained by training of other network elements and sent to the first network element.
  • the multiple different decision models in the first network element correspond to multiple different communication device groups. Taking the case in which the first network element trains the multiple decision models as an example, the first network element trains one decision model for each group. Specifically, the first network element groups the communication devices in the network according to the information of the communication devices in the network to obtain at least one group, where each group includes at least one communication device and the at least one group includes the group to which the first communication device belongs; the first network element obtains the training samples of the communication devices included in each group; and the first network element trains the decision model corresponding to each group based on the training samples of the communication devices included in that group.
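  • the group-then-train procedure just described can be sketched in Python as below. The function and parameter names are hypothetical, and the actual training algorithm is passed in as a callable `train` because the text leaves it open; the sketch only shows that samples are pooled per group and one model is produced per group.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List


def train_group_models(
    device_info: Dict[str, Any],
    samples_by_device: Dict[str, List[Any]],
    group_of: Callable[[Any], Hashable],
    train: Callable[[List[Any]], Any],
) -> Dict[Hashable, Any]:
    """Group devices, pool each group's training samples, train one model per group."""
    # Step 1: group the communication devices according to their information.
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for device_id, info in device_info.items():
        groups[group_of(info)].append(device_id)

    # Steps 2 and 3: collect each group's samples and train that group's model.
    models: Dict[Hashable, Any] = {}
    for group, members in groups.items():
        pooled = [s for d in members for s in samples_by_device.get(d, [])]
        models[group] = train(pooled)
    return models
```

  • the number of trained models equals the number of groups rather than the number of devices, which is where the reduction in training complexity comes from.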
  • the first network element can determine the decision model according to the group of the first communication device, which helps reduce the complexity of training decision models. When a new communication device is added, the first network element can match an appropriate decision model from the trained decision models according to the group of the new communication device, so that the decision model is more flexible and scalable. In addition, the decision models of different groups of communication devices are different, so that the communication devices of multiple groups can cooperate and compete better.
  • FIG. 3 is a schematic flowchart of a decision-making method provided by an embodiment of the present application. This method can be applied to the communication system shown in FIG. 1 , but the embodiment of the present application is not limited thereto. This embodiment is described by taking the first communication device requesting the first network element to determine the decision model as an example. As shown in FIG. 3 , the method may include the following steps:
  • the first communication device sends a first request message to a first network element, where the first request message is used to request a decision model of the first communication device, and correspondingly, the first network element receives the first request message.
  • the first network element determines a decision model for the first communication device according to the first request message.
  • the first network element sends the decision model to the first communication device, and correspondingly, the first communication device receives the decision model.
  • the first communication device makes a task decision according to the decision model.
  • the first communication apparatus may use the decision model determined for it by the first network element to allocate available wireless spectrum resources.
  • the first communication apparatus may use the decision model determined for it by the first network element to control the power.
  • the above-mentioned first request message is used to request the decision-making model of the first communication device.
  • the first request message may be an existing message or a newly defined message dedicated to requesting a decision model, where the existing message may be MSG3 in the random access phase or uplink control information (UCI) sent in the uplink.
  • the first request message may include the group of the first communication device. After receiving the first request message, the first network element determines a decision model for the first communication device from a plurality of decision models according to the group of the first communication device.
  • the first request message includes a physical cell identifier of the first communication device, a neighbor relationship table of the first communication device, a category of the first communication device, or location information of the first communication device.
  • the first network element determines the group of the first communication device according to the physical cell identifier of the first communication device, the neighbor relationship table of the first communication device, the category of the first communication device, or the location information of the first communication device, and then determines a decision model for the first communication device from the plurality of decision models according to the group of the first communication device.
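  • the two request variants above (the request carries the group directly, or it carries attributes from which the first network element derives the group) could be handled as in the following sketch. The dictionary-based message format, the key names, and the `derive_group` callable are assumptions for illustration, not a message format defined by the application.

```python
from typing import Any, Callable, Dict, Hashable


def handle_first_request(
    request: Dict[str, Any],
    models_by_group: Dict[Hashable, Any],
    derive_group: Callable[[Dict[str, Any]], Hashable],
) -> Any:
    """Return the decision model for the device that sent the first request message."""
    group = request.get("group")
    if group is None:
        # The request carries attributes (e.g. physical cell identifier, category,
        # location) instead of the group, so the network element derives the group.
        group = derive_group(request)
    return models_by_group[group]
```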
  • the first communication device adjusts the decision model based on the training samples of the first communication device.
  • Step S305 is an optional step, that is, the first communication device may not adjust the decision model after receiving the decision model.
  • by adjusting the decision model, the decision model can be made more suitable for the scene where the first communication device is located, so that the first communication device can better make decision actions according to the real-time information of the scene, thereby improving resource utilization or network performance.
  • the first communication device adjusts the decision model according to information such as the current channel quality and current service quality requirements of users in the cell covered by the first communication device, so that the first communication device can better allocate resources such as wireless spectrum.
  • the first communication device adjusts the decision model according to the current channel gain, power, rate and other information of users in the cell covered by the first communication device, so that the first communication device can better use the real-time information of users Perform power control.
  • the first communication device uses, as training samples, the real-time information it collects, the decision action made by the above-mentioned decision model according to that real-time information, and the penalty and reward obtained, and adjusts the decision model accordingly. The real-time information collected by the first communication device is called state information, the decision action made by the decision model according to the real-time information is called action information, and the obtained penalty and reward are called benefit information, which is used to judge the quality of the decision action.
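  • a training sample as described above is a (state information, action information, benefit information) triple. The sketch below shows one way a device might store such samples in a bounded buffer before adjusting its model; the `Sample` and `SampleBuffer` names and the fixed capacity are assumptions, and the adjustment step itself is omitted because the text does not specify it.

```python
from collections import deque
from typing import List, NamedTuple, Sequence, Tuple


class Sample(NamedTuple):
    state: Tuple[float, ...]  # state information: real-time information collected
    action: int               # action information: decision made by the model
    benefit: float            # benefit information: penalty/reward obtained


class SampleBuffer:
    """Bounded store of recent training samples (oldest samples are dropped first)."""

    def __init__(self, capacity: int) -> None:
        self._items = deque(maxlen=capacity)

    def add(self, state: Sequence[float], action: int, benefit: float) -> None:
        self._items.append(Sample(tuple(state), action, benefit))

    def latest(self, n: int) -> List[Sample]:
        """Return up to n most recent samples, oldest first."""
        return list(self._items)[-n:]

    def __len__(self) -> int:
        return len(self._items)
```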
  • the adjustment of the decision-making model may include adjustment of the parameters of the decision-making model and adjustment of the structure.
  • the state information may be the sum of the states of the users of the cell where the first communication device is located.
  • the state of the user may include the channel gain at the current moment, the power at the last moment, and the rate at the last moment of the user and its adjacent users.
  • the action information may be the power of the users of the cell where the first communication apparatus is located.
  • the benefit information may be the sum, or a weighted sum, of the current rates of the users in the cell where the first communication device is located.
  • the state information may be the sum of the states of the users of the cell where the first communication device is located.
  • the state of the user may include channel state information (channel state information, CSI), historical throughput information, estimated throughput information obtained based on the channel state information, and state information of user data packet buffering.
  • the status information of the user data packet buffer may include the size of the data packet in the buffer, the size of the remaining space in the buffer, the waiting time of the data packet in the buffer, and the historical packet loss information in the buffer.
  • the action information may be the result of resource scheduling, for example, to which user a certain block of transmission resources is allocated, and a certain block of transmission resources may be resources in any domain of time-frequency space code.
  • the benefit information may be the current rates of the users in the cell where the first communication device is located, a weighted sum of those current rates, fairness, the users' packet loss rate, the users' delay, and so on.
  • the first communication device sends a first request message to the first network element to request the decision model of the first communication device.
  • the first network element receives the first request message from the first communication device, determines the group of the first communication device, and then determines the decision model.
  • the first network element may determine a decision model for the first communication device according to the request of the first communication device, and may determine a more suitable decision model for the first communication device.
  • FIG. 4 is a flowchart of a training method of a decision model provided by an embodiment of the present application.
  • the decision model mentioned in the above embodiment can be obtained by training the method of this embodiment.
  • the method provided by the embodiment of the present application includes the following steps:
  • the first network element groups the communication devices in the network according to the information of the communication devices in the network to obtain at least one group, where each group includes at least one communication device, and the at least one group includes the group to which the first communication device belongs.
  • the first network element may group the communication devices in the network according to the inter-cell interference relationship of the communication devices in the network, the category of the communication devices in the network, or the location information of the communication devices to obtain at least one group.
  • the mutual interference between adjacent cells is relatively large, so the decision-making bodies of adjacent cells can use different decision models, which is more conducive to cooperation and coordination between decision-making bodies. The mutual interference between cells that are far apart is small, so the decision-making bodies of distant cells can use the same decision model. Therefore, the first network element can group the communication devices in the network according to the interference relationship between the cells of the communication devices in the network, train corresponding decision models for adjacent cells, and reduce the mutual interference between the cells.
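  • one way to realize "adjacent cells use different models, distant cells may share one" is to treat cells as nodes of an interference graph and assign groups by greedy graph coloring. This technique is not named in the application; it is swapped in here purely as an illustrative assumption, and the `neighbors` mapping (assumed symmetric) is a hypothetical input format.

```python
from typing import Dict, Hashable, Iterable


def group_by_interference(neighbors: Dict[Hashable, Iterable[Hashable]]) -> Dict[Hashable, int]:
    """Greedy coloring: each cell gets the smallest group index unused by its neighbors.

    neighbors maps each cell to the cells it strongly interferes with.
    """
    group: Dict[Hashable, int] = {}
    for cell in sorted(neighbors):
        used = {group[n] for n in neighbors[cell] if n in group}
        g = 0
        while g in used:
            g += 1
        group[cell] = g
    return group
```

  • greedy coloring does not guarantee the minimum number of groups, but it does guarantee that no two cells listed as neighbors end up in the same group.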
  • the first network element may group the communication devices in the network according to the types of communication devices in the network, and train corresponding decision models for different types of communication devices.
  • the first network element may group the communication devices in the network according to the geographic information of the communication devices in the network, and train corresponding decision models for the communication devices in different geographical locations.
  • the first network element acquires training samples of communication devices included in each group.
  • the first network element separately trains the decision model corresponding to each group based on the training samples of the communication devices included in each group.
  • the first network element may include an initial decision model corresponding to each group. The initial decision model corresponding to each group is allocated to the communication devices included in that group; each of these communication devices collects real-time information, uses the initial decision model to make decisions based on the collected real-time information, and obtains benefits. The real-time information collected by a communication device is the state information of the communication device, the decision made by the communication device according to the real-time information is the action information, and the benefit obtained from the decision is the benefit information.
  • the first network element acquires the state information, action information and benefit information of the communication devices included in each group as training samples.
  • the state information of the communication devices included in each group is used as the input of the initial model corresponding to that group, the action information of the communication devices included in each group is used as the output of the initial model corresponding to that group, and the benefit information of the communication devices included in each group is used to judge the quality of the action information of those communication devices.
  • the first network element can update the trained decision-making model so as to adapt to changes in the system.
  • the updating of the decision model by the first network element can be implemented in different ways.
  • the first network element may periodically update the decision model. For example, the first network element updates the decision model every hour.
  • the first network element obtains the state information, action information and benefit information of the communication device within one hour as update samples, updates the decision model, and sends the updated decision model to the communication device.
  • the first network element monitors the performance of the communication device in real time, and updates the decision model when the performance of the communication device is lower than the first threshold.
  • the performance of the communication device may include information such as throughput and delay of the communication device. It should be understood that if the performance monitored in real time by the first network element includes multiple pieces of information, there are correspondingly multiple first thresholds.
  • the above method of updating the decision-making model is to update some parameters of the decision-making model, so that the decision-making model is more adaptable to the changes of the system. If the system changes greatly, the trained decision model cannot make better decisions, and the first network element may retrain the decision model.
  • the first network element may periodically retrain the decision model. For example, the first network element retrains the decision model every 7 days. The first network element obtains the state information, action information, and benefit information of the communication device within 7 days as training samples, retrains the decision model, and sends the retrained decision model to the communication device.
  • the first network element monitors the performance of the communication device in real time, and trains the decision-making model when the performance of the communication device is lower than the second threshold.
  • the performance of the communication device may include information such as throughput and delay of the communication device. It should be understood that if the performance monitored in real time by the first network element includes multiple pieces of information, there are correspondingly multiple second thresholds. It should be understood that the second threshold is less than or equal to the first threshold.
  • the above method of retraining the decision model can change all parameters of the decision model and change the structure of the decision model, so that the decision model can adapt to the changes of the system.
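  • the threshold-triggered update and retraining described above can be summarized in a small decision function. This is a sketch under stated assumptions: performance is reduced to a single scalar where higher is better (such as throughput), and when performance falls below both thresholds retraining takes precedence over updating; neither point is stated explicitly in the text.

```python
def maintenance_action(performance: float, first_threshold: float, second_threshold: float) -> str:
    """Decide whether to keep, update, or retrain a decision model.

    Assumes a scalar performance metric where higher is better, and that
    second_threshold <= first_threshold as stated in the description.
    """
    if second_threshold > first_threshold:
        raise ValueError("second threshold must not exceed first threshold")
    if performance < second_threshold:
        return "retrain"  # large change: retrain all parameters, possibly the structure
    if performance < first_threshold:
        return "update"   # moderate change: update some parameters
    return "keep"
```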
  • as shown in FIG. 5, there are 25 cells in the network, where the area of each regular hexagon represents the coverage area of a cell. It should be understood that the use of regular hexagons to represent the cells is just an example.
  • the first network element groups the communication devices in the network according to the inter-cell interference relationship of the communication devices in the network, three groups are obtained, and different groups are represented by different line patterns in the figure. Cells with the same line pattern represent that the communication devices in the cells belong to one group, and cells with different line patterns represent that the communication devices in the cells belong to different groups.
  • the state information may be the sum of the states of the M users in one cell.
  • the state of the user may be the channel gain of the user at the current moment, the power at the last moment, and the rate at the last moment.
  • the action information may be the current moment power of M users in a cell.
  • the benefit information may be the sum of the current rates of the 25*M users in the 25 cells, or the weighted sum of the current rates of the 25*M users in the 25 cells divided by the number of cells.
  • the number of users included in each cell is variable.
  • M may take the maximum value of the number of users in all cells. If the actual number of users in a cell is less than M, the input of the decision model can be padded with zeros; if the actual number of users is greater than M, a user selection algorithm can be designed to select M users from the actual users for power allocation.
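  • the zero-padding and user-selection step can be sketched as follows. The text only says that "a user selection algorithm can be designed"; selecting the M users with the largest first feature (taken here to be the channel gain) is a hypothetical choice made for the example, as are the function and parameter names.

```python
from typing import Callable, List, Optional, Sequence


def fit_to_m_users(
    user_states: Sequence[Sequence[float]],
    m: int,
    feature_len: int,
    score: Optional[Callable[[Sequence[float]], float]] = None,
) -> List[float]:
    """Build a fixed-size model input of m * feature_len values.

    Fewer than m users: pad with zeros. More than m users: keep the m users
    with the highest score (assumption: first feature, e.g. channel gain).
    """
    if score is None:
        score = lambda s: s[0]
    if len(user_states) > m:
        selected = sorted(user_states, key=score, reverse=True)[:m]
    else:
        selected = [list(s) for s in user_states]
    padding = [[0.0] * feature_len for _ in range(m - len(selected))]
    return [x for s in list(selected) + padding for x in s]
```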
  • the training method of the decision model in the embodiment of the present application only needs to train the decision model corresponding to each group and does not need to train a decision model for each communication device, which helps reduce the complexity of model training. When a new communication device is added, the first network element can match an appropriate decision model from the trained models according to the group of the new communication device, so that the decision model is more flexible and scalable.
  • FIG. 6 is a schematic flowchart of a decision-making method provided by an embodiment of the present application.
  • the following description takes power control as the decision task, the base station 101 as the first communication device, and a server as the first network element.
  • the method of the embodiment of the present application may include the following steps:
  • the server trains a decision model for each group according to the grouping of the base stations.
  • the server can group the base stations according to the interference relationship between the cells, and train a decision model for each group.
  • for the specific training process of the decision model corresponding to each group, reference may be made to the method shown in FIG. 4, which will not be repeated here.
  • the base station acquires the physical cell identifier of the base station.
  • the base station determines the group of the base station according to the physical cell identifier of the base station.
  • the base station sends a first request message to the server, where the first request message includes the group of the base station.
  • the server receives the first request message.
  • the server determines a decision model for the base station according to the first request message.
  • the server sends the decision model to the base station, and correspondingly, the base station receives the decision model.
  • the base station makes a task decision according to the decision model.
  • the base station can obtain the physical cell identifier of the base station through neighbor cell discovery and negotiation with base stations of adjacent cells.
  • the base station may determine the group of the base station in the following manners. In one manner, the base station performs a remainder operation on the physical cell identifier of the base station and the total number of groups to obtain a remainder, and determines the group of the base station according to the remainder.
  • the group of the base station is 1.
  • in another manner, the correspondence between the physical cell identifier of the base station and the group of the base station may be pre-defined. After acquiring the physical cell identifier, the base station queries the correspondence according to the physical cell identifier and obtains the group of the base station corresponding to the physical cell identifier.
  • the corresponding relationship may be configured by the server to the base station, or may be configured by the core network device to the base station.
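Both manners of deriving the group from the physical cell identifier can be sketched in a few lines. The function name `group_from_pci`, the use of the remainder itself as the group index, and the identifier values in the usage example are assumptions made for illustration.

```python
from typing import Mapping, Optional


def group_from_pci(
    pci: int, total_groups: int, table: Optional[Mapping[int, int]] = None
) -> int:
    """Return the group of a base station from its physical cell identifier."""
    if table is not None:
        # Pre-defined correspondence (configured by the server or the core network).
        return table[pci]
    # Remainder manner: the remainder is used directly as the group index (assumption).
    return pci % total_groups


if __name__ == "__main__":
    print(group_from_pci(10, total_groups=3))                  # remainder manner
    print(group_from_pci(10, total_groups=3, table={10: 2}))   # lookup manner
```

The lookup table lets an operator override the arithmetic rule, for example when neighbouring cells would otherwise land in the same group.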
  • after the base station determines the group, it sends a first request message to the server, where the first request message includes the group information of the base station.
  • the server receives the first request message, and determines a decision model corresponding to the group information for the base station according to the first request message.
  • the base station may re-determine the physical cell identifier of the base station through neighbor discovery and negotiation with base stations of neighboring cells.
  • the neighbor relationship change may include other base stations joining the network or exiting the network.
  • the base station will re-determine the group of the base station according to the updated physical cell identifier.
  • the base station sends a first request message to the server, where the first request message includes the re-determined group of the base station.
  • the server re-determines a decision model corresponding to the group of the base station for the base station according to the first request message.
  • the above S601 may further include that the server groups the base stations according to the categories of the base stations or the location information of the base stations, and trains a decision model for each group.
  • the base station determines the group of the base station according to the physical cell identifier, and sends the group information to the server for determining the decision model.
  • the base station may also carry the physical cell identifier in the first request message and send it to the server, and the server determines the group of the base station according to the physical cell identifier of the base station.
  • the server may determine the group of the base station by using the same method as the base station, and details are not repeated here.
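The server side of this exchange can be sketched as a lookup keyed by group, with a fallback that derives the group from the physical cell identifier when the request carries the identifier instead of the group. The class and field names (`FirstRequest`, `ModelServer`, `models_by_group`) are hypothetical, the models are stand-in strings, and the groups are assumed to be numbered 0 to n-1.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FirstRequest:
    """Hypothetical shape of the first request message."""
    group: Optional[int] = None  # group computed by the base station, if any
    pci: Optional[int] = None    # physical cell identifier, if sent instead


class ModelServer:
    def __init__(self, models_by_group: Dict[int, str]) -> None:
        # Stand-in strings; real entries would be trained model parameters.
        self.models_by_group = models_by_group

    def resolve_group(self, request: FirstRequest) -> int:
        if request.group is not None:
            return request.group
        if request.pci is not None:
            # Assumed rule: the same remainder operation the base station would use.
            return request.pci % len(self.models_by_group)
        raise ValueError("request carries neither a group nor a PCI")

    def handle(self, request: FirstRequest) -> str:
        """Return the decision model for the requesting base station."""
        return self.models_by_group[self.resolve_group(request)]


if __name__ == "__main__":
    server = ModelServer({0: "model-0", 1: "model-1", 2: "model-2"})
    print(server.handle(FirstRequest(group=2)))
    print(server.handle(FirstRequest(pci=10)))
```

When the neighbor relationship changes and the base station re-determines its group, it simply sends another request and receives the model of the new group.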
  • FIG. 7 is a schematic flowchart of a decision-making method provided by an embodiment of the present application.
  • the embodiments of the present application take the decision task as power control as an example, the first communication device is the base station 101, and the first network element may be a server as an example for description.
  • the method of this embodiment of the present application may include the following steps:
  • the server trains a decision model for each group according to the grouping of the base stations.
  • the server can group the base stations according to the interference relationship between the cells, and train a decision model for each group.
  • for the specific training process of the decision model corresponding to each group, reference may be made to the method shown in FIG. 4, which will not be repeated here.
  • the base station acquires a neighbor relationship table of the base station.
  • the base station determines the group of the base station according to the neighbor relationship table of the base station.
  • the base station sends a first request message to the server, where the first request message includes the group of the base station, and correspondingly, the server receives the first request message.
  • the server determines a decision model for the base station according to the first request message.
  • the server sends the decision model to the base station, and correspondingly, the base station receives the decision model.
  • the base station makes a task decision according to the decision model.
  • the base station can obtain the neighbor relationship table of the base station through neighbor cell discovery and negotiation with the base station of the neighbor cell.
  • the base station can determine the group of the base station in the following manners. In one manner, the base station performs a remainder operation on each physical cell identifier in the neighbor relationship table of the base station and the total number of groups to obtain the remainder corresponding to each physical cell identifier, and then determines the group of the base station according to the number of occurrences of each distinct remainder.
  • the base station can perform a remainder operation on the physical cell identifiers of the 10 adjacent cells and 3 respectively. If five 0s, four 1s, and one 2 are obtained, the group of the base station is 2.
  • in another manner, the correspondence between physical cell identifiers and groups can be pre-defined. After the base station obtains the neighbor relationship table, it queries the correspondence according to each physical cell identifier in the neighbor relationship table to obtain the group corresponding to each physical cell identifier, and then determines the group of the base station according to the number of occurrences of each distinct group.
  • the corresponding relationship may be configured by the server to the base station, or may be configured by the core network device to the base station.
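The remainder-counting manner can be sketched as follows. The text only says the group is determined from the counts of the different remainders; the rule used here, picking the remainder that occurs least often among the neighbors with ties broken by the smaller index, is an assumption chosen because it reproduces the example of five 0s, four 1s and one 2 giving group 2. The function name and the identifier values are hypothetical.

```python
from collections import Counter
from typing import Iterable


def group_from_neighbors(neighbor_pcis: Iterable[int], total_groups: int) -> int:
    """Pick the group whose remainder is rarest among the neighbor cells."""
    counts = Counter(pci % total_groups for pci in neighbor_pcis)
    # Counter returns 0 for remainders that never occur, so an unused
    # group wins; ties are broken by the smaller group index (assumption).
    return min(range(total_groups), key=lambda g: (counts[g], g))


if __name__ == "__main__":
    # Ten hypothetical neighbor identifiers: five with remainder 0,
    # four with remainder 1 and one with remainder 2.
    neighbors = [0, 3, 6, 9, 12, 1, 4, 7, 10, 2]
    print(group_from_neighbors(neighbors, total_groups=3))
```

Choosing the rarest remainder keeps a base station out of the group that most of its neighbors occupy, which fits grouping by inter-cell interference, but other counting rules are equally compatible with the wording of the text.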
  • after the base station determines the group, the base station sends a first request message to the server, where the first request message includes the group information of the base station.
  • the server receives the first request message, and determines a decision model corresponding to the group information for the base station according to the first request message.
  • the base station may re-determine the neighbor relationship table of the base station through neighbor discovery and negotiation with base stations of neighboring cells.
  • the neighbor relationship change may include other base stations joining the network or exiting the network.
  • the base station will re-determine the group of base stations according to the updated neighbor relationship table.
  • the base station sends a first request message to the server, where the first request message includes the re-determined group of the base station.
  • the server re-determines a decision model corresponding to the group of the base station for the base station according to the first request message.
  • the above S701 may further include that the server groups the base stations according to the categories of the base stations or the location information of the base stations, and trains a decision model for each group.
  • the base station determines the group of the base station according to the neighbor relationship table, and sends the group information to the server for determining the decision model.
  • the base station may also carry the neighbor relationship table in the first request message and send it to the server, and the server determines the group of the base station according to the neighbor relationship table of the base station.
  • the server may determine the group of the base station by using the same method as the base station, and details are not repeated here.
  • the decision-making method provided by the embodiment of the present application can be simulated.
  • in the simulation, the decision task is power control and the decision subject is the base station.
  • there are N = 25 cells in the simulation environment, arranged in 5 rows and 5 columns.
  • the 25 cells can be divided into 3 groups according to the interference relationship between cells. The decision model corresponding to each group is trained 5 times, so that 5 decision models are obtained for each group. It should be understood that the parameters and structures of the five decision models of each group can be different.
  • the decision model in the simulation environment uses multi-agent deep reinforcement learning based on the deep deterministic policy gradient (DDPG), that is, multi-agent deep deterministic policy gradient (MADDPG).
  • Five comparison schemes can be set in the simulation, namely: shared parameter multi-agent scheme, fractional optimization scheme (FP), weighted minimum mean square error scheme (WMMSE), MAX scheme and RANDOM scheme.
  • MAX scheme: the base station selects the maximum transmit power each time.
  • RANDOM scheme: the base station randomly selects a value smaller than the maximum allowable transmit power for transmission each time.
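The two simplest comparison schemes can be written down directly. This is a sketch under assumptions: the function names are hypothetical, and the RANDOM scheme is assumed to draw uniformly from zero up to (but not including) the maximum allowable power, since the text does not state the distribution or the lower bound.

```python
import random


def max_scheme(p_max: float) -> float:
    """MAX baseline: always transmit at the maximum allowable power."""
    return p_max


def random_scheme(p_max: float, rng: random.Random) -> float:
    """RANDOM baseline: a power drawn uniformly from [0, p_max) (assumed)."""
    # rng.random() lies in [0.0, 1.0), so the result stays below p_max.
    return rng.random() * p_max


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    print(max_scheme(1.0), random_scheme(1.0, rng))
```

Neither baseline has parameters to learn, which is why such schemes can be tested without a training phase.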
  • the shared parameter multi-agent scheme, FP, WMMSE, MAX and RANDOM algorithms in the comparison scheme can be directly used for testing without training.
  • during simulation, the average of the weighted sum rate mean over the test results of the 25 cells can be calculated for each scheme.
  • the best weighted sum rate mean r_B and the worst weighted sum rate mean r_W of the test results of the 25 cells are given in Table 1, where the weight of each cell is 1. It should be understood that the weight of each cell can also be set to different values.
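One way to compute these three figures is sketched below, assuming that each test run yields one rate per cell, that the "weighted sum rate mean" of a run is the weighted sum of the cell rates divided by the number of cells, and that the average, best (r_B) and worst (r_W) values are taken over the runs. The function names and the numbers in the usage example are made up for illustration and are not simulation results from the application.

```python
from typing import Dict, List, Sequence


def weighted_sum_rate_mean(cell_rates: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-cell rates divided by the number of cells."""
    return sum(w * r for w, r in zip(weights, cell_rates)) / len(cell_rates)


def summarize_runs(runs: List[Sequence[float]], weights: Sequence[float]) -> Dict[str, float]:
    """Average, best (r_B) and worst (r_W) weighted sum rate mean over runs."""
    per_run = [weighted_sum_rate_mean(run, weights) for run in runs]
    return {
        "average": sum(per_run) / len(per_run),
        "r_B": max(per_run),
        "r_W": min(per_run),
    }


if __name__ == "__main__":
    # Two toy runs over two cells, each cell weighted 1 as in the text.
    print(summarize_runs([[1.0, 3.0], [2.0, 4.0]], weights=[1.0, 1.0]))
```

Setting a cell weight above 1 makes that cell's rate count more in every run, which is how unequal cell priorities would enter the comparison.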
  • Table 1 is a comparison table of simulation results provided by the embodiments of the present application.
  • FIG. 8 shows a decision-making apparatus provided by an embodiment of the present application.
  • the apparatus may include: a transceiver unit and a processing unit.
  • the apparatus is configured to execute each process and step corresponding to the first communication apparatus in the foregoing method embodiment.
  • the transceiver unit is used for: receiving a decision model from the first network element, where the decision model is determined based on the group of the device.
  • the processing unit is used for: making task decisions according to the decision model.
  • the transceiver unit is further configured to: send a first request message to the first network element, where the first request message is used to request a decision model of the apparatus.
  • the above-mentioned first request message includes the group of the device.
  • the above-mentioned first request message includes one or more of the following information: a physical cell identifier of the device, a neighbor relationship table of the device, a category of the device, and location information of the device.
  • the processing unit is further configured to: determine the group of the device according to the physical cell identifier of the device or the neighbor relationship table of the device or the category of the device or the location information of the device.
  • the processing unit is further configured to: perform a remainder operation on the physical cell identifier of the device and the total number of groups to obtain a remainder; and determine the group of the device according to the remainder.
  • the processing unit is further configured to: perform a remainder operation on each physical cell identifier in the neighbor relationship table of the device and the total number of groups, to obtain a remainder corresponding to each physical cell identifier; and determine the group of the device according to the number of occurrences of each distinct remainder.
  • the processing unit is further configured to: adjust the decision model based on a training sample of the device, where the training sample of the device includes state information, action information and benefit information of the decision model.
  • the grouping of the above-mentioned devices is based on the interference relationship between cells, the category of the device, or the location information of the device.
  • the above-mentioned decision model is determined by the first network element from a plurality of decision models, the plurality of decision models correspond to a plurality of groups of communication devices, and the decision model corresponding to each group is obtained by training based on the training samples of the communication devices included in that group.
  • the apparatus is configured to execute each process and step corresponding to the first network element in the foregoing method embodiment.
  • the processing unit is configured to: determine a decision model for the first communication device, where the decision model is determined based on the group of the first communication device.
  • the transceiver unit is used for: sending the decision model to the first communication device.
  • the transceiver unit is further configured to: receive a first request message from the first communication device, where the first request message is used to request the decision model of the first communication device;
  • the processing unit is further configured to: determine the decision model for the first communication device according to the first request message.
  • the above-mentioned first request message includes the group of the first communication device.
  • the above-mentioned first request message includes one or more of the following information: the physical cell identifier of the first communication device, the neighbor relationship table of the first communication device, the category of the first communication device, and the location of the first communication device information.
  • the processing unit is further configured to: perform a remainder operation on the physical cell identifier of the first communication device and the total number of groups to obtain a remainder; determine the group of the first communication device according to the remainder; and determine the decision model for the first communication device according to the group of the first communication device.
  • the processing unit is further configured to: perform a remainder operation on each physical cell identifier and the total number of groups in the neighbor relationship table of the first communication device, to obtain a remainder corresponding to each physical cell identifier; The number of different remainders corresponding to the physical cell identifiers determines the group of the first communication device; and according to the group of the first communication device, a decision model is determined for the first communication device.
  • the processing unit is further configured to: determine the decision model from a plurality of decision models.
  • the processing unit is further configured to: group the communication devices in the network according to the information of the communication devices in the network to obtain at least one group, each group includes at least one communication device, the at least one group including the group to which the first communication device belongs; acquiring training samples of the communication devices included in each group; and training the decision model corresponding to each group based on the training samples of the communication devices included in each group.
  • the processing unit is further configured to: group the communication devices in the network according to the interference relationship between the cells of the communication devices in the network, the category of the communication devices in the network, or the location information of the communication devices in the network, to obtain at least one group.
  • the above-mentioned multiple decision models correspond to multiple groups of communication devices, and the decision models corresponding to each group are obtained by training based on training samples of communication devices included in each group.
  • the transceiver unit is further configured to: receive one or more decision models from the second network element, where the one or more decision models correspond to one or more groups of communication devices, and the decision model corresponding to each group is obtained by training based on the training samples of the communication devices included in that group.
  • the devices herein are embodied in the form of functional units.
  • the term "unit" as used herein may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group of processors) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components that support the described functions.
  • the apparatus may specifically be the first communication apparatus or the first network element in the foregoing embodiment, and the apparatus may be configured to perform each process and/or step corresponding to the first communication apparatus or the first network element in the foregoing method embodiment, which is not repeated here in order to avoid repetition.
  • the devices of the above solutions have the function of implementing the corresponding steps performed by the first communication device or the first network element in the above method; the above functions may be implemented by hardware, or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the above-mentioned transceiver unit may include a sending unit and a receiving unit. The sending unit may be used to implement the steps and/or processes of the transceiver unit that perform the sending action, and the receiving unit may be used to implement the steps and/or processes of the transceiver unit that perform the receiving action.
  • the sending unit may be replaced by a transmitter, and the receiving unit may be replaced by a receiver, respectively performing the transceiving operations and related processing operations in each method embodiment.
  • the device in FIG. 8 may also be a chip or a chip system, for example, a system on chip (SoC).
  • the transceiver unit may be a transceiver circuit of the chip, which is not limited herein.
  • FIG. 9 shows another decision-making apparatus provided by an embodiment of the present application.
  • the apparatus includes a processor, a transceiver and a memory.
  • the processor, the transceiver and the memory communicate with each other through an internal connection path, the memory is used for storing instructions, and the processor is used for executing the instructions stored in the memory to control the transceiver to send and/or receive signals.
  • the apparatus is configured to execute each process and step corresponding to the first communication apparatus in the above method.
  • the transceiver is used for: receiving a decision model from the first network element, where the decision model is determined based on the group of the device.
  • the processor is used for: making task decisions according to the decision model.
  • the apparatus is configured to execute each process and step corresponding to the first network element in the above method.
  • the processor is configured to: determine a decision model for the first communication device, where the decision model is determined based on the group of the first communication device.
  • the transceiver is used for: sending the decision model to the first communication device.
  • the apparatus may be specifically the first communication apparatus or the first network element in the foregoing embodiments, and may be configured to execute the steps and/or processes corresponding to the first communication apparatus or the first network element in the foregoing method embodiments.
  • the memory may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory.
  • the memory may also store device type information.
  • the processor may be configured to execute the instructions stored in the memory, and when the processor executes the instructions stored in the memory, the processor is configured to execute the steps and/or processes of the foregoing method embodiments corresponding to the first communication device or the first network element.
  • the transceiver may include a transmitter and a receiver. The transmitter may be used to implement the steps and/or processes of the transceiver that perform the sending action, and the receiver may be used to implement the steps and/or processes of the transceiver that perform the receiving action.
  • the processor of the above device may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the steps of the method disclosed in combination with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software units in the processor.
  • the software unit may be located in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor executes the instructions in the memory, and completes the steps of the above method in combination with its hardware. To avoid repetition, detailed description is omitted here.
  • the present application also provides a first communication device, including an input and output interface and a logic circuit, where the input and output interface is used for receiving a decision model from the first network element, and the logic circuit is used for making task decisions according to the decision model and the method in the above embodiments.
  • the present application also provides a first network element, including an input and output interface and a logic circuit, the logic circuit is used for determining a decision model for the first communication device according to the method in the above embodiment, and the input and output interface is used for sending the decision model.
  • the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is used to implement the method corresponding to the first communication apparatus shown in the various possible implementation manners in the foregoing embodiments.
  • the present application provides another computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is used to implement the method corresponding to the first network element shown in the various possible implementation manners in the foregoing embodiments .
  • the present application provides a computer program product, where the computer program product includes a computer program (also referred to as code, or instructions), and when the computer program runs on a computer, the computer can execute the method corresponding to the first communication device shown in the above embodiments.
  • the present application provides another computer program product, where the computer program product includes a computer program (also referred to as code, or instructions), and when the computer program runs on a computer, the computer can execute the method corresponding to the first network element shown in the various possible implementation manners in the above embodiments.
  • the present application provides a chip system, where the chip system is used to support the above-mentioned first communication apparatus to implement the functions shown in the embodiments of the present application.
  • the present application provides another chip system, where the chip system is configured to support the above-mentioned first network element to implement the functions shown in the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

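The present application provides a decision-making method and a decision-making apparatus. The method includes: a first network element determines a decision model for a first communication device, where the decision model is determined based on the group of the first communication device; the first network element sends the decision model to the first communication device, and correspondingly, the first communication device receives the decision model; and the first communication device makes a task decision according to the decision model. The present application helps reduce the complexity of training decision models and improve the flexibility and scalability of the decision-making system.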

Description

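Decision-making method and decision-making apparatus
This application claims priority to Chinese Patent Application No. 202011255202.2, filed with the China Patent Office on November 11, 2020 and entitled "Decision-making method and decision-making apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of wireless communication, and in particular, to a decision-making method and a decision-making apparatus.
Background
With the development of wireless communication technology, the tasks of wireless communication systems have evolved from a single voice transmission task to tasks such as detection, cooperation, control, decision-making and optimization. Wireless communication systems therefore contain a large number of decision-making tasks, for example, radio resource scheduling and power control. To solve a decision task in a wireless communication system, the decision problem is usually modeled as an optimization problem, and the decision action is obtained by solving the optimization; alternatively, the decision problem can be modeled as a Markov decision process and solved by means of artificial intelligence to obtain the decision action.
In the prior art, a fully multi-agent reinforcement learning method is used to solve the Markov decision process and obtain the decision action. This method trains a reinforcement learning model for each decision subject. When there are many decision subjects, training the reinforcement learning models is highly complex, and the decision subjects affect one another, which makes the system unstable. In addition, when a new decision subject joins, the Markov decision process changes and the reinforcement learning models must be retrained, so the method scales poorly.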
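Summary
The present application provides a decision-making method and a decision-making apparatus, which help reduce the complexity of training decision models and improve the flexibility and scalability of the decision-making system.
In a first aspect, a decision-making method is provided, including: a first communication device receives a decision model from a first network element, where the decision model is determined based on the group of the first communication device; and the first communication device makes a task decision according to the decision model.
If the first communication device is a network device, the first network element may be a server. The network device may be a base station, and the server may be a server with a storage function deployed in the core network or on a centralized unit (CU) of a base station, or may be a third-party server independent of the communication system, for example, a server dedicated to model storage. If the first communication device is a terminal device, the first network element may be a server, a network device, or a core network element.
The first network element may hold multiple different decision models and may determine one of them for the first communication device. The first network element may determine the decision model for the first communication device proactively based on its own policy, or according to a request from the first communication device.
The decision model of the first communication device may be determined by the first network element, or may be directly mandated by a standard, a protocol, or a device installation manual, in which case the first network element sends the specified decision model to the first communication device.
In the embodiments of the present application, the first network element can determine the decision model according to the group of the first communication device, which helps reduce the complexity of training decision models. When a new communication device joins, the first network element can match a suitable decision model from the already trained decision models according to the group of the new communication device, so that the decision models are more flexible and scalable. In addition, communication devices of different groups have different decision models, so that communication devices of different groups can cooperate and compete well.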
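With reference to the first aspect, in a possible implementation, before the first communication device receives the decision model from the first network element, the method further includes: the first communication device sends a first request message to the first network element, where the first request message is used to request the decision model of the first communication device.
In this method, the first network element can determine the decision model for the first communication device according to the request of the first communication device, and can therefore determine a more suitable decision model for the first communication device.
With reference to the first aspect, in a possible implementation, the first request message may include the group of the first communication device.
With reference to the first aspect, in a possible implementation, the first request message may include one or more of the following: the physical cell identifier of the first communication device, the neighbor relationship table of the first communication device, the category of the first communication device, and the location information of the first communication device.
With reference to the first aspect, in a possible implementation, before the first communication device sends the first request message to the first network element, the method further includes: the first communication device determines the group of the first communication device according to the physical cell identifier of the first communication device, the neighbor relationship table of the first communication device, the category of the first communication device, or the location information of the first communication device.
With reference to the first aspect, in a possible implementation, that the first communication device determines the group of the first communication device according to the physical cell identifier of the first communication device may include: the first communication device performs a remainder operation on the physical cell identifier of the first communication device and the total number of groups to obtain a remainder; and the first communication device determines the group of the first communication device according to the remainder.
Alternatively, the first communication device queries, according to the physical cell identifier of the first communication device, the correspondence between physical cell identifiers and groups of communication devices, and obtains the group of the first communication device.
With reference to the first aspect, in a possible implementation, that the first communication device determines the group of the first communication device according to the neighbor relationship table of the first communication device may include: the first communication device performs a remainder operation on each physical cell identifier in the neighbor relationship table of the first communication device and the total number of groups to obtain remainders; and the first communication device determines the group of the first communication device according to the number of occurrences of each distinct remainder corresponding to the physical cell identifiers.
Alternatively, the first communication device queries, according to the neighbor relationship table of the first communication device, the correspondence between each physical cell identifier in the neighbor relationship table and the group of the first communication device, obtains the group corresponding to each physical cell identifier, and then obtains the group of the first communication device according to the number of occurrences of each distinct group corresponding to the physical cell identifiers.
With reference to the first aspect, in a possible implementation, the method may further include: the first communication device adjusts the decision model based on training samples of the first communication device, where the training samples of the first communication device include state information, action information and benefit information of the decision model.
In the embodiments of the present application, by adjusting the decision model, the first communication device can make the decision model better suited to the scenario in which the first communication device operates, so that the first communication device makes better decision actions according to real-time information about the scenario, thereby improving resource utilization or network performance.
With reference to the first aspect, in a possible implementation, the first communication device may be grouped according to the interference relationship between cells, the category of the first communication device, or the location information of the first communication device.
With reference to the first aspect, in a possible implementation, the decision model is determined by the first network element from multiple decision models, the multiple decision models correspond to multiple groups of communication devices, and the decision model corresponding to each group is obtained by training based on the training samples of the communication devices included in that group.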
第二方面,提供了另一种决策方法,包括:第一网元为第一通信装置确定决策模型,该决策模型是基于第一通信装置的组别确定的;第一网元向第一通信装置发送该决策模型。
在本申请实施例中,第一网元可以根据第一通信装置的组别确定决策模型,有利于降低训练决策模型的复杂程度,并且当新的通信装置加入时,第一网元可以根据新的通信装置的组别从已训练得到的学习模型中匹配合适的决策模型,使得决策模型的灵活性以及可扩展性更高,另外,不同组别的通信装置的决策模型不同,使得多个不同组别的通信装置之间可以较好地合作和竞争。
结合第二方面,在一种可能的实现方式中,第一网元为第一通信装置确定决策模型之前,上述方法还包括:第一网元接收来自第一通信装置的第一请求消息,该第一请求消息用于请求第一通信装置的决策模型;第一网元为第一通信装置确定决策模型,包括:第一网元根据该第一请求消息,为第一通信装置确定决策模型。
在本申请实施例中,第一网元可以根据第一通信装置的请求,为第一通信装置确定决策模型,可以为第一通信装置确定更适合的决策模型。
结合第二方面,在一种可能的实现方式中,上述第一请求消息包括第一通信装置的组别。
在本申请实施例中,第一网元可以直接根据第一请求消息中包括的第一通信装置的组别,为第一通信装置确定适合的决策模型,有利于提高第一网元确定第一通信装置的决策模型的效率。
结合第二方面,在一种可能的实现方式中,上述第一请求消息包括以下信息中一个或者多个:第一通信装置的物理小区标识、第一通信装置的邻区关系表、第一通信装置的类别、第一通信装置的位置信息。
在本申请实施例中,第一网元接收到包括第一通信装置信息的请求消息,根据第一通信装置信息,确定第一通信装置的决策模型。当第一通信装置信息变化时,该方法有利于第一网元及时获取第一通信装置信息的变化,为第一通信装置更新第一通信装置的决策模型。
结合第二方面,在一种可能的实现方式中,第一网元根据第一请求消息,为第一通信装置确定决策模型,包括:第一网元对第一通信装置的物理小区标识与组别总数做取余运算,得到余数;第一网元根据该余数,确定第一通信装置的组别;第一网元根据该第一通信装置的组别,为第一通信装置确定决策模型。
结合第二方面,在一种可能的实现方式中,第一网元根据第一请求消息,为第一通信装置确定决策模型,包括:第一网元对第一通信装置的邻区关系表中的每个物理小区标识与组别总数做取余运算,得到每个物理小区标识对应的余数;第一网元根据每个物理小区标识对应的不同余数的个数,确定第一通信装置的组别;第一网元根据该第一通信装置的组别,为该第一通信装置确定决策模型。
结合第二方面,在一种可能的实现方式中,上述方法还包括:第一网元从多个决策模型中确定上述决策模型,该多个决策模型对应通信装置的多个组别,每个组别对应的决策模型是基于每个该组别包括的通信装置的训练样本训练得到的。
结合第二方面,在一种可能的实现方式中,上述方法还包括:第一网元根据网络中的通信装置的信息,对该网络中的通信装置进行分组,得到至少一个组别,每个组别包括至少一个通信装置,至少一个组别中包括第一通信装置属于的组别;第一网元获取每个组别包括的通信装置的训练样本;第一网元基于该每个组别包括的通信装置的训练样本,分别训练每个组别对应的决策模型。
在本申请实施例中,第一网元根据网络中通信装置的分组训练每个组别对应的决策模型,有利于降低训练模型的复杂程度。
结合第二方面,在一种可能的实现方式中,第一网元根据网络中的通信装置的信息,对网络中的通信装置进行分组,得到至少一个组别,包括:第一网元根据网络中通信装置的小区间的干扰关系、网络中通信装置的类别或者网络中通信装置的位置信息,对网络中的通信装置进行分组,得到至少一个组别。
结合第二方面,在一种可能的实现方式中,上述方法还包括:第一网元接收来自第二网元的一个或多个决策模型,该一个或多个决策模型对应通信装置的一个或多个组别,每个组别对应的决策模型是基于每个组别包括的通信装置的训练样本训练得到的。
在本申请实施例中,第一网元不需要训练决策模型,只需要拥有存储和通信的功能即可,有利于降低系统的成本。
第三方面,提供了一种决策模型的训练方法,包括:第一网元根据网络中的通信装置的信息,对该网络中的通信装置进行分组,得到至少一个组别,每个组别包括至少一个通信装置,至少一个组别中包括第一通信装置属于的组别;第一网元获取每个组别包括的通信装置的训练样本;第一网元基于该每个组别包括的通信装置的训练样本,分别训练每个组别对应的决策模型。
结合第三方面,在一种可能的实现方式中,第一网元根据网络中的通信装置的信息,对网络中的通信装置进行分组,得到至少一个组别,包括:第一网元根据网络中通信装置的小区间的干扰关系、网络中通信装置的类别或者网络中通信装置的位置信息,对网络中的通信装置进行分组,得到至少一个组别。
结合第三方面,在一种可能的实现方式中,多个决策模型对应通信装置的多个组别,每个组别对应的决策模型是基于每个组别包括的通信装置的训练样本训练得到的。
第四方面,提供了一种决策装置,用于执行上述第一方面或第一方面任意可能的实现方式中的方法。具体地,该装置包括用于执行上述第一方面或第一方面任意可能的实现方式中的方法的单元。
在一种设计中,该装置可以包括执行上述第一方面中所描述的方法/操作/步骤/动作所一一对应的模块,该模块可以是硬件电路,也可是软件,也可以是硬件电路结合软件实现。
一种可能的实现中,该决策装置包括收发单元和处理单元。收发单元用于接收来自第一网元的决策模型,该决策模型是基于所述第一通信装置的组别确定的。处理单元用于根据决策模型进行任务决策。
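In another design, the apparatus is a communication chip. The communication chip may include an output circuit or interface for sending information or data, and an input circuit or interface for receiving information or data.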
在另一种设计中,该装置为通信设备,通信设备可以包括用于发送信息或数据的发射机,以及用于接收信息或数据的接收机。
在另一种设计中,该装置用于执行上述第一方面或第一方面任意可能的实现方式中的方法,该装置可以配置在上述第一通信装置中,或者该装置本身即为上述第一通信装置。
第五方面,提供了另一种决策装置,用于执行上述第二方面或第二方面任意可能的实现方式中的方法。具体地,该装置包括用于执行上述第二方面或第二方面任意可能的实现方式中的方法的单元。
在一种设计中,该装置可以包括执行上述第二方面中所描述的方法/操作/步骤/动作所一一对应的模块,该模块可以是硬件电路,也可是软件,也可以是硬件电路结合软件实现。
一种可能的实现中,该决策装置包括处理单元和收发单元。处理单元用于为第一通信装置确定决策模型,该决策模型是基于第一通信装置的组别确定的。收发单元用于向第一通信装置发送该决策模型。
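In another design, the apparatus is a communication chip. The communication chip may include an output circuit or interface for sending information or data, and an input circuit or interface for receiving information or data.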
在另一种设计中,该装置为通信设备,通信设备可以包括用于发送信息或数据的发射机,以及用于接收信息或数据的接收机。
在另一种设计中,该装置用于执行上述第二方面或第二方面任意可能的实现方式中的方法,该装置可以配置在上述第一网元中,或者该装置本身即为上述第一网元。
第六方面,提供了一种第一通信装置,包括,处理器和收发器,该收发器用于和其它装置通信,该处理器与存储器耦合,该存储器用于存储计算机程序,当该处理器调用所述计算机程序时,使得该第一通信装置执行上述第一方面中任一种可能实现方式中的方法。
第七方面,提供了一种第一网元,包括,处理器和收发器,该收发器用于和其它装置通信,该处理器与存储器耦合,该存储器用于存储计算机程序,当该处理器调用所述计算机程序时,使得该第一网元执行上述第二方面中任一种可能实现方式中的方法。
第八方面,提供了另一种决策装置,包括,处理器,存储器,该存储器用于存储计算机程序,该处理器用于从存储器中调用并运行该计算机程序,使得该装置执行上述任一方面中任一种可能实现方式中的方法。
可选地,所述处理器为一个或多个,所述存储器为一个或多个。
可选地,所述存储器可以与所述处理器集成在一起,或者所述存储器与处理器分离设置。
Optionally, the apparatus further includes a transmitter and a receiver. The transmitter and the receiver may be separate, or may be integrated into a transceiver.
第九方面,提供了一种通信系统,包括用于实现上述第一方面或第一方面的任一种可能实现的方法的装置,以及用于实现上述第二方面或第二方面的任一种可能实现的方法的装置。
在一个可能的设计中,该通信系统还可以包括本申请实施例所提供的方案中与第一通信装置和/或第一网元进行交互的其他设备。
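According to a tenth aspect, another first communication apparatus is provided, including an input/output interface and a logic circuit. The input/output interface is configured to receive a decision model from a first network element, and the logic circuit is configured to make a task decision according to the decision model, so that the first communication apparatus performs the method in any possible implementation of the first aspect.
According to an eleventh aspect, another first network element is provided, including an input/output interface and a logic circuit. The logic circuit is configured to determine a decision model for a first communication apparatus, and the input/output interface is configured to send the decision model, so that the first network element performs the method in any possible implementation of the second aspect.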
第十二方面,提供了一种计算机可读介质,所述计算机可读介质存储有计算机程序(也可以称为代码,或指令)当其在计算机上运行时,使得计算机执行上述任一方面中任一种可能实现方式中的方法。
第十三方面,提供了一种计算机程序产品,所述计算机程序产品包括:计算机程序(也可以称为代码,或指令),当所述计算机程序被运行时,使得计算机执行上述任一方面中任一种可能实现方式中的方法。
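Brief Description of the Drawings
FIG. 1 is a schematic diagram of a communication system according to an embodiment of this application.
FIG. 2 is a schematic flowchart of a decision-making method according to an embodiment of this application.
FIG. 3 is a schematic flowchart of a decision-making method according to an embodiment of this application.
FIG. 4 is a schematic flowchart of a training method for a decision model according to an embodiment of this application.
FIG. 5 is a schematic diagram of a grouping of communication apparatuses according to an embodiment of this application.
FIG. 6 is a schematic flowchart of a decision-making method according to an embodiment of this application.
FIG. 7 is a schematic flowchart of a decision-making method according to an embodiment of this application.
FIG. 8 is a schematic block diagram of a decision-making apparatus according to an embodiment of this application.
FIG. 9 is a schematic block diagram of another decision-making apparatus according to an embodiment of this application.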
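Detailed Description
The technical solutions in this application are described below with reference to the accompanying drawings.
The technical solutions in the embodiments of this application can be applied to various communication systems, for example, a narrowband Internet of Things (NB-IoT) system, a long term evolution (LTE) system, an LTE frequency division duplex (FDD) system, an LTE time division duplex (TDD) system, a fifth generation (5G) mobile communication system or new radio (NR), or another evolved communication system. A 5G system typically covers three major application scenarios: enhanced mobile broadband (eMBB), ultra-reliable and low latency communications (URLLC), and massive machine type communications (mMTC).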
本申请实施例中的终端设备也可以称为:用户设备(user equipment,UE)、移动台(mobile station,MS)、移动终端(mobile terminal,MT)、接入终端、用户单元、用户站、移动站、移动台、远方站、远程终端、移动设备、用户终端、终端、无线通信设备、用户代理或用户装置等。
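A terminal device may be a device that provides voice/data connectivity to a user, for example, a handheld device or a vehicle-mounted device with a wireless connection function. Examples of terminal devices currently include: a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with a wireless communication function, a computing device or another processing device connected to a wireless modem, a terminal device in a 5G network, or a terminal device in a future evolved public land mobile network (PLMN). This is not limited in the embodiments of this application.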
此外,在本申请实施例中,终端设备还可以是物联网(internet of things,IoT)系统中的终端设备,IoT是未来信息技术发展的重要组成部分,其主要技术特点是将物品通过通信技术与网络连接,从而实现人机互连,物物互连的智能化网络。
另外,本申请实施例中的网络设备可以是为终端设备提供无线通信功能的设备,该网络设备也可以称为接入网设备或无线接入网设备,可以是传输接收点(transmission reception point,TRP),还可以是LTE系统中的演进型基站(evolved NodeB,eNB或eNodeB),还可以是家庭基站(例如,home evolved NodeB,或home Node B,HNB)、基带单元(base band unit,BBU),还可以是云无线接入网络(cloud radio access network,CRAN)场景下的无线控制器,或者该网络设备可以为中继站、接入点、车载设备、可穿戴设备以及5G网络中的网络设备或者未来演进的PLMN网络中的网络设备等,可以是WLAN中的接入点(access point,AP),可以是新型无线(new radio,NR)系统中的gNB,可以是卫星通信系统中的卫星基站等,以及设备到设备(Device-to-Device,D2D)、车辆外联(vehicle-to-everything,V2X)、机器到机器(machine-to-machine,M2M)通信中承担基站功能的设备等,本申请实施例并不限定。
在一种网络结构中,网络设备可以包括集中单元(centralized unit,CU)节点、或分布单元(distributed unit,DU)节点、或包括CU节点和DU节点的RAN设备、或者控制面CU节点(CU-CP节点)和用户面CU节点(CU-UP节点)以及DU节点的RAN设备。
网络设备为小区内的终端设备提供服务,终端设备通过网络设备分配的传输资源(例如,频域资源,或者说,频谱资源)与小区对应的网络设备或者其他设备进行通信,该网络设备可以为宏基站(例如,宏eNB或宏gNB等),也可以为小小区(small cell)对应的基站,这里的小小区可以包括:城市小区(metro cell)、微小区(micro cell)、微微小区(pico cell)、毫微微小区(femto cell)等,这些小小区具有覆盖范围小、发射功率低的特点,适用于提供高速率的数据传输服务。
本申请实施例并未对本申请实施例提供的方法的执行主体的具体结构特别限定,只要能够通过运行记录有本申请实施例的提供的方法的代码的程序,以根据本申请实施例提供的方法进行通信即可,例如,本申请实施例提供的方法的执行主体可以是终端设备或网络设备,或者,是终端设备或网络设备中能够调用程序并执行程序的功能模块。
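In addition, aspects or features of this application may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" used in this application covers a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media may include, but are not limited to: magnetic storage devices (for example, a hard disk, a floppy disk, or a magnetic tape), optical discs (for example, a compact disc (CD) or a digital versatile disc (DVD)), smart cards, and flash memory devices (for example, an erasable programmable read-only memory (EPROM), a card, a stick, or a key drive). In addition, the various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, a wireless channel and various other media capable of storing, containing, and/or carrying instructions and/or data.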
为便于理解本申请实施例,首先结合图1对适用于本申请实施例的通信系统进行详细说明。
图1为本申请实施例提供的通信系统100的示意图。如图1所示,该通信系统100可以包括一个或者多个小区。图中,规则六边形所在的区域代表小区的覆盖区域,应理解,用规则六边形表示小区仅仅是一个示例。该小区中包括网络设备101和至少一个终端设备102。其中,网络设备101可以为至少一个终端设备102提供通信服务。具体而言,网络设备101可以根据终端设备102的信道质量、服务质量要求等对可用的无线频谱等资源进行分配,还可以对终端设备102进行功率控制以及其他决策。
以无线传输资源分配的决策过程为例,网络设备101进行无线传输资源分配时,通常将分配问题建模为优化问题,通过求解优化问题得到决策动作,也可以将分配问题建模为马尔科夫决策过程,使用强化学习进行求解。
现有技术中,网络设备可以采用共享参数多智能体强化学习的方法以及完全多智能体强化学习的方法对马尔科夫决策过程进行求解,得到决策动作。共享参数多智能体强化学习方法是每个小区的网络设备均采用同一个多智能体强化学习(multi-agent reinforcement learning,MARL)模型对马尔科夫决策过程进行求解,得到决策动作。完全多智能体强化学习方法是每个小区的网络设备均采用不同的多智能体强化学习模型对马尔科夫决策过程进行求解,得到决策动作。
共享参数多智能体强化学习方法中,所有小区的决策策略相同,且不考虑其他小区的决策可能导致的干扰。对于处于边缘的、且位于两个相邻小区的共同覆盖范围内的用户,会出现两个相邻小区的网络设备(或者称为两个相邻小区对应的网络设备)均向该共同覆盖范围发送信号的情况,两个小区传输的信号会发生干扰。由于两个小区的决策策略相同,无法通过调整收益函数实现协调,导致此方法不适应多个小区进行合作或竞争的情况。
完全多智能体强化学习方法,所有小区的决策策略均不相同,导致多智能体强化学习模型的训练复杂程度高;同时,若一个区域同时被两个小区覆盖,两个小区的决策策略不同,会导致通信系统处于不稳定的状态;另外,若小区数量发生改变,优化问题需要重新建立以及重新训练多智能体强化学习模型,导致该方法可扩展性差。
有鉴于此,本申请提供了一种决策方法和决策装置,通过将网络设备分组,每一组包含多个网络设备,为每一组网络设备训练一个多智能体强化学习模型的方法,有利于降低训练强化学习模型的复杂程度,同时网络设备之间可以较好地合作和竞争,并且当新的网络设备加入时,不需要重新训练学习模型,根据新的网络设备的组别从已训练得到的学习模型中匹配合适的学习模型,使得学习模型的灵活性以及可扩展性更高。
图2为本申请实施例提供的决策方法的示意性流程图。该方法可以应用于图1所示的通信系统,但本申请实施例不限于此。如图2所示,该方法可以包括下列步骤:
S201、第一网元为第一通信装置确定决策模型,该决策模型是基于第一通信装置的组别确定的。
S202、第一网元向第一通信装置发送该决策模型,对应地,第一通信装置接收该决策模型。
S203、第一通信装置根据该决策模型进行任务决策。
第一通信装置可以为网络设备或终端设备。应理解,终端设备可以替换为能够实现与终端设备类似的功能的装置或芯片,网络设备也可以替换为能够实现与网络设备类似的功能的装置或芯片,本申请实施例对其名称不作限定。
若第一通信装置为网络设备,则第一网元可以是服务器,其中,网络设备可以为基站,服务器可以是部署在核心网或基站集中单元(centralized unit,CU)上的具有存储功能的服务器,也可以是独立于通信系统的第三方服务器,例如,专用于进行模型存储的服务器。若第一通信装置为终端设备,则第一网元可以是服务器或网络设备或核心网网元,当第一通信装置和第一网元处于设备到设备(device-to-device,D2D)场景中,第一通信装置和第一网元可以均为终端设备。在无线传感器网络中,第一网元可以是簇头传感器节点,第一通信装置可以是簇内其他节点。
本申请实施例提供的决策方法可以适用于无线通信系统中资源的调度以及功率控制等决策任务,还可以适用于以用户为中心的无边界网络(UE centric no cell,UCNC)中终端设备与远端射频单元(remote radio unit,RRU)的连接以及切换问题等。
决策模型可以包括实现决策的任意模型,本申请实施例不作限定。示例性地,决策模型可以是多智能体强化学习中的智能体模型,多智能体深度强化学习中的智能体模型,例如演员-评论家(actor-critic)算法中的智能体模型,深度确定性策略梯度(Deep Deterministic Policy Gradient,DDPG)算法中的智能体模型等。
第一网元中可能有多个不同的决策模型,第一网元可以从多个不同的决策模型中为第一通信装置确定一个决策模型。第一网元可以基于自身的策略主动为第一通信装置确定决策模型,也可以根据第一通信装置的请求为第一通信装置确定决策模型。
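Typically, the first network element determines a decision model for the first communication apparatus when the first communication apparatus joins the network or when its neighbor-cell relationship changes. Joining the network means that the first communication apparatus joins the network for the first time or rejoins after a restart. For example, when a base station is added because network capacity is expanded, the new base station requests a decision model from the server, or the server detects the new base station and determines a decision model for it. A change in the neighbor-cell relationship means that another communication apparatus adjacent to the first communication apparatus joins or leaves the network. If such a change causes the group of the first communication apparatus to change, the first network element determines a decision model for the first communication apparatus again. It should be understood that the first network element re-determines the decision model whenever the group of the first communication apparatus changes; the embodiments of this application do not limit what causes the group to change.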
第一通信装置的决策模型可以由第一网元确定,还可以由标准、协议或设备安装手册直接硬性指定,然后第一网元向第一通信装置发送该指定的决策模型。
本实施例中,第一网元可以根据第一通信装置的组别,从多个不同的决策模型中为第一通信装置确定决策模型。相应的,第一网元需要获取第一通信装置的组别。
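The first network element can obtain the group of the first communication apparatus in different ways. In one implementation, the first communication apparatus determines its group and sends the group to the first network element. In another implementation, the first network element itself determines the group of the first communication apparatus.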
第一网元和第一通信装置确定第一通信装置的组别的方式可以相同也可以不同,例如,第一网元和第一通信装置均可以根据第一通信装置的物理小区标识、第一通信装置的邻区关系表、第一通信装置的类别或者第一通信装置的位置信息确定第一通信装置的组别。或者,第一网元根据第一通信装置的物理小区标识确定第一通信装置的组别,第一通信装置根据邻区关系表确定第一通信装置的组别。
第一通信装置的物理小区标识、第一通信装置的邻区关系表、第一通信装置的类别或者第一通信装置的位置信息可以是第一通信装置发送给第一网元的,也可以是第一网元从其他设备获取到的,本实施例不对此进行限制。
其中,物理小区标识用于区分不同小区,当第一通信装置为基站时,物理小区标识可以为该基站的标识。
第一通信装置的邻区关系表包括第一通信装置相邻的小区以及相邻的小区对应的物理小区标识。
第一网元和第一通信装置根据物理小区标识或者邻区关系表,确定第一通信装置的组别,有利于降低小区间的干扰,可以适用于与干扰管理相关的决策任务,例如,功率控制、用户调度等。
第一通信装置的类别一般可以描述该通信装置在网络中的等级、自身的硬件处理能力以及其业务类型。示例性地,若第一通信装置是网络设备,类别的分类可以是宏基站、微站、微微站、家庭基站等;若第一通信装置是终端设备,类别的分类可以是手机、平板、台式电脑、笔记本电脑等。
第一网元和第一通信装置根据第一通信装置的类别,确定第一通信装置的组别,该方法可以适用于多级异构网络以及存在多种业务类型的通信系统中。
第一通信装置的位置信息,该位置信息可以是第一通信装置的地理位置信息,比如第一通信装置的经纬度信息。
第一通信装置的位置信息可以间接的反映小区间的干扰情况,例如,位置较近的基站相互干扰较强;第一通信装置的位置信息也可以间接的反映业务类型,例如,位于城市中心商业区的通信装置的业务类型与位于工厂内的通信装置的业务类型有一定的区别。第一网元和第一通信装置根据第一通信装置的位置信息,确定第一通信装置的组别,该方法可以适用于与干扰管理相关的决策任务以及存在多种业务类型的通信系统中。
第一网元中的多个不同的决策模型可以是第一网元训练得到的,也可以是其他网元训练得到发送给第一网元的。
第一网元中的多个不同的决策模型对应多个不同的通信装置组别,以第一网元训练多个不同的决策模型为例,第一网元为每个组别训练了一个决策模型。具体的,第一网元根据网络中的通信装置的信息,对该网络中的通信装置进行分组,得到至少一个组别,每个组别包括至少一个通信装置,至少一个组别中包括第一通信装置属于的组别;第一网元获取每个组别包括的通信装置的训练样本;第一网元基于该每个组别包括的通信装置的训练样本,分别训练该每个组别对应的决策模型。
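In the embodiments of this application, the first network element can determine the decision model according to the group of the first communication apparatus, which helps reduce the complexity of training decision models. When a new communication apparatus joins, the first network element can match a suitable decision model from the already trained decision models according to the group of the new apparatus, which makes the decision models more flexible and scalable. In addition, communication apparatuses in different groups use different decision models, so apparatuses in different groups can cooperate and compete well.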
图3为本申请实施例提供的决策方法的示意性流程图。该方法可以应用于图1所示的通信系统,但本申请实施例不限于此。本实施例以第一通信装置请求第一网元确定决策模型为例进行说明,如图3所示,该方法可以包括下列步骤:
S301、第一通信装置向第一网元发送第一请求消息,该第一请求消息用于请求第一通信装置的决策模型,对应地,第一网元接收该第一请求消息。
S302、第一网元根据该第一请求消息,为第一通信装置确定决策模型。
S303、第一网元向第一通信装置发送该决策模型,对应地,第一通信装置接收该决策模型。
S304、第一通信装置根据该决策模型进行任务决策。
示例性地,在资源调度情况下,第一通信装置可以使用第一网元为其确定的决策模型,对可用的无线频谱资源进行分配。
示例性地,在功率控制情况下,第一通信装置可以使用第一网元为其确定的决策模型,对功率进行控制。
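The first request message is used to request the decision model of the first communication apparatus. It may be an existing message, or a newly defined message dedicated to requesting a decision model. The existing message may be MSG3 in the random access phase or uplink control information (UCI) sent in the uplink. When the first communication apparatus joins the network, no decision model has been configured in it yet, so it sends the first request message to the first network element to obtain a decision model.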
一种可能的实现中,该第一请求消息可以包括第一通信装置的组别,第一网元接收到第一请求消息后,根据第一通信装置的组别从多个决策模型中为第一通信装置确定决策模型。
另一种可能的实现中,第一请求消息中包括第一通信装置的物理小区标识、第一通信装置的邻区关系表、第一通信装置的类别或者第一通信装置的位置信息。第一网元接收到第一请求消息后,根据第一通信装置的物理小区标识、第一通信装置的邻区关系表、第一通信装置的类别或者第一通信装置的位置信息确定第一通信装置的组别,然后根据第一通信装置的组别从多个决策模型中为第一通信装置确定决策模型。
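The two request variants described above can be sketched as a single lookup on the first network element. This is a minimal sketch under stated assumptions, not the applicant's implementation: `ModelRequest`, `select_model`, and the dictionary `models_by_group` are hypothetical names, groups are assumed to be indexed 0 to G-1, and only the PCI remainder rule is shown for the case where the request carries no group.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelRequest:
    """Hypothetical first request message."""
    group: Optional[int] = None  # present when the apparatus determined its own group
    pci: Optional[int] = None    # physical cell identifier, used when no group is sent


def select_model(request: ModelRequest, models_by_group: Dict[int, Any]) -> Any:
    """Return the stored decision model for the requester's group."""
    if request.group is not None:
        group = request.group
    elif request.pci is not None:
        # Assumed rule: remainder of the PCI divided by the number of groups.
        group = request.pci % len(models_by_group)
    else:
        raise ValueError("request carries neither a group nor a PCI")
    return models_by_group[group]
```

For instance, with three stored models, a request carrying `group=2` and a request carrying `pci=301` would resolve to the models of groups 2 and 1 respectively.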
S305、第一通信装置基于第一通信装置的训练样本,调整该决策模型。
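Step S305 is optional; that is, after receiving the decision model, the first communication apparatus may also leave it unadjusted. In this embodiment, adjusting the decision model makes it better suited to the scenario in which the first communication apparatus operates, so that the apparatus makes better decisions based on real-time information about the scenario, which can improve resource utilization or network performance. For example, in resource scheduling, the first communication apparatus adjusts the decision model according to information such as the current channel quality and current quality-of-service requirements of the users in the cell it covers, so that it can better allocate resources such as radio spectrum based on the users' real-time information. In power control, the first communication apparatus adjusts the decision model according to information such as the current channel gain, power, and rate of the users in the cell it covers, so that it can better perform power control based on the users' real-time information.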
第一通信装置将第一通信装置采集的实时信息、采用上述决策模型根据该实时信息做出的决策动作以及获得的惩罚和奖励作为训练样本,调整上述决策模型。其中,第一通信装置采集的实时信息称为状态信息,作为上述决策模型的输入,采用上述决策模型根据该实时信息做出的决策动作称为动作信息,作为上述决策模型的输出,获得的惩罚和奖励称为收益信息,用于评判决策动作的优劣。另外,对决策模型的调整可以包括对决策模型参数的调整、结构的调整。
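A training sample as described above is a (state, action, reward) triple. The following is a minimal sketch of how such samples might be held before the model is adjusted; the class names `Sample` and `SampleBuffer` and the bounded-capacity policy are assumptions for illustration and do not come from the source.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Sample:
    state: Sequence[float]   # real-time information fed to the decision model
    action: Sequence[float]  # decision action produced by the model
    reward: float            # penalty/reward used to judge the action


class SampleBuffer:
    """Keeps only the most recent samples (assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self._items = deque(maxlen=capacity)

    def add(self, sample: Sample) -> None:
        self._items.append(sample)

    def __len__(self) -> int:
        return len(self._items)

    def latest(self, n: int) -> List[Sample]:
        """Return up to n of the newest samples, oldest first."""
        if n <= 0:
            return []
        return list(self._items)[-n:]
```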
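For example, if the decision task is power control, the state information may be the aggregate of the states of the users in the cell of the first communication apparatus. The state of a user may include, for the user and its neighboring users, the channel gain at the current time, the power at the previous time, and the rate at the previous time. To reduce the size of the feature vector, this information may be kept only for the subset of neighboring users with the largest channel gains. The action information may be the power of the users in the cell of the first communication apparatus. The reward information may be the current rates of the users in that cell and the weighted sum of those rates.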
示例性地,若决策任务是资源调度,则状态信息可以是第一通信装置所在小区的用户的状态的总和。其中,用户的状态可以包括信道状态信息(channel state information,CSI)、历史吞吐信息、基于该信道状态信息获得的估计吞吐信息以及用户数据包缓存的状态信息。用户数据包缓存的状态信息可以包括缓存中数据包的大小、缓存的剩余空间大小、缓存中数据包的等待时间以及缓存的历史丢包信息。动作信息可以是资源调度的结果,例如,某块传输资源被具体分给了哪个用户,某块传输资源可以是时频空码任意域上的资源。收益信息可以是第一通信装置所在小区的用户的当前时刻的速率、第一通信装置所在小区的用户的当前时刻的速率的加权总和、公平性、用户的数据包丢包率、用户的时延等。
在本申请实施例中,第一通信装置向第一网元发送第一请求消息,以请求第一通信装置的决策模型。第一网元接收来自第一通信装置的该第一请求消息,确定第一通信装置的组别,进而确定决策模型。此方法第一网元可以根据第一通信装置的请求,为第一通信装置确定决策模型,可以为第一通信装置确定更适合的决策模型。
图4为本申请实施例提供的决策模型的训练方法的流程图,上述实施例中提到的决策模型都可以通过本实施例的方法训练得到,如图4所示,本申请实施例提供的方法包括以下步骤:
S401、第一网元根据网络中的通信装置的信息,对该网络中的通信装置进行分组,得到至少一个组别,每个组别包括至少一个通信装置,该至少一个组别中包括第一通信装置属于的组别。
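The first network element may group the communication apparatuses in the network according to the inter-cell interference relationship of the communication apparatuses, the categories of the communication apparatuses, or the location information of the communication apparatuses, to obtain at least one group.
Specifically, regarding the inter-cell interference relationship: adjacent cells interfere strongly with each other, so the decision-making entities of adjacent cells can use different decision models, which facilitates cooperation and coordination between them. Correspondingly, cells that are far apart interfere only weakly, so their decision-making entities can use the same decision model. The first network element can therefore group the communication apparatuses according to the inter-cell interference relationship and train separate decision models for adjacent cells, reducing inter-cell interference.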
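The source does not say how the interference relationship is turned into groups. One plausible way, shown here purely as an assumption, is greedy graph coloring: cells are vertices, an edge joins two cells that interfere, and each color is a group, so interfering cells never share a decision model. The function name `group_by_interference` is hypothetical.

```python
from typing import Dict, Hashable, Iterable


def group_by_interference(
    neighbors: Dict[Hashable, Iterable[Hashable]]
) -> Dict[Hashable, int]:
    """Assign each cell the smallest group index not used by an interfering cell."""
    groups: Dict[Hashable, int] = {}
    # Visit the most constrained cells (most interferers) first.
    order = sorted(neighbors, key=lambda c: (-len(list(neighbors[c])), c))
    for cell in order:
        used = {groups[n] for n in neighbors[cell] if n in groups}
        group = 0
        while group in used:
            group += 1
        groups[cell] = group
    return groups
```

Greedy coloring does not guarantee the minimum number of groups, so a deployment that needs exactly three groups, as in the FIG. 5 example, might instead use a fixed reuse pattern.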
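Regarding the categories of the communication apparatuses in the network, the first network element may group the apparatuses by category and train a corresponding decision model for each category.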
对于网络中通信装置的位置信息,第一网元可以根据网络中通信装置的地理信息对网络中通信装置进行分组,为处于不同地理位置的通信装置训练对应的决策模型。
S402、第一网元获取每个组别包括的通信装置的训练样本。
S403、第一网元基于该每个组别包括的通信装置的训练样本,分别训练该每个组别对应的决策模型。
第一网元中可以包括每个组别对应的初始决策模型,将该每个组别对应的初始决策模型分配给每个组别包括的通信装置,该每个组别包括的通信装置采集的实时信息,使用初始决策模型根据采集的实时信息做出决策以及获得收益,其中,采集的通信装置的实时信息为通信装置的状态信息,通信装置根据实时信息做出的决策为动作信息,通信装置做出决策而获得收益为收益信息。第一网元获取该每个组别包括的通信装置的这些状态信息、动作信息以及收益信息作为训练样本。每个组别包括的通信装置的状态信息作为每个组别对应的初始模型的输入,每个组别包括的通信装置的动作信息作为每个组别对应的初始模型的输出,每个组别包括的通信装置的收益信息用于评判每个组别包括的通信装置的动作信息的优劣。
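Before training, the samples reported by the apparatuses have to be pooled per group. A minimal sketch of that bookkeeping follows; the function name `samples_by_group` and the dictionary layout are assumptions, and the training step itself is deliberately left out because the source does not fix a particular algorithm at this point.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List


def samples_by_group(
    device_groups: Dict[Hashable, int],
    device_samples: Dict[Hashable, List[Any]],
) -> Dict[int, List[Any]]:
    """Pool the (state, action, reward) samples of all devices in the same group."""
    buckets: Dict[int, List[Any]] = defaultdict(list)
    for device, samples in device_samples.items():
        buckets[device_groups[device]].extend(samples)
    return dict(buckets)
```

Each pooled list would then serve as the training set for that group's decision model.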
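The first network element can update a trained decision model so that it adapts to changes in the system. The update can be implemented in different ways. In one implementation, the first network element updates the decision model periodically, for example, once every hour: it obtains the state information, action information, and reward information of the communication apparatuses within that hour as update samples, updates the decision model, and sends the updated model to the communication apparatuses. In another implementation, the first network element monitors the performance of the communication apparatuses in real time and updates the decision model when the performance falls below a first threshold. The performance may include information such as throughput and latency. It should be understood that if the monitored performance comprises several metrics, there is correspondingly one first threshold per metric.
The update described above modifies only some of the parameters of the decision model so that the model better adapts to changes in the system. If the system changes substantially and the trained decision model can no longer make good decisions, the first network element may instead retrain the decision model.
Retraining can also be implemented in different ways. In one implementation, the first network element retrains the decision model periodically, for example, once every 7 days: it obtains the state information, action information, and reward information of the communication apparatuses within those 7 days as training samples, retrains the decision model, and sends the retrained model to the communication apparatuses. In another implementation, the first network element monitors the performance of the communication apparatuses in real time and retrains the decision model when the performance falls below a second threshold. The performance may include information such as throughput and latency. It should be understood that if the monitored performance comprises several metrics, there is correspondingly one second threshold per metric. The second threshold is less than or equal to the first threshold.
Retraining may change all the parameters of the decision model as well as its structure, so that the model adapts to changes in the system.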
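The threshold-based variant of updating and retraining can be sketched as a small decision function. This is an illustration only: the function name `maintenance_action`, the string return values, and the choice that retraining takes precedence when both thresholds are crossed are assumptions; the source specifies only that the second threshold does not exceed the first.

```python
def maintenance_action(
    performance: float, update_threshold: float, retrain_threshold: float
) -> str:
    """Decide whether to keep, update, or retrain a decision model.

    performance is a single metric where higher is better (for example throughput).
    """
    if retrain_threshold > update_threshold:
        raise ValueError("the retrain threshold must not exceed the update threshold")
    if performance < retrain_threshold:
        return "retrain"  # large degradation: all parameters and structure may change
    if performance < update_threshold:
        return "update"   # moderate degradation: adjust some parameters
    return "keep"
```

With several monitored metrics, the same check would be applied per metric with its own pair of thresholds.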
示例性地,如图5所示,在网络中存在25个小区,其中,规则六边形所在的区域代表小区的覆盖区域,应理解,用规则六边形表示小区仅仅是一个示例。
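If the first network element groups the communication apparatuses in the network according to the inter-cell interference relationship and obtains 3 groups, the groups are shown in the figure with different line patterns. Cells with the same line pattern contain communication apparatuses of the same group, and cells with different line patterns contain apparatuses of different groups.
Assume each cell contains M users. The state information may then be the aggregate of the states of the M users in a cell, where the state of a user may be the user's channel gain at the current time, power at the previous time, and rate at the previous time. To reduce the size of the feature vector, this information may be kept only for the subset of neighboring users with the largest channel gains. The action information may be the current power of the M users in a cell. The reward information may be the current rates of the 25*M users in the 25 cells, and the weighted sum of those rates divided by the number of cells.
It should be understood that the number of users per cell varies. To fix the input size of the decision model, M may be set to the maximum number of users over all cells. When the actual number of users in a cell is less than M, the input of the decision model can be zero-padded; when the actual number is greater than M, a user selection algorithm can be designed to select M users for power allocation.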
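The padding and selection step can be sketched as follows. The source leaves the user selection algorithm open, so the rule used here (keep the M users whose first feature, assumed to be the channel gain, is largest) is a placeholder assumption, as is the function name `fit_to_m_users`.

```python
from typing import List, Sequence


def fit_to_m_users(
    user_features: Sequence[Sequence[float]], m: int, feature_dim: int
) -> List[List[float]]:
    """Return exactly m feature vectors by selecting users or zero-padding."""
    for features in user_features:
        if len(features) != feature_dim:
            raise ValueError("every user must have feature_dim features")
    if len(user_features) > m:
        # Placeholder selection rule: largest first feature (assumed channel gain).
        ranked = sorted(user_features, key=lambda f: f[0], reverse=True)
        selected = [list(f) for f in ranked[:m]]
    else:
        selected = [list(f) for f in user_features]
    padding = [[0.0] * feature_dim for _ in range(m - len(selected))]
    return selected + padding
```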
应理解,每个组别对应的决策模型的参数、结构可以均不相同。
本申请实施例的决策模型的训练方法,只需要训练每个组别对应的决策模型即可,不需要为每个通信装置训练一个决策模型,有利于降低训练模型的复杂程度,并且当新的通信装置加入时,第一网元可以根据新的通信装置的组别从已训练得到的学习模型中匹配合适的决策模型,使得决策模型的灵活性以及可扩展性更高。
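FIG. 6 is a schematic flowchart of a decision-making method according to an embodiment of this application. This embodiment is described using an example in which the decision task is power control, the first communication apparatus is the base station 101, and the first network element is a server. As shown in FIG. 6, the method may include the following steps: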
S601、服务器根据基站的组别训练得到每个组别对应的决策模型。
服务器可以根据小区间的干扰关系对基站进行分组,并为每个组别分别训练一个决策模型。该每个组别对应的决策模型的具体训练过程可以参照图4所示的方法,这里不再赘述。
S602、基站获取基站的物理小区标识。
S603、基站根据该基站的物理小区标识,确定基站的组别。
S604、基站向服务器发送第一请求消息,该第一请求消息包括该基站的组别,对应地,服务器接收该第一请求消息。
S605、服务器根据该第一请求消息,为基站确定决策模型。
S606、服务器向基站发送该决策模型,对应地,基站接收该决策模型。
S607、基站根据该决策模型进行任务决策。
具体而言,基站可以通过邻区发现、与相邻小区的基站协商,获取基站的物理小区标识。
基站可以通过如下方式确定基站的组别:一种方式中,基站对基站的物理小区标识与组别总数做取余运算,得到余数。基站可以根据该余数,确定基站的组别。
示例性地,组别总数为3,并且基站对基站的物理小区标识与3做取余运算,得到1,则该基站的组别为1。
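The remainder rule in this example can be written in a few lines of Python. This is a minimal sketch, not the applicant's implementation: the function name `group_from_pci` and the sample PCI value 301 are hypothetical, chosen only so that the remainder matches the example above.

```python
def group_from_pci(pci: int, num_groups: int) -> int:
    """Return the group index as the remainder of the PCI divided by the group count."""
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    return pci % num_groups


# A PCI of 301 with 3 groups leaves remainder 1, so the base station is in group 1.
example_group = group_from_pci(301, 3)
```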
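In another manner, a correspondence between physical cell identifiers and base station groups can be predefined. After obtaining its physical cell identifier, the base station looks up the correspondence and obtains the group corresponding to that identifier. The correspondence may be configured for the base station by the server or by a core network device.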
基站确定组别之后,向服务器发送第一请求消息,该第一请求消息包括该基站的组别信息。服务器接收该第一请求消息,并根据该第一请求消息,为基站确定组别信息对应的决策模型。
应理解,当邻区关系改变时,基站可以通过邻区发现、与相邻小区的基站协商,重新确定基站的物理小区标识。其中,邻区关系变化可以包括其它基站加入网络或退出网络。基站会根据更新的物理小区标识,重新确定基站的组别。基站向服务器发送第一请求消息,该第一请求消息包括该重新确定的基站的组别,对应地,服务器根据该第一请求消息,为基站重新确定该基站的组别对应的决策模型。
上述S601,还可以包括,服务器根据基站的类别或者基站的位置信息对基站进行分组,并为每个组别分别训练一个决策模型。
在本申请实施例中,基站根据物理小区标识确定基站的组别,并将组别信息发送给服务器用于确定决策模型。在本申请其他实施例中,基站也可以将物理小区标识携带在第一请求消息中发送给服务器,由服务器根据基站的物理小区标识确定基站的组别,其中,服务器可以采用与基站相同的方式确定基站的组别,这里不再赘述。
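FIG. 7 is a schematic flowchart of a decision-making method according to an embodiment of this application. This embodiment is described using an example in which the decision task is power control, the first communication apparatus is the base station 101, and the first network element is a server. As shown in FIG. 7, the method may include the following steps: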
S701、服务器根据基站的组别训练得到每个组别对应的决策模型。
服务器可以根据小区间的干扰关系对基站进行分组,并为每个组别分别训练一个决策模型。该每个组别对应的决策模型的具体训练过程可以参照图4所示的方法,这里不再赘述。
S702、基站获取基站的邻区关系表。
S703、基站根据该基站的邻区关系表,确定基站的组别。
S704、基站向服务器发送第一请求消息,该第一请求消息包括该基站的组别,对应地,服务器接收该第一请求消息。
S705、服务器根据该第一请求消息,为基站确定决策模型。
S706、服务器向基站发送该决策模型,对应地,基站接收该决策模型。
S707、基站根据该决策模型进行任务决策。
具体而言,基站可以通过邻区发现、与相邻小区的基站协商,获取基站的邻区关系表。
基站可以通过如下方式确定基站的组别:一种方式中,基站对基站的邻区关系表中的每个物理小区标识与组别总数做取余运算,得到每个物理小区标识对应的余数,然后根据该每个物理小区标识对应的不同余数的个数,确定基站的组别。
示例性地,组别总数为3,基站有10个邻区,则基站可以分别对10个邻区的物理小区标识与3做取余运算,若得到5个0、4个1以及1个2,则该基站的组别为2。
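One reading of this example is that the base station joins the group whose remainder is least common among its neighbors, so that it shares a model with as few adjacent cells as possible. The source states only the outcome (group 2), so the rule below, including its tie-break toward the lowest group index, is an assumption; `group_from_neighbors` and the sample PCIs are hypothetical.

```python
from collections import Counter
from typing import Iterable


def group_from_neighbors(neighbor_pcis: Iterable[int], num_groups: int) -> int:
    """Pick the group whose remainder is least common among the neighbors' PCIs."""
    counts = Counter(pci % num_groups for pci in neighbor_pcis)
    # Groups absent from the neighborhood count as zero; ties go to the lowest index.
    return min(range(num_groups), key=lambda g: (counts.get(g, 0), g))


# Ten neighbors whose PCIs give five 0s, four 1s, and one 2 as remainders.
neighbor_pcis = [0, 3, 6, 9, 12, 1, 4, 7, 10, 2]
chosen = group_from_neighbors(neighbor_pcis, 3)
```

Under this assumed rule, `chosen` is 2 for the ten sample PCIs, which agrees with the example above.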
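In another manner, a correspondence between physical cell identifiers and base station groups can be predefined. After obtaining the neighbor relation table, the base station looks up the correspondence for each physical cell identifier in the table to obtain the group corresponding to each identifier, and then determines its own group according to the counts of the different groups corresponding to those identifiers. The correspondence may be configured for the base station by the server or by a core network device.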
基站确定组别之后,基站向服务器发送第一请求消息,该第一请求消息包括该基站的组别信息。服务器接收该第一请求消息,并根据该第一请求消息,为基站确定组别信息对应的决策模型。
应理解,当邻区关系改变时,基站可以通过邻区发现、与相邻小区的基站协商,重新确定基站的邻区关系表。其中,邻区关系变化可以包括其它基站加入网络或退出网络。基站会根据更新的邻区关系表,重新确定基站的组别。基站向服务器发送第一请求消息,该第一请求消息包括该重新确定的基站的组别,对应地,服务器根据该第一请求消息,为基站重新确定该基站的组别对应的决策模型。
上述S701,还可以包括,服务器根据基站的类别或者基站的位置信息对基站进行分组,并为每个组别分别训练一个决策模型。
在本申请实施例中,基站根据邻区关系表确定基站的组别,并将组别信息发送给服务器用于确定决策模型。在本申请其他实施例中,基站也可以将邻区关系表携带在第一请求消息中发送给服务器,由服务器根据基站的邻区关系表确定基站的组别,其中,服务器可以采用与基站相同的方式确定基站的组别,这里不再赘述。
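The decision-making method provided in the embodiments of this application can be evaluated by simulation. For example, the decision task is power control and the decision-making entities are base stations. The simulation environment contains N = 25 cells arranged in 5 rows and 5 columns. Each cell has K = 4 users, randomly distributed within the range [R_min, R_max], where R_min may be 0.01 km and R_max may be 1 km, so the 25 cells contain M = 100 users in total. The interfering-cell distance is set to L = 1 km, that is, each cell has |I| = 6 adjacent interfering cells. The number of channels is N_c = 16, the Doppler frequency is f_d = 10 Hz, the time interval is T_s = 20 ms, and the maximum power is P_max = 38 dBm.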
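For reference, the simulation parameters listed above can be collected in one configuration object. This is a sketch for bookkeeping only; the class and field names are hypothetical, and the derived properties merely restate arithmetic implied by the text (25 cells times 4 users, and the standard dBm-to-watt conversion).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    rows: int = 5                 # cell grid rows
    cols: int = 5                 # cell grid columns
    users_per_cell: int = 4       # K
    r_min_km: float = 0.01        # R_min
    r_max_km: float = 1.0         # R_max
    interferer_distance_km: float = 1.0  # L
    num_channels: int = 16        # N_c
    doppler_hz: float = 10.0      # f_d
    slot_s: float = 0.020         # T_s
    p_max_dbm: float = 38.0       # P_max

    @property
    def num_cells(self) -> int:
        return self.rows * self.cols

    @property
    def total_users(self) -> int:
        return self.num_cells * self.users_per_cell

    @property
    def p_max_watts(self) -> float:
        # 0 dBm is 1 mW, so subtract 30 dB to express the power in watts.
        return 10 ** ((self.p_max_dbm - 30.0) / 10.0)
```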
本申请实施例提供的决策方法,根据小区间的干扰关系可以将25个蜂窝分为3组,每组对应的决策模型分别训练5次,每组共得到5个决策模型,应理解,每组中5个决策模型的参数和结构可以均不相同。仿真环境中的决策模型采用的是基于DDPG的多智能体深度强化学习(multi‐agent deep deterministic policy gradient,MADDPG)。
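Five comparison schemes can be set up in the simulation: the shared-parameter multi-agent scheme, the fractional programming (FP) scheme, the weighted minimum mean square error (WMMSE) scheme, the MAX scheme, and the RANDOM scheme. In the MAX scheme, the base station always selects the maximum transmit power. In the RANDOM scheme, the base station each time transmits at a randomly selected value below the maximum allowed transmit power. The shared-parameter multi-agent scheme and the FP, WMMSE, MAX, and RANDOM algorithms among the comparison schemes may be used for testing directly without training.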
在测试5次的过程中,每组中5个决策模型均只测试一次,对比方案中的算法需要分别测试5次。每个算法均可以得到关于25个蜂窝的测试结果。
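For the test results of the above 6 algorithms, the simulation can compute, over the 25 cells, the average of the mean weighted sum rate (its symbol appears only as an image in the source, Figure PCTCN2021128808-appb-000001), the best mean weighted sum rate r_B, and the worst mean weighted sum rate r_W, yielding Table 1. The weight of each cell is 1; it should be understood that different weights may also be set for the cells. Table 1 compares the simulation results provided in the embodiments of this application.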
Table 1
[Table 1 is available only as images in the source: Figure PCTCN2021128808-appb-000002 and Figure PCTCN2021128808-appb-000003.]
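As Table 1 shows, the values of the average mean weighted sum rate (the symbol shown in Figure PCTCN2021128808-appb-000004), r_B, and r_W for the decision-making method provided in the embodiments of this application are all greater than the corresponding values for the comparison algorithms. This further confirms that grouping the communication apparatuses in the network and training a decision model for each group solves the decision problem better and lets the decision-making entities make better decisions.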
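The three summary statistics compared in Table 1 can be computed along the following lines. The exact definitions are not spelled out in the text, so this sketch assumes that each test run yields one weighted sum rate per cell, that a run is summarized by the weighted sum divided by the number of cells, and that the average, best, and worst are then taken across runs. The function name `summarize_runs` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def summarize_runs(
    runs: Sequence[Sequence[float]], weights: Optional[Sequence[float]] = None
) -> Tuple[float, float, float]:
    """Return (average, best, worst) of the per-run mean weighted sum rate."""
    per_run: List[float] = []
    for rates in runs:
        w = weights if weights is not None else [1.0] * len(rates)
        if len(w) != len(rates):
            raise ValueError("one weight per cell is required")
        per_run.append(sum(wi * ri for wi, ri in zip(w, rates)) / len(rates))
    return sum(per_run) / len(per_run), max(per_run), min(per_run)
```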
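The methods in the embodiments of this application are described in detail above with reference to FIG. 1 to FIG. 7. The apparatuses in the embodiments of this application are described in detail below with reference to FIG. 8 and FIG. 9.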
图8示出了本申请实施例提供的一种决策装置。该装置可以包括:收发单元、处理单元。
在一种可能的实现方式中,该装置用于执行上述方法实施例中第一通信装置对应的各个流程和步骤。
该收发单元用于:接收来自第一网元的决策模型,该决策模型是基于该装置的组别确定的。
该处理单元用于:根据该决策模型进行任务决策。
可选地,收发单元还用于:向第一网元发送第一请求消息,该第一请求消息用于请求该装置的决策模型。
可选地,上述第一请求消息包括该装置的组别。
可选地,上述第一请求消息包括以下信息中一个或者多个:该装置的物理小区标识、该装置的邻区关系表、该装置的类别、该装置的位置信息。
可选地,处理单元还用于:根据该装置的物理小区标识或者该装置的邻区关系表或者该装置的类别或者该装置的位置信息,确定该装置的组别。
可选地,处理单元还用于:对该装置的物理小区标识与组别总数做取余运算,得到余数;根据该余数,确定该装置的组别。
可选地,处理单元还用于:对该装置的邻区关系表中的每个物理小区标识与组别总数做取余运算,得到该每个物理小区标识对应的余数;根据该每个物理小区标识对应的不同余数的个数,确定该装置的组别。
可选地,处理单元还用于:基于该装置的训练样本,调整决策模型,该装置的训练样本包括该决策模型的状态信息、动作信息和收益信息。
可选地,上述装置的分组依据为小区间的干扰关系、该装置的类别或者该装置的位置信息。
可选地,上述决策模型是第一网元从多个决策模型中确定的,该多个决策模型对应通信装置的多个组别,每个组别对应的决策模型是基于该每个组别包括的通信装置的训练样本训练得到的。
在另一种可能的实现方式中,该装置用于执行上述方法实施例中第一网元对应的各个流程和步骤。
该处理单元用于:为第一通信装置确定决策模型,该决策模型是基于第一通信装置的组别确定的。
该收发单元用于:向第一通信装置发送该决策模型。
可选地,收发单元还用于:接收来自第一通信装置的第一请求消息,该第一请求消息用于请求第一通信装置的该决策模型;
处理单元还用于:根据该第一请求消息,为第一通信装置确定该决策模型。
可选地,上述第一请求消息包括第一通信装置的组别。
可选地,上述第一请求消息包括以下信息中一个或者多个:第一通信装置的物理小区标识、第一通信装置的邻区关系表、第一通信装置的类别、第一通信装置的位置信息。
可选地,处理单元还用于:对第一通信装置的物理小区标识与组别总数做取余运算,得到余数;根据该余数,确定第一通信装置的组别;根据该第一通信装置的组别,为第一通信装置确定该决策模型。
可选地,处理单元还用于:对第一通信装置的邻区关系表中的每个物理小区标识与组别总数做取余运算,得到每个物理小区标识对应的余数;根据该每个物理小区标识对应的不同余数的个数,确定第一通信装置的组别;根据该第一通信装置的组别,为第一通信装置确定决策模型。
可选地,处理单元还用于:从多个决策模型中确定该决策模型。
可选地,处理单元还用于:根据网络中的通信装置的信息,对该网络中的通信装置进行分组,得到至少一个组别,每个组别包括至少一个通信装置,该至少一个组别中包括第一通信装置属于的组别;获取每个组别包括的通信装置的训练样本;基于该每个组别包括的通信装置的训练样本,分别训练每个组别对应的决策模型。
可选地,处理单元还用于:根据网络中通信装置的小区间的干扰关系、网络中通信装置的类别或者网络中通信装置的位置信息,对网络中的通信装置进行分组,得到至少一个组别。
可选地,上述多个决策模型对应通信装置的多个组别,每个组别对应的决策模型是基于该每个组别包括的通信装置的训练样本训练得到的。
可选地,收发单元还用于:接收来自第二网元的一个或多个决策模型,该一个或多个决策模型对应通信装置的一个或多个组别,每个组别对应的决策模型是基于该每个组别包括的通信装置的训练样本训练得到的。
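It should be understood that the apparatus here is embodied in the form of functional units. The term "unit" here may refer to an application specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared, dedicated, or group processor) and memory for executing one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functions. In an optional example, a person skilled in the art can understand that the apparatus may specifically be the first communication apparatus or the first network element in the foregoing embodiments, and may be used to perform the processes and/or steps corresponding to the first communication apparatus or the first network element in the foregoing method embodiments. To avoid repetition, details are not described here again.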
上述各个方案的装置具有实现上述方法中第一通信装置或第一网元执行的相应步骤的功能;上述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。例如,上述收发单元可以包括发送单元和接收单元,该发送单元可以用于实现上述收发单元对应的用于执行发送动作的各个步骤和/或流程,该接收单元可以用于实现上述收发单元对应的用于执行接收动作的各个步骤和/或流程。该发送单元可以由发射器替代,该接收单元可以由接收器替代,分别执行各个方法实施例中的收发操作以及相关的处理操作。
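In the embodiments of this application, the apparatus in FIG. 8 may also be a chip or a chip system, for example, a system on chip (SoC). Correspondingly, the transceiver unit may be a transceiver circuit of the chip. This is not limited here.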
图9示出了本申请实施例提供的另一种决策装置。该装置包括处理器、收发器和存储器。其中,处理器、收发器和存储器通过内部连接通路互相通信,该存储器用于存储指令,该处理器用于执行该存储器存储的指令,以控制该收发器发送信号和/或接收信号。
在一种可能的实现方式中,该装置用于执行上述方法中第一通信装置对应的各个流程和步骤。
其中,该收发器用于:接收来自第一网元的决策模型,该决策模型是基于所述装置的组别确定的。
该处理器用于:根据该决策模型进行任务决策。
在另一种可能的实现方式中,该装置用于执行上述方法中第一网元对应的各个流程和步骤。
其中,该处理器用于:为第一通信装置确定决策模型,该决策模型是基于第一通信装置的组别确定的。
该收发器用于:向第一通信装置发送该决策模型。
应理解,该装置可以具体为上述实施例中的第一通信装置或第一网元,并且可以用于执行上述方法实施例中与第一通信装置或第一网元对应的各个步骤和/或流程。可选地,该存储器可以包括只读存储器和随机存取存储器,并向处理器提供指令和数据。存储器的一部分还可以包括非易失性随机存取存储器。例如,存储器还可以存储设备类型的信息。该处理器可以用于执行存储器中存储的指令,并且当该处理器执行存储器中存储的指令时,该处理器用于执行上述与该第一通信装置或第一网元对应的方法实施例的各个步骤和/或流程。该收发器可以包括发射器和接收器,该发射器可以用于实现上述收发器对应的用于执行发送动作的各个步骤和/或流程,该接收器可以用于实现上述收发器对应的用于执行接收动作的各个步骤和/或流程。
应理解,在本申请实施例中,上述装置的处理器可以是中央处理单元(central processing unit,CPU),该处理器还可以是其他通用处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
在实现过程中,上述方法的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件单元组合执行完成。软件单元可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器执行存储器中的指令,结合其硬件完成上述方法的步骤。为避免重复,这里不再详细描述。
本申请还提供了一种第一通信装置,包括输入输出接口和逻辑电路,该输入输出接口用于接收第一网元的决策模型,该逻辑电路用于根据该决策模型以及上述实施例中的方法进行任务决策。
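This application further provides a first network element, including an input/output interface and a logic circuit. The logic circuit is configured to determine a decision model for a first communication apparatus according to the methods in the foregoing embodiments, and the input/output interface is configured to send the decision model.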
本申请提供一种可读计算机存储介质,该可读计算机存储介质用于存储计算机程序,该计算机程序用于实现上述实施例中各种可能的实现方式所示的第一通信装置对应的方法。
本申请提供另一种可读计算机存储介质,该可读计算机存储介质用于存储计算机程序,该计算机程序用于实现上述实施例中各种可能的实现方式所示的第一网元对应的方法。
本申请提供一种计算机程序产品,该计算机程序产品包括计算机程序(也可以称为代码,或指令),当该计算机程序在计算机上运行时,该计算机可以执行上述实施例所示的第一通信装置对应的方法。
本申请提供另一种计算机程序产品,该计算机程序产品包括计算机程序(也可以称为代码,或指令),当该计算机程序在计算机上运行时,该计算机可以执行上述实施例中各种可能的实现方式所示的第一网元对应的方法。
本申请提供一种芯片系统,该芯片系统用于支持上述第一通信装置实现本申请实施例所示的功能。
本申请提供另一种芯片系统,该芯片系统用于支持上述第一网元实现本申请实施例所示的功能。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
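If the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.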
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (50)

  1. 一种决策方法,其特征在于,包括:
    第一通信装置接收来自第一网元的决策模型,所述决策模型是基于所述第一通信装置的组别确定的;
    所述第一通信装置根据所述决策模型进行任务决策。
  2. 根据权利要求1所述的方法,其特征在于,所述第一通信装置接收来自第一网元的决策模型之前,所述方法还包括:
    所述第一通信装置向所述第一网元发送第一请求消息,所述第一请求消息用于请求所述第一通信装置的决策模型。
  3. 根据权利要求2所述的方法,其特征在于,所述第一请求消息包括所述第一通信装置的组别。
  4. 根据权利要求2所述的方法,其特征在于,所述第一请求消息包括以下信息中一个或者多个:
    所述第一通信装置的物理小区标识、所述第一通信装置的邻区关系表、所述第一通信装置的类别、所述第一通信装置的位置信息。
  5. 根据权利要求3所述的方法,其特征在于,所述第一通信装置向所述第一网元发送第一请求消息之前,所述方法还包括:
    所述第一通信装置根据所述第一通信装置的物理小区标识或者所述第一通信装置的邻区关系表或者所述第一通信装置的类别或者所述第一通信装置的位置信息,确定所述第一通信装置的组别。
  6. 根据权利要求5所述的方法,其特征在于,所述第一通信装置根据所述第一通信装置的物理小区标识,确定所述第一通信装置的组别,包括:
    所述第一通信装置对所述第一通信装置的物理小区标识与组别总数做取余运算,得到余数;
    所述第一通信装置根据所述余数,确定所述第一通信装置的组别。
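  7. The method according to claim 5, wherein the determining, by the first communication apparatus, of the group of the first communication apparatus according to the neighbor relation table of the first communication apparatus comprises: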
    所述第一通信装置对所述第一通信装置的邻区关系表中的每个物理小区标识与组别总数做取余运算,得到余数;
    所述第一通信装置根据所述每个物理小区标识对应的不同余数的个数,确定所述第一通信装置的组别。
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一通信装置基于所述第一通信装置的训练样本,调整所述决策模型,所述第一通信装置的训练样本包括所述决策模型的状态信息、动作信息和收益信息。
  9. 根据权利要求1至8中任一项所述的方法,其特征在于,所述第一通信装置的分组依据为小区间的干扰关系、所述第一通信装置的类别或者所述第一通信装置的位置信息。
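  10. The method according to any one of claims 1 to 9, wherein the decision model is determined by the first network element from a plurality of decision models, the plurality of decision models correspond to a plurality of groups of communication apparatuses, and the decision model corresponding to each group is trained based on training samples of the communication apparatuses included in that group.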
  11. 一种决策方法,其特征在于,包括:
    第一网元为第一通信装置确定决策模型,所述决策模型是基于所述第一通信装置的组别确定的;
    所述第一网元向所述第一通信装置发送所述决策模型。
  12. 根据权利要求11所述的方法,其特征在于,所述第一网元为第一通信装置确定决策模型之前,所述方法还包括:
    所述第一网元接收来自所述第一通信装置的第一请求消息,所述第一请求消息用于请求所述第一通信装置的决策模型;
    所述第一网元为第一通信装置确定决策模型,包括:
    所述第一网元根据所述第一请求消息,为所述第一通信装置确定所述决策模型。
  13. 根据权利要求12所述的方法,其特征在于,所述第一请求消息包括所述第一通信装置的组别。
  14. 根据权利要求12所述的方法,其特征在于,所述第一请求消息包括以下信息中一个或者多个:
    所述第一通信装置的物理小区标识、所述第一通信装置的邻区关系表、所述第一通信装置的类别、所述第一通信装置的位置信息。
  15. 根据权利要求14所述的方法,其特征在于,所述第一网元根据所述第一请求消息,为所述第一通信装置确定决策模型,包括:
    所述第一网元对所述第一通信装置的物理小区标识与组别总数做取余运算,得到余数;
    所述第一网元根据所述余数,确定所述第一通信装置的组别;
    所述第一网元根据所述第一通信装置的组别,为所述第一通信装置确定所述决策模型。
  16. 根据权利要求14所述的方法,其特征在于,所述第一网元根据所述第一请求消息,为所述第一通信装置确定决策模型,包括:
    所述第一网元对所述第一通信装置的邻区关系表中的每个物理小区标识与组别总数做取余运算,得到所述每个物理小区标识对应的余数;
    所述第一网元根据所述每个物理小区标识对应的不同余数的个数,确定所述第一通信装置的组别;
    所述第一网元根据所述第一通信装置的组别,为所述第一通信装置确定所述决策模型。
  17. 根据权利要求11至16中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一网元从多个决策模型中确定所述决策模型,所述多个决策模型对应通信装置的多个组别,每个所述组别对应的决策模型是基于每个所述组别包括的通信装置的训练样本训练得到的。
  18. 根据权利要求11至17中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一网元根据网络中的通信装置的信息,对所述网络中的通信装置进行分组,得到至少一个组别,每个组别包括至少一个通信装置,所述至少一个组别中包括所述第一通信装置属于的组别;
    所述第一网元获取每个组别包括的通信装置的训练样本;
    所述第一网元基于所述每个组别包括的通信装置的训练样本,分别训练所述每个组别对应的决策模型。
  19. 根据权利要求18所述的方法,其特征在于,所述第一网元根据网络中的通信装置的信息,对所述网络中的通信装置进行分组,得到至少一个组别,包括:
    所述第一网元根据所述网络中通信装置的小区间的干扰关系、所述网络中通信装置的类别或者所述网络中通信装置的位置信息,对所述网络中的通信装置进行分组,得到至少一个组别。
  20. 根据权利要求11至19中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一网元接收来自第二网元的一个或多个决策模型,所述一个或多个决策模型对应通信装置的一个或多个组别,每个所述组别对应的决策模型是基于每个所述组别包括的通信装置的训练样本训练得到的。
  21. 一种决策装置,其特征在于,包括:
    收发单元,用于接收来自第一网元的决策模型,所述决策模型是基于所述装置的组别确定的;
    处理单元,用于根据所述决策模型进行任务决策。
  22. 根据权利要求21所述的装置,其特征在于,所述收发单元用于:
    向所述第一网元发送第一请求消息,所述第一请求消息用于请求所述装置的决策模型。
  23. 根据权利要求22所述的装置,其特征在于,所述第一请求消息包括所述装置的组别。
  24. 根据权利要求22所述的装置,其特征在于,所述第一请求消息包括以下信息中一个或者多个:
    所述装置的物理小区标识、所述装置的邻区关系表、所述装置的类别、所述装置的位置信息。
  25. 根据权利要求23所述的装置,其特征在于,所述处理单元用于:
    根据所述装置的物理小区标识或者所述装置的邻区关系表或者所述装置的类别或者所述装置的位置信息,确定所述装置的组别。
  26. 根据权利要求25所述的装置,其特征在于,所述处理单元用于:
    对所述装置的物理小区标识与组别总数做取余运算,得到余数;
    根据所述余数,确定所述装置的组别。
  27. 根据权利要求25所述的装置,其特征在于,所述处理单元用于:
    对所述装置的邻区关系表中的每个物理小区标识与组别总数做取余运算,得到每个物理小区标识对应的余数;
    根据所述每个物理小区标识对应的不同余数的个数,确定所述装置的组别。
  28. 根据权利要求21至27中任一项所述的装置,其特征在于,所述处理单元用于:
    基于所述装置的训练样本,调整所述决策模型,所述装置的训练样本包括所述决策模型的状态信息、动作信息和收益信息。
  29. 根据权利要求21至28中任一项所述的装置,其特征在于,所述装置的分组依据为小区间的干扰关系、所述装置的类别或者所述装置的位置信息。
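  30. The apparatus according to any one of claims 21 to 29, wherein the decision model is determined by the first network element from a plurality of decision models, the plurality of decision models correspond to a plurality of groups of communication apparatuses, and the decision model corresponding to each group is trained based on training samples of the communication apparatuses included in that group.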
  31. 一种决策装置,其特征在于,包括:
    处理单元,用于为第一通信装置确定决策模型,所述决策模型是基于所述第一通信装置的组别确定的;
    收发单元,用于向所述第一通信装置发送所述决策模型。
  32. 根据权利要求31所述的装置,其特征在于,所述收发单元用于:
    接收来自所述第一通信装置的第一请求消息,所述第一请求消息用于请求所述第一通信装置的决策模型;
    所述处理单元用于:
    根据所述第一请求消息,为所述第一通信装置确定所述决策模型。
  33. 根据权利要求32所述的装置,其特征在于,所述第一请求消息包括所述第一通信装置的组别。
  34. 根据权利要求32所述的装置,其特征在于,所述第一请求消息包括以下信息中一个或者多个:
    所述第一通信装置的物理小区标识、所述第一通信装置的邻区关系表、所述第一通信装置的类别、所述第一通信装置的位置信息。
  35. 根据权利要求34所述的装置,其特征在于,所述处理单元用于:
    对所述第一通信装置的物理小区标识与组别总数做取余运算,得到余数;
    根据所述余数,确定所述第一通信装置的组别;
    根据所述第一通信装置的组别,为所述第一通信装置确定所述决策模型。
  36. 根据权利要求34所述的装置,其特征在于,所述处理单元用于:
    对所述第一通信装置的邻区关系表中的每个物理小区标识与组别总数做取余运算,得到所述每个物理小区标识对应的余数;
    根据所述每个物理小区标识对应的不同余数的个数,确定所述第一通信装置的组别;
    根据所述第一通信装置的组别,为所述第一通信装置确定所述决策模型。
  37. 根据权利要求31至36中任一项所述的装置,其特征在于,所述处理单元用于:
    从多个决策模型中确定所述决策模型,所述多个决策模型对应通信装置的多个组别,每个所述组别对应的决策模型是基于每个所述组别包括的通信装置的训练样本训练得到的。
  38. 根据权利要求31至37中任一项所述的装置,其特征在于,所述处理单元用于:
    根据网络中的通信装置的信息,对所述网络中的通信装置进行分组,得到至少一个组别,每个组别包括至少一个通信装置,所述至少一个组别中包括所述第一通信装置属于的组别;
    获取每个组别包括的通信装置的训练样本;
    基于所述每个组别包括的通信装置的训练样本,分别训练所述每个组别对应的决策模型。
  39. 根据权利要求38所述的装置,其特征在于,所述处理单元用于:
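    group the communication apparatuses in the network according to the inter-cell interference relationship of the communication apparatuses in the network, the categories of the communication apparatuses in the network, or the location information of the communication apparatuses in the network, to obtain at least one group.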
  40. 根据权利要求31至39中任一项所述的装置,其特征在于,所述收发单元用于:
    接收来自第二网元的一个或多个决策模型,所述一个或多个决策模型对应通信装置的一个或多个组别,每个所述组别对应的决策模型是基于每个所述组别包括的通信装置的训练样本训练得到的。
  41. 一种第一通信装置,其特征在于,包括:处理器和收发器,所述收发器用于和其它装置通信,所述处理器与存储器耦合,所述存储器用于存储计算机程序,当所述处理器调用所述计算机程序时,使得所述装置执行权利要求1至10中任一项所述的方法。
  42. 一种第一网元,其特征在于,包括:处理器和收发器,所述收发器用于和其它装置通信,所述处理器与存储器耦合,所述存储器用于存储计算机程序,当所述处理器调用所述计算机程序时,使得所述网元执行权利要求11至20中任一项所述的方法。
  43. 一种芯片系统,其特征在于,包括:处理器,用于从存储器中调用并运行计算机程序,使得安装有所述芯片系统的通信设备执行权利要求1至20中任一项所述的方法。
  44. 一种第一通信装置,其特征在于,包括:输入输出接口和逻辑电路,所述输入输出接口用于接收第一网元的决策模型,所述逻辑电路用于根据所述决策模型进行任务决策,使得所述通信装置执行权利要求1至10中任一项所述的方法。
  45. 一种第一网元,其特征在于,包括:输入输出接口和逻辑电路,所述逻辑电路用于为第一通信装置确定决策模型,所述输入输出接口用于发送所述决策模型,使得所述网元执行权利要求11至20中任一项所述的方法。
  46. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,当所述计算机程序在计算机上运行时,使得权利要求1至10中任一项所述的方法或者权利要求11至20中任一项所述的方法被执行。
  47. 一种计算机程序产品,其特征在于,所述计算机程序产品包括指令,当所述指令被执行时,使得权利要求1至10中任一项所述的方法或者权利要求11至20中任一项所述的方法被执行。
  48. 一种计算机程序,当其在计算机上运行时,使得权利要求1至10中任一项所述的方法或者权利要求11至20中任一项所述的方法被执行。
  49. 一种通信系统,其特征在于,包括:权利要求21至30任一项所述的决策装置和权利要求31至40任一项所述的决策装置。
  50. 一种通信系统,其特征在于,包括权利要求41所述的第一通信装置和权利要求42所述的第一网元。
PCT/CN2021/128808 2020-11-11 2021-11-04 决策方法和决策装置 WO2022100514A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011255202.2A CN114501484A (zh) 2020-11-11 2020-11-11 决策方法和决策装置
CN202011255202.2 2020-11-11

Publications (1)

Publication Number Publication Date
WO2022100514A1 true WO2022100514A1 (zh) 2022-05-19

Family

ID=81491021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128808 WO2022100514A1 (zh) 2020-11-11 2021-11-04 决策方法和决策装置

Country Status (2)

Country Link
CN (1) CN114501484A (zh)
WO (1) WO2022100514A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103188761A (zh) * 2011-12-29 2013-07-03 中兴通讯股份有限公司 多网接入方法及装置
KR101291123B1 (ko) * 2012-12-14 2013-08-01 주식회사 안랩 휴대 단말의 어플리케이션 관리 제어 방법 및 장치와 그 방법을 실행하기 위한 프로그램이 기록된 기록 매체
EP2903332A1 (en) * 2014-01-31 2015-08-05 Alcatel Lucent Load balancing of data flows

Also Published As

Publication number Publication date
CN114501484A (zh) 2022-05-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21891040

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21891040

Country of ref document: EP

Kind code of ref document: A1