WO2022199032A1 - Model construction method, task assignment method, apparatus, device and medium - Google Patents

Model construction method, task assignment method, apparatus, device and medium

Info

Publication number
WO2022199032A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
state
drone
mobile terminal
task
Prior art date
Application number
PCT/CN2021/128250
Other languages
English (en)
French (fr)
Inventor
任涛
胡哲源
谷宁波
牛建伟
杜东峰
豆渊博
李青锋
Original Assignee
北京航空航天大学杭州创新研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京航空航天大学杭州创新研究院
Publication of WO2022199032A1 publication Critical patent/WO2022199032A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/15Vehicle, aircraft or watercraft design
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44594Unloading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/12Timing analysis or timing optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/509Offload

Definitions

  • the present application relates to the field of data processing, and in particular, to a model construction method, a task assignment method, an apparatus, a device and a medium.
  • UAVs Unmanned Aerial Vehicles
  • MEC Mobile Edge Computing
  • In task scheduling, assigning a given computing task to either a drone or a mobile terminal is hereinafter referred to as task offloading.
  • Recently, reinforcement-learning-based methods have emerged to realize scheduling strategies for UAV-assisted mobile edge computing in dynamic scenarios.
  • The inventors found that, as the number of drones and mobile terminals increases, the state space and action space of a system using a reinforcement learning algorithm grow exponentially, which greatly reduces the convergence efficiency of the algorithm. Therefore, it is difficult to obtain an easily convergent scheduling strategy for large-scale UAV-assisted mobile edge computing networks.
  • An embodiment of the present application provides a method for building a model, which is applied to a training device, where the training device is configured with a location model to be trained and a task assignment model. The method may include: initializing the location model, the task assignment model, the state of a first drone, and the state of a first mobile terminal, where the first drone is used to provide edge computing services for the first mobile terminal; and performing the following iteration on the location model and the task assignment model until a preset iteration condition is met:
  • obtaining, according to a first state at the current moment between the first mobile terminal and the first drone, the predicted position of the first drone at the next moment through the location model, and updating the model parameters of the location model according to the predicted position;
  • determining, according to the predicted position, a second state at the current moment between the first drone and the first mobile terminal, and determining, according to the second state, the task assignment result between the first drone and the first mobile terminal at the next moment through the task assignment model;
  • updating the model parameters of the task assignment model according to the task assignment result.
  • Updating the model parameters of the location model according to the predicted position may include: updating the first state according to the predicted position; obtaining, according to the updated first state, a first reward value corresponding to the updated first state through a preset first reward strategy; and updating the model parameters of the location model according to the first reward value.
  • Obtaining the first reward value corresponding to the updated first state through the preset first reward strategy may include: obtaining the first reward value corresponding to the updated first state through the preset first reward strategy; and, when it is determined according to the updated first state that the first drone satisfies any one of the first restriction conditions, adjusting the first reward value by a preset first negative reward value, wherein the first restriction conditions may include: the movement speed of the first drone exceeds a speed threshold; the movement frequency of the first drone exceeds a frequency threshold.
  • the first state may include the position of the first mobile terminal, the remaining power of the first mobile terminal, and the remaining power of the first drone.
  • Updating the model parameters of the task assignment model according to the task assignment result may include: updating the second state according to the task assignment result; obtaining, according to the updated second state, a second reward value corresponding to the second state through a preset second reward strategy; and updating the model parameters of the location model according to the second reward value.
  • Obtaining, according to the updated second state, the second reward value corresponding to the second state through the preset second reward strategy includes: obtaining the second reward value corresponding to the second state through the preset second reward strategy; and, when it is determined according to the updated second state that the first drone and the first mobile terminal satisfy any one of the second restriction conditions, adjusting the second reward value by a preset second negative reward value.
  • The second state may include the predicted position of the first drone, the position of the first mobile terminal, the remaining power of the first drone, the remaining power of the first mobile terminal, and the computing tasks in the first mobile terminal.
  • the execution device may be configured with a pre-trained location model and a task assignment model.
  • The pre-trained location model and task assignment model are obtained by training with the above model construction method, and the task assignment method may include:
  • the predicted position of the second drone at the next moment is determined by the position model
  • a task assignment result between the second UAV and the second mobile terminal is determined through the task assignment model.
  • Determining the predicted position of the second drone at the next moment through the position model may include: at every first duration segment, determining, according to the third state, the predicted position of the second drone at the next moment through the position model, wherein the first duration segment includes a plurality of second duration segments.
  • Determining the task assignment result between the second drone and the second mobile terminal through the task assignment model may include: for each second duration segment, keeping the position of the second UAV unchanged, and determining, according to the fourth state, the task assignment result between the second UAV and the second mobile terminal through the task assignment model.
  • model construction device is applied to a training device, and the training device is configured with a location model to be trained and a task allocation model, and the model construction device may include:
  • a model initialization module may be configured to initialize the location model, the task assignment model, the state of the first drone, and the state of the first mobile terminal, wherein the first drone is used for providing edge computing services for the first mobile terminal;
  • a model training module, which can be configured to perform the following iteration on the location model and the task assignment model until a preset iteration condition is met: obtaining, according to a first state at the current moment between the first mobile terminal and the first drone, the predicted position of the first drone at the next moment through the location model, and updating the model parameters of the location model according to the predicted position;
  • determining, according to the predicted position, a second state at the current moment between the first drone and the first mobile terminal, and determining, according to the second state, the task assignment result between the first drone and the first mobile terminal at the next moment through the task assignment model;
  • updating the model parameters of the task assignment model according to the task assignment result.
  • Yet another embodiment of the present application provides a task scheduling apparatus, which is applied to an execution device, where the execution device is configured with a pre-trained location model and a task assignment model obtained by training with the above model construction apparatus, and the task scheduling apparatus may include:
  • a state acquisition module which can be configured to acquire the third state of the second drone at the current moment
  • a position determination module which may be configured to determine, according to the third state, the predicted position of the second drone at the next moment through the position model
  • the state acquisition module may also be configured to determine a fourth state between the second drone and the second mobile terminal according to the predicted position of the second drone at the next moment;
  • a task assignment module may be configured to determine a task assignment result between the second UAV and the second mobile terminal through the task assignment model according to the fourth state.
  • the electronic device may include a processor and a memory, the memory stores a computer program, and when the computer program is executed by the processor, realizes the model construction method or the described task assignment method.
  • Yet another embodiment of the present application provides a storage medium, where the storage medium stores a computer program, and when the computer program is executed by a processor, implements the model construction method or the task assignment method.
  • In the embodiments of the present application, the training device divides the scheduling strategy of UAV-assisted mobile edge computing into two levels of sub-problems, UAV position optimization and task offloading optimization, and hierarchical reinforcement learning is used to alternately optimize the corresponding position model and task assignment model, so as to reduce the complexity of each sub-problem and improve the learning efficiency and convergence efficiency of the overall system.
  • FIG. 1 is a schematic diagram of a scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a model construction method provided by an embodiment of the present application.
  • FIG. 3 is a block diagram of a training process provided by the embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a task allocation method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a model building apparatus provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a task assignment device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Reference numerals: 100-UAV; 200-mobile terminal; 301-model initialization module; 302-model training module; 401-state acquisition module; 402-position determination module; 404-task assignment module; 520-memory; 530-processor; 540-communication device.
  • UAVs Unmanned Aerial Vehicles
  • MEC Mobile Edge Computing
  • The UAV 100 can be used as a communication relay station or an edge computing platform. After computing resources are deployed on the UAV 100, a mobile edge computing network assisted by the UAV 100 brings many advantages, such as reduced network overhead, reduced latency of computing task execution, better Quality of Experience (QoE), and extended battery life of the mobile terminal 200.
  • QoE Quality of Experience
  • However, in a dynamic scenario where the relative position between the UAV and the mobile terminal changes over time, the algorithm strategies used in UAV-assisted mobile edge computing scheduling need to re-solve the optimization based on the new positions of the UAV and the mobile terminal, resulting in a higher computational burden on the system.
  • the inventor proposes a method based on reinforcement learning to realize the scheduling strategy of UAV-assisted mobile edge computing in dynamic scenarios.
  • the state space and action space of the system using the reinforcement learning algorithm will increase exponentially, which greatly reduces the convergence efficiency of the algorithm.
  • the embodiments of the present application provide a model construction method applied to a training device.
  • the scheduling strategy of UAV-assisted mobile edge computing is divided into two sub-problems of UAV position optimization and task offloading optimization, and corresponding policy models are provided for different sub-problems.
  • a location model is provided for the location optimization problem
  • a task assignment model is provided for the task offload optimization problem.
  • the policy model obtained by pre-training is deployed to the execution device, and based on the pre-trained policy model, a task assignment method is provided.
  • the execution device determines the predicted position of the UAV and the task assignment result according to the state information of the UAV under the usage scenario and the state information of the mobile terminal.
  • a corresponding mathematical model is constructed for the mobile edge computing scene assisted by the UAV. It is assumed that in the scenario of mobile edge computing assisted by large-scale drones, there are M mobile terminals and U drones.
  • the space environment in which the UAV and the mobile terminal are located can be modeled through the three-dimensional Cartesian coordinate system (the coordinate axes represented by x, y, and z in Figure 1).
  • The mobile terminal moves in the plane at ground height 0, and the drone moves in the plane at height H.
  • the mobile terminal can move horizontally from one position to another, and the moving distances of all mobile terminals obey the normal distribution N(0, ⁇ 2 ).
  • N normal distribution
  • ι is a small value, which ensures that the mobile terminal can only move to an adjacent horizontal position in most cases.
  • the mobile terminal can move in 4 directions (east, south, west, north) each time.
  • Here M represents the set of all mobile terminals, and the two coordinate components respectively denote the x- and y-coordinates of the mobile terminal within the nth duration segment.
  • Similarly, U represents the set of all UAVs, and the two coordinate components respectively denote the x- and y-coordinates of the UAV within the nth duration segment.
  • Since the relative position between the UAV and the mobile terminal changes with time, suppose that, based on the current position distribution of the mobile terminals, the UAV is moved to the best position and an optimal task assignment is made between the UAV and the mobile terminals. However, the positions of the mobile terminals change along with their users, so the current best position of the drone and the current best task assignment result are no longer applicable at the next moment.
  • The embodiments of the present application adopt a complete offloading strategy, that is, for the same computing task, only one of the mobile terminal and the UAV can be selected to execute it. Assuming that the nth computing task is executed by the mobile terminal, it is expressed as:
  • When the UAV provides edge computing services, the average delay for all mobile terminals to complete their computing tasks is required to be minimized.
  • the status information to be considered includes the location of the mobile terminal, the location of the drone, the remaining power of the mobile terminal, the remaining power of the drone, and the tasks that the mobile terminal needs to process.
  • the frequency of the UAV movement and the maximum horizontal flight speed of the UAV need to be limited.
  • the drone moves every time interval ⁇ .
  • the duration ⁇ is divided into multiple duration segments.
  • the drone can move the position once in the first duration segment, and then keep the same position in the subsequent duration segments until the duration ⁇ ends.
  • For example, assume Δ is 10 minutes and is divided into 10 duration segments, i.e., each duration segment is 1 minute. After the drone changes position within the first 1-minute segment, its position remains unchanged for the following nine 1-minute segments until the 10 minutes end.
  • the motion state of the UAV can be expressed as:
  • Because the UAV flies at a relatively high altitude, the change in distance between the UAV and the mobile terminal is the main factor affecting the wireless channel gain. Therefore, in the nth duration segment, the channel gain from UAV u to mobile terminal m can be represented by a free-space path loss model:
  • the task I m (n) generated by the mobile terminal m can be expressed as:
  • I m (n) ⁇ D m (n), C m (n), ⁇ m (n) ⁇ ;
  • D m (n) represents the amount of data (unit: bit) that needs to be processed by the task I m (n)
  • C m (n) represents the number of CPU cycles required to process 1-bit data
  • ⁇ m (n) represents the task.
  • To improve the signal-to-noise ratio, the mobile terminal transmits data at its maximum power.
  • Each UAV only receives computing tasks sent by at most one mobile terminal in each time segment.
  • Data transfer rate between drone u and mobile terminal m (unit: bps/Hz) can be expressed as:
  • ⁇ 2 represents the noise power of the UAV u.
  • B represents the channel bandwidth
  • Based on the maximum transmit power of the mobile terminal, the amount of data D m (n) to be processed for the task I m (n), the data transfer rate, and the channel bandwidth B, the energy consumed to send the task to the UAV can be expressed as:
  • f m represents the computing capability of the mobile terminal (unit: cycle/second).
  • ⁇ m represents the architecture coefficient related to the CPU architecture of mobile terminal m , Indicates the average power of the mobile terminal when performing tasks.
  • For the drone u, based on the amount of data D m (n) that needs to be processed in the task I m (n) and the number of CPU cycles C u (n) required to process 1 bit of data, when the drone executes the computing task I m (n), the delay required to complete the task can be expressed as:
  • f u represents the computing power of the UAV (unit: cycle/second).
  • Correspondingly, the energy consumed by the UAV for local computation can be expressed as:
  • ⁇ u denotes the architectural coefficient related to the UAV CPU architecture, Indicates the average power of the mobile terminal when performing tasks.
  • C1-C8 are the constraints that the UAV and the mobile terminal need to meet, and the specific performance is as follows:
  • Constraints C1 and C2 ensure the limited speed and update frequency of the flight position of the UAV
  • Constraints C3, C4 and C5 represent the constraints of task offloading between the UAV and the mobile terminal
  • Constraints C6 and C7 are the energy consumption constraints of the mobile terminal and the drone, ⁇ U represents the electrical energy stored by the drone, and ⁇ M represents the electrical energy stored by the mobile terminal;
  • Constraint C8 guarantees that each computational task should complete within its maximum allowable delay and time slice.
  • t m (n) is related to the calculation method selected for each task (that is, the local calculation of the mobile terminal or the calculation of the drone), and the specific expression is:
  • The distance is the key factor determining the channel gain, and the channel gain further affects the wireless transmission rate,
  • which ultimately affects the delay and energy consumption of wireless transmission; the corresponding mathematical expression is:
  • The first restriction conditions may include C1 and C2 of the foregoing restriction conditions.
  • the UAV position will remain unchanged in each subsequent time segment of ⁇ .
  • The determination of the task assignment result needs to take into account the distance between the mobile terminal and the UAV as well as the current task I m (n) of the mobile terminal, so as to minimize the average delay for all mobile terminals to complete their computing tasks. Therefore, after the UAV position has been determined within a given duration segment, the average delay for the mobile terminals to complete their computing tasks can be expressed as:
  • Here the corresponding term represents the remaining duration segments within the duration Δ, and t m (n) depends on the computation mode selected for each task (i.e., local computation on the mobile terminal or computation by the drone). In addition, second restriction conditions need to be satisfied, which may include C3-C8 of the above restriction conditions.
  • the existing state can be expressed as:
  • the actions to be performed include:
  • the objective function P is changed as follows:
  • the immediate reward function r_n is:
  • the ultimate goal is to maximize the future reward V ⁇ obtained from the environment during the entire task execution time by continuously updating the policy ⁇ in the large-scale UAV-assisted mobile edge computing network.
  • the value function of the future reward V ⁇ can be expressed as:
  • ⁇ [0,1] represents the discount factor for future rewards.
  • the neural network model is used for training in the embodiment of the present application to fit the UAV position optimization strategy and the task offloading strategy.
  • the above-mentioned strategy model is divided into a position model to be trained and a task allocation model, and then the training equipment performs alternate training.
  • the model building method may include:
  • Step S1A initialize the position model, the task assignment model, the state of the first drone, and the state of the first mobile terminal.
  • the first drone is used to provide edge computing services for the first mobile terminal.
  • To distinguish the drones and mobile terminals used during training from those used when the models are deployed, in the embodiments of the present application the drone during training is referred to as the first drone,
  • and the mobile terminal during training is referred to as the first mobile terminal.
  • Correspondingly, the drone during model use is called the second drone,
  • and the mobile terminal during model use is called the second mobile terminal.
  • Both the location model and the task assignment model are neural network models used for reinforcement learning.
  • Reinforcement learning, as a machine learning method, lies between supervised learning and unsupervised learning. Its principle is as follows:
  • The location model and task assignment model to be trained act as an agent; if a certain behavioral strategy of the agent leads to a positive reward (reinforcement signal) from the environment, the agent's tendency to produce that behavioral strategy in the future is strengthened.
  • the agent's goal is to discover the optimal policy in each discrete state to maximize the desired discounted reward sum.
  • The learning process of reinforcement learning can be regarded as a process of trial and evaluation.
  • The agent selects an action to apply to the environment; the state of the environment changes after accepting the action, and at the same time a reinforcement signal (reward or punishment) is generated and fed back to the agent.
  • The agent then selects the next action according to the reinforcement signal and the current state of the environment.
  • The principle of selection is to increase the probability of receiving positive reinforcement (reward).
  • The selected action affects not only the immediate reinforcement value but also the state of the environment at the next moment and the final reinforcement value.
  • Reinforcement learning differs from label-based supervised learning mainly in the reinforcement signal.
  • In reinforcement learning, the reinforcement signal provided by the environment is an evaluation (usually a scalar signal) of the quality of the generated action, rather than an instruction telling the agent how to generate the correct action. Since the external environment provides little information, the agent must learn from its own experience. In this way, the agent acquires knowledge in an environment where actions are evaluated one by one, and improves its action plan to suit the environment.
  • the location model in the embodiment of the present application may select the DDPG (Deep Deterministic Policy Gradient, deep deterministic policy gradient) model; the task assignment model may select the DQN (Deep Q Networks, deep Q network) model.
  • Other reinforcement learning models suitable for continuous actions can also be used as the location model, and other reinforcement learning models suitable for discrete actions can also be used as the task assignment model; this is not specifically limited in the embodiments of the present application.
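  • The patent does not specify network architectures. Assuming a PyTorch implementation, the two policy models could be sketched as follows, with purely illustrative layer sizes:
```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """DDPG actor: maps the first state to a continuous UAV position update."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())  # bounded move in x/y

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """DDPG critic: scores a (state, position-update action) pair."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class QNetwork(nn.Module):
    """DQN for the task assignment model: one Q-value per discrete offloading action."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, s):
        return self.net(s)
```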
  • Step S2A perform the following iterations on the location model and the task allocation model until the preset iteration conditions are met:
  • the training terminal obtains the predicted position of the first UAV at the next moment through the position model according to the first state at the current moment between the first mobile terminal and the first UAV; and updates the model parameters of the position model according to the predicted position.
  • the first state may be the location of the first mobile terminal, the remaining power of the first mobile terminal, and the remaining power of the first drone.
  • the training device updates the first state according to the predicted position; according to the updated first state, obtains a first reward value corresponding to the updated first state by presetting the first reward strategy; according to the first reward value , to update the model parameters of the location model.
  • the training terminal determines the second state at the current moment between the first drone and the first mobile terminal according to the predicted position; according to the second state, the task assignment model determines the relationship between the first drone and the first mobile terminal The task assignment result at the next moment; according to the task assignment result, the model parameters of the task assignment model are updated.
  • the second state may include the predicted position of the first drone, the position of the first mobile terminal, the remaining power of the first drone, the remaining power of the first mobile terminal, and the computing task in the first mobile terminal.
  • the training terminal updates the second state according to the task assignment result; according to the updated second state, obtains a second reward value corresponding to the second state through a preset second reward strategy; and updates the position according to the second reward value Model parameters for the model.
  • The training device splits the scheduling strategy of UAV-assisted mobile edge computing into two sub-problems, UAV position optimization and task offloading optimization, and uses hierarchical reinforcement learning to alternately optimize the corresponding position model and task model, so that the complexity of each sub-problem is reduced and the learning efficiency and convergence efficiency of the overall system are improved.
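  • The alternating (hierarchical) training described in steps S1A-S2A could be organized roughly as follows. This is a schematic sketch in which env, position_model, and task_model are hypothetical objects wrapping the simulated scenario and the two policy models; their method names are placeholders, not the patent's API:
```python
def train(env, position_model, task_model, max_iters=10_000):
    """Alternately optimize the position model (upper level) and the task model (lower level)."""
    first_state = env.reset()
    for it in range(max_iters):
        # Upper level: predict the UAV position for the next moment.
        predicted_pos = position_model.act(first_state)
        next_first_state, pos_reward = env.apply_position(predicted_pos)
        position_model.update(first_state, predicted_pos, pos_reward, next_first_state)

        # Lower level: with the position fixed, decide task offloading.
        second_state = env.second_state(predicted_pos)
        assignment = task_model.act(second_state)
        next_second_state, task_reward = env.apply_assignment(assignment)
        task_model.update(second_state, assignment, task_reward, next_second_state)

        first_state = next_first_state
        if env.converged():
            break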
  • the above-mentioned training process will be described in detail below with reference to the above-mentioned DDPG model and DQN model.
  • the DDPG model includes a critic network and an actor network.
  • The actor network is used to determine the movement strategy of the first UAV and perform the position-update action according to the state of the environment, while the critic network is used to score the position-update action,
  • where the score represents the expected maximum benefit of the position-update action.
  • the actor network adjusts its own strategy according to the score of the critic network, that is, the model parameters in the actor network are updated.
  • the critic network adjusts its scoring strategy according to the first reward value of the feedback from the environment, that is, updates the model parameters in the critic network.
  • When acquiring the first reward value, the training device obtains the first reward value corresponding to the updated first state through the preset first reward strategy.
  • If the training device determines, according to the updated first state, that the first drone satisfies any of the first restriction conditions,
  • the first reward value is adjusted by a preset first negative reward value (a reward-shaping sketch is given after this list), where the first restriction conditions may include:
  • the movement speed of the first drone exceeds the speed threshold
  • the frequency of movement of the first drone exceeds the frequency threshold.
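  • A minimal sketch of such a reward-shaping rule follows; the threshold names and penalty magnitude are illustrative, not values from the patent:
```python
def shaped_position_reward(base_reward, speed, move_freq,
                           speed_thresh=20.0, freq_thresh=0.1, penalty=-10.0):
    """Add a preset negative reward when either first restriction condition is violated."""
    if speed > speed_thresh or move_freq > freq_thresh:
        return base_reward + penalty
    return base_reward
```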
  • The immediate reward function is the expression given above.
  • The future reward V_π is obtained from the environment, and the value function of the future reward V_π is the expression given above.
  • the model parameters in the actor network and the critic network can be randomly initialized. After several rounds of training, the actor network and the critic network continue to converge, and the performance results are getting better and better.
  • the optimal Q * value function for scoring by the critic network can be expressed as:
  • s' represents the state information in the remaining duration segment of duration ⁇ after the UAV updates the current position to the predicted position, and the corresponding expression is:
  • Here r_n′(s′, ·) denotes the reward obtained in state s′, π′ represents the policy adopted in state s′, the corresponding action represents the motion of the first drone (that is, the action of the first drone updating its position based on the predicted position), and γ represents the attenuation coefficient.
  • The purpose of constructing the critic network is to approximate the optimal Q* value function. Therefore, the embodiments of the present application use an experience pool Ω of interactions with the environment:
  • the critic network is trained and its model parameters ⁇ c are updated, and its corresponding loss function is expressed as:
  • The actor network is represented as u(s n′ , θ A ), which means that after receiving the state s n′ , the actor network determines the position movement the UAV needs to perform.
  • the gradient function for training actor network parameters ⁇ A is:
  • the trained DDPG model is made more stable through the two target networks corresponding to the critic network and the actor network respectively.
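  • Assuming the PyTorch networks sketched earlier, one DDPG update step over a replay batch could look roughly like this; it is a standard DDPG update with soft target-network updates, offered as an illustration rather than the patent's exact procedure:
```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_tgt, critic_tgt,
                actor_opt, critic_opt, gamma=0.95, tau=0.005):
    s, a, r, s_next = batch  # tensors sampled from the experience pool, shapes assumed consistent

    # Critic: regress Q(s, a) toward r + gamma * Q'(s', u'(s')).
    with torch.no_grad():
        target_q = r + gamma * critic_tgt(s_next, actor_tgt(s_next))
    critic_loss = F.mse_loss(critic(s, a), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's score of the actor's own action.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft-update the two target networks for stability.
    for tgt, src in ((critic_tgt, critic), (actor_tgt, actor)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```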
  • After the actor network determines the movement strategy of the first UAV and the position-update action is performed, referring to FIG. 3, the updated position of the UAV is kept unchanged and task offloading is performed through the above DQN model.
  • Specifically, each duration segment within the duration Δ is represented by ν, where ν ∈ [0, Δ-1].
  • The task offloading result in duration segment n+ν is denoted as α(n+ν). Since the task offloading result is a binary discrete variable, the DQN model is selected as the task assignment model in this embodiment of the present application.
  • a batch of empirical data k is extracted from the experience pool to update the model parameters of the DQN model:
  • the update method of the Q-value function of the DQN model is as follows:
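  • Similarly, a standard DQN update of the task assignment model from a batch of experience could be sketched as follows; this is an illustrative implementation of the usual Bellman-target update, not the patent's exact expression:
```python
import torch
import torch.nn.functional as F

def dqn_update(batch, q_net, q_target, optimizer, gamma=0.95):
    s, a, r, s_next = batch  # a holds discrete offloading-action indices

    with torch.no_grad():
        # Bellman target: r + gamma * max_a' Q_target(s', a')
        target = r + gamma * q_target(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, target)

    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```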
  • the training device obtains a second reward value corresponding to the second state through a preset second reward strategy.
  • If the training device determines, according to the updated second state, that the first drone and the first mobile terminal satisfy any of the second restriction conditions, the second reward value is adjusted by a preset second negative reward value (see the sketch after this list),
  • where the second restriction conditions may include:
  • the same task runs on the first UAV and the first mobile terminal at the same time
  • the total energy consumed by the task during transmission between the first UAV and the first mobile terminal exceeds the energy threshold
  • the completion time of at least one task exceeds the duration threshold.
  • the second reward value is adjusted based on the preset second negative reward value.
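  • As with the position reward, the task-assignment reward can be shaped with a preset negative value when any second restriction condition is violated; a hedged sketch with illustrative argument names follows:
```python
def shaped_task_reward(base_reward, duplicated, tx_energy, energy_thresh,
                       task_times, time_thresh, penalty=-10.0):
    """Penalize duplicated execution, excessive transmission energy, or overlong tasks."""
    violated = (duplicated
                or tx_energy > energy_thresh
                or any(t > time_thresh for t in task_times))
    return base_reward + penalty if violated else base_reward
```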
  • the embodiment of the present application also provides a task assignment method, which is applied to an execution device, where the execution device is configured with a pre-trained location model and a task assignment model, and the pre-trained location model and the task assignment model are obtained by training with a model construction method.
  • the method may include:
  • Step S1B obtaining the third state at the current moment between the second mobile terminal and the second drone;
  • Step S2B determine the predicted position of the second UAV at the next moment through the position model.
  • Step S3B according to the predicted position of the second drone at the next moment, determine the fourth state between the second drone and the second mobile terminal;
  • Step S4B determine the task assignment result between the second UAV and the second mobile terminal through the task assignment model.
  • the predicted position of the second drone at the next moment is determined by the position model, which may include:
  • the predicted position of the second drone at the next moment is determined by the position model according to the third state, wherein the first duration segment may include a plurality of second duration segments.
  • The first duration segment may be the execution time period Δ of the above computing tasks,
  • and the second duration segments may be the N discrete duration segments into which that period is further divided.
  • step S4B the task assignment result between the second drone and the second mobile terminal is determined through the task assignment model, which may include:
  • the position of the second UAV is kept unchanged, and according to the fourth state, the task assignment result between the second UAV and the second mobile terminal is determined through the task assignment model.
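  • Putting steps S1B-S4B together, the execution device's inference loop might look like the following sketch, where position_model, task_model, and env are placeholders for the deployed models and the live scenario, and the method names are hypothetical:
```python
def run_inference(env, position_model, task_model, n_segments):
    """Move the UAV once per first duration segment, then assign tasks per second segment."""
    third_state = env.observe_uav()
    predicted_pos = position_model.act(third_state)      # step S2B
    env.move_uav(predicted_pos)

    for seg in range(n_segments):                         # second duration segments
        fourth_state = env.observe_pair(predicted_pos)    # step S3B
        assignment = task_model.act(fourth_state)         # step S4B
        env.apply_assignment(assignment)                  # UAV position stays fixed
```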
  • an embodiment of the present application further provides a model construction device, which is applied to a training device, and the training device is configured with a position model to be trained and a task assignment model.
  • the model building apparatus may include:
  • the model initialization module 301 can be configured to initialize the location model, the task assignment model, the state of the first drone, and the state of the first mobile terminal, wherein the first drone is used to provide an edge for the first mobile terminal computing services.
  • When the computer-executable instructions corresponding to the model initialization module 301 are executed by the processor, step S1A in FIG. 2 is implemented.
  • For a detailed description of the model initialization module 301, please refer to the detailed description of step S1A.
  • the model training module 302 can be configured to perform at least one of the following iterations on the location model and the task allocation model until a preset iteration condition is met:
  • the predicted position of the first drone at the next moment is obtained through the position model
  • the task assignment result between the first drone and the first mobile terminal at the next moment is determined by the task assignment model
  • the model parameters of the task assignment model are updated.
  • The model initialization module 301 and the model training module 302 can also be used to implement other steps or sub-steps of the model construction method, and the model construction apparatus can also include other modules according to the functions implemented, which is not specifically limited in the embodiments of the present application.
  • When the computer-executable instructions corresponding to the model training module 302 are executed by the processor, step S2A in FIG. 2 is implemented.
  • For a detailed description of the model training module 302, please refer to the detailed description of step S2A.
  • the embodiment of the present application further provides a task scheduling apparatus, which is applied to an execution device.
  • the execution device is configured with a pre-trained location model and a task allocation model, and the pre-trained location model and the task allocation model are obtained by training by a model construction device.
  • the task scheduling apparatus may include:
  • the state obtaining module 401 may be configured to obtain the third state at the current moment between the second mobile terminal and the second drone.
  • When the computer-executable instructions corresponding to the state acquisition module 401 are executed by the processor, step S1B in FIG. 4 is implemented.
  • For a detailed description of the state acquisition module 401, please refer to the detailed description of step S1B.
  • the position determination module 402 may be configured to determine the predicted position of the second drone at the next moment through the position model according to the third state.
  • When the computer-executable instructions corresponding to the position determination module 402 are executed by the processor, step S2B in FIG. 4 is implemented.
  • For a detailed description of the position determination module 402, please refer to the detailed description of step S2B.
  • the state acquisition module 401 may also be configured to determine a fourth state between the second drone and the second mobile terminal according to the predicted position of the second drone at the next moment.
  • For a detailed description of the state acquisition module 401, reference may also be made to the detailed description of step S3B in FIG. 4.
  • the task assignment module 404 may be configured to determine a task assignment result between the second UAV and the second mobile terminal through the task assignment model according to the fourth state.
  • When the computer-executable instructions corresponding to the task assignment module 404 are executed by the processor, step S4B in FIG. 4 is implemented.
  • For a detailed description of the task assignment module 404, please refer to the detailed description of step S4B.
  • Embodiments of the present application further provide an electronic device, which may be a training device or an execution device.
  • the electronic device includes a processor and a memory in which a computer program is stored.
  • When the electronic device is a training device and the computer program is executed by the processor, the model construction method is implemented; when the electronic device is an execution device and the computer program is executed by the processor, the task assignment method is implemented.
  • the execution device may be a server communicatively connected with the drone and the mobile terminal.
  • the electronic device may include a memory 520 , a processor 530 , and a communication device 540 .
  • the elements of the memory 520 , the processor 530 and the communication device 540 are directly or indirectly electrically connected to each other to realize data transmission or interaction. For example, these elements may be electrically connected to each other through one or more communication buses or signal lines.
  • The memory 520 can be, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The memory 520 is used for storing computer programs.
  • the communication device 540 is used to send and receive data, wherein the network may be a wired network or a wireless network.
  • RAM Random Access Memory
  • ROM Read Only Memory
  • PROM programmable read only memory
  • EPROM Erasable Programmable Read-Only Memory
  • EEPROM Electrical Erasable Programmable Read-Only Memory
  • the processor 530 may be an integrated circuit chip with signal processing capability.
  • the above-mentioned processor can be a general-purpose processor, including a central processing unit (Central Processing Unit, referred to as CPU), a network processor (Network Processor, referred to as NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit ( ASIC), Field Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • DSP digital signal processor
  • ASIC application-specific integrated circuit
  • FPGA Field Programmable Gate Array
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • Embodiments of the present application further provide a storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, a model construction method or a task assignment method is implemented.
  • In summary, the training device divides the scheduling strategy of UAV-assisted mobile edge computing into two levels of sub-problems, UAV position optimization and task offloading optimization,
  • and uses hierarchical reinforcement learning to alternately optimize the corresponding position model and task model, so as to reduce the complexity of each sub-problem and improve the learning efficiency and convergence efficiency of the overall system.
  • Each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented in dedicated hardware-based systems that perform the specified functions or actions , or can be implemented in a combination of dedicated hardware and computer instructions.
  • each functional module in each embodiment of the present application may be integrated together to form an independent part, or each module may exist independently, or two or more modules may be integrated to form an independent part.
  • the functions are implemented in the form of software function modules and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • In essence, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
  • This application provides a model construction method, a task assignment method, an apparatus, a device, and a medium, in which the training device divides the scheduling strategy of UAV-assisted mobile edge computing into two levels of sub-problems: UAV position optimization and task offloading optimization.
  • hierarchical reinforcement learning is used to alternately optimize the corresponding position model and task model.
  • model building method, task assignment method, apparatus, device and medium of the present application are reproducible and can be used in a variety of industrial applications.
  • model building method, task assignment method, apparatus, device and medium of the present application can be used in any application field of data processing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Automation & Control Theory (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

In the model construction method, task assignment method, apparatus, device, and medium, the training device splits the scheduling strategy of UAV-assisted mobile edge computing into two levels of sub-problems, UAV position optimization and task computation offloading optimization, and uses hierarchical reinforcement learning to alternately optimize the corresponding position model and task model, thereby reducing the complexity of each sub-problem and improving the learning efficiency and convergence efficiency of the overall system.

Description

Model construction method, task assignment method, apparatus, device and medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202110302078.9, entitled "Model construction method, task assignment method, apparatus, device and medium", filed with the Chinese Patent Office on March 22, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing, and in particular to a model construction method, a task assignment method, an apparatus, a device, and a medium.
Background
Owing to the high mobility and flexibility of Unmanned Aerial Vehicles (UAVs), in recent years researchers have proposed techniques that use UAVs to assist Mobile Edge Computing (MEC) in a variety of application scenarios. In the field of UAV-assisted mobile edge computing, the motion trajectory of the UAV and the tasks between the UAV and the mobile terminals need to be properly scheduled to obtain the desired performance. In task scheduling, assigning a given computing task to either the UAV or the mobile terminal is hereinafter referred to as task offloading. Recently, reinforcement-learning-based methods have emerged to realize scheduling strategies for UAV-assisted mobile edge computing in dynamic scenarios.
The inventors found that, as the number of UAVs and mobile terminals increases, the state space and action space of a system using a reinforcement learning algorithm grow exponentially, which greatly reduces the convergence efficiency of the algorithm. Therefore, it is difficult to obtain an easily convergent scheduling strategy for large-scale UAV-assisted mobile edge computing networks.
Summary
An embodiment of the present application provides a model construction method applied to a training device, where the training device is configured with a position model to be trained and a task assignment model. The method may include:
initializing the position model, the task assignment model, the state of a first drone, and the state of a first mobile terminal, where the first drone may be used to provide edge computing services for the first mobile terminal;
performing the following iteration on the position model and the task assignment model until a preset iteration condition is met:
obtaining, according to a first state at the current moment between the first mobile terminal and the first drone, the predicted position of the first drone at the next moment through the position model;
updating the model parameters of the position model according to the predicted position;
determining, according to the predicted position, a second state at the current moment between the first drone and the first mobile terminal;
determining, according to the second state, the task assignment result between the first drone and the first mobile terminal at the next moment through the task assignment model;
updating the model parameters of the task assignment model according to the task assignment result.
Optionally, updating the model parameters of the position model according to the predicted position may include: updating the first state according to the predicted position; obtaining, according to the updated first state, a first reward value corresponding to the updated first state through a preset first reward strategy; and updating the model parameters of the position model according to the first reward value.
Optionally, obtaining, according to the updated first state, the first reward value corresponding to the updated first state through the preset first reward strategy may include: obtaining the first reward value corresponding to the updated first state through the preset first reward strategy; and, when it is determined according to the updated first state that the first drone satisfies any one of the first restriction conditions, adjusting the first reward value by a preset first negative reward value, where the first restriction conditions may include: the movement speed of the first drone exceeds a speed threshold; the movement frequency of the first drone exceeds a frequency threshold.
Optionally, the first state may include the position of the first mobile terminal, the remaining power of the first mobile terminal, and the remaining power of the first drone.
Optionally, updating the model parameters of the task assignment model according to the task assignment result may include: updating the second state according to the task assignment result; obtaining, according to the updated second state, a second reward value corresponding to the second state through a preset second reward strategy; and updating the model parameters of the position model according to the second reward value.
Optionally, obtaining, according to the updated second state, the second reward value corresponding to the second state through the preset second reward strategy includes: obtaining the second reward value corresponding to the second state through the preset second reward strategy; and, when it is determined according to the updated second state that the first drone and the first mobile terminal satisfy any one of the second restriction conditions, adjusting the second reward value by a preset second negative reward value, where the second restriction conditions may include: the same task runs on the first drone and the first mobile terminal at the same time; the total energy consumed when a task is transmitted between the first drone and the first mobile terminal exceeds an energy threshold; the completion time of at least one task exceeds a duration threshold.
Optionally, the second state may include the predicted position of the first drone, the position of the first mobile terminal, the remaining power of the first drone, the remaining power of the first mobile terminal, and the computing tasks in the first mobile terminal.
Another embodiment of the present application provides a task assignment method applied to an execution device, where the execution device may be configured with a pre-trained position model and task assignment model, and the pre-trained position model and task assignment model are obtained by training with the above model construction method. The method may include:
acquiring a third state of a second drone at the current moment;
determining, according to the third state, the predicted position of the second drone at the next moment through the position model;
determining, according to the predicted position of the second drone at the next moment, a fourth state between the second drone and a second mobile terminal;
determining, according to the fourth state, the task assignment result between the second drone and the second mobile terminal through the task assignment model.
Optionally, determining, according to the third state, the predicted position of the second drone at the next moment through the position model may include: at every first duration segment, determining, according to the third state, the predicted position of the second drone at the next moment through the position model, where the first duration segment includes a plurality of second duration segments.
Optionally, determining, according to the fourth state, the task assignment result between the second drone and the second mobile terminal through the task assignment model may include: for each second duration segment, keeping the position of the second drone unchanged, and determining, according to the fourth state, the task assignment result between the second drone and the second mobile terminal through the task assignment model.
Yet another embodiment of the present application provides a model construction apparatus applied to a training device, where the training device is configured with a position model to be trained and a task assignment model. The model construction apparatus may include:
a model initialization module, which may be configured to initialize the position model, the task assignment model, the state of the first drone, and the state of the first mobile terminal, where the first drone is used to provide edge computing services for the first mobile terminal;
a model training module, which may be configured to perform the following iteration on the position model and the task assignment model until a preset iteration condition is met:
obtaining, according to a first state at the current moment between the first mobile terminal and the first drone, the predicted position of the first drone at the next moment through the position model;
updating the model parameters of the position model according to the predicted position;
determining, according to the predicted position, a second state at the current moment between the first drone and the first mobile terminal;
determining, according to the second state, the task assignment result between the first drone and the first mobile terminal at the next moment through the task assignment model;
updating the model parameters of the task assignment model according to the task assignment result.
A further embodiment of the present application provides a task scheduling apparatus applied to an execution device, where the execution device is configured with a pre-trained position model and task assignment model, and the pre-trained position model and task assignment model are obtained by training with the above model construction apparatus. The task scheduling apparatus may include:
a state acquisition module, which may be configured to acquire a third state of the second drone at the current moment;
a position determination module, which may be configured to determine, according to the third state, the predicted position of the second drone at the next moment through the position model;
the state acquisition module, which may further be configured to determine, according to the predicted position of the second drone at the next moment, a fourth state between the second drone and the second mobile terminal;
a task assignment module, which may be configured to determine, according to the fourth state, the task assignment result between the second drone and the second mobile terminal through the task assignment model.
Another embodiment of the present application provides an electronic device, which may include a processor and a memory, where the memory stores a computer program that, when executed by the processor, implements the model construction method or the task assignment method described above.
Yet another embodiment of the present application provides a storage medium storing a computer program that, when executed by a processor, implements the model construction method or the task assignment method described above.
Compared with the related art, the present application has at least the following beneficial effects:
In the model construction method, task assignment method, apparatus, device, and medium provided by the embodiments of the present application, the training device splits the scheduling strategy of UAV-assisted mobile edge computing into two levels of sub-problems, UAV position optimization and task computation offloading optimization, and uses hierarchical reinforcement learning to alternately optimize the corresponding position model and task model, thereby reducing the complexity of each sub-problem and improving the learning efficiency and convergence efficiency of the overall system.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings required for the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of a scenario provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a model construction method provided by an embodiment of the present application;
FIG. 3 is a block diagram of a training process provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a task assignment method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a model construction apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a task assignment apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Reference numerals: 100-UAV; 200-mobile terminal; 301-model initialization module; 302-model training module; 401-state acquisition module; 402-position determination module; 404-task assignment module; 520-memory; 530-processor; 540-communication device.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. The components of the embodiments of the present application generally described and shown in the drawings herein can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the claimed scope of the present application, but merely represents selected embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.
In the description of the present application, it should be noted that the terms "first", "second", "third", and the like are used only to distinguish the descriptions and cannot be understood as indicating or implying relative importance.
Owing to the high mobility and flexibility of Unmanned Aerial Vehicles (UAVs), in recent years researchers have proposed using UAVs to assist Mobile Edge Computing (MEC) in a variety of application scenarios.
As shown in FIG. 1, in scenarios where network infrastructure is unavailable (such as rescue sites after natural disasters), where network devices are sparsely distributed (such as field operations), or where a temporary surge of mobile terminals 200 far exceeds the network service capacity (such as at a football match), the UAV 100 can serve as a communication relay station or an edge computing platform. After computing resources are deployed on the UAV 100, the mobile edge computing network assisted by the UAV 100 brings many advantages, such as reduced network overhead, reduced latency of computing task execution, better Quality of Experience (QoE), and extended battery life of the mobile terminals 200.
In the field of mobile edge computing assisted by the UAV 100, the motion trajectory of the UAV 100 in FIG. 1 needs to be optimized, and task offloading needs to be performed between the UAV 100 and the mobile terminals 200, in order to obtain the desired computing performance.
Most related research and inventions focus on UAV-assisted mobile edge computing scheduling in static scenarios, that is, the UAV serves mobile terminals whose positions remain fixed during the entire task execution time. For such scenarios, heuristic algorithms (such as block coordinate descent, genetic algorithms, and particle swarm optimization) can be used for the solution; for example, the Block Coordinate Descent (BCD) and Successive Convex Approximation (SCA) methods can jointly optimize the assignment of computing tasks and the UAV trajectory to maximize the throughput of all mobile terminals.
However, in a dynamic scenario where the relative position between the UAV and the mobile terminals changes over time, the algorithm strategies used for UAV-assisted mobile edge computing scheduling need to re-solve the optimization based on the new positions of the UAV and the mobile terminals, resulting in a higher computational burden on the system.
On this basis, the inventors proposed a reinforcement-learning-based method to realize the scheduling strategy of UAV-assisted mobile edge computing in dynamic scenarios. However, as the number of UAVs and mobile terminals increases, the state space and action space of a system using a reinforcement learning algorithm grow exponentially, which greatly reduces the convergence efficiency of the algorithm.
In view of this, in order to at least partially solve the above technical problems, the embodiments of the present application provide a model construction method applied to a training device. In the model construction method, the scheduling strategy of UAV-assisted mobile edge computing is split into two levels of sub-problems, UAV position optimization and task offloading optimization, and a corresponding policy model is provided for each sub-problem: a position model is provided for the position optimization problem, and a task assignment model is provided for the task offloading optimization problem.
Then, Hierarchical Reinforcement Learning (HRL) is used to alternately optimize the position model and the task assignment model, so as to reduce the complexity of each sub-problem (that is, reduce the dimensions of the input states and output actions) and improve the learning efficiency and convergence efficiency of the overall system.
Further, the pre-trained policy models are deployed to an execution device, and a task assignment method is provided based on the pre-trained policy models. In the task assignment method, the execution device determines the predicted position of the UAV and the task assignment result according to the state information of the UAV and the state information of the mobile terminals in the usage scenario.
Before introducing the model construction method and the task assignment method provided by the present application, a corresponding mathematical model is first built for the UAV-assisted mobile edge computing scenario. Assume that, in a large-scale UAV-assisted mobile edge computing scenario, there are M mobile terminals and U UAVs. The execution time period of the computing tasks is denoted as Δ, and this period is further divided into N discrete duration segments, i.e., the length of each duration segment can be expressed as τ = Δ/N, where the value of τ should be small enough to ensure that the distance between a UAV and a mobile terminal remains roughly unchanged within each duration segment.
The spatial environment in which the UAVs and the mobile terminals are located can be modeled by a three-dimensional Cartesian coordinate system (the coordinate axes denoted by x, y, and z in FIG. 1): the mobile terminals move in the plane at ground height 0, and the UAVs move in the plane at height H.
In each duration segment, a mobile terminal can move horizontally from one position to another, and the movement distances of all mobile terminals obey a normal distribution N(0, ι²), where ι is a small value that ensures that, in most cases, a mobile terminal can only move to an adjacent horizontal position. Moreover, a mobile terminal can move in four directions (east, south, west, north) each time.
On this basis, the horizontal coordinates of a mobile terminal within the nth duration segment can be expressed by the corresponding formula (formula image not reproduced here), where M represents the set of all mobile terminals and the two components respectively denote the x- and y-coordinates of the mobile terminal within the nth duration segment.
Similarly, the horizontal coordinates of a UAV within the nth duration segment can be expressed by the corresponding formula (formula image not reproduced here), where U represents the set of all UAVs and the two components respectively denote the x- and y-coordinates of the UAV within the nth duration segment.
Since the relative position between a UAV and a mobile terminal changes with time, suppose that, based on the current position distribution of the mobile terminals, the UAV is moved to the best position and an optimal task assignment is made between the UAV and the mobile terminals. However, the positions of the mobile terminals change along with their users, so the current best position of the UAV and the current best task assignment result are not applicable at the next moment.
For example, a shopping mall usually has few customers on working days, while on rest days the number of customers suddenly surges. A UAV can therefore be deployed at the mall as a temporary base station to provide edge computing services. However, the positions of the customers in the mall keep changing, so the position of the UAV and the task assignment between the UAV and the mobile terminals need to be adjusted dynamically, so that the average delay for completing all computing tasks is minimized.
In addition, when performing task assignment, the embodiments of the present application adopt a complete offloading strategy, that is, for the same computing task, only one of the mobile terminal and the UAV can be selected to execute it. Assume that the nth computing task is executed by the mobile terminal and is expressed as I_m(n).
In the embodiments of the present application, when the UAV provides edge computing services, the average delay for all mobile terminals to complete their computing tasks is required to be minimized.
Meanwhile, in the embodiments of the present application, the state information to be considered includes the position of the mobile terminal, the position of the UAV, the remaining power of the mobile terminal, the remaining power of the UAV, and the tasks to be processed by the mobile terminal.
Moreover, in order to reduce the power consumed by UAV movement when the trained position model formulates its strategy, the movement frequency of the UAV and its maximum horizontal flight speed need to be limited.
For example, the UAV moves once every duration Δ. The duration Δ is further divided into multiple duration segments; the UAV can move its position once in the first duration segment and then keep its position unchanged in the subsequent duration segments until the duration Δ ends.
Exemplarily, assume Δ is 10 minutes and is divided into 10 duration segments, i.e., each duration segment is 1 minute. After the UAV changes position within the first 1-minute segment, its position remains unchanged for the following nine 1-minute segments until the 10 minutes end.
Therefore, denoting the speed of the UAV within the nth duration segment as v_u(n) and the maximum allowed horizontal flight speed as V_U, the motion state of the UAV can be expressed by the corresponding formula (formula image not reproduced here), where v_u(n) is given by the distance between the UAV's position before the move and its position after the move, divided by τ, the time the UAV spends moving.
Since the UAV can only move at the limited speed V_U and within a limited time, the channel gain of the UAV's wireless communication remains constant within a duration segment; the power consumed by moving the position can be expressed accordingly (formula omitted), where K_u represents the workload of the UAV and n mod Δ = 0 means that the remainder of n divided by Δ is 0.
In the scenario where a UAV provides edge computing services, because the UAV flies at a relatively high altitude, the change in distance between the UAV and the mobile terminal is the main factor affecting the wireless channel gain. Therefore, in the nth duration segment, the channel gain from UAV u to mobile terminal m can be represented by a free-space path loss model (formula omitted), where g_0 denotes the received signal power at a reference distance of 1 m and a transmission power of 1 W, and the remaining term denotes the Euclidean distance between UAV u and mobile terminal m.
In the nth duration segment, the task I_m(n) generated by mobile terminal m can be expressed as:
I_m(n) = {D_m(n), C_m(n), Γ_m(n)};
where D_m(n) represents the amount of data (unit: bit) to be processed by task I_m(n), C_m(n) represents the number of CPU cycles required to process 1 bit of data, and Γ_m(n) represents the maximum execution delay allowed for task I_m(n).
To improve the signal-to-noise ratio, the mobile terminal transmits data at its maximum power. Each UAV receives computing tasks from at most one mobile terminal in each duration segment. Combining the above free-space path loss model, the data transmission rate between UAV u and mobile terminal m (unit: bps/Hz) can be expressed accordingly (formula omitted), where σ² represents the noise power of UAV u.
Based on the data transmission rate and the amount of data D_m(n) to be processed by task I_m(n), the time delay required for mobile terminal m to transmit task I_m(n) to UAV u can be expressed accordingly (formula omitted), where B represents the channel bandwidth, and the offloading indicator denotes the device that executes the computing task, taking one value when the computing task runs locally on the mobile terminal and the other value when the computing task is executed on the UAV side.
Further, based on the maximum power of the mobile terminal, the amount of data D_m(n) to be processed by task I_m(n), the data transmission rate, and the channel bandwidth B, the energy consumed to send the task to the UAV can be expressed accordingly (formula omitted), where the corresponding symbol denotes the average power of the mobile terminal when sending data.
基于任务I m(n)的需要处理的数据量D m(n),处理1比特数据需要的CPU周期数C m(n),在移动终端本地执行的计算任务I m(n),完成任务所需要的延迟
Figure PCTCN2021128250-appb-000033
可以表示为:
Figure PCTCN2021128250-appb-000034
式中,f m表示移动终端的计算能力(单位:周期/秒)。
则相对应的,基于任务I m(n)的需要处理的数据量D m(n),处理1比特数据需要的CPU周期数C m(n),移动终端的计算能力f m以及完成任务所需要的延迟
Figure PCTCN2021128250-appb-000035
移动终端本地计算消耗的能量
Figure PCTCN2021128250-appb-000036
可以表 示为:
Figure PCTCN2021128250-appb-000037
其中,γ m表示与移动终端m的CPU架构相关的架构系数,
Figure PCTCN2021128250-appb-000038
表示移动终端执行任务时的平均功率。
对于无人机u，基于任务I_m(n)需要处理的数据量D_m(n)以及处理1比特数据需要的CPU周期数C_u(n)，无人机执行计算任务I_m(n)时，完成任务所需要的延迟t_u^{comp}(n)可以表示为：
t_u^{comp}(n) = D_m(n)·C_u(n) / f_u；
式中，f_u表示无人机的计算能力（单位：周期/秒）。
相对应的，基于任务I_m(n)需要处理的数据量D_m(n)以及处理1比特数据需要的CPU周期数C_u(n)，无人机在本地计算消耗的能量e_u^{comp}(n)可以表示为：
e_u^{comp}(n) = γ_u·f_u²·D_m(n)·C_u(n)；
其中，γ_u表示与无人机CPU架构有关的架构系数，p_u^{comp}=γ_u·f_u³表示无人机执行任务时的平均功率。
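示例性的，本地计算与无人机端计算的延迟及能耗可以用如下简化的Python示意代码表示（能耗采用上文“架构系数×计算频率平方×所需CPU周期数”的假设模型，函数名为便于说明而设）：

def local_compute(D, C, f_m, gamma_m):
    # 移动终端本地计算：延迟 t = D*C/f_m，能耗 e = gamma_m * f_m^2 * D * C
    t_loc = D * C / f_m
    e_loc = gamma_m * (f_m ** 2) * D * C
    return t_loc, e_loc

def uav_compute(D, C, f_u, gamma_u):
    # 无人机端计算：延迟 t = D*C/f_u，能耗 e = gamma_u * f_u^2 * D * C
    t_uav = D * C / f_u
    e_uav = gamma_u * (f_u ** 2) * D * C
    return t_uav, e_uav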
基于上述构建的数学模型，本申请示例中需要对无人机进行位置优化以及在移动终端与无人机之间进行任务卸载，使所有移动终端完成计算任务的平均延迟最小，对应的目标函数P可以表示为：
P: min_{w_u(n), α_{u,m}(n)} T̄
s.t.
C1: 0 ≤ v_u(n) ≤ V_U，∀u∈U；
C2: 仅当 n mod Δ=0 时，v_u(n)可以不为0；
C3: α_{u,m}(n) ∈ {0,1}，∀u∈U，∀m∈M；
C4: Σ_{m∈M} α_{u,m}(n) ≤ 1，∀u∈U；
C5: Σ_{u∈U} α_{u,m}(n) ≤ 1，∀m∈M；
C6: Σ_n [(1-α_{u,m}(n))·e_m^{loc}(n) + α_{u,m}(n)·e_{u,m}^{tr}(n)] ≤ Φ_M，∀m∈M；
C7: Σ_n [α_{u,m}(n)·e_u^{comp}(n) + e_u^fly(n)] ≤ Φ_U，∀u∈U；
C8: t_m(n) ≤ min{Γ_m(n), 单个时长片段的长度}，∀m∈M；
式中,C1-C8为无人机以及移动终端需要满足的限制条件,具体表现为:
限制条件C1与C2保证了无人机有限的速度和飞行位置更新频率;
限制条件C3、C4和C5表示无人机与移动终端之间任务卸载的约束;
限制条件C6与C7为移动终端与无人机的消耗能量约束,Φ U表示无人机储存的电能,Φ M表示移动终端储存的电能;
限制条件C8保证每项计算任务应在其最大允许延迟和时间片内完成。
T̄为执行所有计算任务I_m(n)的平均延迟，可以表示为：
T̄ = (1/(N·|M|))·Σ_{n=1}^{N} Σ_{m∈M} t_m(n)；
式中，t_m(n)与每个任务选择的计算方式相关（即移动终端本地计算或无人机进行计算），具体表达式为：
t_m(n) = (1-α_{u,m}(n))·t_m^{loc}(n) + α_{u,m}(n)·(t_{u,m}^{tr}(n) + t_u^{comp}(n))。
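示例性的，结合上述各延迟表达式，单个任务的完成延迟t_m(n)可以用如下简化的Python示意代码计算（offload为完全卸载策略下的二元决策，函数名为便于说明而设）：

def task_delay(offload, t_loc, t_tr, t_uav):
    # offload = 0 表示任务在移动终端本地执行，offload = 1 表示卸载到无人机执行
    return (1 - offload) * t_loc + offload * (t_tr + t_uav)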
在上述建立的数学模型的基础上，在无人机位置优化问题中，需要基于无人机、移动终端当前的状态确定无人机下一时刻的预测位置，从而获得无人机与移动终端的期望的距离d_{u,m}(n)。其中，距离d_{u,m}(n)是决定信道增益h_{u,m}(n)的关键因素，而信道增益h_{u,m}(n)会进一步地影响无线传输速率R_{u,m}(n)，最终影响无线传输的延迟和能量消耗，相应的数学表达式为：
P1: min_{w_u(n_1)} E[ Σ_{n=n_1}^{n_1+Δ-1} Σ_{m∈M} t_m(n) ]；
式中，n_1表示时长Δ所包含的各时长片段中的第一个时长片段，并且，需要满足第一限制条件，其中，第一限制条件可以包括上述限制条件中的C1、C2。
在移动终端与无人机之间的任务卸载优化问题中，确定了无人机位置后，在Δ后续的每个时长片段内，无人机位置将保持不变。此时，任务分配结果α_{u,m}(n)的确定，需要考虑到移动终端与无人机的距离d_{u,m}(n)和移动终端当前的任务I_m(n)，以保证所有移动终端完成计算任务的平均延迟最小。因此，在第n_1个时长片段内确定无人机的位置后，移动终端完成计算任务的平均延迟可以表示为：
P2: min_{α_{u,m}(n)} (1/(|N′|·|M|))·Σ_{n∈N′} Σ_{m∈M} t_m(n)；
式中，N′表示时长Δ中剩余的时长片段，t_m(n)与每个任务选择的计算方式相关（即移动终端本地计算或无人机进行计算）。并且，需要满足第二限制条件，其中，第二限制条件可以包括上述限制条件中的C3-C8。
由于本申请实施例采用强化学习的方式对位置模型以及任务分配模型进行优化,因此,需要分别为无人机位置优化问题以及任务卸载优化问题生成对应的奖励函数。
在大规模无人机辅助移动边缘计算网络场景中，不失一般性，在第n个时长片段内，存在的状态可以表示为：
S_n = {w_m(n), w_u(n), I_m(n), Φ_m(n), Φ_u(n)}；
式中，w_m(n)表示移动终端的位置，w_u(n)表示无人机的位置，I_m(n)表示移动终端待执行的任务，Φ_m(n)与Φ_u(n)分别表示移动终端的剩余电量以及无人机的剩余电量，对应的数学表达式如下：
Φ_m(n+1) = Φ_m(n) - (1-α_{u,m}(n))·e_m^{loc}(n) - α_{u,m}(n)·e_{u,m}^{tr}(n)；
式中，e_m^{loc}(n)表示移动终端本地计算消耗的电量，e_{u,m}^{tr}(n)表示将任务发送给无人机所消耗的电量。
Φ_u(n+1) = Φ_u(n) - α_{u,m}(n)·e_u^{comp}(n) - e_u^fly(n)；
式中，e_u^{comp}(n)表示无人机在本地计算消耗的能量，e_u^fly(n)表示无人机移动位置消耗的电量。
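示例性的，上述剩余电量的演化可以用如下简化的Python示意代码表示（各能耗项的含义与上文一致，函数名为便于说明而设）：

def update_battery(phi_m, phi_u, offload, e_loc, e_tr, e_comp, e_fly=0.0):
    # offload = 0 本地执行：移动终端消耗本地计算能耗
    # offload = 1 卸载执行：移动终端消耗传输能耗，无人机消耗计算能耗
    phi_m_next = phi_m - (1 - offload) * e_loc - offload * e_tr
    phi_u_next = phi_u - offload * e_comp - e_fly
    return phi_m_next, phi_u_next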
在上述状态S_n的基础上，需要执行的动作包括：
A_n = {a_u(n), α_{u,m}(n)}；
式中，a_u(n)表示移动无人机的位置的动作，α_{u,m}(n)表示无人机与移动终端之间的任务卸载。
结合表示所有移动终端的平均延迟最小的目标函数P，对目标函数P做如下变换：
P′: max_π Σ_{n=1}^{N} r_n；
式中，r_n表示第n个时长片段内从环境中获得的立即奖励。
因此，在第n个时长片段内，立即奖励函数r_n为：
r_n = -(1/|M|)·Σ_{m∈M} t_m(n)；
并且,当违反限制条件C1-C8中的任何一条时,则产生一个负奖励值作为惩罚。
最终的目的是通过不断更新大规模无人机辅助移动边缘计算网络中的策略π，最大化在整个任务执行时间内从环境获得的未来奖励V_π，未来奖励V_π的价值函数可以表示为：
V_π(s) = E_π[ Σ_{k≥0} γ^k·r_{n+k} | S_n = s ]；
式中，γ∈[0,1]表示未来奖励的折扣因子。
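示例性的，立即奖励与未来奖励（折扣回报）的计算可以用如下简化的Python示意代码表示（penalty用于表示违反限制条件C1-C8时叠加的负奖励，其取值为假设）：

def immediate_reward(task_delays, penalty=0.0):
    # 立即奖励取负的平均延迟，违反限制条件时再叠加负奖励 penalty
    return -sum(task_delays) / len(task_delays) + penalty

def discounted_return(rewards, gamma=0.95):
    # 未来奖励 V = sum_k gamma^k * r_{n+k}
    v, weight = 0.0, 1.0
    for r in rewards:
        v += weight * r
        weight *= gamma
    return v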
在上述大规模无人机辅助的移动边缘计算的场景相关数学模型的基础上,本申请实施例中通过神经网络模型进行训练,以拟合无人机位置优化策略以及任务卸载策略。
需要说明的是，上述大规模无人机辅助移动边缘计算网络场景下的数学模型，是在发明人做出了创造性的研究后得出的，因此，上述发明人所总结的数学表达式以及参数的选取均应视为对本申请创造性的贡献。
鉴于此,本申请实施中将上述策略模型拆分成待训练的位置模型以及任务分配模型,然后由训练设备进行交替训练。下面结合图2所示的模型构建方法的流程示意图,对各个步骤进行详细阐述。如图2所示,该模型构建方法可以包括:
步骤S1A,初始化位置模型、任务分配模型、第一无人机的状态以及第一移动终端的状态。
其中,第一无人机用于为第一移动终端提供边缘计算服务。为便于对训练期间与模型使用期间的无人机以及移动终端进行区分,本申请实施例中,将训练期间的无人机称为第一无人机,训练期间的移动终端称为第一移动终端。
相对应的，将模型使用期间的无人机称为第二无人机，模型使用期间的移动终端称为第二移动终端。
位置模型与任务分配模型均为用于强化学习的神经网络模型。其中,强化学习作为一种机器学习方法,介于监督学习与非监督学习之间。其原理在于:
假定待训练的位置模型与任务分配模型为智能体，若该智能体的某个行为策略导致环境给出正的奖励（强化信号），那么该智能体以后产生这个行为策略的趋势便会加强。智能体的目标是在每个离散状态发现最优策略，以使期望的折扣奖励和最大。
强化学习的学习过程可以看作试探评价过程：当智能体选择一个动作作用于环境，环境接受该动作后状态发生变化，同时产生一个强化信号（奖或惩）反馈给智能体，智能体根据强化信号和环境当前状态再选择下一个动作，选择的原则是使受到正强化（奖）的概率增大。选择的动作不仅影响立即强化值，而且影响环境下一时刻的状态及最终的强化值。
强化学习不同于基于标签的监督学习，主要表现在强化信号上：强化学习中由环境提供的强化信号是智能体对所产生动作的好坏作出的一种评价（通常为标量信号），而不是告诉智能体如何去产生正确的动作。由于外部环境提供的信息很少，智能体必须靠自身的经历进行学习。通过这种方式，智能体在“行动-评价”的环境中获得知识，改进行动方案以适应环境。
示例性的，本申请实施例中的位置模型可以选取DDPG（Deep Deterministic Policy Gradient，深度确定性策略梯度）模型；任务分配模型可以选取DQN（Deep Q Networks，深度Q网络）模型。当然，其他适用于连续动作的强化学习模型也可以用作位置模型；其他适用于离散动作的强化学习模型也可以用作任务分配模型，本申请实施例不对此做具体的限定。
步骤S2A,将位置模型以及任务分配模型进行以下迭代,直到满足预设的迭代条件:
训练设备根据第一移动终端与第一无人机之间当前时刻的第一状态，通过位置模型获得第一无人机下一时刻的预测位置；根据预测位置更新位置模型的模型参数。
其中,第一状态可以是第一移动终端的位置、第一移动终端的剩余电量以及第一无人机的剩余电量。
具体地,该训练设备根据预测位置更新第一状态;根据更新后的第一状态,通过预设第一奖励策略获得与更新后的第一状态相对应的第一奖励值;根据第一奖励值,更新位置模型的模型参数。
进一步地，训练设备根据预测位置确定第一无人机与第一移动终端之间当前时刻的第二状态；根据第二状态，通过任务分配模型确定第一无人机与第一移动终端之间下一时刻的任务分配结果；根据任务分配结果，更新任务分配模型的模型参数。
其中,第二状态可以包括第一无人机的预测位置、第一移动终端的位置、第一无人机的剩余电量、第一移动终端的剩余电量以及第一移动终端中的计算任务。
具体地，训练设备根据任务分配结果更新第二状态；根据更新后的第二状态，通过预设第二奖励策略获得与第二状态相对应的第二奖励值；根据第二奖励值，更新任务分配模型的模型参数。
由此，训练设备将无人机辅助移动边缘计算的调度策略拆分成无人机位置优化与任务计算卸载优化两个层级的子问题，使用层次强化学习交替优化对应的位置模型以及任务分配模型，从而降低了每个子问题的复杂度，并提高了整体系统的学习效率与收敛效率。
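示例性的，上述交替训练流程可以用如下简化的Python示意代码概括（其中env、ddpg、dqn及其接口名均为便于说明而假设的对象与方法名，并非本申请实施例的限定实现）：

def train(env, ddpg, dqn, episodes, delta):
    for _ in range(episodes):
        s1 = env.reset()                        # 第一状态：终端位置、双方剩余电量等
        done = False
        while not done:
            pos = ddpg.act(s1)                  # 位置模型给出无人机下一时刻的预测位置
            s1_next, r1, done = env.move_uav(pos)
            ddpg.update(s1, pos, r1, s1_next)   # 依据第一奖励值更新位置模型参数
            s2 = env.get_offload_state()        # 第二状态：预测位置、终端位置、电量、任务
            for _ in range(delta):              # 保持无人机位置不变，逐时长片段进行任务卸载
                a = dqn.act(s2)                 # 任务分配模型给出卸载决策
                s2_next, r2, _ = env.offload(a)
                dqn.update(s2, a, r2, s2_next)  # 依据第二奖励值更新任务分配模型参数
                s2 = s2_next
            s1 = s1_next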
示例性的，下面结合上述DDPG模型以及DQN模型对上述训练过程进行详细的介绍。当位置模型为上述DDPG模型时，DDPG模型包括评论家网络和演员网络。如图3所示，在DDPG模型中，演员网络用于根据环境中的状态，确定第一无人机的移动策略并执行更新位置的动作，而评论家网络则用于对更新位置的动作进行评分，该评分表示更新位置的动作所期望的最大收益。
然后,演员网络根据评论家网络的打分调整自己的策略,即更新演员网络中的模型参数。
评论家网络根据环境的反馈的第一奖励值调整自己的打分策略,即更新评论家网络中的模型参数。
其中，在获取第一奖励值时，训练设备通过预设第一奖励策略对更新后的第一状态进行计算，得到与更新后的第一状态相对应的第一奖励值。
然后,当训练设备根据更新后的第一状态,确定第一无人机满足任意一条第一限制条件时,则通过预设第一负奖励值调整第一奖励值,其中,第一限制条件可以包括:
第一无人机的移动速度超过速度阈值;
第一无人机的移动频率超过频率阈值。
可以理解为，当无人机移动位置的动作违反C1、C2中的任意一条限制条件时，则在第一奖励值的基础上通过该预设第一负奖励值进行调整。其中，立即奖励函数为上述表达式：
r_n = -(1/|M|)·Σ_{m∈M} t_m(n)；
从环境获得的未来奖励V_π的价值函数则为上述表达式：
V_π(s) = E_π[ Σ_{k≥0} γ^k·r_{n+k} | S_n = s ]。
在对DDPG模型进行训练之前,可以随机初始化演员网络与评论家网络中的模型参数,经过多轮训练之后,演员网络与评论家网络不断收敛,表现结果也越来越好。
本申请实施例中，针对演员网络拟合出的位置策略，评论家网络进行评分的最优Q*值函数，可以表示为：
Q*(s_n′, a_n′) = E[ r_n′(s′|s_n′, a_n′) + γ·max_{a′} Q*(s′, a′) ]；
式中，s_n′表示无人机将当前位置更新到预测位置后，在第n′个时长片段内位置模型从环境中获取的状态信息。
s′表示无人机将当前位置更新到预测位置后，在时长Δ剩余的时长片段内的状态信息，对应的表达式为：
s′ = s_{n′+Δ}；
r_n′(s′|s_n′, a_n′)表示在第n′个时长片段，状态s_n′与动作a_n′所对应的奖励值。
a′表示在状态s′下所做出的策略。
a_n′表示第一无人机运动的动作（即第一无人机基于预测位置进行更新位置的动作），γ表示衰减系数。
构建评论家网络的目的是为了逼近最优Q*值函数，因此，本申请实施例中使用一系列与环境交互的经验池χ：
χ = {s_n′, a_n′, r_n′, s_{n′+Δ}}；
对评论家网络进行训练，更新其模型参数θ_c，其对应的损失函数表示为：
L(θ_c) = E_{χ∈ε_u}[ ( r_n′ + γ·Q(s_{n′+Δ}, u(s_{n′+Δ}|θ_A)|θ_c) - Q(s_n′, a_n′|θ_c) )² ]；
式中，ε_u表示一组经验的集合，即多个χ={s_n′, a_n′, r_n′, s_{n′+Δ}}的集合。
相对应的，演员网络表示为u(s_n′|θ_A)，表示演员网络在接收到环境的状态s_n′后，确定出无人机需要执行的位置移动动作。训练演员网络参数θ_A的梯度函数为：
∇_{θ_A}J ≈ E_{χ∈ε_u}[ ∇_a Q(s_n′, a|θ_c)|_{a=u(s_n′|θ_A)} · ∇_{θ_A} u(s_n′|θ_A) ]；
由此,通过评论家网络和演员网络分别对应的两个目标网络,使得训练出的DDPG模型更加稳定。
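示例性的，基于PyTorch的DDPG单步更新可以写成如下简化示意代码（actor、critic及其目标网络actor_t、critic_t、优化器均假设已构建好，软更新系数tau等超参数取值为假设，并非本申请实施例的限定实现）：

import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, gamma=0.95, tau=0.005):
    s, a, r, s_next = batch
    # 1. 更新评论家网络：最小化 (y - Q(s, a))^2
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # 2. 更新演员网络：沿Q值上升的方向调整策略参数
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # 3. 软更新两个目标网络，使训练更加稳定
    for p, p_t in zip(critic.parameters(), critic_t.parameters()):
        p_t.data.mul_(1 - tau).add_(tau * p.data)
    for p, p_t in zip(actor.parameters(), actor_t.parameters()):
        p_t.data.mul_(1 - tau).add_(tau * p.data)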
在演员网络根据环境中的状态确定第一无人机的移动策略并执行更新位置的动作后，请再次参照图3，保持无人机更新后的位置不变，通过上述DQN模型进行任务卸载。
值得说明的是,在DQN模型中,将时长Δ中的每个时长片段用η进行表示,其中,η∈[0,Δ-1]。时长片段η中任务卸载结果表示为α(n+η)。由于任务卸载结果为二元离散变量,因此,本申请实施例选取DQN模型作为任务分配模型。
具体地，从经验池中抽取一个批次的经验数据k，用于更新DQN模型的模型参数：
k = {s(n+η), α(n+η), r(n+η), s(n+η+1)}；
其中，DQN模型的Q值函数的更新方式如下：
Q(s(n+η), α(n+η)) ← Q(s(n+η), α(n+η)) + λ·[ r(n+η) + γ·max_{α′} Q(s(n+η+1), α′) - Q(s(n+η), α(n+η)) ]；
式中，λ表示学习率，r(n+η)为第二奖励策略对应的奖励函数，其对应的表达式为：
r(n+η) = -(1/|M|)·Σ_{m∈M} t_m(n+η)；
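示例性的，上述DQN模型Q值函数的更新可以用如下基于PyTorch的简化示意代码表示（q_net与q_target为假设的Q网络，batch为从经验池抽取的一个批次k，超参数取值为假设）：

import torch
import torch.nn.functional as F

def dqn_update(batch, q_net, q_target, optimizer, gamma=0.95):
    s, a, r, s_next = batch                    # a 为离散卸载动作的索引（长整型张量）
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = q_target(s_next).max(dim=1).values
        target = r + gamma * q_next            # TD目标：r + γ·max Q(s', a')
    loss = F.mse_loss(q, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()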
其中,训练设备通过预设第二奖励策略获得与第二状态相对应的第二奖励值。
然后,当训练设备根据更新后的第二状态,确定第一无人机与第一移动终端满足任意一条第二限制条件时,则通过预设第二负奖励值调整第二奖励值,其中,第二限制条件可以包括:
同一任务同时在第一无人机以及第一移动终端运行;
任务在第一无人机与第一移动终端之间传输时所消耗的总能量超过能量阈值;
至少一个任务的完成耗时超过时长阈值。
可以理解为，当任务卸载的动作违反C3-C8中的任意一条限制条件时，则在第二奖励值的基础上通过该预设第二负奖励值进行调整。
本申请实施例还提供一种任务分配方法，应用于执行设备，执行设备配置有预训练的位置模型以及任务分配模型，预训练的位置模型以及任务分配模型由上述模型构建方法进行训练获得。请参照图4，方法可以包括：
步骤S1B,获取第二移动终端与第二无人机之间当前时刻的第三状态;
步骤S2B,根据第三状态,通过位置模型确定第二无人机在下一时刻的预测位置。
步骤S3B,根据第二无人机在下一时刻的预测位置,确定第二无人机与第二移动终端之间的第四状态;
步骤S4B,根据第四状态,通过任务分配模型确定第二无人机与第二移动终端之间的任务分配结果。
可选地,为了降低第二无人机的功耗,本申请实施例中,步骤S2B中,根据第三状态,通过位置模型确定第二无人机在下一时刻的预测位置,可以包括:
每间隔第一时长片段,根据第三状态,通过位置模型确定第二无人机在下一时刻的预测位置,其中,第一时长片段可以包括多个第二时长片段。
示例性的，第一时长片段可以是上述计算任务的执行时间周期Δ，第二时长片段可以是将该时间周期进一步划分得到的N个离散时长片段中的一个。
步骤S4B中,根据第四状态,通过任务分配模型确定第二无人机与第二移动终端之间的任务分配结果,可以包括:
针对每个第二时长片段,保持第二无人机的位置不变,根据第四状态,通过任务分配模型确定第二无人机与第二移动终端之间的任务分配结果。
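示例性的，执行设备按第一时长片段、第二时长片段交替调用两个模型的过程，可以用如下简化的Python示意代码概括（position_model、task_model、env及其接口名均为便于说明而假设）：

def schedule(env, position_model, task_model, num_periods, num_slots):
    for _ in range(num_periods):                # 每个第一时长片段（即时长Δ）
        s3 = env.get_state()                    # 第三状态
        pos = position_model.predict(s3)        # 确定第二无人机下一时刻的预测位置
        env.move_uav(pos)
        for _ in range(num_slots):              # 每个第二时长片段，无人机位置保持不变
            s4 = env.get_state()                # 第四状态
            alloc = task_model.predict(s4)      # 任务分配结果
            env.apply_allocation(alloc)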
基于相同的发明构思,本申请实施例还提供一种模型构建装置,模型构建装置应用于训练设备,训练设备配置有待训练的位置模型以及任务分配模型。如图5所示,模型构建装置可以包括:
模型初始模块301,可以被配置成用于初始化位置模型、任务分配模型、第一无人机的状态以及第一移动终端的状态,其中,第一无人机用于为第一移动终端提供边缘计算服务。
本申请实施例中,模型初始模块301对应的计算机可执行指令被处理器执行时,实现图2中的步骤S1A。关于模型初始模块301的详细描述,可以参见步骤S1A的详细描述。
模型训练模块302,可以被配置成用于将位置模型以及任务分配模型进行以下至少一次迭代,直到满足预设的迭代条件:
根据第一移动终端与第一无人机之间当前时刻的第一状态,通过位置模型获得第一无人机下一时刻的预测位置;
根据预测位置更新位置模型的模型参数;
根据预测位置确定第一无人机与第一移动终端之间当前时刻的第二状态;
根据第二状态,通过任务分配模型确定第一无人机与第一移动终端之间下一时刻的任务分配结果;
根据任务分配结果,更新任务分配模型的模型参数。
值得说明的是,上述模型初始模块301以及模型训练模块302还可以用于实现模型构建方法的其他 步骤或者子步骤,模型构建装置还可以根据所实现的功能包括其他模块,本申请实施例不对此做具体的限定。
本申请实施例中,模型训练模块302对应的计算机可执行指令被处理器执行时,实现图2中的步骤S2A。关于模型训练模块302的详细描述,可以参见步骤S2A的详细描述。
本申请实施例还提供一种任务调度装置,应用于执行设备,执行设备配置有预训练的位置模型以及任务分配模型,预训练的位置模型以及任务分配模型由模型构建装置进行训练获得。如图6所示,任务调度装置可以包括:
状态获取模块401,可以被配置成用于获取第二移动终端与第二无人机之间当前时刻的第三状态。
本申请实施例中,状态获取模块401对应的计算机可执行指令被处理器执行时,实现图4中的步骤S1B。关于状态获取模块401的详细描述,可以参见步骤S1B的详细描述。
位置确定模块402,可以被配置成用于根据第三状态,通过位置模型确定第二无人机在下一时刻的预测位置。
本申请实施例中,位置确定模块402对应的计算机可执行指令被处理器执行时,实现图4中的步骤S2B。关于位置确定模块402的详细描述,可以参见步骤S2B的详细描述。
状态获取模块401,还可以被配置成用于根据第二无人机在下一时刻的预测位置,确定第二无人机与第二移动终端之间的第四状态。
本申请实施例中,关于状态获取模块401的详细描述,还可以参见图4中的步骤S3B的详细描述。
任务分配模块404,可以被配置成用于根据第四状态,通过任务分配模型确定第二无人机与第二移动终端之间的任务分配结果。
本申请实施例中,任务分配模块404对应的计算机可执行指令被处理器执行时,实现图4中的步骤S4B。关于任务分配模块404的详细描述,可以参见步骤S4B的详细描述。
值得说明的是,上述状态获取模块401、位置确定模块402以及任务分配模块404还可以用于实现任务分配方法的其他步骤或者子步骤,任务调度装置还可以根据所实现的功能包括其他模块,本申请实施例不对此做具体的限定。
本申请实施例还提供一种电子设备，该电子设备可以是训练设备，还可以是执行设备。电子设备包括处理器以及存储器，存储器存储有计算机程序。
当电子设备是训练设备时,计算机程序被处理器执行时,实现模型构建方法;当电子设备是执行设备时,计算机程序被处理器执行时,实现任务分配方法。
示例性的，该执行设备可以是与无人机以及移动终端通信连接的服务器。
本申请实施例提供一种该电子设备的结构示意图。如图7所示,该电子设备可以包括存储器520、处理器530、通信装置540。其中,存储器520、处理器530以及通信装置540各元件相互之间直接或间接地电性连接,以实现数据的传输或交互。例如,这些元件相互之间可通过一条或多条通讯总线或信号线实现电性连接。
该存储器520可以是，但不限于，随机存取存储器（Random Access Memory，RAM），只读存储器（Read Only Memory，ROM），可编程只读存储器（Programmable Read-Only Memory，PROM），可擦除只读存储器（Erasable Programmable Read-Only Memory，EPROM），电可擦除只读存储器（Electric Erasable Programmable Read-Only Memory，EEPROM）等。其中，存储器520用于存储计算机程序。当该电子设备是训练设备时，处理器530在接收到执行指令后，执行该计算机程序，以实现模型构建方法；当该电子设备是执行设备时，处理器530在接收到执行指令后，执行该计算机程序，以实现任务分配方法。通信装置540用于通过网络收发数据，其中，该网络可以是有线网络，还可以是无线网络。
处理器530以及其他可能的组件对存储器520的访问可在存储控制器的控制下进行。
处理器530可能是一种集成电路芯片,具有信号的处理能力。上述的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,简称CPU)、网络处理器(Network Processor,简称NP)等;还可以是数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
本申请实施例还提供一种存储介质,存储介质存储有计算机程序,计算机程序被处理器执行时,实现模型构建方法或者任务分配方法。
综上所述，在本申请实施例提供的模型构建方法、任务分配方法、装置、设备及介质中，训练设备将无人机辅助移动边缘计算的调度策略拆分成无人机位置优化与任务计算卸载优化两个层级的子问题，使用层次强化学习交替优化对应的位置模型以及任务分配模型，从而降低了每个子问题的复杂度，并提高了整体系统的学习效率与收敛效率。
在本申请所提供的实施例中,应该理解到,所揭露的装置和方法,也可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,附图中的流程图和框图显示了根据本申请的多个实施例的装置、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分,所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现方式中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
另外,在本申请各个实施例中的各功能模块可以集成在一起形成一个独立的部分,也可以是各个模块单独存在,也可以两个或两个以上模块集成形成一个独立的部分。
所述功能如果以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
以上所述，仅为本申请的各种实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。
工业实用性
本申请提供了模型构建方法、任务分配方法、装置、设备及介质，其中，训练设备将无人机辅助移动边缘计算的调度策略拆分成无人机位置优化与任务计算卸载优化两个层级的子问题，使用层次强化学习交替优化对应的位置模型以及任务分配模型，从而降低了每个子问题的复杂度，并提高了整体系统的学习效率与收敛效率。
此外,可以理解的是,本申请的模型构建方法、任务分配方法、装置、设备及介质是可以重现的,并且可以用在多种工业应用中。例如,本申请的模型构建方法、任务分配方法、装置、设备及介质可以用于数据处理的任何应用领域。

Claims (14)

  1. 一种模型构建方法,其特征在于,应用于训练设备,所述训练设备配置有待训练的位置模型以及任务分配模型,所述方法包括:
    初始化所述位置模型、所述任务分配模型、第一无人机的状态以及第一移动终端的状态,其中,所述第一无人机用于为所述第一移动终端提供边缘计算服务;
    将所述位置模型以及任务分配模型进行以下迭代,直到满足预设的迭代条件:
    根据所述第一移动终端与所述第一无人机之间当前时刻的第一状态,通过所述位置模型获得所述第一无人机下一时刻的预测位置;
    根据所述预测位置更新所述位置模型的模型参数;
    根据所述预测位置确定所述第一无人机与所述第一移动终端之间当前时刻的第二状态;
    根据所述第二状态,通过所述任务分配模型确定所述第一无人机与所述第一移动终端之间下一时刻的任务分配结果;
    根据所述任务分配结果,更新所述任务分配模型的模型参数。
  2. 根据权利要求1所述的模型构建方法,其特征在于,所述根据所述预测位置更新所述位置模型的模型参数,包括:
    根据所述预测位置更新所述第一状态;
    根据更新后的第一状态,通过预设第一奖励策略获得与所述更新后的第一状态相对应的第一奖励值;
    根据所述第一奖励值,更新所述位置模型的模型参数。
  3. 根据权利要求2所述的模型构建方法,其特征在于,所述根据更新后的第一状态,通过预设第一奖励策略获得与所述更新后的第一状态相对应的第一奖励值,包括:
    通过预设第一奖励策略获得与所述更新后的第一状态相对应的第一奖励值;
    当根据所述更新后的第一状态,确定所述第一无人机满足任意一条第一限制条件时,则通过预设第一负奖励值调整所述第一奖励值,其中,所述第一限制条件包括:
    所述第一无人机的移动速度超过速度阈值;
    所述第一无人机的移动频率超过频率阈值。
  4. 根据权利要求1至3中任一项所述的模型构建方法,其特征在于,所述第一状态是所述第一移动终端的位置、所述第一移动终端的剩余电量以及所述第一无人机的剩余电量。
  5. 根据权利要求1至4中任一项所述的模型构建方法,其特征在于,所述根据所述任务分配结果,更新所述任务分配模型的模型参数,包括:
    根据所述任务分配结果更新所述第二状态;
    根据更新后的第二状态,通过预设第二奖励策略获得与所述第二状态相对应的第二奖励值;
    根据所述第二奖励值，更新所述任务分配模型的模型参数。
  6. 根据权利要求5所述的模型构建方法,其特征在于,所述根据更新后的第二状态,通过预设第二 奖励策略获得与所述第二状态相对应的第二奖励值,包括:
    通过预设第二奖励策略获得与所述第二状态相对应的第二奖励值;
    当根据所述更新后的第二状态，确定所述第一无人机与所述第一移动终端满足任意一条第二限制条件时，则通过预设第二负奖励值调整所述第二奖励值，其中，所述第二限制条件包括：
    同一任务同时在第一无人机以及第一移动终端运行;
    任务在第一无人机与第一移动终端之间传输时所消耗的总能量超过能量阈值;
    至少一个任务的完成耗时超过时长阈值。
  7. 根据权利要求5或6所述的模型构建方法,其特征在于,所述第二状态包括所述第一无人机的预测位置、所述第一移动终端的位置、所述第一无人机的剩余电量、所述第一移动终端的剩余电量以及所述第一移动终端中的计算任务。
  8. 一种任务分配方法,其特征在于,应用于执行设备,所述执行设备配置有预训练的位置模型以及任务分配模型,所述预训练的位置模型以及任务分配模型由权利要求1-7任意一项所述的模型构建方法进行训练获得,所述方法包括:
    获取第二移动终端与第二无人机之间当前时刻的第三状态;
    根据所述第三状态,通过所述位置模型确定所述第二无人机在下一时刻的预测位置;
    根据所述第二无人机在下一时刻的预测位置,确定所述第二无人机与第二移动终端之间的第四状态;
    根据所述第四状态,通过所述任务分配模型确定所述第二无人机与所述第二移动终端之间的任务分配结果。
  9. 根据权利要求8所述的任务分配方法,其特征在于,根据所述第三状态,通过所述位置模型确定所述第二无人机在下一时刻的预测位置,包括:
    每间隔第一时长片段,根据所述第三状态,通过所述位置模型确定所述第二无人机在下一时刻的预测位置,其中,所述第一时长片段包括多个第二时长片段。
  10. 根据权利要求9所述的任务分配方法,其特征在于,根据所述第四状态,通过所述任务分配模型确定所述第二无人机与所述第二移动终端之间的任务分配结果,包括:
    针对每个所述第二时长片段,保持所述第二无人机的位置不变,根据所述第四状态,通过所述任务分配模型确定所述第二无人机与所述第二移动终端之间的任务分配结果。
  11. 一种模型构建装置,其特征在于,所述模型构建装置应用于训练设备,所述训练设备配置有待训练的位置模型以及任务分配模型,所述模型构建装置包括:
    模型初始模块,被配置成用于初始化所述位置模型、所述任务分配模型、第一无人机的状态以及第一移动终端的状态,其中,所述第一无人机用于为所述第一移动终端提供边缘计算服务;
    模型训练模块,被配置成用于将所述位置模型以及任务分配模型进行以下迭代,直到满足预设的迭代条件:
    根据所述第一移动终端与所述第一无人机之间当前时刻的第一状态,通过所述位置模型获得所述第一无人机下一时刻的预测位置;
    根据所述预测位置更新所述位置模型的模型参数;
    根据所述预测位置确定所述第一无人机与所述第一移动终端之间当前时刻的第二状态;
    根据所述第二状态,通过所述任务分配模型确定所述第一无人机与所述第一移动终端之间下一时刻的任务分配结果;
    根据所述任务分配结果,更新所述任务分配模型的模型参数。
  12. 一种任务调度装置，其特征在于，应用于执行设备，所述执行设备配置有预训练的位置模型以及任务分配模型，所述预训练的位置模型以及任务分配模型由权利要求11所述的模型构建装置进行训练获得，所述任务调度装置包括：
    状态获取模块,被配置成用于获取第二移动终端与第二无人机之间当前时刻的第三状态;
    位置确定模块,被配置成用于根据所述第三状态,通过所述位置模型确定所述第二无人机在下一时刻的预测位置;
    所述状态获取模块,还被配置成用于根据所述第二无人机在下一时刻的预测位置,确定所述第二无人机与第二移动终端之间的第四状态;
    任务分配模块,被配置成用于根据所述第四状态,通过所述任务分配模型确定所述第二无人机与所述第二移动终端之间的任务分配结果。
  13. 一种电子设备,其特征在于,所述电子设备包括处理器以及存储器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,实现权利要求1至7中的任意一项所述的模型构建方法或者权利要求8至10中的任一项所述的任务分配方法。
  14. 一种存储介质,其特征在于,所述存储介质存储有计算机程序,所述计算机程序被处理器执行时,实现权利要求1至7中的任意一项所述的模型构建方法或者权利要求8至10中的任一项所述的任务分配方法。
PCT/CN2021/128250 2021-03-22 2021-11-02 模型构建方法、任务分配方法、装置、设备及介质 WO2022199032A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110302078.9A CN113032904B (zh) 2021-03-22 2021-03-22 模型构建方法、任务分配方法、装置、设备及介质
CN202110302078.9 2021-03-22

Publications (1)

Publication Number Publication Date
WO2022199032A1 true WO2022199032A1 (zh) 2022-09-29

Family

ID=76472366

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128250 WO2022199032A1 (zh) 2021-03-22 2021-11-02 模型构建方法、任务分配方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN113032904B (zh)
WO (1) WO2022199032A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115915275A (zh) * 2022-10-25 2023-04-04 大连海事大学 一种面向近海的无人机辅助中继数据卸载方法
CN116112981A (zh) * 2023-04-13 2023-05-12 东南大学 一种基于边缘计算的无人机任务卸载方法
CN116384695A (zh) * 2023-04-11 2023-07-04 中国人民解放军陆军工程大学 基于独立否决和联合否决的无人机运用监测方法及系统
CN116757450A (zh) * 2023-08-17 2023-09-15 浪潮通用软件有限公司 一种共享中心的任务分配的方法、装置、设备及介质
CN117311991A (zh) * 2023-11-28 2023-12-29 苏州元脑智能科技有限公司 模型训练方法、任务分配方法、装置、设备、介质及系统

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032904B (zh) * 2021-03-22 2021-11-23 北京航空航天大学杭州创新研究院 模型构建方法、任务分配方法、装置、设备及介质
CN114594793B (zh) * 2022-03-07 2023-04-25 四川大学 一种基站无人机的路径规划方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109976909A (zh) * 2019-03-18 2019-07-05 中南大学 边缘计算网络中基于学习的低延时任务调度方法
CN111132009A (zh) * 2019-12-23 2020-05-08 北京邮电大学 物联网的移动边缘计算方法、装置及系统
CN111160525A (zh) * 2019-12-17 2020-05-15 天津大学 一种边缘计算环境下基于无人机群的任务卸载智能决策方法
US20200302431A1 (en) * 2019-03-21 2020-09-24 Verizon Patent And Licensing Inc. System and method for allocating multi-access edge computing services
CN112351503A (zh) * 2020-11-05 2021-02-09 大连理工大学 基于任务预测的多无人机辅助边缘计算资源分配方法
CN113032904A (zh) * 2021-03-22 2021-06-25 北京航空航天大学杭州创新研究院 模型构建方法、任务分配方法、装置、设备及介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428115A (zh) * 2019-08-13 2019-11-08 南京理工大学 基于深度强化学习的动态环境下的最大化系统效益方法
CN110794965B (zh) * 2019-10-23 2021-06-04 湖南师范大学 一种基于深度强化学习的虚拟现实语言任务卸载方法
CN111708355B (zh) * 2020-06-19 2023-04-18 中国人民解放军国防科技大学 基于强化学习的多无人机动作决策方法和装置
CN112118287B (zh) * 2020-08-07 2023-01-31 北京工业大学 基于交替方向乘子算法与移动边缘计算的网络资源优化调度决策方法
CN112069903B (zh) * 2020-08-07 2023-12-22 之江实验室 基于深度强化学习实现人脸识别端边卸载计算方法及装置
US20210014132A1 (en) * 2020-09-22 2021-01-14 Ned M. Smith Orchestrator execution planning using a distributed ledger
CN112491964B (zh) * 2020-11-03 2022-05-31 中国人民解放军国防科技大学 移动辅助边缘计算方法、装置、介质和设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109976909A (zh) * 2019-03-18 2019-07-05 中南大学 边缘计算网络中基于学习的低延时任务调度方法
US20200302431A1 (en) * 2019-03-21 2020-09-24 Verizon Patent And Licensing Inc. System and method for allocating multi-access edge computing services
CN111160525A (zh) * 2019-12-17 2020-05-15 天津大学 一种边缘计算环境下基于无人机群的任务卸载智能决策方法
CN111132009A (zh) * 2019-12-23 2020-05-08 北京邮电大学 物联网的移动边缘计算方法、装置及系统
CN112351503A (zh) * 2020-11-05 2021-02-09 大连理工大学 基于任务预测的多无人机辅助边缘计算资源分配方法
CN113032904A (zh) * 2021-03-22 2021-06-25 北京航空航天大学杭州创新研究院 模型构建方法、任务分配方法、装置、设备及介质

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115915275A (zh) * 2022-10-25 2023-04-04 大连海事大学 一种面向近海的无人机辅助中继数据卸载方法
CN115915275B (zh) * 2022-10-25 2023-08-08 大连海事大学 一种面向近海的无人机辅助中继数据卸载方法
CN116384695A (zh) * 2023-04-11 2023-07-04 中国人民解放军陆军工程大学 基于独立否决和联合否决的无人机运用监测方法及系统
CN116384695B (zh) * 2023-04-11 2024-01-26 中国人民解放军陆军工程大学 基于独立否决和联合否决的无人机运用监测方法及系统
CN116112981A (zh) * 2023-04-13 2023-05-12 东南大学 一种基于边缘计算的无人机任务卸载方法
CN116757450A (zh) * 2023-08-17 2023-09-15 浪潮通用软件有限公司 一种共享中心的任务分配的方法、装置、设备及介质
CN116757450B (zh) * 2023-08-17 2024-01-30 浪潮通用软件有限公司 一种共享中心的任务分配的方法、装置、设备及介质
CN117311991A (zh) * 2023-11-28 2023-12-29 苏州元脑智能科技有限公司 模型训练方法、任务分配方法、装置、设备、介质及系统
CN117311991B (zh) * 2023-11-28 2024-02-23 苏州元脑智能科技有限公司 模型训练方法、任务分配方法、装置、设备、介质及系统

Also Published As

Publication number Publication date
CN113032904B (zh) 2021-11-23
CN113032904A (zh) 2021-06-25

Similar Documents

Publication Publication Date Title
WO2022199032A1 (zh) 模型构建方法、任务分配方法、装置、设备及介质
CN112351503B (zh) 基于任务预测的多无人机辅助边缘计算资源分配方法
Asheralieva et al. Hierarchical game-theoretic and reinforcement learning framework for computational offloading in UAV-enabled mobile edge computing networks with multiple service providers
CN112911648A (zh) 一种空地结合的移动边缘计算卸载优化方法
CN113543176A (zh) 基于智能反射面辅助的移动边缘计算系统的卸载决策方法
Nguyen et al. Real-time energy harvesting aided scheduling in UAV-assisted D2D networks relying on deep reinforcement learning
Chen et al. Deep reinforcement learning based resource allocation in multi-UAV-aided MEC networks
CN113254188B (zh) 调度优化方法和装置、电子设备及存储介质
Callegaro et al. Optimal edge computing for infrastructure-assisted UAV systems
US20230199061A1 (en) Distributed computation offloading method based on computation-network collaboration in stochastic network
CN115640131A (zh) 一种基于深度确定性策略梯度的无人机辅助计算迁移方法
WO2022242468A1 (zh) 任务卸载方法、调度优化方法和装置、电子设备及存储介质
CN114169234A (zh) 一种无人机辅助移动边缘计算的调度优化方法及系统
Dai et al. Mobile crowdsensing for data freshness: A deep reinforcement learning approach
Braquet et al. Greedy decentralized auction-based task allocation for multi-agent systems
Wei et al. Joint UAV trajectory planning, DAG task scheduling, and service function deployment based on DRL in UAV-empowered edge computing
Hwang et al. Deep reinforcement learning approach for uav-assisted mobile edge computing networks
Yan et al. Optimizing mobile edge computing multi-level task offloading via deep reinforcement learning
CN115967430A (zh) 一种基于深度强化学习的成本最优空地网络任务卸载方法
CN116723548A (zh) 一种基于深度强化学习的无人机辅助计算卸载方法
Zhu et al. Online Distributed Learning-Based Load-Aware Heterogeneous Vehicular Edge Computing
CN116321181A (zh) 一种多无人机辅助边缘计算的在线轨迹及资源优化方法
CN115499441A (zh) 超密集网络中基于深度强化学习的边缘计算任务卸载方法
CN114698125A (zh) 移动边缘计算网络的计算卸载优化方法、装置及系统
CN114302456A (zh) 一种移动边缘计算网络考虑任务优先级的计算卸载方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932640

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932640

Country of ref document: EP

Kind code of ref document: A1