CN111767991B - Measurement and control resource scheduling method based on deep Q learning - Google Patents

Measurement and control resource scheduling method based on deep Q learning Download PDF

Info

Publication number
CN111767991B
CN111767991B CN202010609039.9A
Authority
CN
China
Prior art keywords
measurement
control
task
resource
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010609039.9A
Other languages
Chinese (zh)
Other versions
CN111767991A (en)
Inventor
郭茂耘 (Guo Maoyun)
武艺 (Wu Yi)
唐奇 (Tang Qi)
梁皓星 (Liang Haoxing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202010609039.9A priority Critical patent/CN111767991B/en
Publication of CN111767991A publication Critical patent/CN111767991A/en
Application granted granted Critical
Publication of CN111767991B publication Critical patent/CN111767991B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G06N 3/045 Combinations of networks (Physics — Computing — Computing arrangements based on specific computational models — Neural networks — Architecture, e.g. interconnection topology)
    • G06N 3/048 Activation functions (Neural networks — Architecture)
    • G06N 3/08 Learning methods (Neural networks)
    • G06Q 10/06315 Needs-based resource requirements planning or analysis (ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes — Resource planning, allocation, distributing or scheduling for enterprises or organisations)


Abstract

The invention relates to a measurement and control resource scheduling method based on deep Q learning, belonging to the field of intelligent scheduling. The method comprises the following steps: S1: describing the complex measurement and control scene; S2: designing the measurement and control scheduling performance evaluation indexes; S3: forming the measurement and control resource scheduling scheme; S4: applying the DQN algorithm to the generation of the measurement and control resource scheduling scheme; S5: implementing the DQN-based measurement and control resource scheduling method. In a complex measurement and control environment, and without requiring an accurate model of that environment, the method can generate a measurement and control resource scheduling strategy adapted to the measurement and control scene, thereby maximizing the measurement and control resource scheduling efficiency.

Description

Measurement and control resource scheduling method based on deep Q learning
Technical Field
The invention belongs to the field of intelligent scheduling, and relates to a measurement and control resource scheduling method based on deep Q learning.
Background
At present, methods for solving the satellite measurement and control resource scheduling problem mainly include: intelligent algorithms such as the ant colony algorithm, the particle swarm algorithm and the SVM method; deterministic algorithms such as the Lagrangian relaxation algorithm; and heuristic algorithms such as the greedy algorithm, the neighborhood search algorithm and the simulated annealing algorithm. Research on space-earth integrated measurement and control resources is relatively scarce and has mostly proceeded from traditional algorithms, such as the Lagrangian relaxation algorithm, the ant colony algorithm and the genetic algorithm; applications of deep reinforcement learning algorithms in this area remain comparatively rare.
The invention mainly resolves the conflict between measurement and control resources and measurement and control objects caused by the ever-increasing number of measurement and control tasks. From the perspective of the visibility between measurement and control resources and measurement and control objects, a measurement and control scene based on measurement and control time windows is constructed, the optimal running period of each measurement and control task is solved by deep Q learning (Deep Q-Network, DQN), and an optimal measurement and control scheduling scheme is finally formed, realizing the optimal running of the measurement and control system under specific indexes.
Disclosure of Invention
In view of the above, the present invention aims to provide a measurement and control resource scheduling method based on deep Q learning. The conflict between the existing measurement and control tasks and the number of measurement and control resources is growing increasingly acute: while the measurement and control resources are limited in number, the measurement and control tasks are still constrained by various conditions such as the visibility between measurement and control resources and measurement and control objects, the measurement and control duration, and the priority of the measurement and control tasks, so that measurement and control resource scheduling becomes a complex combinatorial optimization problem under multiple space-time constraints. A single kind of measurement and control resource differs in, and is limited in, its measurement and control service and measurement and control range, while the measurement and control tasks tend to become complex and diversified, continuously increasing the difficulty of measurement and control scheduling decisions; joint scheduling of the space-earth measurement and control resources is therefore necessary, so that the comprehensive scheduling performance of the space-earth integrated measurement and control resources is optimal.
The invention aims to construct a measurement and control resource scheduling method based on deep reinforcement learning, which uses deep reinforcement learning to realize intelligent scheduling of the space-earth integrated measurement and control resources, performs more accurate abstraction and feature extraction of the measurement and control system and the measurement and control scene, and finds a measurement and control resource scheduling scheme adapted to the scene, so as to complete the measurement and control tasks and improve the comprehensive efficiency of measurement and control resource utilization. A novel application of the DQN algorithm is realized by abstracting the resource scheduling problem under multiple constraint conditions.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a measurement and control resource scheduling method based on deep Q learning comprises the following steps:
s1: describing a complex measurement and control scene;
s2: designing measurement and control scheduling performance evaluation indexes;
s3: forming a measurement and control resource scheduling scheme;
s4: the DQN algorithm is applied to the generation of a measurement and control resource scheduling scheme;
s5: and (3) implementing the DQN-based measurement and control resource scheduling method.
Optionally, the step S1 specifically includes:
(1) Description of entities in measurement and control scenarios
From the perspective of measurement and control resources of the heaven-earth integrated measurement and control system, describing elements in a measurement and control scene based on a visible time window;
The space-earth integrated measurement and control resources are described as follows:
RESOURCE = {S, TYPE, TS, D_S, L, L_MAX}
wherein S is the set of space-earth integrated measurement and control resources, in which measurement and control resources of multiple types are numbered uniformly, S = {s_1, s_2, ..., s_j, ..., s_M}; j is the number of a measurement and control resource and M is the total number of all measurement and control resources;
TYPE characterizes the type of a measurement and control resource: the resource is a space-based measurement and control resource when TYPE is 1 and a ground-based measurement and control resource when TYPE is 0;
TS characterizes the idle time windows of each measurement and control resource, namely the time windows currently available for measurement and control;
TS = {TS_1, TS_2, ..., TS_j, ..., TS_M}
   = {[t_b1(s_1), t_e1(s_1)], [t_b2(s_1), t_e2(s_1)], ..., [t_b1(s_2), t_e1(s_2)], [t_b2(s_2), t_e2(s_2)], ..., [t_b1(s_M), t_e1(s_M)], ...}
TS_j characterizes all available time windows, i.e. idle time windows, of the jth measurement and control resource; t_b1(s_j) and t_e1(s_j) respectively represent the start time and the end time of the 1st visible time window of the jth measurement and control resource; the visible windows are numbered in chronological order, and so on;
D_S characterizes the length of each idle time window of a measurement and control resource;
D_Sjk = t_ek(s_j) − t_bk(s_j) represents the length of the kth idle time window of the jth measurement and control resource;
L_Sj represents the occupation of a single measurement and control resource by all medium and low orbit satellites; L_ij represents the load occupation of measurement and control task i on a single measurement and control resource j, where i denotes the order of the measurement and control tasks and n is the total number of measurement and control tasks;
L represents the occupation of the space-earth integrated measurement and control resources by all medium and low orbit satellites; specifically:
L = {L_S1, L_S2, ..., L_Sj, ..., L_SM}
  = {L_1, L_2, ..., L_i, ..., L_n},
L_Sj represents the load occupation of all measurement and control tasks on a single measurement and control resource j;
L_MAX = {L_MAX1, L_MAX2, ..., L_MAXj, ..., L_MAXM}
L_MAXj represents the maximum receivable measurement and control task load of measurement and control resource j, i.e. the maximum load of the measurement and control resource;
from the perspective of measurement and control tasks, describing elements in a measurement and control scene based on a visible time window; the measurement and control tasks are described as follows:
TASK = {T, Sat, P, D, T_A, T_C, T_Oi}
wherein T is the numbered set of all measurement and control tasks, T = {T_1, T_2, ..., T_i, ..., T_n};
T_i represents the number of a measurement and control task; in this and the following formulas, i is the order of the measurement and control tasks and n is the total number of measurement and control tasks;
Sat characterizes the sources of the measurement and control tasks, namely the corresponding task satellites, Sat = {Sat_1, Sat_2, ..., Sat_o};
Sat_i represents the source satellite of the measurement and control task with order i;
P is the priority of the measurement and control tasks, P = {P_1, P_2, ..., P_i, ..., P_n}; P_i represents the priority of the measurement and control task with order i;
D is the shortest measurement and control duration corresponding to each task, D = {d_1, d_2, ..., d_i, ..., d_n}; d_i represents the shortest duration of the measurement and control task with order i;
T_A represents the time intervals during which the measurement and control tasks can be measured and controlled:
T_A = {[t_1B, t_1E], [t_2B, t_2E], ..., [t_iB, t_iE], ..., [t_nB, t_nE]};
[t_iB, t_iE] denotes the time window available to the measurement and control task with order i, where t_iB is the earliest start time and t_iE the latest end time of the task;
T_C represents the actual measurement and control intervals of the tasks:
T_C = {[t_1b, t_1e], [t_2b, t_2e], ..., [t_ib, t_ie], ..., [t_nb, t_ne]};
[t_ib, t_ie] denotes the time window in which the measurement and control task with order i is actually performed, where t_ib is the actual start time and t_ie the actual end time of the task after scheduling;
T_Oi describes the set of visible arc segments corresponding to each task;
[t_bk(s_im), t_ek(s_im)] represents the kth visible time window of the mth measurement and control resource for the measurement and control task with order i, where t_bk(s_im) is the start time and t_ek(s_im) the end time of the visible window;
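For illustration only, the RESOURCE and TASK descriptions above can be carried by simple data structures; all field and class names below are illustrative shorthand, not notation from the patent:

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """One ground-based or space-based measurement and control resource."""
    sid: int            # j: uniform resource number
    rtype: int          # TYPE: 1 = space-based, 0 = ground-based
    idle_windows: list  # TS_j: [(t_bk, t_ek), ...] in chronological order
    max_load: float     # L_MAXj: maximum receivable task load
    load: float = 0.0   # current occupation by scheduled tasks

    def window_lengths(self):
        """D_Sjk: length of each idle time window of this resource."""
        return [te - tb for tb, te in self.idle_windows]

@dataclass
class Task:
    tid: int            # i: order number of the task
    sat: int            # Sat_i: source satellite
    priority: float     # P_i
    min_duration: float # d_i: shortest measurement and control time
    allowed: tuple      # [t_iB, t_iE]: earliest start / latest end
    visible_arcs: list  # T_Oi: [(resource id, (t_b, t_e)), ...]

r1 = Resource(sid=1, rtype=0, idle_windows=[(0, 600), (900, 1500)], max_load=10.0)
assert r1.window_lengths() == [600, 600]
```

A scheduler would hold one `Resource` per numbered resource and one `Task` per numbered task, matching the uniform numbering described above.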
(2) Measurement and control state design
The measurement and control state s is designed on the basis of the utilization of the measurement and control resources, i.e. on time-space visibility, expressing the different visible/available states in the measurement and control system through the visible time windows. For a specific measurement and control scene, a 0-1 matrix representing the state of each measurement and control resource is used as the state of the scene; the size of the matrix is determined by the number of measurement and control resources and the division scale of the measurement and control time window. For each measurement and control resource, a division scale is determined according to specific requirements, the daily working time of the resource is divided accordingly, and the visible state of each divided time interval of the measurement and control equipment is marked: the matrix entry corresponding to a visible/usable unit time is set to 0 and the entry corresponding to an invisible/unusable unit time is set to 1, thereby determining the usage of the measurement and control equipment at any given moment, i.e. the measurement and control state;
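The 0-1 state matrix described above can be sketched as follows; the division scale, the window data and the function name are illustrative assumptions:

```python
import numpy as np

def state_matrix(resources_windows, horizon, step):
    """Build the 0-1 measurement and control state matrix.

    resources_windows: per resource, a list of (t_begin, t_end) usable intervals
    horizon: daily working time of the resources (same time unit as step)
    step: division scale of the measurement and control time window

    Entry [j, k] is 0 when resource j is visible/usable in slot k, 1 otherwise,
    matching the 0/1 convention of the state design.
    """
    n_slots = horizon // step
    s = np.ones((len(resources_windows), n_slots), dtype=np.int8)  # 1 = unusable
    for j, windows in enumerate(resources_windows):
        for tb, te in windows:
            s[j, tb // step : te // step] = 0                      # 0 = usable
    return s

# Two resources over a 60-minute horizon with 10-minute slots:
m = state_matrix([[(0, 20), (40, 60)], [(10, 30)]], horizon=60, step=10)
# row 0: slots 0-1 and 4-5 usable; row 1: slots 1-2 usable
```

The matrix size is the resource count times the number of division slots, as the text states.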
(3) Design of measurement and control actions
The design of the measurement and control actions adopts a layer-by-layer progressive decision idea: it determines in turn whether to accept a measurement and control task, which measurement and control resource accepts the task, and the specific measurement and control time interval used for the task. The measurement and control action is designed as:
X_i = (a_i, type, x_ij, y_jk, t_ib)
where a_i represents whether the measurement and control task is accepted, type represents the type of the measurement and control resource accepting the task, x_ij represents the number of the accepting measurement and control resource, y_jk indicates that the task is executed in the kth visible time window of resource j, and t_ib characterizes the actual start time of the task.
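As a minimal sketch, the five-component action X_i can be modeled as a named tuple; the field names are illustrative, not from the patent:

```python
from collections import namedtuple

# X_i = (a_i, type, x_ij, y_jk, t_ib)
Action = namedtuple("Action", [
    "accept",    # a_i: whether the measurement and control task is accepted
    "rtype",     # type: 1 = space-based resource, 0 = ground-based resource
    "resource",  # x_ij: number of the accepting measurement and control resource
    "window",    # y_jk: index k of the visible window of resource j that is used
    "start",     # t_ib: actual start time of the task
])

x = Action(accept=1, rtype=0, resource=3, window=2, start=120.0)
assert x.resource == 3 and x.start == 120.0
```

Each scheduling decision then reduces to emitting one such tuple per task, following the layer-by-layer order above.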
Optionally, the step S2 specifically includes:
A comprehensive measurement and control performance evaluation index is designed that takes into account the satisfaction degree of the measurement and control tasks, the load balance degree of the measurement and control resources and the average utilization rate of the measurement and control resources, and serves as the decision basis for applying the DQN algorithm to measurement and control scheduling; measurement and control resource scheduling is expected to yield a scheduling strategy that maximizes this comprehensive evaluation index;
The measurement and control resource scheduling performance evaluation index is set to r = s_R · RUR / load;
where s_R represents the satisfaction degree of the measurement and control tasks, load represents the load balance degree of the measurement and control resources, and RUR represents the average utilization rate of all measurement and control resources;
satisfaction of measurement and control task:
measuring and controlling the load balance degree of the resource:
wherein :
average utilization rate of measurement and control resources:
optionally, the step S3 specifically includes:
According to the design of the measurement and control actions in step S1, forming a measurement and control scheduling scheme likewise involves three aspects: determining whether to accept a measurement and control task, determining the measurement and control resource that carries out the task, and determining the measurement and control arc segment that completes the task;
Specifically: on the modeling basis that a visible time window can serve as the measurement and control state of a visible arc segment, whether to accept a measurement and control task is determined by judging whether a visible time window exists for that task; in the process of modeling the measurement and control scene, the measurement and control resources and tasks are numbered uniformly, the visible arc segments satisfying the conditions are solved for a specific task, and the resource type and number for completing the task are determined according to the correspondence between visible arc segments and measurement and control resources;
In the design of the measurement and control state, the visible arc segments corresponding to the measurement and control tasks are discretized, and the measurement and control arc segment is slid over the selected visible arc segment according to the possible start times of the task, thereby determining the optimal measurement and control arc segment that can complete the task.
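The arc-segment sliding step can be sketched as a scan over candidate start times within a selected visible window; the scoring rule, step size and function name are illustrative assumptions:

```python
def slide_arc(window, duration, step, score):
    """Slide a measurement and control arc of length `duration` over the visible
    window [t_b, t_e] and return the start time with the best score, or None
    if the arc does not fit in the window."""
    t_b, t_e = window
    best_t, best_s = None, float("-inf")
    t = t_b
    while t + duration <= t_e:        # the arc must lie inside the window
        s = score(t, t + duration)    # evaluate this candidate arc
        if s > best_s:
            best_t, best_s = t, s
        t += step                     # slide by the discretization step
    return best_t

# Toy scoring rule preferring arcs that start as late as possible:
t0 = slide_arc(window=(100, 400), duration=120, step=20, score=lambda b, e: b)
assert t0 == 280
```

In the method proper, the score would come from the comprehensive evaluation index of step S2 rather than this toy rule.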
Optionally, the step S4 specifically includes:
(1) The task state at the current moment changes, the visible time window of the measurement and control resource changes, and the measurement and control state of the system changes;
(2) Updating the measurement and control environment, extracting scene characteristics, and updating the measurement and control state of the system;
(3) According to action selection rules of the deep reinforcement learning algorithm, a decision strategy of measurement and control actions is selected, so that measurement and control resources are matched with measurement and control tasks in time and space, and the realization of the measurement and control tasks is completed;
(4) Evaluating and feeding back the result of measurement and control scheduling of the measurement and control environment and the measurement and control state caused by the selected measurement and control strategy;
(5) According to the evaluation feedback result of the measurement and control strategy, the deep reinforcement learning network is utilized to update the measurement and control decision strategy, and the measurement and control scene and the measurement and control state are observed to update;
Through cyclic and repeated algorithm updates, the selection and optimization of the measurement and control resource allocation strategy are realized, and the optimal measurement and control scheduling strategy is selected.
Optionally, the step S5 specifically includes:
(1) Describing a measurement and control scene, and defining basic physical elements in the scene; based on actual physical scenes, relevant elements involved in the DQN method of measurement and control scheduling are arranged and summarized, and the measurement and control state, measurement and control actions, measurement and control action rewards and the constitution of basic elements of a measurement and control scheme are defined;
(2) Initialize the deep Q learning measurement and control resource scheduling network: initialize the memory bank according to the actual capacity requirement, and initialize the network parameters, including the learning rate, the discount factor, and the structures and parameters of the actual-value neural network and the target-value neural network that describe the Q value;
(3) Design the measurement and control state s according to the measurement and control scene model, initialize the input of the measurement and control scheduling network, and calculate the corresponding output. A measurement and control action is selected at random with probability ε, and with probability 1−ε according to the Q value output by the scheduling network (the ε-greedy strategy); the corresponding action is executed in the measurement and control resource scheduling network. After the action is executed, the reward r, i.e. the evaluation index of the measurement and control action, is obtained, together with the measurement and control state before the next action, i.e. the state s_{i+1} at the next moment. According to the currently selected action and the current state, the Q value of the actual-value neural network and the next-moment Q value of the target-value neural network, i.e. the actual Q value and the estimated Q value, are calculated in the scheduling network;
(4) The four parameters (s_i, X_i, r_i, s_{i+1}) are stored together as one sample in the memory bank;
(5) A certain number of sample states are drawn at random from the memory bank, the target value of each state is calculated, and the Q value is updated toward this target through the obtained reward. The parameters of the actual-value neural network are updated by stochastic gradient descent; after every N iterative updates, the current parameters of the actual-value network are assigned to the target-value network, realizing the update of the target-value network parameters in the scheduling network. The parameters are updated continuously to train the measurement and control scheduling network;
(6) Through cyclic and repeated algorithm updates, the selection and optimization of the measurement and control resource allocation strategy are realized and the optimal measurement and control scheduling strategy is selected; the measurement and control resource scheduling process is thereby completed.
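Steps (2)-(6) can be condensed into a generic DQN skeleton. To keep the sketch self-contained, a linear Q approximator in NumPy stands in for the actual-value and target-value neural networks; the class name, hyperparameters and toy state are all assumptions, not the patent's implementation:

```python
import random
from collections import deque
import numpy as np

class DQNScheduler:
    """Minimal DQN loop: replay memory, epsilon-greedy selection, target-network sync."""
    def __init__(self, n_features, n_actions, lr=0.01, gamma=0.9,
                 epsilon=0.1, memory_size=500, sync_every=50):
        self.gamma, self.epsilon, self.lr = gamma, epsilon, lr
        self.n_actions = n_actions
        self.W = np.zeros((n_actions, n_features))   # actual-value "network"
        self.W_target = self.W.copy()                # target-value "network"
        self.memory = deque(maxlen=memory_size)      # replay memory bank
        self.sync_every, self.steps = sync_every, 0

    def q(self, W, s):
        return W @ s                                 # Q(s, ·) for all actions

    def choose(self, s):
        # epsilon-greedy: random action with probability epsilon, else argmax Q
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return int(np.argmax(self.q(self.W, s)))

    def store(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))        # the (s_i, X_i, r_i, s_{i+1}) sample

    def learn(self, batch_size=16):
        batch = random.sample(self.memory, min(batch_size, len(self.memory)))
        for s, a, r, s_next in batch:
            target = r + self.gamma * np.max(self.q(self.W_target, s_next))
            td = target - self.q(self.W, s)[a]
            self.W[a] += self.lr * td * s            # gradient step on the actual-value net
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.W_target = self.W.copy()            # assign parameters to target net

agent = DQNScheduler(n_features=4, n_actions=3)
s = np.array([1.0, 0.0, 1.0, 0.0])                   # toy flattened 0-1 state
a = agent.choose(s)
agent.store(s, a, 1.0, s)
agent.learn()
```

In the method proper, the state would be the flattened 0-1 matrix of step S1, the action would encode X_i, and r would be the comprehensive evaluation index of step S2.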
The invention has the beneficial effects that: the method can generate the measurement and control resource scheduling strategy which is suitable for the measurement and control scene under the condition that accurate modeling is not needed for the measurement and control environment in the complex measurement and control environment, thereby maximizing the measurement and control resource scheduling efficiency.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention is described below in preferred detail with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a measurement and control state design;
FIG. 2 is a flow chart for forming a measurement and control resource scheduling scheme;
FIG. 3 is a DQN-based measurement and control resource scheduling decision flow;
FIG. 4 is a schematic diagram of measurement and control states in an embodiment.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure of this specification, which describes the embodiments of the invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details of this specification may be modified or varied on the basis of different viewpoints and applications without departing from the spirit of the invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention schematically, and the following embodiments and the features therein may be combined with each other in the absence of conflict.
Wherein the drawings are for illustrative purposes only and are shown in schematic, non-physical, and not intended to limit the invention; for the purpose of better illustrating embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if there are terms such as "upper", "lower", "left", "right", "front", "rear", etc., that indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but not for indicating or suggesting that the referred device or element must have a specific azimuth, be constructed and operated in a specific azimuth, so that the terms describing the positional relationship in the drawings are merely for exemplary illustration and should not be construed as limiting the present invention, and that the specific meaning of the above terms may be understood by those of ordinary skill in the art according to the specific circumstances.
Please refer to fig. 1-4, which are a measurement and control resource scheduling method based on deep Q learning.
The invention relates to a measurement and control resource scheduling method based on the DQN algorithm. By constructing a measurement and control scene based on the visible windows between measurement and control resources and measurement and control objects, the method mainly uses the strong model description capability of the neural network in the DQN algorithm to describe the long-term rewards of measurement and control actions; it breaks the correlation between data by means of the memory playback mechanism in the measurement and control scene, learns an optimal strategy by evaluating the quality of the current state through interaction with the scene, and adapts to a complex measurement and control resource scheduling environment. The technical scheme of the method is as follows:
1. description of Complex measurement and control scenarios
In the complex measurement and control scene, the measurement and control resources mainly refer to the space-earth integrated measurement and control resources, namely ground-based and space-based measurement and control resources: the ground-based resources mainly refer to ground stations, while the space-based resources mainly consider tracking and data relay satellites. The type of a measurement and control resource is defined by the TYPE variable. The description of the measurement and control scene is mainly based on the visible states of the measurement and control resources and objects and on the visible time windows. Specifically, the complex measurement and control scene is described by abstractly expressing each physical entity in the scene (including the measurement and control tasks and the related constraint conditions) and by designing the measurement and control state and the measurement and control actions.
(1) Description of entities in measurement and control scenarios
From the perspective of measurement and control resources of the heaven-earth integrated measurement and control system, elements in a measurement and control scene are described based on a visible time window.
The space-earth integrated measurement and control resources can be described as:
RESOURCE = {S, TYPE, TS, D_S, L, L_MAX}
wherein S is the set of space-earth integrated measurement and control resources, in which measurement and control resources of multiple types are numbered uniformly, S = {s_1, s_2, ..., s_j, ..., s_M}; in this and the following formulas, j is the number of a measurement and control resource and M is the total number of all measurement and control resources.
TYPE characterizes the TYPEs of measurement and control resources, the measurement and control resources are day-based measurement and control resources when TYPE is 1, and the resources are foundation measurement and control resources when TYPE is 0;
TS characterizes an idle time window (i.e., a time window currently available for measurement and control) for each measurement and control resource;
TS={TS 1 ,TS 2 ,...TS j ,...TS M }
={[t b1 (s 1 ),t e1 (s 1 )],[t b2 (s 1 ),t e2 (s 1 )],...,[t b1 (s 2 ),t e1 (s 2 )],[t b2 (s 2 ),t e2 (s 2 )].....,....[t b1 (s M ),t e1 (s M )]}
TS j characterizing all available time windows (i.e., idle time windows) of the jth measurement and control resource, t b1 (s j ) And t e1 (s j ) The starting time and the ending time of the 1 st visible time window of the 1 st measurement and control resource are respectively represented, and the sequence of the visible windows is marked according to the time sequence. And so on.
D S Characterizing the length of each idle time window of a measurement and control resource And the length of a kth idle time window of the jth measurement and control resource is represented.
LS j Representing occupation of single measurement and control resources by all medium-low orbit satellites And (3) representing the load occupation condition of the measurement and control task i on a single measurement and control resource j, wherein i represents the sequence of the measurement and control tasks, and n is the total number of the measurement and control tasks.
L represents the occupation of all middle-low orbit satellites on the space-earth integrated measurement and control resources. The method comprises the following steps:
L Sj and the load occupation condition of all measurement and control tasks for a single measurement and control resource j is represented.
L MAX ={L MAX1 ,L MAX2 ,...L MAXj ,...L MAXM }
L MAXj And the maximum load of the measurement and control resource j can be received.
From the perspective of the measurement and control tasks, the elements in the scene are likewise described based on visible time windows. The measurement and control tasks can be described as:
TASK = {T, Sat, P, D, T_A, T_C, To_i}
where T is the numbered set of all measurement and control tasks, T = {T_1, T_2, ..., T_i, ..., T_n};
T_i denotes the number of a measurement and control task. In this formula and the following formulas, i is the order of the measurement and control tasks and n is the total number of tasks.
Sat characterizes the task sources, i.e. the corresponding task satellites: Sat = {Sat_1, Sat_2, ..., Sat_o};
Sat_i represents the source satellite of the measurement and control task with order i.
P is the priority of the measurement and control tasks, P = {P_1, P_2, ..., P_i, ..., P_n}; P_i denotes the priority of the task with order i.
D is the set of shortest measurement and control durations corresponding to the tasks, D = {d_1, d_2, ..., d_i, ..., d_n}; d_i denotes the shortest duration of the task with order i.
T_A represents the time intervals in which the tasks can be measured and controlled:
T_A = {[t_1B, t_1E], [t_2B, t_2E], ..., [t_iB, t_iE], ..., [t_nB, t_nE]};
[t_iB, t_iE] is the window available to the task with order i, t_iB being its earliest start time and t_iE its latest end time.
T_C represents the actual measurement and control intervals of the tasks:
T_C = {[t_1b, t_1e], [t_2b, t_2e], ..., [t_ib, t_ie], ..., [t_nb, t_ne]};
[t_ib, t_ie] is the window in which the task with order i is actually carried out, t_ib being the actual start time after scheduling and t_ie the actual end time after scheduling.
To_i describes the set of visible arc segments corresponding to each task; its element represents the k-th visible time window of the m-th measurement and control resource for the task with order i, which can be written as [t_bk(s_im), t_ek(s_im)], where t_bk(s_im) is the start time of the visible window and t_ek(s_im) its end time.
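The RESOURCE and TASK descriptions above can be collected into plain data structures. The following Python sketch is illustrative only: the class and field names are hypothetical, and load-related elements are simplified to scalars.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Window = Tuple[float, float]  # (start time, end time)

@dataclass
class Resource:
    """One space- or ground-based measurement and control resource (names hypothetical)."""
    number: int                  # j: uniform resource number
    type: int                    # TYPE: 1 = space-based, 0 = ground-based
    idle_windows: List[Window]   # TS_j: idle windows in chronological order
    max_load: float              # L_MAXj: maximum acceptable task load
    load: float = 0.0            # L_Sj: load already occupied by scheduled tasks

@dataclass
class Task:
    """One measurement and control task (names hypothetical)."""
    number: int                             # i: task order
    satellite: int                          # Sat_i: source satellite
    priority: float                         # P_i
    min_duration: float                     # d_i: shortest measurement duration
    allowed_window: Window                  # [t_iB, t_iE]
    visible_arcs: List[Tuple[int, float, float]] = field(default_factory=list)  # To_i
    actual_window: Optional[Window] = None  # [t_ib, t_ie], set after scheduling
```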
(2) Measurement and control state design
The measurement and control state s is designed on the basis of the utilization of the measurement and control resources, i.e. on the basis of time-space visibility, by expressing the different visible/available states in the measurement and control system with visible time windows. As shown in fig. 1, for a specific measurement and control scene, a 0-1 matrix representing the state of each measurement and control resource is used as the state of the scene; the size of the matrix is determined by the number of measurement and control resources and by the division scale of the measurement and control time window. For each resource, a division scale is chosen according to the specific requirements, the daily working time of the resource is divided accordingly, and the visible state of each divided time interval of the measurement and control equipment is marked: the matrix entry corresponding to a visible/usable unit time is set to 0, and the entry corresponding to an invisible/unusable unit time is set to 1. This determines the usage of the measurement and control equipment at any given moment, i.e. the measurement and control state.
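As a sketch, the 0-1 state matrix described above can be built as follows. The function name, the dict-based window representation, and the hour-based division scale are assumptions for illustration, not part of the patent.

```python
import numpy as np

def build_state_matrix(idle_windows, n_resources, hours_per_day=24, scale_h=1.0):
    """Build the 0-1 measurement and control state matrix described above.

    idle_windows maps a resource index j to its list of (start_h, end_h) idle
    intervals within one day; entries are 0 where the resource is visible/usable
    in a time slot and 1 where it is not.
    """
    n_slots = int(hours_per_day / scale_h)
    state = np.ones((n_resources, n_slots), dtype=int)   # default: unusable
    for j, windows in idle_windows.items():
        for start, end in windows:
            a = int(start / scale_h)                     # first covered slot
            b = int(np.ceil(end / scale_h))              # one past last slot
            state[j, a:b] = 0                            # mark usable slots
    return state
```

With 3 resources and a 1 h scale this yields a 3×24 matrix, matching the embodiment given later.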
(3) Design of measurement and control actions
The design of the measurement and control actions adopts a layer-by-layer progressive decision idea: it determines in turn whether a measurement and control task is accepted, which measurement and control resource the accepted task uses, and the specific measurement and control time interval of the task. The measurement and control action is therefore designed as:
X_i = (a_i, type, x_ij, y_jk, t_ib)
where a_i indicates whether the measurement and control task is accepted, type indicates the type of measurement and control resource accepting the task, x_ij indicates the number of the resource accepting the task, y_jk indicates that the task is executed in the k-th visible time window of resource j, and t_ib characterizes the actual start time of the task.
2. Measurement and control scheduling performance evaluation index design
In the method, a comprehensive measurement and control performance evaluation index is designed that takes into account three indexes: the satisfaction degree of the measurement and control tasks, the load balance degree of the measurement and control resources, and the average utilization rate of the measurement and control resources. It serves as the decision basis for the application of the DQN algorithm in measurement and control scheduling; the resource scheduling is expected to obtain a scheduling strategy that maximizes this comprehensive evaluation index.
Specifically, the measurement and control resource scheduling performance evaluation index is set to r = s_R * RUR / load, where s_R represents the satisfaction degree of the measurement and control tasks, load represents the load balance degree of the measurement and control resources, and RUR represents the average utilization rate of all measurement and control resources.
Satisfaction of measurement and control task:
measuring and controlling the load balance degree of the resource:
wherein :
average utilization rate of measurement and control resources:
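Once the three components are known, the composite index can be computed directly. A minimal sketch follows; the patent gives the exact component formulas only as figures, so here they are assumed to be computed elsewhere and passed in as scalars.

```python
def scheduling_reward(s_r, rur, load_balance):
    """Composite evaluation index r = s_R * RUR / load.

    s_r:          satisfaction degree of the measurement and control tasks
    rur:          average utilization rate of all measurement and control resources
    load_balance: load balance degree of the resources (appears as the divisor)
    """
    if load_balance == 0:
        raise ValueError("load balance degree must be non-zero")
    return s_r * rur / load_balance
```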
3. measurement and control resource scheduling scheme formation
According to the design of the measurement and control actions in step 1, forming the measurement and control scheduling scheme mainly involves three decisions: whether a measurement and control task is accepted, which measurement and control resource carries it out, and which measurement and control arc segment completes it. Specifically, the invention takes the visible arc segments, derived from the visible time windows, as the modeling basis of the measurement and control state, so whether a specific task is accepted is determined by judging whether a visible time window exists for it. Because the measurement and control resources and tasks are numbered uniformly during scene modeling, the visible arc segments satisfying the conditions can be solved for a specific task, and the type and number of the resource completing the task can then be determined from the correspondence between visible arc segments and resources. In the design of the measurement and control state, the visible arc segments corresponding to a task are discretized, so the measurement and control arc is slid over the selected visible arc segment according to the possible start times of the task, and the optimal measurement and control arc segment that can complete the task is determined.
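The three decisions above can be sketched as an enumeration of candidate placements: a task is rejected when no placement exists, and otherwise each candidate fixes the resource, the visible window, and a start time obtained by sliding the arc. The function name, the arc tuple format (resource_j, t_b, t_e), and the step size are illustrative assumptions.

```python
def candidate_actions(visible_arcs, min_duration, step=0.5):
    """Enumerate candidate placements for one task by sliding a measurement
    arc of length min_duration over each visible arc segment.

    Returns a list of (resource_number, window_index, start_time) tuples;
    an empty list means the task has no usable window and is rejected.
    """
    actions = []
    for k, (resource_j, t_b, t_e) in enumerate(visible_arcs):
        t = t_b
        while t + min_duration <= t_e:      # slide inside the visible arc
            actions.append((resource_j, k, t))
            t += step
    return actions
```

A learned policy (the DQN below in the patent's flow) would then score these candidates rather than pick the first fit.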
Therefore, the measurement and control resource scheduling scheme forming flow is as shown in fig. 2:
4. Application of the DQN algorithm in the generation of the measurement and control resource scheduling scheme
In the method, based on a deep reinforcement learning framework and a DQN learning principle, the following measurement and control resource scheduling decision flow can be constructed, so that a measurement and control resource scheduling strategy with optimal measurement and control efficiency is selected.
The implementation steps can be summarized as follows:
(1) The task state at the current moment changes, the visible time windows of the measurement and control resources change, and the measurement and control state of the system changes accordingly.
(2) The measurement and control environment is updated, scene features are extracted, and the measurement and control state of the system is updated.
(3) A decision strategy for the measurement and control action is selected according to the action selection rule of the deep reinforcement learning algorithm, so that the measurement and control resources are matched with the measurement and control tasks in time and space and the tasks are carried out.
(4) The measurement and control scheduling result is evaluated and fed back with respect to the update of the measurement and control environment and state caused by the selected measurement and control strategy.
(5) The measurement and control decision strategy is updated by the deep reinforcement learning network according to the evaluation feedback, and the update of the measurement and control scene and state is observed.
Through cyclic and repeated algorithm updating, the selection and optimization of the measurement and control resource allocation strategy is realized, and the optimal measurement and control scheduling strategy is selected.
5. DQN-based measurement and control resource scheduling method implementation flow
(1) Describing a measurement and control scene, and defining basic physical elements in the scene. Based on the actual physical scene, relevant elements related to the DQN method of measurement and control scheduling are arranged and summarized, and the basic elements such as measurement and control states, measurement and control actions, measurement and control action rewards, measurement and control schemes and the like are defined.
(2) Initializing a deep Q learning measurement and control resource scheduling network, initializing a memory bank according to actual capacity requirements, and initializing network parameters including learning rate, discount factors, and structures and parameters of an actual value neural network and a target value neural network for describing a Q value.
(3) The measurement and control state s is designed according to the measurement and control scene model, the input of the measurement and control scheduling network is initialized, and the corresponding output is calculated. A measurement and control action is selected randomly with probability ε, and with probability 1−ε the action is selected according to the Q value output by the scheduling network (the ε-greedy strategy); the corresponding action is then executed in the measurement and control resource scheduling network. The reward r after execution (i.e. the evaluation index of the measurement and control action) and the measurement and control state before the next action, i.e. the state s_{i+1} at the next moment, are obtained. The Q values of the actual-value neural network and of the current-value neural network at the next moment, i.e. the actual Q value and the estimated Q value, are calculated in the scheduling network from the currently selected action and the current state.
(4) The four parameters (s_i, X_i, r_i, s_{i+1}) are stored together as one sample in the memory bank.
(5) A certain number of sample states are randomly taken from the memory bank, and the target value of each state is calculated (the Q value is updated as the target value using the reward after execution). The actual-value network parameters are updated by stochastic gradient descent, and after every N iterative updates the current parameters of the actual-value network are assigned to the target-value network, realizing the update of the target-value network parameters in the scheduling network. The scheduling network is trained by continuously updating the parameters in this way.
(6) And through cyclic and reciprocating algorithm updating, the selection and optimization of the measurement and control resource allocation strategy are realized, and the selection of the optimal measurement and control scheduling strategy is realized. And (5) completing the measurement and control resource scheduling process.
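The flow of steps (1)-(6) can be sketched in code. To keep the example self-contained, a linear Q approximator stands in for the convolutional networks of the patent; the class name and all hyperparameters (learning rate, discount factor, memory size, sync period N) are illustrative assumptions.

```python
import random
from collections import deque

import numpy as np

class TinyDQNScheduler:
    """Minimal sketch of the DQN mechanics used for scheduling: epsilon-greedy
    action choice, replay memory of (s, X, r, s') samples, and periodic sync
    of the target-value network from the actual-value network."""

    def __init__(self, n_features, n_actions, lr=0.01, gamma=0.9,
                 epsilon=0.1, memory_size=500, sync_every=50, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.1, (n_actions, n_features))  # actual-value net
        self.W_target = self.W.copy()                          # target-value net
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.memory = deque(maxlen=memory_size)                # replay memory
        self.sync_every, self.updates = sync_every, 0
        self.n_actions = n_actions

    def q_values(self, s, target=False):
        W = self.W_target if target else self.W
        return W @ s

    def choose_action(self, s):
        # epsilon-greedy: explore with probability epsilon, else exploit Q
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return int(np.argmax(self.q_values(s)))

    def store(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))  # one sample (s_i, X_i, r_i, s_{i+1})

    def learn(self, batch_size=16):
        batch = random.sample(self.memory, min(batch_size, len(self.memory)))
        for s, a, r, s_next in batch:
            # target value from the target network, as in step (5)
            target = r + self.gamma * np.max(self.q_values(s_next, target=True))
            td_error = target - self.q_values(s)[a]
            self.W[a] += self.lr * td_error * s    # SGD step on the actual net
        self.updates += 1
        if self.updates % self.sync_every == 0:    # copy parameters every N updates
            self.W_target = self.W.copy()
```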
Examples:
1. Describe the complex measurement and control scene. Taking as a sample a measurement and control scene with 2 ground-based measurement and control resources, 1 space-based measurement and control resource, and 9 measurement and control tasks to be completed, the measurement and control resource scene is initialized and uniformly described. According to the actual measurement and control scene, from the viewpoint of the integrated measurement and control resources, the scene can be described in the following form:
the measurement and control resources of the heaven and earth integrated measurement and control system are as follows:
RESOURCE = {S, TYPE, TS, D_S, L, L_MAX}
where S is the set of space-ground integrated measurement and control resources, S = {s_1, s_2, ..., s_j, ..., s_M};
TYPE characterizes the type of a measurement and control resource: the resource is space-based when TYPE is 1 and ground-based when TYPE is 0;
TS characterizes the idle time windows (i.e. the time windows currently available for measurement and control) of each measurement and control resource;
D_S characterizes the length of each idle time window of a measurement and control resource;
LS_j characterizes the occupation of the single measurement and control resource j by all medium- and low-orbit satellites;
L represents the occupation of the space-ground integrated measurement and control resources by all medium- and low-orbit satellites:
L = {L_S1, L_S2, ..., L_Sj, ..., L_SM}
From the perspective of the measurement and control tasks, the description of the elements in the scene based on visible time windows is as follows:
TASK = {T, Sat, P, D, T_A, T_C, To_i}
where T is the set of measurement and control tasks of all medium- and low-orbit satellites, T = {T_1, T_2, ..., T_i, ..., T_n};
Sat characterizes the task sources, i.e. the corresponding task satellites, Sat = {Sat_1, Sat_2, ..., Sat_o};
P is the priority of the measurement and control tasks, P = {P_1, P_2, ..., P_i, ..., P_n};
D is the set of shortest measurement and control durations corresponding to the tasks, D = {d_1, d_2, ..., d_i, ..., d_n};
T_A represents the intervals in which the tasks can be measured and controlled, T_A = {[t_1B, t_1E], [t_2B, t_2E], ..., [t_iB, t_iE], ..., [t_nB, t_nE]};
T_C represents the actual measurement and control intervals of the tasks, T_C = {[t_1b, t_1e], [t_2b, t_2e], ..., [t_ib, t_ie], ..., [t_nb, t_ne]};
To_i describes the set of visible arc segments corresponding to each task.
The measurement and control state s is designed according to the measurement and control scene model: for the specific scene, a 0-1 matrix representing the state of each measurement and control resource is used as the measurement and control state. Taking 1 h as the division scale, there are 3 measurement and control resources in this scene, so the state matrix has size 3×24 for each day; the matrix entry corresponding to a visible/usable unit time is set to 0 and the entry corresponding to an invisible/unusable unit time is set to 1. Accordingly, the measurement and control state in this case can be visualized as in fig. 4.
The measurement and control action, i.e. the decision variable, is described as:
X_i = (a_i, type, x_ij, y_jk, t_ib)
where a_i indicates whether the measurement and control task is accepted, type indicates the type of measurement and control resource accepting the task, x_ij indicates the number of the resource accepting the task, y_jk indicates that the task is executed in the k-th visible time window of resource j, and t_ib characterizes the actual start time of the task.
The measurement and control scheduling performance evaluation index is expressed as r = s_R * RUR / load to comprehensively evaluate the scheduling performance of the measurement and control resources, where s_R represents the satisfaction degree of the measurement and control tasks, load represents the balance degree of measurement and control resource utilization, and RUR represents the average utilization rate of all measurement and control resources.
2. A convolutional neural network is constructed to describe the Q value in the measurement and control scheduling network according to the scene requirements. The actual-value network and the target-value network are two convolutional neural networks with identical structures but not fully identical parameters; each comprises 2 convolutional layers and 1 fully connected layer and uses the sigmoid function as its activation function. During initialization of the deep Q-learning scheduling network, the memory bank is initialized according to the actual capacity requirement, and the network parameters are initialized, including the learning rate, the discount factor, and the related parameters of the actual-value and target-value networks describing the Q value.
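A minimal numpy sketch of such a network's forward pass follows (pure numpy is used instead of a deep learning framework; the kernel sizes, the number of actions, and the flattening step are illustrative assumptions, while the 2-conv + 1-FC structure and sigmoid activations follow the text above).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(x, kernel):
    """'valid' 2-D correlation of a single-channel map with one kernel."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * kernel)
    return out

def q_network_forward(state, k1, k2, W_fc, b_fc):
    """Forward pass of a 2-conv + 1-FC Q network with sigmoid activations:
    the 0-1 state matrix goes in, one Q value per action comes out."""
    h1 = sigmoid(conv2d_valid(state, k1))    # first convolutional layer
    h2 = sigmoid(conv2d_valid(h1, k2))       # second convolutional layer
    return W_fc @ h2.ravel() + b_fc          # fully connected output layer
```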
3. The measurement and control state, the measurement and control action rewards, and the measurement and control scheme are further refined according to the specific description of the scene in step 1. On this basis, the state s is designed according to the scene model, the input of the scheduling network is initialized, and the corresponding output is calculated. A measurement and control action is selected randomly with probability ε, and with probability 1−ε the action is selected according to the Q value output by the scheduling network (the ε-greedy strategy); the corresponding action is then executed in the measurement and control resource scheduling network. The reward r after execution and the measurement and control state before the next action, i.e. the state s_{i+1} at the next moment, are obtained. The Q values of the actual-value neural network and of the current-value neural network at the next moment are calculated from the currently selected action and the current state.
4. The four parameters (s_i, X_i, r_i, s_{i+1}) are stored together as one sample in the memory bank.
5. A certain number of sample states are randomly taken from the memory bank, and the target value of each state is calculated (the Q value is updated as the target value using the reward after execution). The actual-value network parameters are updated by stochastic gradient descent, and after every N iterative updates the current parameters of the actual-value network are assigned to the target-value network, realizing the update of the target-value network parameters in the measurement and control scheduling network.
The scheduling network is trained by continuously updating the parameters in this way.
6. Through cyclic and repeated algorithm updating, the selection and optimization of the measurement and control resource allocation strategy is realized and the optimal measurement and control scheduling strategy is selected, completing the measurement and control resource scheduling process.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (5)

1. A measurement and control resource scheduling method based on deep Q learning is characterized in that: the method comprises the following steps:
S1: describing a complex measurement and control scene;
S2: designing measurement and control scheduling performance evaluation indexes;
S3: forming a measurement and control resource scheduling scheme;
S4: applying the DQN algorithm to the generation of a measurement and control resource scheduling scheme;
S5: implementing the DQN-based measurement and control resource scheduling method;
the step S1 specifically comprises the following steps:
(1) Description of entities in measurement and control scenarios
From the perspective of measurement and control resources of the heaven-earth integrated measurement and control system, describing elements in a measurement and control scene based on a visible time window;
the space-ground integrated measurement and control resources are described as follows:
RESOURCE = {S, TYPE, TS, D_S, L, L_MAX}
wherein S is the set of space-ground integrated measurement and control resources, in which the multiple resources of multiple types are numbered uniformly: S = {s_1, s_2, ..., s_j, ..., s_M}; j is the number of a measurement and control resource and M is the total number of measurement and control resources;
TYPE characterizes the type of a measurement and control resource: the resource is space-based when TYPE is 1 and ground-based when TYPE is 0;
TS characterizes the idle time windows, i.e. the time windows currently available for measurement and control, of each measurement and control resource:
TS = {TS_1, TS_2, ..., TS_j, ..., TS_M}
   = {[t_b1(s_1), t_e1(s_1)], [t_b2(s_1), t_e2(s_1)], ..., [t_b1(s_2), t_e1(s_2)], [t_b2(s_2), t_e2(s_2)], ..., [t_b1(s_M), t_e1(s_M)]}
TS_j characterizes all available, i.e. idle, time windows of the j-th measurement and control resource; t_b1(s_j) and t_e1(s_j) respectively denote the start and end times of the 1st visible time window of the j-th resource, the visible windows being numbered in chronological order, and so on;
D_S characterizes the lengths of the idle time windows of the measurement and control resources; its element d_jk characterizes the length of the k-th idle time window of the j-th resource;
LS_j characterizes the occupation of the single measurement and control resource j by all medium- and low-orbit satellites; its element l_ij represents the load occupation of measurement and control task i on the single resource j, where i denotes the order of the measurement and control tasks and n is the total number of tasks;
L represents the occupation of the space-ground integrated measurement and control resources by all medium- and low-orbit satellites:
L = {L_S1, L_S2, ..., L_Sj, ..., L_SM}
L_Sj represents the load occupation of all measurement and control tasks on the single resource j;
L_MAX = {L_MAX1, L_MAX2, ..., L_MAXj, ..., L_MAXM}
L_MAXj represents the maximum measurement and control task load that resource j can accept, i.e. the maximum load of the resource;
from the perspective of the measurement and control tasks, the elements in the scene are likewise described based on visible time windows; the measurement and control tasks are described as follows:
TASK = {T, Sat, P, D, T_A, T_C, To_i}
wherein T is the numbered set of all measurement and control tasks, T = {T_1, T_2, ..., T_i, ..., T_n};
T_i denotes the number of a measurement and control task; in this formula and the following formulas, i is the order of the measurement and control tasks and n is the total number of tasks;
Sat characterizes the task sources, i.e. the corresponding task satellites: Sat = {Sat_1, Sat_2, ..., Sat_o};
Sat_i represents the source satellite of the measurement and control task with order i;
P is the priority of the measurement and control tasks, P = {P_1, P_2, ..., P_i, ..., P_n}; P_i denotes the priority of the task with order i;
D is the set of shortest measurement and control durations corresponding to the tasks, D = {d_1, d_2, ..., d_i, ..., d_n}; d_i denotes the shortest duration of the task with order i;
T_A represents the time intervals in which the tasks can be measured and controlled:
T_A = {[t_1B, t_1E], [t_2B, t_2E], ..., [t_iB, t_iE], ..., [t_nB, t_nE]};
[t_iB, t_iE] is the window available to the task with order i, t_iB being its earliest start time and t_iE its latest end time;
T_C represents the actual measurement and control intervals of the tasks:
T_C = {[t_1b, t_1e], [t_2b, t_2e], ..., [t_ib, t_ie], ..., [t_nb, t_ne]};
[t_ib, t_ie] is the window in which the task with order i is actually carried out, t_ib being the actual start time after scheduling and t_ie the actual end time after scheduling;
To_i describes the set of visible arc segments corresponding to each task; its element represents the k-th visible time window of the m-th measurement and control resource for the task with order i, written as [t_bk(s_im), t_ek(s_im)], where t_bk(s_im) is the start time of the visible window and t_ek(s_im) its end time;
(2) Measurement and control state design
The measurement and control state s is designed on the basis of the utilization of the measurement and control resources, i.e. on the basis of time-space visibility, by expressing the different visible/available states in the measurement and control system with visible time windows; for a specific measurement and control scene, a 0-1 matrix representing the state of each measurement and control resource is used as the state of the scene, the size of the 0-1 matrix being determined by the number of measurement and control resources and the division scale of the measurement and control time window; for each resource, a division scale is determined according to the specific requirements, the daily working time of the resource is divided accordingly, and the visible state of each divided time interval of the measurement and control equipment is marked: the matrix entry corresponding to a visible/usable unit time is set to 0 and the entry corresponding to an invisible/unusable unit time is set to 1, which determines the usage of the measurement and control equipment at a given moment, i.e. the measurement and control state;
(3) Design of measurement and control actions
The design of the measurement and control actions adopts a layer-by-layer progressive decision idea: it determines in turn whether a measurement and control task is accepted, which measurement and control resource the accepted task uses, and the specific measurement and control time interval of the task; the measurement and control action with order i is designed as:
X_i = (a_i, type, x_ij, y_jk, t_ib)
wherein a_i indicates whether the measurement and control task with order i is accepted, type indicates the type of measurement and control resource of the task with order i, x_ij indicates the number of the resource accepting the task with order i, y_jk indicates that the task is executed in the k-th visible time window of resource j, and t_ib characterizes the actual start time of the task with order i.
2. The measurement and control resource scheduling method based on deep Q learning according to claim 1, wherein step S2 specifically comprises the following steps:
designing a comprehensive measurement and control performance evaluation index that takes into account the satisfaction degree of the measurement and control tasks, the load balance degree of the measurement and control resources and the average utilization rate of the measurement and control resources, and using it as the decision basis for applying the DQN algorithm to measurement and control scheduling; measurement and control resource scheduling is expected to yield a scheduling strategy that maximizes this comprehensive evaluation index;
setting the evaluation index of measurement and control resource scheduling performance to r = s_R * RUR / load;
wherein s_R denotes the satisfaction degree of the measurement and control tasks, load denotes the load balance degree of the measurement and control resources, and RUR denotes the average utilization rate of all the measurement and control resources;
the satisfaction degree s_R of the measurement and control tasks, the load balance degree load of the measurement and control resources, and the average utilization rate RUR of the measurement and control resources are each computed by their respective formulas;
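The claim defines the scheduling index r = s_R * RUR / load, but the formulas for its three components are rendered as images in the source and are not reproduced here. The sketch below therefore assumes common definitions purely for illustration: s_R as the fraction of tasks scheduled, RUR as the mean per-resource utilization, and load as the standard deviation of per-resource utilization (with a small epsilon to avoid division by zero).

```python
import numpy as np

def scheduling_reward(n_scheduled, n_tasks, busy_hours, total_hours):
    """r = s_R * RUR / load, following the claim's index structure.

    Assumed (not given in the source): s_R = scheduled fraction of tasks,
    RUR = mean utilization, load = std. dev. of per-resource utilization.
    """
    s_R = n_scheduled / n_tasks
    util = np.asarray(busy_hours, dtype=float) / total_hours
    rur = util.mean()
    load = util.std() + 1e-9  # epsilon guards against a perfectly balanced load
    return s_R * rur / load

# Higher reward when more tasks are satisfied and the load is more balanced.
r_unbalanced = scheduling_reward(8, 10, busy_hours=[6, 4, 5], total_hours=24)
r_balanced = scheduling_reward(8, 10, busy_hours=[5, 5, 5], total_hours=24)
```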
3. The measurement and control resource scheduling method based on deep Q learning according to claim 1, wherein step S3 specifically comprises the following steps:
according to the design of the measurement and control actions in step S1, forming a measurement and control scheduling scheme mainly comprises three aspects: determining whether to accept a measurement and control task, determining the measurement and control resource that carries out the task, and determining the measurement and control arc segment that completes the task;
specifically: whether to accept a measurement and control task is determined by judging whether a visible time window exists for it, based on the modeling principle that a visible time window, i.e. a visible arc segment, provides a usable measurement and control state; during modeling of the measurement and control scene, the measurement and control resources and tasks are numbered uniformly, the visible arc segments satisfying the conditions are solved for each specific task, and the type and number of the resource completing the task are determined from the correspondence between visible arc segments and measurement and control resources;
in the design of the measurement and control state, the visible arc segments corresponding to a measurement and control task are discretized, and the measurement and control arc segment is slid over the selected visible arc segment according to the possible start times of the task, thereby determining the optimal measurement and control arc segment that can complete the task.
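The arc-segment sliding step above can be sketched as a scan over a discretized visible arc: the task's required duration is slid across candidate start slots and the first feasible placement is kept. The slot representation reuses the 0-1 convention from the state design (0 = visible/usable); the function name and slot sizes are illustrative assumptions.

```python
def slide_arc(state_row, duration):
    """state_row: 0/1 slots for one resource (0 = visible/usable).

    Slide a window of `duration` slots over the arc and return the earliest
    start index where all slots are free, or None if no placement fits.
    """
    for start in range(len(state_row) - duration + 1):
        if all(s == 0 for s in state_row[start:start + duration]):
            return start
    return None

row = [1, 0, 0, 1, 0, 0, 0, 1]
print(slide_arc(row, 3))  # 4 : first run of three usable slots
print(slide_arc(row, 5))  # None : no feasible placement
```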
4. The measurement and control resource scheduling method based on deep Q learning according to claim 1, wherein step S4 specifically comprises:
(1) The task state at the current moment changes, the visible time window of the measurement and control resource changes, and the measurement and control state of the system changes;
(2) Updating the measurement and control environment, extracting scene characteristics, and updating the measurement and control state of the system;
(3) According to the action selection rule of the deep reinforcement learning algorithm, a decision strategy for the measurement and control actions is selected, so that measurement and control resources are matched with measurement and control tasks in time and space and the measurement and control tasks are carried out;
(4) Evaluating and feeding back the result of measurement and control scheduling of the measurement and control environment and the measurement and control state caused by the selected measurement and control strategy;
(5) According to the evaluation feedback result of the measurement and control strategy, the deep reinforcement learning network is utilized to update the measurement and control decision strategy, and the measurement and control scene and the measurement and control state are observed to update;
and through repeated, iterative algorithm updates, the selection and optimization of the measurement and control resource allocation strategy is realized, yielding the optimal measurement and control scheduling strategy.
5. The measurement and control resource scheduling method based on deep Q learning according to claim 1, wherein step S5 specifically comprises the following steps:
(1) Describing the measurement and control scene and defining the basic physical elements in it; on the basis of the actual physical scene, the relevant elements involved in applying the DQN method to measurement and control scheduling are organized and summarized, and the basic elements, namely the measurement and control state, the measurement and control actions, the measurement and control action rewards and the measurement and control scheme, are defined;
(2) Initializing the deep Q learning measurement and control resource scheduling network: initializing the memory bank according to the actual capacity requirement, and initializing the network parameters, including the learning rate, the discount factor, and the structures and parameters of the actual-value neural network and the target-value neural network that describe the Q value;
(3) Designing the measurement and control state s according to the measurement and control scene model, initializing the input of the measurement and control scheduling network, and calculating the corresponding output; a measurement and control action is selected at random with probability epsilon, and with probability 1-epsilon according to the Q value output by the measurement and control scheduling network, i.e. the epsilon-greedy strategy, and the corresponding action is executed in the measurement and control resource scheduling network; after the action is executed, the reward r, i.e. the evaluation index of the measurement and control action, is obtained, together with the measurement and control state before the next action is executed, i.e. the measurement and control state s_{i+1} at the next moment; according to the currently selected measurement and control action and the current state, the Q value of the actual-value neural network and the Q value of the target-value neural network at the next moment in the measurement and control scheduling network, i.e. the actual Q value and the estimated Q value, are calculated;
(4) The four parameters (s_i, X_i, r_i, s_{i+1}) are stored together as one sample in the memory bank;
(5) A certain number of sample states are drawn at random from the memory bank, the target value of each state is calculated, and the Q value is updated toward this target using the reward obtained from execution; the parameters of the actual-value neural network are updated by stochastic gradient descent, and after every N iterative updates the current parameters of the actual-value network are assigned to the target-value neural network, realizing the update of the target-value network parameters in the measurement and control scheduling network; the parameters are updated continuously to train the measurement and control scheduling network;
(6) Through repeated, iterative algorithm updates, the selection and optimization of the measurement and control resource allocation strategy is realized and the optimal measurement and control scheduling strategy is selected; the measurement and control resource scheduling process is thereby completed.
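Steps (2) through (6) can be condensed into a toy training loop. A linear Q-function and a random stand-in environment replace the neural networks and the real measurement and control scene, so this is only a structural sketch, under stated assumptions, of the epsilon-greedy selection, the (s_i, X_i, r_i, s_{i+1}) memory bank, and the periodic target-network synchronization; it is not the patented method itself.

```python
import random
import numpy as np

random.seed(0)
rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, N_SYNC, GAMMA, LR, EPS = 4, 3, 20, 0.9, 0.01, 0.1

W = rng.normal(size=(N_ACTIONS, STATE_DIM)) * 0.1  # actual-value "network"
W_target = W.copy()                                # target-value "network"
memory = []                                        # replay memory bank

def q_values(weights, s):
    return weights @ s

def select_action(s):
    # epsilon-greedy: explore with probability EPS, else act greedily on Q
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_values(W, s)))

def step_env(s, a):
    # toy stand-in for the measurement and control environment
    s_next = rng.normal(size=STATE_DIM)
    reward = 1.0 if a == 0 else 0.0
    return reward, s_next

s = rng.normal(size=STATE_DIM)
for t in range(1, 201):
    a = select_action(s)
    r, s_next = step_env(s, a)
    memory.append((s, a, r, s_next))        # store (s_i, X_i, r_i, s_{i+1})
    batch = random.sample(memory, min(32, len(memory)))
    for bs, ba, br, bs_next in batch:
        target = br + GAMMA * np.max(q_values(W_target, bs_next))
        td_error = target - q_values(W, bs)[ba]
        W[ba] += LR * td_error * bs         # stochastic gradient step
    if t % N_SYNC == 0:
        W_target = W.copy()                 # sync target network every N updates
    s = s_next

print(W.shape)  # (3, 4)
```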
CN202010609039.9A 2020-06-29 2020-06-29 Measurement and control resource scheduling method based on deep Q learning Active CN111767991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010609039.9A CN111767991B (en) 2020-06-29 2020-06-29 Measurement and control resource scheduling method based on deep Q learning


Publications (2)

Publication Number Publication Date
CN111767991A CN111767991A (en) 2020-10-13
CN111767991B true CN111767991B (en) 2023-08-15

Family

ID=72724129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010609039.9A Active CN111767991B (en) 2020-06-29 2020-06-29 Measurement and control resource scheduling method based on deep Q learning

Country Status (1)

Country Link
CN (1) CN111767991B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113613332B * 2021-07-14 2023-06-09 广东工业大学 Spectrum resource allocation method and system based on cooperative distributed DQN combined with a simulated annealing algorithm
CN113779856B (en) * 2021-09-15 2023-06-27 成都中科合迅科技有限公司 Discrete particle swarm optimization modeling method for electronic system function online recombination

Citations (10)

Publication number Priority date Publication date Assignee Title
CN107798388A (en) * 2017-11-23 2018-03-13 航天天绘科技有限公司 The method of TT&C Resources dispatching distribution based on Multi Agent and DNN
CN109388484A (en) * 2018-08-16 2019-02-26 广东石油化工学院 A kind of more resource cloud job scheduling methods based on Deep Q-network algorithm
CN109409763A (en) * 2018-11-08 2019-03-01 北京航空航天大学 A kind of dynamic test assignment dispatching method and dispatching platform based on Greedy grouping strategy
CN109542613A (en) * 2017-09-22 2019-03-29 中兴通讯股份有限公司 Distribution method, device and the storage medium of service dispatch in a kind of CDN node
CN109729586A (en) * 2017-10-30 2019-05-07 上海诺基亚贝尔股份有限公司 Dispatching method, equipment and computer-readable medium based on window
CN109960544A (en) * 2019-03-26 2019-07-02 中国人民解放军国防科技大学 Task parallel scheduling method based on data driving type agile satellite
CN110781614A (en) * 2019-12-06 2020-02-11 北京工业大学 Shipboard aircraft tripping recovery online scheduling method based on deep reinforcement learning
CN111026549A (en) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 Automatic test resource scheduling method for power information communication equipment
CN111026548A (en) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 Power communication equipment test resource scheduling method for reverse deep reinforcement learning
CN111162831A (en) * 2019-12-24 2020-05-15 中国科学院遥感与数字地球研究所 Ground station resource scheduling method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9373960B2 (en) * 2013-03-13 2016-06-21 Oracle International Corporation Computerized system and method for distributed energy resource scheduling


Non-Patent Citations (1)

Title
Research on multi-satellite measurement and control resource scheduling methods based on deep reinforcement learning; Wu Yi; China Master's Theses Full-text Database, Engineering Science and Technology II, No. 04 (2022); C031-341 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant