CN111767991A - Measurement and control resource scheduling method based on deep Q learning - Google Patents


Info

Publication number: CN111767991A
Application number: CN202010609039.9A
Authority: CN (China)
Prior art keywords: measurement, control, task, resource, scheduling
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN111767991B (en)
Inventors: 郭茂耘, 武艺, 唐奇, 梁皓星
Current and original assignee: Chongqing University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Chongqing University
Priority to CN202010609039.9A; publication of CN111767991A; application granted; publication of CN111767991B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315 Needs-based resource requirements planning or analysis


Abstract

The invention relates to a measurement and control resource scheduling method based on deep Q learning, belonging to the field of intelligent scheduling. The method comprises the following steps: S1: describing the complex measurement and control scene; S2: designing evaluation indexes of measurement and control scheduling performance; S3: forming the measurement and control resource scheduling scheme; S4: applying the DQN algorithm to generate the measurement and control resource scheduling scheme; S5: implementing the DQN-based measurement and control resource scheduling method. Without accurately modeling the measurement and control environment, the method can generate a measurement and control resource scheduling strategy adapted to the measurement and control scene in a complex measurement and control environment, thereby maximizing measurement and control resource scheduling efficiency.

Description

Measurement and control resource scheduling method based on deep Q learning
Technical Field
The invention belongs to the field of intelligent scheduling, and relates to a measurement and control resource scheduling method based on deep Q learning.
Background
At present, the methods for solving the satellite measurement and control resource scheduling problem mainly include: intelligent algorithms such as the ant colony algorithm, the particle swarm algorithm, and SVM-based methods; deterministic algorithms such as branch and bound and Lagrangian relaxation; and heuristic algorithms such as greedy algorithms, neighborhood search, and simulated annealing. Research on space-ground integrated measurement and control resources is comparatively scarce, and most of it approaches the problem from the perspective of traditional algorithms such as Lagrangian relaxation, the ant colony algorithm, and genetic algorithms, so applications of deep reinforcement learning algorithms remain relatively rare.
The invention mainly resolves the conflict between measurement and control resources and measurement and control objects caused by the growing number of measurement and control tasks. From the perspective of visibility between measurement and control resources and measurement and control objects, a measurement and control scene based on measurement and control time windows is constructed, the optimal execution period of each measurement and control task is solved using deep Q learning (Deep Q Network, DQN), and finally an optimal measurement and control scheduling scheme is formed, realizing optimal operation of the measurement and control system under a specific index.
Disclosure of Invention
In view of this, the present invention provides a measurement and control resource scheduling method based on deep Q learning. The conflict between measurement and control tasks and the limited quantity of measurement and control resources is increasingly severe: with limited resources, measurement and control tasks remain constrained by conditions such as visibility between resources and objects, measurement and control duration, and task priority, so scheduling the measurement and control resources becomes a complex combinatorial optimization problem under multiple spatio-temporal constraints. A single type of measurement and control resource differs in, and is limited in, the measurement and control services it offers and the range it covers, while measurement and control tasks grow ever more complex and diverse, continuously increasing the difficulty of scheduling decisions. Joint scheduling of space-based and ground-based measurement and control resources is therefore necessary, so that the comprehensive scheduling performance of the space-ground integrated measurement and control resources is optimal.
The invention aims to construct a measurement and control resource scheduling method based on deep reinforcement learning, which uses deep reinforcement learning to realize intelligent scheduling of the space-ground integrated measurement and control resources, performs more accurate abstraction and feature extraction of the measurement and control system and scene, and finds a scheduling scheme adapted to the scene, so as to complete the measurement and control tasks and improve the comprehensive utilization efficiency of the measurement and control resources. The innovative application of the DQN algorithm is realized by abstracting the resource scheduling problem under multiple constraints.
In order to achieve the purpose, the invention provides the following technical scheme:
a measurement and control resource scheduling method based on deep Q learning comprises the following steps:
s1: describing a complex measurement and control scene;
s2: designing evaluation indexes of measurement and control scheduling performance;
s3: forming a measurement and control resource scheduling scheme;
s4: the DQN algorithm is applied to the generation of the measurement and control resource scheduling scheme;
s5: and implementing the measurement and control resource scheduling method based on the DQN.
Optionally, step S1 specifically includes:
(1) description of entities in a measurement and control scenario
From the perspective of measurement and control resources of the space-ground integrated measurement and control system, elements in the measurement and control scene are described based on visible time windows.
The space-ground integrated measurement and control resources are described as follows:
RESOURCE = {S, TYPE, TS, DS, L, LMAX}
wherein S is the set of space-ground integrated measurement and control resources, with all resources numbered uniformly: S = {s_1, s_2, ... s_j, ... s_M}; j is the number of a measurement and control resource, and M is the total number of measurement and control resources.
TYPE characterizes the type of a measurement and control resource: TYPE = 1 denotes a space-based resource, and TYPE = 0 denotes a ground-based resource.
TS characterizes the idle time windows of each measurement and control resource, i.e. the time windows currently available for measurement and control:
TS = {TS_1, TS_2, ... TS_j, ... TS_M}
   = {[t_b1(s_1), t_e1(s_1)], [t_b2(s_1), t_e2(s_1)], ..., [t_b1(s_2), t_e1(s_2)], [t_b2(s_2), t_e2(s_2)], ..., [t_b1(s_M), t_e1(s_M)]}
TS_j characterizes all available time windows, i.e. idle time windows, of the j-th measurement and control resource; t_b1(s_j) and t_e1(s_j) respectively denote the start and end times of the 1st visible time window of the j-th resource, with visible windows numbered in chronological order, and so on.
DS characterizes the length of each idle time window of a measurement and control resource:
DS = {ds_j^k}, where ds_j^k (shown as a formula image in the original) characterizes the length of the k-th idle time window of the j-th measurement and control resource.
LS_j characterizes the occupation of a single measurement and control resource by all medium and low orbit satellites:
LS_j = {l_ij | i = 1, ..., n}, where l_ij (shown as a formula image in the original) represents the load that measurement and control task i places on the single resource j; i is the number of a measurement and control task, and n is the total number of measurement and control tasks.
L characterizes the occupation of the whole set of space-ground integrated measurement and control resources by all medium and low orbit satellites, specifically:
L = {L_1, ... L_j, ... L_M}, where L_j (shown as a formula image in the original) represents the load placed on the single resource j by all measurement and control tasks.
LMAX = {LMAX_1, LMAX_2, ... LMAX_j, ... LMAX_M}
LMAX_j characterizes the maximum measurement and control task load that resource j can accept, i.e. the maximum load of the resource.
from the perspective of a measurement and control task, elements in a measurement and control scene are described based on a visible time window; the measurement and control task is described as follows:
TASK={T,Sat,P,D,TA,TC,TOi}
wherein, T is the number set of all measurement and control tasks, and T is { T ═ T1,T2,...Ti...Tn};
TiA number representing a measurement and control task; in the formula and the following formula, i is the order of the measurement and control tasks, and n is the total number of the measurement and control tasks;
sat represents a measurement and control task source, namely a corresponding task satellite, and Sat is { Sat ═1,Sat2,…Sato}
SatiA source satellite representing the measurement and control tasks with the sequence i;
p is the priority of the measurement and control task, and P is { P ═ P1,P2,...Pi...Pn},PiThe priority of the measurement and control tasks with the sequence i is represented;
d is the shortest measurement and control time D ═ D corresponding to each measurement and control task1,d2,...di...dn);diRepresenting the shortest duration of the measurement and control tasks with the sequence i;
TAtime interval for representing measurement and control task
TA={[t1B,t1E],[t2B,t2E],....[tiB,tiE],...[tnB,tnE]};
[tiB,tiE]Time window, t, indicating that the measurement and control task with the order i can perform the measurement and control taskiBFor the earliest starting time of the measurement and control task, tiEThe latest ending time of the measurement and control task is taken as the latest ending time of the measurement and control task;
TCactual measurement and control interval of characterization task
TC={[t1b,t1e],[t2b,t2e],....[tib,tie],...[tnb,tne]};
[tib,tie]Representing the time window, t, during which the measurement and control tasks in the order i are actually performedibActual start time, t, after scheduling for measurement and control tasksieThe actual end time after the actual scheduling of the measurement and control task is obtained;
Toidescribing sets of visible arc segments corresponding to respective tasks
Figure BDA0002560215510000041
Figure BDA0002560215510000042
The k-th visible time window of the m-th measurement and control resource for the measurement and control task with the sequence i is shown, and is specifically shown as [ tb1(sim),te1(sim)],tb1(sim) Is the start time of the visible window, te1(sim) Is the end time of the visible window;
(2) Measurement and control state design
The measurement and control state s is designed to express, with visible time windows, the different visibility/availability states in the measurement and control system according to the utilization of the measurement and control resources, i.e. on the basis of spatio-temporal visibility. For a specific measurement and control scene, a 0-1 matrix characterizing the state of every measurement and control resource is used as the state of the scene; the size of the matrix is determined by the number of measurement and control resources and the division scale of the measurement and control time window. For each resource, a division scale is chosen according to specific requirements to partition the resource's daily working time, and the visibility of each resulting time slot is marked: the matrix entry for a visible/available unit time is set to 0, and the entry for an invisible/unavailable unit time is set to 1. This determines the usage of the measurement and control equipment at any given moment, i.e. the measurement and control state.
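The 0-1 state matrix described above can be sketched in code. The following is a minimal illustration, not the patent's implementation; the function name, the slot-based window representation, and the default division scale of 96 slots per day (15-minute units) are assumptions:

```python
def build_state_matrix(visible_windows, num_resources, slots_per_day=96):
    """Build the 0-1 measurement and control state matrix.

    visible_windows maps a resource index to a list of half-open
    (start_slot, end_slot) ranges in which the resource is visible/available.
    Entry convention follows the patent: 0 = visible/available unit time,
    1 = invisible/unavailable unit time.
    """
    state = [[1] * slots_per_day for _ in range(num_resources)]
    for j, windows in visible_windows.items():
        for start, end in windows:
            for t in range(max(start, 0), min(end, slots_per_day)):
                state[j][t] = 0  # mark this unit time as visible/available
    return state

# Example: resource 0 visible in slots [8, 20), resource 1 in slots [40, 60)
state = build_state_matrix({0: [(8, 20)], 1: [(40, 60)]}, num_resources=2)
```

The resulting matrix has one row per resource and one column per unit time, matching the matrix size rule stated above.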
the step S3 specifically includes:
(3) design of measurement and control action
The design of the measurement and control actions adopts a layer-by-layer progressive decision idea: decide whether to accept a measurement and control task, which measurement and control resource receives the task, which visible time window of that resource is used, and the actual start time within that window. The measurement and control action is designed as the tuple
(a_i, type, x_ij, y_jk, t_ib)
(shown as a formula image in the original), wherein a_i characterizes whether the measurement and control task is accepted, type characterizes the type of the resource accepting the task, x_ij characterizes the number of the resource accepting the task, y_jk indicates that the task is executed in the k-th visible time window of resource j, and t_ib characterizes the actual start time of the task.
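For illustration, the action components named above can be held in a small container type. This is a hypothetical sketch; the class and field names are not from the patent, which specifies only the components a_i, type, x_ij, y_jk, and t_ib:

```python
from dataclasses import dataclass

@dataclass
class ControlAction:
    """Hypothetical container for the measurement and control action components."""
    accept: bool        # a_i: whether the measurement and control task is accepted
    resource_type: int  # type: 1 = space-based, 0 = ground-based resource
    resource_id: int    # x_ij: number of the resource accepting the task
    window_index: int   # y_jk: k-th visible time window of resource j used
    start_time: float   # t_ib: actual start time of the task

action = ControlAction(accept=True, resource_type=0, resource_id=3,
                       window_index=1, start_time=120.0)
```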
Optionally, step S2 specifically includes:
A comprehensive measurement and control performance evaluation index is designed that takes into account three indexes: the degree of completion of the measurement and control tasks, the balance of measurement and control resource utilization, and the balance of measurement and control resource load. It serves as the decision basis for applying the DQN algorithm to measurement and control scheduling; the scheduling of the measurement and control resources is expected to yield a strategy that maximizes this comprehensive index.
The evaluation index of measurement and control resource scheduling performance is set as r = s_R * RUR / load,
wherein s_R characterizes the satisfaction degree of the measurement and control tasks, load characterizes the balance of measurement and control resource utilization, and RUR characterizes the average utilization rate of all measurement and control resources.
The satisfaction degree of the measurement and control tasks s_R, the measurement and control resource load balance degree load, and the average utilization rate of the measurement and control resources RUR are each given by formulas shown as images in the original.
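Since the patent shows the component formulas only as images, the following sketch of the composite index r = s_R * RUR / load uses stand-in definitions that are assumptions, not the patent's formulas: a priority-weighted completion ratio for s_R, the standard deviation of per-resource loads (plus one, to keep the divisor positive) for the balance term load, and the mean utilization for RUR:

```python
import statistics

def scheduling_reward(completed_priorities, all_priorities,
                      resource_loads, resource_utilizations):
    """Composite evaluation index r = s_R * RUR / load.

    The three component definitions below are stand-ins (assumptions),
    since the patent gives the formulas only as images.
    """
    # s_R: priority-weighted satisfaction of the measurement and control tasks
    s_r = sum(completed_priorities) / sum(all_priorities)
    # load: imbalance of per-resource loads; +1 keeps the divisor positive
    load = statistics.pstdev(resource_loads) + 1.0
    # RUR: average utilization rate over all measurement and control resources
    rur = sum(resource_utilizations) / len(resource_utilizations)
    return s_r * rur / load

# Two of three tasks (priorities 3 and 2 out of 3, 2, 1) scheduled,
# both resources half loaded and half utilized
r = scheduling_reward([3, 2], [3, 2, 1], [0.5, 0.5], [0.5, 0.5])
```

Under any such definitions the index rises with task satisfaction and average utilization and falls with load imbalance, which is the behavior the text describes.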
optionally, step S3 specifically includes:
According to the design of the measurement and control actions in S1, forming the measurement and control scheduling scheme mainly consists of deciding whether to accept a measurement and control task, determining the measurement and control resource that performs it, and determining the measurement and control arc segment in which it is completed.
Specifically: with the visible time windows, i.e. the visible arc segments, as the modeling basis of the measurement and control state, whether a specific task is accepted is decided by judging whether a visible time window for that task exists. When modeling the measurement and control scene, the measurement and control resources and tasks are numbered uniformly; for a specific task, the visible arc segments satisfying the conditions are found, and the type and number of the resource completing the task are determined from the correspondence between visible arc segments and resources.
In the design of the measurement and control state, the visible arc segment corresponding to a task is discretized; the measurement arc slides along the selected visible arc segment over the possible start times of the task, and the optimal measurement arc capable of completing the task is determined.
Optionally, step S4 specifically includes:
(1) when the task state at the current moment changes and the visible time window of the measurement and control resource changes, the measurement and control state of the system changes;
(2) updating a measurement and control environment, extracting scene characteristics, and updating the measurement and control state of the system;
(3) selecting a decision strategy of the measurement and control action according to an action selection rule of a deep reinforcement learning algorithm, so that measurement and control resources are matched with the measurement and control task in time and space, and the measurement and control task is realized;
(4) evaluating and feeding back the measurement and control scheduling result aiming at the update of the measurement and control environment and the measurement and control state caused by the selected measurement and control strategy;
(5) updating the measurement and control decision strategy by using a deep reinforcement learning network according to the evaluation feedback result of the measurement and control strategy, and observing the measurement and control scene and the updating of the measurement and control state;
Through cyclic updating of the algorithm, the selection and optimization of the measurement and control resource allocation strategy is realized, and the optimal measurement and control scheduling strategy is obtained.
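Steps (1) to (5) above form a standard reinforcement-learning interaction cycle, which can be sketched as follows. The `env` and `agent` interfaces are hypothetical stand-ins (a scene model exposing reset/step and a DQN agent exposing act/learn), not APIs defined by the patent:

```python
def scheduling_loop(env, agent, episodes=10):
    """Sketch of the DQN scheduling decision cycle in steps (1)-(5).

    `env` models the measurement and control scene: reset() returns the
    initial state, step(action) returns (next_state, reward, done).
    `agent` wraps the DQN: act(state) selects a measurement and control
    action, learn(...) updates the decision strategy.
    """
    for _ in range(episodes):
        state = env.reset()                 # (1)-(2) update scene, extract state
        done = False
        while not done:
            action = agent.act(state)       # (3) select action (e.g. epsilon-greedy)
            next_state, reward, done = env.step(action)  # (4) evaluate and feed back
            agent.learn(state, action, reward, next_state, done)  # (5) update network
            state = next_state
```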
Optionally, step S5 specifically includes:
(1) describing the measurement and control scene and defining the basic physical elements in it; based on the actual physical scene, the relevant elements involved in applying the DQN method to measurement and control scheduling are sorted and summarized, and the composition of the basic elements (measurement and control states, measurement and control actions, measurement and control action rewards, and the measurement and control scheme) is determined;
(2) initializing a deep Q learning measurement and control resource scheduling network, initializing a memory base according to actual capacity requirements, and initializing network parameters including learning rate, discount factors and structures and parameters of an actual value neural network and a target value neural network describing a Q value;
(3) designing the measurement and control state s according to the measurement and control scene model, initializing the input of the measurement and control scheduling network, and computing the corresponding output; with probability ε the measurement and control action is selected randomly, and with probability 1 - ε it is selected through the Q value output by the scheduling network, i.e. the greedy strategy, and the corresponding measurement and control action is executed in the measurement and control resource scheduling network; after the action is executed, the reward r, i.e. the evaluation index of the measurement and control action, and the measurement and control state before the next action, i.e. the state s_{i+1} at the next moment, are obtained; according to the currently selected action and the current state, the Q values of the actual-value neural network and the target-value neural network of the scheduling network at the next moment, i.e. the actual Q value and the estimated Q value, are computed;
(4) the four parameters (s_i, X_i, r_i, s_{i+1}), where X_i denotes the selected measurement and control action, are stored in the memory bank as one sample;
(5) a certain number of sample states are randomly drawn from the memory bank, the target value of each state is calculated, and the Q value is updated toward this target value using the reward obtained after execution; the actual-value neural network parameters are updated by stochastic gradient descent, and after every N iterations of updating, the current parameters of the actual-value network are assigned to the target-value network, thereby updating the target-value network parameters of the measurement and control scheduling network; the parameters are updated continuously to train the measurement and control scheduling network;
(6) the selection and optimization of measurement and control resource allocation strategies are realized through cyclic algorithm updating, and the selection of an optimal measurement and control scheduling strategy is realized; and finishing the measurement and control resource scheduling process.
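Two mechanisms from steps (4) and (5), the memory bank of (s_i, X_i, r_i, s_{i+1}) samples and the periodic copy of actual-value network parameters into the target-value network, can be sketched as follows. This is a minimal dependency-free illustration; the class names and the dictionary representation of network parameters are assumptions:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity memory bank storing (s_i, X_i, r_i, s_{i+1}) samples;
    the oldest samples are evicted automatically once capacity is reached."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, sample):
        self.buffer.append(sample)

    def sample(self, batch_size):
        # Random draw breaks the correlation between consecutive samples
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def update_target(actual_params, target_params):
    """Assign the current actual-value network parameters to the
    target-value network (performed after every N iterations)."""
    target_params.clear()
    target_params.update(actual_params)

mem = ReplayMemory(capacity=100)
for i in range(5):
    mem.store((i, 0, 1.0, i + 1))  # (state, action, reward, next_state)
batch = mem.sample(3)
```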
The invention has the following beneficial effects: without accurately modeling the measurement and control environment, the method can generate a measurement and control resource scheduling strategy adapted to the measurement and control scene in a complex measurement and control environment, thereby maximizing measurement and control resource scheduling efficiency.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic view of a measurement and control state design;
FIG. 2 is a flowchart illustrating the measurement and control resource scheduling scheme;
fig. 3 is a DQN-based measurement and control resource scheduling decision flow;
fig. 4 is a schematic view of a measurement and control state in the embodiment.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Please refer to fig. 1 to 4, which illustrate a measurement and control resource scheduling method based on deep Q learning.
The invention relates to a measurement and control resource scheduling method based on the DQN algorithm. A measurement and control scene is constructed based on the visible windows between measurement and control resources and measurement and control objects; the strong model description capability of the neural network in the DQN algorithm is used to characterize the long-term reward of a measurement and control action, and a memory replay mechanism is used to break the correlation between data, so that by interacting with the measurement and control scene and learning to evaluate the quality of the current state, the optimal strategy is learned, making the method suitable for complex measurement and control resource scheduling environments. The technical scheme of the method is as follows:
1. description of complex measurement and control scenarios
In the complex measurement and control scene involved in this method, measurement and control resources mainly refer to the space-ground integrated measurement and control resources, i.e. ground-based and space-based measurement and control resources: the ground-based resources are mainly ground stations, and the space-based resources mainly consider tracking and data relay satellites. The type of a measurement and control resource is made explicit through the type variable. The description of the measurement and control scene is performed mainly on the basis of the visibility states and visible time windows between the measurement and control resources and the measurement and control objects. Specifically, the description of the complex measurement and control scene is completed by abstractly expressing each physical entity in the scene (including the description of measurement and control tasks and the related constraints) and by designing the measurement and control state and the measurement and control actions.
(1) Description of entities in a measurement and control scenario
From the perspective of measurement and control resources of the space-ground integrated measurement and control system, elements in a measurement and control scene are described based on a visible time window.
The space-ground integrated measurement and control resource can be described as follows:
RESOURCE={S,TYPE,TS,DS,L,LMAX}
wherein S is the set of space-ground integrated measurement and control resources, with all resources numbered uniformly: S = {s_1, s_2, ..., s_j, ..., s_M}; in this formula and those below, j is the number of a measurement and control resource and M is the total number of measurement and control resources.
TYPE denotes the type of the measurement and control resource: TYPE = 1 indicates a space-based resource, and TYPE = 0 indicates a ground-based resource;
TS characterizes an idle time window for each measurement and control resource (i.e. the time window currently available for measurement and control);
TS = {TS_1, TS_2, ..., TS_j, ..., TS_M}
TS_j = {[t_b1(s_j), t_e1(s_j)], [t_b2(s_j), t_e2(s_j)], ...}
TS_j characterizes all available time windows (i.e. idle time windows) of the j-th measurement and control resource; t_bk(s_j) and t_ek(s_j) respectively denote the start time and the end time of the k-th visible time window of the j-th resource, the visible windows being numbered in chronological order, and so on.
DS characterizes the length of each idle time window of the measurement and control resources:
ds_j^k = t_ek(s_j) - t_bk(s_j)
where ds_j^k is the length of the k-th idle time window of the j-th measurement and control resource.
LS_j denotes the occupation of a single measurement and control resource by all medium and low orbit satellites:
LS_j = {L_1^j, L_2^j, ..., L_i^j, ..., L_n^j}
where L_i^j denotes the load that measurement and control task i places on the single resource j; i is the number of a measurement and control task and n is the total number of measurement and control tasks.
L denotes the occupation of the space-ground integrated measurement and control resources by all medium and low orbit satellites, specifically:
L = {LS_1, LS_2, ..., LS_j, ..., LS_M}
where LS_j denotes the load placed on the single resource j by all measurement and control tasks.
LMAX = {LMAX_1, LMAX_2, ..., LMAX_j, ..., LMAX_M}
where LMAX_j denotes the largest measurement and control task load that resource j can accept, i.e. the maximum load of the resource.
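Under these definitions, the RESOURCE fields and the DS window lengths can be sketched minimally in Python (the field and method names are illustrative, not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """One space-ground integrated measurement and control resource s_j."""
    number: int        # j, unified number of the resource
    rtype: int         # TYPE: 1 = space-based, 0 = ground-based
    windows: list      # TS_j: [(t_bk, t_ek), ...] idle windows, chronological
    l_max: float       # LMAX_j: maximum acceptable measurement task load

    def ds(self):
        """DS_j: length of each idle time window (end minus start)."""
        return [te - tb for tb, te in self.windows]

# e.g. a ground station idle 08:00-10:00 and 14:00-15:30 (times in hours)
s1 = Resource(number=1, rtype=0, windows=[(8.0, 10.0), (14.0, 15.5)], l_max=6.0)
```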
From the perspective of the measurement and control task, elements in the measurement and control scene are described based on a visible time window. The measurement and control tasks can be described as:
TASK = {T, Sat, P, D, T_A, T_C, T_Oi}
wherein T is the number set of all measurement and control tasks, T = {T_1, T_2, ..., T_i, ..., T_n}; T_i denotes the number of a measurement and control task. In this formula and those below, i is the order of a measurement and control task and n is the total number of measurement and control tasks.
Sat denotes the sources of the measurement and control tasks, i.e. the corresponding task satellites: Sat = {Sat_1, Sat_2, ..., Sat_o}; Sat_i denotes the source satellite of the task with order i.
P is the priority of the measurement and control tasks, P = {P_1, P_2, ..., P_i, ..., P_n}; P_i denotes the priority of the task with order i.
D is the shortest measurement and control duration of each task, D = (d_1, d_2, ..., d_i, ..., d_n); d_i denotes the shortest duration of the task with order i.
T_A characterizes the interval in which each task can be measured and controlled:
T_A = {[t_1B, t_1E], [t_2B, t_2E], ..., [t_iB, t_iE], ..., [t_nB, t_nE]}
where [t_iB, t_iE] is the time window within which the task with order i can be performed; t_iB is the earliest start time and t_iE the latest finish time of the task.
T_C characterizes the actual measurement and control interval of each task:
T_C = {[t_1b, t_1e], [t_2b, t_2e], ..., [t_ib, t_ie], ..., [t_nb, t_ne]}
where [t_ib, t_ie] is the time window in which the task with order i is actually performed; t_ib is the actual start time and t_ie the actual end time of the task after scheduling.
T_Oi describes the set of visible arc segments corresponding to each task:
T_Oi = { [t_bk(s_im), t_ek(s_im)] }
where [t_bk(s_im), t_ek(s_im)] is the k-th visible time window of the m-th measurement and control resource for the task with order i; t_bk(s_im) is the start time and t_ek(s_im) the end time of that visible window.
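The TASK fields translate directly into a small Python sketch; the acceptance check below — a task can only be scheduled if its visible-arc set T_Oi contains an arc at least d_i long — is a hedged reading of the definitions above, with all names illustrative rather than taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One measurement and control task T_i."""
    number: int      # T_i
    sat: int         # Sat_i, source satellite
    priority: int    # P_i
    d_min: float     # d_i, shortest required measurement duration
    t_window: tuple  # (t_iB, t_iE): earliest start, latest finish
    arcs: dict       # T_Oi: {resource number j: [(t_bk, t_ek), ...]}

def acceptable(task):
    """A task can be accepted only if some visible arc is at least d_i long."""
    return any(te - tb >= task.d_min
               for windows in task.arcs.values()
               for tb, te in windows)

t1 = Task(number=1, sat=1, priority=3, d_min=0.5,
          t_window=(8.0, 12.0), arcs={1: [(8.5, 9.25)], 2: [(10.0, 10.4)]})
```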
(2) Measurement and control state design
The measurement and control state s is designed to express the different visibility/availability states in the measurement and control system using visible time windows, according to the utilization of the measurement and control resources, i.e. on the basis of time-space visibility. As shown in fig. 1, for a specific measurement and control scene, a 0-1 matrix representing the state of each measurement and control resource is used as the state of the scene; the size of the matrix is determined by the number of measurement and control resources and the division scale of the measurement and control time window. For each resource, a division scale is chosen according to the specific requirements and the daily working time of the resource is divided accordingly; the visibility of the resource in each divided interval is then marked, with the matrix entry for a visible/available unit time set to 0 and the entry for an invisible/unavailable unit time set to 1. This determines the usage of the measurement and control equipment at any given moment, i.e. the measurement and control state.
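A minimal sketch of the 0-1 state matrix construction described above, assuming a 1 h division scale and fractional-hour window endpoints (a slot is marked available if any part of it is covered by an idle window — one possible convention, not specified in the patent):

```python
def state_matrix(idle_windows, hours=24):
    """Build the M x 24 0-1 state matrix: entry (j, h) is 0 when resource j
    is visible/available at some point of the hour slot [h, h+1), else 1."""
    state = []
    for windows in idle_windows:
        row = [0 if any(tb < h + 1 and te > h for tb, te in windows) else 1
               for h in range(hours)]
        state.append(row)
    return state

# three resources, as in the worked scenario: a 3 x 24 matrix
m = state_matrix([[(8, 10)], [(0, 1), (23, 24)], []])
```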
(3) Design of measurement and control action
The measurement and control actions are designed with a layer-by-layer progressive decision idea: first whether to accept a measurement and control task, then which measurement and control resource performs the accepted task, and finally in which measurement and control time interval of that resource the task is carried out. The measurement and control action is therefore designed as:
X = (a_i, type, x_ij, y_jk, t_ib)
wherein a_i indicates whether the measurement and control task is accepted, type denotes the type of the resource accepting the task, x_ij denotes the number of the resource accepting the task, y_jk indicates that the task is executed in the k-th visible time window of resource j, and t_ib characterizes the actual start time of the task.
2. Evaluation index design for measurement and control scheduling performance
In the method, a comprehensive measurement and control performance evaluation index which takes three indexes of measurement and control task completion degree, measurement and control resource utilization balance degree and measurement and control resource load balance degree into consideration is designed and used as a decision basis for applying a DQN algorithm in measurement and control scheduling. The measurement and control resource scheduling expects to obtain a scheduling strategy which enables the comprehensive evaluation index to be maximum.
Specifically, the measurement and control resource scheduling performance evaluation index is set as r = s_R · RUR / load.
wherein s_R characterizes the satisfaction degree of the measurement and control tasks, load characterizes the balance degree of measurement and control resource utilization, and RUR characterizes the average utilization rate of all measurement and control resources.
The measurement and control task satisfaction degree s_R, the measurement and control resource load balance degree load, and the average utilization rate RUR of the measurement and control resources are each defined by a formula given as an equation image in the original publication.
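Because the three component indices are given only as equation images, the sketch below adopts one plausible reading — a priority-weighted completion ratio for s_R, the mean per-resource utilization for RUR, and a dispersion-based load term — purely as assumptions, to make the composite index r = s_R · RUR / load concrete:

```python
def reward(completed, priorities, busy, capacity):
    """r = s_R * RUR / load with ASSUMED component definitions.

    completed : list of 0/1 flags per task (scheduled or not)
    priorities: P_i per task
    busy      : time actually used on each resource j
    capacity  : total idle-window time available on each resource j
    """
    # s_R: priority-weighted fraction of completed tasks (assumed form)
    s_r = sum(p * c for p, c in zip(priorities, completed)) / sum(priorities)
    # RUR: average utilization rate over all resources
    rates = [b / cap for b, cap in zip(busy, capacity)]
    rur = sum(rates) / len(rates)
    # load: dispersion of per-resource loads (assumed form); 1 is added so a
    # perfectly balanced plan does not divide by zero
    mean = sum(busy) / len(busy)
    load = 1 + (sum((b - mean) ** 2 for b in busy) / len(busy)) ** 0.5
    return s_r * rur / load
```

Under this reading, a plan that completes every task while using resources evenly scores highest, matching the stated goal of maximizing the composite index.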
3. measurement and control resource scheduling scheme formation
According to the design of the measurement and control actions in step 1, forming the measurement and control scheduling scheme mainly means determining whether to accept a measurement and control task, determining the measurement and control resource that performs the task, and determining the measurement and control arc segment in which the task is completed. Specifically: the invention takes the visible arc segments, derived from the visible time windows, as the modeling basis of the measurement and control state, so for a specific task, whether the task is accepted is determined by judging whether a visible time window exists for it. Because the measurement and control resources and tasks are uniformly numbered during scene modeling, the visible arc segments satisfying the conditions can be obtained for a specific task, and the type and number of the resource completing the task can then be determined from the correspondence between visible arc segments and resources. In the design of the measurement and control state, the visible arc segment corresponding to a task is discretized, so the measurement and control arc slides over the selected visible arc segment according to the possible start times of the task, and the optimal measurement and control arc capable of completing the task is determined.
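The sliding of the measurement arc over a selected visible arc segment can be sketched as an earliest-feasible search; the discretization step and the occupancy check below are assumptions introduced for illustration, not details given in the patent:

```python
def slide_arc(visible, d, task_window, occupied, step=0.25):
    """Slide a candidate arc of duration d over the visible arc [vb, ve],
    inside the task's allowed interval [t_iB, t_iE], skipping intervals
    already occupied on the resource; return the earliest feasible
    [t_ib, t_ie], or None if the arc cannot fit."""
    vb, ve = visible
    tB, tE = task_window
    t = max(vb, tB)
    end = min(ve, tE)
    while t + d <= end + 1e-9:
        if not any(t < oe and t + d > ob for ob, oe in occupied):
            return (t, t + d)
        t += step       # slide forward by the discretization step
    return None
```

Taking the earliest feasible start is one simple policy; the DQN agent described below is what ultimately ranks the candidate arcs.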
Therefore, the forming flow of the measurement and control resource scheduling scheme is shown in fig. 2:
application of DQN algorithm in generation of measurement and control resource scheduling scheme
In the method, based on a deep reinforcement learning framework and a learning principle of DQN, the following measurement and control resource scheduling decision process can be constructed, so that a measurement and control resource scheduling strategy with optimal measurement and control efficiency is selected.
The implementation steps can be summarized as follows:
(1) When the task state at the current moment changes and the visible time windows of the measurement and control resources change, the measurement and control state of the system changes.
(2) The measurement and control environment is updated, scene features are extracted, and the measurement and control state of the system is updated.
(3) A decision strategy for the measurement and control action is selected according to the action selection rule of the deep reinforcement learning algorithm, so that the measurement and control resources are matched with the measurement and control tasks in time and space and the tasks are realized.
(4) The measurement and control scheduling result is evaluated and fed back with respect to the updates of the measurement and control environment and state caused by the selected strategy.
(5) The measurement and control decision strategy is updated with the deep reinforcement learning network according to the evaluation feedback, and the updates of the measurement and control scene and state are observed.
And through cyclic algorithm updating, selection and optimization of measurement and control resource allocation strategies are realized, and selection of an optimal measurement and control scheduling strategy is realized.
5. DQN-based measurement and control resource scheduling method implementation process
(1) Describe the measurement and control scene and define the basic physical elements in it. Based on the actual physical scene, the elements involved in the DQN method for measurement and control scheduling are collated and summarized, and the composition of the basic elements, such as the measurement and control state, the measurement and control actions, the measurement and control action reward and the measurement and control scheme, is determined.
(2) The method comprises the steps of initializing a deep Q learning measurement and control resource scheduling network, initializing a memory base according to actual capacity requirements, and initializing network parameters including learning rate, discount factors and structures and parameters of an actual value neural network and a target value neural network for describing a Q value.
(3) A measurement and control state s is designed according to the measurement and control scene model, the input of the measurement and control scheduling network is initialized, and the corresponding output is calculated. With probability ε a measurement and control action is selected at random, and with probability 1 - ε the action with the largest Q value output by the scheduling network is selected (the ε-greedy strategy); the corresponding action is then executed in the measurement and control resource scheduling network. After the action is executed, the reward r (i.e. the evaluation index of the measurement and control action) and the measurement and control state before the next action, i.e. the state s_{i+1} at the next moment, are obtained. According to the currently selected action and the current state, the Q values of the actual-value neural network and the current-value neural network in the scheduling network at the next moment, namely the actual Q value and the estimated Q value, are calculated.
(4) The four parameters (s_i, X_i, r_i, s_{i+1}) are stored together as one sample in the memory bank.
(5) A certain number of sample states are randomly drawn from the memory bank, and the target value of each state is calculated (the Q value is updated as the target value by means of the reward obtained after execution). The parameters of the actual-value neural network are updated by stochastic gradient descent, and after every N iterative updates the current parameters of the actual-value network are assigned to the target-value network, thereby updating the target-value network parameters in the measurement and control scheduling network. The parameters are updated continuously to train the measurement and control scheduling network.
(6) Through cyclic updating of the algorithm, the selection and optimization of the measurement and control resource allocation strategy are realized and the optimal measurement and control scheduling strategy is selected, completing the measurement and control resource scheduling process.
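The loop in steps (1)–(6) can be condensed into a runnable skeleton. The tabular stand-in for the two Q networks and all hyperparameter values are assumptions, intended only to show the ε-greedy selection, memory replay and periodic target-network synchronization described above:

```python
import random
from collections import deque

class DQNScheduler:
    def __init__(self, n_actions, capacity=1000, eps=0.1,
                 gamma=0.9, lr=0.1, sync_every=20):
        self.n_actions = n_actions
        self.memory = deque(maxlen=capacity)   # initialized memory bank
        self.eps, self.gamma, self.lr = eps, gamma, lr
        self.sync_every, self.steps = sync_every, 0
        # dict-of-lists stand-in for the actual-value / target-value networks
        self.q_actual, self.q_target = {}, {}

    def q(self, table, s):
        return table.setdefault(s, [0.0] * self.n_actions)

    def act(self, s):
        """epsilon-greedy selection of the measurement and control action."""
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        qs = self.q(self.q_actual, s)
        return qs.index(max(qs))

    def store(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))   # sample (s_i, X_i, r_i, s_{i+1})

    def learn(self, batch_size=8):
        batch = random.sample(self.memory, min(batch_size, len(self.memory)))
        for s, a, r, s_next in batch:
            target = r + self.gamma * max(self.q(self.q_target, s_next))
            qs = self.q(self.q_actual, s)
            qs[a] += self.lr * (target - qs[a])    # gradient-step stand-in
        self.steps += 1
        if self.steps % self.sync_every == 0:      # copy parameters every N updates
            self.q_target = {s: v[:] for s, v in self.q_actual.items()}

# one transition stored and one learning step, with target sync every update
random.seed(0)
sched = DQNScheduler(n_actions=2, sync_every=1)
sched.store('s0', 0, 1.0, 's1')
sched.learn(batch_size=1)
```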
Example:
1. Describe the complex measurement and control scene. Taking a scene with 2 ground-based measurement and control resources, 1 space-based measurement and control resource and 9 measurement and control tasks to be completed as an example, the measurement and control resource scene is initialized and described uniformly. According to the actual measurement and control scene, from the perspective of the space-ground integrated measurement and control resources, the scene can be described in the following form:
the measurement and control resources of the space-ground integrated measurement and control system are as follows:
RESOURCE={S,TYPE,TS,DS,L,LMAX}
wherein S is the set of space-ground integrated measurement and control resources, S = {s_1, s_2, ..., s_j, ..., s_M}
TYPE denotes the type of the measurement and control resource: TYPE = 1 indicates a space-based resource, and TYPE = 0 indicates a ground-based resource;
TS characterizes an idle time window for each measurement and control resource (i.e. the time window currently available for measurement and control):
TS = {TS_1, TS_2, ..., TS_j, ..., TS_M}
TS_j = {[t_b1(s_j), t_e1(s_j)], [t_b2(s_j), t_e2(s_j)], ...}
DS characterizes the length of each idle time window of the measurement and control resources: ds_j^k = t_ek(s_j) - t_bk(s_j).
LS_j denotes the occupation of a single measurement and control resource by all medium and low orbit satellites: LS_j = {L_1^j, L_2^j, ..., L_i^j, ..., L_n^j}.
L denotes the occupation of the space-ground integrated measurement and control resources by all medium and low orbit satellites, specifically:
L = {LS_1, LS_2, ..., LS_j, ..., LS_M}
from the perspective of the measurement and control task, the description of the elements in the measurement and control scene based on the visible time window is as follows:
TASK={T,Sat,P,D,TA,TC,TOi}
wherein T is the set of measurement and control tasks of all medium and low orbit satellites, T = {T_1, T_2, ..., T_i, ..., T_n}
Sat denotes the sources of the measurement and control tasks, i.e. the corresponding task satellites: Sat = {Sat_1, Sat_2, ..., Sat_o}
P is the priority of the measurement and control tasks, P = {P_1, P_2, ..., P_i, ..., P_n}
D is the shortest measurement and control duration of each task, D = (d_1, d_2, ..., d_i, ..., d_n);
T_A characterizes the interval in which each task can be measured and controlled, T_A = {[t_1B, t_1E], [t_2B, t_2E], ..., [t_iB, t_iE], ..., [t_nB, t_nE]},
T_C characterizes the actual measurement and control interval of each task, T_C = {[t_1b, t_1e], [t_2b, t_2e], ..., [t_ib, t_ie], ..., [t_nb, t_ne]},
T_Oi describes the set of visible arc segments corresponding to each task: T_Oi = { [t_bk(s_im), t_ek(s_im)] }.
A measurement and control state s is designed according to the measurement and control scene model; for a specific scene, a 0-1 matrix representing the state of each measurement and control resource is taken as the measurement and control state of the scene. Taking 1 h as the division scale, there are 3 measurement and control resources in total in this scene, so for each day the state matrix has size 3 × 24, with the entry for a visible/available unit time set to 0 and the entry for an invisible/unavailable unit time set to 1. Accordingly, the measurement and control state in this case can be visualized as in fig. 4.
The measurement and control actions, i.e. the decision variables, are described as:
X = (a_i, type, x_ij, y_jk, t_ib)
wherein a_i indicates whether the measurement and control task is accepted, type denotes the type of the resource accepting the task, x_ij denotes the number of the resource accepting the task, y_jk indicates that the task is executed in the k-th visible time window of resource j, and t_ib characterizes the actual start time of the task.
The evaluation index of the measurement and control scheduling performance is expressed as r = s_R · RUR / load, which comprehensively evaluates the scheduling performance; s_R characterizes the satisfaction degree of the measurement and control tasks, load characterizes the balance degree of resource utilization, and RUR characterizes the average utilization rate of all measurement and control resources.
2. According to the requirements of the measurement and control scene, a convolutional neural network is constructed to describe the Q value in the measurement and control resource scheduling network. The actual-value neural network and the target-value neural network are two convolutional neural networks with the same structure but not fully identical parameters; each comprises 2 convolutional layers and 1 fully connected layer, and the sigmoid function is adopted as the activation function. During initialization of the deep Q-learning scheduling network, the memory bank is initialized according to the actual capacity requirement, and the network parameters are initialized, including the learning rate, the discount factor, and the relevant parameters of the actual-value and target-value neural networks describing the Q value.
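Since the kernel sizes, strides and padding of the two convolutional layers are not specified, the sketch below only sanity-checks the layer shapes with the standard convolution output-size formula, under assumed settings (1×3 kernels, stride 1, no padding, applied along the 24-slot time axis of the 3 × 24 state matrix):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of one convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# assumed configuration: two conv layers with 1x3 kernels, stride 1, no padding
h, w = 3, 24                 # the 3 x 24 measurement and control state matrix
for _ in range(2):           # 2 convolutional layers
    w = conv_out(w, kernel=3)
# the fully connected layer then sees h * w * channels input values
```

The small spatial extent of the state matrix is why the layer count (2 conv + 1 FC) stays low: deeper stacks with unpadded 3-wide kernels would quickly exhaust the 24 time slots.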
3. According to the detailed description of the measurement and control scene in step 1, the measurement and control state, the measurement and control actions, the measurement and control action reward and the measurement and control scheme are further refined. On this basis, the measurement and control state s is designed according to the scene model, the input of the scheduling network is initialized, and the corresponding output is calculated. With probability ε a measurement and control action is selected at random, and with probability 1 - ε the action with the largest Q value output by the scheduling network is selected (the ε-greedy strategy); the corresponding action is then executed in the measurement and control resource scheduling network. After the action is executed, the reward r and the measurement and control state before the next action, i.e. the state s_{i+1} at the next moment, are obtained. According to the currently selected action and the current state, the Q values of the actual-value and current-value neural networks in the scheduling network at the next moment are calculated.
4. The four parameters (s_i, X_i, r_i, s_{i+1}) are stored together as one sample in the memory bank.
5. A certain number of sample states are randomly drawn from the memory bank, and the target value of each state is calculated (the Q value is updated as the target value by means of the reward obtained after execution). The parameters of the actual-value neural network are updated by stochastic gradient descent, and after every N iterative updates the current parameters of the actual-value network are assigned to the target-value network, thereby updating the target-value network parameters in the scheduling network.
The parameters are updated continuously to train the measurement and control scheduling network.
6. And through cyclic algorithm updating, selection and optimization of measurement and control resource allocation strategies are realized, and selection of an optimal measurement and control scheduling strategy is realized. And finishing the measurement and control resource scheduling process.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (6)

1. A measurement and control resource scheduling method based on deep Q learning is characterized in that: the method comprises the following steps:
s1: describing a complex measurement and control scene;
s2: designing evaluation indexes of measurement and control scheduling performance;
s3: forming a measurement and control resource scheduling scheme;
s4: the DQN algorithm is applied to the generation of the measurement and control resource scheduling scheme;
s5: and implementing the measurement and control resource scheduling method based on the DQN.
2. The measurement and control resource scheduling method based on deep Q learning according to claim 1, characterized in that: the step S1 specifically includes:
(1) description of entities in a measurement and control scenario
From the perspective of measurement and control resources of the space-ground integrated measurement and control system, elements in a measurement and control scene are described based on a visible time window;
the space-ground integrated measurement and control resources are described as follows:
RESOURCE={S,TYPE,TS,DS,L,LMAX}
wherein S is the set of space-ground integrated measurement and control resources, with all resources numbered uniformly: S = {s_1, s_2, ..., s_j, ..., s_M}; j is the number of a measurement and control resource and M is the total number of measurement and control resources;
TYPE denotes the type of the measurement and control resource: TYPE = 1 indicates a space-based resource, and TYPE = 0 indicates a ground-based resource;
TS represents an idle time window for each measurement and control resource, namely the current time window which can be used for measurement and control;
TS = {TS_1, TS_2, ..., TS_j, ..., TS_M}
TS_j = {[t_b1(s_j), t_e1(s_j)], [t_b2(s_j), t_e2(s_j)], ...}
TS_j characterizes all available time windows, i.e. idle time windows, of the j-th measurement and control resource; t_bk(s_j) and t_ek(s_j) respectively denote the start time and the end time of the k-th visible time window of the j-th resource, the visible windows being numbered in chronological order, and so on;
DS characterizes the length of each idle time window of the measurement and control resources:
ds_j^k = t_ek(s_j) - t_bk(s_j)
where ds_j^k is the length of the k-th idle time window of the j-th measurement and control resource;
LS_j denotes the occupation of a single measurement and control resource by all medium and low orbit satellites:
LS_j = {L_1^j, L_2^j, ..., L_i^j, ..., L_n^j}
where L_i^j denotes the load that measurement and control task i places on the single resource j; i is the number of a measurement and control task and n is the total number of measurement and control tasks;
L denotes the occupation of the space-ground integrated measurement and control resources by all medium and low orbit satellites, specifically:
L = {LS_1, LS_2, ..., LS_j, ..., LS_M}
where LS_j denotes the load placed on the single resource j by all measurement and control tasks;
LMAX = {LMAX_1, LMAX_2, ..., LMAX_j, ..., LMAX_M}
where LMAX_j denotes the largest measurement and control task load that resource j can accept, i.e. the maximum load of the resource;
from the perspective of a measurement and control task, elements in a measurement and control scene are described based on a visible time window; the measurement and control task is described as follows:
TASK = {T, Sat, P, D, T_A, T_C, T_Oi}
wherein T is the number set of all measurement and control tasks, T = {T_1, T_2, ..., T_i, ..., T_n}; T_i denotes the number of a measurement and control task; in this formula and those below, i is the order of a measurement and control task and n is the total number of measurement and control tasks;
Sat denotes the sources of the measurement and control tasks, i.e. the corresponding task satellites: Sat = {Sat_1, Sat_2, ..., Sat_o}; Sat_i denotes the source satellite of the task with order i;
P is the priority of the measurement and control tasks, P = {P_1, P_2, ..., P_i, ..., P_n}; P_i denotes the priority of the task with order i;
D is the shortest measurement and control duration of each task, D = (d_1, d_2, ..., d_i, ..., d_n); d_i denotes the shortest duration of the task with order i;
T_A characterizes the interval in which each task can be measured and controlled:
T_A = {[t_1B, t_1E], [t_2B, t_2E], ..., [t_iB, t_iE], ..., [t_nB, t_nE]}
where [t_iB, t_iE] is the time window within which the task with order i can be performed; t_iB is the earliest start time and t_iE the latest finish time of the task;
T_C characterizes the actual measurement and control interval of each task:
T_C = {[t_1b, t_1e], [t_2b, t_2e], ..., [t_ib, t_ie], ..., [t_nb, t_ne]}
where [t_ib, t_ie] is the time window in which the task with order i is actually performed; t_ib is the actual start time and t_ie the actual end time of the task after scheduling;
T_Oi describes the set of visible arc segments corresponding to each task:
T_Oi = { [t_bk(s_im), t_ek(s_im)] }
where [t_bk(s_im), t_ek(s_im)] is the k-th visible time window of the m-th measurement and control resource for the task with order i; t_bk(s_im) is the start time and t_ek(s_im) the end time of the visible window;
(2) measurement and control state design
The measurement and control state s is designed to express the different visibility/availability states in the measurement and control system using visible time windows, according to the utilization of the measurement and control resources, i.e. on the basis of time-space visibility; for a specific measurement and control scene, a 0-1 matrix representing the state of each measurement and control resource is used as the state of the scene, and the size of the matrix is determined by the number of measurement and control resources and the division scale of the measurement and control time window; for each resource, a division scale is chosen according to the specific requirements and the daily working time of the resource is divided accordingly, and the visibility of the resource in each divided interval is marked, with the matrix entry for a visible/available unit time set to 0 and the entry for an invisible/unavailable unit time set to 1, thereby determining the usage of the measurement and control equipment at any given moment, i.e. the measurement and control state;
the step S3 specifically includes:
(3) design of measurement and control action
The design of the measurement and control actions adopts a progressive decision idea layer by layer to determine whether to accept the measurement and control task and receive the measurement and control resources of the measurement and control task, the measurement and control resources of the accepted task are specifically used in a measurement and control time interval of the task, and the measurement and control actions are designed as follows:
Figure FDA0002560215500000033 (equation image: definition of the measurement and control action)
where ai indicates whether the measurement and control task is accepted, type denotes the type of measurement and control resource receiving the task, xij denotes the number of the measurement and control resource receiving the task, yjk indicates that the measurement and control task is executed in the k-th visible time window of resource j, and tib denotes the actual start time of the measurement and control task.
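The action above bundles five decisions (accept flag, resource type, resource number, visible-window index, actual start time). A minimal sketch of such an action record, with illustrative field names, might be:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TTCAction:
    """One measurement and control action: mirrors (ai, type, xij, yjk, tib).
    Field names are illustrative assumptions, not the patent's notation."""
    accept: bool                          # ai: accept the task or not
    resource_type: Optional[str] = None   # type: kind of resource used
    resource_id: Optional[int] = None     # xij: number of the chosen resource
    window_index: Optional[int] = None    # yjk: k-th visible window of resource j
    start_time: Optional[float] = None    # tib: actual start time of the task

def is_valid(action: TTCAction) -> bool:
    """Progressive-decision consistency check: a rejected task carries no
    resource assignment; an accepted task must name resource, window, start."""
    if not action.accept:
        return action.resource_id is None
    return None not in (action.resource_type, action.resource_id,
                        action.window_index, action.start_time)
```

This makes the layer-by-layer structure explicit: later fields are only meaningful once the accept decision is positive.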
3. The measurement and control resource scheduling method based on deep Q learning according to claim 1, characterized in that: the step S2 specifically includes:
designing a comprehensive measurement and control performance evaluation index that jointly considers three indicators (measurement and control task completion degree, resource utilization balance degree, and resource load balance degree) as the decision basis for applying the DQN algorithm to measurement and control scheduling; the measurement and control resource scheduling aims to obtain the scheduling strategy that maximizes this comprehensive evaluation index;
setting the evaluation index of measurement and control resource scheduling performance as r = sR * RUR / load;
where sR denotes the satisfaction degree of the measurement and control tasks, load denotes the balance degree of measurement and control resource utilization, and RUR denotes the average utilization rate of all measurement and control resources;
the satisfaction degree of the measurement and control task is as follows:
Figure FDA0002560215500000041, Figure FDA0002560215500000042 (equation images: formulas for the measurement and control task satisfaction degree sR)
and (3) measuring and controlling the resource load balance degree:
Figure FDA0002560215500000043 (equation image: formula for the measurement and control resource load balance degree load)
average utilization rate of measurement and control resources:
Figure FDA0002560215500000044 (equation image: formula for the average measurement and control resource utilization rate RUR)
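The composite index r = sR * RUR / load of this claim can be sketched as follows; since the component formulas appear only as equation images, the definitions of sR, RUR, and load below are plausible stand-ins, not the patented formulas:

```python
import numpy as np

def scheduling_reward(n_accepted, n_tasks, busy_time, total_time):
    """Composite scheduling index r = sR * RUR / load (claim 3).
    Stand-in component definitions (assumptions):
      sR   - fraction of measurement and control tasks accepted,
      RUR  - mean per-resource utilization,
      load - max/mean utilization ratio (1.0 means perfectly balanced)."""
    util = np.asarray(busy_time, dtype=float) / total_time
    s_r = n_accepted / n_tasks
    rur = util.mean()
    load = util.max() / util.mean() if util.mean() > 0 else 1.0
    return s_r * rur / load

# 8 of 10 tasks accepted; two resources each busy 4 h out of an 8 h window.
r_balanced = scheduling_reward(8, 10, [4.0, 4.0], 8.0)
# Same total work concentrated on one resource: load imbalance halves r.
r_skewed = scheduling_reward(8, 10, [8.0, 0.0], 8.0)
```

With these stand-ins the index rewards accepting more tasks and using resources heavily but evenly, matching the three indicators the claim names.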
4. the measurement and control resource scheduling method based on deep Q learning according to claim 1, characterized in that: the step S3 specifically includes:
according to the design of the measurement and control actions in S1, forming the measurement and control scheduling scheme mainly consists of deciding whether to accept a measurement and control task, determining the measurement and control resource that performs it, and determining the measurement and control arc segment in which it is completed;
specifically: taking the visible time window, i.e., the visible arc segment, as the modeling basis of the measurement and control state, whether a specific measurement and control task is accepted is decided by judging whether a visible time window for that task exists; in modeling the measurement and control scenario, measurement and control resources and tasks are uniformly numbered, the visible arc segments satisfying the conditions of a specific task are solved, and the type and number of the resource completing the task are determined from the correspondence between visible arc segments and measurement and control resources;
in the design of the measurement and control state, the visible arc segment corresponding to a measurement and control task is discretized; the measurement arc slides over the selected visible arc segment according to the possible start times of the task, and the optimal measurement arc capable of completing the task is determined.
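A minimal sketch of the sliding-arc search described above, assuming a hypothetical per-start-time quality measure score_fn:

```python
def best_arc(window_start, window_end, duration, step, score_fn):
    """Slide a candidate measurement arc of fixed duration across the
    discretized visible arc segment and keep the start time with the best
    score. score_fn is a hypothetical task-specific quality measure (e.g.
    elevation angle or conflict cost at that start time)."""
    best_t, best_score = None, float("-inf")
    t = window_start
    while t + duration <= window_end:   # arc must fit inside the visible window
        s = score_fn(t)
        if s > best_score:
            best_t, best_score = t, s
        t += step                       # discretization step of the arc
    return best_t

# Prefer arcs starting as close to t=5 as possible inside window [0, 10].
start = best_arc(0.0, 10.0, 2.0, 1.0, lambda t: -abs(t - 5.0))
```

`best_arc` returns `None` when no arc of the required duration fits, which corresponds to rejecting the task in the progressive decision scheme.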
5. The measurement and control resource scheduling method based on deep Q learning according to claim 1, characterized in that: the step S4 specifically includes:
(1) when the task state at the current moment changes and the visible time window of the measurement and control resource changes, the measurement and control state of the system changes;
(2) updating a measurement and control environment, extracting scene characteristics, and updating the measurement and control state of the system;
(3) selecting a decision strategy of the measurement and control action according to an action selection rule of a deep reinforcement learning algorithm, so that measurement and control resources are matched with the measurement and control task in time and space, and the measurement and control task is realized;
(4) evaluating and feeding back the measurement and control scheduling result with respect to the updates of the measurement and control environment and state caused by the selected measurement and control strategy;
(5) updating the measurement and control decision strategy by using a deep reinforcement learning network according to the evaluation feedback result of the measurement and control strategy, and observing the measurement and control scene and the updating of the measurement and control state;
and through cyclic algorithm updates, the selection and optimization of measurement and control resource allocation strategies is realized, yielding the optimal measurement and control scheduling strategy.
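Steps (1)-(5) above form a standard agent-environment interaction loop, which might be sketched as follows with hypothetical env/agent interfaces (the method names are assumptions):

```python
def run_episode(env, agent, max_steps=100):
    """One pass of the cycle in steps (1)-(5): observe the measurement and
    control state, select an action, collect the evaluation feedback, learn,
    and repeat until the scheduling episode ends."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                       # step (3): select strategy
        next_state, reward, done = env.step(action)     # steps (1)-(2): state update
        agent.learn(state, action, reward, next_state)  # steps (4)-(5): feedback, update
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```

Repeating this loop over many episodes is the "cyclic algorithm updating" by which the allocation strategy is optimized.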
6. The measurement and control resource scheduling method based on deep Q learning according to claim 1, characterized in that: the step S5 specifically includes:
(1) describing the measurement and control scenario and defining the basic physical elements in it; based on the actual physical scenario, the elements involved in applying the DQN method to measurement and control scheduling are sorted and summarized, and the composition of the basic elements, namely the measurement and control state, the measurement and control action, the action reward, and the scheduling scheme, is determined;
(2) initializing the deep Q-learning measurement and control resource scheduling network, initializing the memory bank according to the actual capacity requirement, and initializing the network parameters, including the learning rate, the discount factor, and the structures and parameters of the actual-value neural network and the target-value neural network describing the Q value;
(3) designing the measurement and control state s according to the measurement and control scene model, initializing the input of the measurement and control scheduling network, and computing the corresponding output; selecting a measurement and control action at random with probability ε, or, with probability 1−ε (the greedy strategy), selecting the action with the largest Q value output by the scheduling network, and executing the corresponding action in the measurement and control resource scheduling network; obtaining the reward r after the action is executed, i.e., the evaluation index of the measurement and control action, and the measurement and control state before the next action, i.e., the state si+1 at the next moment; computing, from the currently selected action and the current state, the Q values of the actual-value neural network and the target-value neural network in the scheduling network at the next moment, namely the estimated Q value and the target Q value;
(4) storing the four parameters (si, ai, ri, si+1) as one sample in the memory bank;
(5) randomly sampling a certain number of sample states from the memory bank, calculating the target value of each state, and updating the Q value toward this target using the reward obtained after execution; updating the actual-value neural network parameters by stochastic gradient descent, and after every N iterations of updating, assigning the current parameters of the actual-value neural network to the target-value neural network, thereby updating the target-value network parameters in the measurement and control scheduling network; the scheduling network is trained by continually updating the parameters in this way;
(6) through cyclic algorithm updates, realizing the selection and optimization of measurement and control resource allocation strategies and the selection of the optimal measurement and control scheduling strategy, thereby completing the measurement and control resource scheduling process.
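Steps (2)-(6) follow the standard DQN recipe: a replay memory, ε-greedy action selection, stochastic-gradient TD updates, and a target network synchronized every N training steps. A compact sketch, with a linear Q-function standing in for the patent's neural networks (an assumption made for brevity):

```python
import random
from collections import deque

import numpy as np

class DQNScheduler:
    """Compact DQN sketch for steps (2)-(6). All hyperparameter values here
    are illustrative assumptions."""

    def __init__(self, state_dim, n_actions, lr=0.01, gamma=0.9,
                 epsilon=0.1, memory_size=1000, sync_every=50):
        self.W = np.zeros((state_dim, n_actions))   # actual-value (current) net
        self.W_target = self.W.copy()               # target-value net
        self.memory = deque(maxlen=memory_size)     # replay memory bank
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.sync_every, self.train_steps = sync_every, 0
        self.n_actions = n_actions

    def q(self, state, W=None):
        return state @ (self.W if W is None else W)

    def act(self, state):
        # epsilon-greedy: explore with probability epsilon, otherwise greedy.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return int(np.argmax(self.q(state)))

    def remember(self, s, a, r, s_next, done):
        # step (4): store (s_i, a_i, r_i, s_{i+1}) as one sample.
        self.memory.append((s, a, r, s_next, done))

    def train_step(self, batch_size=16):
        # step (5): sample from memory, update toward the target Q value.
        if len(self.memory) < batch_size:
            return
        for s, a, r, s_next, done in random.sample(self.memory, batch_size):
            # target Q from the target net, estimated Q from the current net.
            target = r if done else r + self.gamma * np.max(
                self.q(s_next, self.W_target))
            td_error = target - self.q(s)[a]
            self.W[:, a] += self.lr * td_error * s   # SGD on squared TD error
        self.train_steps += 1
        if self.train_steps % self.sync_every == 0:
            self.W_target = self.W.copy()            # sync target-value net
```

The periodic copy into `W_target` corresponds to assigning the actual-value network parameters to the target-value network after every N iterations, which stabilizes the target used in the TD update.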
CN202010609039.9A 2020-06-29 2020-06-29 Measurement and control resource scheduling method based on deep Q learning Active CN111767991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010609039.9A CN111767991B (en) 2020-06-29 2020-06-29 Measurement and control resource scheduling method based on deep Q learning


Publications (2)

Publication Number Publication Date
CN111767991A true CN111767991A (en) 2020-10-13
CN111767991B CN111767991B (en) 2023-08-15

Family

ID=72724129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010609039.9A Active CN111767991B (en) 2020-06-29 2020-06-29 Measurement and control resource scheduling method based on deep Q learning

Country Status (1)

Country Link
CN (1) CN111767991B (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140277599A1 (en) * 2013-03-13 2014-09-18 Oracle International Corporation Innovative Approach to Distributed Energy Resource Scheduling
CN107798388A (en) * 2017-11-23 2018-03-13 航天天绘科技有限公司 The method of TT&C Resources dispatching distribution based on Multi Agent and DNN
CN109388484A (en) * 2018-08-16 2019-02-26 广东石油化工学院 A kind of more resource cloud job scheduling methods based on Deep Q-network algorithm
CN109409763A (en) * 2018-11-08 2019-03-01 北京航空航天大学 A kind of dynamic test assignment dispatching method and dispatching platform based on Greedy grouping strategy
CN109542613A (en) * 2017-09-22 2019-03-29 中兴通讯股份有限公司 Distribution method, device and the storage medium of service dispatch in a kind of CDN node
CN109729586A (en) * 2017-10-30 2019-05-07 上海诺基亚贝尔股份有限公司 Dispatching method, equipment and computer-readable medium based on window
CN109960544A (en) * 2019-03-26 2019-07-02 中国人民解放军国防科技大学 Task parallel scheduling method based on data driving type agile satellite
CN110781614A (en) * 2019-12-06 2020-02-11 北京工业大学 Shipboard aircraft tripping recovery online scheduling method based on deep reinforcement learning
CN111026549A (en) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 Automatic test resource scheduling method for power information communication equipment
CN111026548A (en) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 Power communication equipment test resource scheduling method for reverse deep reinforcement learning
CN111162831A (en) * 2019-12-24 2020-05-15 中国科学院遥感与数字地球研究所 Ground station resource scheduling method


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
BERND WASCHNECK等: "Optimization of global production scheduling with deep reinforcement learning", 《51ST CIRP CONFERENCE ON MANUFACTURING SYSTEMS》, vol. 72, pages 1264 - 1269 *
XIAOYU CHEN等: "A mixed integer linear programming model for multi-satellite scheduling", 《EUROPEAN JOURNAL OF OPERATIONAL RESEARCH》, vol. 275, no. 2, pages 694 - 707 *
YI WU等: "A TT&C Resources Schedule Method Based on Markov Decision Process", 《PROCEEDINGS OF 2018 CHINESE INTELLIGENT SYSTEMS CONFERENCE》, pages 815 - 825 *
刘冰雁等: "基于改进DQN的复合模式在轨服务资源分配", 《航空学报》, vol. 41, no. 5, pages 1 - 9 *
康宁等: "基于任务开始时刻的天地基测控资源调度模型", 《装备指挥技术学院学报》, vol. 22, no. 6, pages 97 - 101 *
张天骄等: "基于混合蚁群优化的天地一体化调度方法", 《系统工程与电子技术》, vol. 38, no. 7, pages 1555 - 1562 *
武艺: "基于深度强化学习的多星测控资源调度方法研究", 《中国优秀硕士学位论文全文数据库 工程科技II辑》, no. 2022, pages 031 - 341 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113613332A (en) * 2021-07-14 2021-11-05 广东工业大学 Spectrum resource allocation method and system based on cooperative distributed DQN (differential Quadrature reference network) combined simulated annealing algorithm
CN113613332B (en) * 2021-07-14 2023-06-09 广东工业大学 Spectrum resource allocation method and system based on cooperative distributed DQN (differential signal quality network) joint simulated annealing algorithm
CN113779856A (en) * 2021-09-15 2021-12-10 成都中科合迅科技有限公司 Discrete particle swarm algorithm modeling method for electronic system function online recombination
CN113779856B (en) * 2021-09-15 2023-06-27 成都中科合迅科技有限公司 Discrete particle swarm optimization modeling method for electronic system function online recombination



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant