CN112395690A - Reinforcement learning-based shipboard aircraft surface guarantee process optimization method - Google Patents

Reinforcement learning-based shipboard aircraft surface guarantee process optimization method Download PDF

Info

Publication number
CN112395690A
Authority
CN
China
Prior art keywords
guarantee
carrier
based aircraft
aircraft
station
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011328243.XA
Other languages
Chinese (zh)
Inventor
Zhang Yong (张勇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval Aeronautical University
Original Assignee
Naval Aeronautical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval Aeronautical University filed Critical Naval Aeronautical University
Priority to CN202011328243.XA priority Critical patent/CN112395690A/en
Publication of CN112395690A publication Critical patent/CN112395690A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/10 Geometric CAD
    • G06F 30/15 Vehicle, aircraft or watercraft design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

Abstract

The invention discloses a reinforcement learning-based method for optimizing the surface guarantee process of carrier-based aircraft, and belongs to the technical field of carrier-based aircraft surface guarantee. First, a guarantee scheduling model is established according to the characteristics of the carrier-based aircraft surface guarantee process and is reduced to a hybrid flow-shop scheduling problem. Then, following the reinforcement learning framework, corresponding state and action representations are designed for the surface guarantee scheduling problem, the surface guarantee process is cast as a Markov decision process, and a corresponding reward function is designed. Finally, the scheduling problem is solved with a reinforcement learning algorithm so as to minimize the guarantee completion time. While preserving the quality of the surface guarantee schedule, the invention markedly improves the real-time performance of the solving process and can provide a reasonable solution for real-time scheduling of carrier-based aircraft surface guarantee.

Description

Reinforcement learning-based shipboard aircraft surface guarantee process optimization method
Technical Field
The invention belongs to the technical field of shipboard aircraft surface guarantee, and in particular relates to a method for optimizing the shipboard aircraft surface guarantee process based on reinforcement learning.
Background
Carrier-based aircraft operations form a periodic process: catapult takeoff, mission execution, return and landing, deck service guarantee, and catapult takeoff again, repeated cycle after cycle. Because an aircraft carrier embarks only a limited number of aircraft, keeping this cycle running efficiently and in an orderly manner is an effective way to bring the aircraft's combat capability into full play. Within this process, completing surface guarantee and catapult takeoff safely and quickly is the key factor constraining takeoff capability. Optimizing the deck guarantee process mainly involves modeling and optimizing the guarantee flow, and an optimization method must be designed with both optimization quality and real-time performance in mind.
At present, two methods are commonly used to optimize the carrier-based aircraft surface guarantee process: (1) the movements of aircraft and guarantee equipment on the carrier deck are simulated and war-gamed on a "display board" (a scaled-down replica of the carrier deck and its associated equipment, on which scale models of the various carrier-based aircraft are placed, with markers indicating each aircraft's operating state), and a scheduling plan is drawn up accordingly; (2) the surface guarantee flow is modeled and then optimized with traditional intelligent optimization methods to produce a scheduling plan. The display-board approach is simple to operate and gives a clear picture of the aircraft situation, but it depends too heavily on manual experience and suffers from untimely status updates and a lack of interactivity; scheduling optimized by traditional intelligent algorithms achieves better optimization quality but has poor real-time performance.
Therefore, engineering applications urgently need a carrier-based aircraft surface guarantee process optimization method that combines optimization quality with real-time solution performance and has good applicability.
Disclosure of Invention
To solve the above technical problems, the invention provides a reinforcement learning-based carrier-based aircraft surface guarantee process optimization method that balances optimization quality against solution efficiency and has good applicability.
The technical scheme adopted by the invention is as follows: according to the guarantee process of carrier-based aircraft on the deck, a surface guarantee scheduling model is established and cast as a Markov decision process; according to the characteristics of the model, suitable state and action representations and a reward function for reinforcement learning are designed; and the problem is solved with a reinforcement learning algorithm so as to minimize the guarantee completion time.
Specifically, the technical scheme of the invention is as follows:
a reinforced learning-based shipboard aircraft surface guarantee flow optimization method comprises the following steps:
s1, determining guarantee time of the shipboard aircraft on each guarantee station according to the type of the shipboard aircraft to be guaranteed and historical experience guarantee data;
s2, establishing a ship-based aircraft surface guarantee scheduling model according to a ship-based aircraft surface guarantee flow;
s3, combining a reinforcement learning process, designing corresponding state and action expressions in reinforcement learning in the scheduling problem of the carrier-based aircraft surface guarantee process, and summarizing the carrier-based aircraft surface guarantee process into a Markov decision process;
s4, designing a reward function r (s, a) ═ omega c _ t in the reinforcement learning process according to the characteristics that the guarantee efficiency of the ship-based aircraft surface is closely related to the guarantee efficiency of a single guarantee station and the optimization target of the modeli,j,l+β;
And S5, optimizing and solving the problem by using a reinforcement learning algorithm.
In step S2, the specific carrier-based aircraft surface guarantee scheduling model is:
min max{ET_{i,j,l}},  i ∈ I, j = f, l ∈ M_j    (1)
s.t.
Σ_{l ∈ M_j} y_{i,j,l} = 1,  ∀ i ∈ I, ∀ j ∈ F    (2)
ST_{i,j,l} = max{AT_{i,j,l}, FT_{j,l}},  i ∈ I, j ∈ F, l ∈ M_j    (3)
ET_{i,j,l} = ST_{i,j,l} + t_{i,j,l},  i ∈ I, j ∈ F, l ∈ M_j    (4)
AT_{i,j+1,l} = ET_{i,j,l} + ct_{j,j+1},  i ∈ I, j ∈ F, j ≠ f    (5)
(6)-(8): [equation images not legible in the source; constraints involving the station times BT_{j,l}, FT_{j,l} and the decision variables y_{i,j,l}]
where I is the set of carrier-based aircraft awaiting guarantee, f is the number of guarantee stages, F is the set of guarantee stages, M_j (j ∈ F) is the set of guarantee stations in stage j, t_{i,j,l} is the time the aircraft requires for guarantee at the corresponding station, ct_{j,j+1} is the transfer time of an aircraft between adjacent guarantee stages, AT_{i,j,l} is the moment the aircraft arrives at the guarantee station, FT_{j,l} is the moment the station finishes guaranteeing its current aircraft and becomes free, ST_{i,j,l} is the moment the aircraft begins receiving guarantee at the corresponding station, ET_{i,j,l} is the moment the aircraft's guarantee is completed, BT_{j,l} is the moment the station begins guaranteeing an aircraft, and y_{i,j,l} are the decision variables.
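For concreteness, the timing recursion of equations (3)-(5) and the makespan objective of equation (1) can be sketched in Python as below. This is a minimal illustration rather than the patented implementation: the data layout (nested lists indexed [i][j][l]), the fixed aircraft processing order, and all names are assumptions of the sketch.

```python
# Minimal sketch of the timing recursion (3)-(5) and the makespan objective (1).
# Assumed layout: t[i][j][l] is the guarantee time of aircraft i at station l
# of stage j; ct[j] is the transfer time from stage j to stage j+1;
# assignment[i][j] is the station chosen for aircraft i in stage j.
# Aircraft are rolled forward in index order, a simplification of the
# event-driven deck process.

def makespan(assignment, t, ct, arrival0):
    n, f = len(assignment), len(assignment[0])
    FT = [[0.0] * len(t[0][j]) for j in range(f)]   # FT_{j,l}: station free times
    finish = 0.0
    for i in range(n):
        at = arrival0[i]                             # initial arrival time of aircraft i
        for j in range(f):
            l = assignment[i][j]
            st = max(at, FT[j][l])                   # ST = max{AT, FT}       (3)
            et = st + t[i][j][l]                     # ET = ST + t            (4)
            FT[j][l] = et                            # station is busy until ET
            if j < f - 1:
                at = et + ct[j]                      # AT_{i,j+1} = ET + ct   (5)
            finish = max(finish, et)
    return finish                                    # max ET over the final stage (1)
```

Given any feasible station assignment (i.e., any setting of the y_{i,j,l}), this function returns the completion time that objective (1) seeks to minimize.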
In step S3, during carrier-based aircraft surface guarantee scheduling, the pair (i, j) (i ∈ I, j ∈ F) is regarded as a state in the Markov decision process, and selecting a guarantee station a ∈ M_{j+1} of the next stage is regarded as an action in the Markov decision process.
In step S4, in order to represent uniformly the influence of each station's guarantee completion time on the overall completion time of the schedule, a linear reward function is proposed, as shown in formula (9): the reward r(s, a) obtained by executing an action is negatively correlated with the time the aircraft spends at the station.
r(s, a) = ω·c_t_{i,j,l} + β    (9)
where c_t_{i,j,l} = FT_{j,l} − AT_{i,j,l} is the time the aircraft spends waiting for and receiving guarantee at a given station, and ω and β are integers with ω ∈ [-5, -1] and β ∈ (0, 300].
Furthermore, the problem is optimized and solved through a reinforcement learning method.
For the carrier-based aircraft surface guarantee process, the invention designs the state and action representations and the corresponding reward function in reinforcement learning, and on the basis of preserving the optimization quality of the surface guarantee schedule provides a reinforcement learning-based optimization method that greatly reduces computation time and can meet the requirement of real-time scheduling on the carrier deck.
Drawings
FIG. 1 is a flow chart of the shipboard aircraft surface guarantee process of the invention.
FIG. 2 is a flow chart of the Q-learning algorithm designed in the invention.
FIG. 3 is a flow chart of the invention.
Detailed Description
Specifically, the method for optimizing the ship-based aircraft surface guarantee process based on reinforcement learning comprises the following steps:
s1, determining guarantee time of the shipboard aircraft to be guaranteed on each guarantee station according to the type of the shipboard aircraft to be guaranteed and historical experience guarantee data
On the deck, a carrier-based aircraft must pass through three guarantee stages (inspection and maintenance, refueling, and weapon mounting) before catapult takeoff. The guarantee stage set F and the guarantee station sets M_j (j ∈ F) form the carrier-based aircraft surface guarantee system, and n carrier-based aircraft form the set I of aircraft to be guaranteed. Historical guarantee data determine the time t_{i,j,l} (i ∈ I, j ∈ F, l ∈ M_j) each aircraft requires for guarantee at each station. To compensate for errors in the empirical data, the transfer time between guarantee stages is taken as ct_{j,j+1} ~ N(2, 0.1) (j ∈ F, j ≠ f), and the transfer time from the weapon-mounting stage to the catapult as ct_{f-1,f} ~ N(2, 0.2).
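A small Python sketch of this setup step follows. Only the N(2, 0.1) and N(2, 0.2) transfer-time distributions come from the description; the fleet size, stage and station counts, and per-stage mean guarantee times are invented for the example, and the second parameter of N(·, ·) is read as a variance here.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8                         # number of aircraft to be guaranteed (assumed)
stations = [3, 2, 2]          # |M_j| per stage: maintenance, refueling, weapons (assumed)
f = len(stations)             # number of guarantee stages

# t[i][j][l]: time aircraft i needs at station l of stage j, standing in for
# the historical experience data (per-stage means in minutes are assumed).
mean_t = [15.0, 10.0, 12.0]
t = [[[float(rng.normal(mean_t[j], 1.0)) for l in range(stations[j])]
      for j in range(f)]
     for i in range(n)]

# Transfer times: ct_{j,j+1} ~ N(2, 0.1) between adjacent guarantee stages,
# and ct_{f-1,f} ~ N(2, 0.2) from the weapon-mounting stage to the catapult.
ct = [float(rng.normal(2.0, np.sqrt(0.1))) for _ in range(f - 1)]
ct_to_catapult = float(rng.normal(2.0, np.sqrt(0.2)))
```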
S2, establishing a corresponding guarantee scheduling model according to a ship-based aircraft surface guarantee flow
According to the carrier-based aircraft surface guarantee process, scheduling is modeled as shown in FIG. 1. After all aircraft have landed and taxied to a temporary parking area at the bow, tractors tow them from the parking area, at fixed intervals (usually about 2 min), through the maintenance-inspection, refueling, and weapon-mounting stages in turn for the corresponding guarantee; because each stage contains several stations differing in personnel and equipment, an aircraft may receive its guarantee at any station of the stage. The purpose of deck aviation guarantee scheduling is to ensure that deck guarantee proceeds safely and efficiently, to optimize the guarantee completion time, and thereby to improve the carrier's operational capability. Following the process shown in FIG. 1, the deck aviation guarantee scheduling problem can be cast as a hybrid flow-shop scheduling problem.
S3, designing corresponding state and action expression in the scheduling problem of the carrier-based aircraft surface guarantee process according to the Markov property of the scheduling process and combining with a reinforcement learning process, and summarizing the carrier-based aircraft surface guarantee process into a Markov decision process
In the carrier-based aircraft surface guarantee scheduling process, the pair (i, j) (i ∈ I, j ∈ F) is regarded as a state s in the Markov decision process, and selecting a guarantee station a ∈ M_{j+1} of the next stage is regarded as an action a in the Markov decision process.
S4, designing a corresponding reward function according to the characteristics of the ship surface guarantee process of the ship-based aircraft and the optimization target of the model
In order to represent uniformly the influence of each station's guarantee completion time on the overall completion time of the schedule, a linear reward function is proposed: the reward r(s, a) obtained by executing an action is negatively correlated with the aircraft's guarantee completion time at the station. The reward function in reinforcement learning is set to r(s, a) = ω·c_t_{i,j,l} + β, and the learning rate is taken as α = 0.1 and the discount factor as γ = 0.9.
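The reward of this step is simple enough to state directly in code; a sketch follows, using the ω = -2 and β = 150 values from claim 5, which are one admissible choice among ω ∈ [-5, -1], β ∈ (0, 300].

```python
# Reward of the linear form r(s, a) = ω·c_t + β, where c_t = FT_{j,l} - AT_{i,j,l}
# is the time the aircraft spends waiting for and receiving guarantee at the
# station. OMEGA = -2 and BETA = 150 follow claim 5; other values in the
# stated ranges are equally admissible.

OMEGA, BETA = -2, 150

def reward(ft_jl: float, at_ijl: float) -> float:
    c_t = ft_jl - at_ijl
    return OMEGA * c_t + BETA
```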
S5, optimizing and solving the problem by using a reinforcement learning algorithm
The Q-learning algorithm is one of the most prominent reinforcement learning algorithms and is based on a value function. The value function Q(s, a) for a given state and action can be expressed as
Q(s, a) = E[ Σ_{t=0}^{∞} γ^t · r_{t+1}(s, a) ]
wherein r ist+1(s, a) is the reward obtained at time step t, γ ∈ (0, 1)]Is a discount factor.
The value function Q (s, a) is iterated as follows,
Q(s, a) ← Q(s, a) + α · [ r(s, a) + γ · max_{a′} Q(s′, a′) − Q(s, a) ]
where α is the learning rate and γ is the discount factor.
In the Q-learning algorithm, the agent must interact continuously with the environment, and whether it can select the correct action from the observed information determines whether that interaction is effective. When selecting an action, the agent should on the one hand choose, in each state, the action that maximizes the value function Q(s, a), so as to obtain as much reward as possible (exploitation); on the other hand, to avoid falling into local optima, it should also explore other actions in search of the optimal Q(s, a) (exploration). The invention adopts the Boltzmann exploration strategy, in which the selection probability of each action is determined by a random distribution function. Given a temperature coefficient T (T > 1), the probability that the i-th action is selected in the state at time step t is
p_i = exp(Q(s, a_i)/T) / Σ_{n=1}^{N} exp(Q(s, a_n)/T)
where N is the total number of actions available for selection in the current state.
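A minimal sketch of this selection rule in Python follows; the max-shift is added only for numerical stability and is an implementation detail not stated in the description.

```python
import numpy as np

def boltzmann_select(q_values: np.ndarray, T: float, rng: np.random.Generator) -> int:
    """Pick action i with probability exp(Q(s, a_i)/T) / sum_n exp(Q(s, a_n)/T)."""
    z = q_values / T
    z = z - z.max()          # shift for numerical stability; probabilities unchanged
    p = np.exp(z)
    p = p / p.sum()
    return int(rng.choice(len(q_values), p=p))
```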
At the beginning of learning, the temperature coefficient T is large and the Q(s, a) values are relatively small, so all actions are selected with nearly the same probability, which favors exploring the actions whose Q values are not yet optimal. As learning progresses, T gradually decreases: the selection probability of each action follows the changes in Q(s, a) while the probability of taking random actions falls, which helps select the optimal action with the largest Q(s, a). In the later stage of learning, T tends to 0, the agent selects actions with larger Q(s, a) with ever greater probability, and finally selects the action with the maximum Q(s, a) every time; at that point the exploration strategy has become a greedy strategy.
The temperature coefficient T is lowered from its initial value toward 0 over the course of learning by an annealing schedule [equation image not legible in the source], where e is the number of learning episodes, e_0 is a constant, and T_0 is the initial value of T (T_0 = 500 in the implementation of the invention).
Further, the specific implementation process of step S5 is as follows:
S5.1, input the discount factor γ and the learning rate α, and randomly initialize the Q values;
S5.2, for each training episode, initialize the state s to the starting state, initialize ST_{i,j,l}, ET_{i,j,l}, BT_{j,l} and FT_{j,l} to 0, and initialize AT_{i,j,l} to the initial arrival time;
S5.3, in each state, compute the arrival time of the carrier-based aircraft at the corresponding stage from AT_{i,j+1,l} = ET_{i,j,l} + ct_{j,j+1} (j ∈ F, j ≠ f);
S5.4, for each carrier-based aircraft, select an action a ∈ M_j according to the Q values and the exploration strategy, and execute it;
S5.5, observe the state of the next step and compute the reward r(s, a) of the selected action from r(s, a) = ω·c_t_{i,j,l} + β;
S5.6, update the Q values according to
Q(s, a) ← Q(s, a) + α · [ r(s, a) + γ · max_{a′} Q(s′, a′) − Q(s, a) ].
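Putting S5.1-S5.6 together, a compact tabular Q-learning loop might look as follows. It reuses the data layout and the boltzmann_select, OMEGA, and BETA helpers sketched above. The episode structure (one pass over the aircraft per episode), the exponential annealing of T with an assumed constant e_0, and the re-indexing that lets the action for state (i, j) pick the station of stage j (equivalent to the description's (i, j) → M_{j+1} convention) are all assumptions of the sketch.

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.9    # learning rate and discount factor from step S4
T0, E0 = 500.0, 50.0       # T0 = 500 from the description; annealing constant e0 assumed
EPISODES = 200             # assumed training budget

def train(n, stations, t, ct, arrival0, rng):
    f = len(stations)
    # Q[j][i][l]: value of sending aircraft i (state (i, j)) to station l of
    # stage j, an equivalent re-indexing of the (i, j) -> M_{j+1} convention.
    Q = [rng.uniform(size=(n, stations[j])) for j in range(f)]
    for e in range(EPISODES):
        T = max(T0 * np.exp(-e / E0), 1e-3)          # assumed annealing schedule
        FT = [[0.0] * stations[j] for j in range(f)]  # S5.2: reset station times
        for i in range(n):
            at = arrival0[i]
            for j in range(f):
                a = boltzmann_select(Q[j][i], T, rng)     # S5.4: pick a station
                st = max(at, FT[j][a])                    # S5.3: timing recursion
                et = st + t[i][j][a]
                FT[j][a] = et
                r = OMEGA * (et - at) + BETA              # S5.5: reward, eq. (9)
                nxt = Q[j + 1][i].max() if j < f - 1 else 0.0
                Q[j][i][a] += ALPHA * (r + GAMMA * nxt - Q[j][i][a])  # S5.6
                if j < f - 1:
                    at = et + ct[j]
    return Q
```

After training, a schedule is read off by replacing the Boltzmann draw with an argmax over each state's Q values.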

Claims (6)

1. A ship-based aircraft surface guarantee process optimization method based on reinforcement learning is characterized by comprising the following steps:
s1, determining time required by guarantee on each guarantee station according to the type of a shipboard aircraft to be guaranteed and historical guarantee experience data;
s2, establishing a guarantee scheduling model according to a ship surface guarantee flow of the ship-based aircraft;
s3, combining a reinforcement learning process, designing a corresponding state s and action a in the scheduling problem of the carrier-based aircraft surface guarantee flow, and summarizing the carrier-based aircraft surface guarantee flow into a Markov decision process;
s4, according to the characteristic that the guarantee efficiency of the ship-based aircraft surface is determined by the guarantee efficiency of each guarantee station, and model optimizationAiming at the goal, the corresponding reward function in the design reinforcement learning is r (s, a) ═ ω c _ ti,j,l+ β, wherein, c _ ti,j,l=FTj,l-ATi,j,lThe time from the arrival of the carrier-based aircraft at a certain guarantee station to the completion of the guarantee at the station is represented, both omega and beta are integers, and omega belongs to the range of-5 and-1],β∈(0,300);
And S5, optimizing and solving the scheduling problem by using a reinforcement learning algorithm.
2. The reinforcement learning-based carrier-based aircraft surface guarantee process optimization method according to claim 1, wherein the step S1 specifically comprises:
a guarantee stage set F and a guarantee station set Mj(j belongs to F) to form a carrier-based aircraft surface guarantee system, n carrier-based aircraft form a carrier-based aircraft set I to be guaranteed, and historical guarantee experience data are used for determining the time t required for the carrier-based aircraft to receive guarantee at each guarantee stationi,j,l(i∈I,j∈F,l∈Mj)。
3. The reinforcement learning-based carrier-based aircraft surface guarantee process optimization method according to claim 1, wherein the step S2 specifically comprises:
taking a to-be-guaranteed shipboard aircraft set I formed by n shipboard aircraft as a to-be-processed workpiece set, and establishing a hybrid flow shop scheduling model:
min max{ET_{i,j,l}},  i ∈ I, j = f, l ∈ M_j
s.t.
Σ_{l ∈ M_j} y_{i,j,l} = 1,  ∀ i ∈ I, ∀ j ∈ F
ST_{i,j,l} = max{AT_{i,j,l}, FT_{j,l}},  i ∈ I, j ∈ F, l ∈ M_j
ET_{i,j,l} = ST_{i,j,l} + t_{i,j,l},  i ∈ I, j ∈ F, l ∈ M_j
AT_{i,j+1,l} = ET_{i,j,l} + ct_{j,j+1},  i ∈ I, j ∈ F, j ≠ f
[three further constraint images, not legible in the source, involving the station times BT_{j,l}, FT_{j,l} and the decision variables y_{i,j,l}]
wherein I = {1, 2, …, n} is the set of carrier-based aircraft to be guaranteed, f is the number of guarantee stages, F is the set of guarantee stages, M_j (j ∈ F) is the set of guarantee stations in stage j, t_{i,j,l} is the time the aircraft requires for guarantee at the corresponding station, ct_{j,j+1} is the transfer time of an aircraft between adjacent guarantee stages, AT_{i,j,l} is the moment the aircraft arrives at the guarantee station, FT_{j,l} is the moment the station finishes guaranteeing its current aircraft and becomes free, ST_{i,j,l} is the moment the aircraft begins receiving guarantee at the corresponding station, ET_{i,j,l} is the moment the aircraft's guarantee is completed, BT_{j,l} is the moment the station begins guaranteeing an aircraft, and y_{i,j,l} are the decision variables.
4. The reinforcement learning-based carrier-based aircraft surface guarantee process optimization method according to claim 1, wherein the state and action representation in step S3 is specifically:
in the carrier-based aircraft surface guarantee scheduling process, the aircraft-stage pair (i, j) (i ∈ I, j ∈ F) is regarded as a state s in the Markov decision process, and each guarantee station l (l ∈ M_{j+1}) of the next guarantee stage is regarded as an action a in the Markov decision process.
5. The reinforcement learning-based carrier-based aircraft surface guarantee process optimization method according to claim 1, wherein the coefficients ω and β of the reward function in step S4 take the values ω = -2 and β = 150.
6. The reinforcement learning-based carrier-based aircraft surface guarantee process optimization method according to claim 1, wherein in step S5 the reinforcement learning algorithm solving process specifically comprises:
inputting the discount factor γ and the learning rate α, randomly initializing the Q-value table, and initializing the time parameters for each training episode; in each state, computing the arrival time of the aircraft at the corresponding stage from AT_{i,j+1,l} = ET_{i,j,l} + ct_{j,j+1} (j ∈ F, j ≠ f); for each carrier-based aircraft, selecting an action a ∈ M_j according to the Q-value table and the exploration strategy and executing it; observing the state of the next step and computing the reward r(s, a) = ω·c_t_{i,j,l} + β of the current action; and updating the Q-value table.
CN202011328243.XA 2020-11-24 2020-11-24 Reinforcement learning-based shipboard aircraft surface guarantee process optimization method Pending CN112395690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011328243.XA CN112395690A (en) 2020-11-24 2020-11-24 Reinforcement learning-based shipboard aircraft surface guarantee process optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011328243.XA CN112395690A (en) 2020-11-24 2020-11-24 Reinforcement learning-based shipboard aircraft surface guarantee process optimization method

Publications (1)

Publication Number Publication Date
CN112395690A true CN112395690A (en) 2021-02-23

Family

ID=74607713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011328243.XA Pending CN112395690A (en) Reinforcement learning-based shipboard aircraft surface guarantee process optimization method

Country Status (1)

Country Link
CN (1) CN112395690A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595958A (en) * 2022-02-28 2022-06-07 哈尔滨理工大学 Shipboard aircraft guarantee operator scheduling method for emergency
CN117215196A (en) * 2023-10-17 2023-12-12 成都正扬博创电子技术有限公司 Ship-borne comprehensive control computer intelligent decision-making method based on deep reinforcement learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390195A (en) * 2013-05-28 2013-11-13 重庆大学 Machine workshop task scheduling energy-saving optimization system based on reinforcement learning
CN110716550A (en) * 2019-11-06 2020-01-21 南京理工大学 Gear shifting strategy dynamic optimization method based on deep reinforcement learning
CN110781614A (en) * 2019-12-06 2020-02-11 北京工业大学 Shipboard aircraft tripping recovery online scheduling method based on deep reinforcement learning
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning
CN111738488A (en) * 2020-05-14 2020-10-02 华为技术有限公司 Task scheduling method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390195A (en) * 2013-05-28 2013-11-13 重庆大学 Machine workshop task scheduling energy-saving optimization system based on reinforcement learning
CN110716550A (en) * 2019-11-06 2020-01-21 南京理工大学 Gear shifting strategy dynamic optimization method based on deep reinforcement learning
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning
CN110781614A (en) * 2019-12-06 2020-02-11 北京工业大学 Shipboard aircraft tripping recovery online scheduling method based on deep reinforcement learning
CN111738488A (en) * 2020-05-14 2020-10-02 华为技术有限公司 Task scheduling method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI HAN et al.: "A Reinforcement Learning Method for a Hybrid Flow-Shop Scheduling Problem", MDPI: Algorithms *
ZHANG Dongyang et al.: "Applying a Reinforcement Learning Algorithm to the Permutation Flow-Shop Scheduling Problem", Computer Systems & Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595958A (en) * 2022-02-28 2022-06-07 哈尔滨理工大学 Shipboard aircraft guarantee operator scheduling method for emergency
CN117215196A (en) * 2023-10-17 2023-12-12 成都正扬博创电子技术有限公司 Ship-borne comprehensive control computer intelligent decision-making method based on deep reinforcement learning
CN117215196B (en) * 2023-10-17 2024-04-05 成都正扬博创电子技术有限公司 Ship-borne comprehensive control computer intelligent decision-making method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
Masmoudi et al. Project scheduling under uncertainty using fuzzy modelling and solving techniques
CN107219858B (en) Multi-unmanned aerial vehicle cooperative coupling task allocation method for improving firefly algorithm
CN108170147B (en) Unmanned aerial vehicle task planning method based on self-organizing neural network
CN112395690A (en) Reinforced learning-based shipboard aircraft surface guarantee flow optimization method
CN109359888B (en) Comprehensive scheduling method for tight connection constraint among multiple equipment processes
CN115204497A (en) Prefabricated part production scheduling optimization method and system based on reinforcement learning
CN112836974B (en) Dynamic scheduling method for multiple field bridges between boxes based on DQN and MCTS
CN109615188A (en) A kind of predistribution combines the multi-robot Task Allocation of Hungary Algorithm
CN111783357A (en) Transfer route optimization method and system based on passenger delay reduction
CN113706023A (en) Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning
CN112508398A (en) Dynamic production scheduling method and device based on deep reinforcement learning and electronic equipment
CN115564242A (en) Ship power equipment-oriented scheduling method and system for preemptible task maintenance personnel
CN112685883B (en) Guarantee operation scheduling method for shipboard aircraft
CN113139747A (en) Method for reordering coating of work returning vehicle based on deep reinforcement learning
Jiang et al. Optimization of support scheduling on deck of carrier aircraft based on improved differential evolution algorithm
CN107958313A (en) A kind of discrete ripples optimization algorithm
CN117196169A (en) Machine position scheduling method based on deep reinforcement learning
CN113158549B (en) Diversified task oriented ship formation grade repair plan compilation method
CN115756789A (en) GPU scheduling optimization method for deep learning inference service system
CN115293623A (en) Training method and device for production scheduling model, electronic equipment and medium
CN113988443A (en) Automatic wharf cooperative scheduling method based on deep reinforcement learning
CN113743784A (en) Production time sequence table intelligent generation method based on deep reinforcement learning
CN114326644A (en) Double-field bridge flexible scheduling method under dynamic port intercepting time
Narapureddy et al. Optimal scheduling methodology for machines, tool transporter and tools in a multi-machine flexible manufacturing system without tool delay using flower pollination algorithm
CN114384931A (en) Unmanned aerial vehicle multi-target optimal control method and device based on strategy gradient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210223