CN111047014B - Multi-agent air countermeasure distributed sampling training method and equipment - Google Patents


Info

Publication number
CN111047014B
CN111047014B (application CN201911266811.5A)
Authority
CN
China
Prior art keywords
sampling
node
learning
agent air
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911266811.5A
Other languages
Chinese (zh)
Other versions
CN111047014A (en)
Inventor
孙智孝
彭宣淇
朴海音
杨晟琦
孙阳
李思凝
杜冲
刘仲
葛俊
杨芳
詹光
王言伟
张少卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Original Assignee
Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC filed Critical Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Priority to CN201911266811.5A priority Critical patent/CN111047014B/en
Publication of CN111047014A publication Critical patent/CN111047014A/en
Application granted granted Critical
Publication of CN111047014B publication Critical patent/CN111047014B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06N3/045 Combinations of networks (G06N3/04 Architecture, e.g. interconnection topology; G06N3/02 Neural networks; G06N3/00 Computing arrangements based on biological models; G06N Computing arrangements based on specific computational models; G06 Computing, calculating or counting; G Physics)
    • G06N3/08 Learning methods (G06N3/02 Neural networks; G06N3/00 Computing arrangements based on biological models)

Abstract

The application belongs to the field of multi-agent air countermeasure games, and in particular relates to a multi-agent air countermeasure distributed sampling training method and device. The method comprises the following steps. Step one: acquire a learning node and a sampling node, establish a connection between them, and initialize the multi-agent air countermeasure network. Step two: the learning node sends a sampling instruction to the sampling node; the sampling node receives the instruction, starts sampling, and sends the samples to the learning node once collected. Step three: the learning node trains on the samples, then updates and stores the multi-agent air countermeasure network. The method and device can perform distributed sampling and training of the multi-agent air countermeasure network, improving the system's sample collection and transmission efficiency and the training efficiency of the countermeasure network.

Description

Multi-agent air countermeasure distributed sampling training method and equipment
Technical Field
The application belongs to the field of multi-agent air countermeasure games, and in particular relates to a multi-agent air countermeasure distributed sampling training method and device.
Background
Reinforcement learning is currently an important approach to sequential decision problems and has achieved excellent results in many fields. The multi-agent air countermeasure problem is a typical sequential decision problem, characterized by high-dimensional state and action spaces, so training the multi-agent air countermeasure neural network requires a large number of samples.
When stand-alone computing resources are limited, a multi-machine distributed sampling method is needed to increase sample collection efficiency and the overall training efficiency of the countermeasure network. Using reinforcement learning for multi-agent air countermeasure network distributed sampling training mainly faces the following difficulties: a. the countermeasure network consists of several neural networks, such as a state value network, a maneuver strategy network, a target distribution network, and a missile firing decision network, and different nodes need to share and continuously update their parameters; b. the learning node must control when the sampling nodes start and stop sampling, and must efficiently retrieve the samples they collect; c. because the sampling and training process takes a long time, the learning and sampling nodes require redundancy and fault-tolerance design to ensure the training system runs stably without manual intervention; d. the sampling nodes transmit samples over the network, so network occupation and congestion must be addressed when there are many sampling nodes, and hard-disk read/write operations should be minimized to reduce computation time. Owing to these difficulties, the prior art generally suffers from poor stability and low efficiency in multi-agent air countermeasure network distributed sampling training.
It is therefore desirable to have a solution that overcomes or at least alleviates at least one of the above-mentioned drawbacks of the prior art.
Disclosure of Invention
The purpose of the application is to provide a multi-agent air countermeasure distributed sampling training method and equipment, so as to solve at least one problem existing in the prior art.
The technical scheme of the application is as follows:
a first aspect of the present application provides a multi-agent air countermeasure distributed sampling training method, comprising:
step one: acquiring a learning node and a sampling node, establishing a connection between the learning node and the sampling node, and initializing a multi-agent air countermeasure network;
step two: the learning node sends a sampling instruction to the sampling node, the sampling node receives the sampling instruction and starts sampling, and the sampling node sends a sample to the learning node after collecting the sample;
step three: and training by the learning node through the sample, and updating and storing the multi-agent air countermeasure network.
Optionally, in step one, establishing the connection between the learning node and the sampling node includes:
assigning the computer network addresses of the sampling nodes to the learning node;
the learning node queries the number of available sampling nodes through the gRPC service and records the network locations of the available sampling nodes in its memory.
Optionally, in step two, the learning node sending a sampling instruction to the sampling node, the sampling node receiving the instruction and starting sampling, and the sampling node sending the samples to the learning node after collection includes:
S21, the learning node sends a sampling instruction to the sampling node, and the sampling node receives the instruction and starts sampling;
S22, after collecting the samples, the sampling node serializes them and sends the serialized samples to the redis server of the learning node;
S23, the learning node reads the samples from the redis server, deserializes them, and stores them in memory; after a given number of samples has been collected, the learning node stops sending sampling instructions to the sampling node, and sampling stops.
Optionally, in step S21, the learning node sending a sampling instruction and the sampling node receiving it and starting sampling is specifically:
S211, the learning node serializes a sampling flag bit set to 1 together with the multi-agent air countermeasure network and sends them to the sampling node;
S212, the sampling node receives and deserializes the sampling flag bit 1 and the multi-agent air countermeasure network, and starts sampling.
Optionally, in step S211, the learning node sends the serialized multi-agent air countermeasure network to the sampling node via the gRPC service using the proto3 protocol.
Optionally, in step S23, the learning node reads the samples in the redis server by a blocking pop method.
Optionally, in step S23, the stopping of sampling after a given number of samples has been collected is specifically:
after the required number of samples has been collected, the learning node changes the sampling flag bit from 1 to 0, and the sampling node stops sampling upon receiving flag bit 0.
Optionally, the method further comprises step four: iterating steps two and three to continuously update the multi-agent air countermeasure network.
A second aspect of the present application provides a multi-agent air countermeasure distributed sampling training device, based on the multi-agent air countermeasure distributed sampling training method as described above, comprising:
the system comprises an initialization module, a sampling module and a multi-agent air countermeasure network, wherein the initialization module is used for acquiring a learning node and a sampling node, establishing a connection between the learning node and the sampling node and initializing the multi-agent air countermeasure network;
the sampling module is used for sending a sampling instruction to the sampling node by the learning node, the sampling node receives the sampling instruction and starts sampling, and the sampling node sends a sample to the learning node after collecting the sample;
and the training module is used for training the learning node through the sample, updating and storing the multi-agent air countermeasure network.
The invention has at least the following beneficial technical effects:
the multi-agent air countermeasure distributed sampling training method can complete distributed sampling and training of the multi-agent air countermeasure network, and improves sample collection and transmission efficiency and countermeasure network training efficiency of the system.
Drawings
FIG. 1 is a flow chart of a multi-agent air countermeasure distributed sampling training method in accordance with one embodiment of the present application;
FIG. 2 is a graph showing how the sampling time varies with the number of sampling nodes when the same number of samples is collected;
FIG. 3 is a graph of peak network traffic versus single-transmission sample size for a learning node of the present application.
Detailed Description
In order to make the purposes, technical solutions and advantages of the implementation of the present application more clear, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the accompanying drawings in the embodiments of the present application. In the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The described embodiments are some, but not all, of the embodiments of the present application. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application. Embodiments of the present application are described in detail below with reference to the accompanying drawings.
In the description of the present application, it should be understood that the terms "center," "longitudinal," "lateral," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientations or positional relationships illustrated in the drawings, merely to facilitate description of the present application and simplify the description, and do not indicate or imply that the device or element being referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the scope of protection of the present application.
The present application is described in further detail below in conjunction with fig. 1-3.
A first aspect of the present application provides a multi-agent air countermeasure distributed sampling training method, comprising:
step one: acquiring a learning node and a sampling node, establishing a connection between the learning node and the sampling node, and initializing a multi-agent air countermeasure network;
step two: the learning node sends a sampling instruction to the sampling node, the sampling node receives the sampling instruction and starts sampling, and the sampling node sends a sample to the learning node after collecting the sample;
step three: the learning node trains through the samples, updates and stores the multi-agent air countermeasure network.
Specifically, in one embodiment of the present application, in step one, establishing the connection between the learning node and the sampling node includes:
designating the computer network addresses of the sampling nodes for the learning node;
the learning node queries the number of available sampling nodes through the gRPC service and records the network locations of the available sampling nodes in its memory.
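The node bookkeeping described above can be sketched as below, with a hypothetical in-memory registry standing in for the actual gRPC availability query; the class and method names are illustrative, not from the patent:

```python
# Sketch of the learning node's view of its sampling nodes: it is given
# their network addresses, probes each one, and records the available
# ones in memory. probe() stands in for the gRPC availability check.
class SamplingNodeRegistry:
    def __init__(self, addresses):
        self.addresses = list(addresses)  # addresses assigned to the learning node
        self.available = []               # network locations recorded in memory

    def refresh(self, probe):
        """probe(addr) -> bool; returns the number of available nodes."""
        self.available = [a for a in self.addresses if probe(a)]
        return len(self.available)
```

In the patent's flow, `refresh` would be called before each sampling cycle, and sampling starts only when at least one node is available.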
In step two, the learning node sending a sampling instruction, the sampling node receiving it and starting sampling, and the sampling node sending the samples to the learning node after collection includes:
S21, the learning node sends a sampling instruction to the sampling node, and the sampling node receives the instruction and starts sampling. Specifically: S211, the learning node serializes a sampling flag bit set to 1 together with the multi-agent air countermeasure network and sends them to the sampling node; S212, the sampling node receives and deserializes the sampling flag bit 1 and the multi-agent air countermeasure network, and starts sampling. In this embodiment, the learning node sends the serialized multi-agent air countermeasure network to the sampling node via the gRPC service using the proto3 protocol.
S22, after collecting the samples, the sampling node serializes them and sends the serialized samples to the redis server of the learning node;
S23, the learning node reads the samples from the redis server, deserializes them, and stores them in memory; after a given number of samples has been collected, the learning node stops sending sampling instructions to the sampling node, and sampling stops. In this embodiment, the learning node reads the samples in the redis server by a blocking pop method. The stopping of sampling works as follows: once the required number of samples has been collected, the learning node changes the sampling flag bit from 1 to 0, and the sampling node stops sampling upon receiving flag bit 0.
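Steps S21 through S23 can be sketched as below, assuming `pickle` as a stand-in for the proto3 serialization and `queue.Queue` as a stand-in for the learning node's redis server; all names are illustrative:

```python
import pickle
import queue

def pack_control_message(sampling_flag, network_params):
    """S211: serialize the sampling flag bit together with the network."""
    return pickle.dumps({"flag": sampling_flag, "network": network_params})

def unpack_control_message(payload):
    """S212: the sampling node deserializes the flag and the network."""
    msg = pickle.loads(payload)
    return msg["flag"], msg["network"]

def collect_samples(sample_queue, needed, timeout=1.0):
    """S23: blocking-pop serialized samples and keep them in memory."""
    memory = []
    while len(memory) < needed:
        payload = sample_queue.get(timeout=timeout)  # like a redis blocking pop
        memory.append(pickle.loads(payload))
    return memory
```

Once `collect_samples` returns, the learning node would send a control message with the flag set to 0 to stop the sampling nodes.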
The multi-agent air countermeasure distributed sampling training method further comprises step four: iterating steps two and three, continuously updating the multi-agent air countermeasure network, until the required number of training cycles is reached or the user stops training manually.
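The iterate-and-update cycle of step four can be sketched with hypothetical stand-in functions for the sampling and training steps:

```python
def training_loop(sample_fn, train_fn, max_cycles):
    """Repeat step two (distributed sampling) and step three (training)
    for a fixed number of cycles; sample_fn and train_fn are illustrative
    stand-ins for the patent's sampling and training procedures."""
    results = []
    for _ in range(max_cycles):
        samples = sample_fn()              # step two: collect samples
        results.append(train_fn(samples))  # step three: update the network
    return results
```

In practice the loop would also check a manual-stop condition, as the text describes.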
Based on the above-mentioned multi-agent air countermeasure distributed sampling training method, a second aspect of the present application provides a multi-agent air countermeasure distributed sampling training device, including:
the initialization module is used for acquiring the learning node and the sampling node, establishing the connection between the learning node and the sampling node, and initializing the multi-agent air countermeasure network;
the sampling module is used for the learning node to send a sampling instruction to the sampling node; the sampling node receives the instruction, starts sampling, and sends the samples to the learning node after collection;
and the training module is used for training the learning nodes through the samples, updating and storing the multi-agent air countermeasure network.
Specifically, in the initialization module, the distributed sampling training program is first deployed on different computers; one computer is designated as the learning node, the network addresses of the sampling nodes are given to the learning node, and the countermeasure network is initialized or loaded.
In the sampling module, the learning node is started and uses gRPC to detect available sampling nodes; if any are available, sampling begins, otherwise the node waits in a loop. The learning node sends the start-sampling instruction: it serializes the countermeasure network and sends the result to the sampling nodes; each sampling node receives the parameters, deserializes the countermeasure network, and starts its sampling program, serializing the collected samples and storing them in the learning node's redis. The learning node reads the sample data from redis and keeps the deserialized result in memory. Once enough samples have been collected, the learning node changes the sampling flag bit, and the sampling nodes stop sampling and wait for the next call. In this embodiment, the learning node is the center of the distributed sampling training framework: it can start independently, raises an event when no sampling node is available, loops waiting for sampling nodes to join, and begins sampling as soon as one sampling node becomes available. System monitoring and timeout monitoring can be added to the sampling module: if the learning node or its program crashes, the program restarts, automatically recovers the initial state, loads the stored network, and sampling continues; if a sampling node or its program crashes, the current sampling cycle is carried on by the other sampling nodes, and the crashed node automatically rejoins the next cycle after restarting. In this embodiment, samples are serialized before transmission, and the serialized files are read and written as in-memory virtual files to avoid excessive hard-disk read/write operations.
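The timeout-monitoring idea can be sketched as below; the thread-based wrapper is an illustrative assumption, not the patent's actual mechanism:

```python
import threading

def call_with_timeout(fn, timeout):
    """Run fn in a worker thread; None means it missed the deadline
    (or returned None), which we treat as a crashed/slow node."""
    result = {}
    t = threading.Thread(target=lambda: result.update(value=fn()))
    t.daemon = True
    t.start()
    t.join(timeout)
    return result.get("value")

def sample_from_nodes(node_fns, timeout=0.5):
    """Collect samples from each node; nodes that time out are skipped
    so the current cycle is completed by the remaining nodes."""
    samples = []
    for fn in node_fns:
        out = call_with_timeout(fn, timeout)
        if out is not None:  # skip crashed or slow nodes this cycle
            samples.extend(out)
    return samples
```

A crashed node would simply be probed again on the next cycle, matching the rejoin behavior described above.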
In the training module, the learning node trains on the collected samples, updates the network parameters, and stores the network. The countermeasure network comprises a state value network V_net, a maneuver strategy network A_net, a target distribution network target_net, and a missile firing decision network shoot_net, each with its own update formula:
[The four update formulas appear only as image references in the source document and are not reproduced here.]
where S_i is the current multi-agent air countermeasure state quantity, r_i is the reward given by the environment at step i, and γ is the discount factor; p(a_i|S_i), p(target_i|S_i) and p(shoot_i|S_i) are, respectively, the probabilities that in state S_i the aircraft performs the current maneuver, selects the current target, and makes a firing decision.
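Since the patent's exact update formulas are not legible in this text, the following is a hedged sketch of a standard actor-critic-style update consistent with the quantities defined above (state S_i, reward r_i, discount factor γ, and the three decision probabilities); it is an assumption, not the patent's actual formulas:

```python
import math

def td_advantage(r, v_s, v_s_next, gamma=0.99):
    """One-step TD advantage: A_i = r_i + gamma * V(S_{i+1}) - V(S_i)."""
    return r + gamma * v_s_next - v_s

def value_loss(r, v_s, v_s_next, gamma=0.99):
    """Squared TD error, the usual objective for a state value network
    such as V_net."""
    return td_advantage(r, v_s, v_s_next, gamma) ** 2

def policy_loss(prob, advantage):
    """Policy-gradient-style loss -log p(.|S_i) * A_i, which would be
    applied alike to the maneuver, target-selection, and firing
    probabilities p(a_i|S_i), p(target_i|S_i), p(shoot_i|S_i)."""
    return -math.log(prob) * advantage
```

The three decision heads sharing one advantage estimate is a common pattern for multi-head policies, but whether the patent does exactly this cannot be confirmed from the text.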
Advantageously, in this embodiment, the training module serializes the trained network and stores it on the learning node's hard disk during the training process, so that the training result can be inspected without interrupting training.
Advantageously, the multi-agent over-the-air challenge distributed sampling training device of the present application supports the addition of sampling nodes without interrupting the training process.
The multi-agent air countermeasure distributed sampling training method and device have the following advantages:
1) During sampling, multi-machine, multi-process sampling can be realized, increasing sample collection efficiency; this shortens the sampling-training iteration and accelerates training and convergence of the neural network. Figure 2 shows how the single-cycle sampling time varies with the number of sampling nodes.
2) Thanks to fault-tolerance mechanisms such as the availability query and timeout protection, the current cycle still executes correctly when some sampling nodes cannot be reached, and the learning node can restart itself.
3) The transmitted data, namely the samples and the neural network, require no hard-disk I/O operations, which helps the program run quickly and efficiently.
4) For the high-dimensional, high-volume multi-agent air countermeasure sample data, a segmented transmission method is adopted, and the optimal number of samples per transmission is found experimentally, reducing network load and increasing network bandwidth utilization. At a given network transmission speed, more sampling nodes can be attached and sampling efficiency increases; Figure 3 shows the relation between peak network traffic and the single-transmission sample size.
5) While the learning node waits for samples, it can itself sample using spare CPU resources, further increasing sampling efficiency.
6) Blocking data reads and flag-bit-controlled sampling stop reduce the waiting time of the sampling nodes and the learning node, and a crash of some sampling nodes' programs does not prevent the current sampling cycle from completing.
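The segmented transmission of point 4 can be sketched as below; the chunk size would be tuned experimentally as the text describes, and the value used here is purely illustrative:

```python
def chunked(samples, chunk_size):
    """Split a batch of samples into consecutive chunks of at most
    chunk_size, so that no single send saturates the network link."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [samples[i:i + chunk_size] for i in range((0), len(samples), chunk_size)]
```

Each chunk would then be serialized and pushed to the learning node's redis independently, keeping per-transfer traffic bounded.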
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the present application should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A multi-agent air countermeasure distributed sampling training method, comprising:
step one: acquiring a learning node and a sampling node, establishing a connection between the learning node and the sampling node, and initializing a multi-agent air countermeasure network;
step two: the learning node sends a sampling instruction to the sampling node, the sampling node receives the sampling instruction and starts sampling, and the sampling node sends a sample to the learning node after collecting the sample;
step three: the learning node trains through samples, updates and stores the multi-agent air countermeasure network;
in step one, the establishing of a connection between the learning node and the sampling node includes:
assigning the computer network addresses of the sampling nodes to the learning node;
the learning node queries the number of available sampling nodes through the gRPC service and records the network locations of the available sampling nodes in its memory;
the learning node sending a sampling instruction to the sampling node, the sampling node receiving the instruction and starting sampling, and the sampling node sending the samples to the learning node after collection comprises:
S21, the learning node sends a sampling instruction to the sampling node, and the sampling node receives the instruction and starts sampling;
S22, after collecting the samples, the sampling node serializes them and sends the serialized samples to the redis server of the learning node;
S23, the learning node reads the samples from the redis server, deserializes them, and stores them in memory; after a given number of samples has been collected, the learning node stops sending sampling instructions to the sampling node, and sampling stops.
2. The multi-agent air countermeasure distributed sampling training method according to claim 1, wherein in step S21, the learning node sending a sampling instruction and the sampling node receiving it and starting sampling is specifically:
S211, the learning node serializes a sampling flag bit set to 1 together with the multi-agent air countermeasure network and sends them to the sampling node;
S212, the sampling node receives and deserializes the sampling flag bit 1 and the multi-agent air countermeasure network, and starts sampling.
3. The multi-agent air countermeasure distributed sampling training method according to claim 2, wherein in step S211, the learning node sends the serialized multi-agent air countermeasure network to the sampling node via the gRPC service using the proto3 protocol.
4. The multi-agent air countermeasure distributed sampling training method as claimed in claim 3, wherein in step S23, the learning node reads the samples in the redis server by a blocking pop method.
5. The multi-agent air countermeasure distributed sampling training method according to claim 4, wherein in step S23, the stopping of sampling after a given number of samples has been collected is specifically:
after the required number of samples has been collected, the learning node changes the sampling flag bit from 1 to 0, and the sampling node stops sampling upon receiving flag bit 0.
6. The multi-agent air countermeasure distributed sampling training method of claim 5, further comprising step four: iterating steps two and three to continuously update the multi-agent air countermeasure network.
7. A multi-agent air countermeasure distributed sampling training apparatus, based on the multi-agent air countermeasure distributed sampling training method as claimed in any one of claims 1 to 6, comprising:
an initialization module, configured to acquire a learning node and a sampling node, establish a connection between them, and initialize the multi-agent air countermeasure network;
a sampling module, configured to have the learning node send a sampling instruction to the sampling node, the sampling node receive the instruction and start sampling, and the sampling node send the samples to the learning node after collection;
and a training module, configured to have the learning node train on the samples and update and store the multi-agent air countermeasure network.
CN201911266811.5A 2019-12-11 2019-12-11 Multi-agent air countermeasure distributed sampling training method and equipment Active CN111047014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911266811.5A CN111047014B (en) 2019-12-11 2019-12-11 Multi-agent air countermeasure distributed sampling training method and equipment


Publications (2)

Publication Number Publication Date
CN111047014A CN111047014A (en) 2020-04-21
CN111047014B true CN111047014B (en) 2023-06-23

Family

ID=70235634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911266811.5A Active CN111047014B (en) 2019-12-11 2019-12-11 Multi-agent air countermeasure distributed sampling training method and equipment

Country Status (1)

Country Link
CN (1) CN111047014B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629422A (en) * 2018-05-10 2018-10-09 浙江大学 A kind of intelligent body learning method of knowledge based guidance-tactics perception
WO2018206504A1 (en) * 2017-05-10 2018-11-15 Telefonaktiebolaget Lm Ericsson (Publ) Pre-training system for self-learning agent in virtualized environment
CN109635917A (en) * 2018-10-17 2019-04-16 北京大学 A kind of multiple agent Cooperation Decision-making and training method
CN109829500A (en) * 2019-01-31 2019-05-31 华南理工大学 A kind of position composition and automatic clustering method
CN109947567A (en) * 2019-03-14 2019-06-28 深圳先进技术研究院 A kind of multiple agent intensified learning dispatching method, system and electronic equipment
CN110309268A (en) * 2019-07-12 2019-10-08 中电科大数据研究院有限公司 A kind of cross-language information retrieval method based on concept map

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9082082B2 (en) * 2011-12-06 2015-07-14 The Trustees Of Columbia University In The City Of New York Network information methods devices and systems
US11301774B2 (en) * 2017-02-28 2022-04-12 Nec Corporation System and method for multi-modal graph-based personalization
US11562287B2 (en) * 2017-10-27 2023-01-24 Salesforce.Com, Inc. Hierarchical and interpretable skill acquisition in multi-task reinforcement learning
US10984905B2 (en) * 2017-11-03 2021-04-20 Siemens Healthcare Gmbh Artificial intelligence for physiological quantification in medical imaging
US10715395B2 (en) * 2017-11-27 2020-07-14 Massachusetts Institute Of Technology Methods and apparatus for communication network
US10726025B2 (en) * 2018-02-19 2020-07-28 Microsoft Technology Licensing, Llc Standardized entity representation learning for smart suggestions
US20190347933A1 (en) * 2018-05-11 2019-11-14 Virtual Traffic Lights, LLC Method of implementing an intelligent traffic control apparatus having a reinforcement learning based partial traffic detection control system, and an intelligent traffic control apparatus implemented thereby


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning; Ning et al.; IEEE; pp. 372-382 *
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks; Foerster et al.; arXiv:1602.02672; pp. 1-10 *
Advances in Deep Reinforcement Learning: From AlphaGo to AlphaGo Zero; Tang Zhentao et al.; Control Theory & Applications; Vol. 34, No. 12; pp. 1529-1546 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant