CN110751869B - Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method - Google Patents
- Publication number
- CN110751869B
- Authority
- CN
- China
- Prior art keywords
- environment
- training
- real
- encoder
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B9/00—Simulators for teaching or training purposes
- G09B9/003—Simulators for teaching or training purposes for military purposes and tactics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a technology for transferring tactical strategies between a simulated (mimicry) environment and a real battlefield situation, based on an adversarial discriminative transfer (countermeasure discrimination migration) method. The adversarial method is applied to regression transfer, and the effectiveness of tactical strategies is evaluated from the simulated environment to the real environment. Through domain randomization, and with real RGB cameras recording real battle scenes such as mountains, rivers and military bases, the strategy learned in simulation is robust enough to be transferred directly to real battle scenes. The method greatly reduces the cost of depending on the real world during sim-to-real transfer.
Description
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a simulated environment and battlefield situation strategy transfer technology based on an adversarial discriminative transfer (countermeasure discrimination migration) method.
Background
At present, military units mostly rely on wargaming (war-chess deduction) to conduct turn-based tactical exercises. Wargaming cannot support real-time drilling in actual combat exercises and cannot realistically simulate the combat scene. Consequently, for large-scale, long-duration operations, wargaming can only address small-range, short-duration engagements and cannot carry tactics over long time spans. By contrast, a combat mimicry (simulated) environment can be built realistically from various sensors and information data; it can generate tactical strategies in real time and cope better with dynamic changes on the battlefield.
In some environment-based training models, a learned model applied to a real scene is very fragile whenever the scene differs from the one configured for training, and this is true of most current systems. Some research methods train the model on real-environment data, but collecting data from real scenes is very expensive.
Disclosure of Invention
To solve these problems, real-world tactical drill information is acquired from military databases and used to train on simulated real-world tactical drill scenes. A CNN-based strategy representation framework is introduced, and guided strategy search is added, in which the attributes of weaponry and the real environment information are mapped into a data matrix. This reduces the number of real-world training samples required and works well in some complex tasks. The proposed simulated environment and battlefield situation strategy transfer technology based on the adversarial discriminative transfer (countermeasure discrimination migration) method comprises the following steps:
(1) building network module structure
A deep Q-network (DQN) based on deep reinforcement learning is constructed. It comprises a control module and a sensing module, and the control module is connected with the sensing module through a bottleneck layer;
(2) training neural networks
Training adopts the adversarial discriminative transfer method. The perception module is first pre-trained on labeled mimicry environment data, using a supervised loss function [formula not reproduced], where m is the order of the samples, I_j is the input of a sample, y_p(I_j) is the label of I_j, x*_j corresponds to I_j, and j is a sample. After pre-training, training continues with the remaining mimicry environment data and the real environment data, and the sum of two loss functions [formula not reproduced] is back-propagated, where [symbol not reproduced] is the loss function for supervised training using mimicry environment data and L^P_Ad is the loss function for supervised training using real environment data, [formula not reproduced], where D is a supervision function, E_r is the loss of the target encoder on the mimicry environment data set, E_s is the loss of the target encoder on the real environment data set, [symbol not reproduced] is an input sample in the simulated environment, and [symbol not reproduced] is an input sample in the real environment;
(3) making real-time decisions
After the training in step (2), the transfer of the model from the mimicry environment is complete; the real-time battlefield environment is then obtained from camera sensor data and real-time decisions are made.
As an improvement, the control module is used for learning the position of a given object and obtaining the motion parameters of the object in the image, including the direction, the angle and the speed of the motion.
As an improvement, the sensing module is used for acquiring the position information parameters of the object in the image from the original RGB image.
As an improvement, the sensing module comprises an encoder submodule and a regression submodule; the encoder submodule contains all of the convolutional layers, and the regression submodule contains all of the fully connected layers.
As an improvement, the encoder submodule comprises a source encoder and a target encoder; after the source encoder has been trained on mimicry environment data, its weights are fixed and it serves as the reference for training the target encoder during adversarial discriminative transfer.
As an improvement, the regression submodule is trained with a loss function [formula not reproduced], in which one term is the loss of the discriminator, the other is the loss of the target encoder, and gamma is the discount factor in the range (0, 1).
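The weighting described for the regression submodule can be sketched as follows. The exact formula is not reproduced in this text, so the convex combination below is an assumption made purely for illustration, as are the names `weighted_loss`, `loss_discriminator` and `loss_target_encoder`; the only detail taken from the text is that gamma lies in (0, 1).

```python
def weighted_loss(loss_discriminator, loss_target_encoder, gamma):
    """Combine two scalar losses with a weight gamma in (0, 1).

    ASSUMPTION: a convex combination is used here; the patent's own
    formula is not available in this text.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in the open interval (0, 1)")
    return gamma * loss_discriminator + (1.0 - gamma) * loss_target_encoder


# Example with made-up loss values.
total = weighted_loss(loss_discriminator=2.0, loss_target_encoder=4.0, gamma=0.25)
```

Under this assumed form, a gamma close to 1 emphasizes the discriminator term and a gamma close to 0 emphasizes the target-encoder term.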
Beneficial effects: the invention applies an adversarial discriminative method to regression transfer and evaluates the effectiveness of tactical strategies from the mimicry environment to the real environment. Through domain randomization, and with real RGB cameras recording real battle scenes such as mountains, rivers and military bases, the learned strategy is robust enough to be transferred directly to real battle scenes. The method greatly reduces the cost of depending on the real world during sim-to-real transfer.
Drawings
FIG. 1 is a basic flow diagram of the present technology.
FIG. 2 shows the pre-training process of the present technology.
FIG. 3 shows the adversarial discriminative transfer process of the present technology.
Detailed Description
The invention is further described below with reference to the drawings and the embodiments.
Real-world tactical drill information is acquired from military databases and used to train on simulated real-world tactical drill scenes. In this part, a CNN-based strategy representation framework is introduced and guided strategy search is added (the attributes of weaponry, real environment information and the like are mapped into a data matrix), which reduces the number of real-world training samples required. This approach has worked well in a number of complex tasks.
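To illustrate what mapping equipment attributes and environment information into a data matrix might look like, the sketch below encodes a few units as rows of normalized numeric features. The attribute names (`range_km`, `speed_kmh`, `terrain`), the terrain vocabulary and the scaling constants are all hypothetical; the text does not specify the layout of the matrix.

```python
# Hypothetical attribute schema; the patent does not define the actual fields.
TERRAIN_TYPES = ["mountain", "river", "base"]


def encode_unit(unit, max_range_km=500.0, max_speed_kmh=1000.0):
    """Turn one unit description into a flat numeric feature row."""
    one_hot = [1.0 if unit["terrain"] == t else 0.0 for t in TERRAIN_TYPES]
    return [
        unit["range_km"] / max_range_km,    # scaled to [0, 1]
        unit["speed_kmh"] / max_speed_kmh,  # scaled to [0, 1]
        *one_hot,                           # terrain as a one-hot vector
    ]


def build_data_matrix(units):
    """Stack per-unit rows into a matrix (a list of equal-length rows)."""
    return [encode_unit(u) for u in units]


units = [
    {"range_km": 50.0, "speed_kmh": 60.0, "terrain": "mountain"},
    {"range_km": 250.0, "speed_kmh": 900.0, "terrain": "base"},
]
matrix = build_data_matrix(units)
```

Each row has the same length, so the matrix can be fed to a network as a fixed-size input.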
First, a modular design is built on the DQN: a bottleneck structure connects the perception module and the control module, which helps the neural network learn a low-dimensional feature representation. Through the bottleneck structure, the perception module learns to determine the position of objects in the image from the raw RGB image. The control module learns where a given object is located and in what direction, at what angle and at what speed the object in the image is moving.
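The modular layout can be sketched as a forward pass in which a perception module compresses the image into a small bottleneck vector and a control module maps that vector to one Q-value per action. This is a minimal sketch under stated assumptions: small dense layers stand in for the convolutional layers, and the class name `ModularQNetwork`, the hidden width of 16 and the random initialization are hypothetical.

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) with small random weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class ModularQNetwork:
    """Perception -> bottleneck -> control (illustrative only)."""

    def __init__(self, n_pixels, bottleneck_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.p_hidden = init_layer(n_pixels, 16, rng)
        self.p_out = init_layer(16, bottleneck_dim, rng)  # bottleneck layer
        self.c_hidden = init_layer(bottleneck_dim, 16, rng)
        self.c_out = init_layer(16, n_actions, rng)

    def perceive(self, pixels):
        """Perception module: image -> low-dimensional representation."""
        hidden = relu(linear(pixels, *self.p_hidden))
        return linear(hidden, *self.p_out)

    def q_values(self, pixels):
        """Control module: bottleneck vector -> one Q-value per action."""
        hidden = relu(linear(self.perceive(pixels), *self.c_hidden))
        return linear(hidden, *self.c_out)


net = ModularQNetwork(n_pixels=12, bottleneck_dim=3, n_actions=4)
frame = [0.5] * 12  # stand-in for a flattened RGB image
bottleneck = net.perceive(frame)
q = net.q_values(frame)
```

Because the control module only sees the bottleneck vector, the two modules can be trained separately, which is what the training procedure relies on.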
With the network structure in place, the next question is how to train the neural network. The perception module is first pre-trained on labeled mimicry environment data, using a supervised loss function. After pre-training, training continues with the remaining mimicry environment data together with real environment data; at this stage, two loss functions are summed and back-propagated. This is the adversarial discriminative training process. The neural network in the control module is trained on mimicry environment data only.
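The two training phases can be sketched in terms of the scalar losses they minimize. The text does not reproduce the supervised loss, so the mean squared error below is an assumption, and the function names `supervised_loss` and `combined_loss` are hypothetical; only the idea of summing two losses before back-propagation comes from the description.

```python
def supervised_loss(predictions, labels):
    """Phase 1 (pre-training): mean squared error over the samples.

    ASSUMPTION: squared error is a common choice for regression;
    the patent's own formula is not available in this text.
    """
    m = len(predictions)
    total = 0.0
    for pred, label in zip(predictions, labels):
        total += sum((p - t) ** 2 for p, t in zip(pred, label))
    return total / m


def combined_loss(loss_sim, loss_real):
    """Phase 2: the two losses are summed, and the sum is back-propagated."""
    return loss_sim + loss_real


# Made-up predictions and labels for two samples.
preds = [[1.0, 2.0], [0.0, 0.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
loss_sim = supervised_loss(preds, labels)   # (4 + 1) / 2 = 2.5
total = combined_loss(loss_sim, 0.5)        # 0.5 is a placeholder second loss
```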
As shown in FIG. 2, the perception module is split into two parts: an encoder submodule, which contains all of the convolutional layers, and a regression submodule, which contains all of the fully connected layers. The encoder submodule in turn comprises a source encoder and a target encoder. The source encoder is trained on mimicry environment data, after which its weights are fixed and it serves as the reference for training the target encoder during adversarial discriminative transfer. The loss functions of the source encoder and the target encoder are also used to train the regressor, and only a small amount of labeled real environment data is used for that.
This training process completes the transfer of the model from the mimicry environment. When the method is applied to a real battle scene, camera sensor data can be used directly to obtain the real-time battlefield environment and make real-time decisions.
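The real-time decision step can be sketched as a loop that turns each camera frame into Q-values and picks the action with the highest value. The greedy selection rule and the stand-in `toy_q_function` are assumptions for illustration; in practice the trained network would supply the Q-values.

```python
def choose_action(q_values):
    """Greedy policy: index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decision_loop(frames, q_function):
    """Map each incoming camera frame to an action index."""
    return [choose_action(q_function(frame)) for frame in frames]


def toy_q_function(frame):
    """Hypothetical stand-in for the trained network: scores three
    actions from the mean pixel brightness of the frame."""
    brightness = sum(frame) / len(frame)
    return [1.0 - brightness, brightness, 0.5]


frames = [[0.1, 0.1], [0.9, 0.9]]  # stand-ins for flattened camera images
actions = decision_loop(frames, toy_q_function)
```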
When a tactical strategy is migrated from the mimicry environment to the real environment, the invention uses an adversarial transfer method to avoid catastrophic forgetting in complex sequential tasks. The method reduces the requirement for labeled real battlefield environment data by 50%; the transfer was completed successfully using only 100,000 labeled and 200,000 unlabeled real battlefield environment images. By using a weighted loss and fine-tuning the combined network end to end, the tactical transfer success rate improved by 32.5 percent compared with before fine-tuning, and the combat success rate reached 96 percent. The learned strategy is robust to cluttered environments and even to environments with noise interference.
The embodiments above describe only several implementations of the invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. A person skilled in the art may make variations and modifications without departing from the inventive concept, and these all fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.
Claims (4)
1. A simulated environment and battlefield situation strategy transfer technology based on the adversarial discriminative transfer (countermeasure discrimination migration) method, characterized by comprising the following steps:
(1) building network module structure
A deep Q-network (DQN) based on deep reinforcement learning is constructed, comprising a control module and a sensing module, the control module being connected with the sensing module through a bottleneck layer; the control module is used for learning the position of a given object and obtaining the motion parameters of the object in the image, including the direction, angle and speed of motion; the sensing module is used for obtaining the position information parameters of objects in the image from the original RGB image;
(2) training neural networks
Training adopts the adversarial discriminative transfer method, wherein the sensing module is first pre-trained on labeled mimicry environment data, using a supervised loss function [formula not reproduced], where m is the order of the samples, I_j is the input of a sample, y_p(I_j) is the label of I_j, x*_j corresponds to I_j, and j is a sample; after pre-training, training continues with the remaining mimicry environment data and the real environment data, and the sum of two loss functions [formula not reproduced] is back-propagated, where [symbol not reproduced] is the loss function for supervised training using mimicry environment data and L^P_Ad is the loss function for supervised training using real environment data, [formula not reproduced], where D is a supervision function, E_r is the loss of the target encoder on the mimicry environment data set, E_s is the loss of the target encoder on the real environment data set, [symbol not reproduced] is an input sample in the simulated environment, and [symbol not reproduced] is an input sample in the real environment;
(3) making real-time decisions
After the training in step (2), the transfer of the model from the mimicry environment is complete; the real-time battlefield environment is then obtained from camera sensor data and real-time decisions are made.
2. The simulated environment and battlefield situation strategy transfer technology based on the adversarial discriminative transfer method according to claim 1, characterized in that: the sensing module comprises an encoder submodule and a regression submodule, wherein the encoder submodule contains all of the convolutional layers, and the regression submodule contains all of the fully connected layers.
3. The simulated environment and battlefield situation strategy transfer technology based on the adversarial discriminative transfer method according to claim 2, characterized in that: the encoder submodule comprises a source encoder and a target encoder; after the source encoder has been trained on mimicry environment data, its weights are fixed and it serves as the reference for training the target encoder during adversarial discriminative transfer.
4. The simulated environment and battlefield situation strategy transfer technology based on the adversarial discriminative transfer method according to claim 3, characterized in that: the regression submodule is trained with a loss function [formula not reproduced], in which one term is the loss of the discriminator, the other is the loss of the target encoder, and gamma is the discount factor in the range (0, 1).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910968327.0A CN110751869B (en) | 2019-10-12 | 2019-10-12 | Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910968327.0A CN110751869B (en) | 2019-10-12 | 2019-10-12 | Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110751869A (en) | 2020-02-04 |
CN110751869B (en) | 2021-11-05 |
Family
ID=69278089
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910968327.0A Active CN110751869B (en) | 2019-10-12 | 2019-10-12 | Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110751869B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364500B (en) * | 2020-11-09 | 2021-07-20 | 中国科学院自动化研究所 | Multi-concurrency real-time countermeasure system oriented to reinforcement learning training and evaluation |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107527068A (en) * | 2017-08-07 | 2017-12-29 | 南京信息工程大学 | Model recognizing method based on CNN and domain adaptive learning |
CN108090412A (en) * | 2017-11-17 | 2018-05-29 | 西北工业大学 | A kind of radar emission source category recognition methods based on deep learning |
CN108537743A (en) * | 2018-03-13 | 2018-09-14 | 杭州电子科技大学 | A kind of face-image Enhancement Method based on generation confrontation network |
CN109902861A (en) * | 2019-01-31 | 2019-06-18 | 南京航空航天大学 | A kind of order manufacturing schedule real-time predicting method based on the double-deck transfer learning |
CN110045336A (en) * | 2019-02-28 | 2019-07-23 | 合肥工业大学 | Radar chaff recognition methods and device based on convolutional neural networks |
CN110245602A (en) * | 2019-06-12 | 2019-09-17 | 哈尔滨工程大学 | A kind of underwater quiet target identification method based on depth convolution feature |
CN110287800A (en) * | 2019-05-29 | 2019-09-27 | 河海大学 | A kind of remote sensing images scene classification method based on SGSE-GAN |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9764468B2 (en) * | 2013-03-15 | 2017-09-19 | Brain Corporation | Adaptive predictor apparatus and methods |
Also Published As
Publication number | Publication date |
---|---|
CN110751869A (en) | 2020-02-04 |
Similar Documents
Publication | Title |
---|---|
CN109711529B (en) | Cross-domain federated learning model and method based on value iterative network |
CN109948642B (en) | Multi-agent cross-modal depth certainty strategy gradient training method based on image input |
WO2021208771A1 (en) | Reinforced learning method and device |
US20200372822A1 (en) | Training system for autonomous driving control policy |
CN109682392A (en) | Vision navigation method and system based on deeply study |
CN108819948B (en) | Driver behavior modeling method based on reverse reinforcement learning |
CN112445823A (en) | Searching method of neural network structure, image processing method and device |
CN112884131A (en) | Deep reinforcement learning strategy optimization defense method and device based on simulation learning |
CN109782600A (en) | A method of autonomous mobile robot navigation system is established by virtual environment |
CN112017085B (en) | Intelligent virtual teacher image personalization method |
CN110826453A (en) | Behavior identification method by extracting coordinates of human body joint points |
CN107351080B (en) | Hybrid intelligent research system based on camera unit array and control method |
CN113076615B (en) | High-robustness mechanical arm operation method and system based on antagonistic deep reinforcement learning |
CN110327624A (en) | A kind of game follower method and system based on course intensified learning |
CN106022471A (en) | Wavelet neural network model ship rolling real-time prediction method based on particle swarm optimization algorithm |
CN113627596A (en) | Multi-agent confrontation method and system based on dynamic graph neural network |
CN110751869B (en) | Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method |
CN108920805A (en) | Driving behavior modeling with state feature extraction functions |
CN113110101B (en) | Production line mobile robot gathering type recovery and warehousing simulation method and system |
CN114445684A (en) | Method, device and equipment for training lane line segmentation model and storage medium |
CN108944940B (en) | Driver behavior modeling method based on neural network |
CN115293022A (en) | Aviation soldier intelligent agent confrontation behavior modeling method based on OptiGAN and spatiotemporal attention |
CN107038450A (en) | Unmanned plane policing system based on deep learning |
CN112525194A (en) | Cognitive navigation method based on endogenous and exogenous information of hippocampus-striatum |
CN117058547A (en) | Unmanned ship dynamic target tracking method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2021-03-12. Applicant after: Nanjing Xingyao Intelligent Technology Co.,Ltd. Address after: 210000 rooms 1201 and 1209, building C, Xingzhi Science Park, Qixia Economic and Technological Development Zone, Nanjing, Jiangsu Province. Applicant before: Nanjing Shixing Intelligent Technology Co.,Ltd. Address before: Room 1211, building C, Xingzhi Science Park, 6 Xingzhi Road, Nanjing Economic and Technological Development Zone, Jiangsu Province, 210000. |
GR01 | Patent grant | |