CN110751869B - Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method - Google Patents

Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method

Info

Publication number
CN110751869B
Authority
CN
China
Prior art keywords
environment
training
real
encoder
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910968327.0A
Other languages
Chinese (zh)
Other versions
CN110751869A (en)
Inventor
杨理想
张侨
王银瑞
范鹏炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xingyao Intelligent Technology Co.,Ltd.
Original Assignee
Nanjing Xingyao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingyao Intelligent Technology Co ltd filed Critical Nanjing Xingyao Intelligent Technology Co ltd
Priority to CN201910968327.0A priority Critical patent/CN110751869B/en
Publication of CN110751869A publication Critical patent/CN110751869A/en
Application granted granted Critical
Publication of CN110751869B publication Critical patent/CN110751869B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00 Simulators for teaching or training purposes
    • G09B9/003 Simulators for teaching or training purposes for military purposes and tactics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a simulated environment and battlefield situation strategy transfer technique based on an adversarial discriminative migration method. An adversarial method is adopted for regression migration, and the effectiveness of tactical strategies transferred from the simulated environment to the real environment is evaluated. Real battle scenes such as mountains, rivers and military bases are recorded with a real RGB camera and domain randomization is applied, so that the strategy learned in simulation is robust enough to be migrated directly to real battle scenes. With this simulation method, the cost of depending on the real world during simulation-to-reality migration is greatly reduced.

Description

Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a simulated environment and battlefield situation strategy transfer technique based on an adversarial discriminative (countermeasure discrimination) migration method.
Background
At present, most military units use wargame deduction for turn-based tactical exercises. Wargame deduction cannot achieve real-time drilling in actual combat exercises and cannot truly simulate the combat scene. Therefore, for large-scale, long-duration operations, wargame deduction can only focus on the effects of small-scale, short-duration engagements and cannot realize long-duration tactical transfer. By constructing a simulated combat environment from various sensors, information data and the like, a realistic environment can be built, real-time tactical strategies can be generated, and the dynamic changes of the battlefield can be handled better.
For models trained in a simulated environment, most current systems are very fragile when the learned model is applied to a real scene that differs from the scene configured during training. Some research methods train the model on real-environment data, but collecting data from real scenes in the real environment is very expensive.
Disclosure of Invention
In order to solve the above problems, real-world tactical drill information is acquired from military databases to simulate training on real-world tactical drill scenes. A CNN-based strategy representation framework is introduced, guided policy search is added, and the attributes of weaponry and real environment information are mapped into a data matrix; this reduces the number of real-world training samples required and works well on a number of complex tasks. The proposed simulated environment and battlefield situation strategy transfer technique based on the adversarial discriminative migration method comprises the following steps:
(1) Building the network module structure
A deep Q-learning network (DQN) based on deep reinforcement learning is constructed, comprising a control module and a perception module, where the control module is connected to the perception module through a bottleneck layer;
(2) Training the neural network
Training is performed with the adversarial discriminative migration method. The perception module is first pre-trained on labelled simulated-environment data, using a supervised loss function [formula image] in the pre-training process, where m is the number of samples, j indexes the samples, I_j is the j-th input sample, y_p(I_j) is the label of I_j, and x*_j is the corresponding estimate for I_j. After pre-training, another portion of the simulated-environment data together with real-environment data is used for training, and back-propagation is performed on the sum of two loss functions [formula image], where [formula image] is the loss function for supervised training on simulated-environment data and L_P^Ad is the loss function for supervised training on real-environment data, defined by [formula image], in which D is the discriminator function, E_r is the loss of the target encoder on the simulated-environment data set, E_s is the loss of the target encoder on the real-environment data set, and the two sample symbols denote an input sample in the simulated environment and an input sample in the real environment, respectively;
(3) Making real-time decisions
After the training in step (2), the model migration from the simulated environment is complete; the real-time battlefield environment is obtained from camera sensor data and real-time decisions are made.
As an improvement, the control module is used to learn the position of a given object and to obtain the motion parameters of the object in the image, including the direction, angle, and speed of motion.
As an improvement, the perception module is used to acquire the position information of the object in the image from the raw RGB image.
As an improvement, the perception module comprises an encoder submodule and a regression submodule, where the encoder submodule contains all convolutional layers and the regression submodule contains all fully connected layers.
As an improvement, the encoder submodule comprises a source encoder and a target encoder; after the source encoder is trained on simulated-environment data, its weights are fixed and the source encoder serves as the reference for training the target encoder during the adversarial discriminative migration.
As an improvement, the regression submodule is trained with a loss function [formula image] that combines the loss of the discriminator and the loss of the target encoder, where gamma is a discount factor in the range (0, 1).
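Because the combined loss appears only as an image in the filing, the helper below encodes one hedged reading of it: the discriminator loss and the target-encoder loss are combined, with the discount factor gamma in (0, 1). Which term gamma multiplies, and whether additional terms are present, is an assumption.

```python
def regressor_loss(discriminator_loss, target_encoder_loss, gamma=0.5):
    # One hedged reading of the claimed loss: combine the discriminator loss
    # and the target-encoder loss, discounting the latter by gamma in (0, 1).
    assert 0.0 < gamma < 1.0
    return discriminator_loss + gamma * target_encoder_loss
```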
Beneficial effects: the invention uses an adversarial discriminative method for regression migration and evaluates the effectiveness of tactical strategies transferred from the simulated environment to the real environment; real battle scenes such as mountains, rivers and military bases are recorded with a real RGB camera and domain randomization is applied, so that the strategy learned in simulation is robust enough to be migrated directly to real battle scenes; with this simulation method, the cost of depending on the real world during simulation-to-reality migration is greatly reduced.
Drawings
FIG. 1 is the basic flow diagram of the present technique.
FIG. 2 shows the pre-training process of the present technique.
FIG. 3 shows the adversarial discriminative migration process of the present technique.
Detailed Description
The invention is further described below with reference to the figures and embodiments.
Training on real-world tactical drill scenes is simulated by acquiring real-world tactical drill information from military databases. A CNN-based strategy representation framework is introduced, and guided policy search is added (the attributes of weaponry, real environment information, and the like are mapped into a data matrix), which reduces the number of real-world training samples required. This approach has worked well on a number of complex tasks.
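As a concrete illustration of mapping weaponry attributes and environment information into a data matrix, a minimal sketch is given below; the attribute names, the per-entity row layout, and the NumPy encoding are hypothetical choices, not specified by the patent.

```python
import numpy as np

# Hypothetical attribute encoding: each row is one entity (weapon platform or
# terrain feature), each column a normalised attribute.
ATTRIBUTES = ["x", "y", "speed", "heading", "firepower", "armor", "terrain_cover"]

def encode_entities(entities):
    """entities: list of dicts keyed by the names in ATTRIBUTES (missing keys -> 0)."""
    matrix = np.zeros((len(entities), len(ATTRIBUTES)), dtype=np.float32)
    for i, ent in enumerate(entities):
        for j, key in enumerate(ATTRIBUTES):
            matrix[i, j] = float(ent.get(key, 0.0))
    return matrix
```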
First, a modular design is built on the DQN: a bottleneck structure is used to connect the perception module and the control module, and this bottleneck helps the neural network learn a low-dimensional feature representation. Through the bottleneck-structured perception module, the network learns to determine the position information of objects in the image from the raw RGB input. The control module learns, given the position of an object, the direction, angle, speed, and other motion parameters of the object in the image.
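A minimal sketch of such a modular structure is shown below, assuming PyTorch; the layer sizes, the module names (PerceptionModule, ControlModule, ModularDQN), and the number of actions are illustrative assumptions rather than values taken from the patent.

```python
# Sketch of the modular network: perception encoder -> bottleneck -> control head.
import torch
import torch.nn as nn

class PerceptionModule(nn.Module):
    """Convolutional encoder mapping a raw RGB frame to a low-dimensional bottleneck."""
    def __init__(self, bottleneck_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.bottleneck = nn.LazyLinear(bottleneck_dim)  # low-dimensional feature representation

    def forward(self, rgb):
        return self.bottleneck(self.conv(rgb))

class ControlModule(nn.Module):
    """Fully connected head mapping bottleneck features to Q-values over discrete actions."""
    def __init__(self, bottleneck_dim=64, num_actions=9):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, features):
        return self.head(features)

class ModularDQN(nn.Module):
    def __init__(self, bottleneck_dim=64, num_actions=9):
        super().__init__()
        self.perception = PerceptionModule(bottleneck_dim)
        self.control = ControlModule(bottleneck_dim, num_actions)

    def forward(self, rgb):
        return self.control(self.perception(rgb))
```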
After the network structure is established, the neural network is trained as follows. The perception module is first pre-trained on labelled simulated-environment data; a supervised loss function is used in this pre-training process. After pre-training, another portion of the simulated-environment data together with real-environment data is used for training, and back-propagation is performed on the sum of two loss functions. This is the adversarial discriminative training process. For the neural network in the control module, only simulated-environment data is used for training.
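The exact loss formulas appear only as images in the filing, so the sketch below shows one plausible, ADDA-style reading of the two terms that are summed and back-propagated: a supervised regression loss on labelled simulated samples and an adversarial loss that pushes target-encoder features on real samples toward the simulated domain. The function names and the use of MSE and binary cross-entropy are assumptions.

```python
import torch
import torch.nn.functional as F

def supervised_loss(predictions, labels):
    # Supervised regression loss over a labelled minibatch of m samples.
    return F.mse_loss(predictions, labels)

def adversarial_loss(discriminator, target_encoder, real_batch):
    # Adversarial term on real-environment samples: the target encoder is pushed
    # to produce features the discriminator scores as "simulated" (label 1).
    logits = discriminator(target_encoder(real_batch))
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

def combined_loss(sim_preds, sim_labels, discriminator, target_encoder, real_batch):
    # The patent back-propagates the sum of the two loss terms.
    return supervised_loss(sim_preds, sim_labels) + \
           adversarial_loss(discriminator, target_encoder, real_batch)
```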
As shown in Fig. 2, the perception module is decomposed into two parts: an encoder submodule and a regression submodule, where the encoder submodule contains all convolutional layers and the regression submodule contains all fully connected layers. The encoder submodule comprises a source encoder and a target encoder. The source encoder is trained on simulated-environment data and its weights are then fixed; it serves as the reference for training the target encoder in the adversarial discriminative migration. The loss functions of the source and target encoders are also used to train the regressor, and only a small amount of labelled real-environment data is used for training the regressor.
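One possible shape of this adaptation stage is sketched below: the trained source encoder is frozen and copied to initialise the target encoder, the discriminator and target encoder are updated in alternation, and the regressor is then fine-tuned on a small labelled real set. The optimiser choices, the data-loader interface, and the loss forms are illustrative assumptions.

```python
import copy
import torch

def build_target_encoder(source_encoder):
    # Freeze the trained source encoder (the reference) and start the target
    # encoder from a copy of its weights.
    for p in source_encoder.parameters():
        p.requires_grad = False
    target_encoder = copy.deepcopy(source_encoder)
    for p in target_encoder.parameters():
        p.requires_grad = True
    return target_encoder

def adversarial_adaptation(target_encoder, discriminator, sim_loader, real_loader,
                           regressor, labelled_real_loader, epochs=10, lr=1e-4):
    # Alternating updates: the discriminator learns to separate simulated (1)
    # from real (0) features; the target encoder learns to fool it on real data.
    bce = torch.nn.BCEWithLogitsLoss()
    mse = torch.nn.MSELoss()
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    opt_t = torch.optim.Adam(target_encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for (sim_x, _), (real_x, _) in zip(sim_loader, real_loader):
            d_sim = discriminator(target_encoder(sim_x).detach())
            d_real = discriminator(target_encoder(real_x).detach())
            loss_d = bce(d_sim, torch.ones_like(d_sim)) + bce(d_real, torch.zeros_like(d_real))
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()

            d_real = discriminator(target_encoder(real_x))
            loss_t = bce(d_real, torch.ones_like(d_real))
            opt_t.zero_grad(); loss_t.backward(); opt_t.step()
    # Fine-tune the regressor with only a small amount of labelled real data.
    opt_r = torch.optim.Adam(regressor.parameters(), lr=lr)
    for real_x, real_y in labelled_real_loader:
        loss_r = mse(regressor(target_encoder(real_x).detach()), real_y)
        opt_r.zero_grad(); loss_r.backward(); opt_r.step()
    return target_encoder, regressor
```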
Through the above training process, model migration from the simulated environment is completed. When the method is applied to a real battle scene, the data from the camera sensor can be used directly to acquire the real-time battlefield environment and to make real-time decisions.
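A hedged sketch of this real-time decision step is given below; the OpenCV capture, the 84x84 preprocessing, and the greedy action selection over the DQN's Q-values are assumptions made for illustration.

```python
import cv2
import torch

@torch.no_grad()
def decide(frame_bgr, target_encoder, control_module):
    # Convert a camera frame to the network input and pick the greedy action.
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (84, 84))
    x = torch.from_numpy(rgb).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    q_values = control_module(target_encoder(x))
    return int(q_values.argmax(dim=1).item())

# Example usage (assumed camera index 0):
# cap = cv2.VideoCapture(0)
# ok, frame = cap.read()
# if ok:
#     action = decide(frame, target_encoder, control_module)
```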
In the invention, when a tactical strategy is migrated from the simulated environment to the real environment, an adversarial migration method is used to avoid catastrophic forgetting in complex sequential tasks. The adversarial migration method reduces the requirement for labelled real battlefield-environment data by 50%, and migration is completed successfully using only 100,000 labelled and 200,000 unlabelled real battlefield-environment pictures. By using a weighted loss and fine-tuning the combined network end to end, the tactical transfer success rate is improved by 32.5% over that before fine-tuning, and the combat success rate reaches 96%. With the learned strategy, the method is highly robust to cluttered environments and even noise-interference environments.
The above embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (4)

1. A simulated environment and battlefield situation strategy transfer technique based on an adversarial discriminative migration method, characterized in that:
(1) Building the network module structure
A deep Q-learning network (DQN) based on deep reinforcement learning is constructed, comprising a control module and a perception module, where the control module is connected to the perception module through a bottleneck layer; the control module is used to learn the position of a given object and to obtain the motion parameters of the object in the image, including the direction, angle, and speed of motion; the perception module is used to acquire the position information of the object in the image from the raw RGB image;
(2) Training the neural network
Training is performed with the adversarial discriminative migration method: the perception module is first pre-trained on labelled simulated-environment data, using a supervised loss function [formula image] in the pre-training process, where m is the number of samples, j indexes the samples, I_j is the j-th input sample, y_p(I_j) is the label of I_j, and x*_j is the corresponding estimate for I_j; after pre-training, another portion of the simulated-environment data together with real-environment data is used for training, and back-propagation is performed on the sum of two loss functions [formula image], where [formula image] is the loss function for supervised training on simulated-environment data and L_P^Ad is the loss function for supervised training on real-environment data, defined by [formula image], in which D is the discriminator function, E_r is the loss of the target encoder on the simulated-environment data set, E_s is the loss of the target encoder on the real-environment data set, and the two sample symbols denote an input sample in the simulated environment and an input sample in the real environment, respectively;
(3) Making real-time decisions
After the training in step (2), the model migration from the simulated environment is complete; the real-time battlefield environment is obtained from camera sensor data and real-time decisions are made.
2. The simulated environment and battlefield situation strategy transfer technique based on the adversarial discriminative migration method according to claim 1, characterized in that: the perception module comprises an encoder submodule and a regression submodule, where the encoder submodule contains all convolutional layers and the regression submodule contains all fully connected layers.
3. The simulated environment and battlefield situation strategy transfer technique based on the adversarial discriminative migration method according to claim 2, characterized in that: the encoder submodule comprises a source encoder and a target encoder; after the source encoder is trained on simulated-environment data, its weights are fixed and the source encoder serves as the reference for training the target encoder during the adversarial discriminative migration.
4. The simulated environment and battlefield situation strategy transfer technique based on the adversarial discriminative migration method according to claim 3, characterized in that: the regression submodule is trained with a loss function [formula image] that combines the loss of the discriminator and the loss of the target encoder, where gamma is a discount factor in the range (0, 1).
CN201910968327.0A 2019-10-12 2019-10-12 Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method Active CN110751869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910968327.0A CN110751869B (en) 2019-10-12 2019-10-12 Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910968327.0A CN110751869B (en) 2019-10-12 2019-10-12 Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method

Publications (2)

Publication Number Publication Date
CN110751869A CN110751869A (en) 2020-02-04
CN110751869B true CN110751869B (en) 2021-11-05

Family

ID=69278089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910968327.0A Active CN110751869B (en) 2019-10-12 2019-10-12 Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method

Country Status (1)

Country Link
CN (1) CN110751869B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364500B (en) * 2020-11-09 2021-07-20 中国科学院自动化研究所 Multi-concurrency real-time countermeasure system oriented to reinforcement learning training and evaluation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107527068A (en) * 2017-08-07 2017-12-29 南京信息工程大学 Model recognizing method based on CNN and domain adaptive learning
CN108090412A (en) * 2017-11-17 2018-05-29 西北工业大学 A kind of radar emission source category recognition methods based on deep learning
CN108537743A (en) * 2018-03-13 2018-09-14 杭州电子科技大学 A kind of face-image Enhancement Method based on generation confrontation network
CN109902861A (en) * 2019-01-31 2019-06-18 南京航空航天大学 A kind of order manufacturing schedule real-time predicting method based on the double-deck transfer learning
CN110045336A (en) * 2019-02-28 2019-07-23 合肥工业大学 Radar chaff recognition methods and device based on convolutional neural networks
CN110245602A (en) * 2019-06-12 2019-09-17 哈尔滨工程大学 A kind of underwater quiet target identification method based on depth convolution feature
CN110287800A (en) * 2019-05-29 2019-09-27 河海大学 A kind of remote sensing images scene classification method based on SGSE-GAN

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9764468B2 (en) * 2013-03-15 2017-09-19 Brain Corporation Adaptive predictor apparatus and methods


Also Published As

Publication number Publication date
CN110751869A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN109711529B (en) Cross-domain federated learning model and method based on value iterative network
CN109948642B (en) Multi-agent cross-modal depth certainty strategy gradient training method based on image input
WO2021208771A1 (en) Reinforced learning method and device
US20200372822A1 (en) Training system for autonomous driving control policy
CN109682392A (en) Vision navigation method and system based on deeply study
CN108819948B (en) Driver behavior modeling method based on reverse reinforcement learning
CN112445823A (en) Searching method of neural network structure, image processing method and device
CN112884131A (en) Deep reinforcement learning strategy optimization defense method and device based on simulation learning
CN109782600A (en) A method of autonomous mobile robot navigation system is established by virtual environment
CN112017085B (en) Intelligent virtual teacher image personalization method
CN110826453A (en) Behavior identification method by extracting coordinates of human body joint points
CN107351080B (en) Hybrid intelligent research system based on camera unit array and control method
CN113076615B (en) High-robustness mechanical arm operation method and system based on antagonistic deep reinforcement learning
CN110327624A (en) A kind of game follower method and system based on course intensified learning
CN106022471A (en) Wavelet neural network model ship rolling real-time prediction method based on particle swarm optimization algorithm
CN113627596A (en) Multi-agent confrontation method and system based on dynamic graph neural network
CN110751869B (en) Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method
CN108920805A (en) Driving behavior modeling with state feature extraction functions
CN113110101B (en) Production line mobile robot gathering type recovery and warehousing simulation method and system
CN114445684A (en) Method, device and equipment for training lane line segmentation model and storage medium
CN108944940B (en) Driver behavior modeling method based on neural network
CN115293022A (en) Aviation soldier intelligent agent confrontation behavior modeling method based on OptiGAN and spatiotemporal attention
CN107038450A (en) Unmanned plane policing system based on deep learning
CN112525194A (en) Cognitive navigation method based on endogenous and exogenous information of hippocampus-striatum
CN117058547A (en) Unmanned ship dynamic target tracking method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210312

Address after: 210000 rooms 1201 and 1209, building C, Xingzhi Science Park, Qixia Economic and Technological Development Zone, Nanjing, Jiangsu Province

Applicant after: Nanjing Xingyao Intelligent Technology Co.,Ltd.

Address before: Room 1211, building C, Xingzhi Science Park, 6 Xingzhi Road, Nanjing Economic and Technological Development Zone, Jiangsu Province, 210000

Applicant before: Nanjing Shixing Intelligent Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant