CN110751869B

CN110751869B - Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method

Info

Publication number: CN110751869B
Application number: CN201910968327.0A
Authority: CN
Inventors: 杨理想; 张侨; 王银瑞; 范鹏炜
Original assignee: Nanjing Xingyao Intelligent Technology Co ltd
Current assignee: Nanjing Xingyao Intelligent Technology Co.,Ltd.
Priority date: 2019-10-12
Filing date: 2019-10-12
Publication date: 2021-11-05
Anticipated expiration: 2039-10-12
Also published as: CN110751869A

Abstract

The invention provides a technique for transferring battlefield situation strategies from a simulated environment to the real environment, based on an adversarial discriminative transfer method. It applies the adversarial method to regression transfer and evaluates how effectively tactical strategies carry over from the simulated environment to the real environment. By combining domain randomization with real battle scenes (mountains, rivers, military bases and the like) recorded by a real RGB camera, the strategy learned in simulation is robust enough to be transferred directly to real battle scenes. The method greatly reduces the dependence on costly real-world data during simulation-to-real transfer.

Description

Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method

Technical Field

The invention belongs to the field of artificial intelligence, and particularly relates to a simulated environment and battlefield situation strategy transfer technology based on an adversarial discriminative transfer method (the "countermeasure discrimination migration method" of the title).

Background

At present, military units mostly rely on wargaming to carry out turn-based tactical exercises. Wargaming cannot support real-time drilling during live exercises and cannot faithfully simulate the combat scene. For large-scale, long-duration combat, it can therefore only focus on the effects of small-scale, short-duration engagements, and long-horizon tactical transfer cannot be realized. By constructing a simulated ("mimicry") combat environment from various sensors, information data and the like, a realistic environment can be built, real-time tactical strategies can be generated, and dynamic changes on the battlefield can be handled better.

When a model trained in a configured environment is applied to a real scene, most current systems are very fragile if that scene differs from the one used for training. Some research methods train the model on real-environment data instead, but collecting real-scene data in the real environment is very expensive.

Disclosure of Invention

To solve these problems, real-world tactical drill information is obtained from military databases and used to simulate training on real-world tactical drill scenes. A CNN-based strategy representation framework is introduced and guided policy search is added: the attributes of weaponry and the real environment information are mapped into a data matrix. This reduces the number of real-world training samples required and works well on some complex tasks. The proposed simulated environment and battlefield situation strategy transfer technique based on the adversarial discriminative transfer method comprises the following steps:

(1) building network module structure

A deep Q-network (DQN) based on deep reinforcement learning is constructed, comprising a control module and a perception module, wherein the control module is connected to the perception module through a bottleneck layer;

(2) training neural networks

Training uses the adversarial discriminative transfer method. The perception module is first pre-trained on labeled simulated environment data, using a supervised loss function in the pre-training process [formula not reproduced],

where m is the number of samples, I_j is the input of a sample, y_p(I_j) is the label of I_j, x^*_j is the corresponding quantity for I_j, and j is the sample index. After pre-training, training continues with the remaining simulated environment data together with real environment data, and the sum of two loss functions [formula not reproduced] is back-propagated,

where [symbol not reproduced] is the loss function for supervised training using simulated environment data and L_P^Ad is the loss function for supervised training using real environment data [formula not reproduced],

wherein D is the supervision function, E_r is the loss of the target encoder on the simulated environment data set, E_s is the loss of the target encoder on the real environment data set, [symbol not reproduced] is an input sample in the simulated environment, and [symbol not reproduced] is an input sample in the real environment;

(3) making real-time decisions

After the training in step (2), the transfer of the model from the simulated environment is complete; camera sensor data is then used to obtain the real-time battlefield environment and make real-time decisions.

As an improvement, the control module learns the position of a given object and obtains the motion parameters of the object in the image, including the direction, angle and speed of motion.

As an improvement, the perception module obtains the position parameters of objects in the image from the raw RGB image.

As an improvement, the perception module comprises an encoder submodule and a regression submodule, wherein the encoder submodule contains all of the convolutional layers and the regression submodule contains all of the fully connected layers.

As an improvement, the encoder submodule comprises a source encoder and a target encoder; after the source encoder has been trained on simulated environment data, its weights are fixed and it serves as the reference for training the target encoder during adversarial discriminative transfer.

As an improvement, the regression submodule is trained with a loss function [formula not reproduced], where [symbol not reproduced] is the loss of the discriminator, [symbol not reproduced] is the loss of the target encoder, and γ is a discount factor in the range (0, 1).

Has the advantages that: the invention uses a countermeasure discrimination method for regression migration, evaluates the effectiveness of tactical strategies from a mimicry environment to a real environment, records real battle scenes such as mountains, rivers, military bases and the like by using a real RGB camera through domain randomization, and has strong robustness enough for directly migrating the learned strategies to the real battle scenes; by the simulation method, the dependence cost on the real world during the simulation migration is greatly reduced.

Drawings

FIG. 1 is a basic flow diagram of the present technology.

FIG. 2 shows the pre-training process of the present technology.

FIG. 3 shows the adversarial discriminative transfer process of the present technology.

Detailed Description

The present invention is further described below with reference to the figures and the embodiments.

Training on real-world tactical drill scenes is simulated using real-world tactical drill information obtained from military databases. A CNN-based strategy representation framework is introduced and guided policy search is added (the attributes of weaponry, real environment information and the like are mapped into a data matrix), which reduces the number of real-world training samples required. This approach has worked well on a number of complex tasks.
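The mapping of attributes and environment information into a data matrix can be illustrated with a minimal Python sketch. The field names, the sample values and the min-max scaling step are assumptions made for illustration; the patent does not specify the matrix layout or any normalization.

```python
# Hypothetical attribute names; the patent does not enumerate the fields.
FIELDS = ["range_km", "speed_kmh", "elevation_m"]


def to_data_matrix(records, fields):
    """Build one row per record and one column per field, as floats."""
    return [[float(record[name]) for name in fields] for record in records]


def min_max_scale(matrix):
    """Scale every column to [0, 1]; constant columns are mapped to 0.0."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]


records = [
    {"range_km": 40, "speed_kmh": 60, "elevation_m": 100},
    {"range_km": 20, "speed_kmh": 30, "elevation_m": 300},
]
matrix = to_data_matrix(records, FIELDS)
scaled = min_max_scale(matrix)
```

A fixed column order such as `FIELDS` keeps rows comparable across records, which is what a CNN-style input would need.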

First, a modular design is built on the DQN: a bottleneck structure connects the perception module and the control module, which helps the neural network learn a low-dimensional feature representation. Through the bottleneck structure, the perception module learns from the raw RGB image how to determine the position of objects in the image. The control module learns where a given object is located and in what direction, at what angle and at what speed the object in the image is moving.
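The modular layout described above can be sketched in plain Python as two modules joined by a low-dimensional bottleneck vector. This is a structural illustration under stated assumptions: a single linear layer stands in for the convolutional perception module, and the layer sizes, class names and random weights are all hypothetical.

```python
import random


def init_layer(rng, n_out, n_in):
    """Create a small random weight matrix and a zero bias vector."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply y = W x + b to one input vector."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class PerceptionModule:
    """Stand-in for the convolutional perception module: pixels -> bottleneck."""

    def __init__(self, rng, n_pixels, bottleneck_dim):
        self.layer = init_layer(rng, bottleneck_dim, n_pixels)

    def forward(self, pixels):
        return linear(self.layer, pixels)


class ControlModule:
    """Maps the bottleneck vector to one Q-value per discrete action."""

    def __init__(self, rng, bottleneck_dim, n_actions):
        self.layer = init_layer(rng, n_actions, bottleneck_dim)

    def forward(self, bottleneck):
        return linear(self.layer, bottleneck)


def act(perception, control, pixels):
    """Run perception, then control, and pick the greedy (highest-Q) action."""
    bottleneck = perception.forward(pixels)
    q_values = control.forward(bottleneck)
    action = max(range(len(q_values)), key=q_values.__getitem__)
    return bottleneck, q_values, action


rng = random.Random(0)
N_PIXELS, BOTTLENECK_DIM, N_ACTIONS = 4 * 4 * 3, 3, 5  # hypothetical sizes
perception = PerceptionModule(rng, N_PIXELS, BOTTLENECK_DIM)
control = ControlModule(rng, BOTTLENECK_DIM, N_ACTIONS)
pixels = [rng.random() for _ in range(N_PIXELS)]  # flattened 4x4 RGB image
bottleneck, q_values, action = act(perception, control, pixels)
```

Because the two modules communicate only through the bottleneck vector, each can be trained or replaced separately, which is what allows the perception module alone to be adapted to real images.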

Once the network structure is established, the next question is how to train the neural network. The perception module is first pre-trained on labeled simulated environment data, using a supervised loss function. After pre-training, the remaining simulated environment data and the real environment data are used for training, and the sum of two loss functions is back-propagated. This is the adversarial discriminative training process. The neural network in the control module is trained using simulated environment data only.
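The two training stages can be made concrete with a small numeric sketch. It assumes that the supervised loss is a mean squared error over the m samples and that the second stage simply adds a loss computed on simulated data to one computed on real data; the loss form, the function names and the sample values are assumptions, since the formulas are not reproduced in this text.

```python
def supervised_loss(predictions, labels):
    """Mean squared error over m samples; each sample is a sequence of floats."""
    m = len(predictions)
    total = 0.0
    for pred, label in zip(predictions, labels):
        total += sum((p - y) ** 2 for p, y in zip(pred, label))
    return total / m


def summed_loss(sim_preds, sim_labels, real_preds, real_labels):
    """Second-stage objective: loss on simulated data plus loss on real data."""
    return (supervised_loss(sim_preds, sim_labels)
            + supervised_loss(real_preds, real_labels))


# Hypothetical position predictions and labels (x, y) for two samples per domain.
sim_preds, sim_labels = [[1.0, 2.0], [0.0, 0.0]], [[1.0, 1.0], [0.0, 1.0]]
real_preds, real_labels = [[2.0, 2.0], [1.0, 0.0]], [[0.0, 2.0], [1.0, 0.0]]

pretrain = supervised_loss(sim_preds, sim_labels)
second_stage = summed_loss(sim_preds, sim_labels, real_preds, real_labels)
```

In a real training loop the summed value would be the scalar passed to back-propagation, so gradients from both domains update the perception module in the same step.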

As shown in FIG. 2, the perception module is divided into two parts: an encoder submodule, which contains all of the convolutional layers, and a regression submodule, which contains all of the fully connected layers. The encoder submodule in turn comprises a source encoder and a target encoder. The source encoder is trained on simulated environment data, after which its weights are fixed and it serves as the reference for training the target encoder during adversarial discriminative transfer. The loss functions of the source encoder and the target encoder are also used to train the regressor, and only a small amount of labeled real environment data is used for training the regressor.
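The interplay between the fixed source encoder, the trainable target encoder and the discriminator can be sketched with the binary cross-entropy losses commonly used in adversarial discriminative domain adaptation. This is an assumption about the loss form, not the patent's own formula; the function names and the discriminator outputs below are hypothetical.

```python
import math

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1


def _clamp(p):
    """Restrict a probability to the open interval (0, 1)."""
    return min(max(p, EPS), 1.0 - EPS)


def discriminator_loss(d_on_source, d_on_target):
    """Binary cross-entropy: D should output 1 on source features, 0 on target."""
    src = -sum(math.log(_clamp(p)) for p in d_on_source) / len(d_on_source)
    tgt = -sum(math.log(1.0 - _clamp(p)) for p in d_on_target) / len(d_on_target)
    return src + tgt


def target_encoder_loss(d_on_target):
    """Inverted-label loss: the target encoder wants D to output 1 on its features."""
    return -sum(math.log(_clamp(p)) for p in d_on_target) / len(d_on_target)


# Hypothetical discriminator outputs: probability that a feature vector
# came from the source encoder.
d_on_source = [0.9, 0.8]  # simulated images through the frozen source encoder
d_on_target = [0.3, 0.4]  # real images through the trainable target encoder

loss_d = discriminator_loss(d_on_source, d_on_target)
loss_e = target_encoder_loss(d_on_target)
```

Only the target encoder and the discriminator would receive gradient updates from these two losses; the source encoder stays fixed and supplies the reference feature distribution.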

This training process completes the transfer of the model out of the simulated environment. When the method is applied to a real battle scene, camera sensor data can be used directly to obtain the real-time battlefield environment and make real-time decisions.

In the invention, when a tactical strategy is transferred from the simulated environment to the real environment, an adversarial transfer method is used to avoid catastrophic forgetting in complex sequential tasks. The method reduces the requirement for labeled real battlefield environment data by 50%, and the transfer is completed successfully using only 100,000 labeled and 200,000 unlabeled real battlefield environment images. By using the weighted loss and fine-tuning the combined network end to end, the tactical transfer success rate improves by 32.5% compared with that before fine-tuning, and the combat success rate reaches 96%. The learned strategy is strongly robust to cluttered environments and even to environments with noise interference.

The embodiments above describe only several implementations of the present invention. Although the description is specific and detailed, it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A simulated environment and battlefield situation strategy transfer technology based on an adversarial discriminative transfer method, characterized by comprising the following steps:

(1) building network module structure

A deep Q-network (DQN) based on deep reinforcement learning is constructed, comprising a control module and a perception module, wherein the control module is connected to the perception module through a bottleneck layer; the control module learns the position of a given object and obtains the motion parameters of the object in the image, including the direction, angle and speed of motion; the perception module obtains the position parameters of objects in the image from the raw RGB image;

(2) training neural networks

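Training uses the adversarial discriminative transfer method, wherein the perception module is first pre-trained on labeled simulated environment data, using a supervised loss function in the pre-training process [formula not reproduced]; after pre-training, training continues with the remaining simulated environment data together with real environment data, and the sum of two loss functions [formula not reproduced] is back-propagated, wherein [symbol not reproduced] is the loss function for supervised training using simulated environment data, L_P^Ad is the loss function for supervised training using real environment data, [symbol not reproduced] is an input sample in the simulated environment, and [symbol not reproduced] is an input sample in the real environment;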

(3) making real-time decisions

2. The simulated environment and battlefield situation strategy transfer technology based on the adversarial discriminative transfer method according to claim 1, characterized in that: the perception module comprises an encoder submodule and a regression submodule, wherein the encoder submodule contains all of the convolutional layers and the regression submodule contains all of the fully connected layers.

3. The simulated environment and battlefield situation strategy transfer technology based on the adversarial discriminative transfer method according to claim 2, characterized in that: the encoder submodule comprises a source encoder and a target encoder; after the source encoder has been trained on simulated environment data, its weights are fixed and it serves as the reference for training the target encoder during adversarial discriminative transfer.

4. The simulated environment and battlefield situation strategy transfer technology based on the adversarial discriminative transfer method according to claim 3, characterized in that: the regression submodule is trained with a loss function [formula not reproduced], wherein [symbol not reproduced] is the loss of the discriminator,