CN112464549B - Dynamic allocation method of countermeasure unit

Dynamic allocation method of countermeasure unit

Info

Publication number
CN112464549B
Authority
CN
China
Prior art keywords
countermeasure
unit
units
vector
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010642635.7A
Other languages
Chinese (zh)
Other versions
CN112464549A (en)
Inventor
马贤明
齐智敏
张海林
王全东
黄谦
陈敏
皮雄军
高和顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baiyang Times Beijing Technology Co ltd
Evaluation Argument Research Center Academy Of Military Sciences Pla China
Original Assignee
Baiyang Times Beijing Technology Co ltd
Evaluation Argument Research Center Academy Of Military Sciences Pla China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baiyang Times Beijing Technology Co ltd, Evaluation Argument Research Center Academy Of Military Sciences Pla China
Priority to CN202010642635.7A
Publication of CN112464549A
Application granted
Publication of CN112464549B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a dynamic allocation method for countermeasure units, which specifically comprises the following steps: acquiring countermeasure information in a digital battlefield environment and preprocessing it to obtain preprocessed information; processing a grouping start vector as the initial state and the preprocessed information as the initial input with a trained recurrent neural network model to obtain a first-step output; processing the first-step output, the attribute information of the n countermeasure units, and the grouping end vector of the grouping end unit with an attention model; sampling the normalized result according to a preset random sampling rule to obtain a sampling result; and repeating the above steps until the current sampling result is the component corresponding to the grouping end unit. The grouping information obtained by the method matches the digital battlefield environment more closely, helps complete the countermeasure task, and adapts to changes in the digital battlefield situation.

Description

Dynamic allocation method of countermeasure unit
Technical Field
The invention belongs to the technical field of digital battlefield simulation, and particularly relates to a dynamic allocation method of countermeasure units.
Background
In a digital simulation battlefield environment there are two opposing parties, one of which initiates a countermeasure task against the other. To complete the countermeasure task, that party groups its own units that can perform the task; members of the same group carry out the same or similar actions. Grouping is described below taking the red side as one of the two opposing parties and the blue side as the other.
When the countermeasure task is penetrating the defenses, the red side often dispatches several fighters to penetrate simultaneously; when the task is close reconnaissance of a blue unit with attack capability, such as a defensive work, the red side often dispatches several unmanned aerial vehicles to scout simultaneously. To simplify control, current groups are generally fixed groups designated in advance, and fixed groups have significant drawbacks. For example, suppose a fixed grouping assigns one red unit per group and a task is executed by a single red unit; if that unit is shot down by a blue attack unit while executing the task, dispatching another single red unit will only get it shot down as well, and the task still fails. In other fixed schemes the red units are grouped with the same number of members in every group, so the group sent into an area of strong blue firepower is identical to the group sent where blue firepower is weak, and the task cannot be completed effectively. In general, the way countermeasure units are allocated in current digital simulation battlefields is relatively rigid and cannot adapt its strategy to changes in the external environment.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method for dynamically allocating countermeasure units, comprising:
S1, obtaining countermeasure information in the digital battlefield environment, the countermeasure information including: attribute information of n countermeasure units, attribute information of a plurality of attack units, and attribute information of a plurality of target units, where the countermeasure units are used to execute a countermeasure task and belong to one of the two opposing parties, the attack units are used to prevent the countermeasure units from executing the countermeasure task, the target units are the targets against which the countermeasure units execute the countermeasure task, the attack units and the target units both belong to the other of the two opposing parties, and n is a positive integer with n ≥ 2;
s2, preprocessing the countermeasure information to obtain preprocessed information meeting the input requirement of the recurrent neural network model;
S3, processing a grouping start vector as the initial state and the preprocessed information as the initial input with a trained recurrent neural network model to obtain a first-step output; processing the first-step output, the attribute information of the n countermeasure units, and the grouping end vector of a grouping end unit with an attention model; and then normalizing to obtain a normalized result vector, where the normalized result vector is an (n+1)-dimensional vector whose components represent, in order, the probability of sampling each countermeasure unit and the grouping end unit;
S4, sampling the normalized result according to a preset random sampling rule to obtain a sampling result, where the countermeasure unit corresponding to the sampling result belongs to the grouping information;
S5, repeatedly executing S3-S4, each time using the state obtained in the previous step of the trained recurrent neural network model together with the preprocessed information as the input of the current step, until the current sampling result is the component corresponding to the grouping end unit; the grouping start vector and the grouping end vector are obtained by training.
In the dynamic allocation method as described above, optionally, step S2 includes: performing several layers of transformation and a fusion operation on the attribute information of each countermeasure unit, each attack unit, and each target unit to obtain a high-dimensional vector, which is the preprocessed information meeting the input requirement of the recurrent neural network model; the transformation includes fully connected layer transformation and activation function processing. The fusion operation combines several vectors into one vector by adding, or taking the maximum of, the values at corresponding positions of the matrices or vectors.
In the dynamic allocation method as described above, optionally, the activation function is a rectified linear unit (ReLU) function.
In the dynamic allocation method as described above, optionally, the random sampling rule in step S4 includes: randomly generating a judgment threshold t, t ∈ [0, 1]; if k is the smallest of all i satisfying p0 + p1 + … + pi ≥ t, the current sampling result is pk; where k and i are natural numbers in [0, n], and pi denotes the (i+1)-th component of the (n+1)-dimensional normalized result vector.
In the dynamic allocation method as described above, optionally, after step S5 the method further includes: judging whether the interval between the current moment and the last decision moment equals a preset decision interval; if yes, executing steps S1-S5, where the last decision moment is the moment in step S5 at which the grouping end unit was sampled; if not, continuing to wait until the elapsed interval equals the preset decision interval.
The invention has the beneficial effects that:
(1) The grouping information obtained by the dynamic allocation method matches the digital battlefield environment more closely and is more conducive to completing the countermeasure task. The invention also adapts to changes in the digital battlefield situation during deduction: by feeding the current feature information, including the red and blue sides and the battlefield environment, into the trained neural network, the countermeasure units are grouped automatically and dynamically according to the current scene.
(2) The invention has a wide range of application. The dynamic grouping method for red units on a digital battlefield is mainly applied to digital battlefield simulation construction: based on battlefield environment data, a complete space-time data model of the battlefield environment and of the weaponry of both warring parties is constructed, and computer simulation technology is used for war gaming. In actual operation the method can be used by both the red and the blue side. The invention can also be used for testing weapon platforms.
Drawings
FIG. 1 is a schematic diagram of a neural network structure for implementing a dynamic assignment method of countermeasure units according to the present invention;
FIG. 2 is a flow chart illustrating a method for dynamically allocating countermeasure units according to the present invention;
FIG. 3 is a schematic diagram of a close reconnaissance scenario according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example 1: a dynamic allocation method of countermeasure units.
In the digital battlefield environment there are two opposing parties. The dynamic allocation method of countermeasure units provided by the embodiment of the present invention is described below, taking the red side as one opposing party and the blue side as the other. FIG. 1 is a schematic structural diagram of a neural network for implementing the dynamic allocation method according to an embodiment of the present invention.
Referring to fig. 2, an embodiment of the present invention provides a method for dynamically allocating a countermeasure unit for use in a digital battlefield environment, the method including the steps of:
Step S1, obtaining countermeasure information in the digital battlefield environment, the countermeasure information including: n countermeasure units, a plurality of attack units, and a plurality of target units, where the countermeasure units are used to execute a countermeasure task and belong to one of the two opposing parties, the attack units are used to prevent the countermeasure units from executing the countermeasure task, and the attack units and the target units both belong to the other of the two opposing parties.
Specifically, based on the countermeasure task, each unit in the digital battlefield environment is classified as a countermeasure unit, an attack unit, or a target unit. A countermeasure unit belongs to the red side and can be controlled by the red side, i.e., it can receive a command sent by the red side and execute the corresponding action. An attack unit belongs to the blue side, is controlled by the blue side, and is used to prevent the countermeasure units from carrying out the countermeasure task; in other words, an attack unit is a unit that can attack red units, including but not limited to the countermeasure units. A target unit is a target against which the countermeasure units execute the countermeasure task. Depending on the countermeasure task, a blue unit can be both an attack unit and a target unit; for example, when the countermeasure task is penetration, a blue fighter is both an attack unit and a target unit. In the countermeasure task the number of countermeasure units is n, a positive integer greater than or equal to 2. The number of attack units may be one or more, and likewise for target units; the specific numbers depend on the digital battlefield environment and the countermeasure task.
A battle scenario is taken as an example below to describe the countermeasure task, the countermeasure units, the attack units, and the target units.
In this battle scenario, the red side has a radar, air-to-air fighters, and ground-based missiles, and the blue side has a radar, air-to-ground fighters, and air-to-air fighters. The red side's countermeasure task is to destroy all blue attack units. All red units are countermeasure units, because all of them can be controlled. The blue radar has no attack capability but influences the battle: if the radar is not switched on, the blue fighters cannot learn the positions of the red fighters and ground-based missiles and therefore cannot attack, so the radar belongs to the target units. Accordingly, the attack units are the blue air-to-ground fighters and air-to-air fighters, and the target units are all blue units, namely the radar, the air-to-ground fighters, and the air-to-air fighters.
In a close reconnaissance scenario, as shown in fig. 3, the red side has two unmanned reconnaissance aircraft 20 and 21, and the blue side has three camps and one pillbox 25, where two camps are real camps 22 and 24 and one is a false camp 23. The red side's countermeasure task is close reconnaissance: obtaining the precise positions and authenticity of the three blue camps. In this scenario the countermeasure units are the two unmanned reconnaissance aircraft, the attack unit is the pillbox, and the target units are the three camps. The approximate locations and types of the attack unit and target units can be obtained in advance by the red side's early warning aircraft, but their precise locations and authenticity are unknown, so a close reconnaissance task needs to be performed.
Countermeasure information is then obtained, including: the attribute information of each countermeasure unit, of each attack unit, and of each target unit. Each piece of attribute information is represented by a vector and consists of information related to the countermeasure task (or features that affect its completion), which may include: unit type, current coordinates, attack distance, and moving speed, and may further include moving direction; in other embodiments it may include other information, which this embodiment does not limit. The unit type is a numeric encoding of the unit, e.g. camp = 1, helicopter = 2, and so on. The attribute information can be obtained by the early warning aircraft.
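As an illustration only, the attribute vector described above might be packed as follows; the field layout, type codes, and numeric values are hypothetical, not prescribed by this embodiment:

```python
import numpy as np

# Hypothetical numeric type codes, following the "camp = 1, helicopter = 2" example.
UNIT_TYPE_CODES = {"camp": 1, "helicopter": 2, "drone": 3, "pillbox": 4}

def attribute_vector(unit_type, x, y, attack_distance, speed):
    """Pack one unit's attribute information into a fixed-length vector:
    (type code, current coordinates, attack distance, moving speed)."""
    return np.array([UNIT_TYPE_CODES[unit_type], x, y, attack_distance, speed],
                    dtype=np.float32)

# Example: the two red reconnaissance drones of the fig. 3 scenario (made-up values).
drone_20 = attribute_vector("drone", x=10.0, y=5.0, attack_distance=0.0, speed=3.0)
drone_21 = attribute_vector("drone", x=12.0, y=4.0, attack_distance=0.0, speed=3.0)
```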
Step S2, preprocessing the countermeasure information to obtain preprocessed information meeting the input requirement of the recurrent neural network model.
Specifically, several layers of transformation and a fusion operation are applied to the attribute information of each countermeasure unit, each attack unit, and each target unit to obtain a high-dimensional vector; this high-dimensional vector is the preprocessed information meeting the input requirement of the recurrent neural network model. The purpose of the transformation layers is to derive, from the original features, transformed features that better fit the computation (or expectation). One layer of transformation consists of two operations: an FC (fully connected layer) transform followed by activation function processing, preferably a ReLU (rectified linear unit) function, so the output of one layer can be expressed as output = relu(fc(input)), and one layer of transformation can be denoted fc_relu. In application the order may be: first several layers of transformation, then fusion; or: first several layers of transformation, then fusion, then several more layers of transformation. The number of transformation layers may be one or more, which this embodiment does not limit. The fusion operation combines several vectors into one high-dimensional vector, e.g. a 256-dimensional vector, by adding, or taking the maximum of, the values at corresponding positions of the matrices or vectors.
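A minimal sketch of this preprocessing, assuming a single fc_relu layer with untrained random weights and element-wise-max fusion (one of the two fusion options named above); the dimensions and initialization are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_relu(x, w, b):
    """One layer of transformation: output = relu(fc(input))."""
    return np.maximum(w @ x + b, 0.0)

def preprocess(unit_vectors, hidden=256):
    """Transform each unit's attribute vector with fc_relu, then fuse the
    results into one high-dimensional vector by element-wise maximum."""
    d = len(unit_vectors[0])
    w = rng.normal(scale=0.1, size=(hidden, d))   # stand-ins for trained weights
    b = np.zeros(hidden)
    transformed = [fc_relu(v, w, b) for v in unit_vectors]
    return np.max(np.stack(transformed), axis=0)  # fusion: max at corresponding positions

# Countermeasure, attack, and target unit vectors all go in together;
# here we reuse the two attribute vectors sketched earlier.
high_dim = preprocess([drone_20, drone_21])
print(high_dim.shape)                             # (256,)
```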
Step S3, processing the grouping start vector as the initial state and the preprocessed information as the initial input with the trained recurrent neural network model to obtain a first-step output; processing the first-step output and the total vector with the attention model; and then normalizing to obtain a normalized result, which is an (n+1)-dimensional normalized result vector whose components represent, in order, the probability of sampling each countermeasure unit and the grouping end unit.
Specifically, the high-dimensional vector is used as the initial input of the trained RNN (Recurrent Neural Network) model, and the grouping start vector is used as the initial state of the RNN. The grouping start vector may be obtained through training (or learning), or may be specified arbitrarily, e.g. as an all-0 vector. From the initial state and the initial input, the RNN model produces a first-step output, denoted RNN0. RNN0 and a total vector are fed to an attention model; the total vector consists of the attribute information of the n countermeasure units and the grouping end vector, i.e., each of its components is the attribute-information vector of one countermeasure unit or the grouping end vector. The output of the attention model is normalized to obtain an (n+1)-dimensional vector, which is then sampled to decide which countermeasure unit is selected this time. The normalization may use a softmax function; in (n+1), n is the number of countermeasure units and 1 stands for the grouping end vector. The grouping end vector belongs to the grouping end unit and may be obtained through training (or learning) or specified arbitrarily, e.g. as an all-1 vector. The softmax output is an (n+1)-dimensional normalized result vector, e.g. (p0, p1, …, pn) with p0 + p1 + … + pn = 1, whose components are the probabilities of sampling the first countermeasure unit, the second countermeasure unit, …, and the grouping end unit, respectively.
The processing of the attention model is described below, taking RNN0 as a vector a, the attribute information of 2 countermeasure units as vectors b0 and b1, and the grouping end vector as bn.
The total vector is X = (b0, b1, bn), so the processing of the attention model can be written as attention(RNN0, X); a, b0, b1, and bn are all vectors of the same dimension. If the processing is a dot product, the output vector of the attention model is (a·b0, a·b1, a·bn); hence, whichever of b0, b1, bn is closer to a produces a larger value at the corresponding position of the output vector. In other embodiments the attention model may use other operations, which this embodiment does not limit.
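A sketch of this dot-product attention followed by softmax normalization, as one possible realization; the example vectors are illustrative:

```python
import numpy as np

def attention_scores(a, X):
    """Dot-product attention: score the query a (the RNN output) against each
    component of the total vector X, i.e. the n countermeasure-unit vectors
    followed by the grouping end vector."""
    return np.array([np.dot(a, b) for b in X])

def softmax(z):
    """Normalize attention scores into the (n+1)-dimensional probability vector."""
    e = np.exp(z - z.max())           # subtract max for numerical stability
    return e / e.sum()

# Example with 2 countermeasure units: the component closest to a scores highest.
a = np.array([1.0, 0.0])
X = [np.array([0.9, 0.1]), np.array([0.0, 1.0]), np.array([-1.0, -1.0])]
p = softmax(attention_scores(a, X))   # (p0, p1, p2), sums to 1
```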
Step S4, sampling the normalized result according to a preset random sampling rule to obtain a sampling result; the countermeasure unit corresponding to the sampling result belongs to the grouping information.
Specifically, the random sampling rule is: randomly generate a judgment threshold t, t ∈ [0, 1]; if k is the smallest of all i satisfying p0 + p1 + … + pi ≥ t, the current sampling result is pk, where k and i are natural numbers in [0, n] and pi denotes the (i+1)-th component of the (n+1)-dimensional normalized result vector. In other words, the sampling process is: generate a random number t between 0 and 1, and find the smallest k satisfying p0 + p1 + … + pk ≥ t; the sample is k, and the sampling result is the countermeasure unit corresponding to pk.
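This rule amounts to inverse-CDF (roulette-wheel) sampling; a minimal sketch:

```python
import numpy as np

def sample_index(p, rng=np.random.default_rng()):
    """Draw t uniformly from [0, 1] and return the smallest k with
    p[0] + p[1] + ... + p[k] >= t, per the rule described above."""
    t = rng.uniform(0.0, 1.0)
    k = int(np.searchsorted(np.cumsum(p), t))
    return min(k, len(p) - 1)   # guard against float round-off in the cumulative sum

k = sample_index(np.array([0.2, 0.5, 0.3]))   # 0, 1, or 2 with those probabilities
```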
For example, in the close reconnaissance task described above, the number of countermeasure units is n = 2, namely drone 1 and drone 2, so the (n+1)-dimensional vector is 3-dimensional. If the sample is 0, i.e. k = 0, and p0 corresponds to drone 1, then drone 1 belongs to the grouping information; if the sample is 1, i.e. k = 1, and p1 corresponds to drone 2, then drone 2 belongs to the grouping information; if the sample is 2, i.e. k = 2, and p2 corresponds to the grouping end unit, the grouping ends.
Step S5, judging whether the sampling result is the component corresponding to the grouping end unit; if not, repeatedly executing steps S3-S4, each time using the state obtained in the previous step of the trained recurrent neural network model together with the preprocessed information as the input of the current step, until the current sampling result is the component corresponding to the grouping end unit.
Specifically, when the sampling result corresponds to the grouping end unit, the grouping is finished and a complete group has been obtained. Taking the close reconnaissance task as an example and referring to fig. 1, suppose there are 3 countermeasure units: the first, second, and third countermeasure unit. The (3+1)-dimensional vector is then (p0, p1, p2, p3), where p0, p1, p2, and p3 correspond to the first countermeasure unit, the second countermeasure unit, the third countermeasure unit, and the grouping end unit, respectively. Suppose the first sampling result is 1: the second countermeasure unit is selected, and since this does not correspond to the grouping end unit, sampling continues. Suppose the next result is 2: the third countermeasure unit is selected, and sampling continues again. Suppose the next result is 3: the grouping end unit is selected, indicating the grouping has ended, so sampling stops. The grouping information this time is therefore: the second countermeasure unit and the third countermeasure unit.
Taking RNN0 as the first-step output and RNN1 as the second-step output, the sampling process can be written as: RNN0 = RNN(output of fc_relu, grouping start vector); the first sample is sample(softmax(attention(RNN0, X))); RNN1 = RNN(output of fc_relu, RNN0); the second sample is sample(softmax(attention(RNN1, X))); where fc_relu denotes the fully connected transform plus activation function processing, and sample denotes the sampling process.
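Putting the pieces together, a sketch of the whole grouping loop, reusing the fc_relu, attention_scores, softmax, and sample_index helpers above; the tanh recurrent cell and all dimensions are assumptions, since this embodiment does not fix the cell type:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(state, x, w_s, w_x, b):
    """A minimal stand-in recurrent cell: next_state = tanh(Ws@state + Wx@x + b)."""
    return np.tanh(w_s @ state + w_x @ x + b)

def build_group(preprocessed, unit_vecs, start_vec, end_vec, params, max_len=16):
    """One grouping decision (steps S3-S5): step the RNN, attend over the n
    countermeasure units plus the grouping end unit, sample, and stop once the
    grouping end unit is sampled. unit_vecs are assumed already embedded to
    the state dimension."""
    w_s, w_x, b = params
    X = list(unit_vecs) + [end_vec]       # total vector: n unit vectors + end vector
    state, group = start_vec, []
    for _ in range(max_len):
        state = rnn_step(state, preprocessed, w_s, w_x, b)
        p = softmax(attention_scores(state, X))
        k = sample_index(p, rng)
        if k == len(unit_vecs):           # grouping end unit sampled: stop
            break
        group.append(k)
    return group

# Toy run with untrained weights; start vector all 0s and end vector all 1s,
# the arbitrary choices mentioned above.
h = 8
params = (rng.normal(scale=0.1, size=(h, h)), rng.normal(scale=0.1, size=(h, h)), np.zeros(h))
units = [rng.normal(size=h) for _ in range(3)]
print(build_group(rng.normal(size=h), units, np.zeros(h), np.ones(h), params))
```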
Fig. 1 is a structural diagram of the neural network implementing the dynamic allocation method; the network is obtained by reinforcement learning training. The figure illustrates three countermeasure units, N1 attack units, N2 target units, and a first, second, and third sampling result, where N1 and N2 are positive integers.
As the deduction proceeds, the numbers of countermeasure units, attack units, and target units, as well as some of their attributes, such as the amount of ammunition carried, also change. To better adapt to the changing digital battlefield scene, after step S5 the method further includes:
judging whether the interval between the current moment and the last decision moment equals a preset decision interval; if yes, executing steps S1 to S5, where the last decision moment is the moment at which the complete grouping information was last obtained, i.e., the moment at which the grouping end unit was sampled; if not, continuing to wait until the elapsed interval equals the preset decision interval.
If the current decision interval (i.e., the interval between the current moment and the previous decision moment) equals the preset decision interval, the decision moment has arrived, and steps S1-S5 are executed to obtain the complete grouping information corresponding to the current decision moment; the countermeasure information obtained in step S1 is then the countermeasure information of the digital battlefield environment at the current moment. The preset decision interval may be determined experimentally, or according to a preset rule, for example when the countermeasure unit executing the task is shot down, or according to a preset time; this embodiment does not limit it.
In application, at the first decision moment the neural network outputs a group containing only drone 1; at the second decision moment it outputs a group containing only drone 2; at the third decision moment it outputs a group containing drone 1 and drone 2. Once a group is obtained, that group executes the corresponding task. When the next decision interval arrives, the next group is obtained and executes its corresponding task, which may differ from the previous one. If the new group contains a unit that was also in the previous group and that unit has not yet finished the previous group's task, the unit interrupts that task and turns to the new task, i.e., the task the new group is to execute.
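A sketch of this periodic re-grouping loop; `env` and `model` are hypothetical stand-ins for the deduction environment and the trained allocator, not interfaces defined by this embodiment:

```python
def deduction_loop(env, model, decision_interval):
    """At every preset decision interval, re-acquire the countermeasure
    information and rerun S1-S5 to produce a fresh group; units shared with
    the previous group abandon their old task for the new one."""
    env.assign_task(model.allocate(env.observe()))      # first decision
    last_decision_time = env.now()
    while not env.done():
        if env.now() - last_decision_time >= decision_interval:
            info = env.observe()                        # S1: current battlefield info
            group = model.allocate(info)                # S2-S5: one grouping decision
            env.assign_task(group)                      # interrupts overlapping units
            last_decision_time = env.now()
        env.step()
```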
In the unmanned close reconnaissance scenario of fig. 3, the red side has two unmanned reconnaissance aircraft, and the blue side has three camps (one of which is false) and one pillbox. The approximate locations and types of the blue attack unit and target units have been obtained in advance by the red side's early warning aircraft, but their precise locations and authenticity are unknown. The red side's tasks are: 1) obtain the precise locations and authenticity of the three blue camps by close reconnaissance; 2) the shorter the time to complete the reconnaissance, the better; 3) the fewer of its own drones lost, the better. Because blue camp 3 is close to the blue pillbox, which has a certain attack capability, and a red drone needs a certain time to scout each blue unit, sending a single drone to scout blue camp 3 would cause the loss of that drone and leave the reconnaissance task unfinished.
In the scenario of fig. 3, the red units (the countermeasure units) are red drones 20 and 21, the blue attack unit is blue pillbox 25, and the target units are blue camps 22, 23, and 24.
During training, reinforcement learning tries various grouping schemes, three of which are listed below:
1. Grouping scheme a: the drones are first divided into two groups that scout blue camp 22 and blue camp 23 respectively, and are then combined into 1 group that scouts blue camp 24 together. The reconnaissance task is completed and the reconnaissance time is short, but one drone is lost.
2. Grouping scheme b: red drones 20 and 21 always form 1 group and scout blue camp 22, blue camp 23, and blue camp 24 in sequence. The reconnaissance task is completed, but the time is longer than under grouping scheme a, so the return is lower.
3. Grouping scheme c: red drones 20 and 21 are always grouped separately; they initially scout blue camps 22 and 23 respectively, and then scout blue camp 24 in turn. As a result, drones 20 and 21 are destroyed by the pillbox one after the other and the reconnaissance task is not completed. This scheme has a low return.
Beyond the three grouping schemes above, reinforcement learning tries further schemes. Each scheme produces a different result and return, and corresponds to a training sample. During reinforcement learning training, the grouping schemes with larger returns are reinforced, which ultimately yields grouping scheme a.
After training is completed, the neural network groups according to grouping scheme a during deduction.
For a larger scene and a more complicated task, reinforcement learning tries more schemes and more training samples are produced; obtaining a more ideal result then requires more computing power for training, but the basic principle stays the same.
The following explains how reinforcement learning is used to train the neural network in fig. 1 so that it groups well in different scenarios.
Reinforcement learning describes and solves the problem of an agent learning a strategy through interaction with the environment so as to maximize return. The grouping scheme produced by the neural network in fig. 1 is an important component of the agent's strategy. During training, the agent tries different strategies in different episodes (one episode being a complete deduction between the red and blue sides), so the neural network in fig. 1 tries different grouping schemes in the same or similar scenes across episodes, obtaining different results and different returns (the return is a quantitative evaluation of a result: the better the result, the higher the return). Each completed episode forms a training sample. Reinforcement learning training automatically optimizes the algorithm toward strategies that obtain larger returns in the samples, thereby obtaining better grouping schemes.
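The patent does not name a specific algorithm; as one standard possibility, a REINFORCE-style policy-gradient update captures the idea that episodes with larger returns are reinforced. The sketch below assumes each episode has been summarized by the gradient of the log-probabilities of its sampled grouping decisions:

```python
import numpy as np

def reinforce_update(theta, episodes, lr=0.01):
    """One policy-gradient step: push the policy parameters theta toward
    grouping decisions that earned above-average return. `episodes` is a list
    of (grad_log_prob, ret) pairs, where grad_log_prob is the summed gradient
    of log pi(sampled group | scene) over one episode and ret is its return.
    REINFORCE is an assumed choice, not specified by the patent."""
    baseline = np.mean([ret for _, ret in episodes])    # variance-reducing baseline
    for grad_log_prob, ret in episodes:
        theta = theta + lr * (ret - baseline) * grad_log_prob
    return theta
```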
Example 2: an apparatus for performing dynamic allocation of countermeasure units using the allocation method of embodiment 1.
The embodiment of the invention provides a dynamic allocation apparatus for countermeasure units, used in a digital battlefield environment, the apparatus comprising: an acquisition module, a preprocessing module, an output module, a sampling module, and a loop module.
Specifically, the acquisition module is configured to acquire countermeasure information in the digital battlefield environment, the countermeasure information including: n countermeasure units, a plurality of attack units, and a plurality of target units, where the countermeasure units are used to execute a countermeasure task and belong to one of the two opposing parties, the attack units are used to prevent the countermeasure units from executing the countermeasure task, the attack units and the target units both belong to the other of the two opposing parties, and n is a positive integer with n ≥ 2.
The preprocessing module is used for preprocessing the countermeasure information to obtain preprocessed information meeting the input requirements of the recurrent neural network model.
The output module is configured to process a grouping start vector as the initial state and the preprocessed information as the initial input with a trained recurrent neural network model to obtain a first-step output; to process the first-step output, the attribute information of the n countermeasure units, and the grouping end vector of the grouping end unit with an attention model; and then to normalize, obtaining a normalized result that is an (n+1)-dimensional vector whose components represent, in order, the probability of sampling each countermeasure unit and the grouping end unit; the grouping start vector and the grouping end vector are obtained by training.
The sampling module is configured to sample the normalized result according to a preset random sampling rule to obtain a sampling result; the countermeasure unit corresponding to the sampling result belongs to the grouping information.
The loop module is configured to repeatedly execute the functions of the output module and the sampling module, each time using the state obtained in the previous step of the trained recurrent neural network model together with the preprocessed information as the input of the current step, until the current sampling result is the component corresponding to the grouping end unit.
Optionally, the preprocessing module is configured to: perform several layers of transformation and a fusion operation on the attribute information of each countermeasure unit, each attack unit, and each target unit to obtain a high-dimensional vector, which is the preprocessed information meeting the input requirement of the recurrent neural network model; the transformation includes fully connected layer transformation and activation function processing.
Optionally, the activation function is a rectified linear unit (ReLU) function.
Optionally, the sampling module is configured to: randomly generate t, t ∈ [0, 1]; if k is the smallest of all i satisfying p0 + p1 + … + pi ≥ t, the current sampling result is pk; where k and i are natural numbers in [0, n], and pi denotes the (i+1)-th component of the (n+1)-dimensional vector.
Optionally, the dynamic allocation apparatus further includes: a judging module and an executing module.
The judging module is configured to judge whether the interval between the current moment and the last decision moment equals a preset decision interval. The executing module is configured, if the judging module judges yes, to invoke in sequence the functions of the acquisition module, the preprocessing module, the output module, the sampling module, and the loop module; the last decision moment is the moment in the loop module at which the grouping end unit was sampled.
It should be noted that for the implementation of the functions of the acquisition module, the preprocessing module, the output module, the sampling module, and the loop module, reference may be made to the relevant content of steps S1 to S5 in the foregoing embodiment; details are not repeated here.
It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments disclosed above are therefore to be considered in all respects illustrative and not restrictive. All changes which come within the scope of the invention or its equivalents are intended to be embraced therein.

Claims (4)

1. A dynamic allocation method for countermeasure units, characterized by comprising the following steps:
S1, obtaining countermeasure information in the digital battlefield environment, the countermeasure information including: attribute information of n countermeasure units, attribute information of a plurality of attack units, and attribute information of a plurality of target units, where the countermeasure units are used to execute a countermeasure task and belong to one of the two opposing parties, the attack units are used to prevent the countermeasure units from executing the countermeasure task, the target units are the targets against which the countermeasure units execute the countermeasure task, the attack units and the target units both belong to the other of the two opposing parties, and n is a positive integer with n ≥ 2;
s2, preprocessing the countermeasure information to obtain preprocessed information meeting the input requirement of the recurrent neural network model;
S3, processing a grouping start vector as the initial state and the preprocessed information as the initial input with a trained recurrent neural network model to obtain a first-step output; processing the first-step output, the attribute information of the n countermeasure units, and the grouping end vector of a grouping end unit with an attention model; and then normalizing to obtain a normalized result vector, where the normalized result vector is an (n+1)-dimensional vector whose components represent, in order, the probability of sampling each countermeasure unit and the grouping end unit;
S4, sampling the normalized result according to a preset random sampling rule to obtain a sampling result, where the countermeasure unit corresponding to the sampling result belongs to the grouping information;
S5, repeatedly executing S3-S4, each time using the state obtained in the previous step of the trained recurrent neural network model together with the preprocessed information as the input of the current step, until the current sampling result is the component corresponding to the grouping end unit; the grouping start vector and the grouping end vector are obtained by training.
2. The dynamic allocation method for countermeasure units according to claim 1, wherein step S2 includes: performing several layers of transformation and a fusion operation on the attribute information of each countermeasure unit, each attack unit, and each target unit to obtain a high-dimensional vector, which is the preprocessed information meeting the input requirement of the recurrent neural network model; the transformation includes fully connected layer transformation and activation function processing; the fusion operation combines several vectors into one vector by adding, or taking the maximum of, the values at corresponding positions of the matrices or vectors.
3. The dynamic allocation method for countermeasure units according to claim 1, wherein the random sampling rule in step S4 includes: randomly generating a judgment threshold t, t ∈ [0, 1]; if k is the smallest of all i satisfying p0 + p1 + … + pi ≥ t, the current sampling result is pk; where k and i are natural numbers in [0, n], and pi denotes the (i+1)-th component of the (n+1)-dimensional normalized result vector.
4. The dynamic allocation method for countermeasure units according to claim 1, wherein after step S5 the method further includes: judging whether the interval between the current moment and the last decision moment equals a preset decision interval; if yes, executing steps S1 to S5, where the last decision moment is the moment in step S5 at which the grouping end unit was sampled; if not, continuing to wait until the elapsed interval equals the preset decision interval.
CN202010642635.7A 2020-07-06 2020-07-06 Dynamic allocation method of countermeasure unit Active CN112464549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010642635.7A CN112464549B (en) 2020-07-06 2020-07-06 Dynamic allocation method of countermeasure unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010642635.7A CN112464549B (en) 2020-07-06 2020-07-06 Dynamic allocation method of countermeasure unit

Publications (2)

Publication Number Publication Date
CN112464549A (en) 2021-03-09
CN112464549B (en) 2021-05-14

Family

ID=74833529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010642635.7A Active CN112464549B (en) 2020-07-06 2020-07-06 Dynamic allocation method of countermeasure unit

Country Status (1)

Country Link
CN (1) CN112464549B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115829034B (en) * 2023-01-09 2023-05-30 白杨时代(北京)科技有限公司 Method and device for constructing knowledge rule execution framework

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908097B (en) * 2010-07-13 2012-03-21 北京航空航天大学 Particle swarm optimization method for air combat decision
KR101534192B1 (en) * 2014-12-08 2015-07-08 한국인터넷진흥원 System for providing cybersecurity realtime training against attacks and method thereof
CN109299491B (en) * 2018-05-17 2023-02-10 西京学院 Meta-model modeling method based on dynamic influence graph strategy and using method
CN109636699A (en) * 2018-11-06 2019-04-16 中国电子科技集团公司第五十二研究所 A kind of unsupervised intellectualized battle deduction system based on deeply study
CN110147883B (en) * 2019-05-28 2022-06-03 航天科工系统仿真科技(北京)有限公司 Training method, device, equipment and storage medium for model for combat simulation
CN110163519B (en) * 2019-05-29 2023-08-18 哈尔滨工程大学 UUV red and blue threat assessment method for base attack and defense tasks
CN110334749B (en) * 2019-06-20 2021-08-03 浙江工业大学 Anti-attack defense model based on attention mechanism, construction method and application
CN110991545B (en) * 2019-12-10 2021-02-02 中国人民解放军军事科学院国防科技创新研究院 Multi-agent confrontation oriented reinforcement learning training optimization method and device

Also Published As

Publication number Publication date
CN112464549A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN108647414A (en) Operation plan adaptability analysis method based on emulation experiment and storage medium
CN107871138A (en) A kind of target intention recognition methods based on improvement D S evidence theories
CN111476288A (en) Intelligent perception method for cognitive sensor network to electromagnetic behaviors with unknown threats
CN113109770B (en) Interference resource allocation method and system
CN105678030A (en) Air-combat tactic team simulating method based on expert system and tactic-military-strategy fractalization
CN112464549B (en) Dynamic allocation method of countermeasure unit
CN115047907B (en) Air isomorphic formation command method based on multi-agent PPO algorithm
CN115951695A (en) Dynamic tactical control domain resolving method based on three-party game in air combat simulation environment
CN113625569A (en) Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving
CN113741186B (en) Double-aircraft air combat decision-making method based on near-end strategy optimization
CN112464548B (en) Dynamic allocation device for countermeasure unit
Li et al. Intercepts allocation for layered defense
CN113312789A (en) Simulation system and method based on multi-information fusion
CN113128698B (en) Reinforced learning method for multi-unmanned aerial vehicle cooperative confrontation decision
Gang et al. A methods of operational effectiveness for C4ISR system based on system dynamics analysis
CN114202185A (en) System contribution rate evaluation method for high-power microwave weapon collaborative air defense
CN113126651A (en) Intelligent decision-making device and system for cooperative confrontation of multiple unmanned aerial vehicles
Zhao et al. Deep Reinforcement Learning‐Based Air Defense Decision‐Making Using Potential Games
CN109670672A (en) Cooperative combat plan representation method and system based on event ontology
Nurkin et al. Emerging Technologies and the Future of US-Japan Defense Collaboration
Li et al. Self-organized Cooperative Electronic Jamming by UAV Swarm Based on Contribution Priority and Cost Competition
Xu et al. A Combat Decision Support Method Based on OODA and Dynamic Graph Reinforcement Learning
Guo et al. Mission simulation and stealth effectiveness evaluation based on fighter engagement manager (FEM)
Li et al. Double Deep Q-learning for Anti-saturation Attack Problem of Warship Group
Li et al. Research on Multi-radar Anti-jamming Strategy Allocation Method Under Small Risk Based on Game Theory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant