CN116489193B - Combat network self-adaptive combination method, device, equipment and medium - Google Patents

Combat network self-adaptive combination method, device, equipment and medium

Info

Publication number
CN116489193B
CN116489193B (application number CN202310487406.6A)
Authority
CN
China
Prior art keywords
combat
node
network
decision
constructing
Prior art date
Legal status
Active
Application number
CN202310487406.6A
Other languages
Chinese (zh)
Other versions
CN116489193A (en)
Inventor
张婷婷
孙云鹏
陈岩
李辉
肖春霞
宦蕾
Current Assignee
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date
Filing date
Publication date
Application filed by Army Engineering University of PLA
Priority to CN202310487406.6A
Publication of CN116489193A
Application granted
Publication of CN116489193B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L41/14: Network analysis or design
    • H04L41/16: Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract

The invention discloses a combat network self-adaptive combination method, device, equipment and medium. The method comprises the following steps: acquiring control nodes, reconnaissance nodes, strike nodes and target nodes; connecting the nodes with directed edges that represent dependency relationships to construct a decision space network; constructing a combat chain for each target node and combining the combat chains to construct the target node's combat network; calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network; constructing a Markov decision process from the decision space network and the combat network; and constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result. The invention can adapt to complex combat environments and improves the resilience and flexibility of the combat network.

Description

Combat network self-adaptive combination method, device, equipment and medium
Technical Field
The invention relates to a combat network self-adaptive combination method, device, equipment and medium, and belongs to the technical field of system management.
Background
With the rapid development of network information technology and military technology, the scale and style of war have changed profoundly. The traditional centralized command-and-control mode struggles to adapt to highly dynamic, highly uncertain battlefields. In response, the Strategic Technology Office of the U.S. Defense Advanced Research Projects Agency (DARPA) proposed the "mosaic warfare" concept, in which low-cost, low-complexity autonomous systems are rapidly integrated into sensor networks and multi-domain command-and-control systems, thereby imposing complexity on an adversary. In mosaic warfare, manned and unmanned platforms dispersed across the battlefield can be dynamically assembled, through communication, into a resilient combat network.
However, combination decision-making for the combat network is difficult because of the cascading effects that arise from interdependencies between heterogeneous weapon systems, and capturing and quantifying these dependencies is critical to maximizing the combat network's capability. A highly uncertain battlefield environment can damage the combat network, for example when major faults in or damage to weapon equipment degrade the network's combat capability, and traditional manual decisions based on commander experience struggle to respond quickly to combat tasks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art. It provides a combat network self-adaptive combination method, device, equipment and medium, and solves the technical problem that traditional manual decision-making based on commander experience is difficult to respond quickly to combat tasks, causing the combat capability of the combat network to decline.
To achieve the above purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a combat network self-adaptive combination method, including:
acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
connecting the nodes with directed edges that represent dependency relationships to construct a decision space network;
constructing a combat chain for each target node, and combining the combat chains to construct the target node's combat network;
calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
constructing a Markov decision process from the decision space network and the combat network;
and constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
Optionally, the directed edges representing dependency relationships include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a strike node, and directed edges from a strike node to a target node.
Optionally, the combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges.
The combat capability E_OC(l_j) of the j-th combat chain l_j is computed from the capability values of its nodes, wherein s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j, respectively; and O_T(t_j) is the damage degree value of target node t_j, t_j ∈ T.
The combat capability of the combat network is the sum of the capabilities of its combat chains:
E_N(G) = Σ_{l_j ∈ G} E_OC(l_j)
wherein E_N(G) is the combat capability of combat network G.
Optionally, the dependency relationship between nodes connected by a directed edge satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij O_i + (1 − α_ij) SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i β_ij
wherein O_i and O_j are the operational performance of nodes i and j, Average is the averaging function, α_ij and β_ij are respectively the strength-of-dependency (SOD) and criticality-of-dependency (COD) parameters between nodes i and j, and SE_j is the self-effectiveness of node j;
wherein, when node i is a reconnaissance node, a control node or a strike node, the operational performance O_i is the capability value of node i; when node i is a target node, the operational performance O_i is the damage degree value of node i.
Optionally, constructing the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), wherein G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed edge set;
taking each directed edge in the combat network as a removable action, and each directed edge of the decision space network that can connect a node to the combat network as an addable action; the removable and addable actions together form the decision actions, denoted x_t = (n_t, e_t), wherein x_t, e_t and n_t are respectively the decision action at time t and its corresponding directed edge and node; the decision action space is constructed from the decision actions;
selecting and executing a decision action from the decision action space, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), wherein E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t, respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1}(N_{t+1}, E_{t+1}) = G_t(N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
Optionally, constructing the Bellman optimality equation of the Markov decision process includes:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, wherein k is the total number of time steps;
constructing an objective function that maximizes the cumulative return value over the decision action sequence:
max Σ_{t=1}^{k} ΔC_t
wherein ΔC_t is the return value at time t;
expanding the objective function by backward recursion yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
wherein γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
Optionally, solving to obtain the combination result includes:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η δ_t
wherein η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting decision actions according to an ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected, and with probability 1 − ε a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
starting from the initial state G_0 and following the state transitions, transferring forward according to the optimal decision actions to generate the optimal decision action sequence;
and obtaining the combat network combination result at each time step from the execution results of the optimal decision action sequence.
In a second aspect, the present invention provides a combat network self-adaptive combination device, the device comprising:
a node acquisition module for acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module for connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module for constructing a combat chain for each target node and combining the combat chains to construct the target node's combat network;
a capability calculation module for calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module for constructing a Markov decision process from the decision space network and the combat network;
and a process solving module for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
In a third aspect, the present invention provides an electronic device comprising a processor and a storage medium;
the storage medium is used to store instructions;
and the processor operates according to the instructions to perform the steps of the above method.
In a fourth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the above method.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a combat network self-adaptive combination method, device, equipment and medium. It combines a functional dependency network analysis method to build the decision action space and incorporates cascading effects into the evaluation index of system combat capability; it solves for an optimal policy function for the target node and derives the combat network with maximum combat capability; and when weapon equipment suffers a serious fault or damage, it reconfigures a new combat network and partially restores combat capability, improving the resilience and flexibility of the combat network. The method can also inform the design and planning of mosaic warfare.
Drawings
Fig. 1 is a flowchart of a combat network self-adaptive combination method according to embodiment one of the present invention;
Fig. 2 is a schematic diagram of a decision space network according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a combat network according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the recombined combat network according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples serve only to illustrate the technical scheme of the invention more clearly and are not intended to limit its scope.
Embodiment one:
As shown in Fig. 1, the present invention provides a combat network self-adaptive combination method, which includes:
1. Acquire control nodes, reconnaissance nodes, strike nodes and target nodes.
2. Connect the nodes with directed edges representing dependency relationships to construct a decision space network.
2.1. The directed edges representing dependency relationships include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a strike node, and directed edges from a strike node to a target node.
2.2. The dependency relationship between nodes connected by a directed edge satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij O_i + (1 − α_ij) SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i β_ij
wherein O_i and O_j are the operational performance of nodes i and j, Average is the averaging function, α_ij and β_ij are respectively the strength-of-dependency (SOD) and criticality-of-dependency (COD) parameters between nodes i and j, and SE_j is the self-effectiveness of node j;
wherein, when node i is a reconnaissance node, a control node or a strike node, the operational performance O_i is the capability value of node i; when node i is a target node, the operational performance O_i is the damage degree value of node i.
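For illustration, the following Python sketch evaluates the operational performance of a node from its predecessor nodes according to the SOD/COD rules in step 2.2. It is a minimal sketch: the function name operational_performance and the parameters alpha, beta and se_j are assumptions for this example, not identifiers from the patent, and the multiplicative COD term follows the formula as reconstructed above.

    # Minimal sketch of the dependency propagation described in step 2.2.
    # Names (operational_performance, alpha, beta, se_j) are illustrative.
    def operational_performance(preds, alpha, beta, se_j):
        """Compute the operational performance O_j of node j.

        preds : capability values O_i of the feeder nodes i
        alpha : dependency-strength parameters alpha_ij in [0, 1]
        beta  : dependency-criticality parameters beta_ij
        se_j  : self-effectiveness SE_j of node j (performance with no inputs)
        """
        if not preds:  # a source node (e.g. a reconnaissance node) stands alone
            return se_j
        # SOD_O_j: average of the strength-of-dependency terms
        sod = sum(a * o + (1 - a) * se_j for o, a in zip(preds, alpha)) / len(preds)
        # COD_O_j: the weakest criticality-of-dependency term bounds performance
        cod = min(o * b for o, b in zip(preds, beta))
        return min(sod, cod)

    # Example: a control node fed by two reconnaissance nodes
    o_d = operational_performance(preds=[0.9, 0.6], alpha=[0.7, 0.5],
                                  beta=[1.2, 1.1], se_j=0.8)
    print(f"O_j = {o_d:.3f}")  # -> O_j = 0.660, limited by the weaker feeder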
3. Construct a combat chain for each target node, and combine the combat chains to construct the target node's combat network.
3.1. A combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges.
4. Calculate and sum the combat capability of each combat chain to obtain the combat capability of the combat network.
4.1. The combat capability E_OC(l_j) of the j-th combat chain l_j is computed from the capability values of its nodes, wherein s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j, respectively; and O_T(t_j) is the damage degree value of target node t_j, t_j ∈ T.
4.2. The combat capability of the combat network is the sum of the capabilities of its combat chains:
E_N(G) = Σ_{l_j ∈ G} E_OC(l_j)
wherein E_N(G) is the combat capability of combat network G.
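As a worked illustration of steps 4.1 and 4.2, the sketch below scores a combat network by summing the scores of its combat chains. Because the patent's exact per-chain formula is not reproduced in the extracted text, the chain capability is assumed here to be the product O_S(s_j)·O_D(d_j)·O_I(i_j)·O_T(t_j); treat this as a stand-in under that assumption, not as the patented expression.

    from math import prod

    # ASSUMPTION: E_OC(l_j) is modeled as the product of the three node
    # capability values and the target damage degree value; the patent's
    # exact chain formula did not survive extraction.
    def chain_capability(o_s, o_d, o_i, o_t):
        return prod([o_s, o_d, o_i, o_t])

    # E_N(G): sum of the capabilities of all combat chains in the network.
    def network_capability(chains):
        """chains: iterable of (O_S, O_D, O_I, O_T) tuples, one per combat chain."""
        return sum(chain_capability(*c) for c in chains)

    chains = [(0.9, 0.8, 0.7, 50.0),   # chain 1 against the target
              (0.6, 0.8, 0.9, 50.0)]   # chain 2 against the same target
    print(network_capability(chains))  # E_N(G) of this two-chain network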
5. Construct a Markov decision process from the decision space network and the combat network.
Constructing the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), wherein G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed edge set;
taking each directed edge in the combat network as a removable action, and each directed edge of the decision space network that can connect a node to the combat network as an addable action; the removable and addable actions together form the decision actions, denoted x_t = (n_t, e_t), wherein x_t, e_t and n_t are respectively the decision action at time t and its corresponding directed edge and node; the decision action space is constructed from the decision actions;
selecting and executing a decision action from the decision action space, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), wherein E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t, respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1}(N_{t+1}, E_{t+1}) = G_t(N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
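A compact sketch of the Markov decision process of step 5 as an environment object: the state is the current combat network graph, a decision action adds or removes a directed edge together with its node, and the return value is the capability change ΔC. The class and method names (CombatNetworkMDP, actions, step) and the (nodes, edges) graph representation are assumptions for this sketch.

    import copy

    class CombatNetworkMDP:
        """State = combat network graph, action = add/remove a directed edge,
        reward = change in network combat capability (illustrative names)."""

        def __init__(self, decision_space_edges, capability):
            self.decision_space_edges = decision_space_edges  # legal edges
            self.capability = capability                      # graph -> E_N(G)

        def actions(self, graph):
            nodes, edges = graph
            removable = [("remove", e) for e in edges]
            addable = [("add", e) for e in self.decision_space_edges
                       if e not in edges and (e[0] in nodes or e[1] in nodes)]
            return removable + addable

        def step(self, graph, action):
            nodes, edges = copy.deepcopy(graph)
            kind, (u, v) = action
            if kind == "add":
                nodes.update({u, v})
                edges.add((u, v))
            else:
                edges.discard((u, v))
            next_graph = (nodes, edges)
            reward = self.capability(next_graph) - self.capability(graph)  # ΔC
            return next_graph, reward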
6. Construct the Bellman optimality equation of the Markov decision process, and solve it to obtain the combination result.
6.1. Constructing the Bellman optimality equation of the Markov decision process includes:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, wherein k is the total number of time steps;
constructing an objective function that maximizes the cumulative return value over the decision action sequence:
max Σ_{t=1}^{k} ΔC_t
wherein ΔC_t is the return value at time t;
expanding the objective function by backward recursion yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
wherein γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
6.2. Solving to obtain the combination result includes:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η δ_t
wherein η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting decision actions according to an ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected, and with probability 1 − ε a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
starting from the initial state G_0 and following the state transitions, transferring forward according to the optimal decision actions to generate the optimal decision action sequence;
and obtaining the combat network combination result at each time step from the execution results of the optimal decision action sequence.
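The sketch below runs one training round of step 6.2 against the CombatNetworkMDP sketch above: temporal-difference updates of a tabular value store V with ε-greedy selection (greedy with probability ε, random with probability 1 − ε, as in the patent). The helpers run_episode, select and frozen, and the tabular (state, action) store, are assumptions for this sketch.

    import random
    from collections import defaultdict

    def frozen(graph):  # make the graph state hashable for the value table
        nodes, edges = graph
        return (frozenset(nodes), frozenset(edges))

    def select(mdp, g, V, eps):
        acts = mdp.actions(g)
        if not acts:
            return None
        if random.random() < eps:  # greedy choice with probability eps
            return max(acts, key=lambda a: V[(frozen(g), a)])
        return random.choice(acts)  # random exploration with probability 1-eps

    def run_episode(mdp, g0, V, eta=0.01, gamma=0.9, eps=0.75, k_n=10):
        g = g0
        a = select(mdp, g, V, eps)
        while a is not None:
            g_next, dc = mdp.step(g, a)          # dc is the return value ΔC
            if dc < 0 or len(g_next[0]) > k_n:   # round-ending conditions
                V[(frozen(g), a)] += eta * (dc - V[(frozen(g), a)])
                break
            a_next = select(mdp, g_next, V, eps)
            delta = dc + gamma * V[(frozen(g_next), a_next)] - V[(frozen(g), a)]
            V[(frozen(g), a)] += eta * delta     # temporal-difference update
            g, a = g_next, a_next

    V = defaultdict(float)  # state-decision-action value function V_t(G_t, x_t)

With the embodiment's parameters below (η = 0.01, γ = 0.9, ε = 0.75, k_N = 10), repeated calls to run_episode would play the role of the 3000 training rounds described in the example.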
As shown in Fig. 2, assume a decision space network of 104 nodes and 732 edges, in which the one-way arrows between nodes are the directed edges representing dependency relationships. The node types and numbering are: reconnaissance nodes S = {1, 2, …, 52}, control nodes D = {53, 54, …, 68}, strike nodes I = {69, 70, …, 99} and target nodes T = {100, 101, …, 104}.
Starting from the initial state G_0, state transitions are performed and decision actions are selected with the ε-greedy strategy. The maximum number of nodes in the combat network is denoted k_N. The algorithm parameters are set to learning rate η = 0.01, discount factor γ = 0.9, greedy strategy probability ε = 0.75 and maximum node number k_N = 10. A training round ends when the combat capability gain ΔC_t of a decision action is less than 0 or when the number of nodes exceeds k_N = 10. The combat network is trained interactively against the decision space, and the obtained return values are used to update the policy function until it converges; in this example, 3000 training rounds were used.
Taking the initial state G_0 as the starting point, the decision action with the largest value of the optimal policy function V_t(G_t, x_t) is selected and state transitions continue until the node count of the combat network reaches k_N or the combat capability gain ΔC_t of the decision action is less than 0; the state at which the transitions stop is the optimal combat network state. The optimal combat network for target node 103 in this example is shown in Fig. 3, with a corresponding combat capability of 125658.
To demonstrate the flexibility of the proposed method, a random node attack is performed on Fig. 2: the attacked nodes and all their associated edges are removed, and the removed edges can no longer be selected by decision actions. Under the random node attack strategy, nodes 80 and 85 of the combat network in Fig. 3 are removed after being attacked, and the combat capability falls to 87880. Then, starting again from the initial state G_0, state transitions are made according to the optimal policy function V_t(G_t, x_t), with the removed edges unselectable during the process. The recombined combat network is shown in Fig. 4; its combat capability is 103250, and the recovery rate of the lost combat capability is 68.59%.
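The resilience experiment above can be sketched in the same style: the attacked nodes and their edges are pruned from the decision space so decision actions can no longer select them, and the trained value function is then rolled out greedily from G_0 again. The helpers prune and greedy_rollout are assumptions that reuse the sketches above.

    # Remove attacked nodes and all their incident edges from the decision space.
    def prune(mdp, attacked):
        mdp.decision_space_edges = [e for e in mdp.decision_space_edges
                                    if e[0] not in attacked and e[1] not in attacked]

    # Greedy rollout of the trained value table V from an initial state.
    def greedy_rollout(mdp, g0, V, k_n=10):
        g = g0
        while len(g[0]) < k_n:
            acts = mdp.actions(g)
            if not acts:
                break
            a = max(acts, key=lambda act: V[(frozen(g), act)])
            g_next, dc = mdp.step(g, a)
            if dc < 0:  # stop when no action still improves combat capability
                break
            g = g_next
        return g

    # e.g. after nodes 80 and 85 are attacked:
    # prune(mdp, attacked={80, 85}); g_star = greedy_rollout(mdp, g0, V)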
Embodiment two:
The embodiment of the invention provides a combat network self-adaptive combination device, comprising:
a node acquisition module for acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module for connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module for constructing a combat chain for each target node and combining the combat chains to construct the target node's combat network;
a capability calculation module for calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module for constructing a Markov decision process from the decision space network and the combat network;
and a process solving module for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
Embodiment three:
Based on embodiment one, the embodiment of the invention provides an electronic device comprising a processor and a storage medium;
the storage medium is used to store instructions;
and the processor operates according to the instructions to perform the steps of the above method.
Embodiment four:
Based on embodiment one, the embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the above method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and variations without departing from the technical principles of the invention, and such modifications and variations should also be regarded as falling within the scope of the invention.

Claims (6)

1. A combat network self-adaptive combination method, comprising:
acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
connecting the nodes with directed edges that represent dependency relationships to construct a decision space network;
constructing a combat chain for each target node, and combining the combat chains to construct the target node's combat network;
calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
constructing a Markov decision process from the decision space network and the combat network;
constructing the Bellman optimality equation of the Markov decision process, and solving it to obtain the combination result;
wherein the directed edges representing dependency relationships include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a strike node, and directed edges from a strike node to a target node;
the dependency relationship between nodes connected by a directed edge satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij O_i + (1 − α_ij) SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i β_ij
wherein O_i and O_j are the operational performance of nodes i and j, Average is the averaging function, α_ij and β_ij are respectively the strength-of-dependency (SOD) and criticality-of-dependency (COD) parameters between nodes i and j, and SE_j is the self-effectiveness of node j;
wherein, when node i is a reconnaissance node, a control node or a strike node, the operational performance O_i is the capability value of node i; when node i is a target node, the operational performance O_i is the damage degree value of node i;
the combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges;
the combat capability E_OC(l_j) of the j-th combat chain l_j is computed from the capability values of its nodes, wherein s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j, respectively; and O_T(t_j) is the damage degree value of target node t_j, t_j ∈ T;
the combat capability of the combat network is the sum of the capabilities of its combat chains:
E_N(G) = Σ_{l_j ∈ G} E_OC(l_j)
wherein E_N(G) is the combat capability of combat network G;
constructing the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), wherein G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed edge set;
taking each directed edge in the combat network as a removable action, and each directed edge of the decision space network that can connect a node to the combat network as an addable action; the removable and addable actions together form the decision actions, denoted x_t = (n_t, e_t), wherein x_t, e_t and n_t are respectively the decision action at time t and its corresponding directed edge and node; the decision action space is constructed from the decision actions;
selecting and executing a decision action from the decision action space, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), wherein E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t, respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1}(N_{t+1}, E_{t+1}) = G_t(N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
2. The method of claim 1, wherein constructing the Bellman optimality equation of the Markov decision process comprises:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, wherein k is the total number of time steps;
constructing an objective function that maximizes the cumulative return value over the decision action sequence:
max Σ_{t=1}^{k} ΔC_t
wherein ΔC_t is the return value at time t;
expanding the objective function by backward recursion yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
wherein γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
3. The method of claim 2, wherein the solving to obtain the combination result comprises:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η δ_t
wherein η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting decision actions according to an ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected, and with probability 1 − ε a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
starting from the initial state G_0 and following the state transitions, transferring forward according to the optimal decision actions to generate the optimal decision action sequence;
and obtaining the combat network combination result at each time step from the execution results of the optimal decision action sequence.
4. A combat network self-adaptive combination device, characterized in that it uses the method according to any one of claims 1-3, the device comprising:
a node acquisition module for acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module for connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module for constructing a combat chain for each target node and combining the combat chains to construct the target node's combat network;
a capability calculation module for calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module for constructing a Markov decision process from the decision space network and the combat network;
and a process solving module for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
5. An electronic device, comprising a processor and a storage medium;
wherein the storage medium is used to store instructions;
and the processor operates according to the instructions to perform the steps of the method according to any one of claims 1-3.
6. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-3.
CN202310487406.6A 2023-05-04 2023-05-04 Combat network self-adaptive combination method, device, equipment and medium Active CN116489193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310487406.6A CN116489193B (en) 2023-05-04 2023-05-04 Combat network self-adaptive combination method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310487406.6A CN116489193B (en) 2023-05-04 2023-05-04 Combat network self-adaptive combination method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116489193A CN116489193A (en) 2023-07-25
CN116489193B (en) 2024-01-23

Family

ID=87215475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310487406.6A Active CN116489193B (en) 2023-05-04 2023-05-04 Combat network self-adaptive combination method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116489193B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11146479B2 (en) * 2019-10-10 2021-10-12 United States Of America As Represented By The Secretary Of The Navy Reinforcement learning-based intelligent control of packet transmissions within ad-hoc networks

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200092457A (en) * 2019-01-07 2020-08-04 한국과학기술원 System and method for predicting human choice behavior and underlying strategy using meta-reinforcement learning
CA3144397A1 (en) * 2019-07-19 2021-01-28 Mark GORSKI An unmanned aerial vehicle (uav)-based system for collecting and distributing animal data for monitoring
CN112632744A (en) * 2020-11-13 2021-04-09 中国人民解放军国防科技大学 Combat system architecture modeling method and space exploration algorithm based on hyper-network model
CN112947581A (en) * 2021-03-25 2021-06-11 西北工业大学 Multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning
CN113093802A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
CN114202010A (en) * 2021-10-25 2022-03-18 北京仿真中心 Information entropy-based complex system networked modeling method, device and medium
CN115034067A (en) * 2022-06-16 2022-09-09 中国人民解放军国防科技大学 Game optimization method and device based on combat network attack and defense strategy of link
CN115169131A (en) * 2022-07-18 2022-10-11 中国人民解放军国防科技大学 Toughness-based combat system node protection method and device and electronic equipment
CN115906673A (en) * 2023-01-10 2023-04-04 中国人民解放军陆军工程大学 Integrated modeling method and system for combat entity behavior model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"A New Model for Evaluation of Radar Anti-Stealth Preplan in Mosaic Warfare";Yanhong Duan et al;《2022 2nd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT)》;全文 *
"基于演化博弈的网络信息体系资源优选";王楠等;《计算机系统应用》;全文 *
"马赛克作战模式的递归拼图计算体系";张婷婷等;《指挥与控制学报》;全文 *
基于动态影响网络的空战效能评估方法;潘勃;陶茜;刘同豪;火力与指挥控制(第004期);全文 *

Also Published As

Publication number Publication date
CN116489193A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN112329348B (en) Intelligent decision-making method for military countermeasure game under incomplete information condition
CN110929394B (en) Combined combat system modeling method based on super network theory and storage medium
Yuan et al. Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks
Alighanbari et al. Cooperative task assignment of unmanned aerial vehicles in adversarial environments
CN111339690A (en) Deep reinforcement learning training acceleration method based on expected value function
CN110969362B (en) Multi-target task scheduling method and system under cloud computing system
WO2016107426A1 (en) Systems and methods to adaptively select execution modes
CN105446742B (en) A kind of artificial intelligence executes the optimization method of task
Han et al. $ H_\infty $ Model-free Reinforcement Learning with Robust Stability Guarantee
CN110061870B (en) Node and edge-based combined efficiency evaluation method in tactical internet
de Cote et al. Learning to cooperate in multi-agent social dilemmas
Xu et al. A study of count-based exploration and bonus for reinforcement learning
Toghiani-Rizi et al. Evaluating deep reinforcement learning for computer generated forces in ground combat simulation
Jia et al. Improving policy optimization with generalist-specialist learning
CN112734239A (en) Task planning method, device and medium based on task and resource capacity attributes
Wiering et al. Reinforcement learning soccer teams with incomplete world models
CN116489193B (en) Combat network self-adaptive combination method, device, equipment and medium
CN116088586B (en) Method for planning on-line tasks in unmanned aerial vehicle combat process
Reiter et al. Augmenting spacecraft maneuver strategy optimization for detection avoidance with competitive coevolution
CN114662655A (en) Attention mechanism-based weapon and chess deduction AI hierarchical decision method and device
Ulam et al. Combining model-based meta-reasoning and reinforcement learning for adapting game-playing agents
CN113324545A (en) Multi-unmanned aerial vehicle collaborative task planning method based on hybrid enhanced intelligence
Lu et al. Optimal cost constrained adversarial attacks for multiple agent systems
CN116227361B (en) Intelligent body decision method and device
CN107633302B (en) Dependence implementation system and method of game strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant