CN116489193B - Combat network self-adaptive combination method, device, equipment and medium - Google Patents

Combat network self-adaptive combination method, device, equipment and medium

Info

Publication number
CN116489193B
CN116489193B (application number CN202310487406.6A)
Authority
CN
China
Prior art keywords
combat
node
network
decision
constructing
Prior art date
Legal status
Active
Application number
CN202310487406.6A
Other languages
Chinese (zh)
Other versions
CN116489193A (en)
Inventor
张婷婷
孙云鹏
陈岩
李辉
肖春霞
宦蕾
Current Assignee
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date
Filing date
Publication date
Application filed by Army Engineering University of PLA
Priority to CN202310487406.6A
Publication of CN116489193A
Application granted
Publication of CN116489193B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L41/14: Network analysis or design
    • H04L41/16: Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract

The invention discloses a combat network self-adaptive combination method, device, equipment and medium. The method comprises the following steps: acquiring control nodes, reconnaissance nodes, strike nodes and target nodes; connecting the nodes with directed edges that represent dependency relationships to construct a decision space network; constructing a combat chain for each target node and combining the combat chains to construct the target node's combat network; calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network; constructing a Markov decision process from the decision space network and the combat network; and constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result. The invention can adapt to complex combat environments and improves the resilience and flexibility of the combat network.

Description

Combat network self-adaptive combination method, device, equipment and medium
Technical Field
The invention relates to a combat network self-adaptive combination method, device, equipment and medium, and belongs to the technical field of system management.
Background
With the rapid development of network information technology and military technology, the scale and style of war have changed profoundly. The traditional centralized command-and-control mode struggles to adapt to highly dynamic, highly uncertain battlefields. In response, the Strategic Technology Office of the U.S. Defense Advanced Research Projects Agency (DARPA) proposed the "mosaic warfare" concept, in which low-cost, low-complexity autonomous systems are rapidly integrated into sensor networks and multi-domain command-and-control systems, thereby imposing complexity on an adversary. In mosaic warfare, manned and unmanned platforms dispersed across the battlefield can be dynamically assembled, through communication, into a resilient combat network.
However, combination decision-making for the combat network is difficult because of the cascading effects that arise from interdependencies between heterogeneous weapon systems, and capturing and quantifying these dependencies is critical to maximizing the combat network's capability. A highly uncertain battlefield environment can damage the combat network, for example when major faults in or damage to weapon equipment degrade the network's combat capability, and traditional manual decisions based on commander experience struggle to respond quickly to combat tasks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art. It provides a combat network self-adaptive combination method, device, equipment and medium, and solves the technical problem that traditional manual decision-making based on commander experience is difficult to respond quickly to combat tasks, causing the combat capability of the combat network to decline.
To achieve the above purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a combat network self-adaptive combination method, including:
acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
connecting the nodes with directed edges that represent dependency relationships to construct a decision space network;
constructing a combat chain for each target node, and combining the combat chains to construct the target node's combat network;
calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
constructing a Markov decision process from the decision space network and the combat network;
and constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
Optionally, the directed edges representing dependency relationships include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a strike node, and directed edges from a strike node to a target node.
Optionally, the combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges.
The combat capability E_OC(l_j) of the j-th combat chain l_j is computed from the capability values of its nodes, wherein s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j, respectively; and O_T(t_j) is the damage degree value of target node t_j, t_j ∈ T.
The combat capability of the combat network is the sum of the capabilities of its combat chains:
E_N(G) = Σ_{l_j ∈ G} E_OC(l_j)
wherein E_N(G) is the combat capability of combat network G.
Optionally, the dependency relationship between nodes connected by a directed edge satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij O_i + (1 − α_ij) SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i β_ij
wherein O_i and O_j are the operational performance of nodes i and j, Average is the averaging function, α_ij and β_ij are respectively the strength-of-dependency (SOD) and criticality-of-dependency (COD) parameters between nodes i and j, and SE_j is the self-effectiveness of node j;
wherein, when node i is a reconnaissance node, a control node or a strike node, the operational performance O_i is the capability value of node i; when node i is a target node, the operational performance O_i is the damage degree value of node i.
Optionally, constructing the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), wherein G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed edge set;
taking each directed edge in the combat network as a removable action, and each directed edge of the decision space network that can connect a node to the combat network as an addable action; the removable and addable actions together form the decision actions, denoted x_t = (n_t, e_t), wherein x_t, e_t and n_t are respectively the decision action at time t and its corresponding directed edge and node; the decision action space is constructed from the decision actions;
selecting and executing a decision action from the decision action space, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), wherein E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t, respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1}(N_{t+1}, E_{t+1}) = G_t(N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
Optionally, constructing the Bellman optimality equation of the Markov decision process includes:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, wherein k is the total number of time steps;
constructing an objective function that maximizes the cumulative return value over the decision action sequence:
max Σ_{t=1}^{k} ΔC_t
wherein ΔC_t is the return value at time t;
expanding the objective function by backward recursion yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
wherein γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
Optionally, solving to obtain the combination result includes:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η δ_t
wherein η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting decision actions according to an ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected, and with probability 1 − ε a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
starting from the initial state G_0 and following the state transitions, transferring forward according to the optimal decision actions to generate the optimal decision action sequence;
and obtaining the combat network combination result at each time step from the execution results of the optimal decision action sequence.
In a second aspect, the present invention provides a combat network self-adaptive combination device, the device comprising:
a node acquisition module for acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module for connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module for constructing a combat chain for each target node and combining the combat chains to construct the target node's combat network;
a capability calculation module for calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module for constructing a Markov decision process from the decision space network and the combat network;
and a process solving module for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
In a third aspect, the present invention provides an electronic device comprising a processor and a storage medium;
the storage medium is used to store instructions;
and the processor operates according to the instructions to perform the steps of the above method.
In a fourth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the above method.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a combat network self-adaptive combination method, device, equipment and medium. It combines a functional dependency network analysis method to build the decision action space and incorporates cascading effects into the evaluation index of system combat capability; it solves for an optimal policy function for the target node and derives the combat network with maximum combat capability; and when weapon equipment suffers a serious fault or damage, it reconfigures a new combat network and partially restores combat capability, improving the resilience and flexibility of the combat network. The method can also inform the design and planning of mosaic warfare.
Drawings
Fig. 1 is a flowchart of a combat network self-adaptive combination method according to embodiment one of the present invention;
Fig. 2 is a schematic diagram of a decision space network according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a combat network according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the recombined combat network according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples serve only to illustrate the technical scheme of the invention more clearly and are not intended to limit its scope.
Embodiment one:
As shown in Fig. 1, the present invention provides a combat network self-adaptive combination method, which includes:
1. Acquire control nodes, reconnaissance nodes, strike nodes and target nodes.
2. Connect the nodes with directed edges representing dependency relationships to construct a decision space network.
2.1. The directed edges representing dependency relationships include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a strike node, and directed edges from a strike node to a target node.
2.2. The dependency relationship between nodes connected by a directed edge satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij O_i + (1 − α_ij) SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i β_ij
wherein O_i and O_j are the operational performance of nodes i and j, Average is the averaging function, α_ij and β_ij are respectively the strength-of-dependency (SOD) and criticality-of-dependency (COD) parameters between nodes i and j, and SE_j is the self-effectiveness of node j;
wherein, when node i is a reconnaissance node, a control node or a strike node, the operational performance O_i is the capability value of node i; when node i is a target node, the operational performance O_i is the damage degree value of node i.
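For illustration, the following Python sketch evaluates the operational performance of a node from its predecessor nodes according to the SOD/COD rules in step 2.2. It is a minimal sketch: the function name operational_performance and the parameters alpha, beta and se_j are assumptions for this example, not identifiers from the patent, and the multiplicative COD term follows the formula as reconstructed above.

    # Minimal sketch of the dependency propagation described in step 2.2.
    # Names (operational_performance, alpha, beta, se_j) are illustrative.
    def operational_performance(preds, alpha, beta, se_j):
        """Compute the operational performance O_j of node j.

        preds : capability values O_i of the feeder nodes i
        alpha : dependency-strength parameters alpha_ij in [0, 1]
        beta  : dependency-criticality parameters beta_ij
        se_j  : self-effectiveness SE_j of node j (performance with no inputs)
        """
        if not preds:  # a source node (e.g. a reconnaissance node) stands alone
            return se_j
        # SOD_O_j: average of the strength-of-dependency terms
        sod = sum(a * o + (1 - a) * se_j for o, a in zip(preds, alpha)) / len(preds)
        # COD_O_j: the weakest criticality-of-dependency term bounds performance
        cod = min(o * b for o, b in zip(preds, beta))
        return min(sod, cod)

    # Example: a control node fed by two reconnaissance nodes
    o_d = operational_performance(preds=[0.9, 0.6], alpha=[0.7, 0.5],
                                  beta=[1.2, 1.1], se_j=0.8)
    print(f"O_j = {o_d:.3f}")  # -> O_j = 0.660, limited by the weaker feeder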
3. Construct a combat chain for each target node, and combine the combat chains to construct the target node's combat network.
3.1. A combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges.
4. Calculate and sum the combat capability of each combat chain to obtain the combat capability of the combat network.
4.1. The combat capability E_OC(l_j) of the j-th combat chain l_j is computed from the capability values of its nodes, wherein s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j, respectively; and O_T(t_j) is the damage degree value of target node t_j, t_j ∈ T.
4.2. The combat capability of the combat network is the sum of the capabilities of its combat chains:
E_N(G) = Σ_{l_j ∈ G} E_OC(l_j)
wherein E_N(G) is the combat capability of combat network G.
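As a worked illustration of steps 4.1 and 4.2, the sketch below scores a combat network by summing the scores of its combat chains. Because the patent's exact per-chain formula is not reproduced in the extracted text, the chain capability is assumed here to be the product O_S(s_j)·O_D(d_j)·O_I(i_j)·O_T(t_j); treat this as a stand-in under that assumption, not as the patented expression.

    from math import prod

    # ASSUMPTION: E_OC(l_j) is modeled as the product of the three node
    # capability values and the target damage degree value; the patent's
    # exact chain formula did not survive extraction.
    def chain_capability(o_s, o_d, o_i, o_t):
        return prod([o_s, o_d, o_i, o_t])

    # E_N(G): sum of the capabilities of all combat chains in the network.
    def network_capability(chains):
        """chains: iterable of (O_S, O_D, O_I, O_T) tuples, one per combat chain."""
        return sum(chain_capability(*c) for c in chains)

    chains = [(0.9, 0.8, 0.7, 50.0),   # chain 1 against the target
              (0.6, 0.8, 0.9, 50.0)]   # chain 2 against the same target
    print(network_capability(chains))  # E_N(G) of this two-chain network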
5. Construct a Markov decision process from the decision space network and the combat network.
Constructing the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), wherein G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed edge set;
taking each directed edge in the combat network as a removable action, and each directed edge of the decision space network that can connect a node to the combat network as an addable action; the removable and addable actions together form the decision actions, denoted x_t = (n_t, e_t), wherein x_t, e_t and n_t are respectively the decision action at time t and its corresponding directed edge and node; the decision action space is constructed from the decision actions;
selecting and executing a decision action from the decision action space, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), wherein E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t, respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1}(N_{t+1}, E_{t+1}) = G_t(N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
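A compact sketch of the Markov decision process of step 5 as an environment object: the state is the current combat network graph, a decision action adds or removes a directed edge together with its node, and the return value is the capability change ΔC. The class and method names (CombatNetworkMDP, actions, step) and the (nodes, edges) graph representation are assumptions for this sketch.

    import copy

    class CombatNetworkMDP:
        """State = combat network graph, action = add/remove a directed edge,
        reward = change in network combat capability (illustrative names)."""

        def __init__(self, decision_space_edges, capability):
            self.decision_space_edges = decision_space_edges  # legal edges
            self.capability = capability                      # graph -> E_N(G)

        def actions(self, graph):
            nodes, edges = graph
            removable = [("remove", e) for e in edges]
            addable = [("add", e) for e in self.decision_space_edges
                       if e not in edges and (e[0] in nodes or e[1] in nodes)]
            return removable + addable

        def step(self, graph, action):
            nodes, edges = copy.deepcopy(graph)
            kind, (u, v) = action
            if kind == "add":
                nodes.update({u, v})
                edges.add((u, v))
            else:
                edges.discard((u, v))
            next_graph = (nodes, edges)
            reward = self.capability(next_graph) - self.capability(graph)  # ΔC
            return next_graph, reward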
6. Construct the Bellman optimality equation of the Markov decision process, and solve it to obtain the combination result.
6.1. Constructing the Bellman optimality equation of the Markov decision process includes:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, wherein k is the total number of time steps;
constructing an objective function that maximizes the cumulative return value over the decision action sequence:
max Σ_{t=1}^{k} ΔC_t
wherein ΔC_t is the return value at time t;
expanding the objective function by backward recursion yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
wherein γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
6.2. Solving to obtain the combination result includes:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η δ_t
wherein η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting decision actions according to an ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected, and with probability 1 − ε a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
starting from the initial state G_0 and following the state transitions, transferring forward according to the optimal decision actions to generate the optimal decision action sequence;
and obtaining the combat network combination result at each time step from the execution results of the optimal decision action sequence.
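The sketch below runs one training round of step 6.2 against the CombatNetworkMDP sketch above: temporal-difference updates of a tabular value store V with ε-greedy selection (greedy with probability ε, random with probability 1 − ε, as in the patent). The helpers run_episode, select and frozen, and the tabular (state, action) store, are assumptions for this sketch.

    import random
    from collections import defaultdict

    def frozen(graph):  # make the graph state hashable for the value table
        nodes, edges = graph
        return (frozenset(nodes), frozenset(edges))

    def select(mdp, g, V, eps):
        acts = mdp.actions(g)
        if not acts:
            return None
        if random.random() < eps:  # greedy choice with probability eps
            return max(acts, key=lambda a: V[(frozen(g), a)])
        return random.choice(acts)  # random exploration with probability 1-eps

    def run_episode(mdp, g0, V, eta=0.01, gamma=0.9, eps=0.75, k_n=10):
        g = g0
        a = select(mdp, g, V, eps)
        while a is not None:
            g_next, dc = mdp.step(g, a)          # dc is the return value ΔC
            if dc < 0 or len(g_next[0]) > k_n:   # round-ending conditions
                V[(frozen(g), a)] += eta * (dc - V[(frozen(g), a)])
                break
            a_next = select(mdp, g_next, V, eps)
            delta = dc + gamma * V[(frozen(g_next), a_next)] - V[(frozen(g), a)]
            V[(frozen(g), a)] += eta * delta     # temporal-difference update
            g, a = g_next, a_next

    V = defaultdict(float)  # state-decision-action value function V_t(G_t, x_t)

With the embodiment's parameters below (η = 0.01, γ = 0.9, ε = 0.75, k_N = 10), repeated calls to run_episode would play the role of the 3000 training rounds described in the example.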
As shown in Fig. 2, assume a decision space network of 104 nodes and 732 edges, in which the one-way arrows between nodes are the directed edges representing dependency relationships. The node types and numbering are: reconnaissance nodes S = {1, 2, …, 52}, control nodes D = {53, 54, …, 68}, strike nodes I = {69, 70, …, 99} and target nodes T = {100, 101, …, 104}.
Starting from the initial state G_0, state transitions are performed and decision actions are selected with the ε-greedy strategy. The maximum number of nodes in the combat network is denoted k_N. The algorithm parameters are set to learning rate η = 0.01, discount factor γ = 0.9, greedy strategy probability ε = 0.75 and maximum node number k_N = 10. A training round ends when the combat capability gain ΔC_t of a decision action is less than 0 or when the number of nodes exceeds k_N = 10. The combat network is trained interactively against the decision space, and the obtained return values are used to update the policy function until it converges; in this example, 3000 training rounds were used.
Taking the initial state G_0 as the starting point, the decision action with the largest value of the optimal policy function V_t(G_t, x_t) is selected and state transitions continue until the node count of the combat network reaches k_N or the combat capability gain ΔC_t of the decision action is less than 0; the state at which the transitions stop is the optimal combat network state. The optimal combat network for target node 103 in this example is shown in Fig. 3, with a corresponding combat capability of 125658.
To demonstrate the flexibility of the proposed method, a random node attack is performed on Fig. 2: the attacked nodes and all their associated edges are removed, and the removed edges can no longer be selected by decision actions. Under the random node attack strategy, nodes 80 and 85 of the combat network in Fig. 3 are removed after being attacked, and the combat capability falls to 87880. Then, starting again from the initial state G_0, state transitions are made according to the optimal policy function V_t(G_t, x_t), with the removed edges unselectable during the process. The recombined combat network is shown in Fig. 4; its combat capability is 103250, and the recovery rate of the lost combat capability is 68.59%.
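The resilience experiment above can be sketched in the same style: the attacked nodes and their edges are pruned from the decision space so decision actions can no longer select them, and the trained value function is then rolled out greedily from G_0 again. The helpers prune and greedy_rollout are assumptions that reuse the sketches above.

    # Remove attacked nodes and all their incident edges from the decision space.
    def prune(mdp, attacked):
        mdp.decision_space_edges = [e for e in mdp.decision_space_edges
                                    if e[0] not in attacked and e[1] not in attacked]

    # Greedy rollout of the trained value table V from an initial state.
    def greedy_rollout(mdp, g0, V, k_n=10):
        g = g0
        while len(g[0]) < k_n:
            acts = mdp.actions(g)
            if not acts:
                break
            a = max(acts, key=lambda act: V[(frozen(g), act)])
            g_next, dc = mdp.step(g, a)
            if dc < 0:  # stop when no action still improves combat capability
                break
            g = g_next
        return g

    # e.g. after nodes 80 and 85 are attacked:
    # prune(mdp, attacked={80, 85}); g_star = greedy_rollout(mdp, g0, V)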
Embodiment two:
The embodiment of the invention provides a combat network self-adaptive combination device, comprising:
a node acquisition module for acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module for connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module for constructing a combat chain for each target node and combining the combat chains to construct the target node's combat network;
a capability calculation module for calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module for constructing a Markov decision process from the decision space network and the combat network;
and a process solving module for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
Embodiment three:
Based on embodiment one, the embodiment of the invention provides an electronic device comprising a processor and a storage medium;
the storage medium is used to store instructions;
and the processor operates according to the instructions to perform the steps of the above method.
Embodiment four:
Based on embodiment one, the embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the above method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and variations without departing from the technical principles of the invention, and such modifications and variations should also be regarded as falling within the scope of the invention.

Claims (6)

1. A combat network self-adaptive combination method, comprising:
acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
connecting the nodes with directed edges that represent dependency relationships to construct a decision space network;
constructing a combat chain for each target node, and combining the combat chains to construct the target node's combat network;
calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
constructing a Markov decision process from the decision space network and the combat network;
constructing the Bellman optimality equation of the Markov decision process, and solving it to obtain the combination result;
wherein the directed edges representing dependency relationships include:
directed edges from a reconnaissance node to another reconnaissance node or to a control node, directed edges from a control node to another control node or to a strike node, and directed edges from a strike node to a target node;
the dependency relationship between nodes connected by a directed edge satisfies:
O_j = min(SOD_O_j, COD_O_j)
SOD_O_j = Average(SOD_O_j1, SOD_O_j2, …, SOD_O_jn)
SOD_O_ji = α_ij O_i + (1 − α_ij) SE_j
COD_O_j = min(COD_O_j1, COD_O_j2, …, COD_O_jn)
COD_O_ji = O_i β_ij
wherein O_i and O_j are the operational performance of nodes i and j, Average is the averaging function, α_ij and β_ij are respectively the strength-of-dependency (SOD) and criticality-of-dependency (COD) parameters between nodes i and j, and SE_j is the self-effectiveness of node j;
wherein, when node i is a reconnaissance node, a control node or a strike node, the operational performance O_i is the capability value of node i; when node i is a target node, the operational performance O_i is the damage degree value of node i;
the combat chain comprises a reconnaissance node, a control node, a strike node and a target node connected in sequence by directed edges;
the combat capability E_OC(l_j) of the j-th combat chain l_j is computed from the capability values of its nodes, wherein s_j, d_j, i_j and t_j are respectively the reconnaissance node, control node, strike node and target node in combat chain l_j; O_S(s_j), O_D(d_j) and O_I(i_j) are the capability values of reconnaissance node s_j, control node d_j and strike node i_j, respectively; and O_T(t_j) is the damage degree value of target node t_j, t_j ∈ T;
the combat capability of the combat network is the sum of the capabilities of its combat chains:
E_N(G) = Σ_{l_j ∈ G} E_OC(l_j)
wherein E_N(G) is the combat capability of combat network G;
constructing the Markov decision process includes:
taking the nodes and directed edges of the combat network as the state, denoted G_t = (N_t, E_t), wherein G_t, N_t and E_t are respectively the combat network at time t and its corresponding node set and directed edge set;
taking each directed edge in the combat network as a removable action, and each directed edge of the decision space network that can connect a node to the combat network as an addable action; the removable and addable actions together form the decision actions, denoted x_t = (n_t, e_t), wherein x_t, e_t and n_t are respectively the decision action at time t and its corresponding directed edge and node; the decision action space is constructed from the decision actions;
selecting and executing a decision action from the decision action space, and taking the resulting change in the combat capability of the combat network as the return value, denoted ΔC_{t+1} = E_N(G_{t+1}) − E_N(G_t), wherein E_N(G_{t+1}) and E_N(G_t) are the combat capabilities of the combat network at times t+1 and t, respectively;
taking the state of the combat network after execution as the state transition, denoted G_{t+1}(N_{t+1}, E_{t+1}) = G_t(N_t ± n_t, E_t ± e_t);
and constructing the Markov decision process from the states, decision actions, return values and state transitions.
2. The method of claim 1, wherein constructing the Bellman optimality equation of the Markov decision process comprises:
initializing the decision action sequence expected to be executed: {x_1, x_2, …, x_t, …, x_k}, wherein k is the total number of time steps;
constructing an objective function that maximizes the cumulative return value over the decision action sequence:
max Σ_{t=1}^{k} ΔC_t
wherein ΔC_t is the return value at time t;
expanding the objective function by backward recursion yields the Bellman optimality equation:
V_t(G_t, x_t) = ΔC_{t+1} + γ max_{x_{t+1}} V_{t+1}(G_{t+1}, x_{t+1})
wherein γ is the discount factor, γ ∈ (0, 1]; ΔC_{t+1} is the return value at time t+1; and V_t(G_t, x_t) is the state-decision-action value function.
3. The method of claim 2, wherein the solving to obtain the combination result comprises:
updating the Bellman optimality equation by the temporal-difference method:
V_t(G_t, x_t) ← V_t(G_t, x_t) + η δ_t
wherein η is the update step size and δ_t is the error, δ_t = ΔC_{t+1} + γ V_{t+1}(G_{t+1}, x_{t+1}) − V_t(G_t, x_t);
selecting decision actions according to an ε-greedy strategy: with probability ε, the decision action with the largest V_t(G_t, x_t) value is selected, and with probability 1 − ε a decision action is selected at random;
selecting the decision action x_t with the largest V_t(G_t, x_t) value as the optimal decision action x_t*;
starting from the initial state G_0 and following the state transitions, transferring forward according to the optimal decision actions to generate the optimal decision action sequence;
and obtaining the combat network combination result at each time step from the execution results of the optimal decision action sequence.
4. A combat network self-adaptive combination device, characterized in that it uses the method according to any one of claims 1-3, the device comprising:
a node acquisition module for acquiring control nodes, reconnaissance nodes, strike nodes and target nodes;
a space construction module for connecting the nodes with directed edges representing dependency relationships to construct a decision space network;
a network construction module for constructing a combat chain for each target node and combining the combat chains to construct the target node's combat network;
a capability calculation module for calculating and summing the combat capability of each combat chain to obtain the combat capability of the combat network;
a process construction module for constructing a Markov decision process from the decision space network and the combat network;
and a process solving module for constructing the Bellman optimality equation of the Markov decision process and solving it to obtain the combination result.
5. An electronic device, comprising a processor and a storage medium;
wherein the storage medium is used to store instructions;
and the processor operates according to the instructions to perform the steps of the method according to any one of claims 1-3.
6. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-3.
CN202310487406.6A 2023-05-04 2023-05-04 Combat network self-adaptive combination method, device, equipment and medium Active CN116489193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310487406.6A CN116489193B (en) 2023-05-04 2023-05-04 Combat network self-adaptive combination method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310487406.6A CN116489193B (en) 2023-05-04 2023-05-04 Combat network self-adaptive combination method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116489193A CN116489193A (en) 2023-07-25
CN116489193B (en) 2024-01-23

Family

ID=87215475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310487406.6A Active CN116489193B (en) 2023-05-04 2023-05-04 Combat network self-adaptive combination method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116489193B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11146479B2 (en) * 2019-10-10 2021-10-12 United States Of America As Represented By The Secretary Of The Navy Reinforcement learning-based intelligent control of packet transmissions within ad-hoc networks

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200092457A (en) * 2019-01-07 2020-08-04 한국과학기술원 System and method for predicting human choice behavior and underlying strategy using meta-reinforcement learning
CA3144397A1 (en) * 2019-07-19 2021-01-28 Mark GORSKI An unmanned aerial vehicle (uav)-based system for collecting and distributing animal data for monitoring
CN112632744A (en) * 2020-11-13 2021-04-09 中国人民解放军国防科技大学 Combat system architecture modeling method and space exploration algorithm based on hyper-network model
CN112947581A (en) * 2021-03-25 2021-06-11 西北工业大学 Multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning
CN113093802A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
CN114202010A (en) * 2021-10-25 2022-03-18 北京仿真中心 Information entropy-based complex system networked modeling method, device and medium
CN115034067A (en) * 2022-06-16 2022-09-09 中国人民解放军国防科技大学 Game optimization method and device based on combat network attack and defense strategy of link
CN115169131A (en) * 2022-07-18 2022-10-11 中国人民解放军国防科技大学 Toughness-based combat system node protection method and device and electronic equipment
CN115906673A (en) * 2023-01-10 2023-04-04 中国人民解放军陆军工程大学 Integrated modeling method and system for combat entity behavior model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"A New Model for Evaluation of Radar Anti-Stealth Preplan in Mosaic Warfare";Yanhong Duan et al;《2022 2nd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT)》;全文 *
"基于演化博弈的网络信息体系资源优选";王楠等;《计算机系统应用》;全文 *
"马赛克作战模式的递归拼图计算体系";张婷婷等;《指挥与控制学报》;全文 *
基于动态影响网络的空战效能评估方法;潘勃;陶茜;刘同豪;火力与指挥控制(第004期);全文 *

Also Published As

Publication number Publication date
CN116489193A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN112329348B (en) Intelligent decision-making method for military countermeasure game under incomplete information condition
CN110929394B (en) Combined combat system modeling method based on super network theory and storage medium
Yuan et al. Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks
Alighanbari et al. Cooperative task assignment of unmanned aerial vehicles in adversarial environments
CN111339690A (en) Deep reinforcement learning training acceleration method based on expected value function
CN110969362B (en) Multi-target task scheduling method and system under cloud computing system
WO2016107426A1 (en) Systems and methods to adaptively select execution modes
CN105446742B (en) A kind of artificial intelligence executes the optimization method of task
Han et al. $ H_\infty $ Model-free Reinforcement Learning with Robust Stability Guarantee
CN110061870B (en) Node and edge-based combined efficiency evaluation method in tactical internet
de Cote et al. Learning to cooperate in multi-agent social dilemmas
Xu et al. A study of count-based exploration and bonus for reinforcement learning
Toghiani-Rizi et al. Evaluating deep reinforcement learning for computer generated forces in ground combat simulation
Jia et al. Improving policy optimization with generalist-specialist learning
CN112734239A (en) Task planning method, device and medium based on task and resource capacity attributes
Wiering et al. Reinforcement learning soccer teams with incomplete world models
CN116489193B (en) Combat network self-adaptive combination method, device, equipment and medium
CN116088586B (en) Method for planning on-line tasks in unmanned aerial vehicle combat process
Reiter et al. Augmenting spacecraft maneuver strategy optimization for detection avoidance with competitive coevolution
CN114662655A (en) Attention mechanism-based weapon and chess deduction AI hierarchical decision method and device
Ulam et al. Combining model-based meta-reasoning and reinforcement learning for adapting game-playing agents
CN113324545A (en) Multi-unmanned aerial vehicle collaborative task planning method based on hybrid enhanced intelligence
Lu et al. Optimal cost constrained adversarial attacks for multiple agent systems
CN116227361B (en) Intelligent body decision method and device
CN107633302B (en) Dependence implementation system and method of game strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant