CN112436971A

CN112436971A - Method for generating a global command and control network cooperative topology based on Monte Carlo tree search

Info

Publication number: CN112436971A
Application number: CN202011344455.7A
Authority: CN
Inventors: 许珺怡; 卜先锦; 季明; 吴志强; 雷中原; 付东; 田义伟
Original assignee: Evaluation Argument Research Center Academy Of Military Sciences Pla China
Current assignee: Evaluation Argument Research Center Academy Of Military Sciences Pla China
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2021-03-02
Anticipated expiration: 2040-11-25
Also published as: CN112436971B

Abstract

The application relates to a method for generating a global command and control network cooperative topology based on Monte Carlo tree search. The method comprises the following steps: acquiring a matched set of observation nodes and set of execution nodes from the global command and control network according to preset task area parameters; acquiring from the global command and control network, based on a Monte Carlo tree search algorithm, a set of network paths that each comprise in sequence observation nodes, communication nodes, command and control nodes and execution nodes; calculating a task execution utility value for each network path according to a preset utility function; selecting network paths according to a preset task utility parameter and the task execution utility values; and generating the global command and control network cooperative topology from the selected network paths. The method can generate a cooperative topology that satisfies both task constraints and network constraints, adapts to the cooperation requirements that different tasks place on network nodes, and improves the efficiency of the global command and control network across the whole process of task observation, data transmission, task control and task execution.

Description

Method for generating a global command and control network cooperative topology based on Monte Carlo tree search

Technical Field

The application relates to the technical field of command and control networks, in particular to a global command and control network cooperation topology generation method based on Monte Carlo tree search.

Background

With the development of network and communication technology, the nodes of networked systems are widely deployed across different domains such as sea, land, air and space, and can perform unified network command functions, thereby forming a global command and control network that covers all domains. The global command and control network is characterized by a large number of nodes and wide coverage, and it contains many mobile nodes whose positions are uncertain and many ad hoc network nodes whose access states are not fixed.

Corresponding to its diverse node types and complex network structure, the types and number of tasks of the global command and control network are likewise diverse and complex. Under constraints such as task execution cost and required task execution effect, the global command and control network needs to select appropriate network resources from those currently available and generate a corresponding cooperative topology for each specific task. Optimizing the speed and accuracy of the cooperative topology generation process is the key to improving the cross-domain cooperation capability of the global command and control network and to shortening the task execution cycle.

Disclosure of Invention

In view of the above, it is necessary to provide a method for generating a global command and control network cooperative topology based on Monte Carlo tree search, which can generate a corresponding network cooperative topology according to task constraint conditions.

A method for generating a global command and control network cooperative topology based on Monte Carlo tree search comprises the following steps:

Acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters.

Searching the global command and control network based on a Monte Carlo tree search algorithm to obtain a set of network paths, each comprising in sequence observation nodes, communication nodes, command and control nodes and execution nodes.

Calculating a task execution utility value of each network path according to a preset utility function. The variables of the utility function comprise a task success probability parameter of the execution node, a time cost parameter of the network path and a path length parameter of the network path.

Acquiring the corresponding network paths according to a preset task utility parameter and the task execution utility values, and generating the global command and control network cooperative topology from the acquired network paths.

In one embodiment, the utility function is defined as:

Value(V) = α·P(v_t) + β·Σ_{i=0}^{t} T(v_i) + γ·L(V)

where V = <v_0, v_1, ..., v_t> is the node sequence of a network path, v_0 is the observation node, v_t is the execution node, P(v_t) is the task success probability of the execution node, T(v_i) is the time cost of node v_i in the network path, L(V) is the path length of the network path, and α, β and γ are preset weights with α > 0, β < 0 and γ < 0.

In one embodiment, the UCT function of the Monte Carlo tree search algorithm is defined as:

UCT(v_i, v) = Q(v_i)/N(v_i) + c·sqrt(ln N(v) / N(v_i))

where v denotes the current search node, v_i denotes the i-th next-hop node of the current search node, Q(v_i) is the utility value of the i-th next-hop node, N(v_i) is the number of visits to the i-th next-hop node, N(v) is the number of visits to the current search node, and c is a preset constant.

In one embodiment, the step of searching the global command and control network based on the Monte Carlo tree search algorithm to obtain a set of network paths, each comprising in sequence observation nodes, communication nodes, command and control nodes and execution nodes, includes:

Taking an observation node as the initial node and searching for next-hop nodes based on the Monte Carlo tree search algorithm and a rollout policy, to obtain a network path comprising in sequence the observation node, communication nodes, a command and control node and an execution node.

In one embodiment, the task area parameters include a task area type and a task area range, the observation area parameters include an observation area type and an observation area range, and the execution area parameters include an execution area type and an execution area range.

The step of acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters, includes:

Acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to the preset task area parameters, such that the observation area type of each observation node and the execution area type of each execution node are the same as the task area type, and the observation area range of each observation node and the execution area range of each execution node share a common sub-area with the task area range.

In one embodiment, the task utility parameter comprises a task utility threshold.

The step of acquiring the corresponding network paths according to a preset task utility parameter and the task execution utility values, and generating the global command and control network cooperative topology from the acquired network paths, includes:

Acquiring the network paths whose task execution utility value is greater than the preset task utility threshold, and generating the global command and control network cooperative topology from the acquired network paths.

In one embodiment, after the step of acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters, the method further includes:

When the observation data parameters of an observation node do not match the preset task data parameters, deleting that observation node from the set of observation nodes.

When the action type parameters of an execution node do not match the preset task action type parameters, deleting that execution node from the set of execution nodes.

An apparatus for generating a global command and control network cooperative topology based on Monte Carlo tree search, the apparatus comprising:

An endpoint acquisition module, configured to acquire a set of observation nodes and a set of execution nodes in the global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters.

A path search module, configured to search the global command and control network based on the Monte Carlo tree search algorithm to obtain a set of network paths, each comprising in sequence observation nodes, communication nodes, command and control nodes and execution nodes.

A path utility calculation module, configured to calculate a task execution utility value of each network path according to a preset utility function. The variables of the utility function comprise a task success probability parameter of the execution node, a time cost parameter of the network path and a path length parameter of the network path.

A cooperative topology generation module, configured to acquire the corresponding network paths according to a preset task utility parameter and the task execution utility values, and to generate the global command and control network cooperative topology from the acquired network paths.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

In the above method, apparatus, computer device and storage medium for generating a global command and control network cooperative topology based on Monte Carlo tree search, a matched set of observation nodes and set of execution nodes are acquired from the global command and control network according to preset task area parameters, and a set of network paths, each comprising in sequence observation nodes, communication nodes, command and control nodes and execution nodes, is acquired from the global command and control network based on the Monte Carlo tree search algorithm. A task execution utility value is calculated for each network path according to a preset utility function, network paths are selected according to a preset task utility parameter and the task execution utility values, and the global command and control network cooperative topology is generated from the selected network paths. The application can select appropriate observation nodes and execution nodes, according to the task parameters, from the nodes currently connected to the global command and control network, and on that basis generate a cooperative topology that satisfies the task execution utility constraints. It thereby adapts to the cooperation requirements that different tasks place on network nodes and improves the efficiency of the global command and control network across the whole process of task observation, data transmission, task control and task execution.

Drawings

FIG. 1 is a block diagram of a method for generating a global command and control network cooperative topology based on Monte Carlo tree search according to an embodiment;

FIG. 2 is a schematic diagram of network paths generated for a given task in one embodiment;

FIG. 3 is a flowchart illustrating a method for generating a global command and control network cooperative topology based on Monte Carlo tree search according to an embodiment;

FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, as shown in FIG. 1, a method for generating a global command and control network cooperative topology based on Monte Carlo tree search is provided, including the following steps:

Step 102, acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters.

Specifically, the cooperation process of the command and control network can be modeled as a sequential decision process. For a given task, a group of observation nodes is selected to observe the task area, and the observation data obtained is transmitted to communication nodes. The communication nodes process the observation data and forward it to a command and control node; typically the observation data passes through several communication nodes before reaching the command and control node. The command and control node generates a control instruction from the received observation data and transmits it to an execution node, which carries out the task action.

Sequential decision-making refers to a series of decisions arranged in time order, and is a decision method used to optimize stochastic or uncertain dynamic systems. A complete sequential decision process includes multiple states and actions, and each state transition depends on the execution of one action. In the problem of global command and control cooperation for a specific task, the current cooperation state is defined as the set of network nodes selected so far from the current global command and control network to execute the specific task, and an action is the selection, with a certain probability, of the next network node to join that set in the current cooperation state.

The sequential decision process determines its start point and end point from the task parameters. To select them, a set of observation nodes and a set of execution nodes are acquired from the global command and control network according to the preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters.

Further, the task area parameters may include a task area type, a task area range, a task target and the like; the observation area parameters may include an observation area type, an observation area range, an observation target and the like; and the execution area parameters may include an execution area type, an execution area range, an execution target and the like. The task area type may take specific values such as land, sea, air and space. The task area range describes the area to be covered when the task is carried out; it may be a single area or several scattered sub-areas, and it may be expressed as specific longitude and latitude values or as an area of some shape centered on a given target point. According to the needs of task execution, the task area range can be further divided into a task observation range specified for the observation nodes and a task execution range specified for the execution nodes. The task target describes the target that requires attention in a specific task, mainly by giving its geographical position. The parameter types and definitions of the observation area parameters and the execution area parameters are similar to those of the task area parameters, and within the same task the parameter types of the observation area parameters and the execution area parameters correspond to one another.

The observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters when the value range of each observation node's observation area parameters at least partially overlaps the value range of the corresponding task area parameters, and the value range of each execution node's execution area parameters likewise at least partially overlaps the value range of the corresponding task area parameters.
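The matching rule above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an area is an area type plus a latitude/longitude bounding box; the `Area` class, the function names and the node names are hypothetical and do not come from the application.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Hypothetical area description: a type plus a latitude/longitude box."""
    kind: str        # e.g. "land", "sea", "air", "space"
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def areas_match(node_area: Area, task_area: Area) -> bool:
    """Same area type and at least partially overlapping ranges."""
    same_type = node_area.kind == task_area.kind
    lat_overlap = (node_area.lat_min < task_area.lat_max
                   and task_area.lat_min < node_area.lat_max)
    lon_overlap = (node_area.lon_min < task_area.lon_max
                   and task_area.lon_min < node_area.lon_max)
    return same_type and lat_overlap and lon_overlap


def matching_nodes(node_areas: dict[str, Area], task_area: Area) -> set[str]:
    """Names of the nodes whose area matches the task area."""
    return {name for name, area in node_areas.items()
            if areas_match(area, task_area)}


task = Area("sea", 10.0, 20.0, 110.0, 120.0)
observation_nodes = {
    "obs-a": Area("sea", 0.0, 15.0, 100.0, 115.0),    # overlaps the task box
    "obs-b": Area("land", 10.0, 20.0, 110.0, 120.0),  # wrong area type
    "obs-c": Area("sea", 30.0, 40.0, 110.0, 120.0),   # no latitude overlap
}
matched = matching_nodes(observation_nodes, task)
```

The same filter could be applied to the execution nodes against the task execution range; areas made of several scattered sub-areas would need one box per sub-area.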

Step 104, searching the global command and control network based on the Monte Carlo tree search algorithm to obtain a set of network paths, each comprising in sequence observation nodes, communication nodes, command and control nodes and execution nodes.

Specifically, after the sets of observation nodes and execution nodes available for a given task have been obtained from the task area parameters, a set of network paths that start at an observation node and end at an execution node is obtained from the current global command and control network based on the Monte Carlo tree search algorithm. During construction of the search tree, when the current node is expanded to generate a successor node, the two nodes must be directly connected. As shown in FIG. 2, the corresponding observation nodes and execution nodes are obtained according to the task target specified in the task area parameters. Then, taking an obtained observation node as the start point, the current global command and control network is searched in sequence based on Monte Carlo tree search for: a communication node directly connected to the start point, a command and control node directly connected to the communication node, and an execution node directly connected to the command and control node. The algorithm directly prunes nodes that are not directly connected and nodes whose type does not match the required next node type. Note that during the search the observation node and the command and control node may also be connected through several communication nodes in a multi-hop manner; in that case the communication nodes are connected in sequence, the observation node is directly connected to the first communication node, and the command and control node is directly connected to the last communication node. When the execution node reached by the search belongs to the set of execution nodes available for the given task, as obtained from the task area parameters, one complete cooperation process is finished. This can be regarded as one episode and yields one network path, such as the path between the nodes marked by arrows in FIG. 2.

Among the execution nodes in FIG. 2, the white nodes are execution nodes that do not belong to the set of execution nodes available for the given task; the search algorithm prunes them directly. Execution nodes drawn with the same pattern are the same execution node, indicating that the same execution node can be reached from the same observation node through different communication nodes and command and control nodes. The same can happen for communication nodes and command and control nodes; because FIG. 2 is only a schematic diagram, these two node types are not drawn with a similar distinction.

Note that during the search, a group of nodes in the current global command and control network may be designated as the search range according to the conditions required by the task, or the number of nodes of each type in a network path may be limited to a preset value (a specified value or a randomly generated value). Multiple simulation experiments are then run with the Monte Carlo tree search algorithm to select the optimal model parameters.
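The pruning rules described for step 104 can be sketched as a small helper that lists the legal next hops of a partial path. This is only an illustration under stated assumptions: the adjacency dictionary, the type labels, the `legal_next_hops` name and the rule that a node is never visited twice are all hypothetical choices, not details given in the application.

```python
# Allowed successor types for each node type (multi-hop relays between
# communication nodes are permitted, as the description notes).
NEXT_TYPES = {
    "observation": {"communication"},
    "communication": {"communication", "command"},
    "command": {"execution"},
    "execution": set(),  # a path ends at an execution node
}


def legal_next_hops(path, links, node_type, available_exec):
    """Directly connected, type-compatible successors of the last path node."""
    current = path[-1]
    allowed_types = NEXT_TYPES[node_type[current]]
    hops = []
    for candidate in links.get(current, ()):
        if candidate in path:  # assumption: no node is visited twice
            continue
        kind = node_type[candidate]
        if kind not in allowed_types:  # prune nodes of the wrong type
            continue
        if kind == "execution" and candidate not in available_exec:
            continue  # prune execution nodes not available for the task
        hops.append(candidate)
    return hops


links = {"o1": ["c1", "k1", "e1"], "c1": ["c2", "k1", "o1"],
         "c2": ["k1"], "k1": ["e1", "e2"]}
node_type = {"o1": "observation", "c1": "communication", "c2": "communication",
             "k1": "command", "e1": "execution", "e2": "execution"}
available_exec = {"e2"}

first = legal_next_hops(["o1"], links, node_type, available_exec)
second = legal_next_hops(["o1", "c1"], links, node_type, available_exec)
third = legal_next_hops(["o1", "c1", "k1"], links, node_type, available_exec)
```

In this toy network the observation node can only move to `c1`, `c1` can relay to `c2` or hand over to `k1`, and `k1` can only reach `e2` because `e1` is not in the available set.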

Step 106, calculating a task execution utility value of each network path according to a preset utility function. The variables of the utility function comprise a task success probability parameter of the execution node, a time cost parameter of the network path and a path length parameter of the network path.

Specifically, for each network path obtained by the search, a preset utility function is used to calculate its task execution utility value. The task execution utility value measures how well a network path executes a given task, and it depends on the parameters of each type of node and on the parameters of the network path. The node parameters involved in the utility function mainly include: for observation nodes, the observation time, observation probability, observation data type and observation data processing delay; for communication nodes, the communication capacity, number of communication ports, number of connected nodes and data forwarding delay; for command and control nodes, the data fusion processing time, data analysis and instruction generation time, and human-in-the-loop decision time (ratio); and for execution nodes, the execution success probability, execution duration and execution frequency (strength). The network path parameters mainly include the communication quality between nodes, the transmission delay (distance) and the communication capacity. When the utility function is constructed, the node parameters and network path parameters are divided into three parts. The first part relates to the task execution effect: for example, if a given task requires a certain action to be executed at a certain frequency, then the higher the total execution frequency of that action across the execution nodes in the current network path, the better the task execution effect. The second part relates to the resources consumed by the task, which include the hardware resources and bandwidth resources of the nodes and the total number of nodes in the network path. The third part relates to the task execution time, which mainly comprises the processing delay of each node and the transmission delay between nodes. The utility function can be defined as proportional to the task execution effect and inversely proportional to the resources consumed and the task execution time, so as to calculate the task execution utility value of the network path.

Step 108, acquiring the corresponding network paths according to a preset task utility parameter and the task execution utility values, and generating the global command and control network cooperative topology from the acquired network paths.

Specifically, the task utility parameter may be a threshold or a range: when a network path whose task utility value is greater than the threshold, or falls within the given range, is found, the search stops and the global command and control network cooperative topology is generated from the acquired network path. The task utility parameter may also be a policy, such as searching all network paths for the given task and selecting the one with the highest task utility value, or selecting, from the network paths whose task utility values fall within a given range, a path that satisfies further conditions such as task delay and task success probability, and then generating the global command and control network cooperative topology.
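Two of the selection policies mentioned for step 108 can be sketched as follows: keeping every path whose utility exceeds a threshold (as in the threshold embodiment stated earlier in the description), or keeping only the highest-utility path. Representing the resulting cooperative topology as the union of the selected paths' links is an assumption made for this sketch, and the function names and utility values are hypothetical.

```python
def select_paths(scored_paths, threshold=None):
    """scored_paths is a list of (path, utility) pairs.

    With a threshold, keep every path whose utility exceeds it;
    without one, keep only the single best path.
    """
    if not scored_paths:
        return []
    if threshold is None:
        best_path, _ = max(scored_paths, key=lambda item: item[1])
        return [best_path]
    return [path for path, utility in scored_paths if utility > threshold]


def topology_edges(paths):
    """Assumed topology representation: the set of links used by the paths."""
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return edges


scored = [
    (("o1", "c1", "k1", "e1"), 0.62),
    (("o1", "c2", "k1", "e1"), 0.48),
    (("o1", "c1", "c2", "k1", "e2"), 0.31),
]
kept = select_paths(scored, threshold=0.4)
topology = topology_edges(kept)
best_only = select_paths(scored)
```

With the threshold 0.4 the first two paths survive and share the link from `k1` to `e1`, so the topology contains five distinct links.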

The above method for generating a global command and control network cooperative topology based on Monte Carlo tree search can select appropriate observation nodes and execution nodes, according to the task parameters, from the nodes currently connected to the global command and control network, and on that basis generate a cooperative topology that satisfies the task execution utility constraints. It thereby adapts to the cooperation requirements that different tasks place on network nodes and improves the efficiency of the global command and control network across the whole process of task observation, data transmission, task control and task execution.

In one embodiment, the utility function is defined as:

Value(V) = α·P(v_t) + β·Σ_{i=0}^{t} T(v_i) + γ·L(V)

Specifically, the first term of the utility function Value(V) indicates that the task success probability of the execution node in the network path for the given task is positively correlated with the utility value, the second term indicates that the sum of the time costs of all nodes in the network path for the given task is negatively correlated with the utility value, and the third term indicates that the length of the network path is negatively correlated with the utility value. Specific values of α, β and γ can be preset according to the state of the global command and control network and the task, and their validity can be verified through experimental analysis. In addition, the three quantities, which have different dimensions (execution success probability, time cost and network path length), can be normalized to reduce calculation error.
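As a worked illustration of the three terms just described, the sketch below evaluates a weighted sum of the success probability, the total time cost and the path length. The weight values, the choice to measure path length in hops, and the negative sign of the third weight (inferred from the stated negative correlation) are assumptions for this example only.

```python
def path_utility(success_prob, time_costs, path_length,
                 alpha=1.0, beta=-0.1, gamma=-0.05):
    """Weighted utility of one network path.

    success_prob: task success probability of the execution node
    time_costs:   time cost of every node on the path
    path_length:  length of the path (assumed here to be the hop count)
    The default weights are hypothetical; only their signs follow the text.
    """
    if not (alpha > 0 and beta < 0 and gamma < 0):
        raise ValueError("expected alpha > 0, beta < 0 and gamma < 0")
    return alpha * success_prob + beta * sum(time_costs) + gamma * path_length


# A four-node path (three hops) and a shorter three-node path (two hops)
# that reach an execution node with the same success probability.
value = path_utility(0.9, [1.0, 0.5, 2.0, 0.5], 3)
shorter = path_utility(0.9, [1.0, 0.5, 0.5], 2)
```

Here the first path scores 0.9 - 0.4 - 0.15 = 0.35 and the shorter, faster path scores 0.9 - 0.2 - 0.1 = 0.6, so the shorter path is preferred.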

The generation process of the global command and control network topology consists of repeated search iterations, which terminate when a limiting condition (such as computation time or a maximum number of iterations) is reached. Each iteration comprises four basic steps:

(1) Selection: starting the search from the start node v_0 of the global command and control network, select an expandable, non-terminal neighbor node according to the UCT policy;

(2) Expansion: expand the currently selected node v to obtain a newly expanded neighbor node v_i;

(3) Simulation: starting from the newly expanded node v_i, continue the forward rollout according to a predetermined rollout policy to obtain a simulated cooperative path V' = <v_0, ..., v_i, ..., v_t>, and compute the utility value Value(V') of the simulated cooperative path;

(4) Backpropagation: starting from the newly expanded node v_i, propagate the simulation result Value(V') step by step up the path to the start node, updating the utility value and the visit count of each relevant node on the path.

For a simulated cooperative path V', the utility values of the nodes on the path are updated as:

Q(v_j) = Q(v_j) + Value(V'), j = 0, ..., i

and the visit counts of the nodes on the path are updated as:

N(v_j) = N(v_j) + 1, j = 0, ..., i
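The four steps of one iteration can be sketched as a compact search loop. This is a minimal illustration, not the application's implementation: the network is a hypothetical adjacency dictionary, search-tree nodes are identified by path prefixes, unvisited children are expanded before any UCT comparison, the rollout policy is uniform random, and a dead-end rollout receives an assumed penalty `dead_end_value`.

```python
import math
import random


def mcts_search(links, node_type, start, utility, iterations=300, c=1.4,
                dead_end_value=-1.0, seed=0):
    """Return {path: utility} for every complete path reached by a simulation."""
    rng = random.Random(seed)
    order = {"observation": {"communication"},
             "communication": {"communication", "command"},
             "command": {"execution"}, "execution": set()}
    root = (start,)
    Q, N = {root: 0.0}, {root: 0}  # statistics per tree node (a path prefix)
    found = {}

    def next_hops(path):
        allowed = order[node_type[path[-1]]]
        return [n for n in links.get(path[-1], ())
                if n not in path and node_type[n] in allowed]

    def is_complete(path):
        return node_type[path[-1]] == "execution"

    def uct(child, parent):
        return Q[child] / N[child] + c * math.sqrt(math.log(N[parent]) / N[child])

    for _ in range(iterations):
        path, trail = root, [root]
        # (1) Selection and (2) Expansion
        while not is_complete(path):
            children = [path + (n,) for n in next_hops(path)]
            if not children:
                break
            untried = [ch for ch in children if ch not in N]
            if untried:
                path = rng.choice(untried)
                Q[path], N[path] = 0.0, 0
                trail.append(path)
                break
            parent = path
            path = max(children, key=lambda ch: uct(ch, parent))
            trail.append(path)
        # (3) Simulation with a uniform random rollout policy
        sim = path
        while not is_complete(sim):
            hops = next_hops(sim)
            if not hops:
                break
            sim = sim + (rng.choice(hops),)
        if is_complete(sim):
            value = utility(sim)
            found[sim] = value
        else:
            value = dead_end_value  # assumed penalty for dead ends
        # (4) Backpropagation along the selected tree nodes
        for node in trail:
            Q[node] += value
            N[node] += 1
    return found


links = {"o1": ["c1", "c2"], "c1": ["c2", "k1"], "c2": ["k1"], "k1": ["e1", "e2"]}
node_type = {"o1": "observation", "c1": "communication", "c2": "communication",
             "k1": "command", "e1": "execution", "e2": "execution"}
success = {"e1": 0.9, "e2": 0.6}
delay = {"o1": 1.0, "c1": 0.5, "c2": 0.8, "k1": 2.0, "e1": 0.5, "e2": 0.3}


def toy_utility(path):
    """Hypothetical utility: success probability minus delay and hop penalties."""
    return success[path[-1]] - 0.1 * sum(delay[n] for n in path) - 0.05 * (len(path) - 1)


paths = mcts_search(links, node_type, "o1", toy_utility)
best = max(paths, key=paths.get)
```

Keying the statistics by path prefix rather than by network node reflects the fact that the same network node can be reached through different routes, each of which is a separate node of the search tree.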

This embodiment provides a specific definition of the utility function, which makes it possible to compute the task execution utility value of a network path quantitatively and to use it as the criterion for selecting network paths when generating the global command and control network cooperative topology.

When the global command and control network cooperative topology is generated, the next-hop node with the largest UCT function value is selected and added to the node sequence of the network path. Specifically, the Monte Carlo tree search algorithm used in the present application searches hop by hop, for a given task, for the best successor node starting from an obtained observation node. Through many simulation attempts, it predicts the best action in the current state from the simulation results. One simulation is a sequence of node selection actions that starts at the current search node and ends at a final execution node. Action selection during simulation follows the rollout policy; that is, the network path from the current search node to an execution node is completed according to the rollout policy. The rollout policy function may use a uniform random distribution or a preset non-uniform probability distribution. One simulation produces a network path for the given task (composed of two segments, from the observation node to the current node and from the current node to the execution node), and the corresponding evaluation result, namely the task execution utility value, is calculated with the utility function. After the simulation for the current search node finishes, the evaluation result is propagated back to the root node of the current search tree, and the start node of the simulation is marked as visited. Backpropagation is a traversal from the start node of the simulation to the root node of the whole search tree: the evaluation result of the simulation is passed up to the root node, and the statistics of each node on the backpropagation path are updated. Backpropagation ensures that the statistics of each node reflect the cooperative simulation results of all its descendants. The statistics of a node v comprise two parts: the total simulated utility Q(v) and the total visit count N(v). The total simulated utility may be defined as the sum of the simulation evaluation results that passed through the node, and the total visit count is the number of times the node has appeared on a backpropagation path.

The core of the Monte Carlo tree search algorithm is the UCT (Upper Confidence Bound applied to Trees) algorithm. The UCT function is a function of node v and its child node v_i, and is used to select the next node to traverse from among the visited nodes. The UCT function of this embodiment is defined as:

UCT(v_i, v) = Q(v_i)/N(v_i) + c·sqrt(ln N(v) / N(v_i))

where Q(v_i) is the utility value of node v_i, representing the sum of the utility values of all historical simulated paths through the node, and N(v_i) is the visit count of the node, representing the number of historical simulated paths through the node. For a simulated cooperative path V' passing through node v and its child node v_i, the update formulas for Q(v_i), N(v_i) and N(v) are:

Q(v_i) = Q(v_i) + Value(V')

N(v_i) = N(v_i) + 1

N(v) = N(v) + 1

c is a constant that balances exploitation and exploration in the algorithm. The UCT function contains two parts, an exploitation term and an exploration term. The exploitation term Q(v_i)/N(v_i) can be regarded as the average utility of child node v_i, while the exploration term makes the search process favor nodes that have been explored less.
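The trade-off controlled by c can be made concrete with a short sketch of the standard UCT score (average utility plus an exploration bonus). Returning infinity for an unvisited child is a common convention assumed here, and the statistics in the example are hypothetical.

```python
import math


def uct_score(q_child, n_child, n_parent, c=1.4):
    """Standard UCT score: exploitation term plus exploration term."""
    if n_child == 0:
        return math.inf  # assumption: unvisited children are tried first
    exploitation = q_child / n_child
    exploration = c * math.sqrt(math.log(n_parent) / n_child)
    return exploitation + exploration


# A well-explored child with a higher average utility versus a rarely
# explored child with a lower average utility (hypothetical statistics).
well_explored = uct_score(q_child=8.0, n_child=10, n_parent=12)
rarely_explored = uct_score(q_child=1.2, n_child=2, n_parent=12)

# With c = 0 only the average utility counts.
greedy_well = uct_score(8.0, 10, 12, c=0.0)
greedy_rare = uct_score(1.2, 2, 12, c=0.0)
```

With c = 1.4 the rarely explored child receives the higher score despite its lower average utility, whereas with c = 0 the well-explored child wins, which is the balance the constant is meant to tune.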

In this embodiment, as shown in FIG. 3, one observation node is taken as the initial node, and next-hop nodes are searched based on the Monte Carlo tree search algorithm and the rollout policy, so as to obtain a network path comprising in sequence the observation node, communication nodes, a command and control node and an execution node.

This embodiment defines the UCT function used in the Monte Carlo tree search algorithm, which encourages exploration of less-explored nodes and thus helps ensure that the search results are comprehensive.

Specifically, in addition to pruning nodes which do not meet the constraint conditions according to task constraints in the search algorithm, the nodes can be pre-screened according to task data parameters and task action type parameters, so that the search range of the algorithm is narrowed, the calculation amount is reduced, and the real-time performance of the generation of the global finger control network cooperation topology is improved.

The task data parameters define the types of data that the task needs to acquire, including the form in which the data is provided, such as video, photos, and audio, and the data content, such as speed, size, and appearance. Observation nodes whose observation data parameters match the task data parameters are selected. The task action type defines the type of action required to execute the task, such as maneuvering to the task area and providing various kinds of capability support, clearing a specified target, or defending a specified area; execution nodes capable of performing the corresponding type of action are selected according to the task action type. A single execution node may be capable of performing several types of action and therefore have several action type parameter values, and a given task may require one or more types of action. When cooperative topologies of the global command and control network are generated for a plurality of given tasks, a node that is already included in a generated topology is treated as unavailable when the available nodes are considered for the tasks whose topologies have not yet been generated.
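The pre-screening described above can be sketched in Python as follows. This is an illustrative sketch only: the `Node` record, the representation of observation data parameters and action type parameters as sets of strings, the rule that a node matches when it offers every capability the task requires, and the `used` set of node names already allocated to earlier topologies are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # "obs" for observation nodes, "exec" for execution nodes (assumed labels)
    capabilities: set = field(default_factory=set)  # data types or action types


def prescreen(nodes, required, used=frozenset()):
    """Keep nodes that offer every required capability and are not yet allocated.

    `required` is the set of task data parameters (for observation nodes) or
    task action types (for execution nodes); `used` holds the names of nodes
    already included in a topology generated for an earlier task.
    """
    return [
        n for n in nodes
        if required <= n.capabilities and n.name not in used
    ]
```

For example, with observation nodes offering `{"video", "speed"}` and `{"audio"}`, a task that requires `{"video"}` keeps only the first node; passing that node's name in `used` when screening for a later task removes it from the candidates.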

It should be understood that, although the steps in the flowchart of Fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in Fig. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, there is provided a global command and control network cooperative topology generation apparatus based on Monte Carlo tree search, including:

In one embodiment, the definition of the utility function includes:

In one embodiment, the path search module is configured to search for a next-hop node, taking one observation node as the initial node, based on the Monte Carlo tree search algorithm and a rollout policy, so as to obtain a network path that comprises, in sequence, the observation node, a communication node, a command and control node, and an execution node.

In one embodiment, the task area parameters include a task area type and a task area range, the observation area parameters include an observation area type and an observation area range, and the execution area parameters include an execution area type and an execution area range. The endpoint acquisition module is configured to acquire a set of observation nodes and a set of execution nodes in the global command and control network according to the preset task area parameters, such that the observation area type of each observation node and the execution area type of each execution node are the same as the task area type, and the observation area range of each observation node, the execution area range of each execution node, and the task area range contain a common sub-area range.
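The area matching described above can be sketched in Python as follows. The sketch makes several assumptions that are not stated in this text: area ranges are modeled as axis-aligned rectangles, area types are plain strings, and the requirement of a common sub-area is read as each node's range overlapping the task range. The names `Area`, `shares_subarea` and `acquire_endpoints` are hypothetical.

```python
from typing import NamedTuple


class Area(NamedTuple):
    kind: str      # area type, e.g. "land" or "sea" (assumed labels)
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def shares_subarea(a, b):
    """True if both areas have the same type and their rectangles overlap."""
    return (
        a.kind == b.kind
        and max(a.x_min, b.x_min) < min(a.x_max, b.x_max)
        and max(a.y_min, b.y_min) < min(a.y_max, b.y_max)
    )


def acquire_endpoints(task, obs_nodes, exec_nodes):
    """Return the names of observation and execution nodes matching the task area.

    `obs_nodes` and `exec_nodes` map node names to their observation or
    execution `Area`.
    """
    obs = [name for name, area in obs_nodes.items() if shares_subarea(area, task)]
    exe = [name for name, area in exec_nodes.items() if shares_subarea(area, task)]
    return obs, exe
```

A stricter reading, in which a single sub-area must be covered by the observation node, the execution node and the task at the same time, would instead test the intersection of all three rectangles for each candidate pair.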

In one embodiment, the task utility parameter includes a task utility threshold. The cooperative topology generation module is configured to acquire the network paths whose task execution utility value is greater than the preset task utility threshold, and to generate the global command and control network cooperative topology from the acquired network paths.
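The threshold-based selection can be sketched in Python as follows. The display formula of the utility function is not reproduced in this text, so the weighted-sum form used here, the negative default for gamma, the hop-count definition of path length, and the construction of the topology as the union of the edges of the selected paths are all assumptions made for illustration; only the variables P(v_t), T(v_i), L(V) and the weights alpha > 0, beta < 0 come from the text.

```python
def path_utility(path, p_success, time_cost, alpha=1.0, beta=-0.1, gamma=-0.05):
    """Assumed linear utility: alpha * P(v_t) + beta * sum of T(v_i) + gamma * L(V)."""
    total_time = sum(time_cost[v] for v in path)  # time cost summed over path nodes
    length = len(path) - 1                        # assumed: length counted in hops
    return alpha * p_success[path[-1]] + beta * total_time + gamma * length


def select_paths(paths, threshold, p_success, time_cost, **weights):
    """Keep the paths whose utility is strictly greater than the threshold."""
    scored = [(p, path_utility(p, p_success, time_cost, **weights)) for p in paths]
    return [(p, u) for p, u in scored if u > threshold]


def build_topology(paths):
    """Assumed topology: the set of directed edges used by the selected paths."""
    return {(a, b) for p in paths for a, b in zip(p, p[1:])}
```

With these definitions, `build_topology([p for p, _ in select_paths(paths, threshold, p_success, time_cost)])` yields the edge set of all paths that pass the threshold.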

In one embodiment, the apparatus further includes an endpoint screening module, configured to delete an observation node from the set of observation nodes when the observation data parameter of the observation node does not match the preset task data parameter, and to delete an execution node from the set of execution nodes when the action type parameter of the execution node does not match the preset task action type parameter.

For specific limitations of the global command and control network cooperative topology generation apparatus based on Monte Carlo tree search, reference may be made to the limitations of the corresponding method described above, which are not repeated here. The modules of the apparatus may be implemented wholly or partly in software, in hardware, or in a combination of the two. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in Fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores task parameters, global command and control network node parameters, and search process data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a global command and control network cooperative topology generation method based on Monte Carlo tree search.

Those skilled in the art will appreciate that the architecture shown in Fig. 4 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer device to which the disclosed solution is applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program:

In one embodiment, the processor, when executing the computer program, further performs the following step: searching for a next-hop node, taking one observation node as the initial node, based on the Monte Carlo tree search algorithm and a rollout policy, so as to obtain a network path that comprises, in sequence, the observation node, a communication node, a command and control node, and an execution node.

In one embodiment, the task area parameters include a task area type and a task area range, the observation area parameters include an observation area type and an observation area range, and the execution area parameters include an execution area type and an execution area range. The processor, when executing the computer program, further performs the following step: acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to the preset task area parameters, such that the observation area type of each observation node and the execution area type of each execution node are the same as the task area type, and the observation area range of each observation node, the execution area range of each execution node, and the task area range contain a common sub-area range.

In one embodiment, the task utility parameter includes a task utility threshold. The processor, when executing the computer program, further performs the following step: acquiring the network paths whose task execution utility value is greater than the preset task utility threshold, and generating the global command and control network cooperative topology from the acquired network paths.

In one embodiment, the processor, when executing the computer program, further performs the following steps: deleting an observation node from the set of observation nodes when the observation data parameter of the observation node does not match the preset task data parameter, and deleting an execution node from the set of execution nodes when the action type parameter of the execution node does not match the preset task action type parameter.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program, when executed by the processor, further performs the following step: searching for a next-hop node, taking one observation node as the initial node, based on the Monte Carlo tree search algorithm and a rollout policy, so as to obtain a network path that comprises, in sequence, the observation node, a communication node, a command and control node, and an execution node.

In one embodiment, the task area parameters include a task area type and a task area range, the observation area parameters include an observation area type and an observation area range, and the execution area parameters include an execution area type and an execution area range. The computer program, when executed by the processor, further performs the following step: acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to the preset task area parameters, such that the observation area type of each observation node and the execution area type of each execution node are the same as the task area type, and the observation area range of each observation node, the execution area range of each execution node, and the task area range contain a common sub-area range.

In one embodiment, the task utility parameter includes a task utility threshold. The computer program, when executed by the processor, further performs the following step: acquiring the network paths whose task execution utility value is greater than the preset task utility threshold, and generating the global command and control network cooperative topology from the acquired network paths.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: deleting an observation node from the set of observation nodes when the observation data parameter of the observation node does not match the preset task data parameter, and deleting an execution node from the set of execution nodes when the action type parameter of the execution node does not match the preset task action type parameter.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered to fall within the scope of this specification as long as it contains no contradiction.

The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A global command and control network cooperative topology generation method based on Monte Carlo tree search, characterized by comprising the following steps:

acquiring a set of observation nodes and a set of execution nodes in a global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters;

searching the global command and control network based on a Monte Carlo tree search algorithm to obtain a set of network paths, each comprising, in sequence, the observation node, a communication node, a command and control node, and the execution node;

calculating a task execution utility value of the network path according to a preset utility function; the variables of the utility function comprise a task success probability parameter of the execution node, a time cost parameter of the network path and a path length parameter of the network path;

and acquiring the corresponding network paths according to a preset task utility parameter and the task execution utility values, and generating the global command and control network cooperative topology from the acquired network paths.

2. The method of claim 1, wherein the utility function is defined in a manner comprising:

wherein V = <v_0, v_1, ..., v_t> denotes the node sequence of the network path, v_0 denotes the observation node, v_t denotes the execution node, P(v_t) denotes the task success probability of the execution node, T(v_i) denotes the time cost of each node in the network path, L(V) denotes the path length of the network path, and α, β and γ are preset weights, with α > 0 and β < 0.

3. The method of claim 2, wherein the UCT function of the Monte Carlo tree search algorithm is defined in a manner comprising:

4. The method of claim 3, wherein the step of searching the global command and control network based on the Monte Carlo tree search algorithm to obtain a set of network paths, each comprising, in sequence, the observation node, a communication node, a command and control node, and the execution node, comprises:

searching for a next-hop node, taking one observation node as the initial node, based on the Monte Carlo tree search algorithm and a rollout policy, to obtain a network path comprising, in sequence, the observation node, the communication node, the command and control node, and the execution node.

5. The method of claim 1, wherein the task area parameters include a task area type and a task area range, the observation area parameters include an observation area type and an observation area range, and the execution area parameters include an execution area type and an execution area range;

the step of acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters, comprises:

acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to the preset task area parameters, such that the observation area type of each observation node and the execution area type of each execution node are the same as the task area type, and the observation area range of each observation node, the execution area range of each execution node, and the task area range contain a common sub-area range.

6. The method of claim 1, wherein the task utility parameter comprises a task utility threshold;

the step of acquiring the corresponding network paths according to a preset task utility parameter and the task execution utility values, and generating the global command and control network cooperative topology from the acquired network paths, comprises:

acquiring the network paths whose task execution utility value is greater than the preset task utility threshold, and generating the global command and control network cooperative topology from the acquired network paths.

7. The method according to any one of claims 1 to 6, wherein after the step of acquiring a set of observation nodes and a set of execution nodes in the global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters, the method further comprises:

deleting the observation node from the set of observation nodes when the observation data parameter of the observation node is not matched with the preset task data parameter;

8. A global command and control network cooperative topology generation apparatus based on Monte Carlo tree search, the apparatus comprising:

an endpoint acquisition module, configured to acquire a set of observation nodes and a set of execution nodes in the global command and control network according to preset task area parameters, such that the observation area parameters of the observation nodes and the execution area parameters of the execution nodes match the task area parameters;

a path search module, configured to search the global command and control network based on a Monte Carlo tree search algorithm to obtain a set of network paths, each comprising, in sequence, the observation node, a communication node, a command and control node, and the execution node;

a path utility calculation module, configured to calculate a task execution utility value of the network path according to a preset utility function, wherein the variables of the utility function comprise a task success probability parameter of the execution node, a time cost parameter of the network path, and a path length parameter of the network path;

and a cooperative topology generation module, configured to acquire the corresponding network paths according to a preset task utility parameter and the task execution utility values, and to generate the global command and control network cooperative topology from the acquired network paths.

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.