CN115470710B - Air game simulation method and device - Google Patents

Info

Publication number
CN115470710B
CN115470710B (application CN202211176772.1A)
Authority
CN
China
Prior art keywords
information
instruction
module
operator
party
Prior art date
Legal status
Active
Application number
CN202211176772.1A
Other languages
Chinese (zh)
Other versions
CN115470710A (en)
Inventor
陈敏杰
吴斌星
Current Assignee
Beijing Dingcheng Intelligent Manufacturing Technology Co ltd
Original Assignee
Beijing Dingcheng Intelligent Manufacturing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dingcheng Intelligent Manufacturing Technology Co ltd
Priority to CN202211176772.1A
Publication of CN115470710A
Application granted
Publication of CN115470710B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an air game simulation method and device. The method comprises the following steps: acquiring configuration parameters, and generating an air game simulation environment by using a simulation module; constructing a first intelligent algorithm model, the first intelligent algorithm model being used for generating instructions for controlling the air strength of a first party; training the first intelligent algorithm model by using the air game simulation environment to obtain a second intelligent algorithm model; and evaluating the second intelligent algorithm model to obtain an evaluation result. By introducing a deep reinforcement learning algorithm and mixed reality technology, an operator can directly control entities in the virtual world to fight against the agent, so that in the training stage the intelligent algorithm can learn many realistic, randomly uncertain scenarios and the generalization of the agent is enhanced; in the evaluation of the algorithm model, the operator directly fights, through mixed reality technology, against the agent controlled by the intelligent model, thereby verifying the training effect of the intelligent model.

Description

Air game simulation method and device
Technical Field
The invention relates to the technical field of computer simulation, in particular to an air game simulation method and device.
Background
In the air game simulation process in the military field, the actions of driven entities and agents in traditional simulation technology mainly depend on rule algorithms, i.e. expert experience: every action of an entity is controlled by pre-coded rules. To adapt to changes across various scenarios, the rule algorithms must be designed to be complex and comprehensive, and the rule algorithm models must be built by experienced algorithm engineers. As a result, rule algorithms are difficult to construct, place high demands on experience, and the difficulty of building the rule algorithm models affects the training effect.
In addition, in a traditional simulation system, once the algorithm is built, it is difficult for the outside to intervene in the training process, and the agent lacks confrontation with real operators, so it cannot learn the random, uncertain scenarios that occur in reality. The traditional "human-in-the-loop" approach is based entirely on the virtual environment; this mode isolates the operator from the virtual environment, making interaction with entities in the virtual environment inconvenient, so the agent can only fight entities controlled by rules at the simulation end and cannot fight entities controlled by real operators. Such a simulation verification method therefore lacks practicability.
Therefore, there is a need for an aerial simulation method that has low algorithm design difficulty and can introduce operator interaction in real time.
Disclosure of Invention
In view of the above problems, the present invention aims to provide an air game simulation method which introduces a reinforcement learning algorithm into the agent control method to construct an intelligent algorithm model, so that the intelligent algorithm can autonomously make corresponding decisions according to the specific battlefield scenario. By introducing mixed reality technology, an operator in the real world can directly control entities in the virtual world, in cooperation with the rule algorithm, to fight against the agent, so that the intelligent algorithm can learn a variety of realistic, randomly uncertain scenarios and the generalization of the agent is enhanced. In the evaluation stage of the intelligent game, the agent can fight against entities controlled by a real operator, which enhances the practicability of the simulation.
To achieve the above objective, a first aspect of the present invention discloses an air game simulation method, which includes:
s1, acquiring configuration parameters, and generating an air game simulation environment by using a simulation module;
s2, constructing a first intelligent algorithm model; the first intelligent algorithm model is used for generating an instruction for controlling the air strength of a first party;
s3, training the first intelligent algorithm model based on the air game simulation environment to obtain a second intelligent algorithm model;
And S4, evaluating the second intelligent algorithm model to obtain an evaluation result.
In a first aspect of the embodiment of the present invention, training the first intelligent algorithm model based on the air game simulation environment to obtain a second intelligent algorithm model includes:
s31, generating first situation information by using the simulation module based on the air game simulation environment; the situation information comprises first party air strength information, second party air strength information and environment data information; the first party air strength information and the second party air strength information are opposite parties;
s32, processing the first situation information by using the first intelligent algorithm model to obtain a first instruction; the first instruction is used for controlling the first party air strength;
s33, displaying the first situation information to an operator by using a map module and a mixed reality module to obtain first interaction information; the first interaction information comprises voice interaction information, handle interaction information and gaze interaction information;
s34, processing the first interaction information by using a preset rule algorithm model to obtain a second instruction; the second instruction comprises patrol, withdrawal, escort and attack; the second instruction is used for controlling the air strength of a second party;
S35, controlling the simulation module to perform step-length propulsion according to the first instruction and the second instruction, and generating second situation information;
s36, judging the second situation information by using a preset training judgment model to obtain a first judgment result; the preset training judgment model is used for judging whether the first intelligent algorithm model achieves a preset effect or not;
s37, when the first judgment result is negative, taking the second situation information as the first situation information, and triggering execution of the step of processing the first situation information by using the first intelligent algorithm model to obtain a first instruction; the first instruction is used for controlling the first party air strength;
and when the first judgment result is yes, stopping training to obtain a second intelligent algorithm model.
In a first aspect of the embodiment of the present invention, the processing, by using the first intelligent algorithm model, the first situation information to obtain a first instruction includes:
analyzing the first situation information to obtain analysis data; the parsing includes data format conversion and data packaging;
extracting features from the analysis data to obtain characteristic data; the characteristic data comprises the position of the first party's air strength, the position of the second party's air strength, the distance between the first party's air strength and the second party's air strength, and the remaining number of the first party's missiles;
Coding the characteristic data to obtain characteristic coding data;
inputting the characteristic coding data into a preset neural network, and outputting action coding data;
and decoding the motion coding data to obtain a first instruction.
In a first aspect of the embodiment of the present invention, the displaying, by using a map module and a mixed reality module, the first situation information to an operator to obtain first interaction information includes:
acquiring the visual field direction of an operator from a mixed reality module;
processing the first situation information and the visual field azimuth of the operator by using a map module to generate visual field battlefield environmental information;
and displaying the visual field battlefield environmental information to an operator by utilizing the mixed reality module to obtain first interaction information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the first intelligent algorithm model includes a deep neural network and a reinforcement learning algorithm; the deep neural network adopts an LSTM network, and the reinforcement learning algorithm adopts a PPO algorithm.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the preset training determination model includes a reward function, a termination function, and an average reward target for training, including:
The setting method of the reward function is as follows: on timeout, 9 points are subtracted from the reward value; when the agent controlled by the intelligent algorithm dies, 10 points are subtracted from the reward value; when an enemy entity dies, 2 points are added to the reward value; when the side controlled by the intelligent algorithm has no missiles left, 2 points are added to the reward value; when the enemy has no missiles left, 4 points are added to the reward value.
The setting method of the termination function is as follows: the deduction time expires, and/or an entity dies, and/or neither side has any missiles remaining.
The average reward target for completing training is that the average reward value converges stably and is not smaller than a preset reward threshold.
In an optional implementation manner, in a first aspect of the embodiment of the present invention, the determining, by using a preset training determination model, the second situation information to obtain a first determination result includes:
calculating the reward function and the termination function of the preset training judgment model according to the second situation information to obtain a reward value of the first intelligent algorithm model;
and judging whether the rewarding value reaches a preset rewarding threshold value or not to obtain a first judging result.
In a first aspect of the embodiment of the present invention, the evaluating the second intelligent algorithm model to obtain an evaluation result includes:
S41, configuring the simulation module to generate third situation information; the situation information comprises first party air strength information, second party air strength information and environment data information; the first party air strength information and the second party air strength information are opposite parties;
s42, processing the third situation information by using the second intelligent algorithm model to obtain a third instruction;
s43, displaying the third situation information to an operator by using a map module and a mixed reality module to obtain second interaction information; the second interaction information comprises voice interaction information, handle interaction information and gaze interaction information;
s44, processing the second interaction information by using a preset rule algorithm model to obtain a fourth instruction;
s45, controlling the simulation module to perform step-length propulsion according to the third instruction and the fourth instruction, and generating fourth situation information;
s46, judging the fourth situation information to obtain a second judgment result;
s47, if the second judgment result is negative, performing parameter optimization on the first intelligent algorithm model, and triggering execution of the step of training the first intelligent algorithm model based on the air game simulation environment to obtain a second intelligent algorithm model;
And if the second judgment result is yes, stopping the evaluation to obtain an evaluation result.
In a first aspect of the embodiment of the present invention, the processing, by using the second intelligent algorithm model, the third situation information to obtain a third instruction includes:
analyzing the third situation information to obtain third analysis data; the parsing includes data format conversion and data packaging;
extracting features from the third analysis data to obtain third characteristic data; the characteristic data comprises the position of the first party's air strength, the position of the second party's air strength, the distance between the first party's air strength and the second party's air strength, and the remaining number of the first party's missiles;
encoding the third characteristic data to obtain third characteristic encoded data;
inputting the third characteristic coding data into a preset neural network, and outputting third action coding data;
and decoding the third motion coding data to obtain a third instruction.
In a first aspect of the present invention, the displaying the third situation information to the operator by using the map module and the mixed reality module to obtain the second interaction information includes:
Acquiring the visual field direction of an operator from a mixed reality module;
processing the third situation information and the visual field azimuth of the operator by using a map module to generate visual field battlefield environmental information;
and displaying the battlefield environmental information to an operator by utilizing the mixed reality module to obtain second interaction information.
In an optional implementation manner, in a first aspect of the embodiment of the present invention, the determining the fourth situation information to obtain a second determination result includes:
and processing the fourth situation information, and judging whether the first party's air strength destroys the second party's air strength within the preset deduction time to obtain a second judgment result.
The second aspect of the invention discloses an air game simulation device, which comprises: the system comprises a simulation module, an algorithm module, a mixed reality module and a map module;
the simulation module is used for acquiring configuration parameters and/or instruction information and generating an air game simulation environment and situation information;
the algorithm module is used for processing situation information and operator interaction information received from the mixed reality module and outputting instructions; the algorithm module comprises an intelligent algorithm model, a training judgment model and a rule algorithm model;
The map module is used for generating visual field scene information according to the visual field azimuth of an operator and the acquired air game simulation environment and situation information; the visual field scene information is in a video stream format;
the mixed reality module is used for displaying the visual field scene information to an operator and capturing interaction information of the operator; the mixed reality module comprises a head-mounted display, a handle and a positioner.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the head-mounted display is configured to receive field scene information of the map module and display the field scene information to an operator;
the handle is used for receiving the interaction information of operators;
the locator is used for aligning the virtual world with the real world, and is the origin of coordinates of the virtual world.
In a second aspect of the present invention, the mixed reality module is configured to display the view scene information to an operator, to obtain interaction information of the operator, and includes:
the mixed reality module acquires the view scene information from the map module;
the mixed reality module displays the view scene information to an operator by using a head-mounted display device;
The mixed reality module utilizes a locator to coordinate and locate the virtual world and the real world;
the mixed reality module captures the control actions of the operator and obtains the interaction information of the operator.
In a second aspect of the present invention, the capturing, by the mixed reality module, a manipulation action of an operator to obtain interaction information of the operator includes:
a head-mounted display in the mixed reality module acquires the movement and rotation actions of an operator wearing the helmet, and the visual field azimuth of the operator is obtained;
the mixed reality module sends the view direction to the map module, and receives updated view scene information from the map module;
the mixed reality module acquires instruction actions of an operator to obtain interaction information of the operator;
the instruction actions include voice instructions, gaze instructions, handle instructions.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the invention provides an air game simulation method and device, the method comprises the steps of obtaining configuration parameters, generating an air game simulation environment by using a simulation module, constructing an intelligent algorithm model by using a deep reinforcement learning technology, and introducing a mixed reality technology in the training process of the intelligent algorithm model so that the intelligent algorithm can learn various realistic random uncertain scenes; and evaluating the trained intelligent algorithm model, and if the evaluation result does not reach the preset target, continuing to improve the intelligent algorithm model until the evaluation result reaches the preset target. Therefore, the invention combines the deep reinforcement learning technology and the mixed reality technology on the traditional air game simulation mode, so that the uncertain factors of reality are introduced in the training stage, and the generalization of intelligent algorithm training is enhanced; in the evaluation stage, through the mixed reality technology, the intelligent algorithm can fight against entities controlled by a rule algorithm, and besides the entities controlled by a real operator, so that the simulation practicability is enhanced.
Drawings
FIG. 1 is a schematic flow diagram of an air game simulation method disclosed in an embodiment of the invention;
fig. 2 is a schematic structural diagram of an air game simulation device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The algorithms running in traditional simulation platforms are mainly based on traditional rule models: a set of fixed rules must be formulated in advance according to experience. For entity maneuvering, for example, a maneuver plan must be formulated in advance and a maneuver route set, and the entity then maneuvers according to the pre-defined route points. This way of solving problems is rigid, and rules formulated in advance cannot adjust the current strategy of the entity as the scenario changes.
In the traditional simulation platform, effects are displayed by having a computer load the models and render an interface on a computer display terminal according to the acquired digital information stream; the user can see the simulation effect intuitively but lacks an immersive experience. With the rise of the metaverse concept, users' experience requirements for interaction are gradually increasing; the interaction mode of traditional simulation technology is single, interaction with entities in the interface is poor, and the ever-increasing experience requirements of users cannot be met.
In order to solve the defect that the conventional rule model of the traditional simulation platform can only adapt to a single scene and cannot adjust the strategy of the rule model along with the change of the scene to achieve the final purpose, the invention introduces a deep reinforcement learning algorithm, and in the interaction between an algorithm frame and a simulation environment, the intelligent model adapting to different scenes under specific service can be trained by setting a reward and punishment mechanism or a guiding strategy, and in the scene change process, the intelligent model can make a decision according to the current battlefield environment to achieve the final purpose.
In order to solve the defects of a single interaction mode and the lack of interaction experience with entities in the effect display process of traditional simulation platforms, the invention introduces mixed reality technology. After mixed reality technology is combined, the operator wearing the mixed reality device has an immersive sense of presence. By moving or rotating in the real world, the operator's position and orientation in the virtual world change accordingly, and the operator can perceive the scene in the virtual world in all directions. During the visual presentation, the user interacts with entities in the virtual world.
The terms used in the embodiments of the present invention are explained below.
Rule algorithm model: a knowledge-based software model that applies a rule-based system and uses generalization and reasoning mechanisms to reach the final decision. The rule algorithm model is in fact a library of rule algorithm models, that is, a collection of rule sets coded by experts from experience. When fighting the intelligent algorithm, these rule models confront the current intelligent algorithm in turn. To improve the training effect of the intelligent algorithm, a rule algorithm model with more wins has a higher weight and is more likely to be selected to fight the intelligent algorithm.
Intelligent algorithm model: a model generated by certain artificial intelligence means through learning on large sample data, applicable to different scenarios of the same service, and used to make corresponding decisions in real time according to the current scenario as the scenario changes.
Mixed Reality (MR), which includes augmented reality and augmented virtuality, refers to a new visual environment created by merging the real world and the virtual world. Physical and digital objects coexist in this new visual environment and interact in real time. Mixed Reality (MR) must be implemented in an environment in which interaction with real-world things is possible.
Example 1
Referring to fig. 1, fig. 1 is a schematic flow chart of an air game simulation method according to an embodiment of the invention. The air game simulation method described in fig. 1 is used in an air game simulation system, such as a local server or a cloud server used in the air game simulation system, and the embodiment of the invention is not limited. As shown in fig. 1, the air game simulation method may include the following operations:
s1, acquiring configuration parameters, and generating an air game simulation environment by using a simulation module;
s2, constructing a first intelligent algorithm model;
in the embodiment of the invention, the first intelligent algorithm model is used for generating an instruction for controlling the air strength of a first party;
s3, training the first intelligent algorithm model based on an air game simulation environment to obtain a second intelligent algorithm model;
and S4, evaluating the second intelligent algorithm model to obtain an evaluation result.
Therefore, by implementing the air game simulation method described by the embodiment of the invention, an air game simulation environment can be generated by using the simulation module, and an intelligent algorithm model with deep reinforcement learning capability can be constructed; in the intelligent algorithm model training and evaluation, the mixed reality technology is utilized to enable operators to be added into the countermeasure, so that the generalization and simulation practicability of the intelligent algorithm training are enhanced.
In an alternative embodiment, the building the first intelligent algorithm model specifically includes:
the first intelligent algorithm model comprises a deep neural network and a reinforcement learning algorithm, wherein the deep neural network adopts an LSTM network, and the reinforcement learning algorithm adopts a PPO algorithm.
Network model configuration: constructing a feature extraction layer, a hidden layer, an output layer and a value output layer.
Algorithm parameter configuration: setting experience pool size, batch size, KL divergence initial coefficient, learning rate and the like.
Observation space configuration: the observation space includes the state of the first party's air strength, the state of the second party's air strength, the state of the blue party's missiles, and the state of the red party's missiles.
The observation space refers to the size of the neural network input.
Behavioral space configuration: the behavior space comprises a maneuvering target point of the airplane and whether the airplane shoots or not.
The behavior space configuration refers to the size of the neural network output.
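For concreteness only, the observation space, behavior space and algorithm parameters described above might be laid out as in the following Python sketch; the dimensions, field names and hyperparameter values are assumptions chosen for illustration and are not disclosed by this embodiment.

    # Illustrative configuration sketch; all sizes and values are assumed.
    OBSERVATION_SPACE = {
        "first_party_air_strength_state": 8,    # e.g. position, velocity, heading, fuel
        "second_party_air_strength_state": 8,
        "blue_missile_state": 4,                # e.g. remaining count, in-flight flags
        "red_missile_state": 4,
    }
    OBS_DIM = sum(OBSERVATION_SPACE.values())   # the observation space sets the network input size

    BEHAVIOR_SPACE = {
        "maneuver_target_point": 3,             # x, y, z of the aircraft's maneuver target point
        "fire": 1,                              # whether the aircraft shoots
    }
    ACT_DIM = sum(BEHAVIOR_SPACE.values())      # the behavior space sets the network output size

    PPO_PARAMS = {
        "experience_pool_size": 100_000,
        "batch_size": 256,
        "kl_init_coef": 0.2,                    # KL divergence initial coefficient
        "learning_rate": 3e-4,
    }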
The first situation information is feature-coded according to the observation space configuration to obtain the policy features fed into the LSTM network; policy actions are output through the feature extraction layer, hidden layer, output layer and value output layer of the LSTM network, and the first instruction is obtained by decoding the multiple policy actions.
In the training stage, the intelligent algorithm model uses the initial model to make decisions on the received first situation information to obtain first instructions; the first instructions act on the simulation module, and the state of the simulation module changes. These steps are executed repeatedly to obtain a large number of simulation-environment training samples, each sample comprising: situation information, an instruction and a reward. The intelligent algorithm model selects a number of samples from the training samples, performs gradient calculation using the PPO algorithm, and updates the parameters of the initial model.
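A minimal PyTorch sketch of such an LSTM policy network (feature extraction layer, LSTM hidden layer, action output layer, value output layer) and of one PPO-style parameter update over collected samples is given below. The layer sizes, the clipped-surrogate form of the loss and the dummy data are assumptions for illustration; the patent itself only names the LSTM network and the PPO algorithm, so this should be read as a sketch rather than the patented implementation.

    import torch
    import torch.nn as nn

    class LSTMPolicy(nn.Module):
        """Feature extraction layer -> LSTM hidden layer -> action / value output layers."""
        def __init__(self, obs_dim=24, act_dim=4, hidden=128):
            super().__init__()
            self.feature = nn.Linear(obs_dim, hidden)                # feature extraction layer
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)    # hidden layer
            self.action_head = nn.Linear(hidden, act_dim)            # output layer (action logits)
            self.value_head = nn.Linear(hidden, 1)                   # value output layer

        def forward(self, obs, state=None):
            x = torch.tanh(self.feature(obs))
            x, state = self.lstm(x, state)
            return self.action_head(x), self.value_head(x), state

    def ppo_update(model, optimizer, obs, actions, old_logp, advantages, returns,
                   clip_eps=0.2, value_coef=0.5):
        """One simplified PPO update (clipped surrogate objective) over a batch of samples."""
        logits, values, _ = model(obs)
        dist = torch.distributions.Categorical(logits=logits)
        logp = dist.log_prob(actions)
        ratio = torch.exp(logp - old_logp)                            # new policy / old policy
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        value_loss = (values.squeeze(-1) - returns).pow(2).mean()
        loss = policy_loss + value_coef * value_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Dummy usage: 8 trajectories of 16 simulation steps; each sample = (situation, instruction, reward).
    model = LSTMPolicy()
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    obs = torch.randn(8, 16, 24)
    actions = torch.randint(0, 4, (8, 16))
    with torch.no_grad():
        logits, _, _ = model(obs)
        old_logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
    advantages, returns = torch.randn(8, 16), torch.randn(8, 16)
    ppo_update(model, opt, obs, actions, old_logp, advantages, returns)

The clipped-surrogate variant is used here for brevity; the KL divergence initial coefficient listed in the algorithm parameter configuration corresponds to the adaptive-KL variant of PPO, which could be substituted.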
Therefore, by implementing the air game simulation method described by the embodiment of the invention, the intelligent algorithm model introduces a deep reinforcement learning technology, so that the purposes of large sample acquisition and automatic calculation of model parameters are achieved.
In an optional embodiment, training the first intelligent algorithm model based on the air game simulation environment to obtain a second intelligent algorithm model specifically includes:
generating first situation information by using a simulation module based on an air game simulation environment; the situation information comprises first party air strength information, second party air strength information and environment data information; the first party air strength information and the second party air strength information are opposite parties;
Processing the first situation information by using a first intelligent algorithm model to obtain a first instruction; the first instruction is used for controlling the first party air strength;
displaying first situation information to an operator by using a map module and a mixed reality module to obtain first interaction information; the first interaction information comprises voice interaction information, handle interaction information and gaze interaction information;
processing the first interaction information by using a preset rule algorithm model to obtain a second instruction;
the second instruction comprises patrol, withdrawal, escort and attack;
the second instruction is used for controlling the second square air force;
according to the first instruction and the second instruction, the simulation module performs step-length propulsion to generate second situation information;
judging the second situation information by using a preset training judgment model to obtain a first judgment result; the preset training judgment model is used for judging whether the first intelligent algorithm model achieves a preset effect or not;
when the first judgment result is negative, changing the second situation information into first situation information, triggering and executing the first intelligent algorithm model, and processing the first situation information to obtain a first instruction;
And stopping training when the first judgment result is yes, and obtaining a second intelligent algorithm model.
Therefore, in the training stage, the air game simulation method described by the embodiment of the invention uses the initial network model to make decisions on the states acquired from the simulation environment, converts the decisions into action instructions that are transmitted to the simulation environment, advances the simulation engine, and repeats these steps to obtain a large number of simulation-environment training samples. Gradient calculation is performed on the training samples by using the reinforcement learning algorithm, and the parameters of the initial model are updated. By introducing a deep reinforcement learning algorithm and mixed reality technology, an operator in the real world can directly control entities in the virtual world to fight against the agent, so that the agent can learn a variety of realistic, randomly uncertain scenarios in the training stage of the intelligent game, and the generalization of the agent is enhanced.
In an optional embodiment, the processing the first situation information by using the first intelligent algorithm model to obtain a first instruction includes:
analyzing the first situation information to obtain analysis data; the parsing includes data format conversion and data packaging;
extracting features from the analysis data to obtain characteristic data; the characteristic data comprises the position of the first party's air strength, the position of the second party's air strength, the distance between the first party's air strength and the second party's air strength, and the remaining number of the first party's missiles;
coding the characteristic data to obtain characteristic coding data;
inputting the characteristic coding data into a preset neural network, and outputting action coding data;
and decoding the motion coding data to obtain a first instruction.
Therefore, by utilizing the intelligent algorithm, different instructions can be automatically generated according to the scene change, so that the aim of adjusting the strategy of the user is fulfilled.
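By way of illustration, the parse / feature-extraction / encode / network / decode chain described above could look roughly like the following sketch. The situation-message format, the normalisation constants and the instruction decode table are assumptions introduced for the example, not formats defined by the patent.

    import json
    import math

    MAX_RANGE_M = 100_000.0                                    # assumed normalisation constant
    INSTRUCTIONS = ["patrol", "withdraw", "escort", "attack"]  # assumed decode table

    def parse(raw_msg: bytes) -> dict:
        """Data format conversion and packaging of the raw situation message."""
        return json.loads(raw_msg.decode("utf-8"))

    def extract_features(situation: dict) -> dict:
        own, enemy = situation["first_party"], situation["second_party"]
        dx, dy = enemy["x"] - own["x"], enemy["y"] - own["y"]
        return {
            "own_pos": (own["x"], own["y"]),
            "enemy_pos": (enemy["x"], enemy["y"]),
            "distance": math.hypot(dx, dy),
            "own_missiles": own["missiles_left"],
        }

    def encode(features: dict) -> list:
        """Scale the features into the characteristic coding data fed to the network."""
        return [
            features["own_pos"][0] / MAX_RANGE_M, features["own_pos"][1] / MAX_RANGE_M,
            features["enemy_pos"][0] / MAX_RANGE_M, features["enemy_pos"][1] / MAX_RANGE_M,
            features["distance"] / MAX_RANGE_M,
            features["own_missiles"] / 6.0,
        ]

    def decode(action_code: int) -> str:
        """Turn the action coding data output by the network into a first instruction."""
        return INSTRUCTIONS[action_code]

    def situation_to_instruction(raw_msg: bytes, policy) -> str:
        # `policy` stands in for the trained neural network (returns an action code).
        return decode(policy(encode(extract_features(parse(raw_msg)))))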
In an optional embodiment, the displaying the first situation information to the operator by using the map module and the mixed reality module to obtain first interaction information includes:
acquiring the visual field direction of an operator from a mixed reality module;
processing the first situation information and the visual field azimuth of the operator by using a map module to generate visual field battlefield environmental information;
and displaying the battlefield environmental information to an operator by utilizing the mixed reality module to obtain first interaction information.
Optionally, the second instruction includes patrol, withdrawal, escort and attack.
Therefore, game situation information is visually presented to an operator by utilizing a mixed reality technology, so that the operator has immersive experience, and instruction information of the operator is fed back to the simulation module through the interaction equipment.
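A small sketch of how the map module might combine the first situation information with the operator's view azimuth to produce the visual-field battlefield environment information is given below; the 90-degree field of view and the entity record layout are assumptions.

    import math

    FOV_DEG = 90.0   # assumed horizontal field of view of the head-mounted display

    def in_view(entity, viewer_pos, view_azimuth_deg, fov_deg=FOV_DEG):
        """True if the entity lies inside the operator's current view sector."""
        dx = entity["x"] - viewer_pos[0]
        dy = entity["y"] - viewer_pos[1]
        bearing = math.degrees(math.atan2(dx, dy)) % 360.0            # 0 deg = north, clockwise
        diff = (bearing - view_azimuth_deg + 180.0) % 360.0 - 180.0   # signed angular difference
        return abs(diff) <= fov_deg / 2.0

    def view_battlefield_info(situation, viewer_pos, view_azimuth_deg):
        """Keep only the entities of the situation that fall inside the operator's field of view."""
        return [e for e in situation["entities"]
                if in_view(e, viewer_pos, view_azimuth_deg)]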
In an optional embodiment, the determining the second situation information by using a preset training determination model to obtain a first determination result includes:
calculating the reward function and the termination function of the preset training judgment model according to the second situation information to obtain a reward value of the first intelligent algorithm model;
and judging whether the rewarding value reaches a preset rewarding threshold value or not to obtain a first judging result.
In an alternative embodiment, the preset training decision model includes a reward function, a termination function, and an average reward target for training.
The setting method of the reward function is as follows: on timeout, 9 points are subtracted from the reward value; when the agent controlled by the intelligent algorithm dies, 10 points are subtracted from the reward value; when an enemy entity dies, 2 points are added to the reward value; when the side controlled by the intelligent algorithm has no missiles left, 2 points are added to the reward value; when the enemy has no missiles left, 4 points are added to the reward value.
The above-described reward function defines the goal in reinforcement learning, and in each step, the reward function sets the goal value of the agent, the only goal of the agent being to maximize the reward value.
The setting method of the termination function is as follows: the deduction time expires, and/or an entity dies, and/or neither side has any missiles remaining. At each step of each round, the termination function acquires the environment state from the environment and judges whether the current state should be terminated. If terminated, the next round is entered.
The average reward target for completing training is that the average reward value stably converges to not less than 8 points.
Therefore, by implementing the air game simulation method described by the embodiment of the invention, the LSTM network can be guided to output more effective strategy actions through the design of the rewarding function. Through the design of the termination function, the simulation environment can be restarted at a specific time, so that sampling training can be carried out continuously, and a large sample support is provided for training of an intelligent algorithm model.
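The reward and termination design just described can be written down as the following sketch. The point values follow this embodiment; the state-dictionary keys, the deduction-time limit and the window used for the convergence check are assumptions, and the reading of the termination condition as "neither side has missiles remaining" is an interpretation.

    TIME_LIMIT_S = 600.0      # assumed deduction-time limit
    REWARD_TARGET = 8.0       # average-reward target for completing training

    def step_reward(state: dict) -> float:
        """Reward scheme of this embodiment: timeout -9, own agent dies -10,
        enemy entity dies +2, own missiles exhausted +2, enemy missiles exhausted +4."""
        r = 0.0
        if state["time"] >= TIME_LIMIT_S:
            r -= 9.0
        if state["own_agent_dead"]:
            r -= 10.0
        if state["enemy_entity_dead"]:
            r += 2.0
        if state["own_missiles_left"] == 0:
            r += 2.0
        if state["enemy_missiles_left"] == 0:
            r += 4.0
        return r

    def terminated(state: dict) -> bool:
        """Termination: time expired, an entity dead, or (assumed reading) neither side has missiles."""
        return (state["time"] >= TIME_LIMIT_S
                or state["own_agent_dead"] or state["enemy_entity_dead"]
                or (state["own_missiles_left"] == 0 and state["enemy_missiles_left"] == 0))

    def training_finished(recent_avg_rewards: list) -> bool:
        """First judgment: the average reward has stably converged to at least the target."""
        return (len(recent_avg_rewards) >= 10
                and min(recent_avg_rewards[-10:]) >= REWARD_TARGET)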
In an optional embodiment, the evaluating the second intelligent algorithm model to obtain an evaluation result includes:
s41, configuring the simulation module to generate third situation information; the situation information comprises first party air strength information, second party air strength information and environment data information; the first party air strength information and the second party air strength information are opposite parties;
S42, processing the third situation information by using the second intelligent algorithm model to obtain a third instruction;
s43, displaying the third situation information to an operator by using a map module and a mixed reality module to obtain second interaction information; the second interaction information comprises voice interaction information, handle interaction information and gaze interaction information;
s44, processing the second interaction information by using a preset rule algorithm model to obtain a fourth instruction;
s45, according to the third instruction and the fourth instruction, the simulation module performs step-length propulsion to generate fourth situation information;
s46, judging the fourth situation information to obtain a second judgment result;
s47, if the second judgment result is negative, performing parameter optimization on the first intelligent algorithm model, triggering execution of the simulation environment based on the air game, and training the first intelligent algorithm model to obtain a second intelligent algorithm model;
and if the second judgment result is yes, stopping the evaluation to obtain an evaluation result.
It can be seen that, in this embodiment, the mixed reality technology is utilized, so that an operator can immersively integrate into a simulation environment, and feel scenes in the virtual world in all directions. In the visual display process, a user interacts with entities in the virtual world in a voice, handle and staring mode according to situation development, and the problems that in the traditional simulation system effect display process, the interaction mode is single and interaction experience with the entities is lacking are solved.
In an optional embodiment, the determining the fourth situation information to obtain the second determination result includes:
and processing the fourth situation information, and judging whether the first party's air strength destroys the second party's air strength within the preset deduction time to obtain a second judgment result.
Therefore, the operator evaluates the second intelligent algorithm model through the mixed reality module control entity, and can intuitively check the actual training effect of the second intelligent algorithm model.
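The second judgment itself reduces to a simple check, sketched below with assumed field names:

    def second_judgment(fourth_situation: dict, preset_deduction_time: float) -> bool:
        """True if the first party's air strength destroyed the second party's air strength
        within the preset deduction time."""
        return (fourth_situation["second_party_destroyed"]
                and fourth_situation["elapsed_time"] <= preset_deduction_time)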
Therefore, the air game simulation method described by the embodiment of the invention trains, by introducing a deep reinforcement learning algorithm, an intelligent model that adapts to different scenarios under a specific service; as the scenario changes, the intelligent model can autonomously make decisions according to the current battlefield environment, which solves the problem that existing simulation methods can only adapt to a single scenario and cannot adjust their own strategy as the scenario changes in order to achieve the final goal. Mixed reality technology is adopted to show the air game situation to the operator, giving the operator a sense of presence and allowing the scene in the virtual world to be perceived in all directions, which solves the problems that, in the effect display process of a traditional simulation system, the interaction mode is single and interaction experience with entities is lacking.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of an air game simulation device according to an embodiment of the invention. The air game simulation device described in fig. 2 is used in an air game simulation system, such as a local server or a cloud server used in the air game simulation system, and the embodiment of the invention is not limited. As shown in fig. 2, the apparatus may include:
the simulation module 201 is configured to generate an air game simulation environment and situation information according to the acquired configuration parameters and/or instruction information;
the algorithm module 202 is used for processing situation information and operator interaction information received from the mixed reality module and outputting an instruction; the algorithm module comprises an intelligent algorithm model, a training judgment model and a rule algorithm model;
the mixed reality module 203 is configured to display view scene information to an operator, and capture interaction information of the operator;
the map module 204 is configured to generate view scene information according to the view direction of the operator and the acquired air game simulation environment and situation information; the map module 204 may load three-dimensional aircraft model data, display the azimuth and attitude of an aircraft entity in real time, display missile explosion effects, display a following view angle and a roaming view angle, and receive scene view-angle switching from the mixed reality device.
Therefore, by implementing the air game simulation device described by the embodiment of the invention, the deep reinforcement learning technology and the mixed reality technology are combined on the traditional air game simulation mode, so that the uncertain factors of reality are introduced in the training stage, and the generalization of the training of the intelligent body is enhanced; in the demonstration verification stage, the mixed reality technology is adopted, so that an operator can feel scenes in the virtual world in an omnibearing manner.
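Purely to make the division of responsibilities among the four modules concrete, they could be sketched as the following interfaces; the method names and signatures are assumptions and are not part of the patent.

    from typing import Optional, Protocol

    class SimulationModule(Protocol):
        def generate_environment(self, config: dict) -> None: ...
        def step(self, instructions: list) -> dict: ...                 # returns situation information

    class AlgorithmModule(Protocol):
        def decide(self, situation: dict, interaction: Optional[dict]) -> list: ...

    class MapModule(Protocol):
        def render_view(self, situation: dict, view_azimuth_deg: float) -> bytes: ...  # video-stream frame

    class MixedRealityModule(Protocol):
        def display(self, view_frame: bytes) -> None: ...
        def capture_interaction(self) -> dict: ...                       # voice / handle / gaze input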
In an alternative embodiment, the mixed reality module includes a head mounted display, a handle, and a locator;
the head-mounted display is used for receiving the view scene information of the map module and displaying the view scene information to an operator;
the handle is used for receiving the interaction information of operators;
the locator is used for aligning the virtual world with the real world and serves as the origin of coordinates of the virtual world.
Optionally, the mixed reality module transmits the azimuth of the head-mounted display to the map module in the interaction with the map module, and the map module loads a battlefield picture corresponding to the azimuth and finally outputs the battlefield picture at the terminal of the head-mounted display; transmitting the instruction to the simulation module through voice and staring; and controlling the scaling of the scene and the switching of the global view and the local view through the handle keys.
Therefore, in the simulation process, the user interacts with the entity in the virtual world through the voice, the handle and the staring mode according to situation development, and the problems that in the effect display process of the traditional simulation system, the interaction mode is single and interaction experience with the entity is lacking are solved.
In an alternative embodiment, the simulation module 201 generates the air game simulation environment and situation information according to the acquired configuration parameters and/or the instruction information by:
initializing the simulation module 201;
constructing an air game simulation environment and situation information by using the simulation module 201 by using the acquired configuration parameters;
the simulation module 201 obtains instruction information sent by the algorithm module 202;
according to the instruction information, the simulation module 201 performs step-size propulsion to generate the situation information for the next step.
Optionally, the configuration parameters may include a detection range of the sensor, a missile attack distance, and a flight speed.
Optionally, the above-mentioned air game simulation environment includes blue air power, red air power, combat area configuration, weapon mount configuration.
Optionally, the situation information includes blue party air strength information, red party air strength information, missile information and intelligence information.
It can be seen that, by configuring parameters of the entity components and making environmental scenes, scenes in which the simulation module interacts with the intelligent algorithm module can be generated.
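A toy stand-in for the simulation module 201, covering configuration parameters, environment generation and step-size propulsion, might look like the sketch below; all field names and numeric values are illustrative assumptions.

    class AirGameSimulation:
        """Toy stand-in for simulation module 201."""

        def __init__(self, config: dict):
            # Configuration parameters such as sensor detection range, missile attack
            # distance and flight speed (default values are assumed for illustration).
            self.sensor_range = config.get("sensor_detection_range_m", 80_000.0)
            self.missile_range = config.get("missile_attack_distance_m", 50_000.0)
            self.flight_speed = config.get("flight_speed_mps", 300.0)
            self.time = 0.0
            self.blue = {"x": 0.0, "y": 0.0, "missiles": 4}
            self.red = {"x": 60_000.0, "y": 0.0, "missiles": 4}

        def step(self, instruction: dict, dt: float = 1.0) -> dict:
            """Advance the engine by one step according to the instruction and return
            the situation information for the next step."""
            self.time += dt
            if instruction.get("type") == "attack" and self.blue["missiles"] > 0:
                self.blue["missiles"] -= 1
            # Crude placeholder dynamics: the red entity closes the distance at flight speed.
            self.red["x"] = max(0.0, self.red["x"] - self.flight_speed * dt)
            return {
                "time": self.time,
                "blue_air_strength": dict(self.blue),
                "red_air_strength": dict(self.red),
                "missile_info": {"blue_left": self.blue["missiles"],
                                 "red_left": self.red["missiles"]},
                "intelligence_info": {"red_detected": self.red["x"] <= self.sensor_range},
            }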
In an alternative embodiment, the intelligent algorithm model in the algorithm module 202 processes the situation information and outputs the instruction in a specific manner:
the intelligent algorithm model analyzes the received situation information to obtain analysis data; the parsing includes data format conversion and data packaging;
the intelligent algorithm model performs feature extraction on the analysis data to obtain feature data;
the intelligent algorithm model encodes the characteristic data to obtain characteristic encoded data;
the intelligent algorithm model inputs the characteristic coding data into a preset neural network and outputs action coding data;
the intelligent algorithm model decodes the motion coding data to obtain an instruction;
the intelligent algorithm model outputs the above instructions to the simulation module 201.
Optionally, the parsing includes data format conversion and data packaging.
Optionally, the above features include, but are not limited to, the location of the blue air force, the location of the red air force, the distance between the two air forces, the remaining number of missiles.
In an alternative embodiment, the rule algorithm model in the algorithm module 202 processes the interaction information and outputs the instruction in a specific manner:
the rule algorithm model processes the received interaction information from the mixed reality module 203;
according to pre-designed rules, the rule algorithm model outputs an instruction by judging the current situation; the instruction is used for controlling the cooperative operation of the own side's multiple agents to fight against the enemy agents;
optionally, the collaborative operation includes a plurality of pre-coded strategies, and the strategy actually adopted can be adjusted, including encirclement attack, offensive attack, formation attack, target allocation and the like.
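An illustrative sketch of such a rule algorithm model is shown below: it selects a rule model from the library, weighting models with more wins more heavily (as explained in the term definitions above), and maps the operator's interaction information and the current situation to a second instruction. The concrete rules, field names and win counts are assumptions.

    import random

    RULE_LIBRARY = {"aggressive": 12, "defensive": 5, "escort_focused": 8}   # name -> assumed win count

    def pick_rule_model() -> str:
        """Rule models with more wins get a higher weight and are selected more often."""
        names = list(RULE_LIBRARY)
        weights = [RULE_LIBRARY[n] + 1 for n in names]   # +1 so unproven models can still be chosen
        return random.choices(names, weights=weights, k=1)[0]

    def rule_decide(interaction: dict, situation: dict) -> dict:
        """Map interaction information and the current situation to a second instruction
        (patrol / withdrawal / escort / attack)."""
        if interaction.get("voice_command") in {"patrol", "withdraw", "escort", "attack"}:
            return {"type": interaction["voice_command"]}
        if situation["enemy_distance_m"] < 30_000 and situation["own_missiles_left"] > 0:
            return {"type": "attack"}
        if situation["own_missiles_left"] == 0:
            return {"type": "withdraw"}
        return {"type": "patrol"}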
In an optional embodiment, the mixed reality module 203 is configured to display the view scene information to an operator, and capture interaction information of the operator, by:
the mixed reality module 203 acquires the view scene information from the map module 204;
the mixed reality module 203 displays the view scene information to an operator using a head mounted display device;
the mixed reality module 203 coordinates-locates the virtual world with the real world using a locator;
The mixed reality module 203 captures the manipulation actions of the operator and obtains the interaction information of the operator.
In an alternative embodiment, the mixed reality module 203 captures the manipulation actions of the operator, and obtains the interaction information of the operator, which is specifically:
the head-mounted display in the mixed reality module 203 acquires the movement and rotation actions of the operator wearing the helmet, and obtains the visual field azimuth of the operator;
the mixed reality module 203 sends the view direction to the map module 204, and receives updated view scene information from the map module 204;
after observing and analyzing the scene information, an operator can directly give a voice command at an appropriate time to drive an aircraft in the battlefield environment to execute patrol, withdrawal, escort, attack and other actions;
an operator can wake up the display of the gaze interface through a handle key and directly give an instruction by gazing at the interface, driving an aircraft in the battlefield environment to execute patrol, withdrawal, escort, attack and other actions;
an operator can adjust the size of the local visual angle through a handle key;
an operator can switch the global view angle and the local view angle through a handle key.
The mixed reality module 203 obtains the instruction action of the operator and obtains the interaction information of the operator;
The instruction actions include voice instruction, staring instruction and handle instruction.
Optionally, the voice instruction interaction specific mode is as follows:
the operator can watch the battlefield situation in real time by wearing the MR device, and can control the maneuver, striking and other actions of the entity in the battlefield by giving a voice command.
The voice software converts the voice information into voice codes through recognition, the voice codes are transmitted to the human-computer interaction module, voice instructions are transmitted to the simulation engine through decoding of the voice codes, and the effect executed by the simulation engine can be displayed in the MR equipment.
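Schematically, the voice interaction chain (recognition result, voice code, decoded instruction forwarded to the simulation engine) could be sketched as follows; the keyword table is assumed and the recogniser is a placeholder, since the patent does not name a specific speech-recognition product.

    VOICE_CODE_TABLE = {"patrol": 0, "withdraw": 1, "escort": 2, "attack": 3}   # assumed keyword table
    INSTRUCTION_TABLE = {code: word for word, code in VOICE_CODE_TABLE.items()}

    def recognize_speech(audio: bytes) -> str:
        """Placeholder for the voice software; returns the recognized command keyword."""
        raise NotImplementedError("plug an actual speech recognizer in here")

    def voice_to_instruction(audio: bytes) -> dict:
        keyword = recognize_speech(audio)
        voice_code = VOICE_CODE_TABLE[keyword]            # recognition result -> voice code
        return {"type": INSTRUCTION_TABLE[voice_code]}    # decoded instruction for the simulation engine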
Optionally, the specific manner of the gaze interaction is:
the display and the hiding of the staring button are controlled by the handle keys, the helmet is worn, and the displaying and the hiding of the staring button are controlled by the handle keys. The cursor point in the line of sight is made to fall on the gaze button frame by turning the head, stay for 2 seconds confirming that the button is selected. The staring interaction button consists of three parts, namely, firstly selecting an entity, then selecting the type of instructions to be issued (such as patrol, withdrawal, attack, convoy, and the like), and finally selecting a target, so that a complete action instruction can be issued.
Optionally, the specific manner of interaction of the handles is as follows:
the handle can control the zooming in and out of the scene under the MR visual angle besides controlling the showing and hiding of the staring button. The handle interacts with the MR device in a Bluetooth mode, the MR device transmits a corresponding key instruction to the computer terminal after receiving the key of the handle, and a final scene is displayed in the MR device after calculation.
In the simulation process, a user displays an air game situation for an operator by adopting a mixed reality technology according to situation development, and interacts with entities in the virtual world in a voice, handle and staring mode, so that the operator has a sense of being in the virtual world, and the operator senses the scene in the virtual world in all directions. The method solves the problems that in the effect display process of the traditional simulation system, the interaction mode is single, and the interaction experience with the entity is lacking.
The apparatus embodiments described above are merely illustrative, in which the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus the necessary general hardware platform, or of course by means of hardware. Based on such understanding, the foregoing technical solutions, in essence or in the portion contributing to the prior art, may be embodied in the form of a software product that is stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk memory, magnetic disk memory, tape memory, or any other medium that can be used to carry or store data and that is readable by a computer.
Finally, it should be noted that: the embodiment of the invention discloses an air game simulation method and device, which are disclosed as preferred embodiments of the invention and are only used for illustrating the technical solutions of the invention, not for limiting them; although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the various embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (8)

1. An air game simulation method, characterized in that the method comprises the following steps:
s1, acquiring configuration parameters, and generating an air game simulation environment by using a simulation module;
s2, constructing a first intelligent algorithm model; the first intelligent algorithm model is used for generating an instruction for controlling the air strength of a first party; specific:
the first intelligent algorithm model comprises a deep neural network and a reinforcement learning algorithm, wherein the deep neural network adopts an LSTM network, and the reinforcement learning algorithm adopts a PPO algorithm;
network model configuration: constructing a feature extraction layer, a hidden layer, an output layer and a value output layer;
Algorithm parameter configuration: setting the experience pool size, batch size, KL divergence initial coefficient, learning rate and the like;
and (3) observing space configuration: the observation space comprises the state of first party air force, the state of second party air force, the state of blue-party missile and the state of red-party missile; the observation space refers to the input size of the neural network;
behavioral space configuration: the behavior space comprises a maneuvering target point of the airplane and whether the airplane shoots or not; the behavior space configuration refers to the output size of the neural network;
feature coding is carried out on the first situation information according to the configuration of the observation space, so that the strategy features of the incoming LSTM network are obtained, strategy actions are output through a feature extraction layer, a built hidden layer, a built output layer and a built value output layer of the LSTM network, and a first instruction is obtained through multi-strategy action decoding;
in the training stage, the intelligent algorithm model makes a decision on the received first situation information by using the initial model to obtain a first instruction, the first instruction acts on the simulation module, the state of the simulation module changes, the steps are repeatedly executed, a large number of simulation environment training samples are obtained, and each sample comprises: situation information, instructions, rewards; the intelligent algorithm model selects samples from the training samples, performs gradient calculation by using a PPO algorithm, and updates parameters of an initial model;
S3, training the first intelligent algorithm model based on the air game simulation environment to obtain a second intelligent algorithm model, wherein the method specifically comprises the following steps of:
S31, generating first situation information by using the simulation module based on the air game simulation environment; the situation information comprises first party air strength information, second party air strength information and environment data information; the first party and the second party are opposing parties;
S32, processing the first situation information by using the first intelligent algorithm model to obtain a first instruction; the first instruction is used for controlling the first party's air strength;
S33, displaying the first situation information to an operator by using a map module and a mixed reality module to obtain first interaction information; specifically:
acquiring the visual field azimuth of the operator from the mixed reality module;
processing the first situation information and the visual field azimuth of the operator by using a map module to generate visual field battlefield environmental information;
displaying the visual field battlefield environmental information to an operator by utilizing a mixed reality module to obtain first interaction information;
S34, processing the first interaction information by using a preset rule algorithm model to obtain a second instruction; the second instruction is used for controlling the second party's air strength;
S35, advancing the simulation module by one step length according to the first instruction and the second instruction to generate second situation information;
S36, judging the second situation information by using a preset training judgment model to obtain a first judgment result; the preset training judgment model is used for judging whether the first intelligent algorithm model achieves a preset effect; the preset training judgment model comprises a reward function, a termination function and an average reward target for training;
the reward function is set as follows: if the deduction times out, 9 points are subtracted from the reward value; if the agent controlled by the intelligent algorithm is destroyed, 10 points are subtracted; if an enemy entity is destroyed, 2 points are added; if the agent controlled by the intelligent algorithm has no missiles remaining, 2 points are added; if the enemy has no missiles remaining, 4 points are added;
the reward function defines the objective in reinforcement learning; at each step it provides the agent with a reward value, and the agent's sole objective is to maximize the reward value;
the termination function is set as follows: a round terminates when the termination time is reached, and/or all entities of one party are destroyed, and/or no missiles remain; at each step length of each round, the termination function obtains the environment state from the environment and judges whether the current state should be terminated; if so, the next round is entered;
the average reward target set for training is that the average reward value stably converges to no less than 8 points (a minimal illustrative sketch of the reward and termination functions follows this claim);
S37, when the first judgment result is negative, taking the second situation information as the first situation information and triggering the step of processing the first situation information by using the first intelligent algorithm model to obtain a first instruction; the first instruction is used for controlling the first party's air strength;
when the first judgment result is yes, stopping training to obtain a second intelligent algorithm model;
and S4, evaluating the second intelligent algorithm model to obtain an evaluation result.
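The reward and termination functions recited above can be pictured with a minimal Python sketch. The point values and termination conditions follow the claim; the EnvState fields, the function names and the per-step accounting are assumptions introduced purely for illustration and are not the patented implementation.

    from dataclasses import dataclass

    @dataclass
    class EnvState:
        # Hypothetical snapshot of the simulation state consumed by the training judgment model.
        elapsed_steps: int
        max_steps: int                    # termination time expressed in step lengths
        own_agents_alive: int             # entities controlled by the intelligent algorithm
        enemy_agents_alive: int
        enemy_agents_destroyed_now: int   # enemy entities destroyed during this step
        own_missiles_left: int
        enemy_missiles_left: int

    def reward(state: EnvState) -> float:
        """Per-step reward following the point values recited in the claim."""
        r = 0.0
        if state.elapsed_steps >= state.max_steps:
            r -= 9.0                                   # timeout: subtract 9 points
        if state.own_agents_alive == 0:
            r -= 10.0                                  # agent of the intelligent algorithm destroyed: subtract 10 points
        r += 2.0 * state.enemy_agents_destroyed_now    # enemy entity destroyed: add 2 points each
        if state.own_missiles_left == 0:
            r += 2.0                                   # own missiles exhausted: add 2 points
        if state.enemy_missiles_left == 0:
            r += 4.0                                   # enemy missiles exhausted: add 4 points
        return r

    def terminated(state: EnvState) -> bool:
        """A round ends on timeout, when all entities of one party are destroyed, or when no missiles remain."""
        return (
            state.elapsed_steps >= state.max_steps
            or state.own_agents_alive == 0
            or state.enemy_agents_alive == 0
            or (state.own_missiles_left == 0 and state.enemy_missiles_left == 0)
        )

Under this reading, training would stop once the moving average of the episode reward stably converges to no less than 8 points, as set out in the claim.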
2. The air game simulation method according to claim 1, wherein the judging the second situation information by using a preset training judgment model to obtain a first judgment result comprises:
calculating the reward function and the termination function of the preset training judgment model according to the second situation information to obtain a reward value of the first intelligent algorithm model;
and judging whether the reward value reaches a preset reward threshold value to obtain the first judgment result.
3. The air game simulation method according to claim 1, wherein the evaluating the second intelligent algorithm model to obtain an evaluation result comprises:
S41, configuring the simulation module to generate third situation information; the third situation information comprises first party air strength information, second party air strength information and environment data information;
S42, processing the third situation information by using the second intelligent algorithm model to obtain a third instruction;
S43, displaying the third situation information to an operator by using the map module and the mixed reality module to obtain second interaction information;
S44, processing the second interaction information by using the preset rule algorithm model to obtain a fourth instruction;
S45, advancing the simulation module by one step length according to the third instruction and the fourth instruction to generate fourth situation information;
S46, judging the fourth situation information to obtain a second judgment result;
S47, if the second judgment result is negative, performing parameter optimization on the first intelligent algorithm model, and triggering the step of training the first intelligent algorithm model based on the air game simulation environment to obtain a second intelligent algorithm model;
and if the second judgment result is yes, stopping the evaluation to obtain an evaluation result (an illustrative evaluation-loop sketch follows this claim).
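The man-in-the-loop evaluation flow of claim 3 (S41 to S47) can be pictured, purely as a hedged sketch, as the loop below. The module objects, method names and the judge callback are placeholder assumptions; the patent does not specify the device interfaces at this level of detail.

    def evaluate_once(sim, policy_model, rule_model, map_module, mr_module, judge):
        """Illustrative evaluation round: trained model (first party) against the operator-driven second party."""
        situation = sim.reset()                                # S41: third situation information
        while not sim.done():
            first_cmd = policy_model.act(situation)            # S42: third instruction
            view = map_module.render(situation, mr_module.view_azimuth())
            interaction = mr_module.display_and_capture(view)  # S43: second interaction information
            second_cmd = rule_model.act(interaction)           # S44: fourth instruction
            situation = sim.step(first_cmd, second_cmd)        # S45: advance by one step length
        if judge(situation):                                   # S46: e.g. the destroy-within-time criterion of claim 4
            return "stop evaluation: evaluation result obtained"
        return "optimize parameters and retrain"               # S47: judgment negative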
4. The air game simulation method according to claim 3, wherein the determining the fourth situation information to obtain the second determination result includes:
processing the fourth situation information, and judging whether the first party's air strength destroys the second party's air strength within a preset deduction time, to obtain the second judgment result.
5. The air game simulation method according to claim 1, wherein the processing the first situation information by using the first intelligent algorithm model to obtain a first instruction includes:
analyzing the first situation information to obtain analysis data;
extracting features of the analysis data to obtain feature data;
coding the feature data to obtain feature-coded data;
inputting the feature-coded data into a preset neural network and outputting action-coded data;
and decoding the action-coded data to obtain the first instruction (a minimal sketch of this pipeline follows this claim).
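As a hedged illustration of the parse, encode, infer and decode pipeline of claim 5, the sketch below uses PyTorch; the framework choice, layer sizes, situation field names and the decoding scheme are assumptions for illustration only, not the network actually configured in the embodiment.

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        """Feature extraction layer, LSTM hidden layer, action output layer and value output layer."""
        def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
            super().__init__()
            self.feature = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.action_head = nn.Linear(hidden, act_dim)   # strategy action logits
            self.value_head = nn.Linear(hidden, 1)          # value output used by PPO

        def forward(self, obs_seq, hx=None):
            feat = self.feature(obs_seq)                    # feature extraction on the coded situation
            out, hx = self.lstm(feat, hx)
            return self.action_head(out), self.value_head(out), hx

    def encode(situation: dict, obs_dim: int) -> torch.Tensor:
        # Illustrative feature coding: flatten per-entity states into a fixed-size observation vector.
        flat = []
        for entity in situation.get("entities", []):
            flat.extend([entity["x"], entity["y"], entity["z"], entity["heading"], entity["missiles"]])
        flat = (flat + [0.0] * obs_dim)[:obs_dim]           # pad or truncate to the observation space size
        return torch.tensor(flat, dtype=torch.float32).view(1, 1, obs_dim)

    def decode(action_logits: torch.Tensor) -> dict:
        # Illustrative decoding of the action code into a first instruction (maneuver target index, fire flag).
        idx = int(action_logits.argmax(dim=-1))
        return {"maneuver_target_index": idx // 2, "fire": bool(idx % 2)}

A first instruction would then be obtained roughly as decode(net(encode(situation, obs_dim))[0]) for a PolicyNet instance net.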
6. An air game simulation device, characterized in that the device comprises: a simulation module, an algorithm module, a mixed reality module and a map module;
the simulation module is used for acquiring configuration parameters and/or instruction information and generating an air game simulation environment and situation information;
the algorithm module is used for processing situation information and operator interaction information received from the mixed reality module and outputting instructions; the algorithm module comprises an intelligent algorithm model, a training judgment model and a rule algorithm model;
the map module is used for generating visual field scene information according to the visual field azimuth of an operator and the acquired air game simulation environment and situation information;
the mixed reality module is used for displaying the visual field scene information to an operator and capturing interaction information of the operator; the mixed reality module comprises a head-mounted display, a handle and a positioner;
the intelligent algorithm model is obtained by training a preset first intelligent algorithm model by using the air game simulation device, and the training specifically comprises the following steps:
S31, generating first situation information by using the simulation module based on the air game simulation environment; the situation information comprises first party air strength information, second party air strength information and environment data information; the first party and the second party are opposing parties;
S32, processing the first situation information by using the first intelligent algorithm model to obtain a first instruction; the first instruction is used for controlling the first party's air strength;
S33, displaying the first situation information to an operator by using the map module and the mixed reality module to obtain first interaction information; specifically:
acquiring the visual field azimuth of the operator from the mixed reality module;
processing the first situation information and the visual field azimuth of the operator by using a map module to generate visual field battlefield environmental information;
displaying the visual field battlefield environmental information to an operator by utilizing a mixed reality module to obtain first interaction information;
S34, processing the first interaction information by using a preset rule algorithm model to obtain a second instruction; the second instruction is used for controlling the second party's air strength;
S35, advancing the simulation module by one step length according to the first instruction and the second instruction to generate second situation information;
S36, judging the second situation information by using a preset training judgment model to obtain a first judgment result; the preset training judgment model is used for judging whether the first intelligent algorithm model achieves a preset effect;
S37, when the first judgment result is negative, taking the second situation information as the first situation information and triggering the step of processing the first situation information by using the first intelligent algorithm model to obtain a first instruction; the first instruction is used for controlling the first party's air strength;
when the first judgment result is yes, stopping training to obtain the trained intelligent algorithm model;
the first intelligent algorithm model comprises a deep neural network and a reinforcement learning algorithm, wherein the deep neural network adopts an LSTM network, and the reinforcement learning algorithm adopts a PPO algorithm;
network model configuration: constructing a feature extraction layer, a hidden layer, an output layer and a value output layer;
algorithm parameter configuration: setting the experience pool size, the batch size, the initial KL divergence coefficient, the learning rate and the like;
observation space configuration: the observation space comprises the state of the first party's air strength, the state of the second party's air strength, the states of the blue side's missiles and the states of the red side's missiles; the observation space determines the input size of the neural network;
behavior space configuration: the behavior space comprises the maneuvering target point of the aircraft and whether the aircraft fires; the behavior space configuration determines the output size of the neural network;
feature coding is carried out on the first situation information according to the observation space configuration to obtain strategy features, which are fed into the LSTM network; strategy actions are output through the feature extraction layer, the constructed hidden layer, the output layer and the value output layer of the LSTM network, and the first instruction is obtained by decoding the strategy actions;
in the training stage, the intelligent algorithm model uses the initial model to make a decision on the received first situation information to obtain a first instruction; the first instruction acts on the simulation module, and the state of the simulation module changes; the above steps are repeated to obtain a large number of simulation environment training samples, each sample comprising situation information, an instruction and a reward; the intelligent algorithm model selects samples from the training samples, performs gradient calculation by using the PPO algorithm, and updates the parameters of the initial model;
the preset training judgment model comprises a reward function, a termination function and an average reward target for training;
the reward function is set as follows: if the deduction times out, 9 points are subtracted from the reward value; if the agent controlled by the intelligent algorithm is destroyed, 10 points are subtracted; if an enemy entity is destroyed, 2 points are added; if the agent controlled by the intelligent algorithm has no missiles remaining, 2 points are added; if the enemy has no missiles remaining, 4 points are added;
the reward function defines the objective in reinforcement learning; at each step it provides the agent with a reward value, and the agent's sole objective is to maximize the reward value;
the termination function is set as follows: a round terminates when the termination time is reached, and/or all entities of one party are destroyed, and/or no missiles remain; at each step length of each round, the termination function obtains the environment state from the environment and judges whether the current state should be terminated; if so, the next round is entered;
the average reward target set for training is that the average reward value stably converges to no less than 8 points (an illustrative sketch of the interaction between the modules follows this claim).
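Purely as an illustration of how the four modules of claim 6 might exchange data during one step length, the interface sketch below may help; the class and method names are assumptions and do not reflect the actual device implementation.

    from typing import Any, Protocol

    class SimulationModule(Protocol):
        def configure(self, params: dict) -> None: ...
        def step(self, first_instruction: Any, second_instruction: Any) -> dict:
            """Advance one step length and return the new situation information."""

    class AlgorithmModule(Protocol):
        def decide(self, situation: dict) -> Any:
            """Intelligent algorithm model: situation information to first instruction."""
        def rule_decide(self, interaction: dict) -> Any:
            """Rule algorithm model: operator interaction information to second instruction."""

    class MapModule(Protocol):
        def render_view(self, situation: dict, view_azimuth: float) -> dict:
            """Generate visual field battlefield environment information for the current view azimuth."""

    class MixedRealityModule(Protocol):
        def view_azimuth(self) -> float: ...
        def display(self, view_scene: dict) -> None: ...
        def capture_interaction(self) -> dict:
            """Voice, gaze and handle instructions captured by the head-mounted display, handle and locator."""

A training or evaluation driver would wire these four interfaces together in a step loop of the kind sketched after claim 3.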
7. The air game simulation device of claim 6, wherein the mixed reality module displaying the view scene information to the operator and obtaining the interaction information of the operator comprises:
the mixed reality module uses the locator to perform coordinate positioning between the virtual world and the real world (an illustrative coordinate-alignment sketch follows this claim);
the mixed reality module acquires the view scene information from the map module;
the mixed reality module displays the view scene information to an operator by using a head-mounted display device;
the mixed reality module captures the control actions of the operator and obtains the interaction information of the operator.
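Claim 7 recites that the locator performs coordinate positioning between the virtual and real worlds. One hedged way to picture this is a rigid transform (with an optional scale) from the locator-tracked real-world frame into the virtual battlefield frame, as sketched below; the calibration parameters are assumed, not specified in the patent.

    import numpy as np

    def align_real_to_virtual(p_real: np.ndarray, R: np.ndarray, t: np.ndarray, scale: float = 1.0) -> np.ndarray:
        """Map a locator-tracked point from the real-world frame into the virtual-world frame.

        p_real: (3,) position reported by the locator
        R:      (3, 3) rotation between the two frames, e.g. obtained from a calibration step
        t:      (3,) translation of the real-world origin expressed in the virtual frame
        scale:  scale factor between the two worlds
        """
        return scale * (R @ p_real) + t

    # Example with an identity calibration: the virtual frame coincides with the real one.
    p_virtual = align_real_to_virtual(np.array([1.0, 0.5, 1.7]), np.eye(3), np.zeros(3))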
8. The air game simulation device of claim 7, wherein the mixed reality module capturing the manipulation actions of the operator to obtain the interaction information of the operator comprises:
the head-mounted display in the mixed reality module acquires the movement and rotation of the operator wearing the helmet to obtain the visual field azimuth of the operator;
the mixed reality module sends the view direction to the map module, and receives updated view scene information from the map module;
the mixed reality module acquires the instruction actions of the operator to obtain the interaction information of the operator;
the instruction actions include voice instructions, gaze instructions and handle instructions (a minimal sketch of this interaction capture follows this claim).
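To make the interaction capture of claim 8 concrete, the operator interaction information could be modelled with a small event structure such as the one below; the event kinds and fields are illustrative assumptions rather than the actual data format of the device.

    from dataclasses import dataclass, field
    from enum import Enum, auto

    class InstructionKind(Enum):
        VOICE = auto()    # spoken command recognized by the headset
        GAZE = auto()     # gaze-based designation derived from head pose
        HANDLE = auto()   # button or trigger input from the handle

    @dataclass
    class OperatorInteraction:
        view_azimuth: float                                  # derived from the helmet's movement and rotation
        instructions: list = field(default_factory=list)     # list of (InstructionKind, payload) tuples

    # Hypothetical interaction: the operator looks 30 degrees to the left and issues a voice instruction.
    example = OperatorInteraction(view_azimuth=-30.0,
                                  instructions=[(InstructionKind.VOICE, "fire at designated target")])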
CN202211176772.1A 2022-09-26 2022-09-26 Air game simulation method and device Active CN115470710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211176772.1A CN115470710B (en) 2022-09-26 2022-09-26 Air game simulation method and device


Publications (2)

Publication Number Publication Date
CN115470710A (en) 2022-12-13
CN115470710B (en) 2023-06-06

Family

ID=84335958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211176772.1A Active CN115470710B (en) 2022-09-26 2022-09-26 Air game simulation method and device

Country Status (1)

Country Link
CN (1) CN115470710B (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109782907A (en) * 2018-12-28 2019-05-21 西安交通大学 A kind of virtual filling coorinated training system based on polyhybird real world devices
US11461145B2 (en) * 2019-01-28 2022-10-04 EMC IP Holding Company LLC Building neural networks for resource allocation for iterative workloads using reinforcement learning
CN111488992A (en) * 2020-03-03 2020-08-04 中国电子科技集团公司第五十二研究所 Simulator adversary reinforcing device based on artificial intelligence
WO2022101452A1 (en) * 2020-11-12 2022-05-19 UMNAI Limited Architecture for explainable reinforcement learning
CN114612640A (en) * 2022-03-24 2022-06-10 航天宏图信息技术股份有限公司 Space-based situation simulation system based on mixed reality technology

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019155052A1 (en) * 2018-02-09 2019-08-15 Deepmind Technologies Limited Generative neural network systems for generating instruction sequences to control an agent performing a task
WO2020159692A1 (en) * 2019-01-28 2020-08-06 Mayo Foundation For Medical Education And Research Estimating latent reward functions from experiences
WO2021082864A1 (en) * 2019-10-30 2021-05-06 武汉理工大学 Deep reinforcement learning-based intelligent collision-avoidance method for swarm of unmanned surface vehicles
CN111401556A (en) * 2020-04-22 2020-07-10 清华大学深圳国际研究生院 Selection method of opponent type imitation learning winning incentive function
CN113221444A (en) * 2021-04-20 2021-08-06 中国电子科技集团公司第五十二研究所 Behavior simulation training method for air intelligent game
TR2021014085A2 (en) * 2021-09-08 2021-09-21 Havelsan Hava Elektronik San Ve Tic A S AUTONOMOUS VIRTUAL SIMULATOR ASSETS THAT CONTINUOUSLY LEARN THROUGH EXPERIENCE
CN114638339A (en) * 2022-03-10 2022-06-17 中国人民解放军空军工程大学 Intelligent agent task allocation method based on deep reinforcement learning
CN114996856A (en) * 2022-06-27 2022-09-02 北京鼎成智造科技有限公司 Data processing method and device for airplane intelligent agent maneuver decision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Rules-PPO-QMIX: Multi-Agent Reinforcement Learning with Mixed Rules for Large Scene Tasks; Zi-Zhen Shen et al.; 2021 China Automation Congress (CAC); pp. 6731-6736 *
Application of deep reinforcement learning technology in intelligent air combat; He Jiafan et al.; Command Information System and Technology; Vol. 12, No. 5; pp. 6-13 *

Also Published As

Publication number Publication date
CN115470710A (en) 2022-12-13

Similar Documents

Publication Publication Date Title
CN110163238B (en) Information prediction method, model training method and server
KR102506504B1 (en) Voice assistant system using artificial intelligence
CN109893857B (en) Operation information prediction method, model training method and related device
CN109345614B (en) Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction
US11786812B2 (en) Systems and methods for transcribing user interface elements of a game application into haptic feedback
CN109999496A (en) Control method, device and the electronic device of virtual objects
CN111437608B (en) Game play method, device, equipment and storage medium based on artificial intelligence
TWI831074B (en) Information processing methods, devices, equipments, computer-readable storage mediums, and computer program products in virtual scene
US20220351472A1 (en) Remote camera augmented reality system
Richer et al. A video game-based framework for analyzing human-robot interaction: characterizing interface design in real-time interactive multimedia applications
CN114489144A (en) Unmanned aerial vehicle autonomous maneuver decision method and device and unmanned aerial vehicle
CN113082718A (en) Game operation method, device, terminal and storage medium
KR20220170774A (en) Drone tactical training system based on metaverse and drone training system in metaverse space using the same
CN116956007A (en) Pre-training method, device and equipment for artificial intelligent model and storage medium
CN115470710B (en) Air game simulation method and device
CN114344906A (en) Method, device, equipment and storage medium for controlling partner object in virtual scene
CN114425160A (en) Rescue interaction processing method, device, equipment, program product and storage medium
Qin et al. Mp5: A multi-modal open-ended embodied system in minecraft via active perception
CN112330778A (en) Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction
CN113018862A (en) Virtual object control method and device, electronic equipment and storage medium
CN115311918B (en) Virtual-real fusion training system and method
CN111742288A (en) Remote control movable platform control method, device and computer readable storage medium
WO2023246270A1 (en) Information processing method and apparatus, and storage medium and electronic device
US20230241518A1 (en) Methods and system for smart voice chat in multiplayer gaming
KR102433085B1 (en) Method for broadcasting virtual reality game and virtual reality system for performing the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant