CN116663417A - Virtual geographic environment role modeling method - Google Patents

Virtual geographic environment role modeling method

Info

Publication number
CN116663417A
Authority
CN
China
Prior art keywords
character
state
action
geographic environment
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310644061.0A
Other languages
Chinese (zh)
Other versions
CN116663417B (en)
Inventor
郭德华
杨锋
阎毛毛
屈莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China National Institute of Standardization
Original Assignee
China National Institute of Standardization
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China National Institute of Standardization filed Critical China National Institute of Standardization
Priority to CN202310644061.0A priority Critical patent/CN116663417B/en
Publication of CN116663417A publication Critical patent/CN116663417A/en
Application granted granted Critical
Publication of CN116663417B publication Critical patent/CN116663417B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/092 - Reinforcement learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/05 - Geographic models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 - Manipulating 3D models or images for computer graphics
    • G06T 19/006 - Mixed reality

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a virtual geographic environment role modeling method. The method first defines an attribute function of the geographic environment and an initial state of the character; it then makes a decision from the character's current state and the environment attribute function to obtain an action; finally, it updates the character state, which comprises position coordinates, current speed and environmental influence, according to the action, and the simulation ends when a termination condition is met. Based on deep reinforcement learning, the virtual geographic environment role modeling method is novel and adaptive, improves simulation accuracy, and saves time and labor; it addresses the shortcomings of traditional methods and improves the quality and effect of modeling and simulation.

Description

Virtual geographic environment role modeling method
Technical Field
The application relates to the technical field of virtual modeling, in particular to a virtual geographic environment role modeling method.
Background
The virtual geographic environment is a rapidly developing new field of spatio-temporal information technology and has been widely applied in both military and civil domains. Supporting multi-user collaboration is an important aspect of a virtual geographic environment. At present, collaboration in virtual geographic environments mainly draws on the concepts and methods of computer-supported cooperative work (CSCW), and in practical applications a system must be developed for a particular typical application to realize information sharing and interoperation. However, the prior art lacks a model and method of geographic collaboration that meets the characteristics and needs of the virtual geographic environment itself. The virtual geographic environment places 'people' at its core, yet there is still no practical and effective solution for constructing a collaborative working environment suitable for multi-user participation while also enabling efficient construction of the virtual geographic environment.
Current virtual geographic environment role modeling methods have shortcomings. Traditional modeling methods typically rely on manually defined rules or simple mathematical models and lack the ability to accurately model complex environments and character behaviors. Furthermore, these methods often require extensive manual design and adjustment and do not adapt well to different environments and tasks. A novel approach is therefore needed to address these issues.
Disclosure of Invention
The application aims to provide a virtual geographic environment role modeling method.
In order to achieve the above purpose, the application is implemented according to the following technical scheme:
the application comprises the following steps:
S1: defining attribute functions of the geographic environment and an initial state of the character;
S2: simulation process: making a decision according to the current state of the character and the environmental attribute function to obtain an action;
S3: updating the character state according to the action, wherein the character state comprises position coordinates, current speed and environmental influence, and the simulation is ended when the termination condition is met;
S4: if the termination condition is not satisfied, the process returns to step S2.
In step S1, an attribute function V(x, y) of the geographic environment and an initial state S(0) of the character are defined. The simulation process in step S2 is carried out for each time step t: a decision is made through a reinforcement learning algorithm according to the current state S(t) of the character and the environment attribute function V(x, y), yielding an action a(t).
In the step S3, the character state S (t+1) is updated according to the action a (t); the position of the character is represented by coordinates (x, y), and the position is updated according to the current position and speed:
x(t+1)=x(t)+vx(t)
y(t+1)=y(t)+vy(t)
the current speed is updated according to the following equation:
vx(t+1)=f_vx(vx(t),vy(t),S(t),V(x,y))
vy(t+1)=f_vy(vx(t),vy(t),S(t),V(x,y))
wherein f_vx and f_vy are functions that calculate a new speed from the current speed, character state, and environmental attributes;
updating the character state S (t+1) according to the environmental impact function F (S (t), V (x, y));
S(t+1)=F(S(t),V(x,y))
the termination condition is satisfied when a predetermined time elapses or a specified position is reached, and the simulation then ends.
The reinforcement learning algorithm comprises a Q-value function update and an action selection strategy; the specific formulas are as follows:
updating the Q value function:
Q(s(t),a(t))=r(t+1)+γ*max(Q(s(t+1),a))
wherein Q(s(t), a(t)) represents the Q value of taking action a(t) in state s(t); r(t+1) represents the immediate reward obtained after action a(t) is taken in state s(t); γ is a discount factor that balances the importance of current and future rewards; max(Q(s(t+1), a)) represents the maximum Q value over actions a in the next state s(t+1);
action selection strategy:
a(t)=argmax(Q(s(t),a))
wherein argmax (Q (s (t), a)) represents an action a selected to maximize the Q value in the current state s (t).
In a second aspect, an embodiment of the present application further provides an electronic device, including:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the method steps of the first aspect.
In a third aspect, embodiments of the present application also provide a computer-readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method steps of the first aspect.
The beneficial effects of the application are as follows:
compared with the prior art, the virtual geographic environment role modeling method has the following technical effects:
compared with the traditional method, the application provides a novel modeling method, and the complex environment and the role behavior can be modeled more accurately by utilizing advanced technologies such as a deep reinforcement learning algorithm and the like.
The application can adapt to different environments and tasks; through learning and optimization it makes decisions and adjusts behavior according to actual conditions, thereby improving the adaptability of characters in the virtual geographic environment.
By introducing a deep reinforcement learning algorithm, the application can better simulate the movement and behavior of characters in the virtual geographic environment, improving simulation accuracy and realism.
Compared with traditional methods, the application reduces the dependence on manual design and adjustment; through learning and autonomous decision-making it reduces the workload of developers and improves development efficiency.
In summary, the virtual geographic environment role modeling method based on deep reinforcement learning is novel and adaptive, improves simulation accuracy, and saves time and labor; it addresses the shortcomings of traditional methods and improves the quality and effect of modeling and simulation.
Drawings
Fig. 1 is a flow chart of the method of the present application.
Fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The application will be further described with reference to the accompanying drawings and specific embodiments, wherein the exemplary embodiments and descriptions of the application are for purposes of illustration, but are not intended to be limiting.
The role modeling method of the virtual geographic environment combines several key concepts and mathematical formulas:
the geographical environment represents: the geographic environment is defined as a two-dimensional plane, represented by a coordinate system. Let the point coordinates on the plane be (x, y). The geographic environment is divided into a plurality of discrete areas or grids, each having a set of attributes (e.g., terrain, altitude, vegetation, etc.). A geographical context representation function V (x, y) is defined that maps each point to a corresponding attribute vector.
Role modeling: roles are modeled as entities with a set of attributes. These attributes may include location, speed, direction, health status, etc. A character state function S (t) is defined that describes the attribute state of the character at time t. The position of the character is represented by coordinates (x, y) and the velocity is represented by vector v= (vx, vy).
Role-environment interaction: the character makes a decision according to its current state S(t) and the environment attributes V(x, y), and updates its own state. Interactions between characters and the environment may be realized through rules, physical simulation, or machine learning algorithms; for example, a reinforcement learning algorithm can be used to train the character to make optimal decisions.
Environmental impact: an influence function F (S (t), V (x, y)) of the environment on the character is defined, which represents the influence of the environment properties on the character state. This may be a function based on physical laws or a function based on experience or probability.
Based on the above concept, as shown in fig. 1: the application comprises the following steps:
S1: defining attribute functions of the geographic environment and an initial state of the character;
S2: simulation process: making a decision according to the current state of the character and the environmental attribute function to obtain an action;
S3: updating the character state according to the action, wherein the character state comprises position coordinates, current speed and environmental influence, and the simulation is ended when the termination condition is met;
S4: if the termination condition is not satisfied, the process returns to step S2.
In step S1, an attribute function V(x, y) of the geographic environment and an initial state S(0) of the character are defined. The simulation process in step S2 is carried out for each time step t: a decision is made through a reinforcement learning algorithm according to the current state S(t) of the character and the environment attribute function V(x, y), yielding an action a(t).
In the step S3, the character state S (t+1) is updated according to the action a (t); the position of the character is represented by coordinates (x, y), and the position is updated according to the current position and speed:
x(t+1)=x(t)+vx(t)
y(t+1)=y(t)+vy(t)
the current speed is updated according to the following equation:
vx(t+1)=f_vx(vx(t),vy(t),S(t),V(x,y))
vy(t+1)=f_vy(vx(t),vy(t),S(t),V(x,y))
wherein f_vx and f_vy are functions that calculate a new speed from the current speed, character state, and environmental attributes;
updating the character state S (t+1) according to the environmental impact function F (S (t), V (x, y));
S(t+1)=F(S(t),V(x,y))
the termination condition is satisfied when a predetermined time elapses or a specified position is reached, and the simulation then ends.
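The following Python sketch shows one possible realization of the simulation loop of steps S2 to S4. The helper functions decide(), apply_action(), f_vx(), f_vy() and F() are assumed to be supplied by the user; their names, the way the action is applied to the state, and the termination tolerance are illustrative assumptions rather than definitions taken from the disclosure.

def simulate(state, env, decide, apply_action, f_vx, f_vy, F, max_steps=1000, goal=None):
    """Run steps S2-S4 until a predetermined time or a specified position is reached."""
    for t in range(max_steps):                       # predetermined time limit
        a = decide(state, env.V(state.x, state.y))   # S2: action a(t) from S(t) and V(x, y)
        state = apply_action(state, a)               # assumption: the action adjusts S(t), e.g. an intended acceleration
        # S3: position update from the current position and speed
        state.x += state.vx
        state.y += state.vy
        # speed update from the current speed, character state and environment attributes
        attrs = env.V(state.x, state.y)
        state.vx, state.vy = (f_vx(state.vx, state.vy, state, attrs),
                              f_vy(state.vx, state.vy, state, attrs))
        # environmental impact: S(t+1) = F(S(t), V(x, y))
        state = F(state, attrs)
        # termination: specified position reached
        if goal is not None and abs(state.x - goal[0]) < 1e-3 and abs(state.y - goal[1]) < 1e-3:
            break
    return state

Here decide() stands for the reinforcement learning policy described next.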
The reinforcement learning algorithm comprises a Q-value function update and an action selection strategy; the specific formulas are as follows:
updating the Q value function:
Q(s(t),a(t))=r(t+1)+γ*max(Q(s(t+1),a))
wherein Q(s(t), a(t)) represents the Q value of taking action a(t) in state s(t); r(t+1) represents the immediate reward obtained after action a(t) is taken in state s(t); γ is a discount factor that balances the importance of current and future rewards; max(Q(s(t+1), a)) represents the maximum Q value over actions a in the next state s(t+1);
action selection strategy:
a(t)=argmax(Q(s(t),a))
wherein argmax (Q (s (t), a)) represents an action a selected to maximize the Q value in the current state s (t).
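A minimal tabular Python sketch of the two formulas above is given below, assuming discrete states and actions. Setting the Q value directly to the one-step target mirrors the update exactly as written; the epsilon-greedy exploration term is an addition of this sketch and is not part of the formulas.

import random
from collections import defaultdict

Q = defaultdict(float)  # Q value of 0.0 for unseen (state, action) pairs

def q_update(s_t, a_t, r_next, s_next, actions, gamma=0.9):
    """Q(s(t), a(t)) = r(t+1) + gamma * max_a Q(s(t+1), a)."""
    Q[(s_t, a_t)] = r_next + gamma * max(Q[(s_next, a)] for a in actions)

def select_action(s_t, actions, epsilon=0.1):
    """a(t) = argmax_a Q(s(t), a), with optional epsilon-greedy exploration."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s_t, a)])

In the deep reinforcement learning setting described later, the table Q is replaced by a neural network, as sketched in the DQN section below.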
In the above algorithm, the specific formulas and definitions depend on the application scenario and requirements. The environment attribute function V(x, y), the decision algorithm, the environment influence function F(S(t), V(x, y)) and the velocity-update functions f_vx and f_vy can all be adapted to the data available in an actual application. The algorithm can be customized to the specific problem so that the method meets practical needs.
Examples:
initializing: defining an attribute function V (x, y) of the geographic environment: v (x, y) represents the attribute vector of the geographic environment at coordinates (x, y). Define initial state S (0) of the character: s (0) represents the state of the character at the initial time, and includes attributes such as position and speed.
Simulation process: for each time step t:
decision stage: and making a decision according to the current state S (t) of the character and the environment attribute function V (x, y) to obtain an action a (t). Decision functions are learned using reinforcement-based methods, such as using deep reinforcement learning algorithms (e.g., DQN). Action a (t) represents an action taken by the character at time t.
Update the character state: the character state S(t+1) is updated according to the action a(t). Update the position: the position is updated based on the current position and speed.
x(t+1)=x(t)+vx(t)*dt
y(t+1)=y(t)+vy(t)*dt
Where (x (t), y (t)) is the position of the character at time t, (vx (t), vy (t)) is the speed of the character at time t, and dt is the time step.
Update speed: it is assumed that the velocity is affected by gravity while taking into account the acceleration of the character.
vx(t+1)=vx(t)+ax(t)*dt
vy(t+1)=vy(t)+(ay(t)-g)*dt
Where (ax (t), ay (t)) is the acceleration of the character at time t, and g is the gravitational acceleration.
Environmental impact: the character state S (t+1) is updated according to the environmental impact function F (S (t), V (x, y)). Let the character be highly influenced by the terrain and the speed be influenced by friction.
ay(t+1)=f_ay(x(t+1),y(t+1),V(x(t+1),y(t+1)))-f_friction(vx(t+1),vy(t+1))
where f_ay(x(t+1), y(t+1), V(x(t+1), y(t+1))) is an acceleration function calculated from the terrain elevation and character position, and f_friction(vx(t+1), vy(t+1)) is a friction function calculated from the speed.
Termination condition: if a termination condition is met, such as reaching a predetermined time or reaching a specific location, the simulation is ended.
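A minimal Python sketch of this embodiment follows. The functions f_ay (terrain-dependent acceleration) and f_friction (speed-dependent deceleration) are assumed to be supplied by the user, and the acceleration fields on the state, the default dt and the value of g are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class KinematicState:
    x: float
    y: float
    vx: float
    vy: float
    ax: float = 0.0
    ay: float = 0.0

def step(state: KinematicState, V, f_ay, f_friction, dt: float = 0.1, g: float = 9.81):
    """Advance the character by one time step under gravity, terrain and friction."""
    # position update: x(t+1) = x(t) + vx(t)*dt, y(t+1) = y(t) + vy(t)*dt
    state.x += state.vx * dt
    state.y += state.vy * dt
    # velocity update: vx(t+1) = vx(t) + ax(t)*dt, vy(t+1) = vy(t) + (ay(t) - g)*dt
    state.vx += state.ax * dt
    state.vy += (state.ay - g) * dt
    # environmental impact: ay(t+1) = f_ay(x, y, V(x, y)) - f_friction(vx, vy)
    state.ay = f_ay(state.x, state.y, V(state.x, state.y)) - f_friction(state.vx, state.vy)
    return state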
In this embodiment, a reinforcement learning-based decision method is used, and the influence of gravity and friction on the character state is taken into account.
The deep reinforcement learning algorithm combines deep learning with reinforcement learning to learn an optimal behavior policy from interaction with the environment without prior knowledge. Its core idea is to construct a deep neural network to approximate a value function or a policy function, thereby enabling decision-making and behavior optimization in complex environments.
The reinforcement learning-based decision method observes the environment state and receives a reward signal as feedback, and uses a reinforcement learning algorithm to select the best action and optimize the policy. The decision method may be a value function-based method (e.g., Q-learning, DQN), a policy gradient-based method (e.g., REINFORCE, PPO), or a combination of value functions and policy gradients (e.g., A3C, DDPG). These methods guide the decision-making process by learning a value function or a policy function and using the reward signal as feedback.
Further definition and refinement of the formulas and functions may be performed according to the specific application scenario and problem requirements. For example, in the above example we consider the effect of terrain elevation and friction on the character state. Further defining and refining these formulas and functions may include:
influence function f_ay of terrain height on acceleration (x (t+1), y (t+1), V (x (t+1), y (t+1))): the effect of terrain elevation on the acceleration of the color may be described using a mathematical model or according to an empirical design function. Factors such as terrain slope, height differential, etc. may be considered and defined according to particular problems.
Friction function f_friction (vx (t+1), vy (t+1)): the friction function may be designed according to the speed of the character. This may be a simple linear function based on the coefficient of friction, or a more complex model, such as taking into account the square of the velocity.
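By way of illustration only, the following Python sketch gives one hypothetical design for each of these two functions: f_ay derived from a finite-difference slope of the elevation attribute, and f_friction in a linear and a quadratic variant. The elevation index, finite-difference step, and the coefficients k, mu and c_d are assumptions, not values from the disclosure.

import math

def make_f_ay(V, elevation_index=0, delta=1.0, k=1.0):
    """Build f_ay(x, y, attrs) from the terrain slope along the y axis."""
    def f_ay(x, y, attrs):  # attrs accepted for interface compatibility; the slope is read via V
        slope_y = (V(x, y + delta)[elevation_index] - V(x, y - delta)[elevation_index]) / (2 * delta)
        return -k * slope_y  # push the character downhill along the y axis
    return f_ay

def f_friction_linear(vx, vy, mu=0.05):
    """Friction proportional to the speed, scaled by a friction coefficient mu."""
    return mu * math.hypot(vx, vy)

def f_friction_quadratic(vx, vy, c_d=0.01):
    """Friction growing with the square of the speed (drag-like model)."""
    return c_d * math.hypot(vx, vy) ** 2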
In addition, other formulas and functions may be further defined and refined according to the specific problem, such as the reward function, the value function approximation method, and the structure of the policy network. These definition and refinement steps need to be carried out in combination with the characteristics and requirements of the particular problem, as well as the feasibility of the algorithms and models.
In the above method, the deep reinforcement learning algorithm may use a value function-based algorithm such as DQN (Deep Q-Network). DQN is a reinforcement learning algorithm that uses a deep neural network to approximate the Q-value function.
The core formulas of DQN comprise the Q-value function update and the action selection strategy; the specific formulas are as follows:
update formula (update target) of Q value function:
Q(s(t),a(t))=r(t+1)+γ*max(Q(s(t+1),a))
wherein Q(s(t), a(t)) represents the Q value of taking action a(t) in state s(t); r(t+1) represents the immediate reward obtained after action a(t) is taken in state s(t); γ is a discount factor that balances the importance of current and future rewards; max(Q(s(t+1), a)) represents the maximum Q value over actions a in the next state s(t+1).
Action selection strategy:
a(t)=argmax(Q(s(t),a))
wherein argmax (Q (s (t), a)) represents an action a selected to maximize the Q value in the current state s (t).
The DQN algorithm uses a deep neural network to approximate the Q function; the Q values in the formulas are obtained from the forward pass of the neural network.
The above are the core formulas of the DQN algorithm for updating the value function and selecting actions. In practical applications, however, other details, such as experience replay and the use of a target network, need to be designed for the specific problem to improve the stability and effectiveness of the algorithm.
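The following is a minimal DQN sketch in Python, written against PyTorch as an assumed framework; it covers the pieces mentioned above (a Q-network, experience replay, and a periodically synchronized target network). The network size, buffer capacity and hyperparameters are illustrative assumptions and not values taken from the disclosure.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Deep neural network approximating the Q-value function."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, s):
        return self.net(s)

class DQNAgent:
    def __init__(self, state_dim, n_actions, gamma=0.99, lr=1e-3):
        self.q = QNet(state_dim, n_actions)
        self.target_q = QNet(state_dim, n_actions)
        self.target_q.load_state_dict(self.q.state_dict())
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=10000)  # experience replay buffer
        self.gamma = gamma
        self.n_actions = n_actions

    def act(self, s, epsilon=0.1):
        # a(t) = argmax_a Q(s(t), a), with epsilon-greedy exploration
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q(torch.as_tensor(s, dtype=torch.float32)).argmax())

    def remember(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, float(done)))

    def train_step(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        s, a, r, s2, done = (torch.as_tensor(np.array(x), dtype=torch.float32)
                             for x in zip(*random.sample(self.buffer, batch_size)))
        # target: r(t+1) + gamma * max_a Q_target(s(t+1), a), zeroed at terminal states
        with torch.no_grad():
            target = r + self.gamma * self.target_q(s2).max(dim=1).values * (1.0 - done)
        q_sa = self.q(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q_sa, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    def sync_target(self):
        # periodically copy the online network into the target network
        self.target_q.load_state_dict(self.q.state_dict())

In use, remember() would be called after every simulated step and sync_target() every fixed number of training steps; both schedules are design choices left open by the text above.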
Fig. 2 is a schematic structural view of an electronic device according to an embodiment of the present application. Referring to fig. 2, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include volatile memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, among others. The buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one bi-directional arrow is shown in fig. 2, but this does not mean there is only one bus or one type of bus.
The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it, forming the virtual geographic environment role modeling device at the logical level. The processor executes the program stored in the memory, and in particular performs any one of the virtual geographic environment role modeling methods described above.
The virtual geographic environment role modeling method disclosed in the embodiment shown in fig. 1 of the present application can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated hardware logic circuits in the processor or by instructions in the form of software. The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, capable of implementing or performing the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The electronic device may also execute the virtual geographic environment role modeling method of fig. 1 and implement the functions of the embodiment shown in fig. 1, which are not repeated here.
The embodiments of the present application also provide a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device comprising a plurality of application programs, perform any of the aforementioned methods of modeling virtual geographic environment roles.
The technical scheme of the application is not limited to the specific embodiment, and all technical modifications made according to the technical scheme of the application fall within the protection scope of the application.

Claims (4)

1. A virtual geographic environment role modeling method, characterized by comprising the following steps:
S1: defining attribute functions of the geographic environment and an initial state of the character;
S2: simulation process: making a decision according to the current state of the character and the environmental attribute function to obtain an action;
S3: updating the character state according to the action, wherein the character state comprises position coordinates, current speed and environmental influence, and the simulation is ended when the termination condition is met;
S4: if the termination condition is not satisfied, the process returns to step S2.
2. The virtual geographic environment role modeling method of claim 1, wherein: defining an attribute function V (x, y) of the geographic environment in the step S1; defining an initial state S (0) of the character; the simulation process in the step S2 defines each time step t; and making a decision through a reinforcement learning algorithm according to the current state S (t) of the character and the environment attribute function V (x, y) to obtain an action a (t).
3. The virtual geographic environment role modeling method of claim 2, wherein: in the step S3, the character state S (t+1) is updated according to the action a (t); the position of the character is represented by coordinates (x, y), and the position is updated according to the current position and speed:
x(t+1)=x(t)+vx(t)
y(t+1)=y(t)+vy(t)
the current speed is updated according to the following equation:
vx(t+1)=f_vx(vx(t),vy(t),S(t),V(x,y))
vy(t+1)=f_vy(vx(t),vy(t),S(t),V(x,y))
wherein f_vx and f_vy are functions that calculate a new speed from the current speed, character state, and environmental attributes;
updating the character state S (t+1) according to the environmental impact function F (S (t), V (x, y));
S(t+1)=F(S(t),V(x,y))
the termination condition is satisfied when a predetermined time elapses or a specified position is reached, and the simulation then ends.
4. The virtual geographic environment role modeling method of claim 2, wherein: the reinforcement learning algorithm comprises a Q-value function update and an action selection strategy, with the specific formulas as follows:
updating the Q value function:
Q(s(t),a(t))=r(t+1)+γ*max(Q(s(t+1),a))
wherein Q(s(t), a(t)) represents the Q value of taking action a(t) in state s(t); r(t+1) represents the immediate reward obtained after action a(t) is taken in state s(t); γ is a discount factor that balances the importance of current and future rewards; max(Q(s(t+1), a)) represents the maximum Q value over actions a in the next state s(t+1);
action selection strategy:
a(t)=argmax(Q(s(t),a))
wherein argmax (Q (s (t), a)) represents an action a selected to maximize the Q value in the current state s (t).
CN202310644061.0A 2023-06-01 2023-06-01 Virtual geographic environment role modeling method Active CN116663417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310644061.0A CN116663417B (en) 2023-06-01 2023-06-01 Virtual geographic environment role modeling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310644061.0A CN116663417B (en) 2023-06-01 2023-06-01 Virtual geographic environment role modeling method

Publications (2)

Publication Number Publication Date
CN116663417A true CN116663417A (en) 2023-08-29
CN116663417B CN116663417B (en) 2023-11-17

Family

ID=87711377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310644061.0A Active CN116663417B (en) 2023-06-01 2023-06-01 Virtual geographic environment role modeling method

Country Status (1)

Country Link
CN (1) CN116663417B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105204631A (en) * 2015-09-10 2015-12-30 中国人民解放军装甲兵工程学院 Role modeling method for virtual geographic environment and multi-role collaborative operation method
CN106600668A (en) * 2016-12-12 2017-04-26 中国科学院自动化研究所 Animation generation method used for carrying out interaction with virtual role, apparatus and electronic equipment
CN107563490A (en) * 2017-08-30 2018-01-09 北京理工大学 Crowd diversity simulation method based on an attention model
CN112926628A (en) * 2021-01-29 2021-06-08 北京字节跳动网络技术有限公司 Action value determination method, device, learning framework, medium and equipment

Also Published As

Publication number Publication date
CN116663417B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
US20220363259A1 (en) Method for generating lane changing decision-making model, method for lane changing decision-making of unmanned vehicle and electronic device
CN110520868B (en) Method, program product and storage medium for distributed reinforcement learning
CN112001585B (en) Multi-agent decision method, device, electronic equipment and storage medium
CN109690576A (en) The training machine learning model in multiple machine learning tasks
CN110955242A (en) Robot navigation method, system, robot and storage medium
CN106850289B (en) Service combination method combining Gaussian process and reinforcement learning
US20230316688A1 (en) Skinning method and apparatus for virtual object, electronic device, storage medium, and computer program product
CN110276442A (en) A kind of searching method and device of neural network framework
CN114261400B (en) Automatic driving decision method, device, equipment and storage medium
CN114139637B (en) Multi-agent information fusion method and device, electronic equipment and readable storage medium
WO2022121207A1 (en) Trajectory planning method and apparatus, device, storage medium, and program product
CN110955241A (en) Obstacle avoidance method and device for mobile robot, mobile robot and storage medium
KR20200106754A (en) Method, apparatus and computer program for coloring of a target image
KR102171269B1 (en) Method, apparatus and computer program for coloring of image, Method, apparatus and computer program for learning of artificial neural network
CN112930541A (en) Determining a control strategy by minimizing delusional effects
CN114021330A (en) Simulated traffic scene building method and system and intelligent vehicle control method
CN111282272A (en) Information processing method, computer readable medium and electronic device
CN112825154A (en) Method and device for optimizing online reasoning in deep learning and computer storage medium
CN112270083B (en) Multi-resolution modeling and simulation method and system
CN116663417B (en) Virtual geographic environment role modeling method
EP4182850A1 (en) Hardware-optimized neural architecture search
CN114816395A (en) Page layout method and device
CN110826695B (en) Data processing method, device and computer readable storage medium
Zhang et al. ORCANet: Differentiable multi‐parameter learning for crowd simulation
Palacios Unity 5. x game ai programming cookbook

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant