CN116772886A - Navigation method, device, equipment and storage medium for virtual characters in virtual scene - Google Patents

Navigation method, device, equipment and storage medium for virtual characters in virtual scene

Info

Publication number
CN116772886A
Authority
CN
China
Prior art keywords
navigation
character
virtual
features
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311036784.9A
Other languages
Chinese (zh)
Other versions
CN116772886B (en)
Inventor
杨汶锦
刘飞宇
高一鸣
王亮
付强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311036784.9A priority Critical patent/CN116772886B/en
Publication of CN116772886A publication Critical patent/CN116772886A/en
Application granted granted Critical
Publication of CN116772886B publication Critical patent/CN116772886B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The embodiment of the application discloses a navigation method, device, equipment, and storage medium for virtual characters in a virtual scene, applicable to the traffic and map fields. The method comprises: acquiring character features of the virtual character and navigation features of the position of the virtual character in the virtual environment, where the navigation features comprise environment perception features and navigation point features, the environment perception features comprise two-dimensional environment features of at least two dimensions, and the combination of these two-dimensional environment features characterizes the three-dimensional environment at the virtual character's position; inputting the character features and navigation features into a navigation model to obtain action execution probabilities for candidate actions output by the navigation model; and determining a target action from the candidate actions based on the action execution probabilities, and controlling the virtual character to move in the virtual environment based on the target action.

Description

Navigation method, device, equipment and storage medium for virtual characters in virtual scene
Technical Field
The embodiment of the application relates to the technical field of machine learning, in particular to a navigation method, device, equipment, and storage medium for virtual characters in a virtual scene.
Background
In three-dimensional games, a virtual character often needs to move from one location to another on the map, and during this movement the character must be navigated by an agent.
In the related art, when an agent is trained by deep learning, it searches for an optimal path to the destination by voxelizing the three-dimensional environment and extracting features from the 3D voxels.
However, in the solution provided by the related art, using 3D voxel features to describe the agent's surroundings increases the feature count by tens or hundreds of times, and the time required to process the features grows accordingly, resulting in increased time consumption in the navigation process.
Disclosure of Invention
The embodiments of the present application provide a navigation method, device, equipment, and storage medium for virtual characters in a virtual scene. The technical solution includes the following aspects.
In one aspect, an embodiment of the present application provides a method for navigating a virtual character in a virtual scene, where the method includes the following steps.
Acquiring character features of a virtual character and navigation features of the position of the virtual character in a virtual environment, where the navigation features include environment perception features and navigation point features, the environment perception features include two-dimensional environment features of at least two dimensions, the combination of these two-dimensional environment features characterizes the three-dimensional environment at the virtual character's position, the navigation point features characterize the positional relationship between the virtual character's position and the navigation destination, and the character features at least characterize the movement state of the virtual character; inputting the character features and the navigation features into a navigation model to obtain action execution probabilities for candidate actions output by the navigation model; and determining a target action from the candidate actions based on the action execution probabilities, and controlling the virtual character to move in the virtual environment based on the target action.
In another aspect, an embodiment of the present application provides a navigation method for virtual characters in a virtual scene, the method including the following steps.
Acquiring sample character features of a virtual character and sample navigation features of the position of the virtual character in a virtual environment, where the sample navigation features include sample environment perception features and sample navigation point features, the sample environment perception features include sample two-dimensional environment features of at least two dimensions, the combination of these sample two-dimensional environment features characterizes the sample three-dimensional environment at the virtual character's position, the sample navigation point features characterize the positional relationship between the virtual character's position and the navigation destination, and the sample character features at least characterize the movement state of the virtual character; and training a navigation model by reinforcement learning based on the sample character features and the sample navigation features, where the navigation model is used to determine action execution probabilities for candidate actions based on character features and navigation features.
In another aspect, an embodiment of the present application provides a navigation device for virtual characters in a virtual scene, the device including the following structure.
The device comprises an acquisition module, an input module, and a determining module. The acquisition module is used for acquiring character features of a virtual character and navigation features of the position of the virtual character in a virtual environment, where the navigation features comprise environment perception features and navigation point features, the environment perception features comprise two-dimensional environment features of at least two dimensions, the combination of these two-dimensional environment features characterizes the three-dimensional environment at the virtual character's position, the navigation point features characterize the positional relationship between the virtual character's position and the navigation destination, and the character features at least characterize the movement state of the virtual character. The input module is used for inputting the character features and navigation features into a navigation model to obtain action execution probabilities for candidate actions output by the navigation model. The determining module is used for determining a target action from the candidate actions based on the action execution probabilities and controlling the virtual character to move in the virtual environment based on the target action.
In another aspect, an embodiment of the present application provides a navigation device for virtual characters in a virtual scene, the device including the following structure.
The device comprises an acquisition module and a training module. The acquisition module is used for acquiring sample character features of a virtual character and sample navigation features of the position of the virtual character in a virtual environment, where the sample navigation features comprise sample environment perception features and sample navigation point features, the sample environment perception features comprise sample two-dimensional environment features of at least two dimensions, the combination of these sample two-dimensional environment features characterizes the sample three-dimensional environment at the virtual character's position, the sample navigation point features characterize the positional relationship between the virtual character's position and the navigation destination, and the sample character features at least characterize the movement state of the virtual character. The training module is used for training a navigation model by reinforcement learning based on the sample character features and the sample navigation features, where the navigation model is used to determine action execution probabilities for candidate actions based on character features and navigation features.
In another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one instruction, at least one section of program, a code set, or an instruction set, and the at least one instruction, the at least one section of program, the code set, or the instruction set is loaded and executed by the processor to implement a method for navigating a virtual character in a virtual scene as described in the above aspect.
In another aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions loaded and executed by a processor to implement a method of navigating a virtual character in a virtual scene as described in the above aspect.
In another aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the navigation method of the virtual character in the virtual scene provided in the above aspect.
The technical solutions provided by the embodiments of the present application include at least the following beneficial effects.
In the embodiments of the present application, when navigating a virtual character in a virtual scene, the character features and navigation features of the virtual character are acquired, the action execution probability corresponding to each candidate action is determined through the navigation model, and the target action is finally determined. During navigation, the acquired character features characterize the current movement state of the virtual character, such as its current position. The acquired navigation features include environment perception features, which consist of two-dimensional environment features of at least two dimensions; combining these two-dimensional environment features characterizes the three-dimensional environment of the virtual environment. Because the computer device characterizes the three-dimensional environment from two-dimensional environment features of at least two dimensions through the navigation model, the model does not need to represent the three-dimensional environment with three-dimensional voxels, which reduces the processing time for the three-dimensional environment features. Moreover, depicting the three-dimensional environment with two-dimensional environment features of at least two dimensions makes the navigation model's perception of the map more accurate and improves the model's generalization.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 shows a schematic diagram of a navigation network architecture.
FIG. 2 illustrates a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application.
Fig. 3 is a flowchart illustrating a method for navigating a virtual character in a virtual scene according to an exemplary embodiment of the present application.
FIG. 4 illustrates a schematic diagram of input features of a navigation model provided by an exemplary embodiment of the present application.
FIG. 5 illustrates a schematic diagram of a ring ray provided by an exemplary embodiment of the present application.
FIG. 6 illustrates a schematic diagram of a visual ring-shaped ray feature provided by an exemplary embodiment of the present application.
FIG. 7 illustrates a depth ray schematic provided by an exemplary embodiment of the present application.
FIG. 8 illustrates a schematic diagram of a visual environment depth profile provided by an exemplary embodiment of the present application.
FIG. 9 illustrates a high level ray schematic provided by an exemplary embodiment of the present application.
FIG. 10 illustrates a schematic view of a target range provided by an exemplary embodiment of the present application.
FIG. 11 illustrates a schematic diagram of a visual height ray feature provided by an exemplary embodiment of the present application.
FIG. 12 illustrates a schematic structure of a navigation model according to an exemplary embodiment of the present application.
FIG. 13 illustrates a navigation interface diagram of a virtual character in a virtual environment, according to an exemplary embodiment of the present application.
Fig. 14 shows a flowchart of a navigation method for virtual characters in a virtual scene according to another exemplary embodiment of the present application.
FIG. 15 illustrates a flowchart of a navigation model training process provided by an exemplary embodiment of the present application.
FIG. 16 illustrates a schematic diagram of action reward categories provided by an exemplary embodiment of the present application.
Fig. 17 is a schematic diagram of a navigation system for virtual characters in a computer device according to an exemplary embodiment of the present application.
Fig. 18 is a schematic diagram illustrating a configuration of a navigation device for a virtual character in a virtual scene according to an exemplary embodiment of the present application.
Fig. 19 is a schematic structural view of a navigation device for virtual characters in a virtual scene according to another exemplary embodiment of the present application.
Fig. 20 is a schematic diagram showing the structure of a computer device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Embodiments of the present application relate to artificial intelligence (Artificial Intelligence, AI) and machine learning techniques, and are designed based on machine learning (Machine Learning, ML) in artificial intelligence.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. With the development and progress of artificial intelligence, it has been researched and applied in many fields, such as smart homes, smart customer service, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, robots, and smart healthcare; as technology develops further, artificial intelligence will be applied in more fields and deliver increasingly important value.
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specifically studies how a computer can simulate or implement human learning behaviors to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance.
Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
The artificial intelligence technology such as reinforcement learning, deep learning and the like has wide application in various fields. In particular, the present application relates to reinforcement learning techniques in machine learning.
Reinforcement learning (Reinforcement Learning, RL) is a branch of machine learning in which agents learn by interacting with the environment. It is a goal-oriented learning process: the agent is not told which actions to take; instead, it learns from the results of its actions.
The agent perceives the environment through sensors and acts on the environment through actuators. For each possible percept sequence, the agent should choose the action that is expected to maximize its performance measure, given the evidence provided by the percept sequence. In the embodiments of the present application, the virtual character acts as the agent, perceiving the virtual environment and executing target actions.
A virtual environment is the virtual environment that an application displays (or provides) when running on a computer device. The virtual environment may be a simulated environment of the real world, a semi-simulated and semi-fictional environment, or a purely fictional environment. It may be any one of a two-dimensional virtual environment, a 2.5-dimensional virtual environment, and a three-dimensional virtual environment, which is not limited in the present application. The following embodiments are described with the virtual environment being a three-dimensional virtual environment.
A virtual character is a movable object in the virtual environment. The movable object may be at least one of a virtual person, a virtual animal, and a cartoon character. Optionally, when the virtual environment is a three-dimensional virtual environment, the virtual characters are three-dimensional models, each with its own shape and volume, occupying part of the space in the three-dimensional virtual environment. Optionally, the virtual character is a three-dimensional character built on three-dimensional human-skeleton technology, which presents different outward appearances by wearing different skins. In some implementations, the virtual character may instead be implemented with a 2.5-dimensional or 2-dimensional model, which is not limited in the embodiments of the present application.
In the related art, during navigation, position information such as the agent's position coordinates, the target point's position coordinates, and the agent's current movement speed is extracted to obtain the agent's position features. Meanwhile, 3D voxel features are processed by 3D convolution to obtain environment features that characterize the agent's surroundings. The environment features and position features are then input into a navigation decision network to plan the agent's navigation route.
However, in the technical solution provided by the related art, the computer device depicts the three-dimensional environment with 3D voxels. Because 3D features have a very large feature count, processing the 3D voxel features greatly increases the agent's time consumption, which raises training costs, degrades the timeliness of the agent's decisions, and reduces the agent's generalization to new maps.
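To make the scale gap concrete, here is a rough, illustrative back-of-envelope comparison (the concrete counts are assumptions, not figures from the patent):

```python
# Approximate feature counts: a 3D voxel grid around the agent versus
# a few families of 2D ray features (illustrative numbers only).
voxel_features = 64 * 64 * 64          # 64^3 occupancy voxels = 262144 values
ray_features = 3 * 72 + 16 * 24 + 72   # 3 ring heights x 72 rays, a 16x24 depth cone,
                                       # and 72 height rays = 672 values
print(voxel_features // ray_features)  # roughly 390x more voxel features
```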
Therefore, the embodiments of the present application provide a navigation method for virtual characters in a virtual scene that depicts three-dimensional scene features from two-dimensional environment features of at least two dimensions, avoiding the increased time consumption of the related art, preserving the timeliness of the navigation model's decisions, and helping to improve the agent's generalization to new maps.
The computer device in the present application may be a desktop computer, a laptop computer, a mobile phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, or the like. An application supporting a virtual environment, such as one supporting a three-dimensional virtual environment, is installed and runs in the computer device. The application may be any one of a virtual reality application, a three-dimensional map application, a TPS (Third-Person Shooter) game, an FPS (First-Person Shooter) game, and a MOBA (Multiplayer Online Battle Arena) game. The application may be a stand-alone application, such as a stand-alone 3D game, or an online application. The following embodiments are described with the application being a game.
Games based on virtual environments often consist of one or more maps of the game world. The virtual environment in the game simulates real-world scenes, and the virtual character can walk, run, jump, shoot, fight, drive, climb, and glide in it, as well as switch virtual props and use virtual props to attack other virtual characters.
Referring to fig. 2, a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application is shown. The implementation environment may include: terminal 210 and server 220.
In an embodiment of the present application, the terminal 210 includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a desktop computer, an e-book reader, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like. An application 211 supporting a virtual environment runs in the terminal 210; it may be a role-playing game (RPG). When the terminal 210 runs the application 211, the user interface of the application 211 is displayed on the screen of the terminal 210. The user uses the terminal 210 to control a virtual character in the virtual environment to perform activities, or the terminal controls an NPC (Non-Player Character) in the virtual environment to perform activities. The activities of the virtual character include, but are not limited to: adjusting body posture, crawling, walking, running, riding, flying, jumping, driving, picking up, shooting, attacking, throwing, and releasing skills.
The server 220 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), basic cloud computing services such as big data and artificial intelligence platforms, and the like. In the embodiment of the present application, the server 220 is configured to provide a background service for an application program supporting a three-dimensional virtual environment. Optionally, the server 220 takes on primary computing work and the terminal takes on secondary computing work; alternatively, the server 220 takes on secondary computing work and the terminal takes on primary computing work; alternatively, a distributed computing architecture is used for collaborative computing between the server 220 and the terminal.
Optionally, in the solution provided in the embodiment of the present application, the trained navigation model is stored in the terminal 210, and the terminal 210 determines the target action through the navigation model under the condition of having the navigation requirement of the virtual character. Optionally, in the solution provided in the embodiment of the present application, the trained navigation model is stored in the server 220, and in the case of having a navigation requirement of the virtual character, the server 220 obtains the character feature and the navigation feature, determines the target action through the navigation model, and returns to the terminal 210. Optionally, in the solution provided in the embodiment of the present application, the trained navigation model is stored in the server 220, and under the condition of having a navigation requirement of the virtual character, the terminal 210 obtains the character feature and the navigation feature of the virtual character, sends the character feature and the navigation feature to the server 220, and the server 220 determines the target action through the navigation model and returns to the terminal 210.
When training the navigation model, the server 220 trains it based on the acquired character features and navigation features. After training is finished, the server 220 or the terminal 210 can, based on the trained navigation model, make predictions for the virtual character's current position in the virtual environment to determine the next stretch of the route to the navigation destination.
The method provided by the embodiment can be applied to the navigation scene of the virtual roles in the virtual scene. The following describes schematically an application scenario of a method for navigating a virtual character in a virtual scenario provided by an embodiment of the present application.
The navigation method for virtual characters in a virtual scene is applied to a game scene. When a virtual character needs to be navigated in the game scene, the character features of the virtual character and the navigation features of the character's position in the virtual environment are acquired, and the trained navigation model generates action execution probabilities for candidate actions based on the character features and navigation features, thereby determining the target actions the virtual character needs to advance toward the navigation destination in the three-dimensional environment. During training of the navigation model, the character features and navigation features are acquired and input into the untrained navigation model to train it. The training process may be performed by the server 220, while determining the target action in the application scenario may be performed by the terminal 210 or by the server 220.
It should be noted that the embodiments of the present application can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving. The implementation environment described above is merely an illustrative example and does not limit the application scenarios of the embodiments of the present application.
It should be noted that the solution provided by the embodiments of the present application can be applied not only to navigation in a virtual environment but also to navigation in the real world. For example, the embodiments can be used in a vehicle-mounted terminal: by acquiring two-dimensional environment features of the real environment together with its own real-time position and the position of the navigation destination, the vehicle-mounted terminal plans a navigation route consistent with the two-dimensional environment features of the real environment.
For convenience of description, the following embodiments are described as examples of a method for navigating a virtual character in a virtual scene performed by a computer device.
Referring to fig. 3, a flowchart of a method for navigating virtual characters in a virtual scene according to an exemplary embodiment of the present application is shown. This embodiment is described with the method being executed by a computer device as an example; the method includes the following steps.
Step 301, acquiring character features of the virtual character and navigation features of the position of the virtual character in the virtual environment.
In a game scenario, the computer device extracts the useful information in the scene, i.e., the features to be input into the navigation model, including the character features of the virtual character and the navigation features of the character's position in the virtual environment. The character features at least characterize the movement state of the virtual character; for example, they include the coordinates of the character's current position, the character's current facing direction, and the character's movement speed.
The navigation features comprise environment perception features and navigation point features. The navigation point features characterize the positional relationship between the virtual character's position and the navigation destination, for example, the straight-line distance between the character's current position in the virtual environment and the navigation destination, or the difference between the position coordinates of the current position and those of the navigation destination.
The environment perception features contained in the navigation features comprise two-dimensional environment features of at least two dimensions, and their combination characterizes the three-dimensional environment where the virtual character is located. A two-dimensional environment feature captures the features in one dimension of the three-dimensional space, and combining two-dimensional features of at least two dimensions can represent the three-dimensional space to a certain extent. Therefore, the embodiments of the present application characterize the three-dimensional environment at the virtual character's position in the virtual environment with two-dimensional environment features of at least two dimensions.
Optionally, the computer device may acquire the character features and the navigation features once each time a frame of the game screen is displayed, or once after every N game frames have been displayed. A higher acquisition frequency gives the navigation model richer information for prediction, making the computed candidate-action probabilities more accurate; a lower acquisition frequency avoids the situation where a new target action arrives before the virtual character has finished executing the previous target action, which would interrupt that action and lead to erroneous character control. For example, the computer device may acquire the character features and navigation features every frame and input the features acquired at every fourth frame into the navigation model.
Optionally, when the character features are acquired, they may be divided into movement-related features and non-movement-related features, where the movement-related features at least include the character's movement speed and movement acceleration, and the non-movement-related features may include attributes such as the character's physical strength. Dividing the character features into these two groups before inputting them into the navigation model allows the model to determine action execution probabilities that better match the current character features, based on the relationship between the movement-related and non-movement-related features.
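To make the feature split concrete, the following is a minimal sketch of how the acquired features could be organized; the field names and types are assumptions for illustration, not the patent's data layout.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CharacterFeatures:
    # movement-related features
    position: np.ndarray        # (x, y, z) coordinates of the character
    facing_yaw: float           # current facing direction, radians
    velocity: np.ndarray        # current movement velocity
    acceleration: np.ndarray    # current movement acceleration
    # non-movement-related features
    stamina: float              # physical-strength attribute
    skill_cooldowns: dict       # remaining cooldown time per skill name

@dataclass
class NavigationFeatures:
    env_perception: np.ndarray  # concatenated 2D environment features (rays)
    nav_point: np.ndarray       # relation between position and destination
```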
Step 302, inputting the character features and the navigation features into the navigation model to obtain the action execution probability of the candidate actions output by the navigation model.
After acquiring the character features and the navigation features, the computer device inputs them into the navigation model, and the model determines the action execution probabilities of the different candidate actions for the virtual character's current movement state and the two-dimensional environment at the character's position.
Optionally, the candidate actions include the direction in which the virtual character moves and the movement action the virtual character performs. The movement direction may be the same as the character's current facing direction, or a direction at an angle to it. Further, in some embodiments, the movement direction may be upward or downward, or obliquely upward or obliquely downward; the embodiments of the present application do not limit how the movement directions are divided.
The movement actions of the virtual character may include walking, crouching, diving, jumping, climbing, flying, gliding, and the like, which is not limited in this embodiment.
Because the navigation model computes the action execution probabilities of the candidate actions from the character features and navigation features, the probabilities it outputs are determined by the virtual environment at the current position and the character's current movement state. For example, when the navigation features indicate that the navigation destination is directly ahead of the virtual character and there is a low obstacle in front, the model outputs a higher action execution probability for the jumping action than for the diving action, and a higher probability for moving in the character's facing direction than for moving in the opposite direction.
After the computer device inputs the character features and navigation features into the navigation model, the model plans a route to the navigation destination for the virtual character based on the received features. Each time the computer device inputs a new set of character features and navigation features, the model updates the navigation route based on the latest features and re-determines the action execution probability of each candidate action.
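As a rough sketch of this step, the navigation model can be viewed as a function that maps the concatenated features to one logit per candidate action, with a softmax producing the action execution probabilities; the function names and shapes below are illustrative assumptions, not the patent's architecture.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def action_probabilities(model, char_feat: np.ndarray, nav_feat: np.ndarray) -> np.ndarray:
    """Action execution probability for every candidate action."""
    x = np.concatenate([char_feat, nav_feat])
    logits = model(x)                  # assumed callable returning (num_candidate_actions,)
    return softmax(logits)
```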
Step 303, determining a target action from the candidate actions based on the action execution probability, and controlling the virtual character to move in the virtual environment based on the target action.
Optionally, the computer device determines the target action based on a probability value of an action execution probability. After obtaining the action execution probability corresponding to the candidate action, the computer device determines the candidate action corresponding to the maximum action execution probability as the target action.
Optionally, the computer device determines a target movement direction and a target movement action from the candidate actions based on the action execution probability, and controls the virtual character to move in the virtual environment based on the target movement direction and the target movement action.
In one possible implementation, the computer device encodes the different candidate actions separately to obtain multiple candidate action labels. It inputs the character features and navigation features into the navigation model to obtain the action execution probability corresponding to each candidate action label, and determines a target candidate action label from the candidate action labels based on these probabilities. Having determined the target candidate action label, the computer device decodes it to obtain the target action and controls the virtual character to move in the virtual environment based on the target action.
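A minimal sketch of the label encoding and decoding described above, assuming the candidate actions are pairs of a movement direction and a movement action (the concrete direction and action sets are illustrative assumptions):

```python
DIRECTIONS = ["forward", "back", "left", "right"]   # assumed direction discretization
MOVEMENTS = ["walk", "jump", "climb", "glide"]      # assumed movement action set

# Each candidate action label encodes one (direction, movement) pair.
LABELS = [(d, m) for d in DIRECTIONS for m in MOVEMENTS]

def select_target_action(probs):
    """Pick the label with the highest execution probability and decode it."""
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    direction, movement = LABELS[best]              # decode the target label
    return direction, movement
```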
In summary, in the embodiments of the present application, when navigating a virtual character in a virtual scene, the character features and navigation features of the virtual character are acquired, the action execution probability corresponding to each candidate action is determined through the navigation model, and the target action is finally determined. During navigation, the acquired character features characterize the current movement state of the virtual character, such as its current position. The acquired navigation features include environment perception features, and the environment perception features include two-dimensional environment features of at least two dimensions, whose combination can characterize the three-dimensional environment of the virtual environment. Because the computer device characterizes the three-dimensional environment from two-dimensional environment features of at least two dimensions through the navigation model, the model does not need to represent the three-dimensional environment with three-dimensional voxels, which reduces the processing time for the three-dimensional environment features. Moreover, depicting the three-dimensional environment with two-dimensional environment features of at least two dimensions makes the navigation model's perception of the map more accurate and improves the model's generalization.
The virtual character can perform actions such as walking, running, jumping, climbing, gliding, and using skills in the virtual environment; these are the candidate actions. In some game scenarios, certain candidate actions may be on cooldown or may consume the virtual character's physical-strength attribute. To ensure that the determined target action is executable by the current virtual character, character attribute features characterizing the character's state attributes and character skill features characterizing the character's skill cooldowns need to be input into the navigation model. That is, the character features of the virtual character include the character attribute features and the character skill features, so that the action execution probabilities of the candidate actions output by the navigation model are related to both.
Referring to FIG. 4, a schematic diagram of input features of a navigation model provided by an exemplary embodiment of the present application is shown. The input features mainly include the character features, comprising character skill features, character position features, and character attribute features, as well as the navigation features. The character skill features characterize the cooldown of the character's skills, specifically whether each skill of the character is released, available, or cooling down. The character attribute features characterize the character's state attributes, specifically including the character's status, attributes, movement state, physical strength, and so on. The character position features, which characterize the character's position, facing direction, and movement speed, are described in step 301 above and are not repeated in this embodiment.
The action execution probabilities of the candidate actions output by the navigation model are related to the character attribute features and character skill features, that is, they are affected by both. For example, if the character attribute features indicate that the current physical strength value is 0, and high-speed running and gliding cannot be performed at zero physical strength, then in the current scene the navigation model outputs low action execution probabilities for high-speed running and gliding and a higher probability for the walking action. For another example, if the character skill features indicate that the flight skill is currently cooling down, the navigation model outputs a lower action execution probability for the flying action in the current scene.
In the embodiments of the present application, the character features input into the navigation model include the character attribute features and character skill features, so the action execution probabilities of the candidate actions output by the navigation model can match the current state of the virtual character. This avoids conflicts between the determined target action and the character's current state, and thus avoids navigation decision errors.
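In the patent's scheme the model itself learns these constraints from the attribute and skill features; as a complementary illustration, the sketch below shows one simple post-processing safeguard that masks out currently unexecutable actions before the target action is chosen. The action names and threshold logic are assumptions, not the patent's mechanism.

```python
import numpy as np

def mask_unavailable_actions(probs, labels, stamina, cooldowns):
    """Zero out actions the character cannot currently execute, then renormalize."""
    masked = probs.copy()
    for i, (_direction, movement) in enumerate(labels):
        if movement in ("sprint", "glide") and stamina <= 0.0:
            masked[i] = 0.0                 # no stamina: sprinting/gliding unavailable
        if cooldowns.get(movement, 0.0) > 0.0:
            masked[i] = 0.0                 # the skill is still cooling down
    total = masked.sum()
    return masked / total if total > 0 else probs
```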
Before navigating, the computer device needs to acquire the character features of the virtual character and the navigation features of the character's position in the virtual environment. The character features may be obtained by the server reading the game's background process, while the navigation features must be acquired based on the virtual character's position in the virtual environment.
1. For the navigation point feature.
The computer device determines the relative positional relationship between the virtual character's position and the navigation destination based on their positions in the virtual environment, obtaining the navigation point features.
Optionally, the computer device determines the straight-line distance between the virtual character's position and the navigation destination based on their position coordinates in the virtual environment; for example, the straight-line distance may be calculated with the distance formula between two points in three-dimensional space.
Optionally, the computer device calculates the difference between the virtual character's position coordinates and the navigation destination's position coordinates in the virtual environment.
Optionally, the computer device determines the distance the virtual character has advanced toward the navigation destination since the previous frame, based on the character's current position coordinates, the navigation destination's position coordinates, and the character's position coordinates in the previous frame.
Optionally, the computer device determines the angle between the virtual character's current facing direction and the line connecting the character's position to the navigation destination, based on the character's current position coordinates, the destination's position coordinates, and the character's current facing direction.
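Putting the optional quantities above together, a minimal sketch of computing the navigation point features might look as follows (positions as NumPy arrays, z-up with yaw measured in the x-y plane; all conventions are assumptions):

```python
import numpy as np

def navigation_point_features(pos, dest, prev_pos, facing_yaw):
    """Relative relationship between the character's position and the destination."""
    delta = dest - pos                                            # coordinate difference
    distance = float(np.linalg.norm(delta))                       # straight-line 3D distance
    progress = float(np.linalg.norm(dest - prev_pos)) - distance  # advance since last frame
    angle_to_dest = np.arctan2(delta[1], delta[0]) - facing_yaw   # facing vs. line to destination
    return np.array([distance, *delta, progress, angle_to_dest])
```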
2. For the environment perception feature.
The computer device obtains the environment perception features through ray detection based on the virtual character's position.
Ray detection means emitting rays from a specific position in corresponding directions and roughly determining the positional relationship, including distance and azimuth, between an object and the ray's emission point from the reflections produced when the rays hit the object's surface.
Optionally, the environment perception features include at least two of a ring-shaped ray feature, a depth ray feature, and a height ray feature, where the ring-shaped ray feature characterizes the distribution of objects at the same horizontal height around the virtual character, the depth ray feature characterizes the depth of objects in the character's facing direction, and the height ray feature characterizes the height of objects around the character's position.
The objects around the virtual character may be terrain, buildings, and the like, which is not limited in this embodiment.
1. For ring-shaped ray features.
The computer device emits environment perception rays in all directions with the virtual character as the starting point; a ray is reflected along the surface normal when it collides with an object's surface, and the different environment perception rays lie at the same horizontal height.
Optionally, to improve the accuracy of the ring-shaped ray feature, environment perception rays of multiple lengths may be emitted, for example 2 meters, 20 meters, and 50 meters. Rays of different lengths perceive objects at the same location with different accuracy; for example, for an object within 2 meters of the virtual character, the object position perceived by the 2-meter rays is more accurate than that perceived by the 50-meter rays.
Schematically, as shown in fig. 5, a ring-shaped ray diagram provided by an exemplary embodiment of the present application is shown. Point P is the virtual character's position. With the character as the starting point, environment perception rays are emitted in all directions and reflect when they hit an object's surface, so the computer device can determine the objects in the virtual environment around the character from the reflections. The rays in the figure are emitted from the character's feet, so the resulting ring-shaped ray feature characterizes the distances between the character and surrounding objects on the horizontal plane at foot level.
It should be noted that the above diagram shows only some of the emitted environment perception rays; in practice, denser rays can be emitted to obtain a more accurate ring-shaped ray feature.
The ring-shaped ray feature describes the distribution of objects at one horizontal height around the virtual character, so to let the character perceive its surroundings more comprehensively, different heights on the character can be used as starting points for emitting environment perception rays. For example, the character's head, waist, and feet are each used as a starting point, rays are emitted in all directions, and the object distributions at three different heights around the character are obtained.
In one possible implementation, to obtain ring-shaped ray features at different heights, the computer device emits environment perception rays of at least two lengths in all directions, using at least two heights on the virtual character as starting points.
The at least two heights may all be lower than the virtual character's height, or at least one of them may be higher than the character's height. Moreover, the more heights are used as starting points, the denser the environment perception rays are in the vertical direction, and the more accurate the resulting ring-shaped ray feature.
After emitting the rays, the computer device generates the ring-shaped ray feature based on the reflections of the environment perception rays: from the reflections it determines the distances between the virtual character and surrounding objects on the horizontal plane at each level, and generates the feature from the distances between the character and objects across the full 360 degrees.
Schematically, FIG. 6 shows a visualized ring-shaped ray feature provided by an exemplary embodiment of the present application. The three levels of ring-shaped ray features correspond to the heights of the character's feet, waist, and head. The objects present at these three heights around the character differ, so the ring-shaped ray features at the three heights differ; P1, P2, and P3 are the starting points at the different heights. In the ring-shaped ray feature, the character's current facing direction is 0 degrees, directly behind the character is 180 degrees, and other angles give the distribution of objects at the corresponding angle to the facing direction. Each point represents a reflection point of an environment perception ray, and its distance from the starting point is the distance between the character and an object in the virtual environment at the corresponding height. The rays shown have a range of 350 meters, i.e., the feature can detect the distribution of same-height objects within 350 meters of the character, with each concentric circle 50 meters apart, 7 circles in total.
It should be noted that the range of the virtual environment represented by the ring-shaped ray features shown in fig. 6 is merely an illustrative example, and in practical applications, ring-shaped ray features of the virtual environment within a smaller or larger range may be detected. For example, in practice, the computer device detects a virtual environment within 50 meters to obtain a ring-shaped ray characteristic that characterizes the distribution of objects within 50 meters of the virtual environment from the virtual character.
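A minimal sketch of building the ring-shaped ray feature, assuming a z-up coordinate system and an engine-provided ray-cast call; the function name cast_ray, the heights, and the ray counts are illustrative assumptions:

```python
import numpy as np

def ring_ray_feature(cast_ray, origin, heights=(0.1, 0.9, 1.7),
                     num_rays=72, max_range=50.0):
    """Distance to the nearest object per height level, for rays cast over 360 degrees.

    cast_ray(start, direction, max_range) is assumed to return the hit
    distance, or max_range when nothing is hit within range.
    """
    feature = np.empty((len(heights), num_rays))
    for h, height in enumerate(heights):              # e.g. feet, waist, head
        start = origin + np.array([0.0, 0.0, height])
        for i in range(num_rays):
            yaw = 2.0 * np.pi * i / num_rays          # evenly spaced around the circle
            direction = np.array([np.cos(yaw), np.sin(yaw), 0.0])  # horizontal ray
            feature[h, i] = cast_ray(start, direction, max_range)
    return feature
```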
2. For depth ray features.
The depth ray feature is similar to a depth map: a two-dimensional feature image of depth values for the three-dimensional environment, where each position of the feature stores the depth at that position.
The computer device emits environment perception rays toward the virtual character's facing direction with the character as the starting point; a ray is reflected along the surface normal when it collides with an object's surface.
The depth ray feature detects the distribution of objects in the virtual character's facing direction. The environment perception rays are emitted from the character toward its facing direction, with a certain angle between each emitted ray and the plane perpendicular to the character's position. Optionally, when detecting the depth ray feature, the emitted rays have a maximum and a minimum angle relative to the horizontal plane perpendicular to the character's position, which limits the range of the depth ray feature detected by the rays. For example, the maximum angle may be a divergence of 60 degrees outward from the starting point, and the minimum angle a divergence of 30 degrees.
Schematically, fig. 7 shows a depth ray schematic provided by an exemplary embodiment of the present application. Point P is the position of the virtual character; with point P as the starting point, environment-aware rays are emitted toward the facing direction of the virtual character. The starting points of the different environment-aware rays are the same, and reflection occurs when an environment-aware ray collides with an object in the virtual environment, so that the computer device can determine the depth ray features according to the reflection of the environment-aware rays. The arrow directions in the figure show the reflections generated after the environment-aware rays strike object surfaces.
Unlike ring-shaped rays, not all of the ambient-aware rays lie in one plane. The computer device is capable of determining a distance between an object in the virtual environment and the virtual character in a direction in which the virtual character is oriented based on the depth ray features.
In one possible embodiment, to improve the accuracy of determining the depth of objects in the facing direction of the virtual character, the computer device emits environment-aware rays of at least two lengths toward the facing direction, with the virtual character as the starting point.
Optionally, to improve the accuracy of the depth ray features, environment-aware rays of three different lengths may be emitted, such as 2 meters, 20 meters, and 50 meters. Environment-aware rays of different lengths perceive an object at the same position with different accuracy; for example, for an object within 2 meters of the virtual character, the 2-meter environment-aware rays locate the object with higher accuracy than the 50-meter environment-aware rays.
In the case of emitting the environment-aware rays, the computer device generates the depth ray features based on the reflection of the environment-aware rays.

After the environment-aware rays are emitted, the computer device can determine, according to the reflection of the environment-aware rays, the distances in the facing direction between objects within the ray range and the virtual character, and thereby generate the depth ray features according to the distances between the different objects and the virtual character.
Referring to fig. 8, a schematic diagram of visualized environment depth features provided by an exemplary embodiment of the present application is shown. In the depth ray features, the distribution of objects in the facing direction of the virtual character is represented by color depth: the darker the color, the smaller the distance between an object in that direction and the virtual character; the lighter the color, the larger the distance. In the figure, the region of the first color indicates that an object closest to the virtual character exists in that region, and the region of the second color indicates that no object exists within the length of the environment-aware rays in that region.
In a possible embodiment, the computer device determines a depth map of the virtual environment in the facing direction of the virtual character through the reflection of the environment-aware rays, and subsequently determines normal vectors of object surfaces based on the reflection of the environment-aware rays, thereby also enabling the computer device to determine, to some extent, the three-dimensional characteristics of the virtual environment in the facing direction of the virtual character.
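As an illustrative sketch only, the following Python code casts a fan of rays toward the facing direction between an assumed minimum and maximum vertical angle, producing a two-dimensional depth image. raycast() is the same hypothetical engine query as in the earlier sketch; the downward-tilted fan and the row/column counts are assumptions.

```python
import math

def raycast(origin, direction, max_dist):
    return None  # placeholder engine physics query

def depth_ray_features(position, eye_height, yaw_deg,
                       rows=16, cols=32, h_fov=90.0,
                       min_pitch=30.0, max_pitch=60.0, max_dist=50.0):
    """Casts a rows x cols fan of rays toward the facing direction and
    returns a 2D grid of hit distances (a depth image)."""
    origin = (position[0], position[1] + eye_height, position[2])
    image = []
    for r in range(rows):
        # Downward pitch between the assumed min and max angles.
        pitch = math.radians(min_pitch + r * (max_pitch - min_pitch) / (rows - 1))
        row = []
        for c in range(cols):
            yaw = math.radians(yaw_deg - h_fov / 2 + c * h_fov / (cols - 1))
            direction = (math.cos(pitch) * math.sin(yaw),
                         -math.sin(pitch),
                         math.cos(pitch) * math.cos(yaw))
            hit = raycast(origin, direction, max_dist)
            row.append(hit if hit is not None else max_dist)
        image.append(row)
    return image
```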
3. For height ray features.
The computer device determines a target range centered on the virtual character, and emits environment-aware rays from a target height toward the target range, where an environment-aware ray is reflected along the normal direction of the object surface when it collides with the surface of an object.
The target range may be a range of any shape, such as a rectangle, a circle, or an irregular pattern, and may be adjusted for the specific application scenario, which is not limited in this embodiment.
The computer device emits a plurality of environment-aware rays from within the plane of the target height; the different environment-aware rays are parallel to each other and perpendicular to the horizontal plane, i.e., the rays are emitted from top to bottom.
The target range is determined with the virtual character as the center, so the computer device can determine the heights of objects within the target range around the virtual character according to the reflection of the environment-aware rays, finally obtaining an n x n matrix of height ray feature data.
Referring to fig. 9, a height ray schematic provided by an exemplary embodiment of the present application is shown. In the figure, the target range is the square area outlined with a dotted line, centered on the position P of the virtual character with a half side length of D1, and the target height from which the environment-aware rays are emitted is H. The environment-aware rays are emitted from the target height H toward the target range, travel from top to bottom, and are reflected when they collide with an object surface; the arrow directions are the reflection directions of the environment-aware rays.
In one possible embodiment, in order to acquire the height characteristics of objects within different ranges, the computer device determines, centered on the virtual character, at least two square ranges with different side lengths as target ranges.
Optionally, the height features obtained from a small target range characterize the heights of nearby objects in more detail, while the height features obtained from a large target range allow the computer device to approximately obtain the heights of objects within a larger range around the virtual character.
Referring to fig. 10, a schematic diagram of target ranges provided by an exemplary embodiment of the present application is shown. The two target ranges are square areas centered on the virtual character, and the target height is H. The side length of the first target range is 2×D1, and after emitting environment-aware rays from the target height within the first target range, the computer device acquires small-range height ray features; the side length of the second target range is 2×D2, and the computer device acquires large-range height ray features after emitting environment-aware rays from the target height within the second target range.
After emitting the environment-aware rays from the target height toward the target range, the computer device generates the height ray features based on the reflection of the environment-aware rays.

After the computer device emits the environment-aware rays, it can determine the heights of objects around the virtual character according to the reflection of the environment-aware rays, and thereby generate the height ray features.
The height ray features are visualized with the horizontal plane at the target height as a reference plane, and the visualized height ray features characterize the distances between objects around the virtual character and the reference plane.
Referring to fig. 11, a schematic diagram of a visual high-level ray feature provided by an exemplary embodiment of the present application is shown.
In the height ray features, the distribution of objects around the virtual character is represented by color depth: the darker the color, the smaller the distance between an object around the virtual character and the reference plane, i.e., the higher the object; the lighter the color, the larger the distance between an object and the reference plane, i.e., the lower the object. In the figure, the region of the first color indicates the region where the highest object exists, and the region of the second color indicates the region where the lowest object exists within the length of the environment-aware rays.
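The following Python sketch illustrates how such an n x n height ray matrix could be sampled over a square target range, as in fig. 9 and fig. 10. The raycast() query, the grid resolution, and the concrete H/D1 values are assumptions for illustration only.

```python
def raycast(origin, direction, max_dist):
    return None  # placeholder engine physics query

def height_ray_features(position, target_height=30.0, half_side=25.0, n=32):
    """n x n matrix of distances from the reference plane at target_height
    down to the first surface below each grid point."""
    grid = []
    for i in range(n):
        row = []
        for j in range(n):
            x = position[0] - half_side + (2 * half_side) * i / (n - 1)
            z = position[2] - half_side + (2 * half_side) * j / (n - 1)
            hit = raycast((x, target_height, z), (0.0, -1.0, 0.0), target_height)
            row.append(hit if hit is not None else target_height)
        grid.append(row)
    return grid

# Two scales as in fig. 10: a detailed small range and a coarse large range,
# e.g. height_ray_features(pos, half_side=D1) and height_ray_features(pos, half_side=D2).
```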
In the embodiment of the application, the computer device determines two-dimensional environment features of different dimensions based on the environment-aware rays, so that the navigation model can characterize a three-dimensional environment from the two-dimensional environment features of at least two dimensions, reducing the computation of feature processing and the time delay of feature processing. Moreover, environment perception features corresponding to multiple distance scales are obtained at different scales, which improves the accuracy of characterizing the three-dimensional environment and can, to a certain extent, enhance the generalization capability of the navigation model.
After the character features and the navigation features are acquired, the computer device inputs the character features and the navigation features into a navigation model, and determines the execution probability of the action corresponding to each candidate action based on the character features and the navigation features through the navigation model.
The navigation model comprises a coding sub-network and a time sequence sub-network, wherein the coding sub-network is used for coding character features and navigation features, and the time sequence sub-network is used for determining action execution probabilities corresponding to different candidate actions.
Optionally, the time sequence sub-network may employ an LSTM (Long Short-Term Memory) network, a convolutional neural network, or the like, which is not limited in this embodiment.
Character features and navigation features are input into a navigation model, character features and navigation features are respectively encoded through an encoding sub-network, and character feature encoding results and navigation feature encoding results obtained through encoding are spliced to obtain navigation state features.
And inputting the navigation state characteristics into a time sequence sub-network to obtain the action execution probabilities corresponding to different candidate actions.
The process in which the navigation model encodes the character features and the navigation features is the process in which the navigation model extracts hidden-layer encoding vectors of the input features.
Optionally, character features of the same type may share the same encoding network weights; for example, attributes of the virtual character such as speed, direction, and character state may share the same encoding network weights.
Optionally, in some scenarios, a portion of the features may be expected to have a greater impact on the action execution probabilities of the candidate actions; therefore, higher encoding network weights may be given to those features to increase their impact on the action execution probabilities.
In one possible implementation, the encoding sub-network includes a character feature encoder and an environmental feature encoder, and the environmental feature encoder includes at least two multi-scale convolutional layers, different multi-scale convolutional layers being used to multi-scale encode two-dimensional environmental features of different dimensions.
The character features and the navigation features are respectively feature-encoded by the encoding sub-network. Optionally, the character feature encoder employs a plurality of fully connected layers to encode the character features; the plurality of fully connected layers enable the navigation model to extract deeper features and mine the correlations between different features.
In the navigation model, the different two-dimensional environment features are encoded separately. Optionally, the ring-shaped ray features are encoded by two connected fully connected layers. Optionally, the depth ray features and the height ray features are each encoded by a multi-scale convolution layer.
Optionally, in the depth ray feature detection process, the perception scales of the emitted rays of different lengths are the same, so the depth ray features obtained based on environment-aware rays of different lengths have the same scale and need not be encoded separately. For the height ray features, the scales of the features obtained by perceiving different target ranges differ, so the height ray features of different scales need to be encoded by multi-scale convolution layers respectively.
In the process of encoding the navigation features through the encoding sub-network, the two-dimensional environment features are first convolved by convolution kernels of different sizes in the multi-scale convolution layer to obtain multi-scale encoding results, and the multi-scale encoding results are then fused to obtain the navigation feature encoding result.
In some embodiments, the three-dimensional virtual environment is quite complex: terrain types are diverse, terrain shapes are irregular, and terrain volumes differ greatly. Therefore, in the course of a virtual character traversing various terrains to reach a navigation destination several hundred meters away, a relatively accurate perception of the environment is necessary. Multi-scale convolution can therefore be adopted to process the two-dimensional environment features so as to describe the three-dimensional virtual environment more comprehensively. In this way, the navigation model can perceive the shapes and positions of objects in the virtual environment and plan paths in advance, preventing the virtual character from colliding with objects.
Optionally, a 1×1 convolution kernel, a 3×3 convolution kernel, and a 5×5 convolution kernel may be adopted in the multi-scale convolution layer to respectively convolve the two-dimensional environment features to obtain multi-scale encoding results, and the multi-scale encoding results are then convolved sequentially through a 5×5 convolution kernel, a 3×3 convolution kernel, and a 1×1 convolution kernel to obtain the navigation encoding results corresponding to the different two-dimensional environment features.
Optionally, the multi-scale convolution layer may employ an Inception network, which is not limited in this embodiment.
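As an illustrative sketch only, the following PyTorch module shows one plausible, Inception-style realization of the multi-scale convolution layer described above: parallel 1×1 / 3×3 / 5×5 convolutions whose concatenated outputs are fused by a 5×5, 3×3, 1×1 sequence. The channel counts and activation choices are assumptions, not the patented structure.

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Parallel branches with different receptive fields.
        self.b1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        # Fusion: concatenate, then 5x5 -> 3x3 -> 1x1 as described in the text.
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * out_ch, out_ch, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(),
        )

    def forward(self, x):
        multi = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
        return self.fuse(multi)
```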
Optionally, the encoding sub-network is further configured to splice the encoding results of the different character features to obtain an overall character feature encoding result, then downsample the character feature encoding result through a pooling layer, compressing the features to reduce the number of parameters. Optionally, the pooling layer may use average pooling or maximum pooling, which is not limited in this embodiment. Optionally, the encoding sub-network integrates the two-dimensional environment feature encoding results within the navigation feature encoding result through a plurality of fully connected layers, and then performs feature compression on the navigation feature encoding result through a pooling layer.
Referring to fig. 12, a schematic structural diagram of a navigation model according to an exemplary embodiment of the present application is shown. In the figure, the navigation model includes an encoding sub-network and a time sequence sub-network. The encoding sub-network includes a character feature encoder that encodes the character skill features and character attribute features; the encoding sub-network then splices the encoding results and obtains the character feature encoding result through a pooling layer and a fully connected layer. The encoding sub-network further includes a navigation feature encoder that respectively encodes the ring-shaped ray features, the depth ray features, the large-range height ray features, and the small-range height ray features; the encoding sub-network then splices these encoding results with the character position features and the navigation point features, and obtains the navigation feature encoding result through a plurality of fully connected layers and a pooling layer in sequence. After the navigation feature encoding result and the character feature encoding result are spliced, they are input into the time sequence sub-network to determine the action execution probabilities of the different candidate actions, where the time sequence sub-network includes a fully connected layer and a long short-term memory network.
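For illustration, the following simplified PyTorch sketch mirrors the overall shape of the fig. 12 structure: separate encoders for character and navigation features, feature splicing, then a fully connected layer and an LSTM head over the candidate actions. All dimensions, layer counts, and the plain convolutional image encoder (standing in for the multi-scale layers above) are assumptions.

```python
import torch
import torch.nn as nn

class NavigationModel(nn.Module):
    def __init__(self, char_dim=32, ray_dim=3 * 72, navpoint_dim=8,
                 img_ch=1, num_actions=24, hidden=256):
        super().__init__()
        self.char_enc = nn.Sequential(            # character feature encoder
            nn.Linear(char_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.ring_enc = nn.Sequential(            # two connected fully connected layers
            nn.Linear(ray_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.img_enc = nn.Sequential(             # shared encoder for depth/height images
            nn.Conv2d(img_ch, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        fused = 128 + 128 + 3 * 16 * 4 * 4 + navpoint_dim
        self.fc = nn.Sequential(nn.Linear(fused, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, char, ring, depth, h_small, h_large, navpoint, state=None):
        feats = torch.cat([
            self.char_enc(char),
            self.ring_enc(ring),
            self.img_enc(depth), self.img_enc(h_small), self.img_enc(h_large),
            navpoint,                               # navigation point features spliced in
        ], dim=-1)
        x = self.fc(feats).unsqueeze(1)             # add a time dimension for the LSTM
        x, state = self.lstm(x, state)
        probs = torch.softmax(self.head(x.squeeze(1)), dim=-1)
        return probs, state                         # per-candidate-action probabilities
```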
In the embodiment of the application, the computer device adopts multi-scale convolution layers to respectively encode the environment perception features, so that the virtual character can better perceive the shapes and positions of objects in the virtual environment and plan paths in advance, thereby learning a policy that avoids collisions between the virtual character and objects.
Referring to fig. 13, a navigation interface diagram of a virtual character in a virtual environment according to an exemplary embodiment of the present application is shown. In the figure, the virtual character 1301 is located at the current position 1302 in the virtual environment, and the navigation destination is 1303. The computer device acquires the character features and the navigation features of the virtual environment at the current position of the virtual character and inputs them into the navigation model, and the navigation model determines the execution probabilities of the different candidate actions according to the character features of the current virtual character and the navigation features of the virtual environment. Based on the candidate action execution probabilities, the target action corresponding to the highest execution probability is determined to be running straight ahead. After determining the target action, the computer device controls the virtual character 1301 to run forward from the current position 1302 to the position 1304 of the next frame, and then re-acquires the character features of the virtual character and the navigation features of the virtual environment so that the navigation model re-determines the action execution probabilities corresponding to the candidate actions.
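A minimal sketch of this per-frame loop follows, assuming the NavigationModel interface sketched above and hypothetical env.get_features() / env.apply_action() / env.reached_destination() engine hooks.

```python
import torch

def navigate(model, env, max_steps=1000):
    state = None  # LSTM hidden state carried across frames
    for _ in range(max_steps):
        char, ring, depth, hs, hl, navpoint = env.get_features()
        with torch.no_grad():
            probs, state = model(char, ring, depth, hs, hl, navpoint, state)
        action = int(torch.argmax(probs, dim=-1))  # target action = highest probability
        env.apply_action(action)                   # e.g. "run straight ahead"
        if env.reached_destination():
            break
```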
The navigation process of the virtual character in the virtual scene shown in the above embodiments is an application scenario of the navigation model. Before the navigation model is applied, it needs to be trained so that the virtual character can acquire the ability to navigate to the navigation destination based on its position in the virtual environment.
Referring to fig. 14, a flowchart of a method for navigating a virtual character in a virtual scene according to another exemplary embodiment of the present application is shown. This embodiment is described by taking the method being used in a computer device as an example, and the method includes the following steps.
Step 1401, obtaining sample character features of the virtual character and sample navigation features of the position of the virtual character in the virtual environment.
In a game scene, a computer device extracts sample information in the game scene, wherein the sample information refers to characteristics which can be used for inputting a navigation model, and the sample information comprises sample character characteristics of a virtual character and navigation characteristics of the position of the virtual character in the virtual environment. Wherein the sample character features are used at least to characterize the movement state of the virtual character. For example, the sample character features include coordinates of where the virtual character is currently located, the current orientation of the virtual character, the moving speed of the virtual character, and the like.
The sample navigation features comprise sample environment perception features and sample navigation point features, and the sample navigation point features are used for representing the position relation between the position of the virtual character and the navigation destination. For example, the distance, relative angle, etc. between the location of the avatar currently in the virtual environment and the navigation destination.
The sample environment perception feature comprises at least two-dimensional sample environment features, and a combination of the at least two-dimensional sample environment features is used for representing a sample three-dimensional environment of the position of the virtual character. The sample two-dimensional environmental features refer to features in a certain dimension in a sample three-dimensional space, and the sample features in the sample three-dimensional space can be represented to a certain extent by combining the sample two-dimensional features in at least two dimensions in the sample three-dimensional space. Therefore, the embodiment of the application characterizes the sample three-dimensional environment where the virtual character is located in the virtual environment by adopting at least two sample two-dimensional environment characteristics.
Optionally, when the sample character features are acquired, they may be classified into movement-related features and non-movement-related features, where the movement-related features include at least the character's movement speed, movement acceleration, and the like, and the non-movement-related features may be the character's own physical strength and the like. Dividing the sample character features into movement-related and non-movement-related features before inputting them into the navigation model helps the navigation model learn the relation between the movement features and the non-movement features, so that the action execution probabilities of the candidate actions can also be determined according to the non-movement features of the virtual character. For example, if the non-movement feature is a physical strength feature, the navigation model can learn the influence of each candidate action on the virtual character's physical strength, and when calculating the action execution probability of a candidate action, consider whether the current physical strength can support completing that action, so as to adjust the probabilities of the different candidate actions.
Step 1402, training a navigation model by reinforcement learning based on the sample character features and the sample navigation features.
Wherein the navigation model is used to determine the action execution probabilities of the candidate actions based on the character features and the navigation features. After the computer device acquires the sample character features and the sample navigation features, it inputs them into the navigation model; through the navigation model, the current moving state and the sample two-dimensional environment features of the virtual character's position are learned, and the navigation model is trained.
Optionally, the training samples of the navigation model are the sample character features and the sample navigation features, both of which are data generated by interacting with the virtual environment after the computer device controls the virtual character to execute the corresponding actions. The navigation model may be trained from scratch, or fine-tuned from a model that has already been trained to convergence. When training from scratch, training needs to start in simpler regions of the virtual environment, i.e., smaller regions with simpler terrain, and then proceed with reinforcement learning training in regions with more difficult terrain; for more complex virtual environments, training samples need to be added.
After model training, whether the navigation model has finished training is determined by model evaluation. In the model evaluation process, instructions are issued to make the virtual character navigate from a starting point to a destination; the main index for evaluating the strength of the model is the probability of reaching the designated destination. Among the issued instructions, when the probability of the virtual character reaching the destination is higher than a probability threshold, the navigation model is determined to be trained.
Optionally, in the model evaluation process, other performance indexes may also be referred to, for example, the attribute reduction of the virtual character during navigation, the time spent by the virtual character to reach the destination, and the like, so as to improve the navigation capability of the navigation model; this embodiment does not limit the specific types of the other performance indexes.
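As a hedged illustration of the arrival-probability criterion above, the following sketch issues a number of navigation instructions and compares the success rate against a threshold. run_episode() is a hypothetical rollout helper, and the episode count and threshold are assumptions.

```python
def run_episode(model, env):
    # Placeholder rollout: returns True if the character reached the destination.
    return False

def evaluate(model, env, episodes=100, prob_threshold=0.95):
    successes = 0
    for _ in range(episodes):
        successes += int(run_episode(model, env))
    success_rate = successes / episodes
    return success_rate >= prob_threshold  # True -> training is considered complete
```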
In some embodiments, the navigation model provided by the embodiment of the application can control the virtual character to traverse the map efficiently, thereby acquiring game performance in different game areas, which helps optimize game parameters. In addition, combining the navigation of the virtual character with some recognition tools can also be used to detect bugs in the game development process, for example, the virtual character passing through walls, air walls existing in the virtual environment, or unreasonable placement of some objects in the virtual environment.
In some embodiments, the trained navigation model may be used for novice guidance in a game scenario, using the navigation AI to guide the virtual character controlled by a novice player's account to find the destination.
In summary, in the embodiment of the present application, the navigation model is trained by reinforcement learning based on the sample character features and the sample navigation features, so that the trained navigation model can determine the candidate action execution probabilities based on the character features and the navigation features. Through the learned sample two-dimensional environment features, the navigation model learns how to characterize a three-dimensional environment from environment features of at least two dimensions, which reduces the processing time of the two-dimensional environment features and the feature processing delay. In addition, reinforcement training of the navigation model based on the character features and the navigation features helps improve the generalization of the trained navigation model, giving it a wider range of application scenarios.
The virtual character performs actions such as walking, running, jumping, climbing, gliding, and using skills in the virtual environment; these actions are the candidate actions. In some game scenarios, a skill may be in a cooling state, or performing an action may consume the virtual character's physical attributes. To ensure that the determined target action can be executed by the current virtual character, sample character attribute features characterizing the attribute state of the virtual character and sample character skill features characterizing the skill cooling state of the virtual character need to be input into the navigation model; that is, the sample character features of the virtual character include the sample character attribute features and the sample character skill state. The navigation model thus learns the relation between the sample character skill features and sample character attribute features on the one hand and the sample candidate action execution probabilities on the other, so that the action execution probabilities of the candidate actions output during application are related to the character attribute features and the character skill features.
The navigation model learns the correlation between the sample action execution probabilities of the sample candidate actions and the sample character attribute features and sample character skill features; that is, the sample action execution probabilities of the candidate actions output by the navigation model can be influenced by the sample character attribute features and the sample character skill features. In the reinforcement learning process, if the correlation between a candidate action sample and the sample character attribute features and sample character skill features is small, the benefit obtained after controlling the virtual character to execute the estimated target action will be opposite to the expected benefit, and the computer device updates the model parameters of the navigation model, continuously improving the correlation between the sample action execution probabilities and the sample character skill features and sample character attribute features. In this way, during application of the navigation model, the output action execution probabilities are also correlated with the character attribute features and the character skills.
In the embodiment of the application, the character attribute features and the character skill state are included in the character features input into the navigation model, so that the target action output by the navigation model can conform to the state of the current virtual character. Conflicts between the determined target action and the state of the current virtual character are avoided, thereby avoiding navigation decision errors.
In the process of training the navigation model through reinforcement learning, i.e., updating the model parameters of the navigation model, the goal is that the computer device obtains maximized rewards after controlling the virtual character to execute the estimated target action determined through the navigation model. The process of training the navigation model is described below through an exemplary embodiment.
Referring to FIG. 15, a flow chart of a navigation model training process provided by an exemplary embodiment of the present application is shown.
Step 1501, the sample character features and the sample navigation features are input into the navigation model, and the estimated motion execution probability of the candidate motion output by the navigation model is obtained.
Optionally, the candidate actions include the direction in which the virtual character moves and the movement action of the virtual character. The moving direction may be the same as the current facing direction of the virtual character, or may be a direction at an angle to the facing direction. Further, in some embodiments, the moving direction of the virtual character may be upward or downward, or obliquely upward or obliquely downward. The embodiment of the application does not limit the classification of the moving direction.
The movement actions of the virtual character may include walking, running, diving, jumping, climbing, flying, gliding, and the like, which is not limited in this embodiment.
In step 1502, a predicted target action is determined from the candidate actions based on the predicted action execution probability.
Optionally, the computer device determines the predicted target action based on a probability value of the predicted action execution probability. After obtaining the estimated motion execution probability corresponding to the candidate motion, the computer equipment determines the candidate motion corresponding to the maximum motion execution probability as the estimated target motion.
Optionally, the computer device determines a predicted target moving direction and a predicted target moving action from the candidate actions based on the predicted action execution probability, and controls the virtual character to move in the virtual environment based on the predicted target moving direction and the predicted target moving action.
In step 1503, the virtual character is controlled to move in the virtual environment based on the estimated target motion.
In one possible implementation, the computer device encodes the different candidate actions separately, resulting in a plurality of candidate action tags. The computer equipment inputs the sample character features and the sample navigation features into a navigation model to obtain estimated motion execution probability corresponding to the candidate motion label. And determining the estimated target candidate action label from the candidate action labels based on the estimated action execution probability. Under the condition that the estimated target candidate action label is determined, the computer equipment decodes the estimated target candidate action label to obtain an estimated target action, and controls the virtual character to move in the virtual environment based on the estimated target action.
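As a hedged illustration of this label encoding/decoding, the candidate actions below are formed as the cross product of moving directions and movement actions, each mapped to an integer label; the predicted label is then decoded back. The concrete direction and action sets are illustrative assumptions.

```python
DIRECTIONS = ["forward", "forward-left", "left", "back-left",
              "back", "back-right", "right", "forward-right"]
MOVEMENTS = ["walk", "run", "jump"]

def encode_action(direction, movement):
    """Map a (direction, movement) pair to a single candidate action label."""
    return DIRECTIONS.index(direction) * len(MOVEMENTS) + MOVEMENTS.index(movement)

def decode_action(label):
    """Recover the (direction, movement) pair from a predicted label."""
    return DIRECTIONS[label // len(MOVEMENTS)], MOVEMENTS[label % len(MOVEMENTS)]

# e.g. decode_action(argmax of the predicted probabilities) -> ("forward", "run")
```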
Step 1504, determining action rewards corresponding to the estimated target actions based on the action execution results of the virtual roles.
After controlling the virtual character to move in the virtual environment, the computer device re-acquires the sample character features and sample navigation features, and determines the action reward corresponding to the estimated target action based on the sample character features of the previous frame, the sample navigation features of the previous frame, and the re-acquired sample character features and sample navigation features.
In the reinforcement learning process, the computer device selects an action (the estimated target action) according to the current state of the navigation model (including the sample character features and sample navigation features). The virtual environment changes correspondingly in response to the action: after the virtual character is controlled to execute the estimated target action, the sample character features and sample navigation features change and transfer to a new state (including the re-acquired sample character features and sample navigation features), and an action reward is generated. The action reward is usually a numerical value, and the discounted, accumulated sum of rewards constitutes the benefit or return; maximizing this benefit or return is the objective of the navigation route planned by the navigation model.
Optionally, the action rewards refer to benefits generated by performing the estimated target action, and the action rewards may include forward rewards and reverse rewards, and the navigation model expects to generate the forward rewards after performing the estimated target action.
Referring to FIG. 16, a schematic diagram of action reward classification provided by an exemplary embodiment of the application is shown. In the figure, action rewards include sparse rewards and dense rewards; the sparse rewards include elimination rewards and arrival rewards, and the dense rewards include attribute rewards, proximity rewards, drop rewards, and not-reached rewards. Note that the reward types and contents shown in fig. 16 are only illustrative examples, and the specific classification of action rewards is not limited.
The purpose of the dense rewards is to guide the agent to learn and explore quickly, i.e., to accelerate the training of the navigation model; the purpose of the sparse rewards is to enhance the exploration ability of the agent, i.e., to enhance the navigation capability of the navigation model.
In one possible embodiment, the action rewards include sparse rewards, and the sparse rewards include at least one of an arrival reward and an elimination reward; the arrival reward is a forward reward, and the elimination reward is a reverse reward.
Specifically, when the result of the execution of the action by the virtual character indicates that the virtual character arrives at the navigation destination, the arrival reward is determined as the action reward corresponding to the estimated target action.
After the virtual character executes the estimated target action, if the position of the virtual character in the virtual environment enters the target range around the navigation destination, the navigation model judges that the navigation task is completed, and an extremely large forward reward, the arrival reward, is given. For example, after the virtual character performs the estimated target action and enters a range of 10 meters around the navigation destination, the navigation task is determined to be complete.
And determining the elimination rewards as action rewards corresponding to the estimated target actions under the condition that the action execution result of the virtual characters indicates that the virtual characters are eliminated.
If the virtual character is eliminated after executing the target action, an extremely large reverse reward, the elimination reward, is given. This guides the navigation model to consider elimination factors when determining the action execution probabilities, avoiding the situation where the navigation destination cannot be reached because the virtual character dies during navigation.
In another possible embodiment, the action rewards further include a dense reward including at least one of a proximity reward, a drop reward, a not-reached reward, and an attribute reward.
Specifically, when the action execution result of the virtual character indicates that the distance between the virtual character and the navigation destination decreases, the proximity reward is determined as the action reward corresponding to the estimated target action; when the action execution result indicates that the virtual character moves away from the navigation destination, the proximity reward is determined as the action reward corresponding to the estimated target action and takes the form of a reverse reward.
Optionally, the proximity reward is given when the virtual character's current distance to the destination is smaller than all previous distances; otherwise, a reverse reward is given.
When the action execution result of the virtual character indicates that the virtual character has not reached the target range centered on the navigation destination, the not-reached reward is determined as the action reward corresponding to the estimated target action; once the action execution result indicates that the virtual character has reached the target range centered on the navigation destination, the not-reached reward is no longer determined as the action reward corresponding to the estimated target action.

That is, in the case where the action execution result indicates that the virtual character has not reached the target range centered on the navigation destination, a reverse reward is given.
When the action execution result of the virtual character indicates that a target attribute value of the virtual character decreases, the attribute reward is determined as the action reward corresponding to the estimated target action. The target attribute value may be a health value, a mana value, a hunger value, or the like, which is not limited in this embodiment.

That is, when the action execution result indicates that a target attribute value of the virtual character decreases, a reverse reward is given. For example, if the action execution result indicates that the virtual character's health percentage decreases, a reverse reward is given.
And determining the drop rewards as action rewards corresponding to the estimated target actions under the condition that the action execution results of the virtual characters indicate that the drop height of the virtual characters reaches the drop height threshold value.
That is, in the case where the action execution result indicates that the drop height of the virtual character reaches the drop height threshold, a reverse reward is given. For example, if the drop reward specifies a drop height threshold of 15 meters, a reverse reward is given when the action execution result indicates that the drop height of the virtual character reaches 15 meters. This prevents, as much as possible, the virtual character from being injured by falls during navigation.
Optionally, in some scenarios, the virtual character has a physical strength attribute, and the dense rewards may include a physical strength reward. When the action execution result of the virtual character indicates that the physical strength value of the virtual character is lower than a physical strength threshold, the physical strength reward is determined as the action reward corresponding to the estimated target action. This prevents the navigation model from controlling the virtual character to execute physical-strength-consuming actions when the virtual character's physical strength is insufficient.
Optionally, segmented dense rewards may be employed to enhance the navigation capability of the navigation model. For example, with a segmented proximity reward, a forward reward is given each time the action execution result indicates that the virtual character has moved more than 10 meters closer to the navigation destination. Enhancing the exploration ability of the navigation model in this way can enlarge the exploration area.
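As a hedged sketch only, the following function combines the sparse rewards (arrival, elimination) and dense rewards (proximity, not-reached, attribute, drop) described above. The prev/curr state dictionaries, all magnitudes, and all thresholds are illustrative assumptions, not values from this embodiment.

```python
def action_reward(prev, curr, best_dist, arrive_radius=10.0, drop_threshold=15.0):
    """prev/curr: assumed per-frame state dicts with keys
    'dist_to_dest', 'eliminated', 'hp', 'height'."""
    reward = 0.0
    # Sparse rewards: large one-off signals.
    if curr["dist_to_dest"] <= arrive_radius:
        reward += 100.0                      # arrival reward (forward)
    if curr["eliminated"]:
        reward -= 100.0                      # elimination reward (reverse)
    # Dense rewards: per-step shaping signals.
    if curr["dist_to_dest"] < best_dist:
        reward += 1.0                        # proximity reward (new best distance)
    else:
        reward -= 0.1                        # moving away / no progress (reverse)
    if curr["dist_to_dest"] > arrive_radius:
        reward -= 0.01                       # not-reached reward (reverse, per step)
    if curr["hp"] < prev["hp"]:
        reward -= 0.5                        # attribute reward (reverse)
    if prev["height"] - curr["height"] >= drop_threshold:
        reward -= 5.0                        # drop reward (reverse)
    return reward, min(best_dist, curr["dist_to_dest"])
```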
In step 1505, model parameters of the navigation model are updated based on the action rewards.
Updating the model parameters of the navigation model based on the action rewards means that the computer device adjusts the model parameters of the navigation model so that the action rewards tend continuously toward the maximum.
Optionally, the navigation model may be trained using PPO (Proximal Policy Optimization). Before training starts, the navigation model is initialized, including its navigation policy and value parameters; the navigation policy is the function that outputs the estimated action execution probabilities from the sample character features and sample navigation features. First, the first sample character features and first sample navigation features are obtained, the estimated action execution probabilities are calculated according to the navigation policy, and the estimated target action is selected. The computer device controls the virtual character to execute the estimated target action, and obtains the second sample character features, second sample navigation features, and the action reward of the next state. The computer device updates the navigation policy of the navigation model, i.e., the model parameters, according to the action reward. The above steps are repeated until a stopping condition is reached.
Alternatively, other reinforcement learning algorithms may be used to train the navigation model, such as the DDPG (Deep Deterministic Policy Gradient) algorithm, the SAC (Soft Actor-Critic) algorithm, and the A3C (Asynchronous Advantage Actor-Critic) algorithm.
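A heavily simplified sketch of a PPO-style update is shown below for illustration. Only the clipped policy objective over a batch of collected transitions is shown; advantage estimation, the value loss, and entropy terms are omitted for brevity, and the model/batch interfaces are the hypothetical ones used in the earlier sketches.

```python
import torch

def ppo_update(model, optimizer, batch, clip_eps=0.2):
    # batch (assumed): "features" tuple, "actions" of shape (B, 1),
    # "old_logp" and "adv" of shape (B,).
    probs, _ = model(*batch["features"])
    new_logp = torch.log(probs.gather(1, batch["actions"]).squeeze(1) + 1e-8)
    ratio = torch.exp(new_logp - batch["old_logp"])        # new / old policy ratio
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    loss = -torch.min(ratio * batch["adv"], clipped * batch["adv"]).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```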
In the embodiment of the application, the navigation model is trained through reinforcement learning, and the navigation model parameters are updated according to the action rewards; since the action rewards include rewards of different dimensions, the trained navigation model can determine the action execution probability of each candidate action according to features of different dimensions. Setting dense rewards helps guide the virtual character to move quickly toward the navigation destination, while adopting sparse rewards helps improve the navigation capability of the navigation model and the exploration ability of the virtual character. In addition, setting segmented rewards helps increase the exploration ability of the virtual character and expand its exploration area.
As in the actual application scenario, sample character features of the virtual character and sample navigation features of the position of the virtual character in the virtual environment also need to be obtained when the navigation model undergoes reinforcement training. The sample character features of the virtual character may be obtained by the server reading the game's background process, while the sample navigation features need to be acquired based on the position of the virtual character in the virtual environment.
3. For sample navigation point features.
And the computer equipment determines the relative position relation between the position of the virtual character and the navigation destination based on the position of the virtual character and the position of the navigation destination in the virtual environment, and obtains the sample navigation point characteristics.
Optionally, the computer device determines the straight-line distance between the position of the virtual character and the position of the navigation destination in the virtual environment based on their position coordinates; for example, the straight-line distance may be calculated according to the distance formula between two points in three-dimensional space.
Optionally, the computer device calculates a difference between the position coordinates of the position of the virtual character and the position coordinates corresponding to the navigation destination based on the position coordinates of the position of the virtual character and the position coordinates of the navigation destination in the virtual environment.
Optionally, the computer device determines a distance traveled by the virtual character to the navigation destination in the previous frame based on the position coordinates of the current position of the virtual character, the position coordinates of the navigation destination, and the position coordinates of the position of the virtual character in the previous frame.
Optionally, the computer device determines an angle between the current orientation of the virtual character and a connection line between the location of the virtual character and the navigation destination based on the location coordinates of the current location of the virtual character, the location coordinates of the navigation destination, and the current orientation of the virtual character.
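For illustration only, the following sketch computes the sample navigation point features enumerated above: straight-line distance, coordinate difference, per-frame progress toward the destination, and the angle between the facing direction and the character-to-destination line. It is pure geometry; the y-up, atan2-based heading convention is an assumption.

```python
import math

def navigation_point_features(pos, prev_pos, dest, facing):
    dx, dy, dz = (dest[0] - pos[0], dest[1] - pos[1], dest[2] - pos[2])
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)   # straight-line distance
    prev_dist = math.dist(prev_pos, dest)
    progress = prev_dist - dist                     # distance advanced in the previous frame
    # Angle between the facing direction and the character-to-destination
    # line, measured in the horizontal plane and normalized to [-180, 180).
    to_dest = math.atan2(dx, dz)
    heading = math.atan2(facing[0], facing[2])
    angle = math.degrees((to_dest - heading + math.pi) % (2 * math.pi) - math.pi)
    return (dist, (dx, dy, dz), progress, angle)
```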
4. For sample environment-aware features.
The computer device obtains sample environment perception features through ray detection based on the position of the virtual character.
The ray detection refers to that rays are emitted from specific positions to corresponding directions, and the position relation between an object and a ray emission point is roughly determined through reflection conditions generated after the rays strike the surface of the object, wherein the position relation comprises distance and azimuth.
The sample environment perception features comprise at least two of sample annular ray features, sample depth ray features and sample height ray features, wherein the sample annular ray features are used for representing the distribution condition of objects on the same horizontal height around the virtual character, the sample depth ray features are used for representing the depth condition of the objects in the direction of the virtual character, and the sample height ray features are used for representing the height condition of the objects at the position of the virtual character.
The objects around the virtual character may be terrain, buildings, and the like, which is not limited in this embodiment.
5. For sample ring-shaped ray features.
The computer device emits environment-aware rays to the surroundings with the virtual character as the starting point, where an environment-aware ray is reflected along the normal direction of the object surface when it collides with the surface of an object, and the different environment-aware rays are located at the same horizontal height.
Optionally, in the training process, in order to enable the navigation model to more accurately characterize the surrounding sample three-dimensional environment from the sample ring-shaped ray features, environment-aware rays of multiple different lengths may be selected, so as to more accurately perceive the positions of objects within different length ranges around the virtual character.
In one possible implementation, to obtain sample ring-shaped ray features at different heights, the computer device emits environment-aware rays of at least two lengths to the surroundings, with at least two heights of the virtual character at its position as starting points.
Optionally, the sample ring-shaped ray features can represent the distribution of objects at the same horizontal height around the virtual character; therefore, to enable the navigation model to perceive the surrounding environment more comprehensively, different heights of the virtual character can be set as starting points from which environment-aware rays are emitted to the surroundings. For example, the head, waist, and feet of the virtual character are respectively used as starting points of the environment-aware rays, and the rays are emitted to the surroundings to obtain the distribution of objects at three different heights around the virtual character.
After emitting the environment-aware rays, the computer device generates sample ring-shaped ray features according to reflection conditions of the environment-aware rays.
The specific implementation manner of this step may refer to the above embodiment, and this embodiment will not be described in detail.
6. For sample depth ray features.
The computer device emits environment-aware rays toward the facing direction of the virtual character with the virtual character as the starting point, where an environment-aware ray is reflected along the normal direction of the object surface when it collides with the surface of an object.
The specific implementation process of emitting environment-aware rays toward the facing direction of the virtual character with the virtual character as the starting point may refer to the corresponding process in the above embodiment, and will not be described in detail in this embodiment.
In one possible implementation, the computer device emits environment-aware rays of at least two lengths toward the facing direction of the virtual character, with the virtual character as the starting point.
After the environmental perception rays are emitted, sample depth ray features are generated according to the reflection condition of the environmental perception rays.
The specific implementation manner of this step may refer to the above embodiment, and this embodiment will not be described in detail.
7. For sample height ray features.
The computer device determines a target range centered on the virtual character, and emits environment-aware rays from a target height toward the target range, where an environment-aware ray is reflected along the normal direction of the object surface when it collides with the surface of an object.
The specific implementation of this step may refer to the implementation process of generating the sample height ray feature in the foregoing embodiment, which is not described in detail herein.
In one possible embodiment, at least two target ranges of different sizes are determined centered on the virtual character.
After emitting the ambient sense rays from the target height to the target range, the computer device generates sample height ray features based on the reflection of the ambient sense rays.
The specific implementation of this step may refer to the implementation process of generating the height ray features in the foregoing embodiment, which is not described in detail herein.
In the embodiment of the application, the computer device determines sample two-dimensional environment features of different dimensions based on the environment-aware rays, so that the navigation model can characterize the sample three-dimensional environment from the sample two-dimensional environment features of at least two dimensions, reducing the computation of feature processing and the time delay of feature processing. Moreover, sample environment perception features corresponding to multiple distance scales are obtained at different scales, which improves the accuracy of characterizing the sample three-dimensional environment and can, to a certain extent, enhance the generalization capability of the navigation model.
After the sample character features and sample navigation features are acquired, the computer device inputs the sample character features and sample navigation features into a navigation model to train the navigation model.
The navigation model comprises a coding sub-network and a time sequence sub-network, wherein the coding sub-network is used for coding character features and navigation features, and the time sequence sub-network is used for determining action execution probabilities corresponding to different candidate actions.
The sample character features and the sample navigation features are input into a navigation model, the sample character features and the sample navigation features are respectively encoded through an encoding sub-network, and the sample character feature encoding results and the sample navigation feature encoding results obtained through encoding are spliced to obtain sample navigation state features.
And then inputting the sample navigation state characteristics into a time sequence sub-network to obtain estimated action execution probabilities corresponding to different candidate actions.
The process of encoding the character features and navigation features in the encoding sub-network is the process in which the navigation model learns the sample character features and sample navigation features. The navigation model can learn the various sample features as well as the deep correlations among them, so that the trained navigation model can obtain the action execution probability corresponding to each candidate action from the character features and navigation features, with the probability values related to the character features and the navigation features.
In one possible implementation, the encoding sub-network includes a character feature encoder and an environmental feature encoder, and the environmental feature encoder includes at least two multi-scale convolutional layers, different multi-scale convolutional layers being used to multi-scale encode sample two-dimensional environmental features of different dimensions.
When the navigation features are encoded by the encoding sub-network, the sample two-dimensional environment features are first convolved by convolution kernels of different sizes in the multi-scale convolution layer to obtain sample multi-scale encoding results, which are then fused to obtain the sample navigation feature encoding result.
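A minimal sketch of one such multi-scale convolution layer follows, assuming 1-D ray features; the kernel sizes, channel counts, and fusion by concatenation plus a 1x1 convolution are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class MultiScaleConv1d(nn.Module):
    """Parallel 1-D convolutions with different kernel sizes over a ray
    feature vector, fused into a single encoding."""
    def __init__(self, in_channels=1, out_channels=16, kernel_sizes=(3, 5, 9)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_channels, out_channels, k, padding=k // 2)
             for k in kernel_sizes])
        self.fuse = nn.Conv1d(out_channels * len(kernel_sizes), out_channels, 1)

    def forward(self, x):  # x: (batch, in_channels, num_rays)
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return torch.relu(self.fuse(multi_scale))  # fused multi-scale encoding
```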
In some embodiments, the three-dimensional virtual environment is highly complex: terrain types are diverse, terrain shapes are irregular, and terrain volumes differ greatly. Therefore, when a virtual character crosses various terrains to reach a navigation destination several hundred meters away, an accurate perception of the environment is necessary. To give the navigation model this perception capability, multi-scale convolution layers are used to process the sample depth ray features and sample height ray features, so that, when deployed, the navigation model can perceive the shapes and positions of objects in the virtual environment and plan paths in advance, avoiding collisions between the virtual character and objects.
In one possible implementation, the multi-scale convolution layer in the navigation model further includes an attention mechanism, which enables the navigation model to assign greater weight to the features of the surrounding environment of the virtual character that deserve more attention.
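The text does not specify the form of this attention; under that caveat, one plausible reading is a squeeze-and-excitation style channel weighting, sketched below, in which channels that respond to more salient parts of the surroundings receive larger weights.

```python
import torch
import torch.nn as nn

class ChannelAttentionSketch(nn.Module):
    """Hypothetical per-channel attention for a multi-scale convolution layer."""
    def __init__(self, channels=16, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):  # x: (batch, channels, num_rays)
        weights = self.gate(x.mean(dim=-1))  # squeeze over the ray axis
        return x * weights.unsqueeze(-1)     # re-weight each channel
```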
In this embodiment of the application, the computer device encodes the environment-aware features with separate multi-scale convolution layers, which trains the navigation model to better perceive the shapes and positions of objects in the virtual environment, so that the trained model can plan paths in advance and avoid strategies that would cause the virtual character to collide with objects.
Fig. 17 is a schematic diagram of a navigation system for virtual characters in a computer device according to an exemplary embodiment of the present application. The system mainly comprises a feature input unit 1701, a navigation model 1702, an action codec unit 1703, and an action reward estimating unit 1704. When the navigation model is trained, the feature input unit 1701 obtains sample character features of the virtual character and sample navigation features of the position of the virtual character in the virtual environment, and inputs them into the navigation model 1702. The navigation model 1702 determines, based on the sample navigation features and sample character features, the estimated action execution probabilities corresponding to the candidate action labels, and thereby determines an estimated target action label. The action codec unit 1703 encodes the candidate actions to obtain candidate action labels, inputs them into the navigation model 1702, and, once the estimated target action label is obtained, decodes it to obtain the estimated target action. The action reward estimating unit 1704 determines the action reward generated by performing the estimated target action indicated by the estimated target action label, which is used to update the model parameters of the navigation model 1702.
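Purely as an illustrative sketch of how the four units of Fig. 17 could interact in one training step, the code below uses a REINFORCE-style policy-gradient update; `env`, `decode_action`, and `estimate_reward` are hypothetical stand-ins for the feature input, action codec, and action reward estimating units, and the update rule itself is an assumption, since the text only states that reinforcement learning with action rewards is used.

```python
import torch

def training_step(env, model, optimizer, state, decode_action, estimate_reward):
    # Feature input unit: observe sample character and navigation features.
    char_feat, nav_feat = env.observe()
    # Navigation model: estimated action execution probabilities.
    probs, state = model(char_feat, nav_feat, state)
    # Sample an estimated target action label, decode it, execute it.
    label = torch.multinomial(probs, num_samples=1)
    env.step(decode_action(label))  # action codec unit + environment
    # Action reward estimating unit.
    reward = estimate_reward(env)
    # Policy-gradient style update of the navigation model parameters.
    loss = -(torch.log(probs.gather(-1, label)) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if state is not None:  # detach the recurrent state between updates
        state = tuple(s.detach() for s in state)
    return state
```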
Referring to fig. 18, a schematic diagram of a navigation device for virtual characters in a virtual scene according to an exemplary embodiment of the present application is shown, where the device includes the following structures.
An obtaining module 1801, configured to obtain a character feature of a virtual character, and a navigation feature of a position of the virtual character in a virtual environment, where the navigation feature includes an environment-aware feature and a navigation point feature, the environment-aware feature includes two-dimensional environment features of at least two dimensions, a combination of the two-dimensional environment features of at least two dimensions is used to represent a three-dimensional environment where the virtual character is located, the navigation point feature is used to represent a positional relationship between the position where the virtual character is located and a navigation destination, and the character feature is at least used to represent a movement state of the virtual character; the input module 1802 is configured to input the character feature and the navigation feature into a navigation model, so as to obtain an action execution probability of a candidate action output by the navigation model; a determining module 1803, configured to determine a target action from the candidate actions based on the action execution probability, and control the virtual character to move in the virtual environment based on the target action.
Optionally, the acquiring module 1801 is configured to determine a relative positional relationship between the position of the virtual character and the navigation destination based on the position of the virtual character and the position of the navigation destination in the virtual environment, so as to obtain the navigation point feature; based on the position of the virtual character, the environment perception feature is obtained through ray detection, the environment perception feature comprises at least two of annular ray features, depth ray features and height ray features, the annular ray features are used for representing the distribution condition of objects on the same horizontal height around the virtual character, the depth ray features are used for representing the depth condition of the objects in the direction of the virtual character, and the height ray features are used for representing the height condition of the objects around the position of the virtual character.
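As a toy illustration of a navigation point feature, the sketch below encodes the character-to-destination relation as a planar distance, a height difference, and a unit direction. The exact encoding is an assumption; the embodiment only requires that the feature represent the positional relationship to the navigation destination.

```python
import numpy as np

def navigation_point_feature(char_pos, dest_pos):
    """Relative-position feature; y-up coordinate convention assumed."""
    delta = np.asarray(dest_pos, dtype=float) - np.asarray(char_pos, dtype=float)
    planar = np.array([delta[0], delta[2]])  # horizontal displacement
    dist = np.linalg.norm(planar)
    direction = planar / dist if dist > 0 else np.zeros(2)
    # [planar distance, height difference, unit direction x, unit direction z]
    return np.concatenate([[dist, delta[1]], direction])
```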
Optionally, in the case that the environment-aware feature includes the ring-shaped ray feature, the acquiring module 1801 is configured to emit an environment-aware ray to the surroundings with the virtual character as a starting point, where the environment-aware ray is reflected along a normal direction of the environment-aware ray when the environment-aware ray collides with the surface of the object, and different environment-aware rays are located at the same level; and generating the annular ray characteristic according to the reflection condition of the environment-perceived ray.
Optionally, the acquiring module 1801 is configured to transmit the environmental awareness rays of at least two lengths to the surroundings with at least two heights of the position where the virtual character is located as a starting point.
Optionally, in the case that the environment-aware feature includes the depth ray feature, the acquiring module 1801 is configured to transmit, with the virtual character as a starting point, an environment-aware ray toward a direction of the virtual character, where the environment-aware ray is reflected along a normal direction of the environment-aware ray if the environment-aware ray collides with a surface of an object; and generating the depth ray characteristic according to the reflection condition of the environment-perceived ray.
Optionally, the acquiring module 1801 is configured to transmit, with the virtual character as a starting point, the environmental perception rays of at least two lengths to a direction of the virtual character.
Optionally, in the case that the environment-aware feature includes the altitude-ray feature, the acquiring module 1801 is configured to determine a target range centered on the virtual character; emitting an environment-aware ray from a target height to the target range, wherein the environment-aware ray is reflected along a normal direction of the environment-aware ray in case of collision of the environment-aware ray to a surface of an object; and generating the height ray characteristic according to the reflection condition of the environment-perceived ray.
Optionally, the acquiring module 1801 is configured to determine at least two target ranges corresponding to different radii with the virtual character as a center.
Optionally, the navigation model includes a coding sub-network and a timing sub-network, the coding sub-network is used for coding the character features and the navigation features, and the timing sub-network is used for determining the execution probabilities of the actions corresponding to different candidate actions; the input module 1802 is configured to input the character feature and the navigation feature into the navigation model, encode the character feature and the navigation feature through the encoding sub-network, and splice a character feature encoding result and a navigation feature encoding result obtained by encoding, so as to obtain a navigation state feature; and inputting the navigation state characteristics into the time sequence sub-network to obtain the action execution probabilities corresponding to different candidate actions.
Optionally, the coding sub-network includes a character feature encoder and an environmental feature encoder, and the environmental feature encoder includes at least two multi-scale convolution layers, different multi-scale convolution layers being used for multi-scale coding the two-dimensional environmental features of different dimensions; the input module 1802 is configured to perform convolution processing on the two-dimensional environmental features through convolution kernels with different sizes in the multi-scale convolution layer, so as to obtain a multi-scale coding result; and fusing the multi-scale coding results to obtain the navigation feature coding result.
Optionally, the character features of the virtual character further comprise character attribute features and character skill features; the action execution probability of the candidate action output by the navigation model is related to the character attribute features and the character skill features.
In summary, in this embodiment of the present application, when a virtual character is navigated in a virtual scene, the character features and navigation features of the virtual character are obtained, the action execution probability corresponding to each candidate action is determined by the navigation model, and the target action is finally determined. During navigation, the acquired character features characterize the movement state of the virtual character, such as its current position. The acquired navigation features include environment-aware features comprising two-dimensional environment features of at least two dimensions, whose combination can represent the three-dimensional environment of the virtual environment. Because the computer device characterizes the three-dimensional environment from two-dimensional environment features of at least two dimensions, the navigation model does not need to represent the three-dimensional environment with three-dimensional voxels, which shortens the processing time of the environment features. Moreover, depicting the three-dimensional environment with two-dimensional environment features of at least two dimensions makes the navigation model's perception of the map more accurate and improves the model's generalization.
It should be noted that: the apparatus provided in the above embodiment is only exemplified by the division of the above functional modules, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules, so as to perform all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and detailed implementation processes of the method embodiments are described in the method embodiments, which are not repeated herein.
Referring to fig. 19, a schematic diagram of a navigation device for virtual characters in a virtual scene according to another exemplary embodiment of the present application is shown, where the device includes the following structures.
An obtaining module 1901, configured to obtain a sample character feature of a virtual character, and a sample navigation feature of a position of the virtual character in a virtual environment, where the sample navigation feature includes a sample environment sensing feature and a sample navigation point feature, the sample environment sensing feature includes a sample two-dimensional environment feature of at least two dimensions, a combination of the sample two-dimensional environment features of at least two dimensions is used to represent a sample three-dimensional environment of the position of the virtual character, the sample navigation point feature is used to represent a positional relationship between the position of the virtual character and a navigation destination, and the sample character feature is at least used to represent a movement state of the virtual character; a training module 1902 for training a navigation model based on the sample character features and the sample navigation features by reinforcement learning, the navigation model for determining an action execution probability of a candidate action based on the character features and the navigation features.
Optionally, an obtaining module 1901 is configured to determine a relative positional relationship between the position of the virtual character and the navigation destination based on the position of the virtual character and the position of the navigation destination in the virtual environment, so as to obtain the sample navigation point feature; based on the position of the virtual character, acquiring the sample environment sensing characteristic through ray detection, wherein the sample environment sensing characteristic comprises at least two of a sample annular ray characteristic, a sample depth ray characteristic and a sample height ray characteristic, the sample annular ray characteristic is used for representing the distribution condition of objects on the same horizontal height around the virtual character, the sample depth ray characteristic is used for representing the depth condition of the objects in the direction of the virtual character, and the sample height ray characteristic is used for representing the height condition of the objects at the position of the virtual character.
Optionally, in the case that the sample environmental perception feature includes the sample ring-shaped ray feature, the acquiring module 1901 is configured to emit environmental perception rays around with the virtual character as a starting point, where the environmental perception rays reflect along a normal line of the environmental perception rays when the environmental perception rays collide with a surface of an object, and different environmental perception rays are located at the same level; and generating the annular ray characteristics of the sample according to the reflection condition of the environment-perceived ray.
Optionally, the acquiring module 1901 is configured to transmit at least two lengths of the environmental perception rays around at least two heights of the position where the virtual character is located as a starting point.
Optionally, in the case that the environment-aware feature includes the sample depth ray feature, the acquiring module 1901 is configured to transmit an environment-aware ray toward a direction of the virtual character with the virtual character as a starting point, where the environment-aware ray is reflected along a normal direction of the environment-aware ray if the environment-aware ray collides with a surface of an object; and generating the sample depth ray characteristics according to the reflection condition of the environment sensing rays.
Optionally, the acquiring module 1901 is configured to transmit, with the virtual character as a starting point, the environmental perception rays with at least two lengths to a direction of the virtual character.
Optionally, in the case that the environmental awareness feature includes the sample altitude-ray feature, the acquiring module 1901 is configured to determine a target range centered on the virtual character; emitting an environment-aware ray from a target height to the target range, wherein the environment-aware ray is reflected along a normal direction of the environment-aware ray in case of collision of the environment-aware ray to a surface of an object; and generating the sample height ray characteristics according to the reflection condition of the environment sensing rays.
Optionally, the acquiring module 1901 is configured to determine at least two target ranges corresponding to different radii with the virtual character as a center.
Optionally, the training module 1902 is configured to input the sample character feature and the sample navigation feature into the navigation model, so as to obtain an estimated motion execution probability of the candidate motion output by the navigation model; determining an estimated target action from the candidate actions based on the estimated action execution probability; controlling the virtual character to move in the virtual environment based on the estimated target action; determining an action reward corresponding to the estimated target action based on the action execution result of the virtual character; updating model parameters of the navigation model based on the action rewards.
Optionally, the action rewards include a sparse reward, the sparse reward including at least one of an arrival reward and an elimination reward, where the arrival reward is a forward reward and the elimination reward is a reverse reward; the training module 1902 is configured to determine the arrival reward as the action reward corresponding to the estimated target action when the action execution result of the virtual character indicates that the virtual character arrives at the navigation destination; and to determine the elimination reward as the action reward corresponding to the estimated target action when the action execution result of the virtual character indicates that the virtual character is eliminated.
Optionally, the action rewards further include a dense reward, the dense reward including at least one of a proximity reward, a drop reward, a not-reached reward, and an attribute reward; the training module 1902 is configured to determine the proximity reward as the action reward corresponding to the estimated target action when the action execution result of the virtual character indicates that the distance between the virtual character and the navigation destination is reduced; determine the not-reached reward as the action reward corresponding to the estimated target action when the action execution result indicates that the virtual character does not reach a target range centered on the navigation destination; determine the attribute reward as the action reward corresponding to the estimated target action when the action execution result indicates that a target attribute value of the virtual character is reduced; and determine the drop reward as the action reward corresponding to the estimated target action when the action execution result indicates that the drop height of the virtual character reaches a drop height threshold.
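For illustration, a sparse-plus-dense reward scheme of this kind could be sketched as below; every field name on `result` and every numeric value is an assumption of the sketch, not a value given in the embodiment.

```python
def action_reward(result, cfg=None):
    """Toy sparse + dense reward; `result` is a hypothetical record of the
    action execution result of the virtual character."""
    cfg = cfg or {"arrival": 10.0, "elimination": -10.0, "proximity": 0.1,
                  "not_reached": -1.0, "attribute": -0.5, "drop": -1.0,
                  "drop_threshold": 5.0}
    reward = 0.0
    # Sparse rewards
    if result.reached_destination:
        reward += cfg["arrival"]        # forward reward
    if result.eliminated:
        reward += cfg["elimination"]    # reverse reward
    # Dense rewards
    if result.distance_delta < 0:
        reward += cfg["proximity"]      # moved closer to the destination
    if result.outside_target_range:
        reward += cfg["not_reached"]
    if result.attribute_delta < 0:
        reward += cfg["attribute"]      # e.g. health reduced
    if result.drop_height >= cfg["drop_threshold"]:
        reward += cfg["drop"]
    return reward
```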
Optionally, the navigation model includes a coding sub-network and a timing sub-network, the coding sub-network is used for coding the sample role feature and the sample navigation feature, and the timing sub-network is used for determining the estimated action execution probabilities corresponding to different candidate actions; the training module 1902 is configured to input the sample character feature and the sample navigation feature into the navigation model, encode the sample character feature and the sample navigation feature through the encoding sub-network, and splice a sample character feature encoding result obtained by encoding and a sample navigation feature encoding result to obtain a sample navigation state feature; and inputting the sample navigation state characteristics into the time sequence sub-network to obtain the estimated action execution probabilities corresponding to different candidate actions.
Optionally, the coding sub-network includes a character feature encoder and an environmental feature encoder, and the environmental feature encoder includes at least two multi-scale convolution layers, different multi-scale convolution layers being used for multi-scale coding the sample two-dimensional environmental features of different dimensions; the training module 1902 is configured to perform convolution processing on the two-dimensional environmental features of the sample through convolution kernels with different sizes in the multi-scale convolution layer, so as to obtain a multi-scale sample coding result; and fusing the sample multi-scale coding results to obtain the sample navigation feature coding results.
Optionally, the sample character features of the virtual character further include sample character attribute features and sample character skill features, so that the motion execution probability of the candidate motion output by the trained navigation model is related to the character attribute features and the character skill features.
In summary, in this embodiment of the present application, the navigation model is trained by reinforcement learning based on the sample character features and sample navigation features, so that the trained model can determine the action execution probabilities of candidate actions from character features and navigation features. By learning sample two-dimensional environment features, the navigation model learns to characterize a three-dimensional environment from environment features of at least two dimensions, which keeps feature processing on two-dimensional inputs and thereby reduces processing time and feature-processing latency. In addition, reinforcement training based on character features and navigation features helps improve the generalization of the trained navigation model, giving it a wider range of application scenarios.
It should be noted that: the apparatus provided in the above embodiment is only exemplified by the division of the above functional modules, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules, so as to perform all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and detailed implementation processes of the method embodiments are described in the method embodiments, which are not repeated herein.
Referring to fig. 20, a schematic structural diagram of a computer device according to an exemplary embodiment of the present application is shown; the computer device may be implemented as the terminal or the server in the foregoing embodiments. Specifically, the computer device 2000 includes a central processing unit (Central Processing Unit, CPU) 2001, a system memory 2004 including a random access memory 2002 and a read-only memory 2003, and a system bus 2005 connecting the system memory 2004 and the central processing unit 2001. The computer device 2000 also includes a basic input/output system (I/O system) 2006 that facilitates the transfer of information between devices within the computer, and a mass storage device 2007 that stores an operating system 2013, application programs 2014, and other program modules 2015.
In some embodiments, the basic input/output system 2006 includes a display 2008 for displaying information and an input device 2009 such as a mouse, keyboard, etc. for a user to input information. Wherein the display 2008 and the input device 2009 are connected to the central processing unit 2001 through an input-output controller 2010 connected to a system bus 2005. The basic input/output system 2006 may also include an input/output controller 2010 for receiving and processing input from a keyboard, mouse, or electronic stylus among a plurality of other devices. Similarly, the input-output controller 2010 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 2007 is connected to the central processing unit 2001 through a mass storage controller (not shown) connected to the system bus 2005. The mass storage device 2007 and its associated computer-readable media provide non-volatile storage for the computer device 2000. That is, the mass storage device 2007 may include a computer-readable medium (not shown) such as a hard disk drive.
The computer-readable medium may include computer storage media and communication media without loss of generality. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), flash memory or other solid-state memory technology, compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), digital versatile disc (Digital Versatile Disc, DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the above. The system memory 2004 and the mass storage device 2007 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by the one or more central processing units 2001; the one or more programs contain instructions for implementing the methods described above, and the central processing unit 2001 executes the one or more programs to implement the methods provided by the various method embodiments described above.
According to various embodiments of the application, the computer device 2000 may also run by connecting, over a network such as the Internet, to a remote computer on the network. That is, the computer device 2000 may be connected to the network 2012 via a network interface unit 2011 coupled to the system bus 2005; alternatively, the network interface unit 2011 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, stored in the memory, which include instructions for performing the steps, executed by the computer device, of the methods provided by the embodiments of the present application.
An embodiment of the application also provides a computer-readable storage medium, the readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for navigating a virtual character in a virtual scene described above.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the navigation method of the virtual character in the virtual scene provided in the above aspect.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware, and the program may be stored in a computer-readable storage medium. The computer-readable storage medium may be the one included in the memory of the above embodiments, or it may be a standalone computer-readable storage medium that is not assembled into the terminal. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for navigating a virtual character in a virtual scene according to any one of the method embodiments.
Alternatively, the computer-readable storage medium may include: ROM, RAM, solid state disk (Solid State Drives, SSD), or optical disk, etc. The RAM may include resistive random access memory (Resistance Random Access Memory, reRAM) and dynamic random access memory (Dynamic Random Access Memory, DRAM), among others. The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions.
Before and while collecting relevant user data, the application may display a prompt interface or popup window, or output a voice prompt, indicating that relevant data is currently being collected. The application begins the steps of acquiring the relevant user data only after obtaining the user's confirmation on the prompt interface or popup window; otherwise (that is, when no confirmation is obtained), those steps end and the relevant user data is not acquired.
It should be understood that references herein to "a plurality" are to two or more. References herein to "first," "second," etc. are used to distinguish similar objects and are not intended to limit a particular order or sequence. In addition, the step numbers described herein are merely exemplary of one possible execution sequence among steps, and in some other embodiments, the steps may be executed out of the order of numbers, such as two differently numbered steps being executed simultaneously, or two differently numbered steps being executed in an order opposite to that shown, which is not limiting.
The foregoing description of the preferred embodiments of the present application is not intended to limit the application, but is intended to cover all modifications, equivalents, alternatives, and improvements falling within the spirit and principles of the application.

Claims (20)

1. A method for navigating a virtual character in a virtual scene, the method comprising:
acquiring character characteristics of a virtual character and navigation characteristics of the position of the virtual character in a virtual environment, wherein the navigation characteristics comprise environment perception characteristics and navigation point characteristics, the environment perception characteristics comprise two-dimensional environment characteristics with at least two dimensions, the combination of the two-dimensional environment characteristics with at least two dimensions is used for representing a three-dimensional environment of the position of the virtual character, the navigation point characteristics are used for representing the position relationship between the position of the virtual character and a navigation destination, and the character characteristics are at least used for representing the moving state of the virtual character;
inputting the character features and the navigation features into a navigation model to obtain the action execution probability of candidate actions output by the navigation model;
and determining a target action from the candidate actions based on the action execution probability, and controlling the virtual character to move in the virtual environment based on the target action.
2. The method of claim 1, wherein the obtaining the navigation feature of the position of the virtual character in the virtual environment comprises:
determining a relative position relation between the position of the virtual character and the navigation destination based on the position of the virtual character and the position of the navigation destination in the virtual environment, and obtaining the navigation point characteristics;
based on the position of the virtual character, the environment perception feature is obtained through ray detection, the environment perception feature comprises at least two of annular ray features, depth ray features and height ray features, the annular ray features are used for representing the distribution condition of objects on the same horizontal height around the virtual character, the depth ray features are used for representing the depth condition of the objects in the direction of the virtual character, and the height ray features are used for representing the height condition of the objects around the position of the virtual character.
3. The method of claim 2, wherein, in the case where the context awareness feature comprises the ring ray feature, the obtaining the context awareness feature by ray detection based on the location of the virtual character comprises:
taking the virtual character as a starting point, emitting environment-aware rays to the surroundings, wherein the environment-aware rays are reflected along their normal directions upon colliding with an object surface, and different environment-aware rays are located at the same horizontal height;
and generating the annular ray characteristic according to the reflection condition of the environment-aware rays.
4. The method according to claim 3, wherein the emitting environment-aware rays to the surroundings with the virtual character as a starting point comprises:
and transmitting the environment-aware rays with at least two lengths to the periphery by taking at least two heights of the position of the virtual character as starting points.
5. The method of claim 2, wherein, in the case where the context-aware feature comprises the depth ray feature, the acquiring the context-aware feature by ray detection based on the location of the virtual character comprises:
transmitting an environment-aware ray to the direction of the virtual character by taking the virtual character as a starting point, wherein the environment-aware ray is reflected along the normal direction of the environment-aware ray under the condition that the environment-aware ray collides with the surface of an object;
and generating the depth ray characteristic according to the reflection condition of the environment-aware rays.
6. The method of claim 5, wherein the emitting the context-aware ray in the direction of the avatar starting from the avatar comprises:
and transmitting the environment-aware rays with at least two lengths to the direction of the virtual character by taking the virtual character as a starting point.
7. The method of claim 2, wherein, in the case where the context-aware feature includes the altitude-ray feature, the acquiring the context-aware feature by ray detection based on the location of the virtual character comprises:
determining a target range by taking the virtual character as a center;
emitting an environment-aware ray from a target height to the target range, wherein the environment-aware ray is reflected along a normal direction of the environment-aware ray in case of collision of the environment-aware ray to a surface of an object;
and generating the height ray characteristic according to the reflection condition of the environment-perceived ray.
8. The method of claim 7, wherein the determining the target range centered on the avatar comprises:
and determining at least two target ranges corresponding to different radii by taking the virtual character as a center.
9. The method of claim 1, wherein the navigation model includes an encoding sub-network for encoding the character features and the navigation features and a timing sub-network for determining the action execution probabilities corresponding to different of the candidate actions;
the step of inputting the character features and the navigation features into a navigation model to obtain the action execution probability of candidate actions output by the navigation model, comprising the following steps:
inputting the character features and the navigation features into the navigation model, respectively encoding the character features and the navigation features through the encoding sub-network, and splicing the character feature encoding results and the navigation feature encoding results obtained by encoding to obtain navigation state features;
and inputting the navigation state characteristics into the time sequence sub-network to obtain the action execution probabilities corresponding to different candidate actions.
10. The method of claim 9, wherein the encoding subnetwork comprises a character feature encoder and an environmental feature encoder, and the environmental feature encoder comprises at least two multi-scale convolutional layers, different multi-scale convolutional layers being used to multi-scale encode the two-dimensional environmental features of different dimensions;
and the encoding of the navigation features through the encoding sub-network comprises:
respectively carrying out convolution processing on the two-dimensional environment characteristics through convolution kernels with different sizes in the multi-scale convolution layer to obtain a multi-scale coding result;
and fusing the multi-scale coding results to obtain the navigation feature coding result.
11. The method of claim 1, wherein:
the character features of the virtual character further include character attribute features and character skill features;
the action execution probability of the candidate action output by the navigation model is related to the character attribute features and the character skill features.
12. A method for navigating a virtual character in a virtual scene, the method comprising:
acquiring sample character features of a virtual character and sample navigation features of positions of the virtual character in a virtual environment, wherein the sample navigation features comprise sample environment perception features and sample navigation point features, the sample environment perception features comprise sample two-dimensional environment features with at least two dimensions, the combination of the sample two-dimensional environment features with at least two dimensions is used for representing a sample three-dimensional environment of the positions of the virtual character, the sample navigation point features are used for representing a position relation between the positions of the virtual character and navigation destinations, and the sample character features are at least used for representing moving states of the virtual character;
training a navigation model based on the sample character features and the sample navigation features by reinforcement learning, wherein the navigation model is used for determining the action execution probability of a candidate action based on character features and navigation features.
13. The method of claim 12, wherein the obtaining sample navigational features of the position of the virtual character in the virtual environment comprises:
determining a relative position relation between the position of the virtual character and the navigation destination based on the position of the virtual character and the position of the navigation destination in the virtual environment, and obtaining the sample navigation point characteristics;
based on the position of the virtual character, acquiring the sample environment sensing characteristic through ray detection, wherein the sample environment sensing characteristic comprises at least two of a sample annular ray characteristic, a sample depth ray characteristic and a sample height ray characteristic, the sample annular ray characteristic is used for representing the distribution condition of objects on the same horizontal height around the virtual character, the sample depth ray characteristic is used for representing the depth condition of the objects in the direction of the virtual character, and the sample height ray characteristic is used for representing the height condition of the objects at the position of the virtual character.
14. The method of claim 12, wherein the training a navigation model by reinforcement learning based on the sample character features and the sample navigation features comprises:
inputting the sample character features and the sample navigation features into the navigation model to obtain estimated motion execution probability of the candidate motion output by the navigation model;
determining an estimated target action from the candidate actions based on the estimated action execution probability;
controlling the virtual character to move in the virtual environment based on the estimated target action;
determining an action reward corresponding to the estimated target action based on the action execution result of the virtual character;
updating model parameters of the navigation model based on the action rewards.
15. The method of claim 14, wherein the action rewards include a sparse reward, the sparse reward including at least one of an arrival reward and an elimination reward, the arrival reward being a forward reward and the elimination reward being a reverse reward;
the determining the action rewards corresponding to the estimated target actions based on the action execution results of the virtual roles comprises the following steps:
determining the arrival reward as the action reward corresponding to the estimated target action under the condition that the action execution result of the virtual character indicates that the virtual character arrives at the navigation destination;
and determining the elimination reward as the action reward corresponding to the estimated target action under the condition that the action execution result of the virtual character indicates that the virtual character is eliminated.
16. The method of claim 14, wherein the action rewards further comprise a dense reward, the dense reward including at least one of a proximity reward, a drop reward, a not-reached reward, and an attribute reward;
the determining the action rewards corresponding to the estimated target actions based on the action execution results of the virtual roles comprises the following steps:
determining the proximity reward as the action reward corresponding to the estimated target action in a case where the action execution result of the virtual character indicates that the distance between the virtual character and the navigation destination is reduced;
determining the not-reached reward as the action reward corresponding to the estimated target action when the action execution result of the virtual character indicates that the virtual character does not reach a target range centered on the navigation destination;
determining the attribute reward as the action reward corresponding to the estimated target action under the condition that the action execution result of the virtual character indicates that a target attribute value of the virtual character is reduced;
and determining the drop reward as the action reward corresponding to the estimated target action under the condition that the action execution result of the virtual character indicates that the drop height of the virtual character reaches a drop height threshold.
17. A navigation device for a virtual character in a virtual scene, the device comprising:
an acquisition module, configured to acquire character characteristics of a virtual character and navigation characteristics of the position of the virtual character in a virtual environment, wherein the navigation characteristics comprise environment perception characteristics and navigation point characteristics, the environment perception characteristics comprise two-dimensional environment characteristics of at least two dimensions, the combination of the two-dimensional environment characteristics of at least two dimensions is used for representing a three-dimensional environment of the position of the virtual character, the navigation point characteristics are used for representing the position relationship between the position of the virtual character and a navigation destination, and the character characteristics are at least used for representing the moving state of the virtual character;
an input module, configured to input the character features and the navigation features into a navigation model to obtain the action execution probability of candidate actions output by the navigation model;
and a determining module, configured to determine a target action from the candidate actions based on the action execution probability and to control the virtual character to move in the virtual environment based on the target action.
18. A navigation device for a virtual character in a virtual scene, the device comprising:
an acquisition module, configured to acquire sample character features of a virtual character and sample navigation features of the position of the virtual character in a virtual environment, wherein the sample navigation features comprise sample environment perception features and sample navigation point features, the sample environment perception features comprise sample two-dimensional environment features of at least two dimensions, the combination of the sample two-dimensional environment features of at least two dimensions is used for representing a sample three-dimensional environment of the position of the virtual character, the sample navigation point features are used for representing the position relationship between the position of the virtual character and a navigation destination, and the sample character features are at least used for representing the moving state of the virtual character;
and a training module, configured to train a navigation model by reinforcement learning based on the sample character features and the sample navigation features, the navigation model being used for determining the action execution probability of a candidate action based on character features and navigation features.
19. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program that is loaded and executed by the processor to implement the method of navigating a virtual character in a virtual scene as claimed in any one of claims 1 to 11 or the method of navigating a virtual character in a virtual scene as claimed in any one of claims 12 to 16.
20. A computer readable storage medium, wherein at least one program is stored in the readable storage medium, and the at least one program is loaded and executed by a processor to implement the method for navigating a virtual character in a virtual scene according to any one of claims 1 to 11 or the method for navigating a virtual character in a virtual scene according to any one of claims 12 to 16.
CN202311036784.9A 2023-08-17 2023-08-17 Navigation method, device, equipment and storage medium for virtual characters in virtual scene Active CN116772886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311036784.9A CN116772886B (en) 2023-08-17 2023-08-17 Navigation method, device, equipment and storage medium for virtual characters in virtual scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311036784.9A CN116772886B (en) 2023-08-17 2023-08-17 Navigation method, device, equipment and storage medium for virtual characters in virtual scene

Publications (2)

Publication Number Publication Date
CN116772886A true CN116772886A (en) 2023-09-19
CN116772886B CN116772886B (en) 2023-10-20

Family

ID=87986160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311036784.9A Active CN116772886B (en) 2023-08-17 2023-08-17 Navigation method, device, equipment and storage medium for virtual characters in virtual scene

Country Status (1)

Country Link
CN (1) CN116772886B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030058238A1 (en) * 2001-05-09 2003-03-27 Doak David George Methods and apparatus for constructing virtual environments
US20170189797A1 (en) * 2014-06-06 2017-07-06 Lego A/S Interactive game apparatus and toy construction system
CN108888958A (en) * 2018-06-22 2018-11-27 深圳市腾讯网络信息技术有限公司 Virtual object control method, device, equipment and storage medium in virtual scene
US20200098175A1 (en) * 2018-09-25 2020-03-26 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for Controlling Virtual Objects, Storage Medium and Electronic Device
CN111185008A (en) * 2020-01-20 2020-05-22 腾讯科技(深圳)有限公司 Method and apparatus for controlling virtual character in game
CN112057858A (en) * 2020-09-11 2020-12-11 腾讯科技(深圳)有限公司 Virtual object control method, device, equipment and storage medium
CN112870707A (en) * 2021-03-19 2021-06-01 腾讯科技(深圳)有限公司 Virtual object display method in virtual scene, computer device and storage medium
CN112933604A (en) * 2021-02-04 2021-06-11 超参数科技(深圳)有限公司 Reinforcement learning model processing method, device, computer equipment and storage medium
CN115393487A (en) * 2022-10-27 2022-11-25 科大讯飞股份有限公司 Virtual character model processing method and device, electronic equipment and storage medium
US20230050933A1 (en) * 2020-10-22 2023-02-16 Tencent Technology (Shenzhen) Company Limited Two-dimensional figure display method and apparatus for virtual object, device, and storage medium
WO2023051671A1 (en) * 2021-09-30 2023-04-06 上海莉莉丝互娱网络科技有限公司 Data processing method based on voxel data, and server, medium and computer program product

Also Published As

Publication number Publication date
CN116772886B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
Ladosz et al. Exploration in deep reinforcement learning: A survey
Sadeghian et al. Sophie: An attentive gan for predicting paths compliant to social and physical constraints
US10974389B2 (en) Methods and apparatus for early sensory integration and robust acquisition of real world knowledge
Jain et al. Two body problem: Collaborative visual task completion
Gupta et al. Cognitive mapping and planning for visual navigation
US8953844B2 (en) System for fast, probabilistic skeletal tracking
CN105144196A (en) Method and device for calculating a camera or object pose
US20230150550A1 (en) Pedestrian behavior prediction with 3d human keypoints
Zhu et al. Sim-real joint reinforcement transfer for 3d indoor navigation
Wang et al. Learning-based 3d occupancy prediction for autonomous navigation in occluded environments
Huang et al. Deductive reinforcement learning for visual autonomous urban driving navigation
Weitkamp et al. Visual rationalizations in deep reinforcement learning for atari games
CN112057858B (en) Virtual object control method, device, equipment and storage medium
CN113515131B (en) Mobile robot obstacle avoidance method and system based on condition variation automatic encoder
CN116772886B (en) Navigation method, device, equipment and storage medium for virtual characters in virtual scene
CN114367110B (en) Data processing method, device, electronic equipment and storage medium
Ikemoto et al. Learning to move autonomously in a hostile world
Xu et al. Context-aware timewise vaes for real-time vehicle trajectory prediction
KR102331803B1 (en) Vision and language navigation system
KR102430442B1 (en) Agent learing reward system with region based alignment
Ko Visual object tracking for UAVs using deep reinforcement learning
CN112933600B (en) Virtual object control method, device, computer equipment and storage medium
CN117224959A (en) Virtual character action decision method, device, equipment and storage medium
Johnson et al. Cognitive model of agent exploration with vision and signage understanding
CN117224958A (en) Virtual character action decision method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (country code: HK; legal event code: DE; document number: 40093252)