CN112057858B - Virtual object control method, device, equipment and storage medium


Info

Publication number
CN112057858B
CN112057858B
Authority
CN
China
Prior art keywords
depth
target
virtual object
interface image
map
Prior art date
Legal status
Active
Application number
CN202010951131.3A
Other languages
Chinese (zh)
Other versions
CN112057858A (en)
Inventor
黄超
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010951131.3A priority Critical patent/CN112057858B/en
Publication of CN112057858A publication Critical patent/CN112057858A/en
Application granted granted Critical
Publication of CN112057858B publication Critical patent/CN112057858B/en

Classifications

    • A63F 13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/55: Controlling game characters or game objects based on the game progress
    • A63F 13/50: Controlling the output signals based on the game progress
    • A63F 13/52: Controlling the output signals based on the game progress involving aspects of the displayed game scene
    • G06T 7/00: Image analysis
    • G06T 7/50: Depth or shape recovery
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10024: Color image
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present application provide a virtual object control method, apparatus, device and storage medium, relating to the fields of computer technology and artificial intelligence. The method includes: acquiring an interface image of a target virtual object at a current view angle, the interface image displaying the virtual environment at the current view angle, the virtual environment including the target virtual object; generating a depth map of the interface image, the depth map characterizing the depth distances, in the interface image, of elements of the virtual environment relative to the target virtual object; determining a moving direction of the target virtual object according to the depth map; and controlling the target virtual object to move in the moving direction. This technical solution reduces the time and labor cost required to achieve automatic movement of a virtual object.

Description

Virtual object control method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to the fields of computer technology and artificial intelligence, and in particular to a virtual object control method, apparatus, device and storage medium.
Background
In the development of game applications, it is sometimes necessary for a target virtual object (e.g., a game character) in a game to automatically advance.
In the related art, in order to avoid virtual obstacles during the automatic movement of a virtual object, technicians need to manually plan as many specific paths as possible in advance. When the target virtual object needs to move from its current position to a target position, one of the planned paths is selected and the target virtual object is controlled to move along it.
In the above-described related art, a large number of paths need to be manually planned in advance, resulting in high time and labor costs.
Disclosure of Invention
The embodiments of the present application provide a virtual object control method, apparatus, device and storage medium, which can reduce the time and labor cost required to achieve automatic movement of a virtual object. The technical solutions are as follows:
according to an aspect of an embodiment of the present application, there is provided a method for controlling a virtual object, the method including:
acquiring an interface image of a target virtual object under a current visual angle, wherein the interface image is used for displaying a virtual environment of the current visual angle, and the virtual environment comprises the target virtual object;
generating a depth map of the interface image, the depth map being used to characterize depth distances of elements of the virtual environment relative to the target virtual object in the interface image;
determining the moving direction of the target virtual object according to the depth map;
and controlling the target virtual object to move according to the moving direction.
According to an aspect of an embodiment of the present application, there is provided an apparatus for controlling a virtual object, the apparatus including:
an image acquisition module, configured to acquire an interface image of a target virtual object at a current view angle, wherein the interface image is used to display a virtual environment at the current view angle, and the virtual environment includes the target virtual object;
a depth map generation module for generating a depth map of the interface image, the depth map being used to characterize depth distances of elements of the virtual environment relative to the target virtual object in the interface image;
a direction determination module for determining a moving direction of the target virtual object according to the depth map;
and the movement control module is used for controlling the target virtual object to move according to the movement direction.
According to an aspect of the embodiments of the present application, there is provided a computer device, including a processor and a memory, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the control method of the above virtual object.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the control method of the above virtual object.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the control method of the virtual object.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the method and the device, the interface image of the target virtual object under the current visual angle is obtained, the corresponding depth map is generated according to the interface image, the depth map can represent the depth distance of each virtual obstacle from the target virtual object in the virtual environment of the current visual angle of the target virtual object, and further the target virtual object can be controlled to avoid the virtual obstacle according to the depth map.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by one embodiment of the present application;
FIG. 2 is a flowchart of a method for controlling a virtual object according to an embodiment of the present application;
FIG. 3 is a schematic illustration of a depth map provided by one embodiment of the present application;
fig. 4 is a flowchart of a control method for a virtual object according to another embodiment of the present application;
FIG. 5 is a schematic illustration of a depth map provided by another embodiment of the present application;
FIG. 6 is a flow chart of a method for training a depth estimation model provided in one embodiment of the present application;
FIG. 7 is a schematic illustration of different gaming applications provided in one embodiment of the present application;
FIG. 8 is a schematic structural diagram of a depth estimation model provided in an embodiment of the present application;
fig. 9 is a flowchart of a control method for a virtual object according to another embodiment of the present application;
FIG. 10 is a schematic diagram of a global map provided by one embodiment of the present application;
FIG. 11 is a schematic view of a directional hint area provided by one embodiment of the present application;
FIG. 12 is a schematic view of an angle corresponding to a target viewing direction according to an embodiment of the present application;
FIG. 13 is a block diagram of a control apparatus for a virtual object according to an embodiment of the present application;
fig. 14 is a block diagram of a control apparatus for a virtual object according to another embodiment of the present application;
fig. 15 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of methods consistent with aspects of the present application, as detailed in the appended claims.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. Artificial intelligence is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Computer Vision (CV) technology is a science that studies how to make machines "see"; it refers to using cameras and computers instead of human eyes to identify, track and measure targets, and to further process images so that the result is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
The solutions provided in the embodiments of the present application involve artificial intelligence machine learning technology and computer vision technology; for example, a depth estimation model is trained using machine learning technology, and numbers or pixel colors in an image are identified using computer vision technology.
In addition, the technical solutions provided in the embodiments of the present application can also be applied to real-life technical fields such as intelligent obstacle avoidance, path planning and automatic driving. For example, following the approach of determining the moving direction of a target virtual object from an interface image, the next moving direction of a target object can be determined from environment pictures captured in real time, where the target object may be a real vehicle in automatic driving (such as a self-driving car or self-driving ship), an intelligent robot (such as an express delivery robot, a meal delivery robot or a sweeping robot), an industrial robot, or the like. The target object may also be another real object, which is not limited in the present application.
Referring to fig. 1, a schematic diagram of an implementation environment provided by an embodiment of the present application is shown. The implementation environment can be implemented as a control system of a virtual environment, which may comprise the terminal 11.
The terminal 11 is an electronic device with data calculation, processing and storage capabilities, and a target application, such as a client of the target application, is installed and run in the terminal 11. The terminal 11 may be a smart phone, a tablet Computer, a PC (Personal Computer), a wearable device, etc., which is not limited in the embodiments of the present application. Alternatively, the terminal 11 is a mobile terminal device provided with a touch display screen through which a user can realize human-computer interaction.
The target application may be a game application, such as a shooting game application, a multiplayer gun battle survival game application, a battle royale survival game application, an LBS (Location Based Service) game application, an MOBA (Multiplayer Online Battle Arena) game application, and the like, which is not limited in this embodiment. The target application may also be any application capable of displaying a virtual environment and a target virtual object, such as a social application, a payment application, a video application, a music application, a shopping application, a news application, and the like. In the method provided by the embodiments of the present application, the execution subject of each step may be the terminal, such as the target application running in the terminal.
A virtual environment is a scene displayed (or provided) by a client of the target application (e.g., a game application) when running on a terminal, and refers to a scene created for a target virtual object to carry out activities (e.g., game competition), such as a virtual house, a virtual island, a virtual map, and the like. The virtual environment may be a simulation of the real world, a semi-simulated semi-fictional environment, or a purely fictional environment. The virtual environment may be a two-dimensional virtual environment, a 2.5-dimensional virtual environment, or a three-dimensional virtual environment, which is not limited in the embodiments of the present application. The target virtual object refers to a virtual character controlled by the user account in the target application. Taking a game application as the target application as an example, the target virtual object is a game character controlled by the user account in the game application. The target virtual object may take the form of a person, an animal, a cartoon character, or another form, which is not limited in the present application. The target virtual object may be displayed in three-dimensional or two-dimensional form, which is not limited in the embodiments of the present application. Optionally, when the virtual environment is a three-dimensional virtual environment, the target virtual object may be a three-dimensional model created based on skeletal animation technology. The target virtual object has its own shape and volume in the three-dimensional virtual environment and occupies part of the space in the three-dimensional virtual environment. Optionally, the target application may have the function of simulating a real physical environment: in the virtual environment, the motion of each virtual element (such as the target virtual object) conforms to or is close to the physical laws of reality.
In some embodiments, the control system further includes a server 12. The server 12 establishes a communication connection (e.g., a network connection) with the terminal 11 and is configured to provide background services for the target application. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services.
It should be noted that, in the method steps provided in the embodiment of the present application, the execution subject may be the terminal 11, may be the server 12, or may be executed by the terminal 11 and the server 12 in an interactive cooperation manner, which is not limited in the embodiment of the present application.
The technical solution of the present application will be described below by means of several embodiments.
Referring to fig. 2, a flowchart of a control method for a virtual object according to an embodiment of the present application is shown. In this embodiment, the method is described as being executed by the client. The method includes the following steps (201-204):
step 201, acquiring an interface image of the target virtual object at the current view angle.
In some embodiments, the interface image is used to present the virtual environment from the current view angle, and the virtual environment includes the target virtual object. A view angle may also be referred to as a Field of View (FOV) and refers to the angular range of the picture that a target object (e.g., a camera or a person) can receive. In the embodiments of the present application, the current view angle refers to the angular range of the picture that the target virtual object can observe in its current orientation.
In some embodiments, a virtual environment of a current view angle of a target virtual object is displayed in a user interface of a client, and an image currently displayed in the user interface can be obtained through a screen shot (i.e., a screenshot), that is, an interface image of the current view angle of the target virtual object is obtained.
Step 202, generating a depth map of the interface image.
Wherein the depth map is used to characterize depth distances of elements in the virtual environment relative to the target virtual object in the interface image. The elements in the virtual environment constitute the virtual environment, and the elements in the virtual environment include: virtual buildings (e.g., virtual houses, virtual enclosures, virtual towers, virtual statues), virtual vehicles (e.g., virtual cars, virtual motorcycles, virtual ships, virtual yachts, virtual airplanes), virtual natural environment elements (e.g., virtual rocks, virtual trees, virtual hills), virtual objects, and so forth. In some embodiments, elements in the virtual environment that can impede the movement of the target virtual object are referred to as virtual obstacles, such as virtual houses, virtual fences, virtual towers, virtual statues, virtual cars, virtual motorcycles, virtual rocks, virtual trees, and so forth. In other embodiments, the virtual obstacle includes other elements of the virtual environment in addition to the target virtual object.
In some embodiments, the interface image is analyzed, for example by analyzing the relative size of a virtual obstacle included in the interface image, the color difference between pixels corresponding to the virtual obstacle, the overall color of the virtual obstacle, and so on, to determine, for each pixel in the interface image, which virtual obstacle it represents and its depth distance relative to the target virtual object (i.e., its distance from the target virtual object). The depth distance corresponding to each pixel is then marked, thereby obtaining the depth map of the interface image. The interface image is composed of a plurality of pixels, and one pixel, together with other pixels, can jointly represent one element in the virtual environment. The distance between the position in the virtual obstacle represented by a pixel and the target virtual object is the depth distance corresponding to that pixel.
Optionally, the depth map is equal in size to the interface image, or the depth map has an aspect ratio that is the same as the aspect ratio of the interface image. In one example, as shown in FIG. 3, the interface image 32 is used to indicate the virtual environment at the current perspective of the target virtual object 33, and the depth map 31 is a depth map of the interface image 32. Wherein the depth map 31 represents depth distances of elements in the virtual environment with respect to the target virtual object 33 by different colors. Alternatively, in the depth map 31, pixels of the same color indicate the same depth distance, and pixels of similar colors indicate similar depth distances.
Step 203, determining the moving direction of the target virtual object according to the depth map.
In some embodiments, once the depth map corresponding to the interface image (i.e., the interface image of the target virtual object at the current view angle) is obtained, the depth distance of the virtual obstacles contained in each area relative to the target virtual object is known. The direction corresponding to an area with a larger depth distance is determined as the next moving direction of the target virtual object.
Step 204, the control target virtual object moves according to the moving direction.
After the moving direction of the target virtual object is determined, the target virtual object is controlled to move in that direction so as to avoid virtual obstacles close to it. This reduces the probability that the target virtual object runs in place or remains stuck because it collides with a virtual obstacle, and improves the obstacle-avoidance accuracy when controlling the virtual object to move automatically.
In some embodiments, the target virtual object moves at a uniform speed in the direction of movement. In other embodiments, if the depth distance of the virtual obstacle in the moving direction relative to the target virtual object is greater than or equal to the first threshold, which indicates that the virtual obstacle in the moving direction is farther away from the target virtual object, the target virtual object is controlled to move at an accelerated speed according to the moving direction. In still other embodiments, if the distance between the virtual obstacle and the target virtual object in the moving direction is less than or equal to a second threshold value, which indicates that the virtual obstacle and the target virtual object are closer in the moving direction, the target virtual object is controlled to move at a reduced speed.
Optionally, the first threshold is greater than or equal to the second threshold. In some embodiments, if the distance between the virtual obstacle in the moving direction and the target virtual object is greater than the second threshold and smaller than the first threshold, the target virtual object is controlled to move at a constant speed in the moving direction at the current speed. The specific values of the first threshold and the second threshold may be set by a relevant technician according to an actual situation, which is not limited in the embodiment of the present application.
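As an illustration of the speed control described above, the following Python sketch picks a movement speed from the depth distance of the virtual obstacle in the moving direction; the threshold values, speed levels and function name are assumptions for illustration only, not values from this application.

```python
def choose_speed(depth_ahead, current_speed,
                 first_threshold=60.0, second_threshold=30.0,
                 slow=0.5, fast=2.0):
    """Pick a movement speed from the depth distance of the nearest
    virtual obstacle in the moving direction (illustrative values)."""
    if depth_ahead >= first_threshold:       # obstacle is far: accelerate
        return min(current_speed * 1.5, fast)
    if depth_ahead <= second_threshold:      # obstacle is close: decelerate
        return max(current_speed * 0.5, slow)
    return current_speed                     # otherwise keep the current speed
```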
In some embodiments, steps 201-204 are performed once every first interval duration; that is, the moving direction of the target virtual object is re-determined from the depth map once per first interval duration, so that the moving direction can be adjusted in time and the probability that the target virtual object successfully avoids obstacles is increased. The first interval duration may be, for example, 0.1 seconds, 0.2 seconds, 0.25 seconds, 0.3 seconds, 0.5 seconds, 0.8 seconds, 1 second, 1.5 seconds, and so on. Optionally, the specific value of the first interval duration is set by the relevant technician according to the actual situation, which is not limited in the embodiments of the present application.
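The per-interval loop of steps 201-204 could then be organized as follows; the four callables are hypothetical hooks into the client (screenshot capture, depth estimation, direction selection, movement command) and are not APIs defined by this application.

```python
import time

def control_loop(capture_image, estimate_depth_map, choose_move_direction,
                 send_move_command, first_interval=0.25):
    """Repeat steps 201-204 once every first interval duration (sketch).
    All four callables are hypothetical hooks supplied by the client."""
    while True:
        image = capture_image()                        # step 201: interface image at current view angle
        depth_map = estimate_depth_map(image)          # step 202: depth map of the interface image
        direction = choose_move_direction(depth_map)   # step 203: pick the moving direction
        send_move_command(direction)                   # step 204: move the target virtual object
        time.sleep(first_interval)                     # e.g. 0.25 s between decisions
```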
In some embodiments, if the execution subject of the steps in the embodiments of the present application is a server, this step may be implemented as: sending control data to the client, the control data including the moving direction.
To sum up, in the technical solution provided by this embodiment of the present application, the interface image of the target virtual object at the current view angle is acquired and a corresponding depth map is generated from it. The depth map can represent, for the virtual environment at the current view angle of the target virtual object, the depth distance between each virtual obstacle and the target virtual object, so the target virtual object can be controlled according to the depth map to avoid the virtual obstacles.
Referring to fig. 4, a flowchart of a control method for a virtual object according to another embodiment of the present application is shown. In this embodiment, the method is described as being executed by the client. The method includes the following steps (401-407):
step 401, acquiring an interface image of a current view angle of a target virtual object.
The content of step 401 is the same as or similar to step 201 in the embodiment of fig. 2, and is not described here again.
And 402, processing the interface image through a depth estimation model to obtain a depth map.
The depth estimation model is a machine learning model used to estimate depth distances, such as a CDD (Curvature-Driven Diffusion) model, a CNN (Convolutional Neural Network), a naive Bayes model, a decision tree model, a KNN (K-Nearest Neighbors) algorithm model, an SVM (Support Vector Machine), and the like. In some embodiments, the depth estimation model is trained in advance, so that a more accurate depth map corresponding to the interface image can be obtained based on the trained depth estimation model.
In some embodiments, this step 402 further includes the sub-steps of:
1. inputting the interface image into a depth estimation model, and outputting a predicted depth map by the depth estimation model, wherein the predicted depth map comprises depth values respectively corresponding to elements in a virtual environment represented by pixels in the interface image, and the depth values are normalized depth values;
2. and performing inverse normalization processing on the predicted depth map to obtain the depth map.
By inputting the interface image into the trained depth estimation model, the depth estimation model can obtain a predicted depth map. In some embodiments, since the real depth map used for comparison in the training of the depth estimation model is a normalized depth map, and the training process of the depth estimation model is a process of making the predicted depth map output by the depth estimation model conform to the real depth map as much as possible, the predicted depth map output by the depth estimation model represents normalized depth values corresponding to elements in the virtual environment represented by pixels in the interface image. Therefore, the depth map corresponding to the interface image can be obtained only by performing inverse normalization processing on the predicted depth map output by the depth estimation model. Normalization refers to the process of mapping a set of data into a defined range of values. In the embodiment of the present application, normalization refers to a process of mapping a depth value to a specific value range of the depth value. Denormalization is the inverse of normalization. For a specific implementation manner of normalization and de-normalization in the embodiment of the present application, reference may be made to step 601 described below, which is not described herein again.
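A minimal sketch of these two sub-steps, assuming the depth estimation model is a PyTorch module that takes a normalized RGB tensor and that H = 100 is the normalization constant described with formula two below:

```python
import numpy as np
import torch

def predict_depth_map(model, interface_image, h_const=100.0):
    """Sub-step 1: run the depth estimation model on the interface image to get
    a predicted (normalized) depth map; sub-step 2: de-normalize it.
    Assumes a PyTorch model taking a (height, width, 3) uint8 RGB image."""
    x = torch.from_numpy(interface_image).float().permute(2, 0, 1) / 255.0
    with torch.no_grad():
        predicted = model(x.unsqueeze(0))            # normalized depth values
    normalized = predicted.squeeze().cpu().numpy()
    normalized = np.clip(normalized, 1.0, None)      # guard against division by near zero
    return h_const / normalized                      # inverse normalization: x = H / x'
```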
In step 403, the depth map is divided into a plurality of regions.
In some embodiments, the depth map is divided into a plurality of regions, and different regions correspond to different moving directions of the target virtual object. The plurality of regions includes at least two regions; the depth map may be divided into 2, 3, 4, 5, 6 or more regions, which is not limited in this embodiment. In one example, as shown in fig. 5, the depth map 50 is divided from left to right into 5 equal-sized regions: a left region 51, a left-front region 52, a directly-front region 53, a right-front region 54, and a right region 55. The moving direction corresponding to a region is the direction in which the target virtual object points to the center of that region; that is, the moving direction corresponding to the left region 51 is the direction in which the target virtual object points to the center of the left region 51, and likewise for the left-front region 52, the directly-front region 53, the right-front region 54, and the right region 55.
And step 404, respectively calculating the average value of the depth values in each area to obtain the depth average values respectively corresponding to the areas.
After the depth map is divided to obtain a plurality of areas of the depth map, the depth values respectively corresponding to the pixels in each area are obtained, the average value of the depth values respectively corresponding to the pixels in each area is respectively calculated, and the depth average value respectively corresponding to the areas is obtained.
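Steps 403 and 404 might be implemented as below; the five-way left-to-right split follows the example of fig. 5, and the use of NumPy is an assumption.

```python
import numpy as np

def region_depth_means(depth_map, num_regions=5):
    """Split the depth map into vertical strips from left to right
    (e.g. left, left-front, front, right-front, right) and return the
    mean depth value of each strip."""
    strips = np.array_split(depth_map, num_regions, axis=1)  # split along the width
    return [float(strip.mean()) for strip in strips]
```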
Step 405, selecting one region from the plurality of regions as a target region according to the depth mean values corresponding to the plurality of regions respectively.
After the depth mean values corresponding to the regions are obtained, one region can be selected from the multiple regions according to the depth mean values to serve as a target region, and the direction of the target region is the next moving direction of the target virtual object.
In some embodiments, if the depth mean of the directly-front region among the plurality of regions is greater than a depth threshold, the directly-front region is selected as the target region, and the moving direction corresponding to the directly-front region is straight ahead. In the embodiments of the present application, the directly-front region is preferentially taken as the target region: when its depth mean is greater than the depth threshold, the virtual obstacles in that region are far from the target virtual object, so the directly-front region is used as the target region. That is, even if the depth mean of another region is greater than that of the directly-front region, the directly-front region is still used as the target region as long as its depth mean exceeds the depth threshold, which keeps the moving direction of the target virtual object unchanged as much as possible while still ensuring that it avoids virtual obstacles.
In other embodiments, if the depth mean of the directly-front region is less than the depth threshold, the region with the largest depth mean is selected from the plurality of regions as the target region. When the depth mean of the directly-front region is less than the depth threshold, the virtual obstacles in the directly-front region are close to the target virtual object, so the region with the largest depth mean is selected as the target region.
In still other embodiments, if the depth mean of the directly-front region is less than the depth threshold, the region closest to the directly-front region whose depth mean is greater than the depth threshold is taken as the target region. In this way, the moving direction of the target virtual object is adjusted as little as possible while still ensuring that the target virtual object can avoid the virtual obstacle.
The depth threshold may be 33, 35, 38, 40, 50, 55, and the like, and specific values of the depth threshold are set by a related technician according to actual situations, which is not limited in the embodiment of the present application.
Step 406, determining the moving direction corresponding to the target area as the moving direction of the target virtual object.
After the target region is determined, the moving direction corresponding to the target region is determined as the moving direction of the target virtual object. For example, if the target region is the directly-front region, straight ahead is determined as the moving direction of the target virtual object; if the target region is the left-front region, the left-front direction is determined as the moving direction; if the target region is the right-front region, the right-front direction is determined as the moving direction; if the target region is the left region, the left is determined as the moving direction; and if the target region is the right region, the right is determined as the moving direction of the target virtual object.
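A sketch combining the selection rules of steps 405 and 406; the region order follows fig. 5, and the depth threshold of 35 is only one of the example values mentioned above.

```python
DIRECTIONS = ["left", "left-front", "front", "right-front", "right"]

def choose_move_direction(depth_means, depth_threshold=35.0):
    """Steps 405-406: prefer the directly-front region when its depth mean clears
    the threshold; otherwise take the above-threshold region closest to the front,
    or, failing that, the region with the largest depth mean."""
    front = len(depth_means) // 2                        # index of the directly-front region
    if depth_means[front] > depth_threshold:
        target = front                                   # keep moving straight ahead
    else:
        clear = [i for i, m in enumerate(depth_means) if m > depth_threshold]
        if clear:
            target = min(clear, key=lambda i: abs(i - front))  # smallest direction change
        else:
            target = max(range(len(depth_means)), key=lambda i: depth_means[i])
    return DIRECTIONS[target]
```

For example, choose_move_direction(region_depth_means(depth_map)) would yield the moving direction used in step 407.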
Step 407, the control target virtual object moves according to the moving direction.
The content of step 407 is the same as or similar to step 204 in the embodiment of fig. 2, and is not described again here.
In summary, in the technical solution provided by this embodiment of the present application, the depth map is divided into a plurality of regions, and the target virtual object is moved, as far as possible, in the moving direction corresponding to the directly-front region or the region closest to it. On the premise of avoiding virtual obstacles, the moving direction of the target virtual object is thus adjusted as little as possible, which reduces the probability that the target virtual object deviates from the direction of the target position and improves the accuracy of its automatic movement.
In some embodiments, the training process of the depth estimation model includes the following sub-steps (601-604):
step 601, at least one training sample is obtained.
The sample data of the training sample comprises an interface image sample, and the label data of the training sample comprises a real depth map of the interface image sample.
In some embodiments, the interface image samples used to train the depth estimation model and the interface images used to input the depth estimation model when the depth estimation model is used are from different modes of the same application or the same mode of the same application. For example, a gaming application includes a first gaming mode and a second gaming mode. The environment types of the virtual environment in the first game mode and the second game mode are different, for example, the virtual environment in the first game mode is a sea island, and the virtual environment in the second game mode is a desert; for another example, the virtual environment in the first game mode is sunny, and the virtual environment in the second game mode is rainy. In one example, the interface image samples used to train the depth estimation model are from a first game mode of the game application, and the interface images used to input the depth estimation model when using the depth estimation model are from a second game mode of the game application; or, the interface image sample used for training the depth estimation model comes from the second game mode of the game application program, and the interface image input into the depth estimation model when the depth estimation model is used comes from the first game mode of the game application program. In another example, the interface image samples used to train the depth estimation model and the interface images used to input the depth estimation model when using the depth estimation model are from the first game mode of the game application; or, the interface image sample used for training the depth estimation model and the interface image input into the depth estimation model when the depth estimation model is used are both from the second game mode of the game application.
In other embodiments, the interface image samples used to train the depth estimation model and the interface images fed to the depth estimation model when it is used come from different applications. Optionally, the different applications are applications of the same type. For example, as shown in fig. 7, the two game applications are a first game application 71 and a second game application 72, and the first game application 71 and the second game application 72 are game applications of the same type, for example both battle royale survival game applications. The interface image samples used to train the depth estimation model come from the first game application, and the interface images fed to the depth estimation model when it is used come from the second game application.
In some embodiments, this step 601 (i.e., obtaining at least one training sample) further includes the following sub-steps:
1. acquiring an interface image sample and an original depth map of the interface image sample;
2. setting the depth value of a target pixel in the original depth map as a threshold value to obtain a processed original depth map; wherein the target pixels comprise pixels with original depth values larger than a threshold value;
3. and normalizing the processed original depth map to obtain a real depth map of the interface image sample.
Optionally, the original depth map of the interface image sample is obtained through a data interface of the application from which the interface image sample comes. The original depth value of a target pixel refers to the depth value of that pixel in the original depth map. In the original depth map, the original depth value of a pixel is positively correlated with the corresponding depth distance. When the original depth value of a target pixel in the original depth map is larger than the threshold, that depth value is set to the threshold to obtain the processed original depth map; the processed original depth map is then normalized to obtain the real depth map of the interface image sample. This reduces the dynamic range of the depth values in the depth map of the interface image sample and improves the prediction accuracy for depth values at closer distances.
It should be noted that, the specific value of the threshold is set by a relevant technician according to an actual situation, and this is not limited in the embodiment of the present application.
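Sub-steps 2 and 3 above (clipping distant depth values to the threshold, then normalizing) might look like the following sketch, using the example values H = 100 and a minimum depth of 5 given with formula one below:

```python
import numpy as np

def make_true_depth_map(original_depth_map, threshold=100.0, min_depth=5.0):
    """Clip original depth values above the threshold, then normalize them
    so that closer pixels get larger values (y' = threshold / y)."""
    clipped = np.minimum(original_depth_map, threshold)   # values above the threshold become the threshold
    clipped = np.maximum(clipped, min_depth)              # floor very small depths at 5
    return threshold / clipped                            # normalized range: [1, 20]
```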
In some embodiments, the normalization process refers to the following formula one:

Formula one: y' = H / y
wherein y' represents the depth value in the normalized real depth map, y represents the depth value in the processed original depth map, and H is a constant. Optionally, H is the above threshold. Optionally, H is set to 100 and the minimum value of y is set to 5 (that is, pixels whose depth value in the processed original depth map is smaller than 5 are set to 5), so the value range of y' is [1, 20], and the larger the value of y', the smaller the depth distance corresponding to the pixel. The depth estimation model is therefore more sensitive to pixels with larger normalized depth values (i.e., smaller depth distances), which means it predicts the depth values of nearby pixels more accurately and increases the probability that the target virtual object successfully avoids obstacles while moving.
In some embodiments, the above-mentioned process of inverse normalization in step 402 can refer to the following formula two:
Formula two: x = H / x'
wherein, x represents the depth value in the depth map obtained by the inverse normalization processing, x' represents the normalized depth value in the predicted depth map obtained by the depth estimation model, and H is a constant. Optionally, H is the above threshold.
Step 602, processing the interface image sample through the depth estimation model to obtain a predicted depth map of the interface image sample.
The depth estimation model includes at least one convolutional layer and at least one up-sampling layer. The convolutional layer is used to perform feature extraction on the interface image sample to obtain a feature map of the interface image sample, and the up-sampling layer is used to up-sample the feature map of the interface image sample to obtain the predicted depth map. The interface image sample is input into the depth estimation model, feature extraction is first performed by the at least one convolutional layer, up-sampling is then performed by the at least one up-sampling layer, and the predicted depth map of the interface image sample is finally output. Optionally, before the interface image sample is input into the depth estimation model, it is scaled to a set pixel size, for example 640 × 480 pixels. Optionally, among the at least one convolutional layer and the at least one up-sampling layer, every layer except the last is followed by an activation layer, which introduces nonlinearity into the depth estimation model, reduces the interdependence of parameters in the depth estimation model, avoids over-fitting, and enhances the representation capability of the depth estimation model.
In some embodiments, as illustrated in fig. 8, the depth estimation model includes an encoding part consisting of 5 convolutional layers 81 and a decoding part consisting of 5 up-sampling layers 82. The interface image sample 83, scaled to 640 × 480 pixels, is input into the depth estimation model to obtain the predicted depth map 84 of the interface image sample 83 output by the model. The convolutional layers 81 perform convolutional feature extraction on the interface image sample to obtain a feature map of the interface image sample 83, and the up-sampling layers 82 up-sample the feature map of the interface image sample 83 to restore it to a size of 640 × 480 pixels. With 5 convolutional layers 81 and 5 up-sampling layers 82, the depth estimation model avoids an overly complex structure while still ensuring good prediction accuracy. Fig. 8 is only an illustration of the structure of the depth estimation model; those skilled in the relevant art may also design the structure of the depth estimation model flexibly according to the actual situation, which is not limited in the embodiments of the present application.
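A hedged PyTorch sketch of an encoder-decoder matching the description of fig. 8 (5 convolutional layers followed by 5 up-sampling layers, 640 × 480 input); the channel widths, kernel sizes, activation choice and framework are assumptions, not taken from this application.

```python
import torch
import torch.nn as nn

class DepthEstimationModel(nn.Module):
    """Sketch of the FIG. 8 structure: five convolutional layers (encoder)
    followed by five up-sampling layers (decoder)."""
    def __init__(self):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 256]
        # Encoder: each stride-2 convolution halves the spatial size (640x480 -> 20x15).
        self.encoder = nn.ModuleList([
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(5)
        ])
        # Decoder: each up-sampling block doubles the spatial size back to 640x480.
        rev = list(reversed(chans[1:]))          # [256, 256, 128, 64, 32]
        dec = []
        for i in range(4):
            dec.append(nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(rev[i], rev[i + 1], 3, padding=1),
                nn.ReLU(inplace=True)))
        # Last layer: no activation, one normalized depth value per pixel.
        dec.append(nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(rev[4], 1, 3, padding=1)))
        self.decoder = nn.ModuleList(dec)

    def forward(self, x):                        # x: (N, 3, 480, 640)
        for layer in self.encoder:
            x = layer(x)
        for layer in self.decoder:
            x = layer(x)
        return x                                 # (N, 1, 480, 640) predicted depth map
```

Each stride-2 convolution reduces the 640 × 480 input down to 20 × 15, and the five up-sampling blocks restore it to the original size with one depth value per pixel.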
Step 603, calculating a loss function value of the depth estimation model according to the predicted depth map and the real depth map of the interface image sample.
And after one round of training of the depth estimation model is finished, calculating the predicted depth value in the predicted depth image and the real depth value in the real depth image to obtain a loss function value of the depth estimation model.
In some embodiments, the formula for calculating the loss function value refers to the following formula three:
Formula three: L = (1/n) * Σ_{p=1..n} (y_p - y'_p)^2
wherein L is the loss function value of the depth estimation model, n is the number of pixels in the interface image sample, y_p is the depth value of the p-th pixel in the real depth map, and y'_p is the depth value of the p-th pixel in the predicted depth map. The goal of training the depth estimation model is to minimize the loss function value, i.e., to make the predicted depth map as close as possible to the real depth map.
And step 604, adjusting parameters of the depth estimation model according to the loss function values.
The smaller the loss function value, the smaller the difference between the predicted depth map and the real depth map. Therefore, after the loss function value is obtained through calculation, the parameters of the depth estimation model are adjusted, so that the loss function value is reduced as much as possible, that is, the predicted depth map is close to the real depth map as much as possible, and the accuracy of the predicted depth map output by the depth estimation model is improved as much as possible.
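One training iteration covering steps 602-604 might then look like this; the mean-squared-error form of formula three and the externally supplied optimizer are assumptions.

```python
import torch

def training_step(model, optimizer, image_batch, true_depth_batch):
    """Run one forward pass, compute the loss between the predicted and real
    depth maps (here a mean squared error over all pixels), and adjust the
    model parameters to reduce it."""
    predicted = model(image_batch)                          # step 602: predicted depth map
    loss = torch.mean((predicted - true_depth_batch) ** 2)  # step 603: loss function value
    optimizer.zero_grad()
    loss.backward()                                         # step 604: adjust parameters
    optimizer.step()
    return loss.item()
```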
Optionally, when the loss function value is less than the loss threshold, the depth estimation model training is completed. The specific value of the loss threshold is set by a relevant technician according to an actual situation, and the embodiment of the present application does not limit this.
In summary, in the embodiments of the present application, the interface image samples used to train the depth estimation model and the interface images fed to the depth estimation model when it is used may come from different modes of the same application, so that the trained depth estimation model can be migrated from one mode of the application to another mode of the same application. Alternatively, they may come from different applications, so that the trained depth estimation model can be migrated from one application to another application of the same type. In other words, the trained depth estimation model can be applied to different modes or different applications with similar interface images, which improves the generality and transferability of the depth estimation model.
Referring to fig. 9, a flowchart of a control method for a virtual object according to another embodiment of the present application is shown. The main body of executing each step in this embodiment may be a terminal or a server, which is not limited in this embodiment of the present application.
In some embodiments, the method further comprises the steps (901-903) of:
step 901, acquiring a current position and a current view direction of the target virtual object.
The current position of the target virtual object refers to the position of the target virtual object in the virtual environment at the current time, and the current view angle direction refers to the direction of the view angle of the target virtual object in the virtual environment. It should be noted that the current view angle refers to the sector-shaped area that can be observed in the direction the target virtual object currently faces, and the current view angle direction refers to the direction of the central axis of the current view angle. The target view angle and the target view angle direction are defined in the same way and are not described again here.
In some embodiments, obtaining the current position of the target virtual object comprises the sub-steps of:
1. acquiring a current global map of the virtual environment, wherein the global map is used for displaying a topographic map of the virtual environment and a current position mark of a target virtual object;
2. determining coordinates respectively corresponding to pixels corresponding to the current position mark in the global map;
3. and determining the average coordinates of the coordinates respectively corresponding to the pixels corresponding to the current position mark in the global map as the current position.
In some embodiments, obtaining the current global map of the virtual environment includes obtaining an interface image that includes the current global map. In some examples, the global map is displayed as a minimap in a corner or at a side of the user interface. By triggering the minimap, it is expanded into the global map 101 occupying a larger area of the user interface, as shown in fig. 10; a screenshot of the user interface is then taken to obtain an interface image 102 including the global map. In order to highlight the current position, the current position mark 103 occupies a plurality of pixels in the global map, and the current position is located at the geometric center of the current position mark 103. Therefore, the average of the coordinates of the pixels corresponding to the current position mark in the global map gives the coordinates of the current position in the global map. The current position mark may be a circle, a triangle, a square, a star, or another shape, which is not limited in the embodiments of the present application.
In some embodiments, obtaining the current perspective direction of the target virtual object comprises: and determining the direction indicated by the number in the target number area in the interface image as the current view angle direction. As shown in fig. 11, the interface image 110 includes a view angle display area 111 for indicating a view angle direction, and a direction indicated by a number in a target digital area 112 of the view angle display area 111 is a current view angle direction of the target virtual object. Optionally, the target digital area 112 is an area located in the middle of the interface image 110, or the target digital area 112 is an area located in the middle of the viewing angle display area 111.
In some embodiments, determining the coordinates of the pixels corresponding to the current position marker respectively corresponding to the pixels in the global map includes the following sub-steps:
1. determining the value range of each channel of RGB based on the value of each channel of RGB of the color marked at the current position;
2. and determining the coordinates of the pixels of which the RGB channels meet the value range in the pixels of the global map as the coordinates respectively corresponding to the pixels corresponding to the current position mark in the global map.
The RGB color mode is a color standard in which various colors are obtained by varying the three color channels red (R), green (G) and blue (B) and superimposing them on one another.
In some embodiments, the upper and lower thresholds of the RGB channels are set as (R_small, R_big, G_small, G_big, B_small, B_big); the value range of the RGB channels is then:
(R_small < R < R_big) & (G_small < G < G_big) & (B_small < B < B_big).
and the pixels with the colors meeting the value range are the pixels of the current position mark, and the corresponding coordinates are the coordinates respectively corresponding to the pixels corresponding to the current position mark in the global map.
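As an illustration, a minimal Python sketch of this marker-locating step is given below: pixels of the global map are filtered by the RGB value range and their coordinates are averaged to obtain the current position. The threshold values are placeholders chosen for a predominantly red mark, not values from the embodiment.

import numpy as np

def locate_position_mark(global_map_rgb,
                         r_range=(200, 256), g_range=(-1, 60), b_range=(-1, 60)):
    # global_map_rgb: H x W x 3 uint8 array covering the global map region.
    r = global_map_rgb[..., 0].astype(int)
    g = global_map_rgb[..., 1].astype(int)
    b = global_map_rgb[..., 2].astype(int)
    mask = ((r_range[0] < r) & (r < r_range[1]) &
            (g_range[0] < g) & (g < g_range[1]) &
            (b_range[0] < b) & (b < b_range[1]))      # pixels satisfying the value range
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None                                    # mark not found
    return float(xs.mean()), float(ys.mean())          # average coordinates = current position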
Step 902, determining the direction of the current position pointing to the target position as the target view direction.
In some embodiments, the direction pointing to the target position from the current position is determined as the target view direction.
Step 903, when the included angle between the current view angle direction and the target view angle direction is greater than the included angle threshold, adjusting the current view angle direction to the target view angle direction.
After the current view angle direction and the target view angle direction are determined, the included angle between the current view angle direction and the target view angle direction can be determined. When this included angle is greater than the included angle threshold, the current view angle direction deviates severely from the target view angle direction, and the current view angle direction is adjusted to the target view angle direction.
In some embodiments, this step 903 further comprises the following sub-steps:
1. under the condition that the angle corresponding to the target visual angle direction is larger than the angle corresponding to the current visual angle direction, controlling the target virtual object to rotate towards the first direction until the current visual angle direction is overlapped with the target visual angle direction;
2. under the condition that the angle corresponding to the target visual angle direction is smaller than the angle corresponding to the current visual angle direction, controlling the target virtual object to rotate towards a second direction until the current visual angle direction is coincident with the target visual angle direction;
the first direction and the second direction are opposite directions, the angle corresponding to the target visual angle direction is a rotation angle passing through the reference direction vector rotating clockwise to the target visual angle direction, the angle corresponding to the current visual angle direction is a rotation angle passing through the reference direction vector rotating clockwise to the current visual angle direction, and the intersection point of the target visual angle direction, the reference direction vector and the current visual angle direction is the current position. Alternatively, the rotation in the first direction is a right turn and the rotation in the second direction is a left turn.
As shown in fig. 12, the reference direction vector 121 is a vertically upward vector starting from the current position 122, and the target view direction 123 is the direction pointing from the current position 122 to the target position 124. The rotation angle 125 through which the reference direction vector 121 rotates clockwise to the target view direction 123 is the angle corresponding to the target view direction. The angle corresponding to the current view direction is obtained in the same way and is not described again here.
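A minimal Python sketch of this rotation rule is given below, assuming the angle corresponding to the current view direction has already been obtained as a heading in degrees (for example, from the view angle display area). The included angle threshold used here is a placeholder, and the rule follows the description above: a larger target angle triggers rotation in the first direction, a smaller one triggers rotation in the second direction.

import math

def clockwise_angle_from_up(origin, point):
    # Clockwise angle, in degrees, from a vertically upward reference vector
    # anchored at origin to the vector origin -> point (y axis pointing up).
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def plan_rotation(current_pos, target_pos, current_angle, angle_threshold=15.0):
    target_angle = clockwise_angle_from_up(current_pos, target_pos)
    if abs(target_angle - current_angle) <= angle_threshold:
        return None                                                    # no adjustment needed
    if target_angle > current_angle:
        return ("rotate_first_direction", target_angle - current_angle)   # e.g. turn right
    return ("rotate_second_direction", current_angle - target_angle)      # e.g. turn left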
Optionally, the steps 901 to 903 are executed once every second interval duration. The specific value of the second interval duration may be 5 seconds, 10 seconds, 15 seconds, or the like, and may be set by a relevant technician according to the actual situation, which is not limited in the embodiments of the present application.
In this implementation, by setting the reference direction vector and obtaining the rotation angles of the current view angle direction and the target view angle direction relative to the reference direction vector respectively, the current view angle direction can coincide with the target view angle direction after rotating through a smaller rotation angle, which saves time.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 13, a block diagram of a control apparatus for a virtual object according to an embodiment of the present application is shown. The device has the function of realizing the control method example of the virtual object, and the function can be realized by hardware or by hardware executing corresponding software. The device may be the terminal or the server described above, or may be provided on the terminal or the server. The apparatus 1300 may include: an image acquisition module 1310, a depth map generation module 1320, a direction determination module 1330, and a movement control module 1340.
The image obtaining module 1310 is configured to obtain an interface image of a target virtual object at a current view angle, where the interface image is used to show a virtual environment of the current view angle, and the virtual environment includes the target virtual object.
The depth map generation module 1320 is configured to generate a depth map of the interface image, where the depth map is used to characterize a depth distance of an element of the virtual environment in the interface image relative to the target virtual object.
The direction determining module 1330 is configured to determine a moving direction of the target virtual object according to the depth map.
The movement control module 1340 is configured to control the target virtual object to move according to the movement direction.
To sum up, in the technical solution provided by the embodiments of the present application, an interface image at the current view angle of the target virtual object is obtained and a corresponding depth map is generated from the interface image. Since the depth map represents the depth distance between each virtual obstacle and the target virtual object in the virtual environment at the current view angle, the target virtual object can be controlled to move according to the depth map so as to avoid the virtual obstacles.
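As an illustration of how the modules described below cooperate, the following Python sketch wires them into a simple control loop. The callables passed in are hypothetical stand-ins for the image acquisition, depth map generation, direction determination, and movement control modules, and the loop interval is a placeholder.

import time

def control_loop(capture_interface_image, generate_depth_map,
                 determine_direction, move, interval=0.5):
    # Each argument is a callable implementing one module:
    # capture_interface_image -> image acquisition module
    # generate_depth_map      -> depth map generation module
    # determine_direction     -> direction determination module
    # move                    -> movement control module
    while True:
        interface_image = capture_interface_image()
        depth_map = generate_depth_map(interface_image)
        direction = determine_direction(depth_map)
        move(direction)
        time.sleep(interval)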
In some embodiments, as shown in fig. 14, the direction determination module 1330 includes: a region division sub-module 1331, a mean calculation sub-module 1332, a region selection sub-module 1333, and a direction determination sub-module 1334.
The region dividing sub-module 1331 is configured to divide the depth map into a plurality of regions, where different regions correspond to different moving directions of the target virtual object.
The mean value calculating submodule 1332 is configured to calculate a mean value of the depth values in each region respectively, so as to obtain depth mean values corresponding to the plurality of regions respectively.
The region selection sub-module 1333 is configured to select one region from the multiple regions as a target region according to the depth mean values corresponding to the multiple regions, respectively.
The direction determining submodule 1334 is configured to determine a moving direction corresponding to the target area as a moving direction of the target virtual object.
In some embodiments, the plurality of regions includes a region directly in front, and the moving direction of the target virtual object corresponding to the region directly in front is straight ahead; as shown in fig. 14, the region selection sub-module 1333 is configured to:
if the depth mean value of the area right in front of the plurality of areas is greater than a depth threshold value, selecting the area right in front as the target area;
and if the depth mean value of the area right in front is smaller than the depth threshold value, selecting the area with the largest depth mean value from the plurality of areas as the target area.
In some embodiments, as shown in fig. 14, the region division sub-module 1331 is configured to:
dividing the depth map into 5 regions of the same size from left to right; the 5 regions are, from left to right, a left side region, a left front region, a region directly in front, a right front region, and a right side region.
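A minimal Python sketch of the region division and region selection described above is given below: the depth map is split into five equal-width vertical strips, the mean depth of each strip is computed, and the object keeps moving straight ahead if the front strip is open, otherwise it heads toward the most open strip. The depth threshold is a placeholder, and the depth map is assumed to be a 2D array whose values grow with distance.

import numpy as np

DIRECTIONS = ["left", "front_left", "front", "front_right", "right"]

def choose_move_direction(depth_map, depth_threshold=0.5):
    # depth_map: H x W array; larger values mean farther from the target virtual object.
    strips = np.array_split(depth_map, 5, axis=1)     # five left-to-right regions
    means = [float(s.mean()) for s in strips]          # depth mean of each region
    if means[2] > depth_threshold:                     # region directly in front is open
        return "front"
    return DIRECTIONS[int(np.argmax(means))]           # otherwise the most open region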
In some embodiments, as shown in fig. 14, the depth map generation module 1320 is configured to:
inputting the interface image into a depth estimation model and outputting a predicted depth map by the depth estimation model, wherein the depth estimation model is a machine learning model used for estimating the depth distance, the predicted depth map comprises depth values respectively corresponding to elements of the virtual environment in the interface image, and the depth values are normalized depth values;
and performing inverse normalization processing on the predicted depth map to obtain the depth map.
In some embodiments, as shown in fig. 14, the apparatus 1300 further comprises: a sample acquisition module 1350, an image processing module 1360, a loss calculation module 1370, and a parameter adjustment module 1380.
The sample obtaining module 1350 is configured to obtain at least one training sample, where the sample data of the training sample includes an interface image sample, and the tag data of the training sample includes a real depth map of the interface image sample.
The image processing module 1360 is configured to process the interface image sample through the depth estimation model to obtain a predicted depth map of the interface image sample; the depth estimation model comprises at least one convolution layer and at least one up-sampling layer, wherein the convolution layer is used for carrying out feature extraction on the interface image sample to obtain a feature map of the interface image sample, and the up-sampling layer is used for carrying out up-sampling on the feature map of the interface image sample to obtain the prediction depth map.
The loss calculating module 1370 is configured to calculate a loss function value of the depth estimation model according to the predicted depth map and the real depth map of the interface image sample.
The parameter adjusting module 1380 is configured to adjust parameters of the depth estimation model according to the loss function value.
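The following PyTorch-style sketch illustrates one possible training step consistent with the above description. The exact layer configuration, the L1 loss, and the optimizer are assumptions for this sketch; the embodiment only requires convolution layers for feature extraction, upsampling layers to recover the prediction, a loss function value, and a parameter update.

import torch
import torch.nn as nn

class DepthEstimator(nn.Module):
    # Convolution layers extract features; upsampling layers restore resolution.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),   # normalized depth in [0, 1]
        )

    def forward(self, x):                # x: N x 3 x H x W, with H and W divisible by 4
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, interface_images, real_depth_maps):
    predicted = model(interface_images)                          # predicted depth map
    loss = nn.functional.l1_loss(predicted, real_depth_maps)     # loss function value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()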
In some embodiments, as shown in fig. 14, the sample acquisition module 1350 is configured to:
acquiring the interface image sample and an original depth map of the interface image sample;
setting the depth value of a target pixel in the original depth map as a threshold value to obtain a processed original depth map; wherein the target pixels comprise pixels having original depth values greater than the threshold value;
and carrying out normalization processing on the processed original depth map to obtain a real depth map of the interface image sample.
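A minimal numpy sketch of the label preprocessing in the sub-steps above, together with the matching inverse normalization applied to the model output at inference time, is shown below; the clipping threshold of 100 is a placeholder.

import numpy as np

def make_real_depth_map(original_depth_map, depth_clip=100.0):
    # Pixels whose original depth value exceeds the threshold are set to the
    # threshold, then the whole map is normalized to [0, 1].
    clipped = np.minimum(original_depth_map, depth_clip)
    return clipped / depth_clip

def denormalize_depth_map(predicted_depth_map, depth_clip=100.0):
    # Inverse normalization: recover actual depth distances from normalized output.
    return predicted_depth_map * depth_clip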
In some embodiments, as shown in fig. 14, the apparatus 1300 further comprises: a position acquisition module 1390 and a direction adjustment module 1395.
The orientation obtaining module 1390 is configured to obtain a current position and a current view direction of the target virtual object.
The direction determining module 1330 is further configured to determine the direction in which the current position points to the target position as the target view direction.
The direction adjustment module 1395 is configured to adjust the current view direction to the target view direction when an included angle between the current view direction and the target view direction is greater than an included angle threshold.
In some embodiments, as shown in fig. 14, the position acquisition module 1390 includes: a map acquisition sub-module 1391, a coordinate determination sub-module 1392, and a location determination sub-module 1393.
The map obtaining sub-module 1391 is configured to obtain a current global map of the virtual environment, where the global map is used to show a topographic map of the virtual environment and a current location mark of the target virtual object.
The coordinate determination submodule 1392 is configured to determine coordinates of the pixels corresponding to the current position marker in the global map respectively.
The position determining sub-module 1393 is configured to determine, as the current position, an average coordinate of coordinates respectively corresponding to pixels corresponding to the current position marker in the global map.
The direction determining submodule 1334 is further configured to identify direction information in a direction prompt area in the interface image, so as to obtain the current view direction.
In some embodiments, as shown in fig. 14, the apparatus 1300 further comprises a color value obtaining module 1396, configured to obtain the values of the RGB channels of the colors respectively corresponding to the pixels in the global map.
The coordinate determination sub-module 1392 is configured to: determine the value range of each RGB channel based on the values of the RGB channels of the color of the current position mark;
and determining the coordinates of the pixels of which the RGB channels meet the value range in the pixels of the global map as the coordinates respectively corresponding to the pixels corresponding to the current position mark in the global map.
In some embodiments, as shown in fig. 14, the direction adjustment module 1395 is configured to:
under the condition that the angle corresponding to the target visual angle direction is larger than the angle corresponding to the current visual angle direction, controlling the target virtual object to rotate towards a first direction until the current visual angle direction is overlapped with the target visual angle direction;
under the condition that the angle corresponding to the target visual angle direction is smaller than the angle corresponding to the current visual angle direction, controlling the target virtual object to rotate towards a second direction until the current visual angle direction is coincident with the target visual angle direction;
the first direction and the second direction are opposite directions, the angle corresponding to the target view direction is a rotation angle through which a reference direction vector rotates clockwise to the target view direction, the angle corresponding to the current view direction is a rotation angle through which the reference direction vector rotates clockwise to the current view direction, and an intersection point of the target view direction, the reference direction vector and the current view direction is the current position.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to fig. 15, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device is used for implementing the control method of the virtual object provided in the above embodiments. Specifically:
the computer apparatus 1500 includes a CPU (Central Processing Unit) 1501, a system Memory 1504 including a RAM (Random Access Memory) 1502 and a ROM (Read-Only Memory) 1503, and a system bus 1505 connecting the system Memory 1504 and the Central Processing Unit 1501. The computer device 1500 also includes a basic I/O (Input/Output) system 1506 that facilitates transfer of information between devices within the computer, and a mass storage device 1507 that stores an operating system 1513, application programs 1514, and other program modules 1515.
The basic input/output system 1506 includes a display 1508 for displaying information and an input device 1509 such as a mouse, keyboard, etc. for a user to input information. Wherein the display 1508 and the input device 1509 are connected to the central processing unit 1501 via an input output controller 1510 connected to the system bus 1505. The basic input/output system 1506 may also include an input/output controller 1510 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input-output controller 1510 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1507 is connected to the central processing unit 1501 through a mass storage controller (not shown) connected to the system bus 1505. The mass storage device 1507 and its associated computer-readable media provide non-volatile storage for the computer device 1500. That is, the mass storage device 1507 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory or other solid state Memory technology, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1504 and mass storage device 1507 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 1500 may also be operated by being connected, through a network such as the Internet, to a remote computer on the network. That is, the computer device 1500 may be connected to the network 1512 through the network interface unit 1511 connected to the system bus 1505, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 1511.
In some embodiments, there is further provided a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which when executed by a processor, implements the above-described control method for a virtual object.
Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random-Access Memory), SSD (Solid State drive), or optical disk. The Random Access Memory may include a ReRAM (resistive Random Access Memory) and a DRAM (Dynamic Random Access Memory).
In some embodiments, there is also provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and executes the computer instructions to cause the computer device to execute the control method of the virtual object.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method for controlling a virtual object, the method comprising:
acquiring an interface image of a target virtual object under a current visual angle, wherein the interface image is used for displaying a virtual environment of the current visual angle, and the virtual environment comprises the target virtual object;
generating a depth map of the interface image, the depth map being used to characterize depth distances of elements of the virtual environment relative to the target virtual object in the interface image;
determining the moving direction of the target virtual object according to the depth map;
controlling the target virtual object to move according to the moving direction;
acquiring a current global map of the virtual environment and a current view angle direction of the target virtual object, wherein the global map is used for displaying a topographic map of the virtual environment and a current position mark of the target virtual object;
determining coordinates respectively corresponding to the pixels corresponding to the current position marks in the global map;
determining the average coordinates of the coordinates respectively corresponding to the pixels corresponding to the current position mark in the global map as the current position of the target virtual object;
determining the direction of the current position pointing to the target position as the direction of the target visual angle;
and under the condition that the included angle between the current visual angle direction and the target visual angle direction is greater than the included angle threshold value, adjusting the current visual angle direction to the target visual angle direction.
2. The method of claim 1, wherein determining the direction of movement of the target virtual object from the depth map comprises:
dividing the depth map into a plurality of regions, different regions corresponding to different moving directions of the target virtual object;
respectively calculating the average value of the depth values in each area to obtain the depth average values respectively corresponding to the areas;
selecting one area from the plurality of areas as a target area according to the depth mean values corresponding to the plurality of areas respectively;
and determining the moving direction corresponding to the target area as the moving direction of the target virtual object.
3. The method of claim 2, wherein the plurality of regions includes a region directly in front, and the moving direction of the target virtual object corresponding to the region directly in front is straight ahead;
selecting one region from the plurality of regions as a target region according to the depth mean values corresponding to the plurality of regions respectively, including:
if the depth mean value of the area right in front of the plurality of areas is greater than a depth threshold value, selecting the area right in front as the target area;
and if the depth mean value of the area right in front is smaller than the depth threshold value, selecting the area with the largest depth mean value from the plurality of areas as the target area.
4. The method of claim 2, wherein the dividing the depth map into a plurality of regions comprises:
dividing the depth map into 5 regions with the same size from left to right;
the 5 areas are, from left to right, a left side area, a left front area, an area directly in front, a right front area, and a right side area.
5. The method of claim 1, wherein generating the depth map of the interface image comprises:
inputting the interface image into a depth estimation model, and outputting a predicted depth map by the depth estimation model, wherein the depth estimation model is a machine learning model used for estimating the depth distance, the predicted depth map comprises depth values respectively corresponding to elements of the virtual environment in the interface image, and the depth values are normalized depth values;
and carrying out reverse normalization processing on the predicted depth map to obtain the depth map.
6. The method of claim 5, wherein the depth estimation model is trained as follows:
obtaining at least one training sample, wherein sample data of the training sample comprises an interface image sample, and label data of the training sample comprises a real depth map of the interface image sample;
processing the interface image sample through the depth estimation model to obtain a predicted depth map of the interface image sample; the depth estimation model comprises at least one convolutional layer and at least one upsampling layer, wherein the convolutional layer is used for extracting the characteristics of the interface image sample to obtain a characteristic diagram of the interface image sample, and the upsampling layer is used for upsampling the characteristic diagram of the interface image sample to obtain the predicted depth diagram;
calculating a loss function value of the depth estimation model according to the predicted depth map and the real depth map of the interface image sample;
and adjusting parameters of the depth estimation model according to the loss function value.
7. The method of claim 6, wherein the obtaining at least one training sample comprises:
acquiring the interface image sample and an original depth map of the interface image sample;
setting the depth value of a target pixel in the original depth map as a threshold value to obtain a processed original depth map; wherein the target pixels comprise pixels having original depth values greater than the threshold value;
and carrying out normalization processing on the processed original depth map to obtain a real depth map of the interface image sample.
8. The method of any one of claims 1 to 7, wherein obtaining a current perspective direction of the target virtual object comprises:
and identifying direction information in a direction prompt area in the interface image to obtain the current view direction.
9. The method of claim 1, further comprising: obtaining the values of RGB channels corresponding to pixels in the global map respectively;
the determining the coordinates respectively corresponding to the pixels corresponding to the current position marker in the global map includes:
determining the value range of each RGB channel based on the value of each RGB channel marked by the current position;
and determining the coordinates of the pixels of which the RGB channels meet the value range in the pixels of the global map as the coordinates respectively corresponding to the pixels corresponding to the current position mark in the global map.
10. The method according to any one of claims 1 to 7, wherein the adjusting the current view direction to the target view direction comprises:
under the condition that the angle corresponding to the target visual angle direction is larger than the angle corresponding to the current visual angle direction, controlling the target virtual object to rotate towards a first direction until the current visual angle direction is overlapped with the target visual angle direction;
under the condition that the angle corresponding to the target visual angle direction is smaller than the angle corresponding to the current visual angle direction, controlling the target virtual object to rotate towards a second direction until the current visual angle direction is coincident with the target visual angle direction;
the first direction and the second direction are opposite directions, the angle corresponding to the target view direction is a rotation angle through which a reference direction vector rotates clockwise to the target view direction, the angle corresponding to the current view direction is a rotation angle through which the reference direction vector rotates clockwise to the current view direction, and an intersection point of the target view direction, the reference direction vector and the current view direction is the current position.
11. An apparatus for controlling a virtual object, the apparatus comprising:
the system comprises an image acquisition module, a display module and a display module, wherein the image acquisition module is used for acquiring an interface image of a target virtual object under a current visual angle, the interface image is used for displaying a virtual environment of the current visual angle, and the virtual environment comprises the target virtual object;
a depth map generation module for generating a depth map of the interface image, the depth map being used to characterize depth distances of elements of the virtual environment relative to the target virtual object in the interface image;
a direction determination module for determining a moving direction of the target virtual object according to the depth map;
the movement control module is used for controlling the target virtual object to move according to the movement direction;
the orientation acquisition module is used for acquiring a current global map of the virtual environment and a current view angle direction of the target virtual object, wherein the global map is used for displaying a topographic map of the virtual environment and a current position mark of the target virtual object; determining coordinates respectively corresponding to the pixels corresponding to the current position marks in the global map; determining the average coordinates of the coordinates respectively corresponding to the pixels corresponding to the current position mark in the global map as the current position of the target virtual object;
the direction determining module is further configured to determine the direction in which the current position points to the target position as a target view direction;
and the direction adjusting module is used for adjusting the current visual angle direction to the target visual angle direction under the condition that an included angle between the current visual angle direction and the target visual angle direction is greater than an included angle threshold value.
12. A computer device comprising a processor and a memory, said memory having stored therein at least one instruction, at least one program, set of codes or set of instructions, which is loaded and executed by said processor to implement a method of controlling a virtual object according to any one of claims 1 to 10.
13. A computer-readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of controlling a virtual object according to any one of claims 1 to 10.
CN202010951131.3A 2020-09-11 2020-09-11 Virtual object control method, device, equipment and storage medium Active CN112057858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010951131.3A CN112057858B (en) 2020-09-11 2020-09-11 Virtual object control method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112057858A CN112057858A (en) 2020-12-11
CN112057858B (en) 2022-04-08

Family

ID=73695352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010951131.3A Active CN112057858B (en) 2020-09-11 2020-09-11 Virtual object control method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112057858B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112933604B (en) * 2021-02-04 2023-04-25 超参数科技(深圳)有限公司 Reinforcement learning model processing method, apparatus, computer device and storage medium
CN114049800B (en) * 2021-10-15 2024-05-07 东南大学 Depth perception experiment platform in mixed reality environment and experiment method thereof
CN116772886B (en) * 2023-08-17 2023-10-20 腾讯科技(深圳)有限公司 Navigation method, device, equipment and storage medium for virtual characters in virtual scene

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127788A (en) * 2016-07-04 2016-11-16 触景无限科技(北京)有限公司 A kind of vision barrier-avoiding method and device
CN107553490A (en) * 2017-09-08 2018-01-09 深圳市唯特视科技有限公司 A kind of monocular vision barrier-avoiding method based on deep learning
CN109529338A (en) * 2018-11-15 2019-03-29 腾讯科技(深圳)有限公司 Object control method, apparatus, Electronic Design and computer-readable medium
CN110025959A (en) * 2019-01-25 2019-07-19 清华大学 Method and apparatus for controlling intelligent body
CN110400343A (en) * 2019-07-11 2019-11-01 Oppo广东移动通信有限公司 Depth map treating method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8670023B2 (en) * 2011-01-17 2014-03-11 Mediatek Inc. Apparatuses and methods for providing a 3D man-machine interface (MMI)


Also Published As

Publication number Publication date
CN112057858A (en) 2020-12-11


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40034950

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant