CN112711335B - Virtual environment picture display method, device, equipment and storage medium - Google Patents

Virtual environment picture display method, device, equipment and storage medium

Info

Publication number
CN112711335B
CN112711335B (Application No. CN202110070087.XA)
Authority
CN
China
Prior art keywords
target
human body
virtual environment
action
image
Prior art date
Legal status
Active
Application number
CN202110070087.XA
Other languages
Chinese (zh)
Other versions
CN112711335A (en)
Inventor
罗飞虎
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110070087.XA
Publication of CN112711335A
Application granted
Publication of CN112711335B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/012 Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a method, device, equipment and storage medium for displaying a virtual environment picture. The method comprises the following steps: acquiring a target human body image of an interactive object, the interactive object interacting with a virtual environment through a virtual environment picture; performing human body key node identification on the target human body image to determine a target action corresponding to the target human body image; and acquiring target virtual environment picture data matched with the target action, and displaying a target virtual environment picture based on the target virtual environment picture data. In this way, the target action is determined automatically from the target human body image of the interactive object by using computer vision technology, and can reflect the real action of the interactive object. The target virtual environment picture displayed based on the target virtual environment picture data matched with the target action has a high degree of matching with the real action of the interactive object, which brings a better experience to the interactive object and helps improve the interaction rate between the interactive object and the virtual environment.

Description

Virtual environment picture display method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method, a device, equipment and a storage medium for displaying a virtual environment picture.
Background
With the rapid development of computer technology, more and more applications are capable of providing virtual environments. The terminal can display a virtual environment picture so that the interactive object can experience various virtual environments, such as a map environment, a mall environment, a game environment, and the like, by watching the virtual environment picture.
In the related art, a control button is displayed in a display interface of a terminal, the terminal responds to a trigger operation of an interactive object on the control button, obtains virtual environment picture data matched with the trigger operation, and then displays a virtual environment picture based on the virtual environment picture data.
In this process, the acquisition of the virtual environment picture data is not intelligent and lacks flexibility. In addition, because the virtual environment picture data acquired by the terminal is matched only with the trigger operation, the virtual environment picture displayed based on that data gives the interactive object a poor experience, and the interaction rate between the interactive object and the virtual environment is low.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for displaying a virtual environment picture, which can be used for improving the interaction rate between an interaction object and a virtual environment. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for displaying a virtual environment picture, where the method includes:
acquiring a target human body image of an interactive object, wherein the interactive object interacts with a virtual environment through a virtual environment picture;
carrying out human body key node identification on the target human body image to determine a target action corresponding to the target human body image;
and acquiring target virtual environment picture data matched with the target action, and displaying a target virtual environment picture based on the target virtual environment picture data.
In another aspect, there is provided a display apparatus of a virtual environment screen, the apparatus including:
the first acquisition unit is used for acquiring a target human body image of an interactive object, the interactive object interacting with a virtual environment through a virtual environment picture;
the determining unit is used for carrying out human key node identification on the target human image so as to determine a target action corresponding to the target human image;
the second acquisition unit is used for acquiring target virtual environment picture data matched with the target action;
and the display unit is used for displaying the target virtual environment picture based on the target virtual environment picture data.
In one possible implementation manner, the target human body image is another image except for the first frame image in the human body video of the interactive object; the determining unit is used for carrying out human body key node identification on the target human body image to obtain a target human body key node identification result; obtaining target posture data corresponding to the target human body image based on the target human body key node identification result; extracting basic attitude data corresponding to the first frame of image; and determining a target action corresponding to the target human body image based on a comparison result between the target posture data and the basic posture data.
In one possible implementation, the target pose data includes at least one of a target leg height, a target body width, a target body height, a target left ankle ordinate, and a target right ankle ordinate, and the base pose data includes at least one of a base leg height, a base body width, a base body height, a base left ankle ordinate, and a base right ankle ordinate; the determining unit is further configured to determine that the target action comprises walking in response to the target leg height being less than the product of the base leg height and a first value; determine that the target action comprises a left turn in response to the target body width being less than the product of the base body width and a second value and the target left ankle ordinate being less than the target right ankle ordinate; determine that the target action comprises a right turn in response to the target body width being less than the product of the base body width and a second value and the target left ankle ordinate being greater than the target right ankle ordinate; determine that the target action comprises squatting in response to the target body height being less than the product of the base body height and a third value; and determine that the target action comprises jumping up in response to the target left ankle ordinate being less than the base left ankle ordinate and the target right ankle ordinate being less than the base right ankle ordinate.
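A minimal sketch of these comparisons is shown below. The PoseData shape, the action labels, and the concrete first, second, and third values are illustrative assumptions only and are not specified by the application:

```typescript
// Sketch of the action rules above. Threshold constants are assumed example values.
interface PoseData {
  legHeight: number;
  bodyWidth: number;
  bodyHeight: number;
  leftAnkleY: number;   // ordinate in image coordinates (y grows downward)
  rightAnkleY: number;
}

const FIRST_VALUE = 0.9;   // assumed "first value"
const SECOND_VALUE = 0.8;  // assumed "second value"
const THIRD_VALUE = 0.8;   // assumed "third value"

function determineTargetActions(target: PoseData, base: PoseData): string[] {
  const actions: string[] = [];
  if (target.legHeight < base.legHeight * FIRST_VALUE) {
    actions.push('walk');
  }
  if (target.bodyWidth < base.bodyWidth * SECOND_VALUE) {
    // Body appears narrower than in the base pose: the torso has turned.
    if (target.leftAnkleY < target.rightAnkleY) actions.push('turn-left');
    else if (target.leftAnkleY > target.rightAnkleY) actions.push('turn-right');
  }
  if (target.bodyHeight < base.bodyHeight * THIRD_VALUE) {
    actions.push('squat');
  }
  if (target.leftAnkleY < base.leftAnkleY && target.rightAnkleY < base.rightAnkleY) {
    // Both ankles sit higher than in the base pose (smaller y), i.e. the feet left the ground.
    actions.push('jump');
  }
  return actions;
}
```

The ankle comparisons follow the image coordinate system used later in the description, in which the y axis points downward, so a smaller ankle ordinate means the foot is higher in the image.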
In a possible implementation manner, the second obtaining unit is configured to determine, based on the target action, a target state parameter of a virtual camera in the virtual environment; and taking the virtual environment picture data acquired by the virtual camera with the target state parameters as target virtual environment picture data matched with the target action.
In one possible implementation manner, the target human body image is another image except for the first frame image in the human body video of the interactive object; the second obtaining unit is further configured to, in response to that the target action meets a first condition, use a reference state parameter as a target state parameter of the virtual camera, where the reference state parameter is a state parameter of the virtual camera determined based on a reference human body image, and the reference human body image is an image of a frame before the target human body image in the human body video of the interactive object; and responding to the target action and meeting a second condition, adjusting the reference state parameter, and taking the state parameter obtained after adjustment as the target state parameter of the virtual camera, wherein the second condition is different from the first condition.
In one possible implementation, satisfying the first condition includes: the target action does not comprise walking, and the target action and a reference action satisfy a matching condition, wherein the reference action is the action corresponding to the reference human body image; satisfying the second condition includes: the target action comprises walking and the walking direction corresponding to the target action is the propelling direction of the virtual camera; or the target action does not comprise walking and the matching condition is not satisfied between the target action and the reference action; wherein satisfying the matching condition comprises: the target action is completely consistent with the reference action, or the target action is standing and the reference action is walking.
In a possible implementation manner, the second obtaining unit is further configured to determine a target adjustment manner based on the target action and the reference action in response to that the target action satisfies a second condition; and adjusting the reference state parameters according to the target adjustment mode, and taking the state parameters obtained after adjustment as the target state parameters of the virtual camera.
In a possible implementation manner, the display unit is further configured to display first prompt information in response to that the target motion includes walking and a walking direction corresponding to the target motion is a non-pushable direction of the virtual camera, where the first prompt information is used to prompt the interactive object to adjust the walking direction.
In one possible implementation, the apparatus further includes:
a replacing unit for replacing the reference state parameter with the target state parameter.
In a possible implementation manner, the determining unit is further configured to obtain a human body key node identification model; and calling the human body key node identification model to carry out human body key node identification on the target human body image.
In a possible implementation manner, the determining unit is further configured to perform human key node identification on a first frame image in a human video of the interactive object, and obtain basic pose data corresponding to the first frame image based on a basic human key node identification result;
the device further comprises:
and the storage unit is used for storing the basic attitude data corresponding to the first frame of image.
In a possible implementation manner, the display unit is further configured to respond to a camera call request and display camera call authorization information; responding to a confirmation instruction of the camera for calling the authorization information, and displaying second prompt information, wherein the second prompt information is used for prompting the interactive object to stand according to a reference posture;
the device further comprises:
and the calling unit is used for calling, in response to the interactive object standing according to the reference posture, the target camera to collect the human body video of the interactive object, wherein a first frame image in the human body video of the interactive object is an image obtained by performing image acquisition on the interactive object standing according to the reference posture.
In another aspect, a computer device is provided, and the computer device includes a processor and a memory, where at least one computer program is stored in the memory, and the at least one computer program is loaded by the processor and executed to implement any one of the above-mentioned virtual environment picture display methods.
In another aspect, a computer-readable storage medium is provided, where at least one computer program is stored, and the at least one computer program is loaded and executed by a processor to implement any one of the above-mentioned methods for displaying a virtual environment picture.
In another aspect, a computer program product or a computer program is also provided, comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instruction from the computer readable storage medium, and executes the computer instruction, so that the computer device executes any one of the above-mentioned display methods of the virtual environment picture.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
in the embodiments of the application, the terminal displays the target virtual environment picture based on the target virtual environment picture data matched with the target action. The target action is automatically determined from the target human body image of the interactive object and can reflect the real action of the interactive object. In addition, because the target virtual environment picture data is matched with a target action that reflects the real action of the interactive object, the displayed target virtual environment picture matches the real action of the interactive object closely, which brings a better experience to the interactive object and helps improve the interaction rate between the interactive object and the virtual environment.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an implementation environment of a display method of a virtual environment picture according to an embodiment of the present application;
fig. 2 is a flowchart of a method for displaying a virtual environment screen according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a human body key node provided in an embodiment of the present application;
fig. 4 is a schematic diagram of human key node identification performed on a target human body image according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another human key node identification performed on a target human image according to an embodiment of the present application;
fig. 6 is a schematic diagram of a process of adjusting a reference state parameter of a virtual camera according to an embodiment of the present disclosure;
fig. 7 is a schematic diagram of a display process of a virtual environment screen according to an embodiment of the present application;
FIG. 8 is a diagram of a map environment screen provided in an embodiment of the present application;
fig. 9 is a schematic diagram of an exhibition hall environment picture provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of a room interior environment screen provided in an embodiment of the present application;
fig. 11 is a schematic diagram of a display device of a virtual environment screen according to an embodiment of the present application;
FIG. 12 is a diagram of another display device for displaying a virtual environment screen according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Several terms referred to in the embodiments of the present application are explained:
virtual environment: the application program provides (or displays) an environment when running on the terminal, which may be a two-dimensional virtual environment, a 2.5-dimensional virtual environment, or a three-dimensional virtual environment. The virtual environment may be a simulation environment of the real world, a semi-simulation semi-fictional environment, or a pure fictional environment. Illustratively, the virtual environment in the embodiment of the present application is a three-dimensional virtual environment. The terminal can display the virtual environment picture for the user to watch.
getUserMedia: an API (Application Programming Interface) of HTML5 (HyperText Markup Language 5) that provides an interface for a user to access hardware media (camera, video, audio, geographical location, etc.); based on this interface, the user can access the hardware media devices without relying on any browser plug-in.
Getusermedia (): the user is prompted to give permission to use the media input which results in a MediaStream containing the track of the requested media type. The media stream may comprise a video track (from a hardware or virtual video source, such as a camera, video capture device, screen sharing service, etc.), an audio track (again from a hardware or virtual audio source, such as a microphone, a/D converter, etc.), and possibly other track types.
Tensorflow: an end-to-end open source machine learning platform. It has a comprehensive and flexible ecosystem containing various tools, libraries and community resources that can assist researchers in driving the development of advanced machine learning techniques and enable developers to easily build and deploy applications supported by machine learning.
PoseNet: a model that detects human body key nodes in an image or a video. Human body key node detection indexes the detected positions using a "numbered positions" format, and each detected position is accompanied by a trust value. The trust value ranges from 0.0 to 1.0, with 1.0 being the highest trust value.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The embodiments of the application relate to the Computer Vision (CV) technology within artificial intelligence. Computer vision is the science of studying how to make machines "see"; more specifically, it uses cameras and computers, in place of human eyes, to perform machine vision tasks such as identification and measurement on a target, and further performs image processing so that the processed result becomes an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can acquire information from images or multi-dimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D (Three-Dimensional) technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and further include common biometric technologies such as face recognition and fingerprint recognition.
In an exemplary embodiment, machine learning/deep learning techniques of artificial intelligence can also be utilized in identifying images. Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
Referring to fig. 1, a schematic diagram of an implementation environment of a method for displaying a virtual environment picture provided in an embodiment of the present application is shown. The implementation environment includes: a terminal 101.
The terminal 101 is installed with an application capable of providing a virtual environment. The type of the application program capable of providing the Virtual environment is not limited in the embodiments of the present application, and the application program capable of providing the Virtual environment may be, for example, a game application program, a Virtual Reality (VR) application program, an Augmented Reality (AR) application program, a three-dimensional map program, a social application program, an interactive entertainment application program, or the like.
The terminal 101 can display a virtual environment screen. In the process of displaying the virtual environment picture, the terminal 101 can acquire a target human body image of the interactive object, and determine a target action corresponding to the target human body image according to the target human body image; and then acquiring target virtual environment picture data matched with the target action, and displaying a target virtual environment picture based on the target virtual environment picture data.
In one possible implementation, the terminal 101 is a smartphone, a tablet computer, a laptop computer, or a desktop computer. The terminal 101 may generally refer to one of a plurality of terminals. Those skilled in the art will appreciate that the terminal 101 is only an example; other terminals that exist now or may appear in the future, if applicable to the present application, are also included within the scope of protection of the present application and are hereby incorporated by reference.
Based on the implementation environment shown in fig. 1, an embodiment of the present application provides a method for displaying a virtual environment screen, which is applied to the terminal 101 as an example. As shown in fig. 2, the method provided in the embodiment of the present application includes the following steps 201 to 203.
In step 201, a target human body image of an interactive object is acquired, and the interactive object interacts with a virtual environment through a virtual environment picture.
The interactive object refers to a user of the terminal, and in the embodiment of the application, the interactive object can interact with the virtual environment through the virtual environment picture. The virtual environment is provided by a target application in the terminal, that is, the interactive object can interact with the virtual environment provided by the target application through the virtual environment screen. The purpose of interacting the interactive object with the virtual environment is not limited in the embodiments of the present application, and exemplarily, the purpose of interacting the interactive object with the virtual environment is as follows: the interactive object learns about the various environmental elements in the virtual environment by interacting with the virtual environment. For example, if the virtual environment is an internal environment of a house, the purpose of the interaction between the interaction object and the virtual environment is: the interactive object learns the situation of each room in the house, such as the number of rooms, the layout of the rooms, etc., by interacting with the internal environment of the house.
Illustratively, the terminal displays the virtual environment picture based on virtual environment picture data, which refers to data collected by a virtual camera in the virtual environment. It should be noted that, in the embodiment of the present application, the type and the number of the virtual cameras in the virtual environment are not limited, and can be flexibly set by a developer of the virtual environment.
The type of the target application program providing the virtual environment is not limited in the embodiments of the present application. Exemplary types of target applications include, but are not limited to: a game-type application, a Virtual Reality (VR) type application, an Augmented Reality (AR) type application, a three-dimensional map program, a social-type application, and an interactive entertainment type application. In an exemplary embodiment, the target application may also be a browser, illustratively a Web browser, that provides a virtual environment.
The types of virtual environments provided by different types of target applications are different, and the types of virtual environments are not limited in the embodiments of the present application. Exemplary virtual environments include, but are not limited to: map environment, game environment, mall environment, exhibition hall environment, and house interior environment. Illustratively, the map environment is an environment obtained by processing a real map environment on line, and the map environment enables an interactive object to comprehensively understand surrounding facilities of a certain position, and can be provided by a three-dimensional map program. The game environment is an environment for the virtual object to perform activities, which is set in advance by a developer of the game, and can be provided by a game-type application program.
Exemplarily, the mall environment is obtained by rendering real mall pictures into a 3D space, and can meet the requirement of the interactive object to browse the mall online; the exhibition hall environment is obtained by rendering real exhibition hall pictures into a 3D space, and can meet the requirement of the interactive object to visit the exhibition hall online; the house interior environment is obtained by rendering real house pictures into a 3D space, and can meet the requirement of the interactive object to view houses online. In an exemplary embodiment, the mall environment, the exhibition hall environment, and the house interior environment are provided by a virtual reality type application or an augmented reality type application.
The target human body image of the interactive object is obtained from the human body video of the interactive object. In one possible implementation manner, the process of acquiring the human body video of the interactive object includes the following steps 1 to 3:
step 1: and responding to the camera calling request, and displaying camera calling authorization information.
The camera call request is used for indicating that the camera on the terminal needs to be called. The embodiments of the present application do not limit the manner of acquiring the camera call request. In one possible implementation manner, the terminal automatically acquires the camera call request when a call opportunity is reached. The call opportunity is set empirically, or flexibly adjusted according to the manner in which the target application provides the virtual environment, which is not limited in the embodiments of the present application. For example, the call opportunity is when the terminal displays the first frame virtual environment picture of the virtual environment provided by the target application. The call opportunity may also be, for example, when the terminal successfully starts the target application.
In another possible implementation manner, the terminal obtains the camera call request according to a camera call instruction generated by the interactive object. The embodiments of the application do not limit the way in which the interactive object generates the camera call instruction. Illustratively, the interactive object generates the camera call instruction by triggering a camera call control. The interactive object can trigger the camera call control when it wants to control the displayed virtual environment picture using its own actions.
After the terminal acquires the camera call request, the camera call authorization information is displayed to obtain the feedback of the interactive object on the camera call authorization information, and whether to continue executing the subsequent steps is determined according to that feedback, so as to ensure the privacy security of the interactive object. In one possible implementation, the process of displaying the camera call authorization information is implemented based on the MediaDevices.getUserMedia() interface.
After the camera call authorization information is displayed, the interactive object can give feedback on it. The feedback given by the interactive object covers two cases: the interactive object confirms the camera call authorization information, or the interactive object rejects it. When the interactive object confirms the camera call authorization information, the terminal acquires a confirmation instruction for the camera call authorization information and executes step 2; when the interactive object rejects the camera call authorization information, the terminal acquires a rejection instruction for the camera call authorization information, and in this case the terminal does not continue to execute the display method of the virtual environment picture provided by the embodiments of the present application.
Step 2: in response to a confirmation instruction for the camera call authorization information, displaying second prompt information, wherein the second prompt information is used for prompting the interactive object to stand according to a reference posture.
After a confirmation instruction for the camera call authorization information is acquired, the second prompt information is displayed. The second prompt information is used for prompting the interactive object to stand according to the reference posture. The reference posture is set empirically or flexibly adjusted according to the application scenario, which is not limited in the embodiments of the present application. Illustratively, the reference posture refers to a specified frontal pose. In addition, the embodiments of the application do not limit the representation form of the second prompt information; for example, the second prompt information is presented as text, or as a reference human-body dotted-outline figure matching the reference posture.
After the second prompt information is displayed, the interactive object adjusts its own posture according to the second prompt information so as to stand according to the reference posture. The reference posture is a posture that facilitates calculation of the basic posture data of the interactive object, and the basic posture data of the interactive object provides data support for the process of determining the action corresponding to each human body image.
Step 3: in response to the interactive object standing according to the reference posture, calling a target camera to collect the human body video of the interactive object, wherein the first frame image in the human body video of the interactive object is obtained by performing image acquisition on the interactive object standing according to the reference posture.
And when the interactive object is determined to stand according to the reference posture, the terminal calls the target camera to acquire the human body video of the interactive object. In this case, the first frame image in the human body video of the interactive object is an image obtained by the target camera performing image acquisition on the interactive object standing according to the reference posture. It should be noted that after the interactive object stands according to the reference posture, the posture of the interactive object can be adjusted, that is, other images except the first frame image in the human body video may be images obtained by image acquisition of the interactive object after the posture adjustment.
It should be further noted that the camera called here to acquire the human body video of the interactive object is a physical camera on the terminal, that is, a camera that actually exists on the terminal; to distinguish it from the virtual camera in the virtual environment, it is referred to as the target camera. The number of target cameras and the positions of the target cameras on the terminal are not limited in the embodiments of the present application, and may differ according to the type of the terminal.
The human body video of the interactive object is obtained by continuously acquiring images of the interactive object through the target camera, and is formed of multiple frames of images. In a possible implementation manner, before the target camera is called to collect the human body video of the interactive object, the terminal creates a video tag, initializes the video tag, and then binds the video stream of the human body video collected by the target camera to the video tag. Exemplarily, the process of binding the video stream of the human body video captured by the target camera to the video tag is implemented via the srcObject property of the video tag.
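As an illustrative sketch of this step (not the application's own code), the camera can be requested with getUserMedia() and the resulting MediaStream bound to a video tag roughly as follows; the preview size is an assumption:

```typescript
// Minimal sketch: request the target camera and bind the MediaStream to a <video> tag.
async function startHumanBodyVideo(): Promise<HTMLVideoElement> {
  // Triggers the browser's camera call authorization prompt.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: false });

  // Create and initialize the video tag, then bind the captured stream to it.
  const video = document.createElement('video');
  video.width = 640;          // assumed preview size
  video.height = 480;
  video.srcObject = stream;   // bind the video stream to the video tag
  await video.play();
  return video;
}
```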
In a possible implementation manner, in the process of calling the target camera to collect the human body video of the interactive object, the terminal can sequentially display each frame of human body image of the interactive object collected by the target camera so as to enable the interactive object to be viewed.
In one possible implementation manner, in the process of calling the target camera to acquire the human body video of the interactive object, a time schedule controller consistent with the acquisition frame frequency of the target camera is created, and the time schedule controller is used for determining which frame of image in the human body video should be used as the target human body image of the interactive object. The target human body image is a frame image which needs human body key node identification at present. Exemplarily, the target human body image refers to a first frame image of at least one frame image of the human body video on which the human body key node identification operation has not been performed.
In an exemplary embodiment, the process of performing the human body key node identification operation on the target human body image does not interfere with the process of acquiring the human body video by using the target camera, and the two processes can be executed simultaneously. Illustratively, if the efficiency of the terminal performing the human body key node identification operation on the target human body image is high enough, the target human body image is the latest frame image in the human body video. For example, if the efficiency of the terminal performing the human body key node identification operation on the target human body image is low, the target human body image is a first frame image in a plurality of frame images in the human body video, on which the human body key node identification operation is not performed.
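A rough sketch of such a timing controller, assuming a 30 fps capture rate and queueing captured frames until they are recognized (all names here are illustrative, and the recognition call is a hypothetical placeholder):

```typescript
// Sketch of a timing controller that runs at the (assumed) camera frame rate and
// picks the next frame to use as the target human body image.
const ASSUMED_CAPTURE_FPS = 30;
const pendingFrames: HTMLCanvasElement[] = [];

function onFrameCaptured(frame: HTMLCanvasElement): void {
  pendingFrames.push(frame); // collected in parallel with recognition
}

setInterval(() => {
  const targetImage = pendingFrames.shift(); // first frame not yet recognized
  if (targetImage !== undefined) {
    // recognizeKeyNodes(targetImage);  // hypothetical call to the key node recognition step
  }
}, 1000 / ASSUMED_CAPTURE_FPS);
```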
In the embodiments of the present application, the target human body image is an image other than the first frame image in the human body video of the interactive object. It should be noted that, as time advances, each frame image in the human body video other than the first frame image is used in turn as the target human body image; the embodiments of the present application describe the display method of the virtual environment picture from the perspective of a single target human body image only.
The first frame image in the human body video is an image obtained by the target camera acquiring an image of the interactive object standing according to the reference posture. For the first frame image, the terminal displays a default virtual environment picture based on default virtual environment picture data, where the default virtual environment picture data is data acquired by the virtual camera with default state parameters. The default state parameters are set empirically by the developer of the virtual environment and are not limited in the embodiments of the present application.
In step 202, human key node recognition is performed on the target human body image to determine a target action corresponding to the target human body image.
And after the target human body image of the interactive object is obtained, identifying the key human body nodes of the target human body image. By carrying out human body key node identification on the target human body image, the target action corresponding to the target human body image can be determined. The target action corresponding to the target human body image is used for indicating the action of the interactive object at the acquisition moment of the target human body image, that is, the target action can reflect the real action of the interactive object. In the embodiment of the application, different virtual environment picture data matched with different actions can be continuously acquired according to the continuously changing actions of the interactive object, and then different virtual environment pictures are displayed, so that the effect of updating the displayed virtual environment pictures according to the actions of the interactive object can be achieved.
The number and specific types of the human body key nodes are not limited in the embodiments of the present application and may be set empirically. Illustratively, as shown in fig. 3, the number of human body key nodes is 17, namely a left eye 31, a right eye 32, a left ear 33, a right ear 34, a nose 35, a left shoulder 36, a right shoulder 37, a left elbow 38, a right elbow 39, a left wrist 310, a right wrist 311, a left hip 312, a right hip 313, a left knee 314, a right knee 315, a left ankle 316, and a right ankle 317.
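For reference, these 17 key nodes correspond to the part names used by the publicly available PoseNet JavaScript model; the list below is shown for illustration only, and the numbers 31 to 317 above are the application's own labels in fig. 3:

```typescript
// The 17 human body key nodes, using the part names of the public PoseNet JS model.
const HUMAN_KEY_NODES = [
  'nose',
  'leftEye', 'rightEye',
  'leftEar', 'rightEar',
  'leftShoulder', 'rightShoulder',
  'leftElbow', 'rightElbow',
  'leftWrist', 'rightWrist',
  'leftHip', 'rightHip',
  'leftKnee', 'rightKnee',
  'leftAnkle', 'rightAnkle',
] as const;
```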
In a possible implementation manner, the process of identifying the key human node in the target human body image is as follows: acquiring a human body key node identification model; and calling a human body key node recognition model to perform human body key node recognition on the target human body image.
The human body key node identification model is used for identifying human body key nodes in an image. The type of the human body key node identification model is not limited in the embodiments of the application, as long as human body key nodes can be identified from the image. Illustratively, the human body key node recognition model is a PoseNet (pose network) model. Illustratively, PoseNet can be implemented on top of a variety of different model structures, such as ResNet50 (residual network 50) or MobileNet (mobile network).
It should be noted that the process of obtaining the human body key node recognition model in step 202 may refer to training an initial human body key node recognition model by using training data to obtain a trained human body key node recognition model; the method may also refer to obtaining a pre-downloaded human body key node identification model, which is not limited in the embodiment of the present application. In the embodiment of the application, the human body key node identification model refers to a human body key node identification model downloaded in advance by a terminal as an example, so that the efficiency of identifying the human body key node on the image can be improved.
In an exemplary embodiment, the human body key node identification model is a PoseNet model. The PoseNet model is a human body key node recognition model trained on the TensorFlow platform, and the terminal can download the PoseNet model in advance by initializing the TensorFlow platform.
After the human body key node recognition model is obtained, it is called to perform human body key node recognition on the target human body image. In a possible implementation manner, the human body key node recognition model is a PoseNet model trained on the TensorFlow platform, and calling the PoseNet model to perform human body key node recognition on the target human body image is realized by passing the target human body image to the PoseNet model calling interface in the TensorFlow platform, which is illustratively the estimatePoses (estimate poses) interface. Illustratively, the process of calling the PoseNet model to identify the key nodes of the human body in the target human body image is realized according to the following code:
[Code listing reproduced only as an image in the original publication (Figure GDA0003883478810000131).]
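Since the original listing is only available as an image, the following is a hedged sketch of how such a call typically looks with the TensorFlow.js PoseNet package; the package names, load options, and the estimateSinglePose call are assumptions based on the public PoseNet API and may differ from the estimatePoses interface referenced above:

```typescript
import '@tensorflow/tfjs';                              // registers the TensorFlow.js backend
import * as posenet from '@tensorflow-models/posenet';

// Sketch: load the pre-trained PoseNet model, then run key node recognition on
// the target human body image (here an HTMLVideoElement or HTMLImageElement).
async function recognizeKeyNodes(targetImage: HTMLVideoElement | HTMLImageElement) {
  const net = await posenet.load();                     // model downloaded in advance / cached
  const pose = await net.estimateSinglePose(targetImage, {
    flipHorizontal: false,                              // assumed; depends on camera mirroring
  });
  // pose.score is the comprehensive trust value; pose.keypoints holds the per-node
  // trust value and (x, y) position for each of the 17 human body key nodes.
  return pose;
}
```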
In one possible implementation manner, the target human body image is an image other than the first frame image in the human body video of the interactive object. The process of performing human body key node recognition on the target human body image to determine the target action corresponding to the target human body image includes the following steps 2021 to 2024:
step 2021: and carrying out human body key node identification on the target human body image to obtain a target human body key node identification result.
After the key human body node recognition is carried out on the target human body image, a target human body key node recognition result can be obtained. The target human body key node identification result is used for indicating the position of the human body key node in the target human body image, and exemplarily, the target human body key node identification result is also used for indicating the identification trust value of the human body key node. The embodiment of the application does not limit the expression form of the identification result of the key nodes of the human body.
Illustratively, the identification result of the target human body key node includes a comprehensive trust value and sub-data corresponding to each human body key node. The subdata corresponding to each human body key node comprises position information of the human body key node in the target human body image and an identification trust value of the human body key node. Illustratively, the position information of the key nodes of the human body in the target human body image is expressed by using a form of plane coordinates. The plane coordinates are coordinates in a plane coordinate system, and in an exemplary embodiment, the origin of coordinates of the plane coordinate system is located at the upper left corner of the target human body image, the positive direction of the horizontal axis of the plane coordinate system is a horizontal right direction, and the positive direction of the vertical axis of the plane coordinate system is a vertical downward direction.
Illustratively, the target human body key node identification result is as follows:
[Key node identification result reproduced only as images in the original publication (Figures GDA0003883478810000141 and GDA0003883478810000151).]
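Because the listing is only available as images, the sketch below shows what such a result could look like in the output format of the public PoseNet JS model, using the values quoted in the next paragraph; the field names follow that public format and are not taken from the original listing:

```typescript
// Illustrative key node identification result (truncated), in PoseNet-style format.
const targetRecognitionResult = {
  score: 0.32371445304906,            // comprehensive trust value
  keypoints: [
    {
      part: 'nose',
      score: 0.99539834260941,        // identification trust value of the nose
      position: { x: 253.36747741699, y: 76.291801452637 },
    },
    {
      part: 'leftEye',
      score: 0.99345678260941,        // identification trust value of the left eye
      position: { x: 253.54365539551, y: 71.10383605957 },
    },
    // ... remaining 15 key nodes omitted
  ],
};
```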
According to the target human body key node identification result, the comprehensive trust value for identifying the key nodes of the target human body image is 0.32371445304906; the identification trust value of the human body key node nose is 0.99539834260941, and the plane coordinates of the nose in the target human body image are (x, y) = (253.36747741699, 76.291801452637); the identification trust value of the human body key node left eye is 0.99345678260941, and the plane coordinates of the left eye in the target human body image are (x, y) = (253.54365539551, 71.10383605957).
Illustratively, a schematic diagram of human body key node recognition on a target human body image is shown in fig. 4. By identifying key human body nodes of the target human body image, each key human body node in the target human body image can be identified.
It should be noted that, in the embodiment of the present application, only one interactive object is included in the target human body image as an example for description. In an exemplary embodiment, for a case where the target human body image includes a plurality of interactive objects, the human body key node identification model can identify human body key nodes belonging to the respective interactive objects, respectively. As shown in (1) in fig. 5, the target human body image includes three interactive objects, and the human body key node recognition model can respectively recognize human body key nodes belonging to the three interactive objects, as shown in (2) in fig. 5.
Step 2022: and obtaining target posture data corresponding to the target human body image based on the target human body key node identification result.
After the target human body key node recognition result is obtained, the target human body key node recognition result can be analyzed and processed to obtain target posture data corresponding to the target human body image. The target posture data corresponding to the target human body image is used for providing data support for subsequently determining the target action corresponding to the target human body image, that is, the target posture data includes data required for determining the target action.
In an exemplary embodiment, the target pose data includes at least one of a target leg height, a target body width, a target body height, a target left ankle ordinate, and a target right ankle ordinate. The target left ankle ordinate and the target right ankle ordinate can be directly determined according to the ordinate indicated by the position information of the left ankle and the right ankle included in the target human body key node identification result. It should be noted that the coordinates in the target human body key node recognition result in the embodiment of the present application all refer to coordinates in a planar coordinate system, the origin of the coordinates in the planar coordinate system is located at the upper left corner of the target human body image, the positive direction of the horizontal axis of the planar coordinate system is the horizontal direction to the right, and the positive direction of the vertical axis of the planar coordinate system is the vertical direction to the bottom.
Next, the determination manners of the target leg height, the target body width, and the target body height will be described, respectively.
Determination of target leg height: calculating a first vertical distance between the left hip and the left ankle according to the position information of the left hip and the left ankle included in the target human body key node identification result; calculating a second vertical distance between the right hip and the right ankle according to the position information of the right hip and the right ankle included in the target human body key node identification result; the average of the first vertical distance and the second vertical distance is taken as the target leg height.
Determination mode of target body width: and calculating the distance between the left shoulder and the right shoulder according to the position information of the left shoulder and the right shoulder included in the target human body key node recognition result, and taking the distance between the left shoulder and the right shoulder as the target body width. Illustratively, the distance between the left shoulder and the right shoulder refers to the straight-line distance between the position of the left shoulder and the position of the right shoulder.
Determination of the target body height: calculating a first average value of a target left shoulder ordinate and a target right shoulder ordinate according to the position information of the left shoulder and the right shoulder included in the target human body key node identification result; calculating a second average value of a target left ankle longitudinal coordinate and a target right ankle longitudinal coordinate according to the position information of the left ankle and the right ankle included in the target human body key node identification result; the absolute value of the difference between the first average and the second average is taken as the target body height. Expressed by the formula:
body_height = |(left_shoulder.y + right_shoulder.y) / 2 - (left_ankle.y + right_ankle.y) / 2|
wherein body_height represents the target body height; left_shoulder.y represents the vertical coordinate of the target left shoulder; right_shoulder.y represents the vertical coordinate of the target right shoulder; left_ankle.y represents the target left ankle ordinate; right_ankle.y represents the target right ankle ordinate.
It should be noted that the target left shoulder ordinate is an ordinate indicated by the position information of the left shoulder included in the target human body key node identification result; the target right shoulder ordinate is an ordinate indicated by the position information of the right shoulder included in the target human body key node identification result; the target left ankle ordinate is an ordinate indicated by position information of a left ankle included in the target human body key node identification result; the target right ankle ordinate is an ordinate indicated by position information of a right ankle included in the target human body key node identification result.
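As an illustration of the three measures above, the following sketch computes the target pose data from PoseNet-style key node positions indexed by part name; the helper and field names are illustrative assumptions:

```typescript
interface Point { x: number; y: number; }
type Keypoints = Record<string, Point>;  // e.g. keypoints['leftShoulder'] = { x, y }

// Sketch of computing target pose data from the key node positions.
function computePoseData(k: Keypoints) {
  // Target leg height: average of the left hip-to-ankle and right hip-to-ankle vertical distances.
  const legHeight =
    (Math.abs(k.leftAnkle.y - k.leftHip.y) + Math.abs(k.rightAnkle.y - k.rightHip.y)) / 2;

  // Target body width: straight-line distance between the left shoulder and the right shoulder.
  const bodyWidth = Math.hypot(
    k.leftShoulder.x - k.rightShoulder.x,
    k.leftShoulder.y - k.rightShoulder.y,
  );

  // Target body height: |mean shoulder ordinate - mean ankle ordinate|.
  const bodyHeight = Math.abs(
    (k.leftShoulder.y + k.rightShoulder.y) / 2 - (k.leftAnkle.y + k.rightAnkle.y) / 2,
  );

  return {
    legHeight,
    bodyWidth,
    bodyHeight,
    leftAnkleY: k.leftAnkle.y,
    rightAnkleY: k.rightAnkle.y,
  };
}
```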
In an exemplary embodiment, the target pose data may also include other data, which is not limited by the embodiments of the present application. For example, the target posture data further includes arm fore and aft arm lengths, leg size and leg length, eye width, and the like.
The length of the front arm and the rear arm of the arm comprises the length of the front arm and the rear arm of the left arm and the length of the front arm and the rear arm of the right arm, the length of the front arm and the rear arm of the left arm comprises the length of the front arm of the left arm and the length of the rear arm of the left arm, the length of the front arm of the left arm is used for indicating the distance from the left wrist to the left elbow, and the length of the rear arm of the left arm is used for indicating the distance from the left elbow to the left shoulder; the length of the front arm and the rear arm of the right arm comprises the length of the front arm of the right arm and the length of the rear arm of the right arm, the length of the front arm of the right arm is used for indicating the distance from the right wrist to the right elbow, and the length of the rear arm of the right arm is used for indicating the distance from the right elbow to the right shoulder.
The leg lengths include the thigh and calf lengths of the left leg and the thigh and calf lengths of the right leg. The left-leg lengths include the left thigh length and the left calf length, where the left thigh length indicates the distance from the left hip to the left knee, and the left calf length indicates the distance from the left knee to the left ankle. The right-leg lengths include the right thigh length and the right calf length, where the right thigh length indicates the distance from the right hip to the right knee, and the right calf length indicates the distance from the right knee to the right ankle. The eye width indicates the distance between the left eye and the right eye.
As can be seen from the above, after the target human body key node recognition result including the sub-data of each human body key node is obtained, the position relationship between the human body key nodes can be obtained by analyzing the position information of each human body key node indicated by the target human body key node recognition result, so that the target posture data corresponding to the target human body image is obtained.
Step 2023: and extracting basic attitude data corresponding to the first frame of image.
The basic posture data corresponding to the first frame image is obtained by performing human body key node recognition on the first frame image of the human body video collected when the target camera is called to capture the human body video of the interactive object. Because the first frame image is obtained by performing image collection on the interactive object standing in the reference posture, the posture data obtained by performing human body key node recognition on the first frame image is used as the basic posture data, and the basic posture data provides data support for the process of determining the target action corresponding to the target human body image.
In this step 2023, the basic pose data corresponding to the first frame image is directly extracted. That is, before performing step 2023, the method further includes: performing human key node recognition on a first frame of image in a human video of an interactive object, and obtaining basic attitude data corresponding to the first frame of image based on a basic human key node recognition result; and storing the basic attitude data corresponding to the first frame of image. The implementation process of performing human key node recognition on the first frame image in the human video of the interactive object and obtaining the basic posture data corresponding to the first frame image based on the basic human key node recognition result is referred to step 2021 and step 2022, and details are not repeated here. After the basic posture data corresponding to the first frame of image is obtained, the basic posture data corresponding to the first frame of image is stored, so that the stored basic posture data can be directly extracted when the target action corresponding to the target human body image needs to be determined based on the basic posture data subsequently.
Step 2024: and determining a target action corresponding to the target human body image based on a comparison result between the target posture data and the basic posture data.
The basic posture data is obtained by identifying key nodes of a human body on a first frame image in a human body video, the first frame image is obtained by carrying out image acquisition on an interactive object standing according to a reference posture, and the reference posture can be regarded as a basic posture. Therefore, by comparing the target posture data with the basic posture data, the difference between the posture of the human body in the target human body image and the reference posture can be determined, and further the target action corresponding to the target human body image can be determined.
In an exemplary embodiment, the action corresponding to the first frame image in the human body video is called standing. The target action corresponding to any target human body image evolves on the basis of standing.
In one possible implementation, the target pose data includes at least one of a target leg height, a target body width, a target body height, a target left ankle ordinate, and a target right ankle ordinate, and the base pose data includes at least one of a base leg height, a base body width, a base body height, a base left ankle ordinate, and a base right ankle ordinate. In this case, determining the target action corresponding to the target human body image based on the comparison result between the target posture data and the basic posture data includes, but is not limited to, the following four cases:
case 1: in response to the target leg height being less than the product of the base leg height and the first value, determining that the target action comprises walking.
The first value is a positive number smaller than a first reference value, which is illustratively 1. The first value is set empirically or flexibly adjusted according to an application scenario, which is not limited in the embodiment of the present application, and the first value is exemplarily 0.3.
Illustratively, the target leg height is represented as newBigKneelLength, the base leg height is represented as oriBigKneelLength, and the first value is represented as n1; then, in response to newBigKneelLength < oriBigKneelLength * n1, it is determined that the target action comprises walking. It should be noted that the target action including walking may mean that the target action includes only walking, or that the target action includes, besides walking, other actions that are not mutually exclusive with walking, which is not limited in the embodiments of the present application.
Case 2: in response to the target body width being less than the product of the base body width and the second value and the target left ankle ordinate being less than the target right ankle ordinate, determining that the target action comprises a left turn; in response to the target body width being less than the product of the base body width and the second value and the target left ankle ordinate being greater than the target right ankle ordinate, determining that the target action comprises a right turn.
The second value is a positive number smaller than a second reference value, which is the same as or different from the first reference value, and is illustratively 1. The second value is set empirically or flexibly adjusted according to an application scenario, which is not limited in the embodiment of the present application, and the second value is exemplarily 0.5. In addition, the second value may be the same as or different from the first value, and this is not limited in this embodiment of the application.
Illustratively, the target body width is denoted as newBodyWidth, the base body width is denoted as oriBodyWidth, the second value is denoted as n2, the target left ankle ordinate is expressed as left_ankle.y, and the target right ankle ordinate is expressed as right_ankle.y. In this case, when newBodyWidth < oriBodyWidth * n2 and ankle_dy < 0, it is determined that the target action comprises a left turn; when newBodyWidth < oriBodyWidth * n2 and ankle_dy > 0, it is determined that the target action comprises a right turn. Here, ankle_dy = left_ankle.y - right_ankle.y, where left_ankle.y represents the target left ankle ordinate and right_ankle.y represents the target right ankle ordinate.
Case 3: in response to the target body height being less than the product of the base body height and the third value, determining that the target action comprises squatting.
The third value is a positive number smaller than a third reference value, which is the same as the first reference value or the second reference value, or is different from both the first reference value and the second reference value, and is exemplarily 1. The third value is set empirically or flexibly adjusted according to an application scenario, which is not limited in the embodiments of the present application; exemplarily, the third value is 0.7. In addition, the third value may be the same as the first value or the second value, or may be different from both the first value and the second value, which is not limited in the embodiments of the present application.
Illustratively, the target body height is denoted as newBodyHeight, the base body height is denoted as oriBodyHeight, and the third value is denoted as n3; then, when newBodyHeight < oriBodyHeight * n3, it is determined that the target action comprises squatting.
Case 4: in response to the target left ankle ordinate being less than the base left ankle ordinate and the target right ankle ordinate being less than the base right ankle ordinate, determining that the target action comprises a jump-up.
Exemplarily, the target left ankle ordinate is represented as left_ankle.y, the base left ankle ordinate as left_ankle.y', the target right ankle ordinate as right_ankle.y, and the base right ankle ordinate as right_ankle.y'; then, when left_ankle_dy < 0 and right_ankle_dy < 0, it is determined that the target action comprises a jump-up. Here, left_ankle_dy = left_ankle.y - left_ankle.y' and right_ankle_dy = right_ankle.y - right_ankle.y'.
In one possible implementation manner, if the target posture data is the same as the basic posture data or the difference degree between the target posture data and the basic posture data is smaller than the degree threshold, standing is taken as a target action corresponding to the target human body image.
In an exemplary embodiment, if it cannot be accurately determined which of the target motions is the target motion according to the target posture data, it is considered that the interactive object is in a transition state of a conversion motion at the acquisition time of the target human body image.
It should be noted that mutually exclusive actions exist among standing, squatting, jumping, left turning, right turning and walking, such as squatting and jumping; non-mutually exclusive actions also exist, such as left turning and walking. The target action may include one or more non-mutually exclusive actions, which is not limited in the embodiments of the present application. It should be further noted that, since the various actions evolve on the basis of standing, standing is mutually exclusive with the other actions; that is, when the target action includes standing, the target action does not include any other action, i.e., the target action includes only standing. In the embodiments of the present application, if the target action includes only standing, the target action is referred to as standing.
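For illustration only, the four comparison cases and the standing default described above can be sketched as follows. The field names, action labels and default thresholds (the exemplary values 0.3, 0.5 and 0.7 mentioned above) are assumptions for this example, and the mutual-exclusion and transition-state handling described above is omitted for brevity.

```typescript
type Action = 'walk' | 'turnLeft' | 'turnRight' | 'squat' | 'jump' | 'stand';

interface PoseData {
  legHeight: number;
  bodyWidth: number;
  bodyHeight: number;
  leftAnkleY: number;
  rightAnkleY: number;
}

function determineActions(target: PoseData, base: PoseData,
                          n1 = 0.3, n2 = 0.5, n3 = 0.7): Action[] {
  const actions: Action[] = [];
  // Case 1: leg height shrinks -> walking.
  if (target.legHeight < base.legHeight * n1) actions.push('walk');
  // Case 2: body width shrinks -> turning; the ankle ordinates decide the side.
  if (target.bodyWidth < base.bodyWidth * n2) {
    if (target.leftAnkleY < target.rightAnkleY) actions.push('turnLeft');
    else if (target.leftAnkleY > target.rightAnkleY) actions.push('turnRight');
  }
  // Case 3: body height shrinks -> squatting.
  if (target.bodyHeight < base.bodyHeight * n3) actions.push('squat');
  // Case 4: both ankle ordinates decrease relative to the base pose -> jump-up.
  if (target.leftAnkleY < base.leftAnkleY && target.rightAnkleY < base.rightAnkleY) {
    actions.push('jump');
  }
  // No case triggered: treat the pose as (close to) the reference pose, i.e. standing.
  return actions.length > 0 ? actions : ['stand'];
}
```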
In step 203, target virtual environment picture data matched with the target action is acquired, and the target virtual environment picture is displayed based on the target virtual environment picture data.
After the target action corresponding to the target human body image is determined, the target virtual environment picture data matched with the target action are obtained, then the target virtual environment picture is displayed based on the target virtual environment picture data, and an immersive visual experience can be provided for the interactive object.
In one possible implementation manner, the process of acquiring the target virtual environment picture data matched with the target action is as follows: determining a target state parameter of a virtual camera in the virtual environment based on the target action; and taking the virtual environment picture data acquired by the virtual camera with the target state parameters as target virtual environment picture data matched with the target action.
The target state parameter refers to a state parameter matched with the target action and possessed by the virtual camera, and the virtual environment picture data acquired by the virtual camera possessing the target state parameter can be regarded as the virtual environment picture data matched with the target action.
In one possible implementation, based on the target action, the process of determining the target state parameter of the virtual camera in the virtual environment is: responding to the target action meeting a first condition, and taking a reference state parameter as a target state parameter of the virtual camera, wherein the reference state parameter is the state parameter of the virtual camera determined based on a reference human body image, and the reference human body image is an image of a human body video of the interactive object, which is positioned in a frame before the target human body image; and responding to the target action meeting the second condition, adjusting the reference state parameter, and taking the state parameter obtained after adjustment as the target state parameter of the virtual camera. Wherein the second condition is different from the first condition.
The reference human body image is an image of a previous frame of the target human body image in the human body video of the interactive object, and the reference state parameter is a state parameter of the virtual camera determined based on the reference human body image. That is, before determining the target state parameters of the virtual camera based on the target human body image, the state parameters of the virtual camera determined based on the reference human body image located at the frame before the target human body image in the human body video of the interactive object have been previously acquired. The reference state parameter is the latest state parameter of the virtual camera before the target human body image is processed.
The first condition is a condition under which the state parameters of the virtual camera do not need to be adjusted; when the target action meets the first condition, the reference state parameters do not need to be adjusted, that is, the reference state parameters are directly used as the target state parameters of the virtual camera. The second condition is a condition under which the state parameters of the virtual camera need to be adjusted; when the target action meets the second condition, it indicates that the reference state parameters need to be adjusted, that is, the state parameters obtained after the reference state parameters are adjusted are used as the target state parameters of the virtual camera.
In one possible implementation, the first condition is satisfied, including: the target action does not include walking, the target action and the reference action meet the matching condition, and the reference action is the action corresponding to the reference human body image. A second condition is satisfied comprising: the target action comprises walking, and the walking direction corresponding to the target action is the propelling direction of the virtual camera; alternatively, the target action does not include walking and the matching condition is not satisfied between the target action and the reference action.
Satisfying the matching condition indicates that, on the premise that the target action does not include walking, the change from the reference action to the target action does not require adjusting the state parameters of the virtual camera. In one possible implementation, satisfying the matching condition means that the target action and the reference action are completely consistent, or that the target action is standing and the reference action is walking.
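For illustration only, the decision between the first condition and the second condition can be sketched as follows; the action labels are assumptions, walkDirPushable stands for the result of the propellable-direction judgment described below, and the "keep and prompt" branch anticipates the prompt information discussed later in this description.

```typescript
type Action = 'walk' | 'turnLeft' | 'turnRight' | 'squat' | 'jump' | 'stand';
type Decision = 'reuseReference' | 'adjustReference' | 'keepAndPrompt';

// Matching condition: identical actions, or standing right after walking.
function matchesReference(target: Action[], reference: Action[]): boolean {
  const identical = target.length === reference.length &&
                    target.every(a => reference.includes(a));
  const standAfterWalk = target.length === 1 && target[0] === 'stand' &&
                         reference.includes('walk');
  return identical || standAfterWalk;
}

function decide(target: Action[], reference: Action[],
                walkDirPushable: boolean): Decision {
  if (target.includes('walk')) {
    // Second condition when the walking direction can be advanced; otherwise the
    // camera is kept still and the interactive object is prompted.
    return walkDirPushable ? 'adjustReference' : 'keepAndPrompt';
  }
  // First condition: no walking and the matching condition holds.
  return matchesReference(target, reference) ? 'reuseReference' : 'adjustReference';
}
```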
When the target action comprises walking, whether the walking direction corresponding to the target action is the propelling direction of the virtual camera needs to be judged. The walking direction corresponding to the target action is determined according to the target action. Illustratively, the target action includes walking including three cases: the target action includes only walking (i.e., the target action is walking); the target action comprises walking and left turning; the target actions include walking and right turning. In different cases, the walking direction corresponding to the target motion is also different.
Exemplarily, if the target motion only includes walking, the walking direction corresponding to the target motion is the default facing direction of the virtual camera; if the target action comprises walking and left turning, the walking direction corresponding to the target action is the orientation direction obtained after the default orientation direction of the virtual camera is adjusted according to the left turning action; if the target action comprises walking and right turning, the walking direction corresponding to the target action is the orientation direction obtained by adjusting the default orientation direction of the virtual camera according to the right turning action.
After the walking direction corresponding to the target action is determined, whether the walking direction corresponding to the target action is the propelling direction of the virtual camera or not can be judged, and the judging mode is as follows: if the virtual camera touches an obstacle in the process of propelling the reference distance to the walking direction corresponding to the target action, determining the walking direction corresponding to the target action as the non-propelling direction of the virtual camera; and if the virtual camera does not touch the obstacle in the process of propelling the reference distance to the walking direction corresponding to the target action, determining the walking direction corresponding to the target action as the propellable direction of the virtual camera.
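For illustration only, the judgment of whether the walking direction is a propellable direction can be sketched as follows; the hitsObstacle helper is a hypothetical collision query standing in for whatever obstacle test the virtual environment actually provides.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Advance from `position` by `referenceDistance` along `walkDir` and report whether
// the path is free of obstacles; `hitsObstacle` is a hypothetical collision query.
function isPushable(position: Vec3, walkDir: Vec3, referenceDistance: number,
                    hitsObstacle: (from: Vec3, to: Vec3) => boolean): boolean {
  const to: Vec3 = {
    x: position.x + walkDir.x * referenceDistance,
    y: position.y + walkDir.y * referenceDistance,
    z: position.z + walkDir.z * referenceDistance,
  };
  return !hitsObstacle(position, to);
}
```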
In the reference state parameter, the facing direction of the virtual camera may be the same as the walking direction corresponding to the target motion, or may be different from the walking direction corresponding to the target motion, which is related to the actual situation of the reference motion and the target motion, and this is not limited in the embodiment of the present application.
In one possible implementation, the state parameters of the virtual camera include height, planar position, and orientation direction. The height and the plane position are used for indicating the spatial position of the virtual camera in the virtual environment, and the orientation direction is used for indicating the acquisition direction of the virtual camera in the virtual environment. In the embodiment of the application, the target state parameters of the virtual camera comprise a target height, a target plane position and a target orientation direction; the reference state parameters of the virtual camera include a reference height, a reference plane position, and a reference orientation direction. Adjusting the reference state parameter may refer to adjusting one or more of a reference height, a reference plane position, and a reference orientation direction.
In one possible implementation manner, in response to that the target action satisfies the second condition, the reference state parameter is adjusted, and an implementation manner that the state parameter obtained after the adjustment is used as the target state parameter of the virtual camera is as follows: in response to the target action satisfying the second condition, determining a target adjustment mode based on the target action and the reference action; and adjusting the reference state parameters according to a target adjustment mode, and taking the state parameters obtained after adjustment as the target state parameters of the virtual camera.
The target adjustment mode is an adjustment mode which is determined by analyzing the target action and the reference action and is matched with the target action and the reference action. The target adjustment mode is related to the specific situations of the target motion and the reference motion, and the determined target adjustment mode is different under different situations of the target motion and the reference motion, which is not limited in the embodiment of the present application. Next, by way of specific example, a relationship between the target adjustment manner, the target action, and the reference action will be described.
Illustratively, when the target action is squatting and the reference action is standing, the target adjustment mode is as follows: the reference height in the reference state parameter is reduced by a first height, leaving the other parameters in the reference state parameter unchanged. The first height is set empirically or flexibly adjusted according to an application scenario, which is not limited in the embodiment of the present application.
For example, when the target motion is a jump and the reference motion is a standing motion, the target adjustment method is as follows: the reference height in the reference state parameter is increased by a second height, keeping the other parameters in the reference state parameter unchanged. The second height is set empirically or flexibly adjusted according to an application scenario, which is not limited in the embodiment of the present application. The second height may be the same as or different from the first height.
For example, when the target motion is a left turn and the reference motion is standing, the target adjustment method is as follows: the reference orientation direction in the reference state parameters is rotated to the left by a first angle, and the other parameters in the reference state parameters are kept unchanged. The first angle is set empirically or flexibly adjusted according to an application scenario, which is not limited in the embodiment of the present application.
For example, when the target motion is a right turn and the reference motion is standing, the target adjustment method is as follows: the reference orientation direction in the reference state parameters is rotated to the right by a second angle, and the other parameters in the reference state parameters are kept unchanged. The second angle is set empirically or flexibly adjusted according to an application scenario, which is not limited in the embodiment of the present application. The second angle may or may not be the same as the first angle.
For example, when the target motion is walking, the walking direction is the pushable direction of the virtual camera, and the reference motion is standing, the target adjustment mode is as follows: and pushing the reference plane position in the reference state parameter to the walking direction by a reference distance, and keeping other parameters in the reference state parameter unchanged. The reference distance is set empirically or flexibly adjusted according to an application scenario, which is not limited in the embodiment of the present application. Illustratively, the reference distance refers to a product of a reference speed and a reference time length, which are empirically set.
It should be noted that, the foregoing is only an exemplary description of the relationship between the target adjustment manner, the target action and the reference action, and the embodiment of the present application is not limited to this, and in the exemplary embodiment, the relationship between the target adjustment manner, the target action and the reference action may also be other cases, and the embodiment of the present application is not described one by one.
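For illustration only, the exemplary adjustment manners above (with the reference motion being standing) can be sketched as follows; the first height, second height, first angle, second angle and reference distance are illustrative constants rather than values fixed by this description, and the camera state fields are assumptions for this example.

```typescript
type StandingEvolvedAction = 'squat' | 'jump' | 'turnLeft' | 'turnRight' | 'walk';

interface CameraState {
  height: number;                      // spatial height in the virtual environment
  planePos: { x: number; z: number };  // plane position in the virtual environment
  yawDeg: number;                      // orientation direction, as a yaw angle in degrees
}

function adjustFromStanding(ref: CameraState, target: StandingEvolvedAction,
                            opts = { firstHeight: 0.5, secondHeight: 0.5,
                                     firstAngle: 15, secondAngle: 15,
                                     referenceDistance: 0.5 }): CameraState {
  const next: CameraState = { ...ref, planePos: { ...ref.planePos } };
  switch (target) {
    case 'squat':     next.height -= opts.firstHeight;  break; // lower the camera
    case 'jump':      next.height += opts.secondHeight; break; // raise the camera
    case 'turnLeft':  next.yawDeg -= opts.firstAngle;   break; // rotate orientation to the left
    case 'turnRight': next.yawDeg += opts.secondAngle;  break; // rotate orientation to the right
    case 'walk': {                                             // advance the plane position
      const rad = (next.yawDeg * Math.PI) / 180;
      next.planePos.x += Math.sin(rad) * opts.referenceDistance;
      next.planePos.z += Math.cos(rad) * opts.referenceDistance;
      break;
    }
  }
  return next;
}
```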
In a possible implementation manner, when the target action comprises walking and the walking direction corresponding to the target action is the non-pushable direction of the virtual camera, displaying first prompt information, wherein the first prompt information is used for prompting the interactive object to adjust the walking direction. That is to say, if it is determined that the walking direction corresponding to the target action is the non-pushable direction of the virtual camera, the interactive object is prompted in time to adjust the walking direction, so that the interactive experience of the interactive object and the virtual environment is ensured. In the embodiment of the present application, the representation form of the first prompt information is not limited, and exemplarily, the first prompt information is represented in the form of a text popup.
In an exemplary embodiment, for the case that the target motion includes walking and the walking direction corresponding to the target motion is a non-pushable direction of the virtual camera, the target plane position in the target state parameters of the virtual camera coincides with the reference plane position in the reference state parameters, that is, the plane position of the virtual camera is guaranteed to be unchanged. In an exemplary embodiment, if the reference motion is standing and the target motion is walking, then when the walking direction corresponding to the target motion is a non-pushable direction of the virtual camera, the other reference state parameters are also kept unchanged in addition to the reference plane position, that is, the virtual camera is kept still.
In a possible implementation manner, for the condition that the target state parameter of the virtual camera is the state parameter obtained after the reference state parameter is adjusted, after the target state parameter of the virtual camera is determined, the reference state parameter is replaced by the target state parameter, and the latest state parameter of the virtual camera is recorded at the same time, so that a data basis is laid for the subsequent adjustment of the state parameters of the virtual camera.
Exemplarily, a process of adjusting the reference state parameters of the virtual camera is shown in fig. 6. Assuming that the reference action is standing: when the target action is jumping or squatting evolved from standing, a change of the reference height in the reference state parameters of the virtual camera is triggered; when the target action is a left turn or a right turn evolved from standing, rotation of the reference orientation direction in the reference state parameters of the virtual camera is triggered; when the target action is walking evolved from standing, whether the walking direction is a propellable direction of the virtual camera is judged. When the walking direction is a propellable direction of the virtual camera, advancement of the reference plane position in the reference state parameters of the virtual camera is triggered. When the walking direction is a non-propellable direction of the virtual camera, the virtual camera is kept still, and the interactive object is prompted to adjust the walking direction. In the case that the reference state parameters of the virtual camera are changed, the latest state parameters of the virtual camera are updated and recorded in time.
After the target state parameters of the virtual cameras are determined, the terminal takes the virtual environment picture data collected by the virtual cameras with the target state parameters as target virtual environment picture data matched with the target actions. And after the target virtual environment picture data matched with the target action is acquired, the terminal displays the target virtual environment picture based on the target virtual environment picture data. The displayed target virtual environment picture is a virtual environment picture matched with the target action corresponding to the target human body image, and the displayed target virtual environment picture is highly matched with the real action of the interactive object.
Exemplarily, the process of displaying the target virtual environment screen by the terminal based on the target virtual environment screen data means that the terminal renders the target virtual environment screen data and displays the rendered target virtual environment screen.
Illustratively, after the target virtual environment picture is displayed based on the target virtual environment picture data, an image positioned in a frame after the target human body image in the human body video of the interactive object is taken as a new target human body image, and then the steps 201 to 203 are continuously executed according to the new target human body image to continuously display the virtual environment picture matched with the new target human body image, and the loop is executed until the last frame image in the human body video is taken as the new target human body image and the steps 201 to 203 are executed.
Illustratively, the display process of the virtual environment picture is as shown in fig. 7. The interactive object authorizes the target camera to be opened; the terminal creates a video tag and feeds the human body video collected by the target camera into the video tag; a timing controller is created; the timing controller is used to feed the human body image under the video tag into a human body key node recognition model for human body key node recognition, where the human body key node recognition model is a PoseNet model downloaded in advance by the terminal; basic posture data is obtained based on the first frame image in the human body video; the change of the posture data corresponding to the human body image of the interactive object relative to the basic posture data is obtained; the action corresponding to the human body image is determined, such as walking, left turning, right turning, squatting, jumping or standing; according to the action, the state parameters of the virtual camera in the virtual environment are driven to be adjusted, where the adjustment manners include plane position advancement, height change, rotation of the orientation direction, and the like; and the virtual environment picture is displayed based on the virtual environment picture data collected by the virtual camera with the adjusted state parameters.
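For illustration only, the flow above can be sketched with the PoseNet model of TensorFlow.js as follows. The package names and the load / estimateSinglePose signatures are assumptions based on the publicly documented @tensorflow-models/posenet API, and extractPoseData, determineActions and updateCamera are hypothetical placeholders for the steps covered earlier in this description.

```typescript
import '@tensorflow/tfjs';                                  // registers the runtime backend
import * as posenet from '@tensorflow-models/posenet';

// Hypothetical hooks standing in for the steps described above.
declare function extractPoseData(
  keypoints: Array<{ part: string; position: { x: number; y: number } }>): unknown;
declare function determineActions(target: unknown, base: unknown): string[];
declare function updateCamera(actions: string[]): void;

async function runLoop(video: HTMLVideoElement): Promise<void> {
  const net = await posenet.load();                         // recognition model downloaded in advance
  const firstPose = await net.estimateSinglePose(video);    // first frame: reference posture
  const basePoseData = extractPoseData(firstPose.keypoints);

  const tick = async () => {
    const pose = await net.estimateSinglePose(video);       // current target human body image
    const targetPoseData = extractPoseData(pose.keypoints);
    updateCamera(determineActions(targetPoseData, basePoseData));
    requestAnimationFrame(tick);                            // plays the role of the timing controller
  };
  requestAnimationFrame(tick);
}
```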
In the embodiment of the application, after the terminal uses the target camera to perform video collection on the interactive object, a PoseNet model under the TensorFlow platform is used to perform human body key node recognition, the change of the posture data of the interactive object is obtained, and several human body actions, such as walking, left/right turning, squatting/jumping and standing, are then judged. In combination with a virtual online scene (namely, a virtual environment), the virtual camera is bound so that the virtual environment can be toured in a real walking manner, as if the interactive object were walking through a physical scene. This meets the need of touring different virtual scenes in front of an indoor terminal, can solve to a certain extent the problem that the interactive object cannot arrive at the scene in person, can reduce to a certain extent the pressure of on-site visits, and can also satisfy the interactive object's interest in exercising.
By way of example, the virtual environment may refer to a map environment, an exhibition hall environment, and a house interior environment. As shown in fig. 8, according to the method for displaying a virtual environment screen provided in the embodiment of the present application, an interactive object can navigate a map environment in a real walking manner in front of an indoor terminal, and in fig. 8, in addition to a map environment screen 801, a human body image 802 of the interactive object is displayed.
Exemplarily, as shown in fig. 9, based on the display method of the virtual environment picture provided by the embodiment of the present application, the interactive object can visit the exhibition hall environment in front of the indoor terminal in a real walking manner, and in fig. 9, an exhibition hall environment picture 901 is displayed. As shown in fig. 10, according to the method for displaying a virtual environment screen provided in the embodiment of the present application, an interactive object can navigate a house internal environment in front of an indoor terminal in a real walking manner, and in fig. 10, a house internal environment screen 1001 is displayed.
In the embodiment of the application, the terminal displays the target virtual environment picture based on the target virtual environment picture data matched with the target action, the target action is automatically determined according to the target human body image of the interactive object, and the real action of the interactive object can be reflected. In addition, because the target virtual environment picture data is matched with the target action capable of reflecting the real action of the interactive object, the displayed target virtual environment picture has high matching degree with the real action of the interactive object, can bring better experience to the interactive object, and is beneficial to improving the interaction rate between the interactive object and the virtual environment.
Referring to fig. 11, an embodiment of the present application provides a display apparatus for a virtual environment screen, including:
a first obtaining unit 1101, configured to obtain a target human body image of an interactive object, where the interactive object interacts with a virtual environment through a virtual environment picture;
a determining unit 1102, configured to perform human key node identification on the target human image to determine a target action corresponding to the target human image;
a second acquiring unit 1103 configured to acquire target virtual environment screen data that matches the target motion;
a display unit 1104 for displaying the target virtual environment screen based on the target virtual environment screen data.
In one possible implementation manner, the target human body image is other images except for the first frame image in the human body video of the interactive object; a determining unit 1102, configured to perform human key node identification on the target human image to obtain a target human key node identification result; obtaining target posture data corresponding to the target human body image based on the target human body key node identification result; extracting basic attitude data corresponding to the first frame of image; and determining a target action corresponding to the target human body image based on the comparison result between the target posture data and the basic posture data.
In one possible implementation, the target pose data includes at least one of a target leg height, a target body width, a target body height, a target left ankle ordinate, and a target right ankle ordinate, and the base pose data includes at least one of a base leg height, a base body width, a base body height, a base left ankle ordinate, and a base right ankle ordinate; a determining unit 1102, further configured to determine that the target action comprises walking in response to the target leg height being less than the product of the base leg height and the first value; in response to the target body width being less than the product of the base body width and the second value and the target left ankle ordinate being less than the target right ankle ordinate, determining that the target action comprises a left turn; in response to the target body width being less than the product of the base body width and the second value and the target left ankle ordinate being greater than the target right ankle ordinate, determining that the target action comprises a right turn; in response to the target body height being less than the product of the base body height and the third value, determining that the target action comprises squatting; in response to the target left ankle ordinate being less than the base left ankle ordinate and the target right ankle ordinate being less than the base right ankle ordinate, determining that the target action comprises a jump-up.
In one possible implementation, the second obtaining unit 1103 is configured to determine, based on the target motion, a target state parameter of a virtual camera in the virtual environment; and taking the virtual environment picture data acquired by the virtual camera with the target state parameters as target virtual environment picture data matched with the target action.
In one possible implementation manner, the target human body image is other images except the first frame image in the human body video of the interactive object; the second obtaining unit 1103 is further configured to, in response to that the target action meets the first condition, use the reference state parameter as a target state parameter of the virtual camera, where the reference state parameter is a state parameter of the virtual camera determined based on the reference human body image, and the reference human body image is an image of a frame before the target human body image in the human body video of the interactive object; and responding to the target action meeting a second condition, adjusting the reference state parameter, and taking the state parameter obtained after adjustment as the target state parameter of the virtual camera, wherein the second condition is different from the first condition.
In one possible implementation, the first condition is satisfied, including: the target action does not comprise walking, the target action and the reference action meet the matching condition, and the reference action is the action corresponding to the reference human body image; a second condition is satisfied comprising: the target action comprises walking, and the walking direction corresponding to the target action is the propelling direction of the virtual camera; or the target action does not comprise walking and the matching condition between the target action and the reference action is not met; wherein, satisfying the matching condition includes: the target motion and the reference motion are completely identical, or the target motion is standing and the reference motion is walking.
In one possible implementation manner, the second obtaining unit 1103 is further configured to determine, in response to the target motion satisfying the second condition, a target adjustment manner based on the target motion and the reference motion; and adjusting the reference state parameters according to a target adjustment mode, and taking the state parameters obtained after adjustment as the target state parameters of the virtual camera.
In one possible implementation manner, the display unit 1104 is further configured to display a first prompt message in response to that the target action includes walking and a walking direction corresponding to the target action is a non-propulsion direction of the virtual camera, where the first prompt message is used for prompting the interactive object to adjust the walking direction.
In one possible implementation, referring to fig. 12, the apparatus further includes:
a replacing unit 1105 for replacing the reference state parameter with the target state parameter.
In a possible implementation manner, the determining unit 1102 is further configured to obtain a human body key node identification model; and calling a human body key node identification model to identify human body key nodes of the target human body image.
In a possible implementation manner, the determining unit 1102 is further configured to perform human key node identification on a first frame image in a human video of the interactive object, and obtain basic pose data corresponding to the first frame image based on a basic human key node identification result;
referring to fig. 12, the apparatus further comprises:
a storage unit 1106, configured to store the basic pose data corresponding to the first frame image.
In one possible implementation manner, the display unit 1104 is further configured to display camera invocation authorization information in response to the camera invocation request; responding to a confirmation instruction of calling the authorization information by the camera, and displaying second prompt information, wherein the second prompt information is used for prompting the interactive object to stand according to the reference posture;
referring to fig. 12, the apparatus further includes:
the calling unit 1107 is configured to respond to the interactive object standing in the reference posture, call the target camera to collect the human body video of the interactive object, and acquire the first frame image in the human body video of the interactive object by performing image collection on the interactive object standing in the reference posture.
In the embodiment of the application, the terminal displays the target virtual environment picture based on the target virtual environment picture data matched with the target action, the target action is automatically determined according to the target human body image of the interactive object, and the real action of the interactive object can be reflected. In addition, because the target virtual environment picture data is matched with the target action capable of reflecting the real action of the interactive object, the displayed target virtual environment picture has high matching degree with the real action of the interactive object, can bring better experience to the interactive object, and is beneficial to improving the interaction rate between the interactive object and the virtual environment.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Fig. 13 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal may be: a smartphone, a tablet computer, a laptop computer, or a desktop computer. A terminal may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, etc.
Generally, a terminal includes: a processor 1301 and a memory 1302.
Processor 1301 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 1301 may be implemented in at least one of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1301 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1301 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing content that the display screen needs to display. In some embodiments, processor 1301 may further include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 1302 may include one or more computer-readable storage media, which may be non-transitory. Memory 1302 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1302 is used to store at least one instruction for execution by processor 1301 to implement the method of displaying a virtual environment screen provided by method embodiments herein.
In some embodiments, the terminal may further optionally include: a peripheral interface 1303 and at least one peripheral. The processor 1301, memory 1302 and peripheral interface 1303 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral device interface 1303 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1304, display screen 1305, camera assembly 1306, audio circuitry 1307, positioning assembly 1308, and power supply 1309.
Peripheral interface 1303 can be used to connect at least one peripheral related to I/O (Input/Output) to processor 1301 and memory 1302. In some embodiments, processor 1301, memory 1302, and peripheral interface 1303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1301, the memory 1302, and the peripheral device interface 1303 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 1304 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. Radio frequency circuit 1304 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1304 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1304 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Radio frequency circuit 1304 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1304 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1305 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1305 is a touch display screen, the display screen 1305 also has the ability to capture touch signals on or over the surface of the display screen 1305. The touch signal may be input to the processor 1301 as a control signal for processing. At this point, the display 1305 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 1305 may be one, disposed on a front panel of the terminal; in other embodiments, the display 1305 may be at least two, respectively disposed on different surfaces of the terminal or in a folded design; in other embodiments, the display 1305 may be a flexible display disposed on a curved surface or on a folded surface of the terminal. Even further, the display 1305 may be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The Display 1305 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or the like.
The camera assembly 1306 is used to capture images or video. Optionally, camera assembly 1306 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of a terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera head assembly 1306 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 1307 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1301 for processing, or inputting the electric signals to the radio frequency circuit 1304 for realizing voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1301 or the radio frequency circuitry 1304 into sound waves. The loudspeaker can be a traditional film loudspeaker and can also be a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 1307 may also include a headphone jack.
The positioning component 1308 is used for locating the current geographic position of the terminal for navigation or LBS (Location Based Service). The Positioning component 1308 may be a Positioning component based on the GPS (Global Positioning System) of the united states, the beidou System of china, the graves System, or the galileo System of the european union.
The power supply 1309 is used to supply power to the various components in the terminal. The power supply 1309 may be alternating current, direct current, disposable or rechargeable batteries. When the power source 1309 comprises a rechargeable battery, the rechargeable battery can support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal also includes one or more sensors 1310. The one or more sensors 1310 include, but are not limited to: acceleration sensor 1311, gyro sensor 1312, pressure sensor 1313, fingerprint sensor 1314, optical sensor 1315, and proximity sensor 1316.
The acceleration sensor 1311 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal. For example, the acceleration sensor 1311 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1301 may control the display screen 1305 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1311. The acceleration sensor 1311 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1312 may detect a body direction and a rotation angle of the terminal, and the gyro sensor 1312 may cooperate with the acceleration sensor 1311 to collect a 3D motion of the user with respect to the terminal. From the data collected by gyroscope sensor 1312, processor 1301 may perform the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization while shooting, game control, and inertial navigation.
The pressure sensors 1313 may be provided on the side frame of the terminal and/or on the lower layer of the display 1305. When the pressure sensor 1313 is disposed on the side frame of the terminal, a user holding signal of the terminal may be detected, and the processor 1301 performs left-right hand recognition or shortcut operation according to the holding signal acquired by the pressure sensor 1313. When the pressure sensor 1313 is disposed at a lower layer of the display screen 1305, the processor 1301 controls an operability control on the UI interface according to a pressure operation of the user on the display screen 1305. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1314 is used for collecting the fingerprint of the user, and the processor 1301 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 1314, or the fingerprint sensor 1314 identifies the identity of the user according to the collected fingerprint. When the identity of the user is identified as a trusted identity, the processor 1301 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 1314 may be disposed on the front, back, or side of the terminal. When a physical button or vendor Logo is provided on the terminal, the fingerprint sensor 1314 may be integrated with the physical button or vendor Logo.
The optical sensor 1315 is used to collect ambient light intensity. In one embodiment, the processor 1301 may control the display brightness of the display screen 1305 according to the ambient light intensity collected by the optical sensor 1315. Specifically, when the ambient light intensity is high, the display luminance of the display screen 1305 is increased; when the ambient light intensity is low, the display brightness of the display screen 1305 is turned down. In another embodiment, the processor 1301 can also dynamically adjust the shooting parameters of the camera assembly 1306 according to the ambient light intensity collected by the optical sensor 1315.
A proximity sensor 1316, also known as a distance sensor, is typically provided on the front panel of the terminal. The proximity sensor 1316 is used to collect the distance between the user and the front face of the terminal. In one embodiment, the display 1305 is controlled by the processor 1301 to switch from the bright screen state to the dark screen state when the proximity sensor 1316 detects that the distance between the user and the front face of the terminal gradually decreases; the display 1305 is controlled by the processor 1301 to switch from the dark screen state to the bright screen state when the proximity sensor 1316 detects that the distance between the user and the front face of the terminal gradually increases.
Those skilled in the art will appreciate that the configuration shown in fig. 13 is not intended to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer device is also provided, the computer device comprising a processor and a memory, the memory having at least one computer program stored therein. The at least one computer program is loaded and executed by one or more processors to implement any of the above-described methods for displaying a virtual environment screen.
In an exemplary embodiment, there is also provided a computer-readable storage medium having at least one computer program stored therein, the at least one computer program being loaded and executed by a processor of a computer device to implement any one of the above-described methods for displaying a virtual environment screen.
In one possible implementation, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes any one of the above-mentioned display methods of the virtual environment picture.
It should be noted that the terms "first," "second," and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The implementations described in the exemplary embodiments above do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (12)

1. A method for displaying a virtual environment picture, the method comprising:
acquiring a target human body image of an interactive object, wherein the interactive object interacts with a virtual environment through a virtual environment picture, and the target human body image is an image other than the first frame image in a human body video of the interactive object;
carrying out human body key node identification on the target human body image to determine a target action corresponding to the target human body image;
in response to the target action not including walking and a matching condition being met between the target action and a reference action, taking a reference state parameter as a target state parameter of a virtual camera, wherein the reference action is an action corresponding to a reference human body image, meeting the matching condition comprises the target action being completely consistent with the reference action, or the target action being standing and the reference action being walking, the reference state parameter is a state parameter of the virtual camera determined based on the reference human body image, and the reference human body image is an image of a frame located before the target human body image in the human body video of the interactive object;
in response to the target action including walking and the walking direction corresponding to the target action being the propulsion direction of the virtual camera, or in response to the target action not including walking and the matching condition not being met between the target action and the reference action, adjusting the reference state parameter, and taking the adjusted state parameter as the target state parameter of the virtual camera;
using the virtual environment picture data acquired by the virtual camera with the target state parameters as target virtual environment picture data matched with the target action;
and displaying a target virtual environment picture based on the target virtual environment picture data, wherein different target actions are matched with different target virtual environment picture data, and the different target virtual environment picture data are used for displaying different target virtual environment pictures.
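By way of a non-limiting illustration, the per-frame control flow of claim 1 can be summarised in the Python sketch below. The single-label action strings, the adjust() callback and the dictionary representation of the state parameter are assumptions introduced here for illustration; the case of walking in a non-propulsion direction, which claim 1 leaves open, is handled by a prompt in claim 5.

    # Non-normative sketch of the per-frame decision in claim 1.
    # Action labels, adjust() and the state representation are assumptions.
    def matching_condition(target_action: str, reference_action: str) -> bool:
        """Matching condition: actions are completely consistent, or standing follows walking."""
        return (target_action == reference_action or
                (target_action == "standing" and reference_action == "walking"))

    def next_camera_state(target_action: str, walk_direction: str, reference_action: str,
                          reference_state: dict, propulsion_direction: str, adjust) -> dict:
        if target_action != "walking" and matching_condition(target_action, reference_action):
            return reference_state                      # reuse the reference state parameter
        if ((target_action == "walking" and walk_direction == propulsion_direction) or
                (target_action != "walking" and not matching_condition(target_action, reference_action))):
            return adjust(reference_state, target_action, reference_action)
        return reference_state                          # walking in a non-propulsion direction: prompt instead (claim 5)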
2. The method according to claim 1, wherein the carrying out human body key node identification on the target human body image to determine a target action corresponding to the target human body image comprises:
carrying out human body key node identification on the target human body image to obtain a target human body key node identification result;
obtaining target posture data corresponding to the target human body image based on the target human body key node identification result;
extracting basic posture data corresponding to the first frame image;
and determining a target action corresponding to the target human body image based on a comparison result between the target posture data and the basic posture data.
3. The method of claim 2, wherein the target posture data comprises at least one of a target leg height, a target body width, a target body height, a target left ankle ordinate, and a target right ankle ordinate, and the basic posture data comprises at least one of a base leg height, a base body width, a base body height, a base left ankle ordinate, and a base right ankle ordinate;
the determining a target action corresponding to the target human body image based on a comparison result between the target posture data and the basic posture data comprises:
determining that the target action comprises walking in response to the target leg height being less than the product of the base leg height and a first value;
in response to the target body width being less than a product of the base body width and a second value and the target left ankle ordinate being less than the target right ankle ordinate, determining that the target action comprises a left turn;
responsive to the target body width being less than a product of the base body width and a second value, and the target left ankle ordinate being greater than the target right ankle ordinate, determining that the target action comprises a right turn;
in response to the target body height being less than the product of the base body height and a third value, determining that the target action comprises squatting;
in response to the target left ankle ordinate being less than the base left ankle ordinate, and the target right ankle ordinate being less than the base right ankle ordinate, determining that the target action comprises a jump.
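By way of a non-limiting illustration, the comparison rules of claim 3 translate almost directly into code. The dataclass fields below mirror the claimed posture data; FIRST, SECOND and THIRD are placeholder coefficients, since the claim does not fix the first, second and third values.

    # Sketch of the decision rules in claim 3; coefficient values are assumed.
    from dataclasses import dataclass

    FIRST, SECOND, THIRD = 0.9, 0.8, 0.8  # placeholder coefficients

    @dataclass
    class Posture:
        leg_height: float
        body_width: float
        body_height: float
        left_ankle_y: float
        right_ankle_y: float

    def classify(target: Posture, base: Posture) -> set:
        actions = set()
        if target.leg_height < base.leg_height * FIRST:
            actions.add("walking")
        if target.body_width < base.body_width * SECOND:
            # A narrower silhouette suggests the body has turned; the ankle ordinates
            # decide whether it is a left turn or a right turn.
            if target.left_ankle_y < target.right_ankle_y:
                actions.add("left turn")
            elif target.left_ankle_y > target.right_ankle_y:
                actions.add("right turn")
        if target.body_height < base.body_height * THIRD:
            actions.add("squatting")
        if (target.left_ankle_y < base.left_ankle_y and
                target.right_ankle_y < base.right_ankle_y):
            actions.add("jump")
        return actions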
4. The method according to claim 1, wherein the adjusting the reference state parameter and taking the adjusted state parameter as the target state parameter of the virtual camera comprises:
determining a target adjustment mode based on the target action and the reference action;
and adjusting the reference state parameters according to the target adjustment mode, and taking the state parameters obtained after adjustment as the target state parameters of the virtual camera.
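Claim 4 leaves the concrete adjustment modes open. The table-driven Python sketch below is merely one assumed way of selecting an adjustment of the virtual camera's state parameters from the (target action, reference action) pair; the adjustment amounts and the state-parameter keys are illustrative assumptions.

    # Illustrative only: the adjustment amounts and state-parameter keys are assumptions.
    ADJUSTMENTS = {
        ("walking", "standing"):    {"advance": 0.5},   # move the virtual camera forward
        ("left turn", "standing"):  {"yaw": 15.0},
        ("right turn", "standing"): {"yaw": -15.0},
        ("squatting", "standing"):  {"height": -0.4},
        ("jump", "standing"):       {"height": 0.4},
    }

    def adjust(reference_state: dict, target_action: str, reference_action: str) -> dict:
        """Pick the adjustment mode for this (target, reference) pair and apply it."""
        delta = ADJUSTMENTS.get((target_action, reference_action), {})
        adjusted = dict(reference_state)
        for key, value in delta.items():
            adjusted[key] = adjusted.get(key, 0.0) + value
        return adjusted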
5. The method of claim 1, further comprising:
in response to the target action including walking and the walking direction corresponding to the target action being a non-propulsion direction of the virtual camera, displaying first prompt information, wherein the first prompt information is used for prompting the interactive object to adjust the walking direction.
6. The method according to any one of claims 1 to 5, wherein after the adjusted state parameter is used as the target state parameter of the virtual camera, the method further comprises:
replacing the reference state parameter with the target state parameter.
7. The method according to any one of claims 1 to 5, wherein the carrying out human body key node identification on the target human body image comprises:
acquiring a human body key node identification model;
and calling the human body key node identification model to identify the human body key nodes of the target human body image.
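The claim only requires that a human body key node identification model be acquired and called; it does not name one. Purely as an example of such a model, the Python sketch below uses MediaPipe Pose (an off-the-shelf pose-landmark detector) to obtain normalised key node coordinates from a target human body image; the patent itself does not prescribe this library.

    # Example only: MediaPipe Pose stands in for the unspecified key node model.
    import cv2
    import mediapipe as mp

    def identify_key_nodes(image_path: str):
        image_bgr = cv2.imread(image_path)
        image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        with mp.solutions.pose.Pose(static_image_mode=True) as pose_model:
            result = pose_model.process(image_rgb)
        if result.pose_landmarks is None:
            return []
        # Each landmark carries normalised (x, y) image coordinates.
        return [(lm.x, lm.y) for lm in result.pose_landmarks.landmark]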
8. The method according to claim 2 or 3, wherein before extracting the basic posture data corresponding to the first frame image, the method further comprises:
performing human body key node identification on the first frame image in the human body video of the interactive object, and obtaining basic posture data corresponding to the first frame image based on a basic human body key node identification result;
and storing the basic posture data corresponding to the first frame image.
9. The method according to claim 8, wherein before the human body key node identification is performed on the first frame image in the human body video of the interactive object, the method further comprises:
in response to a camera calling request, displaying camera calling authorization information;
in response to a confirmation instruction for the camera calling authorization information, displaying second prompt information, wherein the second prompt information is used for prompting the interactive object to stand according to a reference posture;
and in response to the interactive object standing according to the reference posture, calling the camera to collect the human body video of the interactive object, wherein the first frame image in the human body video of the interactive object is obtained by collecting an image of the interactive object standing according to the reference posture.
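Claims 8 and 9 describe a one-off calibration pass: once the camera is authorised and the interactive object stands in the reference posture, the first frame image yields the basic posture data that every later frame is compared against. A minimal sketch follows, with extract_posture() standing in for the key node identification pipeline; the storage dictionary is an assumption.

    # Sketch of the calibration step in claims 8-9; extract_posture() is assumed.
    _stored = {}

    def calibrate(first_frame, extract_posture) -> None:
        """Identify key nodes in the first frame and store the resulting basic posture data."""
        _stored["basic_posture"] = extract_posture(first_frame)

    def basic_posture():
        """Later frames read the stored basic posture data instead of recomputing it."""
        return _stored.get("basic_posture")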
10. An apparatus for displaying a virtual environment picture, the apparatus comprising:
a first acquisition unit, configured to acquire a target human body image of an interactive object, wherein the interactive object interacts with a virtual environment through a virtual environment picture, and the target human body image is an image other than the first frame image in a human body video of the interactive object;
the determining unit is used for carrying out human key node identification on the target human body image so as to determine a target action corresponding to the target human body image;
a second obtaining unit, configured to: in response to the target action not including walking and a matching condition being met between the target action and a reference action, use a reference state parameter as a target state parameter of a virtual camera, wherein the reference action is an action corresponding to a reference human body image, meeting the matching condition comprises the target action being completely consistent with the reference action, or the target action being standing and the reference action being walking, the reference state parameter is a state parameter of the virtual camera determined based on the reference human body image, and the reference human body image is an image of a frame located before the target human body image in the human body video of the interactive object; in response to the target action including walking and the walking direction corresponding to the target action being the propulsion direction of the virtual camera, or the target action not including walking and the matching condition not being met between the target action and the reference action, adjust the reference state parameter and take the adjusted state parameter as the target state parameter of the virtual camera; and take the virtual environment picture data acquired by the virtual camera with the target state parameters as target virtual environment picture data matched with the target action;
and the display unit is used for displaying a target virtual environment picture based on the target virtual environment picture data, wherein different target actions are matched with different target virtual environment picture data, and the different target virtual environment picture data are used for displaying different target virtual environment pictures.
11. A computer device comprising a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the method for displaying a virtual environment picture according to any one of claims 1 to 9.
12. A computer-readable storage medium, wherein at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to implement the method for displaying the virtual environment picture according to any one of claims 1 to 9.
CN202110070087.XA 2021-01-19 2021-01-19 Virtual environment picture display method, device, equipment and storage medium Active CN112711335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110070087.XA CN112711335B (en) 2021-01-19 2021-01-19 Virtual environment picture display method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110070087.XA CN112711335B (en) 2021-01-19 2021-01-19 Virtual environment picture display method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112711335A CN112711335A (en) 2021-04-27
CN112711335B true CN112711335B (en) 2023-02-03

Family

ID=75549377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110070087.XA Active CN112711335B (en) 2021-01-19 2021-01-19 Virtual environment picture display method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112711335B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310010B (en) * 2023-05-19 2023-07-21 北京七维视觉传媒科技有限公司 Image recognition method, device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107248185A (en) * 2017-05-31 2017-10-13 珠海金山网络游戏科技有限公司 A kind of virtual emulation idol real-time live broadcast method and system
CN109547806A (en) * 2018-10-23 2019-03-29 佟志强 A kind of AR scapegoat's live broadcasting method
CN110928411A (en) * 2019-11-18 2020-03-27 珠海格力电器股份有限公司 AR-based interaction method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112711335A (en) 2021-04-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40042025

Country of ref document: HK

GR01 Patent grant