CN114979789B - Video display method and device and readable storage medium

Video display method and device and readable storage medium


Publication number
CN114979789B
Authority
CN
China
Prior art keywords
user
video
dynamic information
virtual background
video communication
Prior art date
Legal status
Active
Application number
CN202110206221.4A
Other languages
Chinese (zh)
Other versions
CN114979789A (en)
Inventor
沙莎
许显杨
钱靖
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110206221.4A
Publication of CN114979789A
Application granted
Publication of CN114979789B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a video display method, a video display device, and a readable storage medium. The video display method includes: a first terminal displays, in a social application, a video communication interface that provides a video communication function for a first user and a second user, and displays in the video communication interface a target virtual background in which the first user and the second user are jointly located; the first user is the user logged in to the social application on the first terminal, and the second user is the user in video communication with the first user in the video communication interface; when user dynamic information of the first user satisfies a virtual background update condition, the target virtual background is updated to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is collected from audio and video data of the first user during the video communication; and the updated target virtual background is displayed in the video communication interface. The application can enrich the video display modes of video communication.

Description

Video display method and device and readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a video display method, a video display device, and a readable storage medium.
Background
With the continuous development of mobile communication technology, intelligent terminals such as mobile phones and tablet computers play an important role in people's daily lives. People can now carry out real-time video communication anytime and anywhere through an intelligent terminal, which lowers the cost of communication.
Currently, when a user wants to display a virtual background during video communication on an intelligent terminal, the virtual background is usually a static picture or a simple looping sequence-frame animation, so the video display mode of video communication is rather monotonous. If the user wants to adjust the virtual background, the video communication has to be interrupted for manual adjustment, and even then only a simple background-switching effect can be achieved. Existing approaches therefore cannot keep video communication running and displayed normally while the virtual background is being updated.
Disclosure of Invention
The embodiments of the present application provide a video display method, a video display device, and a readable storage medium, which can enrich the video display modes of video communication and keep the video communication running and displayed normally while a virtual background is updated.
In one aspect, an embodiment of the present application provides a video display method, including:
A first terminal displays, in a social application, a video communication interface for providing a video communication function for a first user and a second user, and displays, in the video communication interface, a target virtual background in which the first user and the second user are jointly located; the first user is a user logged in to the social application on the first terminal, and the second user is a user in video communication with the first user in the video communication interface;
when user dynamic information of the first user satisfies a virtual background update condition, the target virtual background is updated to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is collected from audio and video data of the first user during the video communication;
and the updated target virtual background is displayed in the video communication interface.
In one aspect, an embodiment of the present application provides a video display apparatus, including:
a first display module, configured to display, in the social application of the first terminal, a video communication interface for providing a video communication function for a first user and a second user, and to display, in the video communication interface, a target virtual background in which the first user and the second user are jointly located; the first user is a user logged in to the social application on the first terminal, and the second user is a user in video communication with the first user in the video communication interface;
a first updating module, configured to update the target virtual background when user dynamic information of the first user satisfies a virtual background update condition, to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is collected from audio and video data of the first user during the video communication;
and a second display module, configured to display the updated target virtual background in the video communication interface.
The first display module is specifically configured to display one or more virtual backgrounds in response to a trigger operation on a background switching control in the video communication interface; determine, in response to a selection operation on the one or more virtual backgrounds, the selected virtual background as the target virtual background; and switch, in the video communication interface, the original background in which the first user and the second user are jointly located to the target virtual background.
The video communication interface further includes a first user picture and a second user picture.
The device further includes:
a first fusion display module, configured to capture a key part of the first user during the video communication; if the video virtual communication function of the first terminal is in an on state, display in the video communication interface a virtual video picture that covers the key part of the first user, and determine the virtual video picture as the first user picture; display in the video communication interface a second user picture that does not overlap the first user picture, the second user picture being used to display a key part of the second user; and display, in the video communication interface, the first user picture and the second user picture fused with the target virtual background.
The device further includes:
a second fusion display module, configured to capture the key part of the first user during the video communication; if the video virtual communication function of the first terminal is in an off state, display a camera video picture in the video communication interface and determine the camera video picture as the first user picture, the camera video picture being a video picture obtained by capturing the key part of the first user; display in the video communication interface a second user picture that does not overlap the first user picture, the second user picture being used to display a key part of the second user; and display, in the video communication interface, the first user picture and the second user picture fused with the target virtual background.
The first updating module includes:
a first position adjusting unit, configured to adjust, when the user dynamic information of the first user satisfies the virtual background update condition, the first user picture to the position indicated by the user dynamic information in the video communication interface; and, according to the position-adjusted first user picture, synchronously adjust in the video communication interface the position of the background element associated with the first user picture in the target virtual background, to obtain the updated target virtual background matched with the user dynamic information.
The first updating module includes:
a first animation display unit, configured to display, when the user dynamic information of the first user satisfies the virtual background update condition, a background animation associated with the user dynamic information in the video communication interface, and fuse the background animation with the target virtual background to obtain the updated target virtual background matched with the user dynamic information.
The first updating module includes:
a first size adjusting unit, configured to resize, when the user dynamic information of the first user satisfies the virtual background update condition, the background element associated with the first user picture in the target virtual background according to the user dynamic information in the video communication interface, to obtain the updated target virtual background matched with the user dynamic information.
The device further includes:
a second updating module, configured to acquire user dynamic information of the second user and fuse the user dynamic information of the first user with the user dynamic information of the second user to obtain fused dynamic information, the user dynamic information of the second user being collected from audio and video data of the second user during the video communication; and, when the fused dynamic information satisfies the virtual background update condition, update the target virtual background to obtain an updated target virtual background matched with the fused dynamic information.
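A possible, purely illustrative way to fuse the two users' dynamic information is sketched below; the patent does not fix a fusion formula, so the averaging and maximum rules and the dictionary keys are assumptions:

```python
# Hypothetical fusion of two users' dynamic information into one record that
# can then be tested against the virtual background update condition.
def fuse_dynamic_info(info_a: dict, info_b: dict) -> dict:
    return {
        # midpoint of the two key-part positions (assumed fusion rule)
        "position": tuple((a + b) / 2.0
                          for a, b in zip(info_a["position"], info_b["position"])),
        # louder of the two users (assumed fusion rule)
        "volume": max(info_a["volume"], info_b["volume"]),
        "expressions": (info_a["expression"], info_b["expression"]),
    }

def fused_info_triggers_update(fused: dict, volume_threshold: float = 0.5) -> bool:
    # Example update condition on the fused record: fused volume is loud enough.
    return fused["volume"] > volume_threshold
```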
The user dynamic information includes position information corresponding to the key part of the first user.
The device further includes:
a gesture detection module, configured to perform gesture detection on the first user during the video communication and acquire three-dimensional position data corresponding to the key part of the first user; generate a three-dimensional model matrix, a view matrix, and a projection matrix associated with the first user according to, respectively, the geometric relation between the key part of the first user and a world coordinate system, the positional relation between the key part of the first user and the camera of the first terminal, and the size of the screen display area of the first terminal, the world coordinate system being used to describe the positions of the key part of the first user and the camera of the first terminal; and, according to the three-dimensional model matrix, the view matrix, and the projection matrix, perform a matrix transformation on the three-dimensional position data to generate vertex position coordinates corresponding to the key part of the first user, and determine the vertex position coordinates as the position information corresponding to the key part of the first user;
a first condition judging module, configured to determine, if a parameter change occurs in the spatial coordinates or the rotation angle in the three-dimensional position data, that the position information has undergone a position change, and determine that the position-changed position information satisfies the virtual background update condition.
The gesture detection module is specifically configured to obtain the geometric relation between the key part of the first user and the origin and coordinate axes of the world coordinate system, construct a first translation matrix and a first rotation matrix according to the geometric relation, and generate the three-dimensional model matrix associated with the first user from the first translation matrix and the first rotation matrix; construct a second translation matrix and a second rotation matrix according to the positional relation between the key part of the first user and the camera of the first terminal, and generate the view matrix associated with the first user from the second translation matrix and the second rotation matrix; and obtain the spatial parameters of the camera coordinate system according to the size of the screen display area of the first terminal, and construct the projection matrix associated with the first user from the spatial parameters.
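The model-view-projection pipeline described above can be sketched with numpy as follows. This is only an illustrative reconstruction under common graphics conventions; the concrete matrix layouts, the camera pose, and the field-of-view and clip-plane values are assumptions, not taken from the patent:

```python
# Sketch: build a model matrix from a translation and rotation of the key part
# relative to the world origin, a view matrix from the part-to-camera relation,
# and a perspective projection from the screen display area's aspect ratio.
import numpy as np

def translation(tx, ty, tz):
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotation_y(angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    m = np.eye(4)
    m[0, 0], m[0, 2], m[2, 0], m[2, 2] = c, s, -s, c
    return m

def perspective(fov_y, aspect, near, far):
    f = 1.0 / np.tan(fov_y / 2.0)
    m = np.zeros((4, 4))
    m[0, 0], m[1, 1] = f / aspect, f
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = 2.0 * far * near / (near - far)
    m[3, 2] = -1.0
    return m

# Model matrix: pose of the key part in the world (first translation/rotation).
model = translation(0.0, 0.1, 0.0) @ rotation_y(np.radians(15))
# View matrix: inverse of the camera pose (second translation/rotation).
view = np.linalg.inv(translation(0.0, 0.0, 2.0))
# Projection matrix: from the screen display area (portrait 9:16 assumed).
proj = perspective(np.radians(60), aspect=9 / 16, near=0.1, far=100.0)

point = np.array([0.0, 0.0, 0.0, 1.0])  # 3D position datum of the key part
clip = proj @ view @ model @ point
vertex_position = clip[:3] / clip[3]    # vertex position coordinates
```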
The first updating module includes:
a second position adjusting unit, configured to determine, when the position-changed position information in the user dynamic information satisfies the virtual background update condition, a displacement distance and a deflection angle according to the position-changed position information, and perform displacement-and-deflection processing on the first user picture according to the displacement distance and the deflection angle; and, in the video communication interface, synchronously perform displacement-and-deflection processing on the background element associated with the first user picture in the target virtual background according to the displacement distance and the deflection angle, to obtain the updated target virtual background matched with the user dynamic information.
The user dynamic information further includes a facial expression of the first user.
The device further includes:
an expression detection module, configured to perform expression detection on the first user during the video communication and obtain the facial expression of the first user;
and a second condition judging module, configured to determine that the user dynamic information satisfies the virtual background update condition if the facial expression belongs to a target facial expression type.
The first updating module includes:
a second animation display unit, configured to traverse an expression-animation mapping table to obtain a background animation matched with the facial expression when the facial expression in the user dynamic information satisfies the virtual background update condition; and display the background animation in the video communication interface and fuse it with the target virtual background, to obtain the updated target virtual background matched with the user dynamic information.
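The expression-animation mapping table can be pictured as a simple lookup, as in the sketch below; the expression labels and animation names are invented examples rather than values from the patent:

```python
# Hypothetical expression-animation mapping table and its traversal.
EXPRESSION_ANIMATION_TABLE = {
    "smile": "meteor_sweep",      # e.g. a meteor crosses the "star" background
    "pout": "petal_rain",
    "mouth_open": "firework_burst",
}

def background_animation_for(expression: str):
    # Return the background animation matched with the detected facial
    # expression, or None when no animation is associated with it.
    return EXPRESSION_ANIMATION_TABLE.get(expression)
```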
The user dynamic information further includes volume data corresponding to the first user.
The device further includes:
an audio detection module, configured to acquire audio data recorded by the first user and sample the audio data to obtain the volume data corresponding to the first user;
and a third condition judging module, configured to determine that the user dynamic information satisfies the virtual background update condition if the volume data lies within a volume detection interval.
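As a sketch only, sampling the recorded audio into volume data and testing the volume detection interval might look like this; the RMS volume measure and the interval bounds are assumptions:

```python
# Hypothetical audio detection: compute a volume value from one sampled audio
# frame and check whether it lies within the volume detection interval.
import numpy as np

def volume_of(frame: np.ndarray) -> float:
    # Root-mean-square amplitude of the sampled frame (assumed volume measure).
    return float(np.sqrt(np.mean(np.square(frame.astype(np.float64)))))

def volume_satisfies_condition(frame: np.ndarray,
                               low: float = 0.05, high: float = 0.8) -> bool:
    return low <= volume_of(frame) <= high  # inside the detection interval
```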
The first updating module includes:
a second size adjusting unit, configured to extract a volume peak and a volume valley from the volume data when the volume data in the user dynamic information satisfies the virtual background update condition, and construct a scaling matrix associated with the first user from the volume peak and the volume valley, the scaling matrix being composed of at least two scaling coefficients in different scaling directions; and, in the video communication interface, resize the background element associated with the first user picture in the target virtual background according to the at least two scaling coefficients in the scaling matrix, to obtain the updated target virtual background matched with the user dynamic information.
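A scaling matrix of the kind described, built from a volume peak and a volume valley, is sketched below; how each value maps to a scaling coefficient is an assumption for the example:

```python
# Hypothetical construction of the scaling matrix: the volume peak drives the
# horizontal scaling coefficient and the valley the vertical one, giving at
# least two coefficients in different scaling directions.
import numpy as np

def scaling_matrix(volume_peak: float, volume_valley: float) -> np.ndarray:
    sx = 1.0 + volume_peak          # e.g. a flower opens wider on loud peaks
    sy = 1.0 + 0.5 * volume_valley
    return np.diag([sx, sy, 1.0, 1.0])

def rescale(vertices: np.ndarray, peak: float, valley: float) -> np.ndarray:
    # Apply the scaling matrix to the homogeneous vertices (N x 4) of the
    # background element associated with the first user picture.
    return vertices @ scaling_matrix(peak, valley).T
```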
The user dynamic information further includes live-action information of the environment in which the first user is located.
The device further includes:
an environment detection module, configured to acquire video data of the environment in which the first user is located and extract from the video data the live-action information of that environment, the live-action information including one or more of the darkness, the color composition, or key environmental objects of the environment in which the first user is located;
and a fourth condition judging module, configured to determine that the user dynamic information satisfies the virtual background update condition if an environmental change occurs in the live-action information.
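Extraction of live-action information from a video frame could, for instance, be approximated as below; the brightness threshold and the coarse color measure are illustrative assumptions:

```python
# Hypothetical environment detection on one 8-bit RGB frame (H x W x 3):
# derive darkness and a coarse color composition, and flag an environmental
# change when either differs from the previous observation.
import numpy as np

def live_action_info(frame: np.ndarray) -> dict:
    brightness = frame.mean() / 255.0                      # 0 = dark, 1 = bright
    channel_means = frame.reshape(-1, 3).mean(axis=0)
    dominant = ("red", "green", "blue")[int(np.argmax(channel_means))]
    return {"dark": brightness < 0.3, "dominant_color": dominant}

def environment_changed(prev: dict, curr: dict) -> bool:
    return prev != curr  # change => the virtual background update condition holds
```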
The device further includes:
a picture switching module, configured to: if the facial expression of the first user belongs to the target facial expression type and the facial expression has been displayed for longer than a first duration threshold, determine that the user dynamic information of the first user satisfies a user picture switching condition, and switch, in the video communication interface, the first user picture from the virtual video picture to the camera video picture, the camera video picture being a video picture obtained by capturing the key part of the first user; or, if the volume data corresponding to the first user lies within the volume detection interval and has remained within the volume detection interval for longer than a second duration threshold, determine that the user dynamic information of the first user satisfies the user picture switching condition, and switch, in the video communication interface, the first user picture from the virtual video picture to the camera video picture.
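The two branches of the user picture switching condition can be summarized in one predicate, as sketched below; the target expression set and both duration thresholds are invented values:

```python
# Hypothetical user picture switching condition: a target facial expression
# held longer than a first duration threshold, or volume data staying inside
# the volume detection interval longer than a second duration threshold.
def should_switch_picture(expression: str, expression_seconds: float,
                          volume_in_interval_seconds: float,
                          target_expressions=("smile",),
                          first_threshold: float = 2.0,
                          second_threshold: float = 3.0) -> bool:
    by_expression = (expression in target_expressions
                     and expression_seconds > first_threshold)
    by_volume = volume_in_interval_seconds > second_threshold
    return by_expression or by_volume
```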
In one aspect, an embodiment of the present application provides a computer device, including: a processor, a memory, a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing a computer program, and the processor is used for calling the computer program to execute the method in the embodiment of the application.
In one aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, the computer program being adapted to be loaded by a processor and to perform a method according to embodiments of the present application.
In one aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform a method according to the embodiments of the present application.
In the embodiments of the present application, when a first user and a second user carry out video communication through the social application installed on their respective terminal devices, the first terminal held by the first user can display, in the video communication interface, a target virtual background in which the first user and the second user are jointly located. Further, during the video communication, the first terminal can collect user dynamic information of the first user from the first user's audio and video data; when it detects that the user dynamic information satisfies the virtual background update condition, it can update the target virtual background to obtain an updated target virtual background matched with the user dynamic information, and then display the updated target virtual background in the video communication interface. Thus, during video communication, the terminal device can automatically collect the current user's dynamic information and update the virtual background in the current video communication interface in real time, without interrupting the video communication or requiring manual operation. In other words, the application can keep the video communication running and displayed normally while the virtual background is updated, thereby improving the quality of the video communication. In addition, by constructing a virtual, dynamic, interactive environment background, the application can improve the degree of fusion between the user's image and the background and enrich the video display modes of video communication.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
Figs. 2a-2e are schematic views of a video presentation scene according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a video display method according to an embodiment of the present application;
Fig. 4 is an interface schematic diagram of a video call scenario according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a video display method according to an embodiment of the present application;
Figs. 6a-6d are interface schematic diagrams of a video presentation process according to an embodiment of the present application;
Fig. 7 is a schematic workflow diagram of a video display system according to an embodiment of the present application;
Figs. 8a-8d are schematic diagrams of a coordinate transformation according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a video display device according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without inventive effort fall within the scope of protection of the present application.
Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the research and advancement of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers, instead of human eyes, to identify, track, and measure targets and perform other machine vision tasks, and further performs graphics processing so that the result becomes an image more suitable for human eyes to observe or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include data processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
The key technologies of speech processing technology (Speech Technology) are automatic speech recognition (ASR), text-to-speech synthesis (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and speech is expected to become one of the most promising modes of human-computer interaction.
The solution provided by the embodiments of the present application relates to artificial intelligence technologies such as computer vision and speech processing; the specific process is illustrated by the following embodiments.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application. The system architecture may include a server 100 and a terminal cluster, and the terminal cluster may include: a terminal device 200a, a terminal device 200b, a terminal device 200c, ..., and a terminal device 200n. Communication connections may exist within the terminal cluster; for example, a communication connection exists between the terminal device 200a and the terminal device 200b, and between the terminal device 200a and the terminal device 200c. Meanwhile, any terminal device in the terminal cluster may have a communication connection with the server 100; for example, a communication connection exists between the terminal device 200a and the server 100. The connection manner is not limited: the connection may be direct or indirect, over wired or wireless communication, or established in other manners, which the present application does not restrict.
It should be understood that each terminal device in the terminal cluster shown in Fig. 1 may be provided with an application client; when the application client runs on a terminal device, it may exchange data with the server 100 shown in Fig. 1. The application client may be any client with a video communication function, such as a social application, an instant messaging application, a live broadcast application, a short video application, a music application, a shopping application, a game application, a novel-reading application, a payment application, or a browser. The application client may be an independent client, or an embedded sub-client integrated in another client (e.g., an instant messaging client, a social client, or a video client), which is not limited here. Taking a social application as an example, the server 100 may include one or more servers corresponding to the social application, such as a background server and a data processing server, so that each terminal device can perform data transmission with the server 100 through the application client of the social application; for example, each terminal device can carry out video communication with other terminal devices through the server 100.
For example, each terminal device may display a virtual background in the video communication interface of the social application. It should be appreciated that, to improve the richness of conversation backgrounds during video communication in the social application, the present application provides one or more virtual backgrounds in the social application; the two users in video communication may select any virtual background, and the original background in the video communication interface is then switched to the selected virtual background. The server 100 may obtain business data via the application; for example, the business data may be the virtual background selected by the user (e.g., the virtual background "star", which contains background elements associated with a background animation). Taking the terminal device 200a and the terminal device 200b as an example, suppose user A selects the virtual background "star" through the terminal device 200a. The terminal device 200a may send the virtual background "star" selected by user A to the server 100, invoke the application client of the local social application, and draw the virtual background "star" in the video communication interface of the terminal device 200a. After the server 100 obtains the business data related to the virtual background "star", it may forward that data to the terminal device 200b, and the terminal device 200b may likewise invoke the application client of its local social application according to that data and draw the virtual background "star" in its own video communication interface.
Then, since both users change dynamically at any moment, the server 100 may acquire and examine the user dynamic information corresponding to each of the two users. When one piece of user dynamic information satisfies the virtual background update condition, it may be sent to the terminal device 200a and the terminal device 200b, which can then update the currently displayed virtual background in real time according to that user dynamic information, obtain the updated virtual background, and display it in their respective video communication interfaces. Here, user dynamic information is information collected from the audio and video data of the two users during the video communication, and includes, but is not limited to, position information corresponding to key parts of a user, the user's facial expression, volume data, and live-action information of the environment in which the user is located. For example, when user A's facial expression is a smile, a meteor background animation in the virtual background "star" may be triggered, so that user A sees a special-effect animation of a meteor sweeping across the virtual background "star".
Alternatively, the system architecture may include a plurality of servers, where each terminal device is connected to one server. Each server may obtain the business data of the terminal devices connected to it (e.g., the virtual background selected by the user, including the background elements associated with the background animation), and may acquire and examine the user dynamic information corresponding to those terminal devices, so as to update the current virtual background according to the user dynamic information.
Optionally, the terminal device itself may also acquire the business data and the user dynamic information and detect whether the user dynamic information satisfies the virtual background update condition, so as to update the current virtual background according to the user dynamic information.
It will be appreciated that the method provided by the embodiments of the present application may be performed by a computer device, which includes, but is not limited to, a terminal device or a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud databases, cloud services, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal device may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a palmtop computer, a mobile internet device (MID), a wearable device (e.g., a smart watch or smart bracelet), a smart computer, a smart vehicle, or another intelligent terminal capable of running an instant messaging application or a social application. The terminal device and the server may be connected directly or indirectly in a wired or wireless manner, which is not limited in the embodiments of the present application.
For ease of understanding, the following description takes as an example the case where the terminal device 200a and the terminal device 200b carry out video communication through the server 100.
Figs. 2a-2e are schematic diagrams of a video display scene according to an embodiment of the present application. The video presentation scenario may be implemented in the server 100 shown in Fig. 1, in a terminal device (such as the terminal device 200a, 200b, 200c, or 200n shown in Fig. 1), or jointly by a terminal device and the server; this is not limited here, and this embodiment takes joint execution by the terminal device 200a, the terminal device 200b, and the server 100 as an example. As shown in Fig. 2a, the user bound to the terminal device 200a is user A, and the user bound to the terminal device 200b is user B. User A may initiate a video communication request to the server 100 through the social application on the terminal device 200a. After receiving the request, the server 100 may deliver it to the terminal device 200b, so that user B sees the corresponding invitation prompt on the display interface of the terminal device 200b. The server 100 then obtains the invitation result fed back by the terminal device 200b; if user B accepts the invitation, the server 100 can notify the terminal device 200a and the terminal device 200b to enter the video communication state together, that is, user A and user B can start the video call. As shown in Fig. 2b, the terminal device 200a may display, in the video communication interface 300a, a video frame 301a corresponding to user B and a video frame 302a corresponding to user A; similarly, the terminal device 200b may display, in the video communication interface 300b, a video frame 301b corresponding to user A and a video frame 302b corresponding to user B. It can be understood that the video frames 301a and 302b are obtained by capturing user B in real time, and the video frames 302a and 301b are obtained by capturing user A in real time; usually what is captured is a key part of the user (such as the user's head). Alternatively, the avatar of the corresponding user may be displayed directly at the corresponding position in the video communication interface; the application is not limited in this respect, though displaying video frames captured in real time is the preferred scheme. In the same video communication interface, the video frames corresponding to user A and user B may be displayed in windows of different sizes and shapes; the windows are smaller than the video communication interface, and the two video frames may either not overlap or have a partially overlapping area. For example, referring again to the video communication interface 300a, the video frame 301a corresponding to user B is displayed in the interface as a large window, the video frame 302a corresponding to user A is displayed as a small window, and the video frame 302a covers a small portion of the video frame 301a; user A can move the video frame 302a to a suitable position by a drag operation.
In addition, the video frames 301a and 302b may display the original background corresponding to user B (indicated by the horizontally hatched area in Fig. 2b for simplicity), and the video frames 302a and 301b may display the original background corresponding to user A (indicated by the diagonally hatched area in Fig. 2b for simplicity). The original background may be obtained by photographing the real environment in which user A or user B is located, or may be a default static background picture or sequence-frame dynamic background provided by the social application for the video communication function.
Further, to make video communication between user A and user B more engaging, the application allows either user to switch the original background in which the user is currently located to a virtual background. For example, as shown in Fig. 2c, suppose user A wishes to switch the background. The terminal device 200a may display a virtual background list 301d in response to a trigger operation (e.g., a click) by user A on the background switching control 301c in the current video communication interface 300c. As shown in the video communication interface 300d, the virtual background list 301d may include one or more virtual backgrounds, such as "street", "flower", "star", and "park". The virtual background list 301d may be displayed in any area (e.g., the bottom area) of the video communication interface 300d as a floating window, an overlay, or a semi-transparent layer, or in a retractable panel whose display size can be changed by a drag operation and which is smaller than the video communication interface 300d. Optionally, when the virtual background list 301d is displayed, the small-window video frame corresponding to user B or user A may be moved to an area that does not overlap the display area of the virtual background list, so that it is not covered by the list. It will be understood that, since the display area of the virtual background list 301d is limited, if the list contains more virtual background options than can be displayed at once, the terminal device 200a may display only some of the options and hide the rest; user A can reveal the hidden options by sliding the list left and right, or by sliding or dragging up and down to change its display size. The background switching procedure of the terminal device 200b is the same as that of the terminal device 200a described above and is not repeated here.
Suppose user A selects the virtual background "flower". As shown in Fig. 2c, the terminal device 200a may, in response to a trigger operation (such as a click) by user A on the virtual background "flower", send a background switching request to the server 100 together with the business data related to the virtual background "flower", and switch the original background shown in the video communication interface 300a of Fig. 2b to the virtual background "flower". After receiving the background switching request, the server 100 may send the business data to the terminal device 200b, which may obtain the virtual background "flower" from the business data and then switch the original background shown in the video communication interface 300b of Fig. 2b to the virtual background "flower".
The video communication interfaces after the terminal device 200a and the terminal device 200b switch the original background to the virtual background "flower" are shown in Fig. 2d: the video communication interface 300e is the interface of the terminal device 200a after the switch, and the video communication interface 300f is that of the terminal device 200b. As shown in the video communication interface 300e, the virtual background "flower" may contain two flowers, so the video frame 301e corresponding to user A may be displayed as a small window at the pistil of one flower, and the video frame 302e corresponding to user B as a small window at the pistil of the other flower. It will be appreciated that, in order not to impair the quality of the video communication, the video frames 301e and 302e do not overlap. Optionally, the video frames of the two users may be displayed at the same positions in the video communication interfaces of different terminal devices; for example, as shown in Fig. 2d, the video frames corresponding to user A are displayed on the left side of both the video communication interface 300e and the video communication interface 300f, and the video frames corresponding to user B on the right side. Alternatively, the display positions may be set differently per terminal device; for example, the video frame of the user bound to the current terminal device may be displayed by default on the left (or right) side of the video communication interface, and the video frame of the other user on the right (or left) side.
Further, the virtual background may be updated in real time. As shown in Fig. 2e, the terminal device 200a and the terminal device 200b can monitor the dynamic changes of user A and user B at any time. For example, during the video communication, the terminal device 200a may collect audio and video data A (including video data and audio data) of user A, and the terminal device 200b may likewise collect audio and video data B of user B; the two devices may then send the collected data to the server 100. The server 100 may extract each user's dynamic information from the received audio and video data (data A and data B) and detect whether it satisfies the virtual background update condition. When the user dynamic information of user A or user B satisfies the condition, the server 100 may send the qualifying user dynamic information to the terminal device 200a and the terminal device 200b, which may then perform one or more update operations on the current virtual background, such as position adjustment, size adjustment, background animation display, or background switching, according to that information. The user dynamic information includes, but is not limited to, position information corresponding to a key part of the user, the user's facial expression, volume data, and live-action information of the environment in which the user is located. For example, assuming the key part is the user's head, when the server 100 detects a position change of user A's head and user B's head (for example, swaying), it may acquire the position information corresponding to each head (e.g., the position coordinates of key points) from the respective video frames and send both to the terminal device 200a and the terminal device 200b. The two devices may then adjust the video frame of user A and the video frame of user B to the positions indicated by the position information, and at the same time synchronously adjust the positions of the two flowers in the virtual background "flower" according to the adjusted positions of the two video frames. As shown in the video communication interface 300g, when user A's head sways, the flower associated with user A on the left and the flower associated with user B on the right sway along with the corresponding heads. The virtual background may likewise be updated according to facial expression, volume data, live-action information of the user's environment, and so on; for specific implementations, see the embodiments corresponding to Figs. 3 and 5 below, which are not repeated here.
During video communication, the terminal device is thus able to automatically collect the current user's dynamic information and use it to update the virtual background in the current video communication interface in real time; for example, the virtual background can change with the user's limb movements, facial expressions, voice, and so on, without the user having to interrupt the video communication and update the virtual background manually. In other words, the application can keep the video communication running and displayed normally while the virtual background is updated, thereby improving the quality of the video communication. Moreover, in a traditional video call, the user's image and the background elements are cut apart and hard to fuse, and the background does little to make the call more engaging; by constructing a virtual, dynamic, interactive environment background, the application can improve the fusion of the user's image with the background, make video communication and the call background more engaging and interactive, and enrich the video display modes of video communication.
Referring to Fig. 3, Fig. 3 is a flowchart of a video display method according to an embodiment of the application. The video display method may be performed by a computer device, which may include a terminal device or a server as described in Fig. 1; for ease of understanding, this embodiment takes execution by the terminal device as an example. The video display method includes at least the following steps S101-S103:
Step S101: a first terminal displays, in a social application, a video communication interface for providing a video communication function for a first user and a second user, and displays, in the video communication interface, a target virtual background in which the first user and the second user are jointly located; the first user is a user logged in to the social application on the first terminal, and the second user is a user in video communication with the first user in the video communication interface.
Specifically, to enable the first user and the second user to carry out video communication, the first user needs to install on the first terminal an application with a video communication function, such as a social application, an instant messaging application, a live broadcast application, a short video application, a music application, a shopping application, or a game application. Taking a social application as an example, when the first user logs in to the social application on the first terminal and the second user logs in to the social application on the second terminal, the two users can start video communication through their respective terminal devices. After entering video communication, the first terminal may display on its screen a video communication interface for providing the video communication function for the first user and the second user; for its specific form, see the video communication interface 300a and the video communication interface 300b in Fig. 2b.
Further, in order to construct a virtual, dynamic, interactive environment background, the target virtual background in which the first user and the second user are jointly located may first be displayed in the video communication interface. Taking the first terminal as the executor of this step: the video communication interface may include a background switching control, and the first terminal may display one or more virtual backgrounds in response to a trigger operation (such as a click) on that control, as in the virtual background list 301d shown in the video communication interface 300d in Fig. 2c. The first terminal may then, in response to a selection operation on the one or more virtual backgrounds, determine the virtual background selected by the first user as the target virtual background, and switch the original background in which the first user and the second user are jointly located to the target virtual background in the video communication interface. For the specific interaction process and interface diagrams, see the description of the embodiment corresponding to Figs. 2c-2d, where the terminal device 200a may serve as the first terminal and user A as the first user, and correspondingly the terminal device 200b as the second terminal and user B as the second user, with the virtual background "flower" as the target virtual background selected by user A. The virtual background is a dynamic background and may include one or more background elements; for example, the two flower elements in the virtual background "flower" are its background elements.
Step S102: when the user dynamic information of the first user satisfies a virtual background update condition, updating the target virtual background to obtain an updated target virtual background matched with the user dynamic information; the user dynamic information is collected from audio and video data of the first user during the video communication.
Specifically, during the video communication, the first terminal may collect the first user's audio and video data in real time (for example, video data through the camera and audio data through the microphone) and analyze the collected data to obtain the user dynamic information. The user dynamic information may include, but is not limited to, position information corresponding to a key part of the user, the user's facial expression, volume data, and live-action information of the environment in which the user is located; it therefore changes with the user's movements, facial expressions, and speech, and with changes in the user's environment.
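Putting the four kinds of user dynamic information together, the update-condition check of this step can be pictured as the predicate below. It is a non-limiting sketch: the dictionary keys, the target expression set, and the volume interval are all assumptions:

```python
# Hypothetical check of the virtual background update condition over two
# successive observations of the first user's dynamic information.
TARGET_EXPRESSIONS = {"smile", "mouth_open"}
VOLUME_LOW, VOLUME_HIGH = 0.05, 0.8

def meets_update_condition(prev: dict, curr: dict) -> bool:
    position_changed = curr["position"] != prev["position"]
    expression_hit = curr["expression"] in TARGET_EXPRESSIONS
    volume_hit = VOLUME_LOW <= curr["volume"] <= VOLUME_HIGH
    env_changed = curr["environment"] != prev["environment"]
    return position_changed or expression_hit or volume_hit or env_changed
```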
Further, after obtaining the user dynamic information of the first user, the first terminal may detect whether the user dynamic information satisfies the virtual background update condition; the target virtual background is updated only when the condition is determined to be satisfied. For the specific condition judgment, see step S202 in the embodiment corresponding to Fig. 5 below.
For ease of description and understanding, the first user picture and the second user picture in the video communication interface need to be described first. The method provided by the application is applicable not only to a real-camera video call scene, but also to an avatar video call scene, and to a scene combining an avatar with a real camera.
During video communication, the first terminal may shoot the key part of the first user through the camera while detecting the state of its video virtual communication function. If this function is detected to be in a closed state (i.e., the first terminal has not opened the video virtual communication function), the first user is in a real-camera video call scene; therefore, a camera video picture may be displayed in the video communication interface and determined as the first user picture. The camera video picture is a video picture obtained by shooting the key part of the first user. Conversely, if the video virtual communication function is detected to be in an on state (i.e., the first terminal has already opened the video virtual communication function), the first user is in an avatar video call scene, so an avatar video picture for covering the key part of the first user may be displayed in the video communication interface and determined as the first user picture. The state of the video virtual communication function can be changed by triggering a video virtual communication control in the video communication interface.
It can be understood that the second user picture is likewise used to display the key part of the second user, and may be either a camera video picture or an avatar video picture; which form it takes depends on the video call scene the second user is currently in, so the second terminal performs the same determination process as described above for the first terminal. Optionally, if the type of the first user picture differs from the type of the second user picture, that is, one of the two is a camera video picture and the other is an avatar video picture, the video call scene is avatar plus real camera. The first user picture and the second user picture do not overlap each other. It should be noted that the avatar video picture may take various forms: the avatar corresponding to the first user may be displayed on its own, or the avatar may be displayed over a certain area of the background (which may be a real background or a virtual background); the specific form may be determined according to product requirements, which the present application does not limit. For example, an avatar video picture may be a video picture generated from a key part of the user, where a key part may refer to the user's eyes, lips, nose, eyebrows, etc., and may be used to characterize the expression information of a user (such as the first user or the second user), for example a smiling expression, a pouting expression, a mouth-open expression, or an expression with both eyes and mouth open. After the video virtual communication function is turned on, the user can select an avatar for avatar conversion, and the corresponding avatar video picture is then generated from it.
Further, the first user picture and the second user picture can be displayed in the video communication interface, fused with the target virtual background. For example, the scene depicted in the embodiments corresponding to fig. 2a to 2e is a real-camera video call scene. Referring to the video communication interface 300e in fig. 2d, during video communication, the video picture 301e obtained by shooting the key part (such as the head) of user A is the first user picture, and the video picture 302e obtained by shooting the key part of user B is the second user picture; after the terminal device 200a obtains the video picture 301e and the video picture 302e, it can fuse them to the stamen positions of the corresponding flower elements.
For another example, please refer to fig. 4, which is an interface schematic diagram of a video call scene according to an embodiment of the present application. As shown in the video communication interface 400a in fig. 4, in a scene where both user A and user B have started the video virtual communication function, the video picture 401a is the first user picture and may display the avatar selected by user A, and the video picture 402a is the second user picture and may display the avatar selected by user B. Referring to the video communication interface 400b, in a scene where user A has started the video virtual communication function and user B has not, the video picture 401b is the first user picture and may display the avatar selected by user A, while the camera video picture 402b is the second user picture and may display the key part of user B. It should be noted that each avatar changes along with the change of the key part of the corresponding user: for example, when user A smiles with pursed lips, the avatar in the video picture 401a may also smile with pursed lips; when user A tilts his head, the avatar also tilts its head. Optionally, in a scene where an avatar is used for the video call, if the user's head is not within the shooting range of the camera, a front-facing static view of the avatar currently selected by that user may be displayed as the default avatar in the corresponding video picture, until the user's head reappears in the shooting range of the camera, after which the avatar again follows the user's changes.
Further, optionally, when the user dynamic information of the first user meets the virtual background updating condition, the first terminal may adjust the first user picture to the position indicated by the user dynamic information in the video communication interface, and then, according to the position-adjusted first user picture, synchronously adjust the position of the background element associated with the first user picture in the target virtual background, so as to obtain an updated target virtual background matched with the user dynamic information. For example, referring to fig. 2e, as shown in the video communication interface 300g, the flower elements in the virtual background "flower" may be adjusted in synchronization with the position of the user's head.
Optionally, when the user dynamic information of the first user meets the virtual background updating condition, the first terminal may display a background animation associated with the user dynamic information in the video communication interface, and further fuse the background animation with the target virtual background to obtain an updated target virtual background matched with the user dynamic information.
Optionally, when the user dynamic information of the first user meets the virtual background updating condition, the first terminal may, in the video communication interface, resize the background element associated with the first user picture in the target virtual background according to the user dynamic information, to obtain an updated target virtual background matched with the user dynamic information. The resizing may include one or both of a lifting process and a scaling process; the lifting process changes the height of the background element, and the scaling process changes its size.
Optionally, when the user dynamic information of the first user meets the virtual background updating condition, the first terminal may switch the target virtual background in the video communication interface according to the user dynamic information, so as to obtain an updated target virtual background matched with the user dynamic information.
It can be understood that the second terminal may also perform the above update processing operations in its corresponding video communication interface, in the same way as the first terminal, which is not described again here. It should be noted that, because there are multiple types of user dynamic information, several of them may satisfy the virtual background update condition at the same time; in that case, the update of the target virtual background may include several of the update processing operations described above, that is, the superimposed effect of multiple update processing operations may be displayed in the video communication interface. In addition, the application does not limit which update processing operations are triggered by which types of user dynamic information; for example, when the target virtual background is the virtual background "flower", if the user's head swings, the flower elements in the target virtual background may swing along with the head, or a background animation of falling petals may be displayed.
In addition, the first terminal may also obtain the user dynamic information of the second user in real time, and when it meets the above virtual background update condition, the target virtual background may likewise be updated; the update processing operation is the same as the above process of updating according to the user dynamic information of the first user and is not repeated here. Optionally, the user dynamic information of the first user and that of the second user may be fused to obtain fused dynamic information; when the fused dynamic information meets the virtual background updating condition, the target virtual background may be updated to obtain an updated target virtual background matched with the fused dynamic information, the update processing operation again being the same as above. For example, if it is detected that the first user and the second user are laughing at the same time, a blooming animation effect of the flowers may be displayed in the target virtual background. For another example, when the first user and the second user move their heads at the same time and the flower elements on the two sides of the target virtual background cross during the swing, an associated background animation may be triggered, such as an animation of drifting petals. The user dynamic information of the second user is acquired from the audio and video data of the second user during video communication.
The output of the target virtual background and the video pictures to the terminal devices (including the first terminal and the second terminal) may be implemented by a real-time rendering component in the terminal device. The real-time rendering component here refers to a component with picture rendering capability; for example, it may be a real-time three-dimensional (3D) engine, such as the Ace3D engine. The Ace3D engine is a lightweight cross-platform rendering engine that can be deployed in the camera application of a terminal device; it is built on the Google Filament open-source rendering library with capability enhancements, has the advantages of fast loading, small memory footprint, and high compatibility, and can be used for hair rendering, 3D animated expression rendering, custom nodes, and the like. The application dynamically draws the virtual background based on the rendering engine, so the flexibility and interest of the virtual background can be far higher than those of existing static pictures or sequence-frame animations; the background is no longer static or simply looped, but dynamic and interesting.
Step S103, the updated target virtual background is displayed in the video communication interface.
Specifically, the first terminal may fuse the updated target virtual background with the first user picture and the second user picture, and display the fused target virtual background in the video communication interface.
During video communication, the application can acquire the user dynamic information of a user from the user's audio and video data, and when the user dynamic information is detected to meet the virtual background updating condition, the target virtual background can be updated to obtain an updated target virtual background matched with the user dynamic information. The application thus supports the terminal device automatically acquiring the current user's dynamic information and invoking the rendering engine to update the virtual background in the current video communication interface in real time based on it; for example, the virtual background can change along with the user's limb movements, facial expressions, voice, and the like, without requiring the user to interrupt video communication and update it through manual operation. That is, the application can keep video communication running and displaying normally while updating the virtual background, improving video communication quality. In addition, by constructing a virtual dynamic interactive environment background, the application can improve the degree of fusion between the users' images and the background, increase the interest of video communication, and enrich the video display modes of video communication.
Fig. 5 is a schematic flow chart of a video display method according to an embodiment of the present application. The video presentation method may be performed by a computer device, which may include a terminal device or a server as described in fig. 1, and for ease of understanding, this embodiment is described by taking the method performed by the terminal device as an example.
As shown in fig. 5, the method may include the steps of:
Step S201, a first terminal displays a video communication interface for providing video communication functions for a first user and a second user in a social application, and displays, in the video communication interface, the target virtual background where the first user and the second user are located together, as well as a first user picture and a second user picture;
Specifically, after the first user and the second user start video communication through the social application, the first terminal may select one virtual background as the target virtual background and display the first user picture and the second user picture fused with the target virtual background in the video communication interface. Each of the first user picture and the second user picture may be either a camera video picture or an avatar video picture. Fig. 6a to 6d are schematic diagrams illustrating interfaces of a video display process according to an embodiment of the present application. As shown in fig. 6a, assume that the first terminal starts the video virtual communication function in response to a triggering operation (such as a clicking operation) on the video virtual communication control 501a in the video communication interface 500a, and that the second terminal has likewise started it; then, as shown in the video communication interface 500a, the video picture 502a is the first user picture and the video picture 503a is the second user picture, both being avatar video pictures at this time. Further, the first terminal may display one or more virtual backgrounds in the video communication interface 500b in response to a triggering operation on the background switching control, determine the virtual background "flower" selected by the first user as the target virtual background in response to a selection operation on the displayed virtual backgrounds, and then fuse the first user picture and the second user picture with the virtual background "flower" and display them in the video communication interface 500c.
Step S202, acquiring audio and video data of the first user, extracting user dynamic information of the first user from the audio and video data, and performing condition judgment on the user dynamic information;
Specifically, during video communication, the first terminal may acquire the audio and video data (including video data and audio data) of the first user in real time through the camera and the microphone, so that the user dynamic information of the first user may be extracted from them; this user dynamic information may include one or more of position information corresponding to a key part of the first user, the facial expression of the first user, volume data, or live-action information of the environment where the first user is located.
First, it may be determined whether the user dynamic information satisfies the virtual background update condition. Optionally, the first terminal may perform gesture detection on the first user in the video data. In this embodiment of the application, the head of the first user is used as the key part, so three-dimensional position data corresponding to the head of the first user may be obtained through head key point detection; the three-dimensional position data may include the spatial coordinates and rotation angle of the head and is based on a local coordinate system (which may also be called a model coordinate system). Further, the first terminal may generate a three-dimensional Model matrix, a View matrix, and a Projection matrix associated with the first user according to, respectively, the geometric relationship between the key part of the first user and the world coordinate system, the positional relationship between the key part of the first user and the camera of the first terminal, and the size of the screen display area of the first terminal. Specifically: the geometric relationship between the key part of the first user and the origin and coordinate axes of the world coordinate system may be obtained, a first translation matrix and a first rotation matrix may be constructed from this geometric relationship, and the three-dimensional model matrix associated with the first user may be generated from the first translation matrix and the first rotation matrix; when a scaling transformation is present, a scaling matrix must also be constructed to participate in the coordinate operation. The world coordinate system is used to describe the positions of the key part of the first user and of the camera of the first terminal. Meanwhile, a second translation matrix and a second rotation matrix may be constructed according to the positional relationship between the key part of the first user and the camera of the first terminal, and the view matrix associated with the first user may be generated from them. Spatial parameters of the camera coordinate system may also be obtained according to the size of the screen display area of the first terminal, and the projection matrix associated with the first user may be constructed from these spatial parameters. Further, the three-dimensional position data may be multiplied by the three-dimensional Model matrix, the View matrix, and the Projection matrix to realize a matrix transformation of the three-dimensional position data, obtaining the vertex position coordinates corresponding to the key part of the first user, which are determined as the position information corresponding to the key part of the first user. This realizes the process of converting three-dimensional position data in a local coordinate system into a standard view space, which may be called an MVP (Model-View-Projection) matrix transformation; the transformation uses the standard geometric transformations of graphics: translation, rotation, and scaling.
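The MVP transformation described above can be sketched with standard 4x4 homogeneous matrices. The following Python/numpy fragment is a minimal illustration under common graphics conventions (column vectors, right-handed coordinates); the matrix layouts and all concrete numbers are assumptions for the example, not values disclosed by this application.

```python
import numpy as np

def translation(tx, ty, tz):
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotation_z(angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    m = np.eye(4)
    m[:2, :2] = [[c, -s], [s, c]]
    return m

def frustum(left, right, bottom, top, near, far):
    # Standard perspective Projection matrix over the camera's view volume.
    return np.array([
        [2*near/(right-left), 0, (right+left)/(right-left), 0],
        [0, 2*near/(top-bottom), (top+bottom)/(top-bottom), 0],
        [0, 0, -(far+near)/(far-near), -2*far*near/(far-near)],
        [0, 0, -1, 0],
    ])

# Model: local -> world (first translation and rotation matrices).
model = translation(0.0, 1.2, 0.0) @ rotation_z(np.radians(15))
# View: world -> camera (second translation and rotation matrices).
view = rotation_z(0.0) @ translation(0.0, 0.0, -3.0)
# Projection: camera -> clip space, from the screen display area.
proj = frustum(-0.5, 0.5, -0.5, 0.5, 1.0, 100.0)

head_local = np.array([0.0, 0.0, 0.0, 1.0])  # key-part vertex, homogeneous
clip = proj @ view @ model @ head_local
vertex_position = clip[:3] / clip[3]         # perspective divide
print(vertex_position)                       # position information of the key part
```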
It can be understood that when the spatial coordinates or the rotation angle in the three-dimensional position data change, it can be determined that the position information has changed, and the position information after the change can be determined to satisfy the virtual background update condition; that is, when the head of the first user deflects or moves, the position information corresponding to the head satisfies the virtual background update condition. It should be noted that the present application supports three degrees of freedom of movement along the x, y, and z coordinate axes, and three degrees of freedom of rotation about those axes.
Optionally, the first terminal may perform expression detection on the first user in the acquired video data, for example using a deep neural network, so as to obtain the facial expression of the first user; if the facial expression of the first user is detected to belong to a target facial expression type, it may be determined that the facial expression of the first user meets the virtual background update condition. The target facial expression types include, but are not limited to, a happy type, a surprise type, a sad type, an anger type, an aversion type, and a fear type.
Optionally, the first terminal may perform audio detection on the acquired audio data, specifically by sampling the audio data to obtain the volume data corresponding to the first user; it can be understood that the volume data includes the volume values obtained over multiple samplings. Further, when a volume value lying in the volume detection interval is detected in the volume data, it may be determined that the volume value satisfies the virtual background update condition. Preferably, the volume detection interval is 20 dB to 90 dB.
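As an illustration of the sampling and interval check, the following sketch computes a per-frame volume value from raw PCM samples and tests it against the 20 dB to 90 dB interval mentioned above. The RMS-to-dB conversion is a common approximation assumed here; the patent does not specify a calibration.

```python
import numpy as np
from typing import List

VOLUME_DETECTION_INTERVAL = (20.0, 90.0)  # dB, the preferred range above

def frame_volume_db(pcm_frame: np.ndarray) -> float:
    # RMS of a frame of 16-bit PCM samples mapped to a dB value; the
    # calibration (floor at 0 dB) is an assumption for illustration.
    rms = np.sqrt(np.mean(pcm_frame.astype(np.float64) ** 2))
    return 20.0 * np.log10(max(rms, 1.0))

def meets_background_update_condition(volume_data: List[float]) -> bool:
    # The condition holds when any sampled volume value lies in the interval.
    lo, hi = VOLUME_DETECTION_INTERVAL
    return any(lo <= v <= hi for v in volume_data)
```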
Optionally, the first terminal may detect the environment where the first user is located in the video data, specifically by extracting from the video data the live-action information of that environment; the live-action information may include one or more of the brightness, the color composition, or the key environment objects of the environment where the first user is located. A key environment object can be understood as an object occupying a prominent position in the captured environment; for example, if the first user sits on an outdoor lawn and the camera repeatedly captures a fresh flower on the lawn, that flower may be determined as one of the key environment objects. Further, if an environmental change is detected in the live-action information, for example the brightness of the real environment changes, the proportion of a dominant color changes, or a key environment object changes, it can be determined that the changed live-action information satisfies the virtual background update condition.
It should be noted that, in the present application, gesture detection, expression detection, audio detection, and environment detection may be performed in parallel; any combination of the above position information, facial expression, volume data, and live-action information may be formed, and when every piece of information in a combination satisfies its corresponding virtual background update condition, the combination as a whole also satisfies the virtual background update condition.
Second, the first terminal can judge whether the user dynamic information meets the user picture switching condition. Specifically, if the facial expression of the first user belongs to a target facial expression type and the display duration of the facial expression is greater than a first duration threshold, it may be determined that the user dynamic information of the first user meets the user picture switching condition. Optionally, if the volume data corresponding to the first user lies in the volume detection interval and the duration for which it stays in the volume detection interval is longer than a second duration threshold, it may be determined that the user dynamic information of the first user meets the user picture switching condition. Optionally, the first terminal may also perform semantic recognition on the audio data of the first user, and when a user picture switching instruction is recognized in the audio data, it may be determined that the user dynamic information of the first user meets the user picture switching condition. The first duration threshold and the second duration threshold can be set according to actual needs, which the present application does not limit.
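The two duration-based switching checks can be summarized in a few lines. In the sketch below, both threshold values are placeholders chosen only for illustration, since the application leaves them to actual needs.

```python
# Hedged sketch of the two picture-switch checks; both thresholds are
# placeholders, assumed here for illustration only.
FIRST_DURATION_THRESHOLD = 2.0    # seconds, assumed
SECOND_DURATION_THRESHOLD = 3.0   # seconds, assumed
TARGET_EXPRESSION_TYPES = {"happy", "surprise", "sad", "anger", "aversion", "fear"}

def meets_picture_switch_condition(expression: str,
                                   expression_duration_s: float,
                                   in_interval_duration_s: float) -> bool:
    # Check 1: a target facial expression displayed longer than the first threshold.
    if expression in TARGET_EXPRESSION_TYPES and expression_duration_s > FIRST_DURATION_THRESHOLD:
        return True
    # Check 2: volume staying in the detection interval longer than the second threshold.
    return in_interval_duration_s > SECOND_DURATION_THRESHOLD
```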
Step S203, when the user dynamic information of the first user meets the virtual background updating condition, updating the target virtual background to obtain an updated target virtual background matched with the user dynamic information;
Specifically, when the position information after the position change in step S202 meets the virtual background updating condition, the first terminal may, in the video communication interface, determine a displacement distance and a deflection angle from the changed position information (which may be computed through the MVP matrix transformation), and then perform displacement and deflection processing on the first user picture according to that displacement distance and deflection angle while synchronously applying the same displacement and deflection to the background element associated with the first user picture in the target virtual background, so as to obtain an updated target virtual background matched with the changed position information. For example, referring to fig. 6b, when the head position of user A in the left user picture changes, the user picture and the associated flower element are both displaced and deflected, moving from the position shown by the area 501d in the original video communication interface 500d to the position shown by the area 501e in the video communication interface 500e. Optionally, if crossing of the flower elements on the left and right sides is not desired, the swing range of each flower element may be limited in the video communication interface 500e; alternatively, the distance between the flower elements on the two sides may be monitored, and when that distance falls below a distance threshold, the flower element that was not swinging may be triggered to swing in the same direction until the distance between the two again exceeds the threshold. The specific product form may be determined according to actual product requirements.
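A minimal sketch of this synchronous displacement-and-deflection step might look as follows, assuming the displacement distance and deflection angle have already been obtained from the MVP transformation; the function names and the 2D simplification are assumptions made for clarity.

```python
import numpy as np

def sync_background_element(element_anchor_xy, displacement_xy, deflection_rad):
    # Apply the same displacement distance and deflection angle that moved
    # the first user picture to the associated background element.
    c, s = np.cos(deflection_rad), np.sin(deflection_rad)
    rotated = np.array([[c, -s], [s, c]]) @ np.asarray(element_anchor_xy)
    return rotated + np.asarray(displacement_xy)

def clamp_swing(angle_rad, swing_min, swing_max):
    # Optional: confine each flower element's swing range so the elements
    # on the two sides do not cross, as discussed above.
    return min(max(angle_rad, swing_min), swing_max)
```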
Optionally, when the facial expression of the first user in step S202 meets the virtual background update condition, the expression animation mapping table may be traversed to obtain a background animation matching the facial expression; the background animation may then be displayed in the video communication interface and fused with the target virtual background to obtain an updated target virtual background matched with the facial expression. The expression animation mapping table stores the background animations corresponding to each virtual background and establishes a mapping relationship between each facial expression and a background animation. For example, referring to fig. 6c, when it is detected that user B in the right-hand user picture is smiling (the corresponding facial expression type is the happy type), as shown in the video communication interface 500g in fig. 6c, bees may appear around user B's user picture and move around it; if user B then moves his head, the bees may follow the movement.
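The expression animation mapping table can be pictured as a nested lookup keyed first by virtual background and then by facial expression type. The following sketch is purely illustrative; every animation identifier here is invented for the example.

```python
# Purely illustrative nested lookup for the expression animation mapping
# table; all animation identifiers are invented for the example.
EXPRESSION_ANIMATION_MAP = {
    "flower": {
        "happy": "bees_circle_user_picture",
        "surprise": "petals_burst",
        "sad": "petals_fall",
    },
    # ...one sub-table per virtual background...
}

def lookup_background_animation(background: str, expression_type: str):
    # Returns None when no animation is mapped, leaving the background as-is.
    return EXPRESSION_ANIMATION_MAP.get(background, {}).get(expression_type)
```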
Optionally, when the volume data of the first user in step S202 meets the virtual background update condition, volume peak-valley values (including a volume peak value and a volume valley value) may be extracted from the volume data, and a scaling matrix associated with the first user may be constructed from them; the scaling matrix is formed from scaling coefficients in at least two different stretch directions, and the present application does not limit the specific directions. Further, in the video communication interface, the first terminal may resize the background element associated with the first user picture in the target virtual background according to the at least two scaling coefficients in the scaling matrix, so as to obtain an updated target virtual background matched with the user dynamic information; the resizing may include one or both of a lifting process and a scaling process. In addition, the scaling matrix may be inserted into the MVP matrix transformation of step S202 above to participate dynamically in the operation. For example, referring to fig. 6d, the video communication interface 500h shows the initial interface form corresponding to user A and user B. When it is detected that the volume data generated by user B in the right-hand user picture lies in the volume detection interval and the volume peak in the volume data grows, then, in user B's video communication interface 500i, the flower element in the virtual background rises, and user B's user picture rises with it; when the rise exceeds a height threshold, a white cloud element may also be displayed in the video communication interface 500i. It can be understood that, as shown in the video communication interface 500j of user A on the other side, the picture seen by user A also changes accordingly: the rise of the flower element on the right exceeds the area the video communication interface 500j can display, so user A can only see the stem portion of that flower element. Of course, as the volume of user B's speech becomes smaller, the associated flower element reverts to its original height.
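A sketch of constructing the scaling matrix from the volume peak-valley values follows; the mapping from decibels to stretch coefficients is an assumption made for illustration, as the application does not fix a particular formula.

```python
import numpy as np

def scaling_matrix_from_volume(peak_db: float, valley_db: float) -> np.ndarray:
    # Map the volume peak-valley values onto stretch coefficients in two
    # scaling directions; this formula is an assumption for illustration.
    stretch = 1.0 + max(0.0, peak_db - valley_db) / 100.0
    sx, sy, sz = stretch, stretch, 1.0   # lift/stretch along x and y, say
    # The resulting matrix can join the MVP product from step S202.
    return np.diag([sx, sy, sz, 1.0])
```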
Optionally, when the live-action information of the environment where the first user is located in step S202 satisfies the virtual background update condition, the current target virtual background may be switched, in the video communication interface, to a virtual background associated with the live-action information. For example, when it is detected that there are many flowers in the environment where the first user is located, the first terminal may generate the keyword "flower" and search for a virtual background matching that keyword as the updated target virtual background.
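Keyword-based background switching can be pictured as a simple catalogue lookup; the catalogue below and its entries are assumptions made for the example.

```python
from typing import List, Optional

# Sketch of matching a detected key environment object to a virtual
# background; the catalogue and its entries are assumptions.
BACKGROUND_CATALOGUE = {"flower": "virtual_bg_flower", "beach": "virtual_bg_beach"}

def background_for_environment(key_objects: List[str]) -> Optional[str]:
    for keyword in key_objects:
        if keyword in BACKGROUND_CATALOGUE:
            return BACKGROUND_CATALOGUE[keyword]
    return None  # keep the current target virtual background
```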
It should be noted that the background updating manners listed above are only some of many possible embodiments; the present application may also support other background updating manners and does not limit the correspondence between different types of user dynamic information and background updating manners, which is not described further here.
Step S204, when the user dynamic information of the first user meets the user picture switching condition, switching the first user picture;
Specifically, when the user dynamic information of the first user satisfies the user picture switching condition, the first terminal may, in the video communication interface, switch the first user picture from the avatar video picture to the camera video picture, or from the camera video picture to the avatar video picture.
In step S205, in the video communication interface, the switched first user picture, the second user picture, and the updated target virtual background are displayed in a fused manner.
Specifically, in the video communication interface, the first terminal may perform fusion display on the switched first user picture, the second user picture and the updated target virtual background.
Fig. 7 is a schematic workflow diagram of a video display system according to an embodiment of the present application, where the video display system may be located in a terminal device or a server. As shown in fig. 7, the workflow of the video presentation system is as follows:
(1) After the video call is started, the system detects the user dynamic information in real time along several paths: gesture detection is performed through the camera with AI to obtain the head space coordinates and rotation angle of the user, the facial expression type is obtained through expression detection, and audio detection is performed on the audio data captured by the microphone to judge the volume.
(2) The real-time data (i.e., the user dynamic information) from step (1) is refreshed to the local system and sent to the system of the communication counterpart, and both systems start picture drawing.
(3) The head space coordinates and angles are mapped into a virtual scene.
A) The mapping of the space coordinates uses geometric translation from graphics: from the spatial distances of the model from the origin along the x, y, and z axes, a translation matrix is constructed to participate in the coordinate operation and complete the translation from the local coordinate system to the world coordinate system. The translation matrix has the standard homogeneous form:

$$T = \begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Fig. 8a to 8d are schematic diagrams of coordinate transformations according to an embodiment of the present application. As shown in fig. 8a, in the coordinate system, a translation operation transforms the original coordinates $(x, y)$ to the new coordinates $(x + T_x,\; y + T_y)$.
B) The angle mapping uses geometric rotation from graphics: using the sine and cosine of the model's Euler angles about the x, y, and z axes, rotation matrices are constructed to participate in the coordinate operation and complete the rotation from the local coordinate system to the world coordinate system. The rotation matrices have the standard homogeneous form:

$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha & 0 \\ 0 & \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},\quad R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},\quad R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 & 0 \\ \sin\gamma & \cos\gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
As shown in fig. 8b, in the coordinate system, a rotation operation through the angle $\alpha$ transforms the original coordinates $(x, y)$ to the new coordinates $(x\cos\alpha - y\sin\alpha,\; x\sin\alpha + y\cos\alpha)$.
(4) The rendering engine is used to displace and deflect the virtual background corresponding to the user. After the 3D API (the direct interface between the graphics card and the application program) receives the mapped coordinates, it completes the conversion of the model's local coordinate system to the world coordinate system, and then determines the View matrix and the Projection matrix according to the placement of the camera and the screen display area. The three operations Model (the three-dimensional Model matrix), View (the View matrix), and Projection (the Projection matrix) yield the final vertex position coordinates, which are passed to the shader (Shader) to complete the final rasterization and rendering of the graphics of the virtual background (and of the avatar video picture, when in the video virtual communication state). The Model matrix and the View matrix can each be obtained from a rotation matrix and a translation matrix: the Model matrix converts the local coordinate system into the world coordinate system, and the View matrix converts the world coordinate system into the camera coordinate system. The Projection matrix involves trigonometric functions and similar triangles and has the standard perspective form:

$$P = \begin{bmatrix} \dfrac{2\,near}{right-left} & 0 & \dfrac{right+left}{right-left} & 0 \\ 0 & \dfrac{2\,near}{top-bottom} & \dfrac{top+bottom}{top-bottom} & 0 \\ 0 & 0 & -\dfrac{far+near}{far-near} & -\dfrac{2\,far \cdot near}{far-near} \\ 0 & 0 & -1 & 0 \end{bmatrix}$$
Referring to fig. 8c: the visible area of the camera is a geometric space extending from its screen along the viewing direction (facing direction) according to a certain rule. Since geometric objects far from the camera contribute little to the picture, the visible area can be considered to be cut off after extending a certain distance. If the screen is a planar rectangle, the visible area is a geometric body enclosed by six planes; this body is the view volume of the camera, and its planes are denoted top, bottom, left, right, near, and far for the upper, lower, left, right, near, and far planes respectively. In the Projection matrix, top, bottom, left, right, near, and far likewise represent the coordinates of the corresponding planes. The near plane is also called the near clipping plane, and the far plane the far clipping plane; the near clipping plane of the shaded part in fig. 8c corresponds to the projection plane, which can be understood as the screen display area, representing the position and size of the screen, and the Center of Projection (COP) is the position of the camera. Projective transformation by the Projection matrix is the process of transforming view volumes of various shapes into a standard view volume.
(5) According to different facial expression types, a rendering engine is used to trigger corresponding background animations.
(6) The virtual background corresponding to the user is lifted or scaled using the rendering engine according to the volume. The volume is determined from the captured PCM (pulse code modulation) sampling values, and the scaling of the corresponding background element model is adjusted according to the peak-valley values of the volume. The scaling likewise uses geometric scaling from graphics: a scaling matrix is constructed from the model's stretch scaling coefficients along the x, y, and z axes and participates dynamically in the MVP operation. The scaling matrix has the standard homogeneous form:

$$S = \begin{bmatrix} S_x & 0 & 0 & 0 \\ 0 & S_y & 0 & 0 \\ 0 & 0 & S_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
As shown in fig. 8d, in the coordinate system, a scaling operation transforms the x-axis coordinate of a point on the original image from $x$ to $x \cdot S_x$, where $S_x$ is the stretch scaling coefficient in the x-axis direction.
During video communication, the application can acquire the user dynamic information of a user from the user's audio and video data, and when the user dynamic information is detected to meet the virtual background updating condition, the target virtual background can be updated to obtain an updated target virtual background matched with the user dynamic information. The application thus supports the terminal device automatically acquiring the current user's dynamic information and invoking the rendering engine to update the virtual background in the current video communication interface in real time based on it; for example, the virtual background can change along with the user's limb movements, facial expressions, voice, and the like, without requiring the user to interrupt video communication and update it through manual operation. That is, the application can keep video communication running and displaying normally while updating the virtual background, improving video communication quality. In addition, the application combines the rendering engine with the detection of human body posture, expression, sound, and environment to construct a virtual dynamic interactive environment background, which can improve the degree of fusion between the user's image and the background and the interest of video communication.
Fig. 9 is a schematic structural diagram of a video display apparatus according to an embodiment of the present application. The video presentation device may be a computer program (including program code) running on a computer apparatus, for example the video presentation device is an application software; the device can be used for executing corresponding steps in the method provided by the embodiment of the application. As shown in fig. 9, the video display apparatus 1 may include: a first display module 11, a first update module 12, a second display module 13;
The first display module 11 is configured to display, in a social application, a video communication interface for providing a video communication function for a first user and a second user, where a target virtual background where the first user and the second user are located together is displayed in the video communication interface; the first user is a user logging in a social application in the first terminal, and the second user is a user carrying out video communication with the first user in a video communication interface;
The first display module 11 is specifically configured to display one or more virtual backgrounds in response to a triggering operation for a background switching control in a video communication interface by a first terminal; in response to a selection operation for one or more virtual contexts, determining the selected virtual context as a target virtual context; in a video communication interface, switching an original background where a first user and a second user are located together into a target virtual background;
The first updating module 12 is configured to update the target virtual background when the user dynamic information of the first user meets the virtual background updating condition, so as to obtain an updated target virtual background that is matched with the user dynamic information; the user dynamic information is acquired from audio and video data of a first user in the video communication process;
and the second display module 13 is used for displaying the updated target virtual background in the video communication interface.
The specific function implementation manner of the first display module 11 may refer to step S101 in the embodiment corresponding to fig. 3, or may refer to step S201 in the embodiment corresponding to fig. 5, the specific function implementation manner of the first update module 12 may refer to step S102 in the embodiment corresponding to fig. 3, or may refer to step S203 in the embodiment corresponding to fig. 5, and the specific function implementation manner of the second display module 13 may refer to step S103 in the embodiment corresponding to fig. 3, or may refer to step S205 in the embodiment corresponding to fig. 5, which will not be described herein.
The video communication interface also comprises a first user picture and a second user picture;
referring to fig. 9, the video display apparatus 1 may further include: a first fused display module 14;
The first fusion display module 14 is used for shooting key parts of the first user in the video communication process; if the video virtual communication function of the first terminal is in an on state, displaying a video picture for covering a key part of the first user in the video communication interface, and determining the video picture as a first user picture; displaying a second user picture which is not overlapped with the first user picture in the video communication interface; the second user picture is used for displaying key parts of the second user; and in the video communication interface, the first user picture and the second user picture are displayed in a fusion way with the target virtual background.
The specific functional implementation manner of the first fusion display module 14 may refer to step S102 in the embodiment corresponding to fig. 3, or may refer to step S201 in the embodiment corresponding to fig. 5, which is not described herein.
Referring to fig. 9, the video display apparatus 1 may further include: a second fusion display module 15;
The second fusion display module 15 is used for shooting key parts of the first user in the video communication process; if the video virtual communication function of the first terminal is in a closed state, displaying a camera shooting video picture in a video communication interface, and determining the camera shooting video picture as a first user picture; the shooting video picture is a video picture for shooting a key part of the first user; displaying a second user picture which is not overlapped with the first user picture in the video communication interface; the second user picture is used for displaying key parts of the second user; and in the video communication interface, the first user picture and the second user picture are displayed in a fusion way with the target virtual background.
The specific function implementation manner of the second fusion display module 15 may refer to step S102 in the embodiment corresponding to fig. 3, or may refer to step S201 in the embodiment corresponding to fig. 5, where the first fusion display module 14 and the second fusion display module 15 may be combined into one fusion display module, which is not described herein again.
Referring to fig. 9, the video display apparatus 1 may further include: a second update module 16;
The second updating module 16 is configured to obtain user dynamic information of the second user, and fuse the user dynamic information of the first user with the user dynamic information of the second user to obtain fused dynamic information; the user dynamic information of the second user is acquired from the audio and video data of the second user in the video communication process; and when the fusion dynamic information meets the virtual background updating condition, updating the target virtual background to obtain an updated target virtual background matched with the fusion dynamic information.
The specific functional implementation manner of the second updating module 16 may refer to step S102 in the embodiment corresponding to fig. 3, and will not be described herein.
The user dynamic information comprises position information corresponding to a key part of the first user;
Referring to fig. 9, the video display apparatus 1 may further include: a posture detection module 17, a first condition judgment module 18;
The gesture detection module 17 is configured to perform gesture detection on the first user during the video communication process, and obtain three-dimensional position data corresponding to a key position of the first user; generating a three-dimensional model matrix, a view matrix and a projection matrix which are associated with the first user respectively according to the geometric relation between the key part of the first user and the world coordinate system, the position relation between the key part of the first user and the camera of the first terminal and the size of the screen display area of the first terminal; the world coordinate system is used for describing the position of the key part of the first user and the camera of the first terminal; according to the three-dimensional model matrix, the view matrix and the projection matrix, performing matrix transformation on the three-dimensional position data to generate vertex position coordinates corresponding to the key parts of the first user, and determining the vertex position coordinates as position information corresponding to the key parts of the first user;
The gesture detection module 17 is specifically configured to obtain a geometric relationship between a key part of the first user and an origin and coordinate axes of a world coordinate system, construct a first translation matrix and a first rotation matrix according to the geometric relationship, and generate a three-dimensional model matrix associated with the first user according to the first translation matrix and the first rotation matrix; constructing a second translation matrix and a second rotation matrix according to the position relation between the key part of the first user and the camera of the first terminal, and generating a view matrix associated with the first user according to the second translation matrix and the second rotation matrix; according to the size of a screen display area of the first terminal, spatial parameters of a camera coordinate system are obtained, and a projection matrix associated with a first user is constructed according to the spatial parameters;
The first condition determining module 18 is configured to determine that the position information has changed if the spatial coordinates or the rotation angle in the three-dimensional position data undergo a parameter change, and to determine that the position information after the position change satisfies the virtual background updating condition.
The specific functional implementation manner of the gesture detection module 17 and the first condition determination module 18 may refer to step S202 in the embodiment corresponding to fig. 5, which is not described herein.
Wherein the user dynamic information further comprises a facial expression of the first user;
referring to fig. 9, the video display apparatus 1 may further include: an expression detection module 19 and a second condition judgment module 20;
the expression detection module 19 is configured to perform expression detection on the first user during the video communication process, and obtain a facial expression of the first user;
The second condition judgment module 20 is configured to determine that the user dynamic information satisfies the virtual background update condition if the facial expression belongs to the target facial expression type.
The specific functional implementation manner of the expression detection module 19 and the second condition judgment module 20 may refer to step S202 in the embodiment corresponding to fig. 5, and will not be described herein.
The user dynamic information also comprises volume data corresponding to the first user;
Referring to fig. 9, the video display apparatus 1 may further include: an audio detection module 21, a third condition judgment module 22;
The audio detection module 21 is configured to obtain audio data recorded by a first user, and sample the audio data to obtain volume data corresponding to the first user;
the third condition judgment module 22 is configured to determine that the user dynamic information satisfies the virtual background update condition if the volume data is located in the volume detection section.
The specific functional implementation manner of the audio detection module 21 and the third condition determination module 22 may refer to step S202 in the embodiment corresponding to fig. 5, and will not be described herein.
The user dynamic information also comprises live-action information of the environment where the first user is located;
Referring to fig. 9, the video display apparatus 1 may further include: an environment detection module 23, a fourth condition judgment module 24;
The environment detection module 23 is configured to obtain video data of the environment where the first user is located, and extract from the video data the live-action information of that environment; the live-action information comprises one or more of the brightness, the color composition, or the key environment objects of the environment where the first user is located;
and the fourth condition judgment module 24 is configured to determine that the user dynamic information satisfies the virtual background update condition if the live-action information reflects an environmental change.
The specific functional implementation manner of the environment detection module 23 and the fourth condition judgment module 24 may refer to step S202 in the embodiment corresponding to fig. 5, and will not be described herein.
Referring to fig. 9, the video display apparatus 1 may further include: a screen switching module 25;
The picture switching module 25 is configured to: if the facial expression of the first user belongs to the target facial expression type and the display duration of the facial expression is greater than the first duration threshold, determine that the user dynamic information of the first user satisfies the user picture switching condition, and switch the first user picture from the avatar video picture to the camera video picture in the video communication interface, where the camera video picture is a video picture obtained by shooting a key part of the first user; or, if the volume data corresponding to the first user lies in the volume detection interval and the duration for which it stays in the volume detection interval is longer than the second duration threshold, determine that the user dynamic information of the first user satisfies the user picture switching condition and perform the same switch in the video communication interface.
The specific function implementation manner of the frame switching module 25 may refer to step S202 and step S204 in the embodiment corresponding to fig. 5, which are not described herein.
Referring to fig. 9, the first updating module 12 may include: a first position adjusting unit 121, a first moving image display unit 122, a first resizing unit 123, a second position adjusting unit 124, a second moving image display unit 125, a second resizing unit 126;
A first position adjustment unit 121, configured to adjust, in the video communication interface, the first user frame to a position indicated by the user dynamic information when the user dynamic information of the first user satisfies the virtual background update condition; according to the first user picture after the position adjustment, in a video communication interface, performing position synchronous adjustment on background elements associated with the first user picture in the target virtual background to obtain an updated target virtual background matched with the user dynamic information;
the first animation display unit 122 is configured to display a background animation associated with the user dynamic information in the video communication interface when the user dynamic information of the first user meets the virtual background update condition, and fuse the background animation with the target virtual background to obtain an updated target virtual background matched with the user dynamic information;
the first size adjustment unit 123 is configured to, when the user dynamic information of the first user meets the virtual background update condition, perform size adjustment on a background element associated with the first user picture in the target virtual background according to the user dynamic information in the video communication interface, so as to obtain an updated target virtual background that is matched with the user dynamic information;
The second position adjustment unit 124 is configured to determine, in the video communication interface, a displacement distance and a deflection angle according to the position information after the position change when the position information after the position change in the user dynamic information meets the virtual background update condition, and perform displacement deflection processing on the first user picture according to the displacement distance and the deflection angle; in a video communication interface, according to the displacement distance and the deflection angle, performing synchronous displacement deflection processing on background elements associated with a first user picture in a target virtual background to obtain an updated target virtual background matched with user dynamic information;
A second animation display unit 125, configured to traverse in the expression animation mapping table to obtain a background animation matching with the facial expression when the facial expression in the dynamic information of the user satisfies the virtual background update condition; displaying a background animation in a video communication interface, and fusing the background animation with a target virtual background to obtain an updated target virtual background matched with user dynamic information;
A second size adjusting unit 126, configured to extract a volume peak-valley value from the volume data when the volume data in the user dynamic information satisfies the virtual background update condition, and construct a scaling matrix associated with the first user according to the volume peak-valley value; the scaling matrix is composed of at least two scaling coefficients in different expansion directions; and in the video communication interface, the size of a background element associated with the first user picture in the target virtual background is adjusted according to at least two scaling coefficients in the scaling matrix, so that the updated target virtual background matched with the user dynamic information is obtained.
For the specific functional implementations of the first position adjustment unit 121, the first animation display unit 122, and the first size adjustment unit 123, reference may be made to step S102 in the embodiment corresponding to fig. 3; for the specific functional implementations of the second position adjustment unit 124, the second animation display unit 125, and the second size adjustment unit 126, reference may be made to step S203 in the embodiment corresponding to fig. 5. The first position adjustment unit 121 and the second position adjustment unit 124 may be combined into one position adjustment unit, the first animation display unit 122 and the second animation display unit 125 may be combined into one animation display unit, and the first size adjustment unit 123 and the second size adjustment unit 126 may be combined into one size adjustment unit; details are not repeated here.
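For illustration only, the displacement-and-deflection processing performed by the position adjustment units can be sketched as follows; the Transform2D structure, the interface coordinate convention, and the function names are assumptions of this sketch, not part of the embodiments above.

```python
# Illustrative sketch only; names and coordinate conventions are assumed,
# not taken from the embodiments above.
import math
from dataclasses import dataclass

@dataclass
class Transform2D:
    x: float = 0.0      # horizontal position in interface coordinates
    y: float = 0.0      # vertical position in interface coordinates
    angle: float = 0.0  # deflection angle, in degrees

def displacement_and_deflection(old: Transform2D, new: Transform2D):
    """Derive the displacement (dx, dy), its distance, and the deflection
    angle from a detected position change of the key part."""
    dx, dy = new.x - old.x, new.y - old.y
    distance = math.hypot(dx, dy)
    deflection = new.angle - old.angle
    return dx, dy, distance, deflection

def shift_and_deflect(user_picture: Transform2D, background_element: Transform2D,
                      dx: float, dy: float, deflection: float) -> None:
    """Apply the same displacement and deflection to the first user picture
    and to its associated background element, keeping them synchronized."""
    for t in (user_picture, background_element):
        t.x += dx
        t.y += dy
        t.angle += deflection
```

Applying a single transform to both the user picture and its associated background element is what keeps the picture visually anchored to the element it is fused with.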
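Likewise, the traversal of the expression animation mapping table performed by the second animation display unit 125 amounts to a table lookup; the expression labels and animation names below are hypothetical examples.

```python
# Hypothetical mapping table; expression labels and animation names are
# examples only and are not specified by the embodiments above.
EXPRESSION_ANIMATION_TABLE = {
    "smile":    "confetti_burst",
    "surprise": "sparkle_ring",
    "sad":      "rain_overlay",
}

def find_background_animation(facial_expression: str):
    """Traverse the expression animation mapping table and return the
    background animation matched with the detected facial expression."""
    for expression, animation in EXPRESSION_ANIMATION_TABLE.items():
        if expression == facial_expression:
            return animation
    return None  # no matching entry: leave the target virtual background unchanged
```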
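Finally, the construction of the scaling matrix from the volume peak-valley value in the second size adjustment unit 126 might take the following form; the normalization of the volume samples and the base and gain factors are assumptions of this sketch.

```python
# A minimal sketch, assuming volume samples are normalized amplitudes in
# [0, 1]; the base scale and gain factors are illustrative assumptions.
def build_scaling_matrix(volume_samples: list[float],
                         base: float = 1.0, gain: float = 0.5) -> list[list[float]]:
    """Extract the volume peak-valley value and build a scaling matrix with
    one scaling coefficient per expansion direction (here: horizontal, vertical)."""
    peak, valley = max(volume_samples), min(volume_samples)
    peak_valley = peak - valley            # larger swings drive stronger scaling
    sx = base + gain * peak_valley         # horizontal scaling coefficient
    sy = base + 0.5 * gain * peak_valley   # vertical coefficient, damped
    return [[sx, 0.0],
            [0.0, sy]]

def resize_background_element(width: float, height: float,
                              scaling: list[list[float]]) -> tuple[float, float]:
    """Resize a background element associated with the first user picture
    using the per-direction scaling coefficients."""
    return width * scaling[0][0], height * scaling[1][1]
```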
In the video communication process, the present application can acquire the user dynamic information of a user from the audio and video data of that user, and when the user dynamic information is detected to satisfy the virtual background update condition, the target virtual background can be updated to obtain an updated target virtual background matched with the user dynamic information. The present application therefore supports the terminal device in automatically acquiring the user dynamic information of the current user and invoking the rendering engine to update the virtual background in the current video communication interface in real time based on the acquired user dynamic information; for example, the virtual background can change with the user's limb movements, facial expressions, voice, and the like, without requiring the user to interrupt the video communication and update it manually. In other words, the present application can maintain the normal operation and display of the video communication while updating the virtual background, thereby improving video communication quality. In addition, by constructing a virtual, dynamically interactive environment background, the present application can improve the degree of fusion between user pictures and the background, increase the interest of video communication, and enrich the video display modes of video communication.
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 10, the computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005; in addition, the computer device 1000 may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 10, the network interface 1004 may provide network communication functions, the user interface 1003 is primarily used to provide an input interface for the user, and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement the following (a simplified sketch of the resulting update loop is given after these steps):
displaying, in a social application on a first terminal, a video communication interface for providing a video communication function for a first user and a second user, and displaying, in the video communication interface, a target virtual background in which the first user and the second user are located together, the first user being a user who logs in to the social application on the first terminal and the second user being a user who carries out video communication with the first user in the video communication interface;
when the user dynamic information of the first user satisfies the virtual background update condition, updating the target virtual background to obtain an updated target virtual background matched with the user dynamic information, the user dynamic information being acquired from the audio and video data of the first user in the video communication process; and
displaying the updated target virtual background in the video communication interface.
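For readability, the three steps above can be condensed into a single self-contained sketch; the UserDynamicInfo fields and the dictionary-based background description are hypothetical simplifications rather than the device control application itself.

```python
# Self-contained simplification of the steps above; all names are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserDynamicInfo:
    position_changed: bool = False
    facial_expression: Optional[str] = None  # e.g. a detected target expression
    volume_in_detection_interval: bool = False

def satisfies_update_condition(info: UserDynamicInfo) -> bool:
    """The virtual background update condition fires when any monitored
    signal (position, expression, volume) qualifies."""
    return (info.position_changed
            or info.facial_expression is not None
            or info.volume_in_detection_interval)

def update_target_virtual_background(background: dict,
                                     info: UserDynamicInfo) -> dict:
    """Return an updated background description matched with the dynamic info."""
    updated = dict(background)
    if info.position_changed:
        updated["elements_shifted"] = True            # synchronous displacement/deflection
    if info.facial_expression is not None:
        updated["animation"] = info.facial_expression  # fused background animation
    if info.volume_in_detection_interval:
        updated["elements_resized"] = True            # volume-driven size adjustment
    return updated

# Example: a smile triggers an update while the call keeps running.
info = UserDynamicInfo(facial_expression="smile")
if satisfies_update_condition(info):
    background = update_target_virtual_background({"scene": "beach"}, info)
```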
It should be understood that the computer device 1000 described in this embodiment of the present application can perform the video display method described in any of the embodiments corresponding to fig. 3 and fig. 5, and details are not repeated here. The description of the beneficial effects of the same method is likewise omitted.
Furthermore, it should be noted that an embodiment of the present application further provides a computer-readable storage medium storing the computer program executed by the aforementioned video display apparatus 1, the computer program comprising program instructions. When a processor executes the program instructions, the video display method described in any of the embodiments corresponding to fig. 3 and fig. 5 can be performed, and details are therefore not repeated here. The description of the beneficial effects of the same method is likewise omitted. For technical details not disclosed in this computer-readable storage medium embodiment, refer to the description of the method embodiments of the present application.
The computer-readable storage medium may be the video display apparatus provided in any of the foregoing embodiments, or an internal storage unit of the computer device, for example, a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and the other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
Furthermore, it should be noted that an embodiment of the present application also provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in any of the embodiments corresponding to fig. 3 and fig. 5 above.
The terms "first", "second", and the like in the description, claims, and drawings of the embodiments of the present application are used to distinguish between different objects, not to describe a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules, but may alternatively include other steps or modules not listed or inherent to such a process, method, apparatus, product, or device.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functions are implemented as hardware or software depends on the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (17)

1. A video display method, comprising:
displaying, by a first terminal in a social application, a video communication interface for providing a video communication function for a first user and a second user, and displaying, in the video communication interface, a target virtual background in which the first user and the second user are located together; the first user is a user who logs in to the social application on the first terminal, and the second user is a user who carries out video communication with the first user in the video communication interface; the target virtual background comprises a plurality of background elements, and a first user picture and a second user picture are fused in the target virtual background; the first user picture and the second user picture are respectively displayed in different background elements, and the background elements do not overlap one another; the first user picture is a video picture used for covering a key part of the first user in the video communication interface, and the second user picture is used for displaying a key part of the second user;
when user dynamic information of the first user satisfies a virtual background update condition, updating the target virtual background to obtain an updated target virtual background matched with the user dynamic information, the user dynamic information being acquired from audio and video data of the first user in the video communication process; and
displaying the updated target virtual background in the video communication interface;
wherein updating the target virtual background when the user dynamic information of the first user satisfies the virtual background update condition, to obtain the updated target virtual background matched with the user dynamic information, comprises:
when the user dynamic information of the first user satisfies the virtual background update condition, performing displacement and deflection processing on the first user picture in the video communication interface; and
in the video communication interface, according to the first user picture after the displacement and deflection, performing synchronous displacement and deflection processing on a background element associated with the first user picture in the target virtual background, to obtain the updated target virtual background matched with the user dynamic information.
2. The method according to claim 1, wherein displaying, in the video communication interface, the target virtual background in which the first user and the second user are located together comprises:
displaying, by the first terminal, one or more virtual backgrounds in response to a trigger operation on a background switching control in the video communication interface;
in response to a selection operation on the one or more virtual backgrounds, determining the selected virtual background as the target virtual background; and
in the video communication interface, switching the original background in which the first user and the second user are located together to the target virtual background.
3. The method according to claim 1, wherein the method further comprises:
shooting the key part of the first user in the video communication process;
if the video virtual communication function of the first terminal is in an on state, displaying, in the video communication interface, a video picture for covering the key part of the first user, and determining the video picture as the first user picture;
displaying, in the video communication interface, the second user picture that does not overlap the first user picture; and
in the video communication interface, displaying the first user picture and the second user picture fused with the target virtual background.
4. The method of claim 1, wherein updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition, to obtain an updated target virtual background that matches the user dynamic information, further comprises:
when the user dynamic information of the first user satisfies the virtual background update condition, displaying a background animation associated with the user dynamic information in the video communication interface, and fusing the background animation with the target virtual background to obtain the updated target virtual background matched with the user dynamic information.
5. The method of claim 3, wherein updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition, to obtain an updated target virtual background that matches the user dynamic information, further comprises:
when the user dynamic information of the first user satisfies the virtual background update condition, resizing, in the video communication interface and according to the user dynamic information, the background element associated with the first user picture in the target virtual background, to obtain the updated target virtual background matched with the user dynamic information.
6. The method as recited in claim 1, further comprising:
acquiring user dynamic information of the second user, and fusing the user dynamic information of the first user with the user dynamic information of the second user to obtain fused dynamic information, the user dynamic information of the second user being acquired from the audio and video data of the second user in the video communication process; and
when the fused dynamic information satisfies the virtual background update condition, updating the target virtual background to obtain an updated target virtual background matched with the fused dynamic information.
7. The method according to claim 3, wherein the user dynamic information comprises position information corresponding to the key part of the first user;
the method further comprising:
performing gesture detection on the first user in the video communication process to obtain three-dimensional position data corresponding to the key part of the first user;
generating a three-dimensional model matrix, a view matrix, and a projection matrix associated with the first user according to, respectively, the geometric relationship between the key part of the first user and a world coordinate system, the positional relationship between the key part of the first user and a camera of the first terminal, and the size of a screen display area of the first terminal; the world coordinate system is used to describe the positions of the key part of the first user and of the camera of the first terminal;
performing matrix transformation on the three-dimensional position data according to the three-dimensional model matrix, the view matrix, and the projection matrix to generate vertex position coordinates corresponding to the key part of the first user, and determining the vertex position coordinates as the position information corresponding to the key part of the first user; and
if the spatial coordinates or the rotation angle in the three-dimensional position data undergo a parameter change, determining that the position information has undergone a position change, and determining that the changed position information satisfies the virtual background update condition.
8. The method according to claim 7, wherein generating the three-dimensional model matrix, the view matrix, and the projection matrix associated with the first user according to, respectively, the geometric relationship between the key part of the first user and the world coordinate system, the positional relationship between the key part of the first user and the camera of the first terminal, and the size of the screen display area of the first terminal comprises:
acquiring the geometric relationship between the key part of the first user and the origin and coordinate axes of the world coordinate system, constructing a first translation matrix and a first rotation matrix according to the geometric relationship, and generating the three-dimensional model matrix associated with the first user according to the first translation matrix and the first rotation matrix;
constructing a second translation matrix and a second rotation matrix according to the positional relationship between the key part of the first user and the camera of the first terminal, and generating the view matrix associated with the first user according to the second translation matrix and the second rotation matrix; and
obtaining a spatial parameter of a camera coordinate system according to the size of the screen display area of the first terminal, and constructing the projection matrix associated with the first user according to the spatial parameter.
9. The method of claim 1, wherein the user dynamic information comprises a facial expression of the first user;
the method further comprising:
performing expression detection on the first user in the video communication process to acquire the facial expression of the first user; and
if the facial expression belongs to a target facial expression type, determining that the user dynamic information satisfies the virtual background update condition.
10. The method of claim 9, wherein updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition, to obtain an updated target virtual background that matches the user dynamic information, further comprises:
when the facial expression in the user dynamic information satisfies the virtual background update condition, traversing an expression animation mapping table to obtain a background animation matched with the facial expression; and
displaying the background animation in the video communication interface, and fusing the background animation with the target virtual background to obtain the updated target virtual background matched with the user dynamic information.
11. The method according to claim 3, wherein the user dynamic information comprises volume data corresponding to the first user;
the method further comprising:
acquiring audio data recorded by the first user, and sampling the audio data to obtain the volume data corresponding to the first user; and
if the volume data falls within a volume detection interval, determining that the user dynamic information satisfies the virtual background update condition.
12. The method of claim 11, wherein updating the target virtual background when the user dynamic information of the first user satisfies a virtual background update condition, to obtain an updated target virtual background that matches the user dynamic information, further comprises:
when the volume data in the user dynamic information satisfies the virtual background update condition, extracting a volume peak-valley value from the volume data, and constructing a scaling matrix associated with the first user according to the volume peak-valley value, the scaling matrix being composed of at least two scaling coefficients in different expansion directions; and
in the video communication interface, resizing the background element associated with the first user picture in the target virtual background according to the at least two scaling coefficients in the scaling matrix, to obtain the updated target virtual background matched with the user dynamic information.
13. The method of claim 1, wherein the user dynamic information comprises live-action information of an environment in which the first user is located;
the method further comprising:
acquiring video data of the environment in which the first user is located, and extracting the live-action information of that environment from the video data, the live-action information comprising one or more of the darkness, the color composition, or key environmental objects of the environment in which the first user is located; and
if the live-action information of the environment changes, determining that the user dynamic information satisfies the virtual background update condition.
14. The method according to claim 3, further comprising:
if the facial expression of the first user belongs to a target facial expression type and the display duration of the facial expression is longer than a first duration threshold, determining that the user dynamic information of the first user satisfies a user picture switching condition, and switching, in the video communication interface, the first user picture from the video picture to a camera-captured video picture, the camera-captured video picture being a video picture obtained by shooting the key part of the first user; or
if the volume data corresponding to the first user falls within a volume detection interval and the duration for which the volume data remains within the volume detection interval is longer than a second duration threshold, determining that the user dynamic information of the first user satisfies the user picture switching condition, and switching, in the video communication interface, the first user picture from the video picture to the camera-captured video picture.
15. A video display apparatus, comprising:
a first display module, configured to display, in a social application on a first terminal, a video communication interface for providing a video communication function for a first user and a second user, and to display, in the video communication interface, a target virtual background in which the first user and the second user are located together; the first user is a user who logs in to the social application on the first terminal, and the second user is a user who carries out video communication with the first user in the video communication interface; the target virtual background comprises a plurality of background elements, and a first user picture and a second user picture are fused in the target virtual background; the first user picture and the second user picture are respectively displayed in different background elements, and the background elements do not overlap one another; the first user picture is a video picture used for covering a key part of the first user in the video communication interface, and the second user picture is used for displaying a key part of the second user;
an update module, configured to update the target virtual background when user dynamic information of the first user satisfies a virtual background update condition, to obtain an updated target virtual background matched with the user dynamic information, the user dynamic information being acquired from audio and video data of the first user in the video communication process; and
a second display module, configured to display the updated target virtual background in the video communication interface;
wherein the update module is specifically configured to perform displacement and deflection processing on the first user picture in the video communication interface when the user dynamic information of the first user satisfies the virtual background update condition, and, in the video communication interface, perform synchronous displacement and deflection processing on a background element associated with the first user picture in the target virtual background according to the first user picture after the displacement and deflection, to obtain the updated target virtual background matched with the user dynamic information.
16. A computer device, comprising: a processor, a memory, and a network interface;
wherein the processor is connected to the memory, the network interface is configured to provide a data communication function, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any one of claims 1-14.
17. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded by a processor and to perform the method of any of claims 1-14.
CN202110206221.4A 2021-02-24 2021-02-24 Video display method and device and readable storage medium Active CN114979789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110206221.4A CN114979789B (en) 2021-02-24 2021-02-24 Video display method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110206221.4A CN114979789B (en) 2021-02-24 2021-02-24 Video display method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN114979789A CN114979789A (en) 2022-08-30
CN114979789B (en) 2024-07-23

Family

ID=82973690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110206221.4A Active CN114979789B (en) 2021-02-24 2021-02-24 Video display method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN114979789B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117041225A (en) * 2023-09-28 2023-11-10 中科融信科技有限公司 Multi-party audio and video communication method and system based on 5G

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106817349A (en) * 2015-11-30 2017-06-09 厦门幻世网络科技有限公司 A kind of method and device for making communication interface produce animation effect in communication process

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101533065B1 (en) * 2008-12-01 2015-07-01 삼성전자주식회사 Method and apparatus for providing animation effect on video telephony call
US9544538B2 (en) * 2012-05-15 2017-01-10 Airtime Media, Inc. System and method for providing a shared canvas for chat participant
KR102022604B1 (en) * 2018-09-05 2019-11-04 넷마블 주식회사 Server and method for providing game service based on an interaface for visually expressing ambient audio
CN110086937A (en) * 2019-04-28 2019-08-02 上海掌门科技有限公司 Display methods, electronic equipment and the computer-readable medium of call interface

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106817349A (en) * 2015-11-30 2017-06-09 厦门幻世网络科技有限公司 A kind of method and device for making communication interface produce animation effect in communication process

Also Published As

Publication number Publication date
CN114979789A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN110515452B (en) Image processing method, image processing device, storage medium and computer equipment
US11736756B2 (en) Producing realistic body movement using body images
CN112379812B (en) Simulation 3D digital human interaction method and device, electronic equipment and storage medium
KR101894573B1 (en) Smart phone interface management system by 3D digital actor
US10963140B2 (en) Augmented reality experience creation via tapping virtual surfaces in augmented reality
CN111897431B (en) Display method and device, display equipment and computer readable storage medium
JP2020529084A (en) Image processing method, equipment and storage medium
KR20240090542A (en) Mirror-based augmented reality experience
KR102148151B1 (en) Intelligent chat based on digital communication network
US20140068526A1 (en) Method and apparatus for user interaction
WO2022252866A1 (en) Interaction processing method and apparatus, terminal and medium
US20240257409A1 (en) Computer program, server device, terminal device, and method for moving gift in virtual space
CN116943191A (en) Man-machine interaction method, device, equipment and medium based on story scene
CN116958344A (en) Animation generation method and device for virtual image, computer equipment and storage medium
CN114979789B (en) Video display method and device and readable storage medium
CN113824982B (en) Live broadcast method, live broadcast device, computer equipment and storage medium
Fu et al. Real-time multimodal human–avatar interaction
CN114779948B (en) Method, device and equipment for controlling instant interaction of animation characters based on facial recognition
CN114299263A (en) Display method and device for augmented reality AR scene
CN112686990A (en) Three-dimensional model display method and device, storage medium and computer equipment
US20240087242A1 (en) 3d cursor functionality for augmented reality content in messaging systems
US20240087246A1 (en) Trigger gesture for selection of augmented reality content in messaging systems
US20240087609A1 (en) Auto trimming for augmented reality content in messaging systems
US20240087243A1 (en) Shooting interaction using augmented reality content in a messaging system
US20240087264A1 (en) Virtual object manipulation with gestures in a messaging system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40074442

Country of ref document: HK

GR01 Patent grant